Web conference notes, 2019.08.08

Attendees

Thank you for the report @rf-. Do we need an emergency release, or can this be part of 0.4.0?

Consensus that we can wait for 0.4.0.

Action Item: Brady Law from Lyft will propose a PR for the log-rotation idea, targeting readiness for discussion on the next call, 8/22.

Current major use-cases of APIs:

Nearish real-time (what happened last hour? last day?)
True historical backfill (what happened last week, last month, last 6 months, etc.)
Comparison (what happened last weekend vs. a couple weekends ago?)

Lime: long timescale queries (e.g. far back in time) could theoretically be cached.

Lyft:

Difficult to predict how we will be queried, especially in the backfill case.
Do we want to design around the current use-cases, or around the data itself and what makes sense for storage and serving?
Brady will propose a PR for the log-rotation idea, targeting readiness for discussion on the next call, 8/22.

Generally, want to make API more performant and easier to implement over longer time ranges.

Solution options include:

Fix query window to some interval (e.g. UTC hour) - model like a rotating logfile
- query for "active log" is subject to change
- backwards queries will be for static data
Live vs. Historical / Hot vs. Cold feeds
- e.g. a timepoint in the past where the data universe becomes "clipped", and more recent times behave as now
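
A minimal sketch of the rotating-logfile option, assuming a one-hour UTC window as in the example above. The names `WINDOW`, `window_start`, and `is_static` are hypothetical and not part of any MDS release; the sketch only illustrates how the "active log" could be distinguished from static, cacheable windows.

```python
from datetime import datetime, timedelta, timezone

WINDOW = timedelta(hours=1)  # assumption: windows are fixed UTC hours


def window_start(ts: datetime) -> datetime:
    """Floor a timezone-aware timestamp to the start of its UTC hour."""
    return ts.astimezone(timezone.utc).replace(minute=0, second=0, microsecond=0)


def is_static(start: datetime, now: datetime) -> bool:
    """A window is static (safe to cache) once it has fully elapsed."""
    return start + WINDOW <= now


now = datetime(2019, 8, 8, 17, 25, tzinfo=timezone.utc)
active = window_start(now)   # 17:00 UTC, the "active log", still subject to change
previous = active - WINDOW   # 16:00 UTC, already rotated, static data
print(is_static(active, now), is_static(previous, now))
```

Under these assumptions, only the window containing "now" would need to be re-fetched; every earlier window could be served from a cache.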

Ride Report:

No problems with caching. But what happens if we query across a boundary? E.g. caching by the hour, but wanting the last hour and a half?
- Lyft: query across both hours; the assumption is a regular query occurring on a normalish schedule.
log-rotation model: it would take some time to do the rotation
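
The boundary question can be sketched as follows, again assuming hourly UTC windows. `hourly_windows` is a hypothetical helper, not an MDS API: it lists every hour window a consumer would need to request to cover an arbitrary time range.

```python
from datetime import datetime, timedelta, timezone

HOUR = timedelta(hours=1)  # assumption: data is cached per UTC hour


def hourly_windows(start: datetime, end: datetime) -> list[datetime]:
    """Return the start of every UTC hour window that overlaps [start, end)."""
    cursor = start.astimezone(timezone.utc).replace(minute=0, second=0, microsecond=0)
    windows = []
    while cursor < end:
        windows.append(cursor)
        cursor += HOUR
    return windows


end = datetime(2019, 8, 8, 17, 20, tzinfo=timezone.utc)
start = end - timedelta(minutes=90)  # "last hour and a half"
for w in hourly_windows(start, end):
    print(w.isoformat())
```

In this example the 90-minute range starts at 15:50 and touches three hour windows (15:00, 16:00, and 17:00), so the consumer would request all three and discard records outside the range it actually wants.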

Remix:

Variety of different windows. Backfills usually go daily, but sometimes down to the hourly.
Caching semantics don't necessarily need to be exposed to client?
Hot vs. Cold model seems to make sense
Would be best to make this explicit in the query itself

Bird:

With log rotation model, potential issue with large trip payloads in high-traffic cities (dense route objects)
Probably need to consider trips separately from status_changes in these caching conversations

Shared Streets:

Long Beach:

Optional API where consumer can specify intent? E.g. "here comes a request for 12, one-month blocks"
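
As an illustration only, a consumer announcing this kind of intent would first need to enumerate the blocks it plans to request. The sketch below assumes calendar-month blocks with an exclusive end date; `month_blocks` is a hypothetical helper, and no such intent API exists in MDS.

```python
from datetime import date


def month_blocks(first_year: int, first_month: int, count: int) -> list[tuple[date, date]]:
    """Return `count` consecutive (start, end) calendar-month blocks; end is exclusive."""
    blocks = []
    year, month = first_year, first_month
    for _ in range(count):
        start = date(year, month, 1)
        # Advance to the first day of the following month, rolling over the year.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
        blocks.append((start, date(year, month, 1)))
    return blocks


blocks = month_blocks(2018, 8, 12)  # twelve one-month blocks starting August 2018
print(len(blocks), blocks[0], blocks[-1])
```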

LA: is anyone using device_id or vehicle_id query params?

No one on the call is using these query params
"city" sharding is already encoded in the token / URL, so aggregators may need to make multiple requests for a given period, one per geography.

Louisville, similar option:

Agenda set from active issues/PRs since last convening - keep the comments coming!
Cities are working toward 0.3.0; LA and Santa Monica are testing and close to deployment in the next couple of weeks.
