Technical approach metadata #13
Comments
All great. Only, as mentioned in #12, the seeding of the new table should include locations that have no coordinates. Do you always know the originating country? If not, then a special code (XX?) should be defined for unclear cases.
Another question that has come up via @maschu09:
@sethvincent or @olafveerman (and @jflasher for good measure) will be better suited to answering this question, but my quick take: I am guessing that it won't be feasible to rewrite the 400 million data points in the S3 buckets, but a) you will be able to query the API* for this output and b) for older data there will be a fairly painless way for the user to mesh the locations with the archived data.
*Separate from this project, but on the order of the next few months, we are working to remove the 90-day restriction on the API.
Also, on:
Generally speaking, the country is usually defined, but you can imagine some scenarios where we don't know it, or where the idea of applying 'country' is not fitting, so I agree we should designate a special code for cases where that is unclear. To @maschu09's suggestion, 'XX' looks like the standard, so I think we should just go with that.
@vgeorge @olafveerman I'm wondering if we actually need to change
@jflasher I believe you are right. The Athena query groups measurements by coordinates and parameter id, using the queries from this file: https://github.com/openaq/openaq-api/pull/394/files#diff-46368f02f56796a118231baa2c760aaf
If a measurement is duplicated, it will have no effect on ID generation.
This document describes how we plan to add the ability to store metadata about stations in OpenAQ, and subsequently query data in the platform by the metadata. We’ve come up with an approach that should:
A description of the structure of the Station ID is in #12.
Approach
Overall, we propose to store the metadata in a `locations` table. When performing any query based on location or its metadata (e.g. all rural stations within a country), the API will:
- query the `locations` table and get a list of relevant locations
- use those `locations` to query the `measurements` table.

This approach requires us to:
- `locations`
- `measurements`
- `openaq-fetch` that checks for the uniqueness constraint

New table: locations
We propose to store all the metadata in a new `locations` table, to be created in the database. Endpoints that rely on location will get data from this table instead of aggregating it from the `measurements` table.

Each record will consist of a unique ID, the coordinates, and a `jsonb` field with the metadata.

Seeding the table
This table will be populated with the unique set of locations found in the `measurements` table. Uniqueness is determined by the coordinates.

Rewrite endpoints
The following endpoints will need to have (part of) their logic rewritten. The response from each of these endpoints will be the same, except for additional metadata.
- `/locations`
  - `locations` table, instead of `measurements`
  - `POST` method
  - `PATCH` method
- `/measurements`
  - `locations` table
- `/cities` - controller
  - `locations` table
- `/countries`
  - `locations` table
- `/latest` - controller
  - `locations` table

Adding new locations
When a measurement is ingested for a new location, a new entry needs to be added to the `locations` table. Two potential ways to implement this:
- `openaq-fetch` - when checking for uniqueness in the fetch process, perform a lookup in the `locations` table. When the `locations` table doesn't contain a record with those coordinates, create a new one.
- `locations` table. This could be triggered by the same process that starts the Athena jobs.

Uniqueness constraints
openaq-fetch
Before inserting data into the database, `openaq-fetch` checks for the uniqueness of the measurement. If a measurement with the same location, timestamp and pollutant already exists in the database, it will not be inserted.

We need to rewrite this logic so that it checks for uniqueness on the coordinates.
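
To make the intended behaviour concrete, here is a minimal sketch of a coordinate-based uniqueness check, combined with the lookup-or-create step on `locations` described under "Adding new locations". It is not the actual `openaq-fetch` code: `Measurement`, `InMemoryStore` and all field names are hypothetical, and an in-memory dict and set stand in for the `locations` and `measurements` tables.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Measurement:
    # Hypothetical field names, not the real openaq-fetch schema.
    latitude: float
    longitude: float
    parameter: str       # the pollutant, e.g. "pm25"
    timestamp: datetime
    value: float


class InMemoryStore:
    """Stand-in for the database (assumption: real code would query Postgres)."""

    def __init__(self):
        # (latitude, longitude) -> location id; plays the role of `locations`.
        self.locations = {}
        # (location id, parameter, timestamp) keys of stored measurements.
        self.seen = set()

    def get_or_create_location(self, latitude, longitude):
        """Look up a location by its coordinates; create it when missing."""
        key = (latitude, longitude)
        if key not in self.locations:
            self.locations[key] = len(self.locations) + 1
        return self.locations[key]

    def insert(self, m):
        """Return True if stored, False if the measurement is a duplicate.

        A duplicate has the same coordinates, timestamp and pollutant as a
        measurement that is already stored.
        """
        location_id = self.get_or_create_location(m.latitude, m.longitude)
        key = (location_id, m.parameter, m.timestamp)
        if key in self.seen:
            return False
        self.seen.add(key)
        return True


store = InMemoryStore()
ts = datetime(2019, 1, 1, tzinfo=timezone.utc)
first = store.insert(Measurement(52.5, 13.4, "pm25", ts, 12.0))
repeat = store.insert(Measurement(52.5, 13.4, "pm25", ts, 12.0))
```

In this sketch `first` is stored and `repeat` is rejected, because only the coordinates, timestamp and pollutant take part in the key; the `location`, `city` and `country` attributes play no role.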
Open questions:
- `measurements` table. Deduping attributes like `location`, `city`, `country`, and `coordinates`.