Many of the tables I'm interested in are updated monthly, weekly, or even daily, and they're often snapshots exported from database tables that support operational systems (e.g. records correspond to some workload and are updated as that workload is worked). By retrieving a data table and comparing it against prior pulls of that table, it's possible to identify which records are new and which existing records were changed. To avoid missing an update, this retrieval and comparison has to happen for every distinct export of the table from its source system. Most public data systems don't indicate when the next update will happen (although the cadence can often be reliably deduced by checking at periodic intervals), so in practice it's necessary to check more often than the table is actually updated.
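As a rough illustration of the comparison step, here is a minimal sketch that diffs two pulls of a table held in memory as lists of dicts. The function name `diff_snapshots` and the `id` key column are hypothetical; it assumes each record has a stable unique identifier, which a real table may or may not provide.

```python
from typing import Any

Record = dict[str, Any]


def diff_snapshots(
    previous: list[Record], current: list[Record], key: str = "id"
) -> tuple[list[Record], list[Record]]:
    """Split the current pull into new records and changed records.

    Assumes `key` names a column that uniquely and stably identifies
    a record across pulls (an assumption, not a given for every table).
    """
    # Index the prior pull by its identifier for constant-time lookups.
    previous_by_key = {row[key]: row for row in previous}

    new_records: list[Record] = []
    changed_records: list[Record] = []
    for row in current:
        old_row = previous_by_key.get(row[key])
        if old_row is None:
            # Identifier not seen in the prior pull: a new record.
            new_records.append(row)
        elif old_row != row:
            # Identifier seen before, but at least one value differs.
            changed_records.append(row)
    return new_records, changed_records


previous_pull = [
    {"id": 1, "status": "open"},
    {"id": 2, "status": "open"},
]
current_pull = [
    {"id": 1, "status": "open"},
    {"id": 2, "status": "closed"},
    {"id": 3, "status": "open"},
]
new_records, changed_records = diff_snapshots(previous_pull, current_pull)
```

For tables over 1GB this would more likely be done in the database with a join on the key column, but the logic is the same.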
Data tables can be pretty large (often in excess of 1GB), so it's both rude and expensive to download and ingest a table more frequently than needed. Fortunately, the main data tables this project uses are served via Socrata's data platform, which provides an API for checking table metadata, including the time the data was last updated. Checking that timestamp first can quickly avert an unnecessary data pull. And when it is necessary to pull data, it would be ideal to only ingest new or updated records.
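The freshness check could look something like the sketch below. It assumes the table's metadata is available as JSON at `https://{domain}/api/views/{dataset_id}.json` and that it includes a `rowsUpdatedAt` field holding a Unix timestamp in seconds; both should be verified against the Socrata docs for the portal in question. The function names are hypothetical.

```python
import json
from datetime import datetime, timezone
from typing import Any
from urllib.request import urlopen


def fetch_table_metadata(domain: str, dataset_id: str) -> dict[str, Any]:
    """Fetch a table's metadata document (small, unlike the table itself).

    The URL pattern is an assumption about Socrata's views endpoint.
    """
    url = f"https://{domain}/api/views/{dataset_id}.json"
    with urlopen(url, timeout=30) as response:
        return json.load(response)


def data_updated_since(metadata: dict[str, Any], last_pull: datetime) -> bool:
    """Return True if the source data changed after our last pull.

    Assumes `rowsUpdatedAt` is epoch seconds. If the field is missing,
    err on the side of pulling rather than silently skipping an update.
    """
    rows_updated_at = metadata.get("rowsUpdatedAt")
    if rows_updated_at is None:
        return True
    source_updated = datetime.fromtimestamp(rows_updated_at, tz=timezone.utc)
    return source_updated > last_pull
```

A scheduled job would call `fetch_table_metadata` at each check and only start the full download when `data_updated_since` returns `True`, with `last_pull` stored as a timezone-aware UTC datetime.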