'Continuous Processing' with Data Packages
When storing your data in Data Packages, it is considered good practice to keep scripts for updating, processing, or analyzing your data in a `scripts/` directory at the root of your Data Package. I've written a tutorial showing how to achieve continuous processing: that is, the delivery of updated data every time something changes, either in the source data or in the processing code.
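As a sketch, a Data Package laid out this way might look like the following; the file names under `scripts/` and `data/` are hypothetical examples, not requirements of the Data Package spec:

```shell
# Hypothetical layout of a Data Package with its processing scripts.
mkdir -p mypackage/scripts mypackage/data
touch mypackage/datapackage.json    # the Data Package descriptor
touch mypackage/data/output.csv     # the published, processed data
touch mypackage/scripts/process.py  # fetches and cleans the source data
ls -R mypackage
```

The point is simply that the descriptor and the published data live at predictable paths, while everything needed to regenerate the data sits under `scripts/`.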
Depending on the timeliness of your dataset, you'll want to periodically run the update scripts stored in your `scripts/` directory. But what if you don't want to run the update script of your Data Package yourself? Instead, why not let Travis CI do it for you?
If your Data Package already...
- has scripts that download the source data and clean or reformat it into a nice, interoperable format
- relies on `make` to run the scripts
- has tests to validate the data
...then you're ready to go to the next level of automation! Here's a tutorial to enable regular updates of the data with Travis CI.
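A minimal `.travis.yml` for such a setup could look like the sketch below; the `make` targets and the `requirements.txt` file are assumptions about your own package, not part of the Data Package spec:

```yaml
language: python

install:
  - pip install -r requirements.txt  # assumed: your scripts' dependencies

script:
  - make        # assumed target: fetches and processes the source data
  - make test   # assumed target: validates the resulting data

# Regular (e.g. daily) builds are enabled as a cron job in the
# Travis CI settings for the repository, not in this file.
```

With this in place, every push reprocesses the data and runs the validation tests, and a cron job repeats the same build on a schedule to pick up changes in the source data.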
It's very well suited to small data (less than 300 MB) and short processing steps (less than 10 minutes), which makes this workflow perfect for Data Packages!
Read the tutorial to find out more!