HDX collector for the UNDP Human Development Report Office API.
hdxscraper-hdro operates in the following way:
- Downloads data from the Human Development Statistical JSON data API
- Filters for data within a given date range or, if none is given, within the past year
- Places the resulting data into a database table
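The filter and store steps above can be sketched roughly as follows. This is a minimal illustration and not the scraper's actual code: the record shape (`country`, `date`, `value`), the function names, and the `data` table name are all assumptions, and the download step is left out so the sketch runs without network access.

```python
import sqlite3
from datetime import date, timedelta


def filter_by_date(records, start=None, end=None):
    """Keep records whose 'date' falls within [start, end].

    If no range is given, default to the past year (365 days up to today).
    Each record is assumed to be a dict holding a datetime.date under 'date'.
    """
    end = end or date.today()
    start = start or end - timedelta(days=365)
    return [r for r in records if start <= r["date"] <= end]


def save_records(conn, records, table="data"):
    """Insert records into a database table and return the row count.

    The table name is hypothetical and must come from trusted code,
    since it is interpolated directly into the SQL statement.
    """
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {table} (country TEXT, date TEXT, value REAL)"
    )
    rows = [(r["country"], r["date"].isoformat(), r["value"]) for r in records]
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)
```

Calling `filter_by_date(records)` with no range keeps only the past year, while passing `start` and `end` narrows the window explicitly.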
With hdxscraper-hdro, you can:
- Save HDRO Data to an external database
- Create CKAN datasets with externally generated CSV files
- Update resources previously uploaded to CKAN with new metadata
hdxscraper-hdro has been tested on the following configuration:
- MacOS X 10.9.5
- Python 2.7.10
hdxscraper-hdro requires the following in order to run properly:
- Python >= 2.7 (MacOS X comes with Python pre-installed)
local
(You are using a virtualenv, right?)
git clone https://github.com/reubano/hdxscraper-hdro.git
cd hdxscraper-hdro
pip install -r requirements.txt
manage setup
ScraperWiki Box
rm -rf tool
git clone https://github.com/reubano/hdxscraper-hdro.git tool
cd tool
make setup
local
manage run
ScraperWiki Box
cd tool
source venv/bin/activate
screen manage -m Scraper run
# Now press `Ctrl-a d`
The results will be stored in a SQLite database, scraperwiki.sqlite.
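To check what a run produced, the database can be inspected with Python's standard `sqlite3` module. The helper below is a hypothetical sketch (the function name is not part of hdxscraper-hdro); it lists table names only, since the table layout is not documented here.

```python
import sqlite3


def list_tables(path="scraperwiki.sqlite"):
    """Return the names of all tables in the SQLite file at `path`.

    Note: sqlite3.connect creates an empty file if `path` does not exist,
    in which case the result is an empty list.
    """
    conn = sqlite3.connect(path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

Run `print(list_tables())` from the directory containing scraperwiki.sqlite after a scrape has finished.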
view all available commands
manage
upload to production site
manage upload
upload to staging site
manage upload -s
update dataset on production site
manage update
update dataset on staging site
manage update -s
cd tool
make update
source venv/bin/activate
screen manage -m Scraper run
# Now press `Ctrl-a d`
hdxscraper-hdro will use the following Environment Variables if set:
Environment Variable | Description |
---|---|
CKAN_API_KEY | Your CKAN API Key |
CKAN_PROD_URL | Your CKAN instance remote production URL |
CKAN_REMOTE_URL | Your CKAN instance remote staging URL |
CKAN_USER_AGENT | Your user agent |
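As an illustration of how optional variables like these are typically read, here is a minimal sketch. The `ckan_settings` helper and `CKAN_VARS` tuple are hypothetical names, and this is not necessarily how hdxscraper-hdro loads its configuration.

```python
import os

# Variable names taken from the table above.
CKAN_VARS = ("CKAN_API_KEY", "CKAN_PROD_URL", "CKAN_REMOTE_URL", "CKAN_USER_AGENT")


def ckan_settings(environ=os.environ):
    """Collect whichever CKAN variables are set; unset or empty ones are omitted."""
    return {name: environ[name] for name in CKAN_VARS if environ.get(name)}
```

Passing a plain dict instead of `os.environ` makes the helper easy to try out, for example `ckan_settings({"CKAN_API_KEY": "abc"})`.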
If you would like to create a collector or scraper from scratch, check out cookiecutter-collector.
pip install cookiecutter
cookiecutter https://github.com/reubano/cookiecutter-collector.git
- fork
- commit
- submit PR
- ???
- PROFIT!!!
- improve this readme
- add comments to confusing parts of the code
- write a "Getting Started" guide
- write additional deployment instructions (Heroku, AWS, [Digital Ocean](http://digitalocean.com/), GAE)
- follow this guide and see if everything works as expected
- if something doesn't work, please submit an issue
hdxscraper-hdro is distributed under the MIT License.