Ebdata scrapers should be runnable as scripts, and provide a convenient way to load their schemas #234

Open
slinkp opened this issue Sep 28, 2012 · 1 comment

slinkp commented Sep 28, 2012

It would be much easier to document how to run the scripts in ebdata/scrapers, and how to load their schemas, if I could tell users to do something like this hypothetical terminal session:

$ flickr_retrieval --help
Usage: flickr_retrieval [options] [commands]

Options:
  -h, --help       show this help message and exit
  --schema=SCHEMA  Slug of schema to use when retrieving. Default is 'photos'.
  -f, --force      With the load-schema command, create the schema even if it already exists.

Commands:
  run              Retrieve photos.
  load-schema      Create the 'photos' schema. Will exit if it already exists,
                   unless you also specify `--force`.

$ flickr_retrieval load-schema
Loading /home/pw/builds/openblock/builds/20110519/src/openblock/ebdata/ebdata/scrapers/general/flickr/photos_schema.json
Installed 5 object(s) from 1 fixture(s)

$ flickr_retrieval run
INFO list_detail: update() in <class '__main__.FlickrScraper'> started
INFO newsitem_list_detail: Created NewsItem photos: 10084 (total created in this scrape: 1)
INFO newsitem_list_detail: Created NewsItem photos: 10085 (total created in this scrape: 2)
...

If all our scrapers followed that command-line API, it would be pretty nice.
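
For illustration, here is a minimal sketch of what that shared command-line API could look like, using the standard library's argparse. This is an assumption about one possible shape, not existing ebdata code: `build_parser`, `main`, `COMMANDS`, and the `handlers` mapping are all hypothetical names, and the two default handlers are placeholders that only print what a real scraper would do.

```python
import argparse

# Hypothetical command names, taken from the terminal session above.
COMMANDS = ("load-schema", "run")


def build_parser(prog="flickr_retrieval", default_schema="photos"):
    """Build a parser with the options shown in the hypothetical help text."""
    parser = argparse.ArgumentParser(prog=prog)
    parser.add_argument(
        "--schema", default=default_schema,
        help="Slug of schema to use when retrieving. Default is %(default)r.")
    parser.add_argument(
        "-f", "--force", action="store_true",
        help="With the load-schema command, create the schema "
             "even if it already exists.")
    # Commands are validated in main() rather than via `choices`.
    parser.add_argument(
        "commands", nargs="*", metavar="command",
        help="One or more of: %s" % ", ".join(COMMANDS))
    return parser


def _placeholder_load_schema(args):
    # Placeholder only: a real scraper would load its schema fixture here.
    print("would load schema %r (force=%s)" % (args.schema, args.force))


def _placeholder_run(args):
    # Placeholder only: a real scraper would start its update here.
    print("would run scraper with schema %r" % args.schema)


def main(argv=None, handlers=None):
    """Parse argv and call the handler for each command, in the order given."""
    if handlers is None:
        handlers = {"load-schema": _placeholder_load_schema,
                    "run": _placeholder_run}
    parser = build_parser()
    args = parser.parse_args(argv)
    for command in args.commands:
        if command not in COMMANDS:
            parser.error("unknown command: %s" % command)
    if not args.commands:
        parser.print_help()
        return 0
    for command in args.commands:
        handlers[command](args)
    return 0
```

Each scraper could then expose `main` through a setuptools `console_scripts` entry point, so that installing the package puts a `flickr_retrieval` script on the path of whichever Python environment it was installed into.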

As it is, we have to document how to find where ebdata is installed (which differs depending on how you installed it); find the relevant Python script; run it with the right Python (i.e. with your virtualenv activated); and make sure you've run django-admin.py loaddata path/to/wherever/the/schema/lives. On top of that, the script and schema fixture names don't follow 100% consistent conventions.

That is a lot of things that can go wrong and confuse someone who isn't experienced with Python packaging and so forth.
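
As a rough sketch of how a load-schema command could spare users from hunting for the fixture, the script could resolve the fixture path relative to its own module file. Everything here is an assumption for illustration: `schema_fixture_path` and `load_schema` are hypothetical helpers, the `<slug>_schema.json` naming is inferred from the `photos_schema.json` example above (and, as noted, the real fixtures are not named consistently), and `loaddata` is whatever callable actually loads the fixture.

```python
import os


def schema_fixture_path(module_file, schema_slug):
    """Return the assumed fixture path next to the scraper module.

    Assumes the fixture is named '<slug>_schema.json' and lives in the same
    directory as the scraper script; existing fixtures may not follow this.
    """
    module_dir = os.path.dirname(os.path.abspath(module_file))
    return os.path.join(module_dir, "%s_schema.json" % schema_slug)


def load_schema(module_file, schema_slug, loaddata):
    """Locate the fixture and hand its path to the `loaddata` callable."""
    path = schema_fixture_path(module_file, schema_slug)
    if not os.path.exists(path):
        raise IOError("Schema fixture not found: %s" % path)
    print("Loading %s" % path)
    loaddata(path)
    return path
```

In a configured Django project, `loaddata` could be something like `lambda path: call_command("loaddata", path)` with `call_command` imported from `django.core.management`, and a scraper would pass its own `__file__` as `module_file`.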

This would be straightforward to fix, but I don't have time at the moment.

slinkp commented Sep 28, 2012

Ticket imported from Trac:
http://developer.openblockproject.org/ticket/241
Reported by: slinkp
