Collect images and metadata from phil.cdc.gov, the Public Health Image Library at the Centers for Disease Control and Prevention

README.mkd

DEPRECATED.

This project has been replaced by [https://github.com/gameguy43/usable_image_scraper]

The remainder of this source should be used for reference only; the project linked above is a better implementation of the same thing.

Collect Public Health Images

Requirements

These scripts require the following modules:

  • start.py
    • import Queue
    • import threading
    • import traceback
    • import urllib
    • import urllib2
  • scraper.py
    • import cookielib
    • import string
  • data_storer.py
    • import sqlalchemy
  • imginfo.py
    • import Image # requires PIL, the Python Imaging Library
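
To see which of the modules listed above are available before running the scripts, something like the following could be used. This is a minimal sketch and not part of the repository: `REQUIRED` and `missing_modules` are hypothetical names introduced here for illustration. Note that `Queue`, `urllib2` and `cookielib` exist only under Python 2, so they will be reported as missing on Python 3.

```python
import importlib

# Module names copied from the requirements list above.
# REQUIRED is a hypothetical helper constant, not defined in the repository.
REQUIRED = [
    "Queue", "threading", "traceback", "urllib", "urllib2",  # start.py
    "cookielib", "string",                                   # scraper.py
    "sqlalchemy",                                            # data_storer.py
    "Image",                                                 # imginfo.py (PIL)
]


def missing_modules(names):
    """Return the subset of names that cannot be imported here."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            # Also covers ModuleNotFoundError on Python 3.
            missing.append(name)
    return missing


if __name__ == "__main__":
    for name in missing_modules(REQUIRED):
        print("missing: " + name)
```

Running the file directly prints one line per module that could not be imported; no output means every listed module was found.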

Files

  • start.py
    • command script that imports scraper.py, parser.py and data_storer.py
    • contains configuration GLOBALs for directory structure
    • cdc_phil_scrape_range(start, end) controls what ID range the script will collect