photogrammar

Code for getting and exploring the photogrammar data.

To start, we download a list of all the photo ids (these uniquely define the URLs for scraping the rest of the data) by running the following code:

python src/get_photo_ids.py

This creates the file pickle/all_urls.p, a Python pickle file. Next, we download the MARC records from the Library of Congress website for every photo id in all_urls.p:

python src/get_marc_records.py

When this finishes, the marc_records directory should contain files such as 'marc_records/fsa1997000988.csv'. To finish the first stage of the scrape, we download the image URLs in the same way:

python src/get_img_urls.py

This creates text files in the 'img_url' directory, such as 'img_url/fsa1997000987.txt', which contain the URLs of the photo images.
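
Once the three scripts have run, it can be useful to check which photo ids still lack an image URL file. The following is a minimal sketch rather than code from this repository. It assumes that pickle/all_urls.p holds a collection of photo id strings such as 'fsa1997000987', and that each file in img_url holds one URL per line; neither detail is confirmed above. The function names are hypothetical.

```python
import pickle
from pathlib import Path


def load_photo_ids(pickle_path):
    """Load the pickled object and return it as a list.

    Assumption: the pickle holds a collection of photo id strings.
    Only unpickle files you created yourself; pickle can run arbitrary code.
    """
    with open(pickle_path, "rb") as f:
        return list(pickle.load(f))


def read_img_urls(img_url_dir):
    """Map each photo id (the file name without '.txt') to its list of URLs.

    Assumption: each text file holds one URL per line; blank lines are skipped.
    """
    urls_by_id = {}
    for path in sorted(Path(img_url_dir).glob("*.txt")):
        lines = path.read_text().splitlines()
        urls_by_id[path.stem] = [line.strip() for line in lines if line.strip()]
    return urls_by_id


def missing_ids(photo_ids, urls_by_id):
    """Return the photo ids that have no image URL file yet, in input order."""
    return [pid for pid in photo_ids if pid not in urls_by_id]
```

From the repository root, `missing_ids(load_photo_ids("pickle/all_urls.p"), read_img_urls("img_url"))` would then list the ids whose image URLs have not been downloaded, provided the assumptions above hold.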