Python code to retrieve metadata and thumbnails from digital repositories and collections on the subject Chinese Exclusion Act
Switch branches/tags
Nothing to show
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
README.md
exclusion.html
exclusion_master_parse.py
master_final
master_subjects

README.md

This Python 3.4 script is an attempt to build a simple web page of primary sources for a single subject. It queries resources in digital archives and special collections to retrieve metadata. It also downloads the thumbnails and URLS for the resources. This code is written to specifically find materials on the Chinese Exclusion Act.

The goal is to test possible methods for creating a subject-based tool for researchers and also to test the reliability of the metadata obtained in using digital repository APIs and by scraping web sites. The metadata is relatively reliable, but not perfect, especially dates.

RESOURCES QUERIED
The Digital Public Library of America (DPLA)
Calisphere (images only)
The California State Parks Museum Collections

PRIMARY FILES PRODUCED
A JSON dictionary with normalized metadata for each object;
A raw webpage (HTML) sorted by date to troubleshoot the proper download of metadata and thumbnails;
A file of subject terms collected including a count of the objects where the term appears.

SECONDARY FILE PRODUCED
A running file that dumps metadata for an object once processed. This is to assist in troubleshooting problem areas when running the script.

WHAT YOU NEED TO RUN THIS CODE
A Python interpreter (Python 3.4)
An API key for DPLA
All the Python modules in the import line (except ‘my_file’)
A folder in the same directory as the script called “thumbnails”

The resulting page with contextualizing information and additional style formatting can be viewed at the following URL:
http://pfch.nyc/quipu/project.html

HOW THIS CODE CAN BE REPURPOSED MEANINGFULLY
This code is specifically written for the subject Chinese Exclusion Act. It may work for another subject, however, if you would like to use this script to begin finding resources and collecting metadata/thumbnails for your own subject, you will definitely need to change certain parts of this script:

*** Code and metadata should be checked, if you plan to use it ***
*** DPLA is constantly expanding its base of content providers, which will effect the ability of this script to retrieve all metadata ***

README last updated: 12/21/2015