Python code to retrieve metadata and thumbnails from digital repositories and collections on the subject Chinese Exclusion Act

kllhwang/quipu

This Python 3.4 script is an attempt to build a simple web page of primary sources for a single subject. It queries digital archives and special collections to retrieve metadata, and it downloads the thumbnails and URLs for the resources it finds. The code is written specifically to find materials on the Chinese Exclusion Act.

The goal is to test possible methods for creating a subject-based tool for researchers, and to test the reliability of the metadata obtained through digital repository APIs and by scraping web sites. The metadata is relatively reliable but not perfect; dates are especially prone to problems.

RESOURCES QUERIED
The Digital Public Library of America (DPLA)
Calisphere (images only)
The California State Parks Museum Collections
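As an illustration of the API side of these queries, here is a minimal sketch of a DPLA item search and of reducing one returned record to a few normalized fields. It is not the script's actual code. The helper names (build_dpla_url, normalize_dpla_doc, first) and the output field names are hypothetical, and the record layout (sourceResource, object, isShownAt) is assumed from DPLA's published item format, so check it against live responses before relying on it.

```python
from urllib.parse import urlencode

# DPLA's item search endpoint; an API key is required for every request.
DPLA_ITEMS_ENDPOINT = "https://api.dp.la/v2/items"


def build_dpla_url(query, api_key, page=1, page_size=100):
    """Build the search URL for one page of DPLA results."""
    params = [
        ("q", query),
        ("page", page),
        ("page_size", page_size),
        ("api_key", api_key),
    ]
    return DPLA_ITEMS_ENDPOINT + "?" + urlencode(params)


def first(value):
    """Return the first element of a list, or the value itself if not a list."""
    if isinstance(value, list):
        return value[0] if value else None
    return value


def normalize_dpla_doc(doc):
    """Reduce one DPLA record to a small, uniform dictionary (assumed layout)."""
    source = doc.get("sourceResource") or {}
    date = first(source.get("date"))
    if isinstance(date, dict):
        # Dates usually arrive as objects; keep only the display string.
        date = date.get("displayDate")
    subjects = [
        s["name"]
        for s in (source.get("subject") or [])
        if isinstance(s, dict) and s.get("name")
    ]
    return {
        "title": first(source.get("title")),
        "date": date,
        "subjects": subjects,
        "thumbnail_url": doc.get("object"),
        "source_url": doc.get("isShownAt"),
        "repository": "DPLA",
    }
```

The URL from build_dpla_url('"Chinese Exclusion Act"', your_key) can be fetched with any HTTP client. Each entry in the response's docs list would then go through normalize_dpla_doc.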

PRIMARY FILES PRODUCED
A JSON dictionary with normalized metadata for each object;
A raw webpage (HTML) sorted by date to troubleshoot the proper download of metadata and thumbnails;
A file of subject terms collected including a count of the objects where the term appears.
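The three outputs above can be sketched roughly as follows, assuming the normalized metadata is held in a dictionary keyed by object ID, with each record carrying a "date" string and a "subjects" list. Those field names and all function names here are hypothetical, not taken from the script. Sorting compares date strings as text, which only behaves sensibly when dates lead with a four-digit year, one reason the date metadata needs checking.

```python
import json
from collections import Counter


def save_records(records, path):
    """Write the normalized metadata dictionary to a JSON file."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(records, handle, indent=2, sort_keys=True)


def sort_by_date(records):
    """Return (object_id, record) pairs ordered by date, undated objects last."""
    def key(item):
        date = item[1].get("date")
        return (not date, date or "")
    return sorted(records.items(), key=key)


def count_subject_terms(records):
    """Count, for each subject term, the number of objects it appears in."""
    counts = Counter()
    for record in records.values():
        # set() ensures a term repeated within one object is counted once.
        counts.update(set(record.get("subjects") or []))
    return counts
```

The output of sort_by_date can drive the HTML troubleshooting page, and count_subject_terms(records).most_common() gives the subject terms ordered by frequency.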

SECONDARY FILE PRODUCED
A running file that dumps metadata for an object once processed. This is to assist in troubleshooting problem areas when running the script.
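One simple way to keep such a running file is to append one JSON line per object as soon as it has been processed, so that a crash partway through still leaves a record of everything handled up to that point. The sketch below assumes that approach. The function names and the line format are hypothetical, and the script itself may write its dump differently.

```python
import json


def append_record(path, object_id, record):
    """Append one processed object's metadata to the running dump file."""
    line = json.dumps({"id": object_id, "metadata": record}, sort_keys=True)
    # Opening in append mode and closing right away means each object is
    # on disk before the next one is processed.
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(line + "\n")


def read_dump(path):
    """Read the running dump back as a list of dictionaries."""
    with open(path, encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]
```

When the script stops on a problem record, the last line of the dump identifies the last object that was processed successfully.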

WHAT YOU NEED TO RUN THIS CODE
A Python interpreter (Python 3.4)
An API key for DPLA
All the Python modules in the import line (except ‘my_file’)
A folder in the same directory as the script called “thumbnails”
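A short setup check can confirm two of these requirements before any queries run. This sketch assumes the DPLA key is supplied through an environment variable named DPLA_API_KEY, which is a hypothetical convention and not something the script defines. It also creates the "thumbnails" folder if it is missing.

```python
import os


def prepare_environment(base_dir, env=None):
    """Ensure the thumbnails folder exists and return it with the API key."""
    env = os.environ if env is None else env
    thumbnails_dir = os.path.join(base_dir, "thumbnails")
    # exist_ok avoids an error when the folder is already there.
    os.makedirs(thumbnails_dir, exist_ok=True)
    api_key = env.get("DPLA_API_KEY")  # hypothetical variable name
    if not api_key:
        raise RuntimeError("No DPLA API key found in DPLA_API_KEY")
    return thumbnails_dir, api_key
```

Calling prepare_environment(os.path.dirname(os.path.abspath(__file__))) at the top of the script would fail early, with a clear message, if the key is not set.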

The resulting page with contextualizing information and additional style formatting can be viewed at the following URL:
http://pfch.nyc/quipu/project.html

HOW THIS CODE CAN BE REPURPOSED MEANINGFULLY
This code is written specifically for the subject Chinese Exclusion Act. It may work for another subject. However, if you would like to use this script to begin finding resources and collecting metadata and thumbnails for your own subject, you will definitely need to change certain parts of the script.

*** Check the code and metadata if you plan to use them ***
*** DPLA is constantly expanding its base of content providers, which will affect the ability of this script to retrieve all metadata ***

README last updated: 12/21/2015
