Paper Machines is an open-source extension for the Zotero bibliographic management software. Its purpose is to allow individual researchers to generate analyses and visualizations of user-provided corpora, without requiring extensive computational resources or technical knowledge.
In order to run Paper Machines, you will need the following:
- a corpus of documents (preferably with high-quality metadata)
- Python (download for Windows)
- Java (download)
To begin, right-click (or control-click on a Mac) the collection you wish to analyze and select "Extract Texts for Paper Machines." Once extraction is complete, the same menu offers several processes that can be run on the collection, each with an accompanying visualization.
Displays words sized according to their frequency. An oft-maligned but still arguably useful way to get a quick impression of the most common words in your collection. Once generated, the word cloud appears in the Tags pane of Zotero.
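As a rough illustration of the frequency-to-size idea (not the extension's own code), the following Python sketch counts words and scales the most frequent ones linearly into a font-size range. The stopword list, the function name `word_sizes`, and the size limits are all assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it"}

def word_sizes(text, top_n=5, min_size=10, max_size=48):
    """Map the top_n most frequent words to font sizes scaled linearly by count."""
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]
    counts = Counter(words).most_common(top_n)
    if not counts:
        return {}
    highest = counts[0][1]
    lowest = counts[-1][1]
    # Avoid dividing by zero when every selected word has the same count.
    span = max(highest - lowest, 1)
    return {
        word: min_size + (max_size - min_size) * (count - lowest) / span
        for word, count in counts
    }

sample = "the archive holds letters and the letters describe the archive and the city"
sizes = word_sizes(sample, top_n=3)
```

Here the two words that occur twice receive the largest size and the remaining word the smallest; a real word cloud would also handle layout, which this sketch leaves out.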
Finds phrases that follow a certain pattern, such as "x and y," and displays the most common pairings. This method is derived from a Many Eyes visualization.
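The pattern-matching step can be approximated with a regular expression. The sketch below is a minimal illustration, assuming a single connector word and simple whitespace-separated tokens; `common_pairings` is a hypothetical helper, not part of Paper Machines or Many Eyes.

```python
import re
from collections import Counter

def common_pairings(text, connector="and", top_n=3):
    """Count (x, y) pairs that appear as 'x <connector> y' in the text."""
    pattern = re.compile(r"\b(\w+)\s+" + re.escape(connector) + r"\s+(\w+)\b")
    # findall returns non-overlapping matches, so "a and b and c" yields only (a, b).
    pairs = pattern.findall(text.lower())
    return Counter(pairs).most_common(top_n)

sample = "Bread and butter. War and peace. Bread and butter again, then salt and pepper."
pairs = common_pairings(sample)
```

In this sample the pair ("bread", "butter") is counted twice and so ranks first. Passing a different `connector` (for example "of") would surface a different family of pairings.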
Generates a map linking texts to the places they mention, filtered by time. The underlying functionality is based on Pete Warden's geodict. NOTE: this feature requires the "geodict" version, which is distributed separately because it adds about 70 MB to the download.
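To make the idea concrete, here is a deliberately simplified sketch of linking texts to places with a date filter. It is not geodict's algorithm: the two-entry `GAZETTEER`, the `(title, year, text)` document shape, and the function name are assumptions for illustration, and only single-word place names are matched.

```python
import re

# Hypothetical gazetteer: place name -> (latitude, longitude).
# Real tools rely on far larger tables and handle multi-word names.
GAZETTEER = {
    "london": (51.5074, -0.1278),
    "paris": (48.8566, 2.3522),
}

def places_mentioned(docs, start_year, end_year):
    """Return (title, place, coords) for known places in docs dated within the range."""
    links = []
    for title, year, text in docs:
        if not (start_year <= year <= end_year):
            continue  # document falls outside the selected time window
        tokens = set(re.findall(r"[a-z]+", text.lower()))
        for place in sorted(tokens.intersection(GAZETTEER)):
            links.append((title, place, GAZETTEER[place]))
    return links

docs = [
    ("Letter A", 1850, "Sent from London to Paris."),
    ("Letter B", 1900, "Arrived in Paris."),
]
links = places_mentioned(docs, 1840, 1860)
```

With the window set to 1840 through 1860, only "Letter A" contributes links; widening the window would add the later document.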
Shows the proportional prevalence of different "topics" (collections of words likely to co-occur) in the corpus over time, highlighting spots where topics are more common. This uses the MALLET package to perform latent Dirichlet allocation, and by default displays the 20 most "coherent" topics, based on a metric devised by Mimno et al.
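The "proportional prevalence over time" part of this view can be sketched independently of the topic model itself. The example below assumes that per-document topic weights are already available (for instance from a topic model's output) and uses hard-coded, hypothetical values; it does not call MALLET or compute topic coherence.

```python
from collections import defaultdict

def topic_shares_by_year(docs):
    """Given (year, {topic: weight}) pairs, return {year: {topic: share of that year}}."""
    totals = defaultdict(lambda: defaultdict(float))
    for year, weights in docs:
        for topic, weight in weights.items():
            totals[year][topic] += weight
    shares = {}
    for year, topic_weights in totals.items():
        year_total = sum(topic_weights.values())
        if year_total == 0:
            continue  # no topic weight recorded for this year
        shares[year] = {t: w / year_total for t, w in topic_weights.items()}
    return shares

# Hypothetical per-document topic weights, for illustration only.
docs = [
    (1900, {"trade": 0.75, "law": 0.25}),
    (1900, {"trade": 0.25, "law": 0.75}),
    (1901, {"trade": 1.0, "law": 0.0}),
]
shares = topic_shares_by_year(docs)
```

In this toy data the two topics split 1900 evenly, while "trade" accounts for all of 1901; plotting such shares year by year gives the kind of proportional view described above.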
Thanks to Google Summer of Code for funding this work, and to Matthew Battles and Jo Guldi for overseeing it. My gratitude also to the creators of all the open-source projects upon which this work relies: