A Zotero extension for analysis and visualization in the digital humanities.
Paper Machines

Overview

Paper Machines is an open-source extension for the Zotero bibliographic management software. Its purpose is to allow individual researchers to generate analyses and visualizations of user-provided corpora, without requiring extensive computational resources or technical knowledge.

Prerequisites

In order to run Paper Machines, you will need the following:

Usage

To begin, right-click (or control-click) the collection you wish to analyze and select "Extract Texts for Paper Machines." Once extraction is complete, the same right-click menu will offer several processes that can be run on the collection, each with an accompanying visualization.

Word Cloud

Displays words scaled in size by their frequency. An oft-maligned but still arguably useful way to get a quick impression of the most common words in your collection. Once generated, the cloud appears in the Tags pane of Zotero.
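As a rough illustration of the idea behind a word cloud (not Paper Machines' own code), the sketch below counts word occurrences and maps each count linearly onto a font-size range. The function names, the tokenizing pattern, and the size limits are all assumptions made for this example.

```python
import re
from collections import Counter


def word_frequencies(text, stopwords=frozenset(), top_n=10):
    """Return the top_n most common words as (word, count) pairs."""
    # Hypothetical tokenizer: lowercase runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in stopwords)
    return counts.most_common(top_n)


def font_sizes(frequencies, min_size=10, max_size=48):
    """Map each word's count linearly onto [min_size, max_size]."""
    if not frequencies:
        return {}
    highest = max(count for _, count in frequencies)
    lowest = min(count for _, count in frequencies)
    span = (highest - lowest) or 1  # avoid dividing by zero when all counts match
    return {
        word: min_size + (count - lowest) * (max_size - min_size) / span
        for word, count in frequencies
    }


if __name__ == "__main__":
    sample = "paper machines paper zotero paper machines"
    print(font_sizes(word_frequencies(sample)))
```

The most frequent word receives the largest size and the least frequent the smallest; a real word cloud would also need a stopword list suited to the corpus.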

Phrase Net

Finds phrases that follow a certain pattern, such as "x and y," and displays the most common pairings. This method is derived from a Many Eyes visualization.
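The "x and y" pattern can be sketched with a regular expression. This is a minimal example under assumptions, not the extension's implementation: the function name `phrase_pairs` and the simple letters-only notion of a word are hypothetical. A lookahead is used for the second word so that chains such as "a and b and c" yield both (a, b) and (b, c).

```python
import re
from collections import Counter


def phrase_pairs(text, connector="and", top_n=10):
    """Count (x, y) pairs for phrases of the form 'x <connector> y'."""
    # The second word sits inside a lookahead so it is not consumed and can
    # also serve as the first word of the next match.
    pattern = re.compile(
        r"\b([a-z]+)\s+" + re.escape(connector) + r"\s+(?=([a-z]+)\b)"
    )
    pairs = pattern.findall(text.lower())
    return Counter(pairs).most_common(top_n)


if __name__ == "__main__":
    sample = "Salt and pepper. War and peace. Salt and pepper and vinegar."
    for (left, right), count in phrase_pairs(sample):
        print(f"{left} -> {right}: {count}")
```

Each counted pair corresponds to one edge in the network, and the count could set the edge's weight.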

Geoparser

Generates a map linking texts to the places they mention, filtered by time. The underlying functionality is based on Pete Warden's geodict. NOTE: this feature requires the "geodict" version of Paper Machines, which adds an extra 70 MB to the download.
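To give a feel for what linking a text to places involves, here is a deliberately simplified sketch that looks up single-word place names in a tiny lookup table. It does not reproduce geodict's approach; the `GAZETTEER` contents and the `find_places` function are assumptions for illustration only.

```python
import re

# Hypothetical two-entry gazetteer: lowercase place name -> (latitude, longitude).
GAZETTEER = {
    "london": (51.5074, -0.1278),
    "paris": (48.8566, 2.3522),
}


def find_places(text, gazetteer=GAZETTEER):
    """Return (name, (lat, lon)) for each known place, in order of first mention."""
    found = []
    seen = set()
    for token in re.findall(r"[a-z]+", text.lower()):
        if token in gazetteer and token not in seen:
            seen.add(token)
            found.append((token, gazetteer[token]))
    return found


if __name__ == "__main__":
    print(find_places("From London to Paris and back to London."))
```

A real geoparser also has to handle multi-word names and ambiguous ones (a place name that is also a common word or a surname), which this sketch ignores.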

Topic Modeling

Shows the proportional prevalence of different "topics" (collections of words likely to co-occur) in the corpus over time, highlighting spots where topics are more common. This uses the MALLET package to perform latent Dirichlet allocation, and by default displays the 20 most "coherent" topics, based on a metric devised by Mimno et al.
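The "prevalence over time" part can be sketched independently of the topic model itself. Assuming each document already has a year and a list of topic proportions (for example, as produced by an LDA tool), the hypothetical function below averages those proportions per year. The input format and the function name are assumptions, not MALLET's output format or Paper Machines' code.

```python
from collections import defaultdict


def topic_prevalence_by_year(documents):
    """Average per-document topic proportions within each year.

    documents: iterable of (year, proportions) pairs, where proportions is a
    list with one value per topic (assumed to have the same length throughout).
    Returns {year: [mean proportion of each topic]} with years in ascending order.
    """
    sums = {}
    counts = defaultdict(int)
    for year, proportions in documents:
        if year not in sums:
            sums[year] = [0.0] * len(proportions)
        for index, value in enumerate(proportions):
            sums[year][index] += value
        counts[year] += 1
    return {
        year: [total / counts[year] for total in totals]
        for year, totals in sorted(sums.items())
    }


if __name__ == "__main__":
    docs = [(1900, [0.5, 0.5]), (1900, [1.0, 0.0]), (1901, [0.25, 0.75])]
    for year, means in topic_prevalence_by_year(docs).items():
        print(year, means)
```

Plotting each topic's yearly mean as a band whose thickness follows the value gives the kind of over-time view described above.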

Acknowledgements

Thanks to Google Summer of Code for funding this work, and to Matthew Battles and Jo Guldi for overseeing it. My gratitude also to the creators of all the open-source projects upon which this work relies:
