Jupyter notebooks for digital humanities

En: Jupyter notebooks are useful for organizing and documenting code, and embedding code within scholarly arguments and/or pedagogical materials. The following list of notebooks for digital humanities purposes was sourced from Twitter in June 2019, but PRs with suggested additions are welcome! If you only want notebooks in English, search for "en".

De: Jupyter notebooks are useful for organizing and documenting code, and for embedding code in scholarly arguments and/or pedagogical materials. The following list of notebooks for digital humanities was sourced from Twitter in June 2019, but PRs with suggested additions are welcome! If you only want notebooks in German, search for the abbreviation "de".

Es: Jupyter notebooks are useful for organizing and documenting code, and for embedding code within scholarly arguments and/or pedagogical materials. The following list of notebooks for digital humanities purposes was sourced from Twitter in June 2019, but PRs with suggested additions are welcome! If you want notebooks in Spanish, search this page for "es".

Fr: Jupyter notebooks are useful for organizing and documenting code, and for embedding code in scholarly arguments and/or pedagogical materials. The following list of notebooks for digital humanities purposes was sourced from Twitter in June 2019, but PRs with suggested additions are welcome! If you want notebooks in French, search this page for the abbreviation "fr".

Research & projects

Course materials

  • (en/Python) Applied Data Analysis as taught at DHOxSS 2019: covers tidying data, visualization, modeling, and advanced applications of data analysis. By Giovanni Colavizza and Matteo Romanello. On Binder
  • (en/Python) Applied Natural Language Processing course at UC Berkeley: covers the impact of tokenization choices on sentiment classification, finding distinctive terms using different methods, text classification, hyperparameter choices for classification accuracy, hypothesis testing, word embeddings, CNNs and LSTMs, social networks in literary texts, and more. By David Bamman.
  • (en/Python) Becoming a Historian course at UC Berkeley: notebook with introduction to Python using AHA job posting data, by Chris Hench.
  • (en/Python) Chinatown and the Culture of Exclusion course at UC Berkeley: using demographic data from the 20th-21st centuries, this module has students analyze how a specific Chinatown, such as SF Chinatown, has changed over time. Students use some simple computational text analysis methods to explore and compare the structures of poems written on Angel Island and in Chinatown publications from the early 20th century. By Michaela Palmer, Maya Shen, Cynthia Leu, and Chris Cheung; course taught by Amy Lee.
  • (en/Python) Data Arts course at UC Berkeley: notebooks looking at coincidence, correlation, and causation; and the evolution of social networks over time. Course by Greg Niemeyer.
  • (en/Python) Deconstructing Data Science course at UC Berkeley, by David Bamman, on Binder.
  • (en/Python) European Economic History course at UC Berkeley: notebooks related to the Industrial Revolution and the rise of the European economy to world dominance in the 19th century, emphasizing the diffusion of the industrial system and its consequences, the world trading system, and the rise of modern imperialism. Developed by Alec Kan, Beom Jin Lee, and Anusha Mohan.
  • (en/Python) History data science connector course at UC Berkeley: various notebooks for analyzing historical data using data science methods.
  • (en/Python) Japanese Internment course at UC Berkeley: notebook for mapping the beginning and end coordinates of people who migrated from one location to another after being placed in internment camps. By Melanie Yu, Andrew Linxie, Nga Pui Leung, and Francis Kumar.
  • (en/Python) Literature and data course at UC Berkeley: a mix of readings and Jupyter notebooks that experiment with popular statistical methods that have recently gained visibility in literary study, and consider them as forms of “distant reading.” By Teddy Roland.
  • (de/Python) Seminar »Methoden computergestützter Textanalyse«: the seminar takes up these possibilities of computer-assisted text analysis. Alongside a discussion of the theoretical and methodological foundations, the focus is on the practical application of the corresponding methods. Using the programming language Python as an example, it shows how concrete text material can be prepared, analyzed, and interpreted. By Frederik Elwert.
  • (en/Python) Sumerian text analysis course at UC Berkeley: intro to Python, how to find differences between texts based on their words, and how to visualize the results. By Jonathan Lin, Stephanie Kim, Erik Cheng, and Sujude Dalieh; course taught by Niek Veldhuis.
  • (en/Python) Text analysis for graduate medievalists course at UC Berkeley: an introduction into parsing and performing text analysis on medieval manuscripts using Python. By Mingyue Tang, Sierra Blatan, Shubham Gupta, Tejas Priyadarshan, and Sasank Chaganty.
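Several of the courses above turn on the same basic move: finding words that distinguish one text from another. As a rough illustration of the idea (this is a generic standard-library sketch, not code from any of the listed notebooks, and it uses simple relative-frequency differences rather than the more robust measures the courses cover):

```python
from collections import Counter

def distinctive_words(text_a, text_b, n=3):
    """Rank words by how much more frequent they are in text_a than in text_b."""
    freq_a = Counter(text_a.lower().split())
    freq_b = Counter(text_b.lower().split())
    total_a = sum(freq_a.values())
    total_b = sum(freq_b.values())
    scores = {}
    for word in freq_a:
        # Relative frequency in A minus relative frequency in B; real courses
        # use sturdier measures (log-odds ratios, chi-square, etc.).
        scores[word] = freq_a[word] / total_a - freq_b.get(word, 0) / total_b
    return sorted(scores, key=scores.get, reverse=True)[:n]
```

For example, comparing "the ship sailed the sea the ship" against "the horse crossed the plain" surfaces "ship" as the most distinctive word of the first text.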

Learning Python

Text analysis

Using APIs


Data cleaning

  • (en/Python) EEBO-TCP full-text document cleaning: code to make EEBO-TCP texts easier to analyze with Natural Language Processing (NLP), though most of the edits can be applied to any text file. By Jamel Ostwald.
  • (en/Python) Data manipulation workshop: covers how to load data into a Pandas DataFrame, perform basic cleaning and analysis, and visualize relevant aspects of a dataset, using a dataset of tweets. By Scott Bailey, Javier de la Rosa, and Ashley Jester. (filled-in version)
  • (en/Python) Japanese text segmentation: uses the RakutenMA Python module to segment Japanese text, by Quinn Dombrowski.
  • (en/Python) Unicode to ASCII: notebook for converting Unicode text to ASCII, by Quinn Dombrowski.
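The Unicode-to-ASCII conversion described above can be approximated in a few lines of standard-library Python (a generic sketch, not the notebook's actual code): NFKD normalization decomposes accented characters into base letters plus combining marks, and the non-ASCII marks are then dropped.

```python
import unicodedata

def to_ascii(text):
    # Decompose accented characters (é -> e + combining accent),
    # then drop every character that doesn't fit in ASCII.
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")
```

For example, `to_ascii("Île-d'Orléans")` yields "Ile-d'Orleans". Note that this silently discards characters with no decomposition (e.g. many non-Latin scripts), so it suits accented Latin text best.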


Mapping

  • (en/Python) Mapping Geographic Subjects using the HathiTrust Extracted Features Dataset: Retrieves a book-level dataset from the HathiTrust Extracted Features Dataset, "recreates" the book's text using token-frequency data (i.e. tokenPosCount), runs the text through a named entity recognition tagger (Stanford NER Tagger), separates out 'location' NER data, queries the Geonames API for geographic coordinates for all locations, and maps the coordinates using Folium. By Patrick Burns.
  • (en/Python) Mapping with iPyLeaflet: by Eric Kansa, part of Open Context Jupyter. On Binder.
  • (en/Python) Using NER to Map MARC Geographic Subject Headings: Retrieves a MARC record from the NYU Library LMS, runs the text of the MARC record through a named entity recognition tagger (Stanford NER Tagger), separates out 'location' NER data, queries the Geonames API for geographic coordinates for all locations, and maps the coordinates using Folium. By Patrick Burns.
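Both of Patrick Burns's workflows above hinge on the step of pulling 'location' entities out of the NER tagger's (token, tag) output. A minimal sketch of that step, where the tagged input is a hypothetical stand-in for Stanford NER Tagger output:

```python
def extract_locations(tagged_tokens):
    """Group contiguous LOCATION-tagged tokens into multi-word place names."""
    locations, current = [], []
    for token, tag in tagged_tokens:
        if tag == "LOCATION":
            current.append(token)
        elif current:
            # A non-location tag ends the current place name.
            locations.append(" ".join(current))
            current = []
    if current:
        locations.append(" ".join(current))
    return locations
```

Each extracted name could then be sent to the Geonames API for coordinates and plotted with Folium, as the notebooks describe.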




Other

  • (es/Python) Análisis chocométrico: at the end of the HD Hispánicas 2019 conference there was a "chocometric" analysis: 4 Lindt chocolates to taste, with different cacao percentages: 50, 78, 90, and 99%. The goal was to find out, at a personal and social level, which percentage people like most. By José Calvo.
  • (en/Python) GLAM Workbench: over 50 notebooks with tools and examples to help you work with data from galleries, libraries, archives, and museums (the GLAM sector), focusing on Australia and New Zealand, by Tim Sherratt.
  • (en/Python & R) Notebook templates for Binder: contains all you need to set up Jupyter notebooks for use with MyBinder, which builds a Docker image and launches it for you in the Jupyter Notebook executable environment. By clicking a button, you get a version of the repository as it currently exists. You can then create new Python or R notebooks and run the code, writing your analysis and your code together. You can also open a terminal within Jupyter and use git to push any changes you make back to the repository. By Shawn Graham, on Binder.
  • (en/Python) Processing: example of integrating the Processing languages for sketches and visualizations into Python and Jupyter notebooks, by Shawn Graham, on Binder.
  • (en/Python & R) SPARQL and LOD: introduces SPARQL and Linked Open Data, by Shawn Graham, on Binder.
  • (en/Python & R) sqlite: introduces some of the basic commands for querying and modifying a database using the Structured Query Language, SQL; illustrates writing a query into a 'dataframe', a table that you can then manipulate or visualize; shows how to load a sqlite database into R. By Shawn Graham, on Binder.
  • (en/Python) WARC Processing with Spark and Python: a brief guide for processing WARC (Web Archive) data using PySpark and warcio, by Ed Summers.
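The sqlite notebook above works through basic SQL querying and reading results into a dataframe. The querying idea can be sketched with the standard-library sqlite3 module alone (the table and data here are made up for illustration; with pandas installed, the same SQL could go through pandas.read_sql_query to get a dataframe instead):

```python
import sqlite3

# Build a throwaway in-memory database standing in for a real dataset.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE texts (title TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO texts VALUES (?, ?)",
    [("Beowulf", 1000), ("Le Morte d'Arthur", 1485), ("Utopia", 1516)],
)

# Parameterized SELECT: fetch titles from 1400 onward, oldest first.
rows = conn.execute(
    "SELECT title FROM texts WHERE year >= ? ORDER BY year", (1400,)
).fetchall()
conn.close()
```

The `?` placeholders keep query values separate from the SQL string, which matters as soon as the values come from user input.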

