Word embedding models and text data from charter school websites for workshop and hackathon of TextXD 2018 at BIDS, UC Berkeley.


Word Embeddings: Workshop and Exploration of Charter Schools

This repository includes a workshop (more info below) introducing word embedding models, as well as hack-session starter code for loading and exploring word embedding models built from charter school data. Some data are contained in the repo; other data will be linked into the Jupyter instance we'll set up at the start of the workshop. The charter school data come from author Jaren Haber's web-scraping of charter school websites, and the embeddings were created with the word2vec implementation in gensim. The repository was prepared for TextXD 2018 at the Berkeley Institute for Data Science (BIDS), UC Berkeley.

Introduction to word embeddings (workshop)


This one-hour workshop introduces word embeddings in Python and explores the features produced through the word2vec model. We'll mainly use the Akkadian ORACC corpus, put together by Professor Niek Veldhuis, UC Berkeley Near Eastern Studies. We'll also look briefly at a Word2Vec model trained on the ECCO-TCP corpus of 2,350 eighteenth-century literary texts made available by Ryan Heuser.

Learning Goals

  • Learn the intuition behind word embedding models (WEMs)
  • Learn how to implement a WEM using the gensim implementation of word2vec
  • Explore a corpus you've probably never seen before
  • Think through how visualization of WEMs might help you explore your corpus
  • Implement text analysis on a non-English language


All are welcome! You don't need to know how neural nets work or be a Python expert to benefit from this workshop. We'll focus on the concepts behind word embeddings more than the specific syntax. This workshop will be most useful to people who have some familiarity with Python but have never done word embeddings before.


If you notice a problem with these materials, please open an issue describing the problem. Collaboration and transparency are worth everyone's time!

Workshop leader

  • Jaren Haber

