Intensive 1-week introduction to text mining with Python

title: Text Mining Bootcamp
place: 12.0.26 at KU Sønder Campus, Denmark
time: August 14-18, 2017, 9 AM to 3 PM.
instructors: Peter Leonard (Yale University Library) & Kristoffer L. Nielbo (Interacting Minds Centre)


  1. Install the Anaconda distribution of Python for your OS
  2. Read chapters 1-6 of Automate the Boring Stuff with Python


  • Sweigart, A. (2015). Automate the Boring Stuff with Python: Practical Programming for Total Beginners. San Francisco: No Starch Press.


DAY 1: Programming with Python

Time Content Instructor
09:00-09:30 Welcome & Setup KLN
09:30-10:30 Text Analytics KLN
10:30-11:00 Analyzing Tabular Data KLN
11:00-11:30 Repeating Actions with Loops KLN
11:30-12:00 Storing Multiple Values in Lists KLN
12:00-13:00 Lunch *
13:00-13:30 Analyzing Data from Multiple Files KLN
13:30-14:00 Making Choices KLN
14:00-14:30 Creating Functions KLN
14:30-15:00 Finish KLN

DAY 2: From Print to Probability

Time Content Instructor
09:00-09:30 Welcome KLN
09:30-10:00 Reading Unstructured Data KLN
10:00-10:30 Cleaning & Segmentation KLN
10:30-11:00 Free Play KLN
11:00-11:30 Language Normalization KLN
11:30-12:00 Term Frequencies KLN
12:00-13:00 Lunch *
13:00-13:30 Dispersion and Distributions KLN
13:30-14:00 Vector Space Representations KLN
14:00-14:30 Project hour KLN
14:30-15:00 Project hour KLN

DAY 3: Time, Density, and Information

Time Content Instructor
09:00-09:30 Welcome KLN
09:30-10:00 Beyond Words KLN
10:00-10:30 Lexical Density KLN
10:30-11:00 Free Play KLN
11:00-11:30 Readability KLN
11:30-12:00 Information KLN
12:00-13:00 Lunch *
13:00-13:30 Sentiment Vectors KLN
13:30-14:00 Sentiment Vectors KLN
14:00-14:30 Project hour PL & KLN
14:30-15:00 Project hour PL & KLN

DAY 4: Latent Variables and (Multiple) Relations

Time Content Instructor
09:00-09:30 Welcome PL
09:30-10:00 Network Analysis: Introduction PL
10:00-10:30 Network Analysis: Textual/Literary Examples PL
10:30-11:00 Free Play: Brainstorming Network Projects PL
11:00-11:30 Network Analysis: Building a Dataset PL
11:30-12:00 Network Analysis: Tools - Gephi PL
12:00-13:00 Lunch *
13:00-13:30 Topic Modeling PL
13:30-14:00 Topic Modeling: Hands-On PL
14:00-14:30 Project hour PL
14:30-15:00 Project hour PL

DAY 5: Classification and Associations
topics: classification, document similarity, and word embedding

Time Content Instructor
09:00-09:30 Statistical Learning KLN
09:30-10:00 Classification: Introduction KLN
10:00-10:30 Representation ins
10:30-11:00 Validation KLN
11:00-11:30 Optimization KLN
11:30-12:00 Free Play KLN
12:00-13:00 Lunch *
13:00-13:30 Topic Modeling: Review PL
13:30-14:00 Word Embedding: Demonstrations PL
14:00-14:30 Word Embedding: Hands-On PL
14:30-15:00 Finish *