Intensive 1-week introduction to text mining with Python

title: Text Mining Bootcamp
place: Room 12.0.26, KU Sønder Campus, Denmark
time: August 14-18, 2017, 9 AM to 3 PM
instructors: Peter Leonard (Yale University Library) & Kristoffer L. Nielbo (Interacting Minds Centre)
contact: kln@cas.au.dk

Preparation

  1. Install the Anaconda distribution of Python for your OS
  2. Read chapters 1-6 of Automate the Boring Stuff with Python

Literature

  • Sweigart, A. (2015). Automate the Boring Stuff with Python: Practical Programming for Total Beginners. San Francisco: No Starch Press.

Schedule

DAY 1: Programming with Python

| Time | Content | Instructor |
|------|---------|------------|
| 09:00-09:30 | Welcome & Setup | KLN |
| 09:30-10:30 | Text Analytics | KLN |
| 10:30-11:00 | Analyzing Tabular Data | KLN |
| 11:00-11:30 | Repeating Actions with Loops | KLN |
| 11:30-12:00 | Storing Multiple Values in Lists | KLN |
| 12:00-13:00 | Lunch | * |
| 13:00-13:30 | Analyzing Data from Multiple Files | KLN |
| 13:30-14:00 | Making Choices | KLN |
| 14:00-14:30 | Creating Functions | KLN |
| 14:30-15:00 | Finish | KLN |

DAY 2: From Print to Probability

| Time | Content | Instructor |
|------|---------|------------|
| 09:00-09:30 | Welcome | KLN |
| 09:30-10:00 | Reading Unstructured Data | KLN |
| 10:00-10:30 | Cleaning & Segmentation | KLN |
| 10:30-11:00 | Free Play | KLN |
| 11:00-11:30 | Language Normalization | KLN |
| 11:30-12:00 | Term Frequencies | KLN |
| 12:00-13:00 | Lunch | * |
| 13:00-13:30 | Dispersion and Distributions | KLN |
| 13:30-14:00 | Vector Space Representations | KLN |
| 14:00-14:30 | Project hour | KLN |
| 14:30-15:00 | Project hour | KLN |

DAY 3: Time, Density, and Information

| Time | Content | Instructor |
|------|---------|------------|
| 09:00-09:30 | Welcome | KLN |
| 09:30-10:00 | Beyond Words | KLN |
| 10:00-10:30 | Lexical Density | KLN |
| 10:30-11:00 | Free Play | KLN |
| 11:00-11:30 | Readability | KLN |
| 11:30-12:00 | Information | KLN |
| 12:00-13:00 | Lunch | * |
| 13:00-13:30 | Sentiment Vectors | KLN |
| 13:30-14:00 | Sentiment Vectors | KLN |
| 14:00-14:30 | Project hour | PL & KLN |
| 14:30-15:00 | Project hour | PL & KLN |

DAY 4: Latent Variables and (Multiple) Relations

| Time | Content | Instructor |
|------|---------|------------|
| 09:00-09:30 | Welcome | PL |
| 09:30-10:00 | Network Analysis: Introduction | PL |
| 10:00-10:30 | Network Analysis: Textual/Literary Examples | PL |
| 10:30-11:00 | Free Play: Brainstorming Network Projects | PL |
| 11:00-11:30 | Network Analysis: Building a Dataset | PL |
| 11:30-12:00 | Network Analysis: Tools - Gephi | PL |
| 12:00-13:00 | Lunch | * |
| 13:00-13:30 | Topic Modeling | PL |
| 13:30-14:00 | Topic Modeling: Hands-On | PL |
| 14:00-14:30 | Project hour | PL |
| 14:30-15:00 | Project hour | PL |

DAY 5: Classification and Associations
topics: classification, document similarity, and word embedding

| Time | Content | Instructor |
|------|---------|------------|
| 09:00-09:30 | Statistical Learning | KLN |
| 09:30-10:00 | Classification: Introduction | KLN |
| 10:00-10:30 | Representation | ins |
| 10:30-11:00 | Validation | KLN |
| 11:00-11:30 | Optimization | KLN |
| 11:30-12:00 | Free Play | KLN |
| 12:00-13:00 | Lunch | * |
| 13:00-13:30 | Topic Modeling: Review | PL |
| 13:30-14:00 | Word Embedding: Demonstrations | PL |
| 14:00-14:30 | Word Embedding: Hands-On | PL |
| 14:30-15:00 | Finish | * |