
ds3 2019

Here you will find the teaching materials for the "Fundamentals of Text Processing for User Generated Content" hands-on session at the Data Science Summer School hosted by École Polytechnique and the DATAIA Institute.

What is the course about?

The course is designed as an introduction to the basics of natural language processing (NLP) for analyzing unstructured, user-generated content. It is aimed at beginners to the topic (and to NLP in general), but basic knowledge of Python and some familiarity with data science techniques will be helpful.

Topics covered include:

  • text preprocessing in Python,
  • collecting your own data from Twitter and Reddit,
  • content analysis,
  • text embeddings, and
  • supervised learning with text data.

What materials are available here?

The slides can be found here. They mostly serve as a high-level introduction to the examples and exercises (in Colab notebooks), which are linked from the slides themselves. Copies of the Colab notebooks can also be found in the /colab folder of this repository.

Can I work through the material on my own?

If you didn't attend the tutorial, you can certainly work through the materials on your own (the Colab notebooks are designed to be readable and doable for individuals working at their own pace). The slides will guide you through the content. The notebooks are intended to be worked through in order. Each one contains examples to view and one or two practice exercises to complete (with sample solutions).
