Binder

Natural language processing: An introduction in Python

A workshop with the Massive Data Institute, Georgetown University

Overview

This workshop will equip newcomers to natural language processing, or NLP, who have some Python know-how with a foundation for applying NLP methods in their work. The focus is on common steps in an NLP research workflow and user-friendly implementations of popular packages and methods.

We will first go through the common “preprocessing recipe” used for a variety of applications and NLP techniques. This includes: a) tokenization; b) removing stopwords, punctuation, and numbers; c) stemming/lemmatizing words; d) calculating word frequencies/proportions; and e) part-of-speech tagging. We will then go over simple dictionary methods (including sentiment analysis) using a bag-of-words approach.
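As a rough illustration of steps a) through d), here is a minimal sketch that uses only the Python standard library. The stopword list and the `crude_stem` helper are hypothetical stand-ins invented for this example; the workshop itself uses packages such as NLTK, which provide much fuller stopword lists, real stemmers and lemmatizers, and part-of-speech tagging (not shown here).

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption for this sketch;
# NLP packages ship far more complete lists).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "are", "on", "were"}


def tokenize(text):
    """Lowercase the text and keep only alphabetic tokens.

    This drops punctuation and numbers in the same pass.
    """
    return re.findall(r"[a-z]+", text.lower())


def remove_stopwords(tokens):
    """Filter out very common words that carry little content."""
    return [t for t in tokens if t not in STOPWORDS]


def crude_stem(token):
    """Strip a few common English suffixes.

    A deliberately rough stand-in for a real stemmer; it will
    produce non-words such as 'chas' for 'chased'.
    """
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Run tokenization, stopword removal, and stemming in order."""
    return [crude_stem(t) for t in remove_stopwords(tokenize(text))]


def word_proportions(tokens):
    """Return each token's share of the total token count."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


sample = "The 3 cats chased the dogs, and the dogs chased 2 cats."
tokens = preprocess(sample)
proportions = word_proportions(tokens)
print(tokens)
print(proportions)
```

The order of the steps matters: stopwords are removed before stemming so that the stopword list can be matched against whole words.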

For a recorded introduction to NLP and text preprocessing, watch my talk here on YouTube (58 mins). You can also see the slides under the day-1/ folder.

Workshop goals

  • Build intuitions about opportunities and limitations for using text as data
  • Understand at a high level:
    • how a few primary NLP methods work
    • what kinds of questions they answer
    • how to design and implement an NLP project
  • Gain practice with:
    • preprocessing text data
    • common steps in NLP
    • dictionary methods
    • NLTK and Scikit-learn
  • Acquire resources for further learning
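To give a sense of what a dictionary method looks like, here is a minimal bag-of-words sentiment sketch. The `POSITIVE` and `NEGATIVE` word sets are toy lexicons made up for this example, and `sentiment_score` is a hypothetical helper; real projects typically rely on an established, validated lexicon.

```python
import re
from collections import Counter

# Toy lexicons, invented for illustration only.
POSITIVE = {"good", "great", "helpful", "clear"}
NEGATIVE = {"bad", "confusing", "slow", "boring"}


def sentiment_score(text):
    """Score text as (positive hits - negative hits) / total tokens.

    Word order is ignored (bag of words), so negation such as
    'not good' is not handled.
    """
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    total = sum(counts.values())
    if total == 0:
        return 0.0
    positive_hits = sum(counts[word] for word in POSITIVE)
    negative_hits = sum(counts[word] for word in NEGATIVE)
    return (positive_hits - negative_hits) / total


print(sentiment_score("The workshop was great and the examples were clear"))
print(sentiment_score("Slow and confusing"))
```

Dividing by the total token count keeps scores comparable across documents of different lengths, which is one common design choice among several.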

Prerequisites

We will get our hands dirty implementing basic natural language processing tools and methods. To follow along with the code, which is the point, you will need some familiarity with Python and Jupyter Notebooks. If you haven't programmed in Python or haven't used Jupyter Notebooks, please do some self-teaching before this workshop using resources like those listed below.

Getting started & software prerequisites

For simplicity, just click the "Launch Binder" button (at the top of this Readme) to create a virtual environment ready for this workshop. It may take a few minutes; if it takes longer than 10 minutes, try again.

If you want to run the code on your own computer, you have two options. You can use Anaconda to make installation easy: download Anaconda. Or, if you already have Python 3.x installed along with the libraries listed in requirements.txt, you're welcome to clone this repository and follow along on your own machine. You can also install all the necessary packages like so:

pip3 install -r requirements.txt
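After installing, one way to confirm that the environment is ready is a quick import check. This is a sketch, not part of the workshop materials: `missing_packages` is a hypothetical helper, and the two names checked are the import names for NLTK and Scikit-learn (`nltk` and `sklearn`), the packages named in the workshop goals. The full list lives in requirements.txt.

```python
import importlib.util


def missing_packages(import_names):
    """Return the import names that cannot be found in this environment."""
    return [
        name for name in import_names
        if importlib.util.find_spec(name) is None
    ]


# Assumed subset of the workshop's dependencies; see requirements.txt
# for the authoritative list.
missing = missing_packages(["nltk", "sklearn"])
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("NLTK and Scikit-learn are available.")
```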

Open-Access, Online Resources on Python and NLP

Contributing

If you spot a problem with these materials, please open an issue describing the problem.

Acknowledgments


MDI logo

About

An introduction to Natural Language Processing for NLP beginners with some Python know-how. Created for GU's Massive Data Institute in fall 2020 by Jaren Haber, PhD
