Learning from Text: Introduction to Natural Language Processing with Python
May 3, 2017
Michelle L. Gill, Ph.D.
Senior Data Scientist
Software Installation Instructions
Tested on Mac OS X 10.12 and Ubuntu 14.04
Download the Anaconda distribution for Python 3.6 (Python 2.7 will not work) from this link and configure your environment as described at the end of the installation process. This installs the following required libraries: Jupyter Notebook, NumPy, SciPy, pandas, scikit-learn, Matplotlib, and seaborn.
With the above Anaconda environment activated, install the following additional libraries using the commands listed:
conda install -y -c anaconda gensim nltk
conda install -y -c conda-forge textblob
pip install pyldavis
Packages can also be installed with pip if the conda installation does not work.
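Once the installs finish, it can help to confirm that each library is visible to the active environment. The following is a minimal sketch, not part of the workshop materials: the helper name check_packages is hypothetical, and the import names (for example "sklearn" for scikit-learn and "pyLDAvis" for pyldavis) are assumed from each project's usual convention.

```python
import importlib.util

# Import names for the libraries listed above. These are assumptions based on
# each project's usual import name (e.g. scikit-learn imports as "sklearn").
EXPECTED_MODULES = (
    "numpy", "scipy", "pandas", "sklearn", "matplotlib", "seaborn",
    "gensim", "nltk", "textblob", "pyLDAvis",
)


def check_packages(module_names=EXPECTED_MODULES):
    """Return {module_name: True/False} without importing the modules."""
    return {
        name: importlib.util.find_spec(name) is not None
        for name in module_names
    }


if __name__ == "__main__":
    # Print one status line per library so missing ones are easy to spot.
    for name, found in check_packages().items():
        print(f"{name:12s} {'ok' if found else 'MISSING'}")
```

Run it with the Anaconda environment activated; anything reported as MISSING can be retried with pip.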
Download the corpora associated with nltk using the following command from a terminal:
python -m nltk.downloader -d $HOME/nltk_data all
This will create a folder "nltk_data" in your home directory that is large (~ 4 GB) when expanded.
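Because the corpora are large, a quick disk-space check before downloading may save a failed install. This is a rough sketch under stated assumptions: the helper has_free_space is hypothetical, and the 4 GB threshold is simply the approximate figure quoted above.

```python
import shutil
from pathlib import Path


def has_free_space(path, required_gb):
    """Return True if the filesystem holding `path` has at least `required_gb` GB free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * 1e9


if __name__ == "__main__":
    home = Path.home()
    # 4 GB is the approximate expanded size quoted above; treat it as a rough bound.
    if not has_free_space(home, 4):
        print(f"Less than 4 GB free under {home}; the download may not fit.")

    # After the download, the folder should exist in the home directory.
    nltk_dir = home / "nltk_data"
    status = "found" if nltk_dir.is_dir() else "not found"
    print(f"{nltk_dir}: {status}")
```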
Download Google's pre-trained word2vec files from this link. Note that this file is also somewhat large (~ 1.5 GB). This file can be downloaded to a preferred location and left there.
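For reference, pre-trained word2vec vectors in the binary format are commonly loaded with gensim's KeyedVectors.load_word2vec_format. The sketch below assumes that approach; the helper name load_word2vec and the example path are hypothetical, and the workshop notebooks may load the file differently.

```python
from pathlib import Path


def load_word2vec(path, limit=None):
    """Load binary word2vec vectors from `path` using gensim.

    `limit` caps how many vectors are read, which reduces memory use
    while experimenting.
    """
    path = Path(path).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"word2vec file not found: {path}")

    # Imported here so the path check above works even before gensim is installed.
    from gensim.models import KeyedVectors

    return KeyedVectors.load_word2vec_format(str(path), binary=True, limit=limit)


# Example usage (hypothetical location; substitute wherever the file was saved):
# vectors = load_word2vec("~/data/word2vec.bin", limit=100000)
# print(vectors.most_similar("language", topn=5))
```

Passing a limit is optional; it is shown here only because the full file is about 1.5 GB and can take a while to load.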
Clone this GitHub repo. The materials for this workshop are still being updated, so clone the repo (or pull the latest changes if you have already cloned it) the evening before the workshop.