# Setup

In this series of workshops we will be using [Jupyter Notebook](http://jupyter.org/) as our Integrated Development Environment (<acronym title="Integrated Development Environment" style="border-bottom: 1px dashed #999; cursor: help">IDE</acronym>), and the [Anaconda](https://www.continuum.io/) distribution of Python. 

If you prefer to use your own Python distribution and setup, feel free to do so; just make sure you have a virtual environment with Jupyter Notebook installed.

## Anaconda

Anaconda is a Python distribution that bundles Python with `conda`, a package manager that can also install different Python versions. There are other options, such as your operating system's Python or [`pyenv`](https://github.com/pyenv/pyenv), but we won't cover their setup here. So the first thing to do is to download and install [Anaconda](https://www.continuum.io/downloads) using the graphical installer for Python 3.5 or higher, which also installs the Anaconda Navigator interface.


## Jupyter Notebooks
> The Jupyter Notebook App is a server-client application that allows editing and running “notebook” documents via a web browser [...] In addition to displaying/editing/running notebook documents, the Jupyter Notebook App has a “Dashboard” (Notebook Dashboard), a “control panel” showing local files and allowing to open notebook documents or shutting down their kernels.
>
> &mdash; <cite>[Jupyter/IPython Notebook Quick Start Guide](http://jupyter-notebook-beginner-guide.readthedocs.io/en/latest/what_is_jupyter.html)</cite>

In Jupyter, a kernel is the engine that runs your commands. In our case, it will be a Python kernel, specifically the IPython kernel.
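To confirm which Python interpreter the kernel is actually running (useful when several environments are installed), you can run a cell like the one below. It uses only the standard library; the helper name `kernel_info` is our own, not part of Jupyter.

```python
import sys


def kernel_info():
    """Return the interpreter path and version of the running Python process.

    Inside a notebook this is the Python used by the kernel, which is not
    necessarily the one your shell would pick up.
    """
    return {
        "executable": sys.executable,
        "version": "{}.{}.{}".format(*sys.version_info[:3]),
    }


print(kernel_info())
```

If `executable` does not point inside the environment you created, the notebook is running on a different Python than you expect.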

Jupyter lets you write and evaluate code at a granular level, cell by cell, without constantly re-running scripts or relying on lots of print debugging. It also lets you mix Markdown and HTML into your notebook, which makes it a great way of presenting code and data analysis.

To get started, see the official [Jupyter documentation](http://test-jupyter.readthedocs.io/en/latest/index.html), the [starter guide](https://jupyter-notebook-beginner-guide.readthedocs.io/en/latest/), the notebook on [how to run code](https://github.com/jupyter/notebook/blob/master/docs/source/examples/Notebook/Running%20Code.ipynb), or this gallery of [other interesting notebooks](https://github.com/jupyter/jupyter/wiki/A-gallery-of-interesting-Jupyter-Notebooks). There are even [video tutorials available](https://www.youtube.com/watch?v=HW29067qVWk).

## Virtualenvs

In the Python world, a `virtualenv`, virtual environment, or just environment, is a way of managing isolated Python installations so that each project can have its own dependencies without conflicting with other projects.

> A Virtual Environment is a tool to keep the dependencies required by different projects in separate places, by creating virtual Python environments for them. It solves the “Project X depends on version 1.x but, Project Y needs 4.x” dilemma, and keeps your global site-packages directory clean and manageable.
>
> &mdash; <cite>[The Hitchhiker’s Guide to Python!](http://python-guide-pt-br.readthedocs.io/en/latest/dev/virtualenvs/)</cite>

Anaconda has its own environment and package manager (`conda`), lets you easily choose Python versions, and comes with many of the standard packages used in scientific computing. It also provides the concept of channels, so you can easily install packages that are maintained by other people. We will cover the instructions for setting up your Python environment using Anaconda.
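A quick way to check from Python whether you are inside an isolated environment is sketched below. It relies on two conventions: conda environments contain a `conda-meta` directory under `sys.prefix` (and activating one sets the `CONDA_DEFAULT_ENV` variable), while environments created with the standard `venv` module have `sys.prefix` different from `sys.base_prefix`. The helper name `describe_environment` is hypothetical.

```python
import os
import sys


def describe_environment():
    """Report which kind of isolated environment (if any) this Python runs in."""
    # conda environments (including "base") keep package metadata in conda-meta.
    is_conda = os.path.isdir(os.path.join(sys.prefix, "conda-meta"))
    # venv environments point sys.prefix at the environment and
    # sys.base_prefix at the Python installation they were created from.
    is_venv = sys.prefix != sys.base_prefix
    return {
        "prefix": sys.prefix,
        "conda": is_conda,
        "conda_env_name": os.environ.get("CONDA_DEFAULT_ENV"),
        "venv": is_venv,
    }


print(describe_environment())
```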


## Setting up Jupyter using Anaconda in the command line

To set up an environment, run the following in your shell or terminal (`$` marks the shell prompt; don't type it):

```
$ conda create -n name_of_your_environment python=x.y package1 package2
```

This creates an environment named `name_of_your_environment` with Python version `x.y`, and installs the packages `package1` and `package2` into it.

For example:

```
$ conda create -n data python=3.5 jupyter requests
```

installs `jupyter` and `requests` in an environment named `data` that uses Python 3.5.
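If you script your setup, the command can be assembled from its parts. The sketch below only builds the argument list (the helper `conda_create_command` is hypothetical); it uses the `conda create` options `-n` and `-c`, plus `-y` to skip the confirmation prompt, and you could pass the result to `subprocess.run`.

```python
def conda_create_command(name, python_version, packages, channels=(), yes=False):
    """Build the argument list for `conda create` (does not run it)."""
    cmd = ["conda", "create", "-n", name]
    for channel in channels:
        # Each extra channel is passed with its own -c flag.
        cmd += ["-c", channel]
    if yes:
        # Skip the interactive confirmation prompt.
        cmd.append("-y")
    cmd.append("python={}".format(python_version))
    cmd.extend(packages)
    return cmd


print(" ".join(conda_create_command("data", "3.5", ["jupyter", "requests"])))
# conda create -n data python=3.5 jupyter requests
```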

After you create the environment, activate it. On macOS or Linux, run:

```
$ source activate name_of_your_environment
```

On Windows, run:

```
$ activate name_of_your_environment
```
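The platform difference can be captured in a small helper. This is a sketch: `activation_command` is a hypothetical name, and besides the `source activate`/`activate` forms it offers the `conda activate` form that conda 4.4 and later accept on every platform.

```python
import sys


def activation_command(env_name, platform=sys.platform, use_conda_activate=False):
    """Return the shell command that activates a conda environment."""
    if use_conda_activate:
        # Newer conda releases provide one cross-platform command.
        return "conda activate {}".format(env_name)
    if platform.startswith("win"):
        # Windows uses the bare `activate` script.
        return "activate {}".format(env_name)
    # macOS ("darwin") and Linux shells source the activate script.
    return "source activate {}".format(env_name)


print(activation_command("data"))
```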

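Once the environment is active, install `jupyter` if you did not include it when creating the environment, then run `jupyter notebook` from the directory where you want to store your notebooks:

```
$ jupyter notebook
```

Jupyter usually opens your browser automatically; if it doesn't, go to http://localhost:8888/ (the terminal also prints the exact URL to use).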
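If the page does not load, it can help to check whether anything is listening on port 8888. The sketch below uses only the standard `socket` module, and the helper name `is_port_open` is hypothetical. Keep in mind that Jupyter moves to the next free port (8889 and so on) when 8888 is already taken.

```python
import socket


def is_port_open(host="localhost", port=8888, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


print(is_port_open())
```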

If you need to install packages from other channels, add `-c channel_name` to the `conda install` command:

```
$ conda install -n name_of_your_environment -c channel_name package_maintained_by_someone_else
```

We will often use the `conda-forge` channel for packages that are more up to date than, or missing from, the official Anaconda repository.

## Setting up Jupyter using Anaconda Navigator

First, launch Anaconda Navigator, go to "Environments," create a new environment with Python 3.5 (or higher), and click on it.

![Environment creation](images/anaconda-envs.gif "Environment creation")

Go to channels and add a new one called `conda-forge`, then click on "Update channels."

![Channels](images/anaconda-channels.gif "Channels")

Now go to the "Installed" packages and select "Not installed." Look for "jupyter" in the "Search Packages" box and mark it to install. Repeat the process for any other package you need, such as "requests" for example. Then click on "Apply," and you should see a list of packages before installing them. Click "OK."

![Packages](images/anaconda-packages.gif "Packages")

Once installation is finished, click on the green triangle and select "Open with Jupyter Notebook." A terminal window should pop up while Jupyter loads. After a few seconds you should see the main Jupyter interface showing the contents of the current directory.

![Launching Jupyter](images/anaconda-jupyter.gif "Launching Jupyter")

Navigate to where you want your notebooks stored, then click on "New" and select "Python 3.5" (or whichever Python version your environment uses). You should now be ready to start writing Python code in a new, clean notebook.

![Launching a Notebook](images/anaconda-notebook.gif "Launching a Notebook")

You now have everything you need to start coding in Python with Anaconda and Jupyter Notebook. We also recommend keeping your code under version control, preferably with a tool like `git`, to make it easy to review its history and to avoid accidental code loss.

## Downloading the necessary data

Throughout this workshop, we'll be using some large data files, including text corpora and pre-trained vectors. To save time during the workshop and avoid depending too heavily on the conference Wi-Fi, please download the data in advance.

Once you've created and activated your virtual environment and have Jupyter Notebook running, run the cells below in a notebook. They install the necessary packages and download the data files.

```python
# Install the dependencies listed in the requirements.txt file
!pip install -r requirements.txt
```
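`!pip` runs whichever `pip` comes first on the shell's `PATH`, which is not always the one belonging to the notebook's kernel. A common workaround, sketched below with the hypothetical helper `pip_install_command`, is to invoke pip as a module of the kernel's own interpreter (`sys.executable -m pip`).

```python
import sys


def pip_install_command(*args):
    """Build a pip install command bound to the interpreter running this kernel."""
    return [sys.executable, "-m", "pip", "install", *args]


cmd = pip_install_command("-r", "requirements.txt")
print(cmd)
# To actually install, pass the list to subprocess.check_call(cmd).
```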

```python
# Install Cython, needed to compile fastText in part 3
!pip install Cython
```

```python
# Download the various NLTK corpora and models we'll use
!python -m nltk.downloader all
```

```python
# Unzip the archive into the target directory
!unzip embeddings/w2v_gnews_small.zip -d embeddings
```
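The `unzip` command is not available by default on Windows. As a portable alternative, the standard-library `zipfile` module can extract the same archive; this sketch assumes the same paths as the cell above, and `extract_zip` is a hypothetical helper name.

```python
import os
import zipfile


def extract_zip(zip_path, target_dir):
    """Extract every member of zip_path into target_dir and return their names."""
    os.makedirs(target_dir, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target_dir)
        return archive.namelist()


if os.path.isfile("embeddings/w2v_gnews_small.zip"):
    extract_zip("embeddings/w2v_gnews_small.zip", "embeddings")
```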

```python
import os

# Download and unzip the GloVe 6B pre-trained vectors (skipped if already present)
glove_dir = "glove.6B"
glove_zip_file = "glove.6B.zip"
glove_files_url = "http://nlp.stanford.edu/data/wordvecs/glove.6B.zip"

if not os.path.isdir(glove_dir):
    !mkdir $glove_dir
    !curl --progress-bar -Lo $glove_zip_file $glove_files_url
    !unzip $glove_zip_file -d $glove_dir
```
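The same download-then-extract pattern can be written in plain Python with `urllib.request` and `zipfile`, which avoids depending on `curl` and `unzip` being installed. This is a sketch assuming the URL serves a regular zip file; `download_and_extract` is a hypothetical helper that, like the cell above, skips work that is already done.

```python
import os
import urllib.request
import zipfile


def download_and_extract(url, zip_path, target_dir):
    """Download url to zip_path (if missing), then extract it into target_dir (if missing)."""
    if os.path.isdir(target_dir):
        return target_dir  # Already extracted; nothing to do.
    if not os.path.isfile(zip_path):
        print("Downloading {}".format(url))
        urllib.request.urlretrieve(url, zip_path)
    with zipfile.ZipFile(zip_path) as archive:
        # extractall creates target_dir and any nested folders as needed.
        archive.extractall(target_dir)
    return target_dir


# download_and_extract("http://nlp.stanford.edu/data/wordvecs/glove.6B.zip",
#                      "glove.6B.zip", "glove.6B")
```

Note that an interrupted download leaves a partial zip file behind; delete it before running the function again.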

```python
import os

# Download the pre-trained Wikipedia vectors (skipped if already present)
filename = 'wiki.so.vec'
if not os.path.isfile(filename):
    !echo "Downloading $filename"
    !curl --progress-bar -Lo $filename https://s3-us-west-1.amazonaws.com/fasttext-vectors/$filename
```
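Files such as `wiki.so.vec` use the plain-text word-vector format written by fastText: a header line with the vocabulary size and the dimension, then one line per word with the word followed by its values, separated by spaces. Assuming that layout, a minimal standard-library reader looks like this (`load_vec` is a hypothetical name; libraries such as gensim provide faster and more complete loaders).

```python
import os


def load_vec(path, limit=None):
    """Read a fastText-style .vec text file into a dict of word -> list of floats.

    Assumes the first line is "<vocab_size> <dimension>" and every following
    line is a word followed by <dimension> space-separated numbers.
    """
    vectors = {}
    with open(path, encoding="utf-8", errors="replace") as handle:
        _, dim = (int(x) for x in handle.readline().split())
        for line in handle:
            if limit is not None and len(vectors) >= limit:
                break
            parts = line.rstrip().split(" ")
            # The last `dim` fields are the vector; everything before is the word.
            word = " ".join(parts[:-dim])
            vectors[word] = [float(x) for x in parts[-dim:]]
    return vectors


if os.path.isfile("wiki.so.vec"):
    vectors = load_vec("wiki.so.vec", limit=10000)
    print(len(vectors), "vectors loaded")
```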

```python
import os

# Download and unzip the corpus (skipped if already present)
filename = 'wiki.simple.zip'
if not os.path.isfile(filename):
    !echo "Downloading $filename"
    !curl --progress-bar -Lo $filename https://s3-us-west-1.amazonaws.com/fasttext-vectors/$filename
    !unzip $filename
```

```python
import os

import gensim.downloader as pretrained

# Download the pre-trained word2vec Google News vectors.
# return_path=True only downloads the model and returns its local path,
# without loading it into memory.
pretrained.load('word2vec-google-news-300', return_path=True)

# Download and unzip the historical embeddings (skipped if already present).
# str.format is used instead of f-strings so this also runs on Python 3.5.
for filename, dirname in (('eng-fiction-all_sgns.zip', 'fiction'), ('coha-word_sgns.zip', 'coha')):
    if not os.path.isfile(filename) and not os.path.isdir(dirname):
        print('Downloading {}'.format(filename))
        !curl --progress-bar -Lo $filename http://snap.stanford.edu/historical_embeddings/$filename
    if os.path.isfile(filename) and not os.path.isdir(dirname):
        print('Uncompressing {}'.format(filename))
        !unzip -q -o $filename -d $dirname
```