# Getting Started

## Picking a class

Having decided that taking a Data Science course would be a good way to expand and improve my Python skills, I did some research and found some very positive reviews for a __[Python-based Data Analysis course](https://www.udacity.com/course/intro-to-data-analysis--ud170)__ from Udacity.  Data Analysis is not Data Science but, from what I understand, Data Science encompasses the Analysis component.  The truth is that I'm a bit fuzzy on the distinction between the two disciplines, especially as they seem to share many of the same characteristics.  The __[course overview](https://www.class-central.com/mooc/4937/udacity-intro-to-data-analysis)__ for the Udacity Intro to Data Analysis class sounded good to me, so I settled on it as my starting point.

## Familiarizing myself with the ecosystem

Any discipline has a preferred set of tools that the practitioners wield in pursuit of their craft.  The Python sect of the Data Science discipline is no exception.  A few tools that seemed to crop up over and over in my research were:
- IPython/Jupyter (and their corresponding "notebooks")
- NumPy
- Matplotlib
- Pandas
- Seaborn

Of these, the only tool I had some passing familiarity with was Matplotlib, which I had used in the past at 3Tier to create __[color bars like these](https://energytransition.org/files/2015/04/average_wind_speed_uk_vs_germany.jpg)__ to describe custom heat maps I had created from wind speed and solar reflectance data.  I was excited to learn something new, but how was I going to get all these things installed?

## Anaconda to the rescue

For the past several years, my primary personal computer has been a Chromebook Pixel.  I love that machine!  It has an amazing display, it's very fast, and it has exceptional battery life.  It does of course have some trade-offs, the main one being that you can only install applications from the Chrome Web Store.  Well, it turns out that's not entirely correct.  You can run the machine (and any other Chromebook, I assume) in "Developer Mode", which then gives you access to a shell.  From that shell, people have discovered all sorts of ways to install applications and utilities in the same manner that you would install them from a command line on a Linux machine.  In fact, one of the things I played with soon after discovering this capability was __[Chromebrew](http://skycocker.github.io/chromebrew/)__, which is an analog to the Mac "Homebrew" system.  While Chromebrew made the installation of certain things convenient, like Vim and base Python, many Python libraries have complex dependencies that I had trouble overcoming.  It turns out that Conda was developed specifically to address this problem.

The best descriptions I've read of how Conda works and why it was created are in blog posts by __[Jake VanderPlas](https://jakevdp.github.io/blog/2016/08/25/conda-myths-and-misconceptions/)__ and __[Travis Oliphant](http://technicaldiscovery.blogspot.com/2013/12/why-i-promote-conda.html)__.  Both of these guys know what they're talking about and are worth looking up if you're interested in Data Science.  Jake is the Director of Research at the UW eScience Institute and has written a book that I intend to buy, [the O'Reilly Python Data Science Handbook](http://shop.oreilly.com/product/0636920034919.do).  Travis, among other things, is the __[primary creator](https://en.wikipedia.org/wiki/Travis_Oliphant)__ of NumPy.  In a nutshell, though, Conda is a package manager that lets you easily install all of the tools needed for Python Data Science work.  That's a sufficient description for our purposes now.  Anaconda, which is what we'll be installing, is a distribution that contains Conda as well as some other things, like Jupyter, a browser-based interactive notebook for programming, mathematics, and data science.  Jupyter makes working with the Udacity class materials much easier, and I think it is probably required to do the course.

## Installing Anaconda on a Chromebook

I followed the general instructions listed on __[this Reddit page](https://www.reddit.com/r/chromeos/comments/2gtghf/ipython_on_chromebook_dont_mind_if_i_do/)__. Basically that can be summed up as:
1. Download the 64-bit version of Anaconda that you want (Python 3 or Python 2.7) from __[here](https://www.continuum.io/downloads)__ into your *Downloads* directory.
2. Open a shell on your Chromebook by pressing *Ctrl+Alt+T* and then typing "*shell*".
3. In the resulting shell that opens do the following:
```bash
cd /usr
sudo chmod a+rw local
cp ~/Downloads/Anaconda2-4.4.0-Linux-x86_64.sh ./
bash ./Anaconda2-4.4.0-Linux-x86_64.sh 
```
That's it. You can test that conda is in your path by typing
```bash
which conda
```
and getting back
```
/usr/local/anaconda2/bin/conda
```
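
If you'd rather do the same check from Python, here is a minimal sketch of what `which` does: it walks the directories on `PATH` and returns the first executable match. The helper name `find_on_path` is my own, made up for illustration; it is not part of conda or Anaconda.

```python
import os


def find_on_path(name):
    """Return the first executable called `name` found on PATH, else None."""
    for directory in os.environ.get("PATH", "").split(os.pathsep):
        candidate = os.path.join(directory, name)
        # Only accept regular files that the current user may execute.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


if __name__ == "__main__":
    path = find_on_path("conda")
    if path is None:
        print("conda was not found on PATH")
    else:
        print("conda found at: " + path)
```

If this reports that conda was not found, the most likely cause is that the Anaconda `bin` directory was not added to your `PATH` during installation.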

## Setting up a new virtual environment

Now that you have Anaconda installed, it's time to create a virtual environment.  You can enter the virtual environment from anywhere, but for clarity, I'm going to assume a known starting location of `~/Downloads`.
1. Open a shell and type the following
```bash
mkdir -p ~/Downloads/code/udata_a1
cd ~/Downloads/code/udata_a1
conda create -n udata_a1
source activate udata_a1
```
You should see your prompt change after running the `source activate` command to something like this:
```
(udata_a1) chronos@localhost ~/Downloads/code/udata_a1 
```
This means that you are inside the virtual environment named "udata_a1".  You can exit your virtual environment by typing
```
source deactivate udata_a1
```
and your prompt should return to normal.
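
Besides watching the prompt, you can ask Python which environment is active. This is a small sketch that assumes the activate script sets the `CONDA_DEFAULT_ENV` environment variable, which is my understanding of how conda behaves; the function name `active_conda_env` is hypothetical.

```python
import os
import sys


def active_conda_env(environ=None):
    """Return the name of the active conda environment, or None if unset."""
    if environ is None:
        environ = os.environ
    # Assumption: conda's activate script exports CONDA_DEFAULT_ENV.
    return environ.get("CONDA_DEFAULT_ENV")


if __name__ == "__main__":
    print("Active conda environment: %s" % active_conda_env())
    # sys.prefix shows which Python installation is actually running.
    print("Interpreter prefix: %s" % sys.prefix)
```

Note that a freshly created, empty environment has no Python of its own yet, so until you install one into it the interpreter prefix will likely still point at the base Anaconda install.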

## Install specific packages

Installing the packages needed for the class is pretty easy.  With the environment activated, execute the following command:

```bash
conda install python=2.7 numpy pandas matplotlib seaborn
```

This will install all of the specified items, as well as any dependencies that they need.  You can verify they're installed using the `conda list` command.

```bash
conda list
matplotlib                2.0.2               np113py27_0  
numpy                     1.13.0                   py27_0  
pandas                    0.20.2              np113py27_0  
python                    2.7.13                        0  
python-dateutil           2.6.0                    py27_0  
seaborn                   0.7.1                    py27_0  
etc.
```
NOTE that I truncated the output of the full listing for clarity.
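
Another quick sanity check is to import each package from Python and print its version. The sketch below is my own illustration, not something from the course; it assumes each package exposes a `__version__` attribute and falls back to "unknown" when one doesn't.

```python
import importlib

# The packages installed above for the class.
PACKAGES = ["numpy", "pandas", "matplotlib", "seaborn"]


def installed_versions(names):
    """Map each module name to its version string, or None if the import fails."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
        else:
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    for name, version in sorted(installed_versions(PACKAGES).items()):
        print("%-12s %s" % (name, version if version else "NOT INSTALLED"))
```

Run it with the environment activated; anything reported as not installed means the import failed in the Python you are currently using.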

## Start up Jupyter notebook

Now all that's left to do is fire up Jupyter notebook and connect to it in a browser using the address it gives you.

```bash
jupyter notebook --no-browser
[I 21:27:47.307 NotebookApp] Serving notebooks from local directory: /home/chronos/user/Downloads/code/udata_a1
[I 21:27:47.307 NotebookApp] 0 active kernels 
[I 21:27:47.307 NotebookApp] The Jupyter Notebook is running at: http://localhost:8888/?token=51e3efe3138160ae4d7aecfa19b4ee858a940d67bfecd57d
[I 21:27:47.307 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 21:27:47.307 NotebookApp] 
    
    Copy/paste this URL into your browser when you connect for the first time,
    to login with a token:
        http://localhost:8888/?token=51e3efe3138160ae4d7aecfa19b4ee858a940d67bfecd57d
```

From there you can open .ipynb files or explore the contents of the directory from which you started the notebook server.
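
If you ever capture that startup output in a file or script and want to pull the login address out of it, a regular expression is enough. This is only a rough sketch: the pattern assumes the URL looks like the one in the log above (a hexadecimal `token` query parameter), and `extract_notebook_url` is a name I made up.

```python
import re

# A startup line copied from the output shown above.
LOG_LINE = (
    "[I 21:27:47.307 NotebookApp] The Jupyter Notebook is running at: "
    "http://localhost:8888/?token=51e3efe3138160ae4d7aecfa19b4ee858a940d67bfecd57d"
)


def extract_notebook_url(text):
    """Return the first tokenized notebook URL found in `text`, or None."""
    match = re.search(r"https?://\S+\?token=[0-9a-f]+", text)
    return match.group(0) if match else None


if __name__ == "__main__":
    print(extract_notebook_url(LOG_LINE))
```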