# Week 01: Environment Setup

## BEENOS Inc. Machine Learning Study Group

In this first week, you'll determine whether your environment is properly set up for machine learning projects in Python.

## Part 1: Which Python?

It's up to you to choose how to set up your Python machine learning environment.

Do you want a quick installation, with everything done for you? Then go with **Anaconda**. This is a great choice if you're an absolute beginner, or if you're not sure what libraries you will or won't need.

If you go with Anaconda, you'll need lots of disk space to install all the libraries that come with it. (*Miniconda* is a slimmed-down alternative for those who don't want to bloat their system with lots of packages.)

Do you prefer to install just what you need, or need to have more control over your system? Then you can [download **Python** from the official homepage](https://www.python.org/) and install it directly.

When you install Python on your own, you won't get many of the popular machine learning libraries that come with Anaconda. You'll have to install those separately. You'll also need a good *virtual environment* manager to help keep your projects separate. Anaconda comes pre-installed with this functionality.
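If you take the do-it-yourself route, Python 3 ships with the standard-library `venv` module, which can serve as that virtual environment manager. The sketch below is one minimal way to create an environment from Python code. The function name `make_project_env` is hypothetical, and in everyday use you would more likely run `python -m venv <dir>` from the terminal.

```python
import venv
from pathlib import Path


def make_project_env(env_dir: str, with_pip: bool = True) -> Path:
    """Create an isolated virtual environment at env_dir and return its path.

    with_pip=True bootstraps pip into the new environment (via ensurepip),
    so packages can be installed there without touching the base Python.
    """
    path = Path(env_dir).resolve()
    venv.create(str(path), with_pip=with_pip)
    # Every environment created by venv contains a pyvenv.cfg file
    # that records which base interpreter it was built from.
    return path
```

For example, `make_project_env("my-ml-project-env")` (a placeholder name) creates the environment, which you then activate with `source my-ml-project-env/bin/activate` on macOS/Linux or `my-ml-project-env\Scripts\activate` on Windows.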

In this notebook, we'll focus on **installing Anaconda.**

### Anaconda: Out-of-the-Box Data Science

Anaconda is a Python (and R) distribution that aims to give you everything you need to start doing data science immediately. With Anaconda, you get:

* the full Python scientific stack
* easy installation of new packages
* environment management, separate from your system's Python installation

This is super useful for those new to scientific computing in Python. Anaconda lets you get up and running with machine learning libraries in no time.
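To see this environment separation from inside a notebook, a quick standard-library sketch like the one below reports which interpreter and environment the current kernel is using. The function name `describe_environment` is hypothetical. `CONDA_PREFIX` is the variable conda sets when an environment is activated, so it may be missing if Jupyter was started without activating one.

```python
import os
import sys


def describe_environment() -> dict:
    """Report which Python environment the running interpreter belongs to."""
    return {
        # Full path of the interpreter running this code.
        "executable": sys.executable,
        # Root directory of the active environment.
        "prefix": sys.prefix,
        # Set by `conda activate`; None means no activated conda env was detected.
        "conda_prefix": os.environ.get("CONDA_PREFIX"),
        # True inside a venv, where sys.prefix differs from sys.base_prefix.
        # (conda envs contain their own Python, so this stays False there.)
        "in_venv": sys.prefix != sys.base_prefix,
    }


for key, value in describe_environment().items():
    print(f"{key:>12}: {value}")
```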

### Step 1: Install Anaconda

[Click here to download the latest Anaconda distribution.](https://www.anaconda.com/distribution/)

You can download the command line installer or the graphical installer. **I highly encourage you to use the latest version of Python 3.** As of this writing, that's Python 3.7.

Once the file is downloaded, run the installer. Then, execute the following code cell to check that conda has been properly installed.

```python
# Exercise 0: Run this code cell.
#===============================#
# (On Windows, replace `which` with `where`.)

! which conda
! which python
```

```
/Users/montgomery_j/anaconda3/bin/conda
/Users/montgomery_j/anaconda3/bin/python
```


The above commands should show you where conda and Python are installed on your system.
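The same lookup can be done from Python with the standard library. This sketch also prints `sys.executable`, which is handy because the `python` on your `PATH` and the Python running your notebook kernel are not always the same interpreter.

```python
import shutil
import sys

# shutil.which searches PATH much like the shell's `which` (or `where` on Windows).
# It returns None when the program is not found.
conda_path = shutil.which("conda")
python_on_path = shutil.which("python")

print("conda on PATH: ", conda_path)
print("python on PATH:", python_on_path)
# sys.executable is the interpreter actually running this code (the notebook kernel).
print("kernel python: ", sys.executable)
```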

You can also check their versions:

```python
# Exercise 1: Run this code cell.
#===============================#

! conda --version
! python --version
```

```
conda 4.6.14
Python 3.7.1
```


If the above commands show an error, then you need to debug your system. **Don't move forward until you can get them running properly!** Doing so will only frustrate you down the road.
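If you'd like a notebook to stop on its own when it is running on the wrong interpreter, a small guard like this sketch can go in its first cell. The Python 3 requirement mirrors the recommendation above; tighten the check if your project needs a specific minor version.

```python
import platform
import sys

# platform.python_version() returns the version as a string, e.g. "3.7.1".
print("Python version:", platform.python_version())

# This guide assumes Python 3; stop here if the kernel is older.
if sys.version_info.major < 3:
    raise RuntimeError(
        "Python 3 is required, but this kernel runs Python "
        + platform.python_version()
    )
```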

### Step 2: Install Libraries

To do machine learning in Python, you'll need libraries from the Python scientific stack. This includes (but is not limited to):

* IPython/Jupyter
* SciPy
* NumPy
* pandas
* Matplotlib
* scikit-learn

...and many more!

The good news is, if you're using Anaconda, most of these libraries should already be in your system.
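A quick way to confirm this from inside Python is to ask the running interpreter whether it can find each module. This is a standard-library sketch, not a conda feature: it checks only the environment your notebook kernel uses, and the mapping from import names to package names (for example `sklearn` for scikit-learn) is written out by hand as an assumption.

```python
import importlib.util

# Import names differ from package names for some libraries,
# e.g. scikit-learn is imported as "sklearn".
IMPORT_NAMES = {
    "IPython": "ipython",
    "scipy": "scipy",
    "numpy": "numpy",
    "pandas": "pandas",
    "matplotlib": "matplotlib",
    "sklearn": "scikit-learn",
}


def missing_packages(import_names: dict) -> list:
    """Return the package names whose modules this interpreter cannot find."""
    missing = []
    for module_name, package_name in import_names.items():
        # find_spec locates a top-level module without importing it;
        # None means the module is not installed in this environment.
        if importlib.util.find_spec(module_name) is None:
            missing.append(package_name)
    return missing


print("Missing:", missing_packages(IMPORT_NAMES) or "none")
```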

You can use `conda list` to see all the packages that are currently installed, but it can be a bit time-consuming to scroll through and check them all manually.

Instead, you can ask conda to search for a specific package by passing its name.

```python
# Exercise 2: Run this code cell to check if pandas is installed.
#===============================================================#

! conda list pandas
```

```
# packages in environment at /Users/montgomery_j/anaconda3:
#
# Name                    Version                   Build  Channel
pandas                    0.23.4           py37h6440ff4_0
```


The output above shows the installed version of `pandas` and the environment it was found in.
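From inside Python, the standard-library `importlib.metadata` module can report the same version number. Note that it requires Python 3.8 or later, newer than the 3.7 shown in this notebook. It looks packages up by distribution name (such as `scikit-learn`), and this sketch, with the hypothetical helper `installed_version`, returns `None` instead of raising when a package is absent.

```python
from importlib.metadata import PackageNotFoundError, version  # Python 3.8+


def installed_version(package_name: str):
    """Return the installed version string of package_name, or None if absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


for name in ["pandas", "numpy", "scikit-learn"]:
    print(f"{name:>14}: {installed_version(name) or 'not installed'}")
```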

If a package isn't installed, conda prints only the header lines, with no package rows below them:

```python
# conda doesn't return this package, because it's not installed

! conda list i_dont_exist
```

```
# packages in environment at /Users/montgomery_j/anaconda3:
#
# Name                    Version                   Build  Channel
```


```python
# Exercise 3: Run this code cell to check the rest of the scientific stack.
#=========================================================================#

! conda list numpy
! conda list scipy
! conda list jupyter
! conda list matplotlib
! conda list scikit-learn
```

```
# packages in environment at /Users/montgomery_j/anaconda3:
#
# Name                    Version                   Build  Channel
numpy                     1.15.4           py37hacdab7b_0
numpy-base                1.15.4           py37h6575580_0
numpydoc                  0.8.0                    py37_0
# packages in environment at /Users/montgomery_j/anaconda3:
#
# Name                    Version                   Build  Channel
scipy                     1.1.0            py37h1410ff5_2
# packages in environment at /Users/montgomery_j/anaconda3:
#
# Name                    Version                   Build  Channel
jupyter                   1.0.0                    py37_7
jupyter_client            5.2.4                    py37_0
jupyter_console           6.0.0                    py37_0
jupyter_core              4.4.0                    py37_0
jupyterlab                0.35.3                   py37_0
jupyterlab_server         0.2.0                    py37_0
# packages in
```

If you don't see a package listed in the output above, then that means you need to install it.
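If you'd rather check these results programmatically than by eye, this sketch parses the plain-text table that `conda list` prints, using sample lines copied from the `numpy` output above. It skips the `#` header lines, and because `conda list numpy` also matched `numpy-base` and `numpydoc`, it applies an optional exact-name filter. An empty result means the package is not installed. The function name is hypothetical; conda can also emit machine-readable output with `conda list --json`, which may be sturdier than parsing text.

```python
from typing import List, Optional, Tuple

Row = Tuple[str, str, str, str]  # (name, version, build, channel)


def parse_conda_list(output: str, exact_name: Optional[str] = None) -> List[Row]:
    """Parse `conda list` text output into (name, version, build, channel) rows."""
    rows = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and '#' header lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 3:
            continue  # not a package row
        name, version, build = fields[:3]
        # The Channel column is blank for packages from the default channel.
        channel = fields[3] if len(fields) > 3 else ""
        if exact_name is None or name == exact_name:
            rows.append((name, version, build, channel))
    return rows


sample = """\
# packages in environment at /Users/montgomery_j/anaconda3:
#
# Name                    Version                   Build  Channel
numpy                     1.15.4           py37hacdab7b_0
numpy-base                1.15.4           py37h6575580_0
numpydoc                  0.8.0                    py37_0
"""
print(parse_conda_list(sample, exact_name="numpy"))
# → [('numpy', '1.15.4', 'py37hacdab7b_0', '')]
```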

#### Exercise 4 (Optional): Go to the terminal and use `conda install <package_name>` to install any missing packages.

### Step 3: Update conda

Your installed distribution may lag behind the latest releases of conda or individual packages. To bring them up to date, run `conda update conda` for conda itself and `conda update <package_name>` for a specific package.

#### Exercise 5 (Optional): Go to the terminal and run `conda update conda` or `conda update <package_name>` if necessary.

**This concludes the Anaconda setup!** In the future, whenever you need to install or use a new Python library, you can use the commands from this notebook. (Note that you can still run the traditional `pip install` to get packages that aren't distributed through conda.)
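When you do fall back to pip inside a notebook, one common pitfall is that the `pip` on your `PATH` may belong to a different Python than the kernel. A minimal sketch of a safer pattern is to invoke pip as a module of the running interpreter. The helpers `pip_install_command` and `pip_install` and the example package name are hypothetical, and mixing pip and conda installs in one environment can occasionally cause conflicts, so prefer conda when a package is available there.

```python
import subprocess
import sys


def pip_install_command(package_name: str) -> list:
    """Build a pip command that targets the interpreter running this notebook.

    Using `sys.executable -m pip` (rather than a bare `pip`) installs the
    package into the same environment the kernel imports from.
    """
    return [sys.executable, "-m", "pip", "install", package_name]


def pip_install(package_name: str) -> None:
    """Run the pip command, raising CalledProcessError if installation fails."""
    subprocess.check_call(pip_install_command(package_name))


# Example with a placeholder package name (uncomment to actually install):
# pip_install("some-package-not-on-conda")
```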

### HELP! I'm having trouble!

Getting your environment set up properly is one of the most time-consuming parts of the development process.

If the above code cells aren't working for you, **stop now and troubleshoot!** Unfortunately, due to the system-specific nature of error messages, I can't tell you how to fix your system. Plus, troubleshooting is part of the job. Head on over to Google, Stack Overflow or Reddit and search for help.

**Do not move forward until the above code cells run without errors.**

## Part 2: Which environment?
