# Week 01 Coding: Environment Setup

## BEENOS Inc. Machine Learning Study Group

In this first week, you'll determine whether your environment is properly set up for machine learning projects in Python.

### How to Use This Notebook

If you are viewing this notebook through Binder, you can try to execute the exercise cells directly in your browser. You will see the server's versions of Python, conda, etc., but you may not be able to update any packages. As such, **I highly recommend you open the shell and run these commands on your local machine.** Doing so will ensure the output is specific to your system.

## Part 1: Which Python?

It's up to you to choose how to set up your Python machine learning environment.

Do you want a quick installation, with everything done for you? Then go with **Anaconda**. This is a great choice if you're an absolute beginner, or if you're not sure what libraries you will or won't need.

If you go with Anaconda, you'll need lots of disk space to install all the libraries that come with it. (*Miniconda* is a slimmed-down alternative for those who don't want to bloat their system with lots of packages.)

Do you prefer to install just what you need, or need to have more control over your system? Then you can [download **Python** from the official homepage](https://www.python.org/) and install it directly.

When you install Python on your own, you won't get many of the popular machine learning libraries that come with Anaconda. You'll have to install those separately with `pip install`. You'll also need a good *virtual environment* manager to help keep your projects separate. Anaconda comes pre-installed with this functionality.

### Anaconda: Out-of-the-Box Data Science

Anaconda is a Python (and R) distribution that aims to give you everything you need to start doing data science immediately. With Anaconda, you get:

* the full Python scientific stack
* easy installation of new packages
* environment management, separate from the system Python

This is super useful for those new to scientific computing in Python. Anaconda lets you get up and running with machine learning libraries in no time.

### Step 1: Installing Anaconda

[Click here to download the latest Anaconda distribution.](https://www.anaconda.com/distribution/)

You can download the command line installer or the graphical installer. **I highly encourage you to use the latest version of Python 3.** As of this writing, that's Python 3.7.

Once the file is downloaded, run the installer. Then, execute the following code cell to check that conda has been properly installed.

#### Exercise 0: Run the following commands in your shell.

```shell
which conda
which python
```

The above commands should show you where conda and Python are installed on your system.
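If `which` isn't available in your shell (it is not a standard Windows command), the same lookup can be sketched from Python. This is a minimal example using the standard library's `shutil.which`; the helper name `locate` is made up for illustration.

```python
import shutil
import sys


def locate(tool_names):
    """Map each tool name to its full path on PATH, or None if it is not found."""
    return {name: shutil.which(name) for name in tool_names}


if __name__ == "__main__":
    for name, path in locate(["conda", "python"]).items():
        print(f"{name}: {path or 'not found on PATH'}")
    # The interpreter running this script, which may differ from `python` on PATH.
    print(f"current interpreter: {sys.executable}")
```

A `None` result only means the tool is not on your `PATH`; it may still be installed somewhere else.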

You can also check their versions:

#### Exercise 1: Run the following commands in your shell.

```shell
conda --version
python --version
```

If the above commands show an error, then you need to troubleshoot your system. Be sure to do this before moving forward.
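The version check can also be scripted, which is handy if you want to paste the results into a message when asking for help. The sketch below assumes only the standard library; `tool_version` is a hypothetical helper name, and it returns `None` instead of raising an error when the tool is missing.

```python
import platform
import shutil
import subprocess


def tool_version(name):
    """Return the output of `<name> --version`, or None if the tool is not on PATH."""
    path = shutil.which(name)
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # some tools print their version to stderr
        universal_newlines=True,
    )
    return result.stdout.strip()


print("running Python:", platform.python_version())
print("conda:", tool_version("conda") or "not found")
```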

### Step 2: Update conda

There may be updates to conda that aren't part of the distribution you installed. Running `conda update conda` checks for a newer release and installs it if one is available.

#### Exercise 2: Run the following command in your shell.

```shell
conda update conda
```

### Step 3: Create a new environment

When you start building multiple projects in Python, you might find that each one requires a different version of a Python package you have installed. For example, a data analysis tutorial might ask you to install `numpy==1.12.0`, while our study group uses `numpy==1.16.0`.

Updating numpy might break the code that was written for the older version, and then you can't finish the tutorial!

What should you do if you want to use both versions at the same time?

This is where **virtual environments** come in. These are isolated Python installations that hold separate versions of libraries and even Python itself.

So, in your `curiositeam` environment you can safely install the latest version of numpy, and keep the older version in a separate environment titled `data-tutorial`.

Then, you *activate* the environment you need when you're ready to work on a specific project.

Creating and managing environments in conda is easy. Simply pass the name of the environment you want to build.

#### Exercise 3: Create a virtual environment for this study group. Replace `[my_env]` with your desired name.

```shell
conda create -n [my_env]
```

I used `curiositeam` as the name of my environment. You can also pass your preferred version of Python with `python=x.x`. If you leave it out, conda creates an empty environment, so add `python=x.x` (for example, `python=3.7`) to get an interpreter installed inside the environment.

#### Exercise 4: Activate your environment.

```shell
conda activate [my_env]
```
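To confirm that activation worked, you can ask Python which environment it is running in. This sketch relies on the `CONDA_DEFAULT_ENV` variable that conda's activation sets in your shell; the helper name `active_conda_env` is made up for illustration, and the result is `None` when no conda environment is active.

```python
import os
import sys


def active_conda_env(environ=None):
    """Return the active conda environment's name, or None if none is active."""
    environ = os.environ if environ is None else environ
    return environ.get("CONDA_DEFAULT_ENV") or None


print("active conda environment:", active_conda_env())
# sys.prefix is the directory of the interpreter that is actually running.
print("interpreter prefix:", sys.prefix)
```

If the environment name looks right but `sys.prefix` points somewhere else, the `python` you launched probably does not belong to the activated environment.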

### Step 4: Install Libraries

To do machine learning in Python, you'll need libraries from the Python scientific stack. This includes (but is not limited to):

* IPython/Jupyter
* SciPy
* NumPy
* pandas
* Matplotlib
* scikit-learn

...and many more!

The good news is, if you're using Anaconda, most of these libraries should already be in your system.

You can use `conda list` to see all the packages that are currently installed, but it can be a bit time-consuming to scroll through and check them all manually.

Instead, you can ask conda to search for a specific package by passing its name.

#### Exercise 5: Use `conda list [package]` to check and see if your packages are installed.

If the package isn't installed, conda should return an empty row, like this:

    # conda doesn't return this package, because it's not installed
    ! conda list i_dont_exist

    # packages in environment at /Users/montgomery_j/anaconda3/envs/curiositeam:
    #
    # Name                    Version                   Build  Channel


If you see an empty row when you try to list a package, that means you need to install it.
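A complementary check is to ask the Python interpreter itself whether it can find each library. The sketch below uses the standard library's `importlib.util.find_spec`; the helper `missing_modules` and the `STACK` list are illustrative names, and the result describes whichever interpreter runs the script, so run it from inside your activated environment.

```python
import importlib.util


def missing_modules(import_names):
    """Return the names that the current interpreter cannot import."""
    return [name for name in import_names if importlib.util.find_spec(name) is None]


# Import names, which can differ from package names (scikit-learn imports as sklearn).
STACK = ["IPython", "scipy", "numpy", "pandas", "matplotlib", "sklearn"]

missing = missing_modules(STACK)
print("missing:", ", ".join(missing) if missing else "nothing")
```

Unlike `conda list`, this finds packages no matter how they were installed, including with `pip`.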

When you activate your environment for the first time, you may not see any packages installed. You can manually install packages into specific environments using `conda install` and passing the name of the environment you want to install into.

#### Exercise 6: Use `conda install -n [my_env] [package]` to install any missing packages.

## Ready to go?

**This concludes the Anaconda setup!!**

When you're done working on your project, you can run `conda deactivate` to exit your virtual environment.

#### Exercise 7: Run `conda deactivate` to leave your virtual environment.

In the future, should you ever need to install or use any new Python library, you can run the commands found in this notebook. (Note that you can still run the traditional `pip install` to get packages that aren't distributed through conda.)

### HELP! I'm having trouble!

Getting your environment set up properly is one of the most time-consuming parts of the development process.

If the above steps aren't working for you, **stop now and troubleshoot!** Unfortunately, due to the system-specific nature of error messages, I can't tell you how to fix your system. Plus, troubleshooting is part of the job. Head on over to Google, Stack Overflow or Reddit and search for help.

## If you're using `pip`...

If you're not using Anaconda, you can still create and manage virtual environments with Python.
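For a quick taste, Python ships with the `venv` module, which you would normally run from the shell as `python -m venv [my_env]`. The sketch below does the same thing from Python in a throwaway directory; `create_env` is a hypothetical helper, and the name `data-tutorial` is reused from the earlier example.

```python
import tempfile
import venv
from pathlib import Path


def create_env(env_dir):
    """Create a bare virtual environment at env_dir and return its path."""
    env_dir = Path(env_dir)
    # with_pip=False keeps the sketch fast; pass with_pip=True to bootstrap pip.
    venv.create(env_dir, with_pip=False)
    return env_dir


with tempfile.TemporaryDirectory() as tmp:
    env = create_env(Path(tmp) / "data-tutorial")
    # pyvenv.cfg is the marker file that venv writes into every environment.
    print("created:", (env / "pyvenv.cfg").exists())
```

For real projects you would create the environment in a permanent folder and activate it from your shell before running `pip install`.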

Real Python has a great tutorial on creating and managing virtual environments in Python. Check it out here:

[Python Virtual Environments: A Primer](https://realpython.com/python-virtual-environments-a-primer/)