# Getting started

## Anaconda Installation 

We will use Python via the Anaconda distribution.

Download Anaconda [here](https://www.anaconda.com/download).

:::{.callout-note}
There are a variety of different Python distributions; for statistics and machine learning, we recommend Anaconda as it comes with many useful packages pre-installed. 

(It is also a good idea NOT to use your computer's pre-installed Python: the operating system may rely on it, and you don't want to accidentally break system tools by changing its packages!)
:::

Anaconda requires a few GB of storage. A more lightweight alternative is Miniconda, which includes only conda, Python, and a small number of supporting packages; you can download it [here](http://conda.pydata.org/miniconda.html).

## Managing packages

There are many open source Python packages for statistics and machine learning.

Two popular package managers for installing them are **Conda** and **Pip**. Both come with the Anaconda distribution.

**Conda** is a general-purpose package management system, designed to build and manage software of any type from any language. This means conda can also install non-Python dependencies (like BLAS, for linear algebra operations).

**Pip** is a package manager for Python packages only. You may see people pairing pip with environments created by [virtualenv](https://virtualenv.pypa.io/en/stable/) or [venv](https://docs.python.org/3/library/venv.html).

We recommend:

- use a [conda environment](#environments)
- within this environment, use conda to install base packages such as `pandas` and `numpy`
- if a package is not available via conda, then use pip

See [here](https://jakevdp.github.io/blog/2016/08/25/conda-myths-and-misconceptions/) for some conda vs pip misconceptions, and why conda is helpful.
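
When conda and pip are mixed as recommended above, it can help to see which tool installed which package. The sketch below assumes the records come from `conda list --json`, where (as far as we know) pip-installed packages appear under the `pypi` channel. The function name `split_by_installer` and the sample records are hypothetical and only for illustration.

```python
def split_by_installer(records):
    """Group package names by the tool that installed them."""
    groups = {"conda": [], "pip": []}
    for record in records:
        # Assumption: conda lists pip-installed packages under the "pypi" channel.
        installer = "pip" if record.get("channel") == "pypi" else "conda"
        groups[installer].append(record["name"])
    return groups


# Hypothetical records, shaped like entries from `conda list --json`.
sample = [
    {"name": "numpy", "channel": "pkgs/main"},
    {"name": "torch", "channel": "pypi"},
]
print(split_by_installer(sample))  # {'conda': ['numpy'], 'pip': ['torch']}
```

To try it on a real environment, the output of `conda list --json` can be parsed with `json.loads` and passed to the function.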


## Environments

### About

It is good coding practice to use virtual environments with Python. From [this blog](https://www.dataquest.io/blog/a-complete-guide-to-python-virtual-environments/):

> A Python virtual environment consists of two essential components: the Python interpreter that the virtual environment runs on and a folder containing third-party libraries installed in the virtual environment. These virtual environments are isolated from the other virtual environments, which means any changes on dependencies installed in a virtual environment don’t affect the dependencies of the other virtual environments or the system-wide libraries. Thus, we can create multiple virtual environments with different Python versions, plus different libraries or the same libraries in different versions.

![](https://www.dataquest.io/wp-content/uploads/2022/01/python-virtual-envs1.webp)
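
To make the two components concrete, the following minimal sketch prints the interpreter that is currently running and the folder where it looks for third-party libraries. It uses only the standard library. The helper name `describe_environment` is hypothetical, and `CONDA_DEFAULT_ENV` is an environment variable that conda sets on activation, so it will be `None` outside a conda environment.

```python
import os
import sys
import sysconfig


def describe_environment():
    """Collect the interpreter and library locations of the running Python."""
    return {
        "interpreter": sys.executable,  # the Python binary in use
        "prefix": sys.prefix,  # root folder of the environment
        "site_packages": sysconfig.get_paths()["purelib"],  # third-party libraries
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),  # None if no conda env is active
    }


for key, value in describe_environment().items():
    print(f"{key}: {value}")
```

Running this in two different environments should show two different interpreter paths and two different library folders, which is what keeps their dependencies from interfering.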


### Creating an environment for MSDS-534

We recommend creating a virtual environment for your MSDS-534 coding projects. This way, you can have an environment with all the necessary packages and you can easily keep track of what versions of the packages you used.

1. Open Terminal (macOS), Anaconda Prompt (Windows), or another shell.
2. Create an environment called `msds534` using Conda with the command:
   ```conda create --name msds534```
3. To install packages in your environment, first activate your environment:
   ```conda activate msds534```
4. Then, install the following packages using the command:
   ```conda install numpy pandas scikit-learn matplotlib seaborn jupyter ipykernel```
5. With the environment still activated, install PyTorch by running the appropriate command for your platform from [here](https://pytorch.org) (for macOS, the command at the time of writing is: `pip3 install torch torchvision`)
6. To exit your environment:
   ```conda deactivate```
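
One way to confirm that the installation steps worked is to ask Python which versions it can find. This is a minimal sketch, meant to be run with the `msds534` environment activated. The list mirrors the packages named in the steps above, except `jupyter`, which is a bundle of several packages. The helper name `installed_versions` is hypothetical.

```python
from importlib import metadata

# Distribution names as passed to `conda install` / `pip install`.
REQUIRED = [
    "numpy", "pandas", "scikit-learn", "matplotlib",
    "seaborn", "ipykernel", "torch", "torchvision",
]


def installed_versions(distributions):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in distributions:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(REQUIRED).items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```

Any package reported as `NOT INSTALLED` can be added by repeating the relevant install step inside the activated environment.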

Here is a helpful [guide](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html) to `conda` environment commands.

For more details about the shell / bash, here is a helpful [resource](https://missing.csail.mit.edu/2020/course-shell/).


## VSCode

There are a number of Python IDEs (integrated development environments). In class, we will be using VSCode (download [here](https://code.visualstudio.com)).

1. Download `lecture-1.ipynb` [here] and open it in VSCode. 
2. To use your `msds534` environment, click "Select Kernel" in the top right-hand corner, then choose "Python Environments" > `msds534`. If VSCode prompts you to install `ipykernel`, follow the prompts to install it.

Jupyter notebooks (`.ipynb` files) are useful for combining code cells with text (as [markdown](https://www.markdownguide.org/basic-syntax/) cells).
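
Under the hood, an `.ipynb` file is a JSON document containing a list of cells. The sketch below builds a tiny notebook with one markdown cell and one code cell using only the standard library. It assumes the version 4 notebook format; the helper names `markdown_cell` and `code_cell` are hypothetical, and in practice notebooks are normally created from the editor rather than by hand.

```python
import json


def markdown_cell(text):
    """Build a text cell in the version 4 notebook format."""
    return {"cell_type": "markdown", "metadata": {}, "source": text.splitlines(keepends=True)}


def code_cell(code):
    """Build a code cell that has not been executed yet."""
    return {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {},
        "outputs": [],
        "source": code.splitlines(keepends=True),
    }


notebook = {
    "cells": [
        markdown_cell("# Demo\nThe cell below prints a sum."),
        code_cell("print(1 + 2)"),
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 4,
}

notebook_json = json.dumps(notebook, indent=1)
print([cell["cell_type"] for cell in json.loads(notebook_json)["cells"]])  # ['markdown', 'code']
```

Saving `notebook_json` to a file ending in `.ipynb` should produce a notebook that VSCode can open, with the text and code shown as separate cells.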

VSCode also has a Python interactive window [(details here)](https://code.visualstudio.com/docs/python/jupyter-support-py).

## Learning Python

In this class, we will assume some familiarity with:

- [NumPy](https://moran-teaching.github.io/msds597-website/lec-1/lecture-1.html)
- [Pandas](https://moran-teaching.github.io/msds597-website/lec-2/lecture-2.html)
- [Object-oriented programming](https://moran-teaching.github.io/msds597-website/lec-10/lecture-10.html#interlude-object-oriented-programming)