# Environment Setup

Some computers, such as those running macOS, come with Python already installed, but we will also need a number of additional Python tools (packages) to manipulate, analyze, and visualize data effectively. This guide walks you through the easiest ways to set up your own data science environment, from installing Python to running your first Jupyter notebook.

## 0a. Using Conda

We highly recommend that you download and use
[Conda](https://www.continuum.io/downloads), an open-source package
manager for scientific computing. Why? Scientific computing
packages typically link to heavy-duty C and Fortran libraries, and
Conda makes it easy to install these cross-language
dependencies on all major platforms. If you don't use Conda,
you might have to spend time installing an entire
Fortran toolchain, which is definitely not fun (especially on Windows).
The download can take a long time, so make sure you have a reliable internet connection!

The easiest way to install Conda is with Anaconda (choose the Python 2 version), which is essentially Conda bundled with a large set of commonly
installed packages. Anaconda comes with
[Spyder](https://en.wikipedia.org/wiki/Spyder_%28software%29), a nice IDE
for Python data science. We will also use [Jupyter
notebooks](http://jupyter.org/), which let you write and run code in the browser (this
curriculum was originally developed in Jupyter notebooks, and we do
make use of a couple of Jupyter-specific commands).

Be warned, though: Anaconda installs a _lot_ of packages. If you're
strapped for hard drive space, consider installing
[Miniconda](http://conda.pydata.org/miniconda.html), a minimal
installer that includes only Conda, Python, and the packages they
depend on.

After installing Conda, open up a command prompt (`cmd.exe` on Windows
or `Terminal` on macOS) and type:

```bash
$ conda update conda
$ conda install bokeh matplotlib notebook numpy pandas requests scikit-learn seaborn
```

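For Conda power users, you can instead save the following as
`environment.yml`, create the environment with `conda env create -f environment.yml`,
and activate it with `conda activate devfest` (older versions of Conda use
`source activate devfest`, or `activate devfest` on Windows):

```yaml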
name: devfest
dependencies:
- beautifulsoup4=4.4
- matplotlib=1.5.1
- notebook=4.1.0
- numpy=1.10.2
- pandas=0.17.1
- python=3.5
- requests=2.9.1
- seaborn=0.7.0
```

## 0b. Without Conda

We strongly suggest using Conda for easy package management. Only follow this section if you cannot use Conda on your machine.

If you're not using Conda, we recommend installing Python 2.7 from
the [Python website](https://www.python.org/). Your computer may
already come with Python installed; if you see something like this:

```bash
$ python
Python X.X.X (default, Dec  7 2015, 11:16:01)
[GCC 4.8.4] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
```

when you type `python` in a terminal, then you're good to go (don't
worry about which exact 2.7.x patch release you have; our code should be compatible with all of them).
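You can also print the version directly with `python --version`. To check it from inside Python, here is a minimal sketch; the helper names `python_version_string` and `is_python_27` are our own illustrative choices, and the code runs unchanged on both Python 2 and Python 3:

```python
import sys


def python_version_string():
    """Return the running interpreter's version as 'major.minor.micro'."""
    major, minor, micro = sys.version_info[:3]
    return "{0}.{1}.{2}".format(major, minor, micro)


def is_python_27():
    """Return True if this interpreter is some Python 2.7.x release."""
    return tuple(sys.version_info[:2]) == (2, 7)


print("Running Python " + python_version_string())
print("Is this Python 2.7.x? " + str(is_python_27()))
```

Only the major and minor numbers are compared, matching the advice above that the patch release does not matter.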

Python 2.7.9 and later should come with `pip`, the Python package
manager, already installed. We will use it to install a set of packages commonly used for data science and visualization. Open up a terminal and type:

```bash
$ pip install ipython
$ pip install jupyter
$ pip install matplotlib
$ pip install numpy
$ pip install pandas
$ pip install scikit-learn
$ pip install scipy
$ pip install seaborn
```

On Windows, some of these packages may fail to install with `pip`; if
so, you can download prebuilt packages from [Christoph Gohlke's Unofficial
Windows Binaries](http://www.lfd.uci.edu/~gohlke/pythonlibs/) instead.

## 1. Getting Started

Open a terminal on macOS or Linux (or `cmd.exe` on Windows) and run:

```bash
$ jupyter notebook
```

A Jupyter notebook window should open in your web browser. Create a new Python 2 notebook and you're good to go.
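To confirm that the packages installed earlier are importable, you could paste something like the following into the first cell of your new notebook. This is a minimal sketch: `REQUIRED` and `missing_packages` are illustrative names of our own, not part of any library, and the list holds *import* names (scikit-learn, for instance, is imported as `sklearn`).

```python
import importlib

# Illustrative list of import names for the packages installed above.
# Note: scikit-learn is imported as "sklearn", not "scikit-learn".
REQUIRED = ["numpy", "scipy", "pandas", "matplotlib", "seaborn", "sklearn"]


def missing_packages(names):
    """Return the names from `names` that fail to import."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except Exception:
            # ImportError if the package is absent; other errors (for example,
            # a broken binary extension) also mean it is not usable yet.
            missing.append(name)
    return missing


missing = missing_packages(REQUIRED)
if missing:
    print("Missing or broken packages: " + ", ".join(missing))
else:
    print("All required packages imported successfully.")
```

If anything is reported missing, rerun the `conda install` or `pip install` step for that package.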

If you decide to use something other than a Jupyter notebook, note that this curriculum was written in [Jupyter notebooks](http://jupyter.org/) and uses a few Jupyter-specific commands, so some slight finagling may be required (such as omitting `%matplotlib inline`).
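`%matplotlib inline` is an IPython "magic" command, not Python syntax, so leaving it in a plain `.py` script causes a `SyntaxError`. Here is a minimal sketch of a script-friendly alternative, assuming matplotlib is installed; the data and the output file name `squares.png` are made up for illustration:

```python
import matplotlib

# Select a non-interactive backend so the script runs even without a display.
# Calling this before importing pyplot is the safest order.
matplotlib.use("Agg")
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2, 3], [0, 1, 4, 9], marker="o")
ax.set_xlabel("x")
ax.set_ylabel("x squared")

# In a notebook, %matplotlib inline would render the figure below the cell.
# In a plain script, save it to a file instead (or skip the Agg line and
# call plt.show() to open a window when a display is available).
fig.savefig("squares.png")
plt.close(fig)
```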

To access the datasets bundled with this curriculum and view it in its original notebooks, we recommend that you "clone" the repository using git. If your computer does not have git, or you are not familiar with command-line interfaces, you can download it as a zip file [here](https://github.com/cerrno/intro-to-datascience); otherwise, run the following commands from your favorite projects folder:

```bash
$ cd [your_favorite_projects_directory]
$ git clone https://github.com/cerrno/intro-to-datascience.git
```

Hint: the download link for the zip file can be found on the GitHub project page, as shown below:

<img src="files/Github Instructions.png">