# Python For Data Analysis
## Class 1

The objectives of this class are for y'all to have:

1. Installed python3 and created a virtual environment to work in
2. Gained some familiarity with python's package manager, `pip`
3. Learned to use the `ipython` interactive shell
4. Learned some of the basics of python functionality and style
5. Learned about python notebooks
6. Done some basic plotting and work with `pandas`

### Install python and virtualenv

Install Homebrew

```sh
$ /usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
```

Update Homebrew
```sh
$ brew update
$ brew doctor
```


Install Python3
```sh
$ brew install python3
```

Install virtualenv and virtualenvwrapper

```sh
$ pip3 install virtualenv
$ pip3 install virtualenvwrapper
$ export VIRTUALENVWRAPPER_PYTHON=/usr/local/bin/python3
$ source /usr/local/bin/virtualenvwrapper.sh
```

Make sure virtualenvwrapper will start correctly the next time you open a new shell

```sh
$ echo "export VIRTUALENVWRAPPER_PYTHON=/usr/local/bin/python3" >> ~/.bash_profile
$ echo "source /usr/local/bin/virtualenvwrapper.sh" >> ~/.bash_profile
```

### Virtual environments in python

`pip` is python's package manager, and `pip3` installs packages for `python3`. If we get our virtual environments set up correctly, you won't have to remember when to use `pip3`; `pip` should just work. However, it's good to know that there's some magic happening in the background: inside an activated environment, `pip` points at that environment's own python.

Virtual environments allow you to easily keep track of which external libraries (and versions!) are required for a project. This may seem like a pain in the beginning, but will end up saving you a lot of confusion down the line.

Let's create a virtual environment for ourselves to work in

```sh
$ cd ~/workspace
$ mkdir python-for-data-analysis
$ cd python-for-data-analysis
$ mkvirtualenv python-for-data-analysis
$ setvirtualenvproject
```

To deactivate a virtual environment, we simply use the `deactivate` command

```sh
$ deactivate
```

With our project set up this way, we can easily jump to the directory our project is in when we want to get started

```sh
$ cd ~
$ workon python-for-data-analysis
$ pwd
```
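
If you want to confirm from inside python that the environment is really active, the sketch below is one way to check. The helper name `in_virtualenv` is made up for this example; it relies only on the standard `sys` module, and it assumes that an active environment either sets `sys.real_prefix` (older virtualenv releases) or makes `sys.prefix` differ from `sys.base_prefix`.

```python
import sys


def in_virtualenv():
    """Return True if this interpreter appears to run inside a virtual environment."""
    # Older virtualenv releases set sys.real_prefix; newer ones (and venv)
    # make sys.base_prefix differ from sys.prefix.
    if hasattr(sys, "real_prefix"):
        return True
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


print("Interpreter:", sys.executable)
print("Prefix:", sys.prefix)
print("In a virtual environment:", in_virtualenv())
```

After `workon python-for-data-analysis`, the interpreter path should point somewhere inside your virtualenv directory rather than at the system python.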

### IPython

We'll use the IPython interactive shell for developing.


Once we've activated our virtual environment, we need to install IPython.
```sh
$ pip install ipython
```

Once we start IPython, we should see that it's running python3:

```sh
$ ipython
Python 3.6.0 (default, Dec 24 2016, 08:01:42) 
```

IPython has some nice features for developing which we'll introduce to you over the next few weeks.

**Let's Pause Here and Make Sure Everyone Has Their Virtual Environment and IPython Installation Set Up**

In [None]:
# Try the following code in your IPython repl
print("Hello World!")
me = "Michael"
print("Hello " + me + "!")

### Some Python Basics

In [None]:
# variable assignment
x = 3
y = "banana"
print(x)
print(type(x))
print(y)
print(type(y))

In [None]:
# Lists
z = [1, "a", x**2]
print(z)

# Python is 0-indexed!
print(z[0])
print(z[2])
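
A few more ways to poke at a list, as a small sketch. It rebuilds the same `z` so it can run on its own; the variable `last_valid_index` is introduced only for illustration.

```python
x = 3
z = [1, "a", x**2]

print(len(z))    # number of items in the list
print(z[-1])     # negative indices count from the end
print(z[0:2])    # a slice: items at index 0 and 1 (the end index is excluded)

# Valid indices run from 0 up to len(z) - 1
last_valid_index = len(z) - 1
print(last_valid_index)
```

Slices come back later when we look at the first few rows of a data frame.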



In [None]:
# What happens if we run this?
# print(z[3])

In [None]:
# Dictionaries
d = {'first_name': 'Michael', 'last_name': 'Kaminsky', 'something else': z}
print(d)
print(d['first_name'])
print(d['last_name'])
print(d['something else'])

In [None]:
print(d.keys())

In [None]:
# Loops
for i in range(10):
    print(i)

In [None]:
for item in z:
    print(item)    
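
Loops work over dictionaries too. The sketch below uses a made-up `fruit_counts` dictionary (not part of the class data) to show `.items()`, which hands you each key together with its value, and `enumerate()`, which hands you each list item together with its position.

```python
# Hypothetical data, made up for this example
fruit_counts = {"apple": 3, "banana": 12, "cherry": 7}

# .items() yields (key, value) pairs
for fruit, count in fruit_counts.items():
    print(fruit + ": " + str(count))

# enumerate() yields (position, item) pairs for a list
z = [1, "a", 9]
for position, item in enumerate(z):
    print(position, item)
```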

#### Exercise
1. Create a dictionary of your family members where each key is a first name and each value is that person's age.
2. Write a loop that goes through all entries in the dictionary and prints your family members' first names and ages.

In [None]:
# Functions

def my_func():
    print("hello world!") # Note: white space matters!
    
my_func()

In [None]:
def my_other_func(name):
    print("Hello " + name)
    
my_other_func("Michael")

In [None]:
def welcome(name, age=None):
    print("Hello " + name)
    if age:
        print("I understand you're " + str(age) + " years old")
    
welcome("Michael", 29)

# What happens if we run this?
# welcome("Michael")
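
One thing to watch with `if age:` is that an age of `0` counts as false, so it gets treated the same as no age at all. The sketch below is a hypothetical variant called `describe` (not the function above) that returns the greeting as a string and checks `age is not None` instead. It also shows that arguments can be passed by name.

```python
def describe(name, age=None):
    """Build the greeting as a string instead of printing it."""
    message = "Hello " + name
    # `is not None` separates "no age given" from an age of 0,
    # which a plain `if age:` would treat as missing.
    if age is not None:
        message += ", I understand you're " + str(age) + " years old"
    return message


print(describe("Michael", 29))
print(describe("Michael"))
print(describe(name="Baby", age=0))  # keyword arguments
```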


#### Exercise
1. Create a function that will add entries into your family member dictionary. 
2. Add the following entries: Kermit age 99, Othello age 14, William age 40

### Notebooks

In [None]:
!pip install pandas # IPython shell escape: runs pip in the shell
!pip install jupyter
#exit()

Start up our notebook server
```sh
$ jupyter notebook
```

Open a browser (if one doesn't open for you) and navigate to http://localhost:8888/

#### Notebook Basics

Notebooks are a very convenient way for organizing and, most importantly, sharing data analyses. They allow for single-purpose explanatory scripts to be shared (complete with visualizations). They are often used in research settings or for prototyping. The ability to intersperse code with formatted, explanatory markdown is especially useful.

Basic shortcuts and UI:
* Edit a cell by hitting "enter"
* Switch out of edit mode by hitting "esc"
* Run a cell with cmd+"enter"
* Add a new cell above with "a" (when not in edit mode)
* Add a new cell below with "b" (when not in edit mode)

You can find more information [in the docs](http://jupyter-notebook.readthedocs.io/en/latest/examples/Notebook/Notebook%20Basics.html)

#### Load 311 data
```bash
$ cd ../
$ git clone https://github.com/jvns/pandas-cookbook
$ cd python-for-data-analysis
```
    

In [None]:
import pandas as pd # 'as' gives the package a shorter alias
complaints = pd.read_csv('../pandas-cookbook/data/311-service-requests.csv', low_memory=False)


In [None]:
print(complaints.head())

In [None]:
print(len(complaints))

In [None]:
print(complaints.columns)

In [None]:
print(complaints['Created Date'])

In [None]:
print(complaints[0:3])

In [None]:
print(complaints['Location'].dtype)

#### Exercise
Write a loop that goes through the columns of the data frame, printing each column's name and type.

In [None]:
!pip install matplotlib
import matplotlib
%matplotlib inline

In [None]:
complaints['created'] = pd.to_datetime(complaints['Created Date'])
complaints.set_index('created', inplace=True)

In [None]:
complaints['Unique Key'].groupby([complaints.index.date]).count().plot(kind='line')




#df.groupby([df.index.date, 'action']).count().plot(kind='bar')


In [None]:
complaints['Unique Key'].groupby([complaints.index.date]).count().plot(kind='line', rot=90)
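
To see what the groupby-and-count step produces before it gets plotted, here is a sketch on a tiny made-up frame. The four rows and the name `toy` are invented for illustration and are not real 311 records; the date format string is an assumption about how the made-up strings are written.

```python
import pandas as pd

# Made-up rows standing in for the 311 data (not real complaints)
toy = pd.DataFrame({
    "Unique Key": [1, 2, 3, 4],
    "Created Date": [
        "10/31/2013 02:08:41 AM",
        "10/31/2013 11:15:00 PM",
        "10/30/2013 09:00:00 AM",
        "10/30/2013 09:30:00 AM",
    ],
})

# Parse the strings into timestamps and use them as the index
toy["created"] = pd.to_datetime(toy["Created Date"], format="%m/%d/%Y %I:%M:%S %p")
toy = toy.set_index("created")

# Same shape of expression as the plotting cells, minus the .plot() call
per_day = toy["Unique Key"].groupby([toy.index.date]).count()
per_hour = toy["Unique Key"].groupby([toy.index.hour]).count()

print(per_day)
print(per_hour)
```

Each result is a Series with one entry per group, which is what `.plot()` then draws.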

In [None]:
# Plot by hour
complaints['Unique Key'].groupby([complaints.index.hour]).count().plot(kind='line', rot=90)

# What's going on with this?

In [None]:
complaints[['Unique Key', 'Borough']].groupby(['Borough']).count().plot(kind='bar', rot=90)

#### Exercise
Plot a line chart of complaints by day by borough (time on the x axis, one line per borough).