# Python For Data Analysis
## Class 1

The objectives of this class are for y'all to have:

1. Installed Python 3 and created a virtual environment to work in
2. Gained some familiarity with python's package manager, `pip`
3. Learned to use the `ipython` interactive shell
4. Learned some of the basics of python functionality and style

### Install python and virtualenv

Install Homebrew

```sh
$ /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
```

Update homebrew
```sh
$ brew update
$ brew doctor
```


Install Python3
```sh
$ brew install python3
```

Install virtualenv and virtualenv wrapper

```sh
$ pip3 install virtualenv
$ pip3 install virtualenvwrapper
$ export VIRTUALENVWRAPPER_PYTHON=/usr/local/bin/python3
$ source /usr/local/bin/virtualenvwrapper.sh
```

Make sure virtualenvwrapper starts correctly the next time you open a new shell

```sh
$ echo "export VIRTUALENVWRAPPER_PYTHON=/usr/local/bin/python3" >> ~/.bash_profile
$ echo "source /usr/local/bin/virtualenvwrapper.sh" >> ~/.bash_profile
```

### Virtual environments in Python

`pip` is Python's package manager. `pip3` installs packages for `python3`. If we set up our virtual environments correctly, you won't have to remember when to use `pip3`; inside an activated environment, `pip` should just work. However, it's good to know that there's some magic happening in the background.

Virtual environments allow you to easily keep track of which external libraries (and versions!) are required for a project. This may seem like a pain in the beginning, but it will end up saving you a lot of confusion down the line.

Let's create a virtual environment for ourselves to work in:

```sh
$ cd ~/workspace
$ mkdir python-for-data-analysis
$ cd python-for-data-analysis
$ mkvirtualenv python-for-data-analysis
$ setvirtualenvproject
```

To deactivate a virtual environment, we simply use the `deactivate` command

```sh
$ deactivate
```

With our project set up this way, `workon` activates the environment and jumps to the project directory whenever we want to get started:

```sh
$ cd ~
$ workon python-for-data-analysis
$ pwd
```
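
To confirm from inside Python that a virtual environment is active, one option is to compare `sys.prefix` with `sys.base_prefix`. This is a minimal sketch and not part of virtualenvwrapper; the helper name `in_virtualenv` is made up for illustration.

```python
import sys


def in_virtualenv():
    """Return True if this interpreter appears to run inside a virtual environment."""
    # Older virtualenv releases set sys.real_prefix; venv and newer
    # virtualenv releases make sys.prefix differ from sys.base_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


print("virtual environment active:", in_virtualenv())
print("packages install under:", sys.prefix)
```

Run it once after `workon` and once after `deactivate` to see the difference.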

### IPython

We'll use the IPython interactive shell for developing.


Once we've activated our virtual environment, we need to install IPython.
```sh
$ pip install ipython
```

Once we start ipython, we should see that we're working using python3:

```sh
$ ipython
Python 3.6.0 (default, Dec 24 2016, 08:01:42) 
```

IPython has some nice features for developing which we'll introduce to you over the next few weeks.

**Let's Pause Here and Make Sure Everyone Has Their Virtual Environment and IPython Installation Set Up**

In [12]:
# Try the following code in your IPython REPL
print("Hello World!")
me = "Michael"
print("Hello " + me + "!")

Hello World!
Hello Michael!


### Some Python Basics

In [15]:
# variable assignment
x = 3
y = "banana"
print(x)
print(type(x))
print(y)
print(type(y))

3
<class 'int'>
banana
<class 'str'>


In [19]:
# Lists
z = [1, "a", x**2]
print(z)

# Python is 0-indexed!
print(z[0])
print(z[2])



[1, 'a', 9]
1
9


In [ ]:
# What happens if we run this?
# print(z[3])
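
As a sketch of what to expect: `z` has three items, so its valid indexes are 0, 1 and 2, and asking for index 3 raises an `IndexError` rather than returning a default value. The `try`/`except` below is only there so the example keeps running after the error.

```python
x = 3
z = [1, "a", x**2]

print(len(z))   # the list has 3 items, so valid indexes are 0, 1 and 2
print(z[-1])    # negative indexes count from the end of the list

try:
    print(z[3])
except IndexError as err:
    # Indexing past the end raises IndexError
    print("IndexError:", err)
```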

In [23]:
# Dictionaries
d = {'first_name': 'Michael', 'last_name': 'Kaminsky', 'something else': z}
print(d)
print(d['first_name'])
print(d['last_name'])
print(d['something else'])

{'first_name': 'Michael', 'last_name': 'Kaminsky', 'something else': [1, 'a', 9]}
Michael
Kaminsky
[1, 'a', 9]


In [27]:
print(d.keys())

dict_keys(['first_name', 'last_name', 'something else'])


In [34]:
# Loops
for i in range(10):
    print(i)

0
1
2
3
4
5
6
7
8
9


In [35]:
for item in z:
    print(item)    

1
a
9


#### Exercise
1. Create a dictionary of your family members in which each key is a first name and each value is that person's age.
2. Write a loop that goes through all entries in the dictionary and prints each family member's first name and age.
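
One possible shape for this exercise, using made-up names and ages (the `family` dictionary and its entries are hypothetical). `dict.items()` yields each key together with its value, so both can be unpacked in the loop:

```python
# Hypothetical data: first name -> age
family = {"Ada": 36, "Grace": 85, "Alan": 41}

# items() yields (key, value) pairs, so we can unpack both in the loop
for first_name, age in family.items():
    print(first_name + " is " + str(age) + " years old")
```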

In [36]:
# Functions

def my_func():
    print("hello world!") # Note: white space matters!
    
my_func()

hello world!


In [38]:
def my_other_func(name):
    print("Hello " + name)
    
my_other_func("Michael")

Hello Michael


In [41]:
def welcome(name, age=None):
    print("Hello " + name)
    if age:
        print("I understand you're " + str(age) + " years old")
    
welcome("Michael", 29)

# What happens if we run this?
# welcome("Michael")


Hello Michael
I understand you're 29 years old
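
Because `age` defaults to `None`, calling `welcome` with only a name skips the second message. One subtlety: `if age:` also skips it when `age` is `0`, since `0` counts as false. The variant below is a sketch (the name `welcome_lines` is made up here) that tests `age is not None` instead and returns the lines so they are easy to inspect:

```python
def welcome_lines(name, age=None):
    """Build the greeting lines; age is optional."""
    lines = ["Hello " + name]
    # "is not None" treats 0 as a real age; a plain "if age:" would skip it
    if age is not None:
        lines.append("I understand you're " + str(age) + " years old")
    return lines


print(welcome_lines("Michael"))      # only the greeting
print(welcome_lines("Michael", 29))  # greeting plus the age line
print(welcome_lines("Baby", 0))      # the age line is kept for age 0
```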


#### Exercise
1. Create a function that will add entries into your family member dictionary. 
2. Add the following entries: Kermit age 99, Othello age 14, William age 40
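
A minimal sketch of one way to approach this, assuming the dictionary maps first names to ages as in the previous exercise. The function name `add_family_member` is an illustrative choice, not a required one:

```python
def add_family_member(family, first_name, age):
    """Add (or overwrite) one entry and return the same dictionary."""
    family[first_name] = age
    return family


family = {}  # start empty here; in class you would reuse your own dictionary
add_family_member(family, "Kermit", 99)
add_family_member(family, "Othello", 14)
add_family_member(family, "William", 40)
print(family)
```

Note that the function changes the dictionary passed in, so returning it is a convenience rather than a necessity.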

### Pandas

In [1]:
!pip install pandas # In IPython, a leading "!" runs a shell command
!pip install jupyter
import pandas as pd # "import" is how we load a package in python
#exit()



#### Load 311 data
```bash
$ cd ../
$ git clone https://github.com/jvns/pandas-cookbook
```
    

In [3]:
complaints = pd.read_csv('../pandas-cookbook/data/311-service-requests.csv', low_memory=False)


In [4]:
print(complaints.head())

   Unique Key            Created Date             Closed Date Agency  \
0    26589651  10/31/2013 02:08:41 AM                     NaN   NYPD   
1    26593698  10/31/2013 02:01:04 AM                     NaN   NYPD   
2    26594139  10/31/2013 02:00:24 AM  10/31/2013 02:40:32 AM   NYPD   
3    26595721  10/31/2013 01:56:23 AM  10/31/2013 02:21:48 AM   NYPD   
4    26590930  10/31/2013 01:53:44 AM                     NaN  DOHMH   

                               Agency Name           Complaint Type  \
0          New York City Police Department  Noise - Street/Sidewalk   
1          New York City Police Department          Illegal Parking   
2          New York City Police Department       Noise - Commercial   
3          New York City Police Department          Noise - Vehicle   
4  Department of Health and Mental Hygiene                   Rodent   

                     Descriptor        Location Type Incident Zip  \
0                  Loud Talking      Street/Sidewalk        11432   
1 

In [5]:
print(complaints.columns)

Index(['Unique Key', 'Created Date', 'Closed Date', 'Agency', 'Agency Name',
       'Complaint Type', 'Descriptor', 'Location Type', 'Incident Zip',
       'Incident Address', 'Street Name', 'Cross Street 1', 'Cross Street 2',
       'Intersection Street 1', 'Intersection Street 2', 'Address Type',
       'City', 'Landmark', 'Facility Type', 'Status', 'Due Date',
       'Resolution Action Updated Date', 'Community Board', 'Borough',
       'X Coordinate (State Plane)', 'Y Coordinate (State Plane)',
       'Park Facility Name', 'Park Borough', 'School Name', 'School Number',
       'School Region', 'School Code', 'School Phone Number', 'School Address',
       'School City', 'School State', 'School Zip', 'School Not Found',
       'School or Citywide Complaint', 'Vehicle Type', 'Taxi Company Borough',
       'Taxi Pick Up Location', 'Bridge Highway Name',
       'Bridge Highway Direction', 'Road Ramp', 'Bridge Highway Segment',
       'Garage Lot Name', 'Ferry Direction', 'Ferry Termina

In [6]:
print(complaints['Created Date'])

0         10/31/2013 02:08:41 AM
1         10/31/2013 02:01:04 AM
2         10/31/2013 02:00:24 AM
3         10/31/2013 01:56:23 AM
4         10/31/2013 01:53:44 AM
5         10/31/2013 01:46:52 AM
6         10/31/2013 01:46:40 AM
7         10/31/2013 01:44:19 AM
8         10/31/2013 01:44:14 AM
9         10/31/2013 01:34:41 AM
10        10/31/2013 01:25:12 AM
11        10/31/2013 01:24:14 AM
12        10/31/2013 01:20:57 AM
13        10/31/2013 01:20:13 AM
14        10/31/2013 01:19:54 AM
15        10/31/2013 01:14:02 AM
16        10/31/2013 12:54:03 AM
17        10/31/2013 12:52:46 AM
18        10/31/2013 12:51:00 AM
19        10/31/2013 12:46:27 AM
20        10/31/2013 12:43:47 AM
21        10/31/2013 12:41:17 AM
22        10/31/2013 12:39:55 AM
23        10/31/2013 12:38:00 AM
24        10/31/2013 12:37:16 AM
25        10/31/2013 12:35:18 AM
26        10/31/2013 12:33:00 AM
27        10/31/2013 12:32:44 AM
28        10/31/2013 12:32:08 AM
29        10/31/2013 12:32:00 AM
          

In [8]:
print(complaints.Location)

0          (40.70827532593202, -73.79160395779721)
1         (40.721040535628305, -73.90945306791765)
2          (40.84332975466513, -73.93914371913482)
3           (40.7780087446372, -73.98021349023975)
4          (40.80769092704951, -73.94738703491433)
5           (40.7499893014072, -73.88198770727831)
6          (40.68153278675525, -73.83173699701601)
7          (40.67181584567338, -73.84309181950769)
8          (40.73991339303542, -74.00079028612932)
9          (40.66820406598287, -73.95064760056546)
10         (40.63437840816299, -73.96946177104543)
11         (40.73081644089586, -73.98607265739876)
12         (40.78897400211689, -73.95225898702977)
13         (40.89151738488846, -73.83645714593568)
14          (40.6264774690411, -73.99921826202639)
15          (40.7965967075252, -73.97036973473399)
16          (40.63618202176914, -74.1161500428337)
17         (40.63243692394328, -73.88817263437012)
18                                             NaN
19         (40.85205827756883, 

In [9]:
print(complaints[0:3])

   Unique Key            Created Date             Closed Date Agency  \
0    26589651  10/31/2013 02:08:41 AM                     NaN   NYPD   
1    26593698  10/31/2013 02:01:04 AM                     NaN   NYPD   
2    26594139  10/31/2013 02:00:24 AM  10/31/2013 02:40:32 AM   NYPD   

                       Agency Name           Complaint Type  \
0  New York City Police Department  Noise - Street/Sidewalk   
1  New York City Police Department          Illegal Parking   
2  New York City Police Department       Noise - Commercial   

                     Descriptor        Location Type Incident Zip  \
0                  Loud Talking      Street/Sidewalk        11432   
1  Commercial Overnight Parking      Street/Sidewalk        11378   
2              Loud Music/Party  Club/Bar/Restaurant        10032   

   Incident Address                    ...                     \
0  90-03 169 STREET                    ...                      
1         58 AVENUE                    ...         

In [11]:
print(complaints[0:1])

   Unique Key            Created Date Closed Date Agency  \
0    26589651  10/31/2013 02:08:41 AM         NaN   NYPD   

                       Agency Name           Complaint Type    Descriptor  \
0  New York City Police Department  Noise - Street/Sidewalk  Loud Talking   

     Location Type Incident Zip  Incident Address  \
0  Street/Sidewalk        11432  90-03 169 STREET   

                    ...                    Bridge Highway Name  \
0                   ...                                    NaN   

  Bridge Highway Direction Road Ramp Bridge Highway Segment Garage Lot Name  \
0                      NaN       NaN                    NaN             NaN   

  Ferry Direction Ferry Terminal Name   Latitude  Longitude  \
0             NaN                 NaN  40.708275 -73.791604   

                                  Location  
0  (40.70827532593202, -73.79160395779721)  

[1 rows x 52 columns]


In [12]:
print(complaints['Location'].dtype)

object


#### Exercise
Write a loop that goes through the columns of the data frame and prints each column's name and type.
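
One way to approach this is through `DataFrame.dtypes`, which is a Series indexed by column name. The sketch below uses a tiny stand-in frame (the `sample` name and its three columns are assumptions based on the first rows shown earlier) so it runs without the full CSV:

```python
import pandas as pd

# Tiny stand-in for the complaints data, based on the first two rows above
sample = pd.DataFrame({
    "Unique Key": [26589651, 26593698],
    "Agency": ["NYPD", "NYPD"],
    "Latitude": [40.708275, 40.721041],
})

# DataFrame.dtypes maps each column name to that column's type
for column_name, dtype in sample.dtypes.items():
    print(column_name, dtype)
```

The same loop should work on `complaints` by swapping it in for `sample`.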

In [13]:
print(len(complaints))

111069


In [19]:
!pip install matplotlib
import matplotlib
%matplotlib inline



In [24]:
complaints['created'] = pd.to_datetime(complaints['Created Date'])
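
`pd.to_datetime` infers the date format when none is given. Passing `format` explicitly makes the parsing rule visible; the format string below is an assumption based on the `Created Date` values printed earlier (month/day/year with a 12-hour clock). A small self-contained sketch:

```python
import pandas as pd

# Two values copied from the 'Created Date' output above
created_text = pd.Series(["10/31/2013 02:08:41 AM", "10/31/2013 12:54:03 AM"])

# %I is the 12-hour clock and %p is AM/PM, so 12:54 AM becomes hour 0
created = pd.to_datetime(created_text, format="%m/%d/%Y %I:%M:%S %p")

print(created.dt.hour.tolist())
print(created.dt.day_name().tolist())
```

Once a column holds real datetimes, the `.dt` accessor exposes parts such as the hour or the day name.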