# Introduction to Python and Jupyter Notebooks

To begin, be sure you understand how to move between cells in a Jupyter notebook and how to switch a cell between code and markdown. For more practice styling markdown cells, see this [cheatsheet](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet). In this part of the notebook, we review some NumPy basics and create some simple plots with Matplotlib.

In [3]:
%%HTML
<iframe width="560" height="315" src="https://www.youtube.com/embed/T8JGn4JRy4g?ecver=1" frameborder="0" allow="autoplay; encrypted-media" allowfullscreen></iframe>

In [2]:
%matplotlib notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

### NumPy and Matplotlib

First, let's experiment with some basic `matplotlib` plots and NumPy's random sampling functions. For more information, consult the [NumPy random sampling documentation](https://docs.scipy.org/doc/numpy-1.14.0/reference/routines.random.html).

In [3]:
a = np.random.randint(1, 20, 100)
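`np.random.randint(1, 20, 100)` draws 100 integers from the half-open range [1, 20), so 20 itself never appears, and the draws change on every run. The sketch below, which assumes NumPy 1.17 or later for `np.random.default_rng`, seeds a generator so the results are reproducible; the seed value 42 is arbitrary.

```python
import numpy as np

# Seeded generator: the same seed gives the same draws on every run
rng = np.random.default_rng(42)

# Generator.integers, like np.random.randint, excludes the upper bound by default
a = rng.integers(1, 20, size=100)
print(a.min(), a.max())  # always within 1..19

# The legacy global-state API can be seeded as well
np.random.seed(42)
a_legacy = np.random.randint(1, 20, 100)
```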

In [4]:
plt.figure()
plt.hist(a)

(array([10.,  9., 16., 10.,  8.,  5.,  8.,  7., 17., 10.]),
 array([ 1. ,  2.8,  4.6,  6.4,  8.2, 10. , 11.8, 13.6, 15.4, 17.2, 19. ]),
 <a list of 10 Patch objects>)

In [5]:
b = np.random.random(100)
c = np.random.normal(5, 10, 100)
d = np.random.binomial(100, .3, 100)
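The three calls above sample from different distributions: `random` gives uniform floats on [0, 1), `normal(5, 10, 100)` uses mean 5 and standard deviation 10, and `binomial(100, .3, 100)` counts successes in 100 trials with success probability 0.3. One way to confirm what the positional arguments mean is to draw a large sample and compare its statistics with the theoretical values. This sketch uses keyword arguments on a seeded `Generator` (NumPy 1.17+); the sample size of 10,000 is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(0)
num = 10_000  # large sample so the statistics settle near their expected values

b = rng.random(num)                        # uniform floats on [0, 1)
c = rng.normal(loc=5, scale=10, size=num)  # mean 5, standard deviation 10
d = rng.binomial(n=100, p=0.3, size=num)   # successes in 100 trials, p = 0.3

print(b.mean())           # close to 0.5
print(c.mean(), c.std())  # close to 5 and 10
print(d.mean())           # close to 100 * 0.3 = 30
```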

In [6]:
np.random.binomial?

In [7]:
a[:5]

array([13,  6,  6,  7,  7])

In [8]:
plt.figure(figsize = (9, 6))

plt.subplot(2, 2, 1)
plt.hist(a)
plt.title("Random Integers")

plt.subplot(2, 2, 2)
plt.hist(b, color = 'green')
plt.title("Random Floats")

plt.subplot(2, 2, 3)
plt.hist(c, color = 'grey')
plt.title("Normal Distribution")

plt.subplot(2, 2, 4)
plt.hist(d, color = 'orange')
plt.title("Binomial Distribution")

Text(0.5,1,'Binomial Distribution')
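The cell above uses pyplot's state-based interface, where `plt.subplot` selects the current axes and later calls draw on it. An equivalent object-oriented sketch creates all four axes at once with `plt.subplots` and loops over them. It assumes the non-interactive `Agg` backend so it runs outside a notebook, and it regenerates stand-in data with a seeded generator rather than reusing `a` through `d`; `"C0"` is simply matplotlib's first default color.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line inside Jupyter
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
panels = [
    ("Random Integers", rng.integers(1, 20, 100), "C0"),
    ("Random Floats", rng.random(100), "green"),
    ("Normal Distribution", rng.normal(5, 10, 100), "grey"),
    ("Binomial Distribution", rng.binomial(100, 0.3, 100), "orange"),
]

fig, axes = plt.subplots(2, 2, figsize=(9, 6))
# axes is a 2x2 array; .flat walks it row by row, matching subplot positions 1 to 4
for ax, (title, values, color) in zip(axes.flat, panels):
    ax.hist(values, color=color)
    ax.set_title(title)
fig.tight_layout()
```

Keeping a handle on each `ax` avoids depending on which axes happens to be "current" when a later cell runs.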

In [9]:
plt.figure()
plt.scatter(c, d)
plt.title("Scatter Plot", loc = 'left')
plt.xticks([])
plt.yticks([])

([], <a list of 0 Text yticklabel objects>)

In [10]:
dists = [a, b, c, d]
plt.figure()
plt.boxplot(dists)
plt.title("Boxplots of Distributions", loc = "right")

Text(1,1,'Boxplots of Distributions')

In [11]:
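import seaborn as sns

# sns.distplot is deprecated; with hist=False it drew only a kernel
# density estimate, which sns.kdeplot draws directly
plt.figure()
for i in [a, c, d]:
    sns.kdeplot(i)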

### Loading Data: Intro to Pandas

Now we use the Pandas library to examine a variety of datasets. Below, I create four `DataFrame` objects. The first comes from the NYC Open Data API, and the other three come from `.csv` files in our **data** directory. We will revisit other ways of accessing and structuring data later, but these two popular options are a good place to start.

To load the `.csv` files, we pass a path or URL to `pd.read_csv()`; the JSON returned by the API is loaded the same way with `pd.read_json()`. I load all four datasets below.

In [1]:
%%HTML
<iframe width="560" height="315" src="https://www.youtube.com/embed/9Dsg9DQAU_g?ecver=1" frameborder="0" allow="autoplay; encrypted-media" allowfullscreen></iframe>

In [13]:
nyc311data = pd.read_json('https://data.cityofnewyork.us/resource/fhrw-4uyv.json')
drones = pd.read_csv('../data/drones.csv', encoding = "utf8")
ozone = pd.read_csv('../data/ozone.csv')
titanic = pd.read_csv('../data/titanic.csv')
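The cell above needs network access and the local **data** directory. To see how `read_csv` and `read_json` behave without either, this sketch feeds both functions small in-memory strings through `io.StringIO`; the column names and values are invented for illustration, not taken from the real files.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for a file such as ../data/ozone.csv
csv_text = "day,ozone\nMon,41\nTue,36\n"
df_csv = pd.read_csv(io.StringIO(csv_text))

# Hypothetical JSON records shaped like the 311 API response (a list of objects)
json_text = (
    '[{"complaint_type": "Damaged Tree", "borough": "BROOKLYN"},'
    ' {"complaint_type": "Blocked Driveway", "borough": "QUEENS"}]'
)
df_json = pd.read_json(io.StringIO(json_text))

print(df_csv.shape, df_json.shape)
```

Both functions accept a path, a URL, or a file-like object, which is why the same calls work for the local files and the web API.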

In [14]:
nyc311data.columns

Index([':@computed_region_92fq_4b7q', ':@computed_region_efsh_h5xi',
       ':@computed_region_f5dn_yrer', ':@computed_region_yeji_bk3q',
       'address_type', 'agency', 'agency_name', 'borough',
       'bridge_highway_segment', 'city', 'closed_date', 'community_board',
       'complaint_type', 'created_date', 'cross_street_1', 'cross_street_2',
       'descriptor', 'due_date', 'facility_type', 'ferry_terminal_name',
       'incident_address', 'incident_zip', 'intersection_street_1',
       'intersection_street_2', 'latitude', 'location', 'location_type',
       'longitude', 'park_borough', 'park_facility_name',
       'resolution_action_updated_date', 'resolution_description',
       'school_address', 'school_city', 'school_code', 'school_name',
       'school_not_found', 'school_number', 'school_phone_number',
       'school_region', 'school_state', 'school_zip', 'status', 'street_name',
       'taxi_company_borough', 'taxi_pick_up_location', 'unique_key',
       'x_coordinate_state

In [15]:
nyc311data.dtypes

:@computed_region_92fq_4b7q       float64
:@computed_region_efsh_h5xi       float64
:@computed_region_f5dn_yrer       float64
:@computed_region_yeji_bk3q       float64
address_type                       object
agency                             object
agency_name                        object
borough                            object
bridge_highway_segment             object
city                               object
closed_date                        object
community_board                    object
complaint_type                     object
created_date                       object
cross_street_1                     object
cross_street_2                     object
descriptor                         object
due_date                           object
facility_type                      object
ferry_terminal_name                object
incident_address                   object
incident_zip                      float64
intersection_street_1              object
intersection_street_2             
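The `dtypes` listing shows that date columns such as `created_date` and `closed_date` are stored as `object` (strings). This minimal sketch converts them with `pd.to_datetime` so they support date arithmetic; the two rows and their timestamps are hypothetical, shaped loosely like the 311 records.

```python
import pandas as pd

# Hypothetical 311-style rows: dates arrive as strings, one complaint is still open
df = pd.DataFrame({
    "created_date": ["2018-03-01T08:15:00.000", "2018-03-01T09:30:00.000"],
    "closed_date": ["2018-03-02T10:00:00.000", None],
})

for col in ["created_date", "closed_date"]:
    df[col] = pd.to_datetime(df[col])  # missing values become NaT

# Datetime columns can be subtracted to get a Timedelta per row
df["time_open"] = df["closed_date"] - df["created_date"]
print(df.dtypes)
print(df["time_open"])
```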

In [16]:
nyc311data.describe()

| | :@computed_region_92fq_4b7q | :@computed_region_efsh_h5xi | :@computed_region_f5dn_yrer | :@computed_region_yeji_bk3q | incident_zip | latitude | longitude | unique_key | x_coordinate_state_plane | y_coordinate_state_plane |
|---|---|---|---|---|---|---|---|---|---|---|
| count | 950.0 | 949.0 | 950.0 | 950.0 | 956.0 | 950.0 | 950.0 | 1000.0 | 950.0 | 950.0 |
| mean | 28.796842 | 14795.075869 | 35.56 | 3.151579 | 10838.580544 | 40.729644 | -73.92762 | 38749510.0 | 1004302.0 | 205121.818947 |
| std | 14.581474 | 3637.065419 | 20.948632 | 1.20858 | 543.359591 | 0.089521 | 0.070968 | 2144.728 | 19678.19 | 32615.490071 |
| min | 1.0 | 10092.0 | 1.0 | 1.0 | 7114.0 | 40.523806 | -74.211054 | 38745760.0 | 925599.0 | 130177.0 |
| 25% | 17.0 | 11723.0 | 16.25 | 2.0 | 10453.0 | 40.659393 | -73.96643 | 38747600.0 | 993558.0 | 179519.25 |
| 50% | 29.0 | 13515.0 | 36.0 | 3.0 | 11208.0 | 40.71072 | -73.936934 | 38749540.0 | 1001704.0 | 198235.0 |
| 75% | 42.0 | 17216.0 | 54.0 | 4.0 | 11230.0 | 40.819868 | -73.890401 | 38751350.0 | 1014611.0 | 237982.25 |
| max | 51.0 | 24671.0 | 71.0 | 5.0 | 11694.0 | 40.911842 | -73.713145 | 38753170.0 | 1063750.0 | 271503.0 |
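`describe()` summarizes only the numeric columns by default, which is why text columns such as `borough` do not appear above. The `count` row also varies by column (956 for `incident_zip` but 1000 for `unique_key`) because missing values are excluded. This sketch, on a small hypothetical frame, shows both behaviors and uses `include="all"` to summarize the text column as well.

```python
import pandas as pd

# Hypothetical rows: one incident_zip is missing
df = pd.DataFrame({
    "borough": ["BROOKLYN", "QUEENS", "BROOKLYN", "BRONX"],
    "incident_zip": [11201.0, 11375.0, None, 10451.0],
})

numeric = df.describe()            # only incident_zip appears
full = df.describe(include="all")  # adds unique, top, freq for text columns

print(numeric.loc["count", "incident_zip"])  # 3.0: the missing zip is not counted
print(full.loc["top", "borough"])            # BROOKLYN
```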


In [17]:
complaints = nyc311data[['complaint_type', 'borough', 'agency', 'agency_name']]
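Indexing with a list of column names (double brackets) returns a `DataFrame`, while a single column name returns a `Series`. A small sketch using a few values from the 311 output:

```python
import pandas as pd

df = pd.DataFrame({
    "complaint_type": ["Damaged Tree", "Blocked Driveway"],
    "borough": ["BROOKLYN", "QUEENS"],
    "agency": ["DPR", "NYPD"],
})

subset = df[["complaint_type", "borough"]]  # list of names -> DataFrame
column = df["borough"]                      # single name -> Series
print(type(subset).__name__, type(column).__name__)  # DataFrame Series
```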

In [18]:
complaints.head()

| | complaint_type | borough | agency | agency_name |
|---|---|---|---|---|
| 0 | Damaged Tree | BROOKLYN | DPR | Department of Parks and Recreation |
| 1 | Blocked Driveway | QUEENS | NYPD | New York City Police Department |
| 2 | Illegal Parking | BROOKLYN | NYPD | New York City Police Department |
| 3 | Food Establishment | BRONX | DOHMH | Department of Health and Mental Hygiene |
| 4 | Noise - Residential | QUEENS | NYPD | New York City Police Department |


In [19]:
complaints.groupby(by = 'borough').size()

borough
BRONX            188
BROOKLYN         367
MANHATTAN        196
QUEENS           192
STATEN ISLAND     28
Unspecified       29
dtype: int64
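`groupby(...).size()` counts the rows in each group and orders the result by the group key, which is why the boroughs above appear alphabetically. `value_counts()` on the same column gives the same counts ordered by frequency instead. A sketch on a few hypothetical rows:

```python
import pandas as pd

complaints = pd.DataFrame({
    "complaint_type": ["Damaged Tree", "Blocked Driveway", "Illegal Parking",
                       "Noise - Residential", "Damaged Tree"],
    "borough": ["BROOKLYN", "QUEENS", "BROOKLYN", "QUEENS", "BROOKLYN"],
})

by_size = complaints.groupby("borough").size()    # ordered by borough name
by_counts = complaints["borough"].value_counts()  # ordered by count, largest first
print(by_size.to_dict())  # {'BROOKLYN': 3, 'QUEENS': 2}
```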

In [20]:
complaints[complaints['borough'] =='BROOKLYN'].sort_values('complaint_type')[:10]

| | complaint_type | borough | agency | agency_name |
|---|---|---|---|---|
| 277 | APPLIANCE | BROOKLYN | HPD | Department of Housing Preservation and Develop... |
| 786 | APPLIANCE | BROOKLYN | HPD | Department of Housing Preservation and Develop... |
| 560 | Air Quality | BROOKLYN | DEP | Department of Environmental Protection |
| 69 | Animal Abuse | BROOKLYN | NYPD | New York City Police Department |
| 253 | Animal Abuse | BROOKLYN | NYPD | New York City Police Department |
| 287 | Animal Abuse | BROOKLYN | NYPD | New York City Police Department |
| 809 | Animal Abuse | BROOKLYN | NYPD | New York City Police Department |
| 502 | Animal in a Park | BROOKLYN | DPR | Department of Parks and Recreation |
| 557 | Blocked Driveway | BROOKLYN | NYPD | New York City Police Department |
| 540 | Blocked Driveway | BROOKLYN | NYPD | New York City Police Department |
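Notice that `APPLIANCE` sorts before `Air Quality` in the output above: `sort_values` compares strings character by character, and uppercase letters sort before lowercase ones. If a case-insensitive order is wanted, `sort_values` accepts a `key` function (pandas 1.1 or later). A sketch with a few complaint types borrowed from the output:

```python
import pandas as pd

complaints = pd.DataFrame({
    "complaint_type": ["Air Quality", "APPLIANCE", "Blocked Driveway", "Animal Abuse"],
    "borough": ["BROOKLYN", "BROOKLYN", "QUEENS", "BROOKLYN"],
})
brooklyn = complaints.loc[complaints["borough"] == "BROOKLYN"]

# Default: raw string comparison, so 'P' (uppercase) sorts before 'i' (lowercase)
case_sensitive = brooklyn.sort_values("complaint_type")["complaint_type"].tolist()

# key receives the whole column; lowercasing it gives a case-insensitive order
case_insensitive = brooklyn.sort_values(
    "complaint_type", key=lambda s: s.str.lower()
)["complaint_type"].tolist()

print(case_sensitive)    # ['APPLIANCE', 'Air Quality', 'Animal Abuse']
print(case_insensitive)  # ['Air Quality', 'Animal Abuse', 'APPLIANCE']
```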


In [21]:
BK_COMPLAIN = complaints[complaints['borough'] == 'BROOKLYN']['complaint_type'].value_counts()
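`value_counts()` returns counts sorted from most to least frequent, so slicing the first six entries, as in `BK_COMPLAIN[:6]`, picks the six most common complaints. `.head(6)` expresses the same thing, and `normalize=True` returns proportions instead of counts. A sketch on a hypothetical handful of complaints:

```python
import pandas as pd

complaint_type = pd.Series(
    ["HEAT/HOT WATER", "Damaged Tree", "HEAT/HOT WATER", "Noise - Residential"]
)

counts = complaint_type.value_counts()                # most frequent first
shares = complaint_type.value_counts(normalize=True)  # fractions that sum to 1
top_two = counts.head(2)                              # same rows as counts[:2]

print(counts.index[0], counts.iloc[0])  # HEAT/HOT WATER 2
print(shares["HEAT/HOT WATER"])         # 0.5
```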

In [22]:
plt.figure(figsize = (7, 5))
plt.bar(BK_COMPLAIN.index[:6], BK_COMPLAIN[:6])

<Container object of 6 artists>

In [23]:
plt.tick_params(labelrotation = 20)

In [24]:
plt.figure(figsize = (9, 7))
bars = plt.barh(BK_COMPLAIN.index[:5], BK_COMPLAIN[:5])
plt.title("Top 5 311 Complaints in Brooklyn", loc = 'left', fontsize = 12)

Text(0,1,'Top 5 311 Complaints in Brooklyn')

In [25]:
labels = BK_COMPLAIN.index

In [26]:
for i in labels[:6]:
    print(i)

HEAT/HOT WATER
Damaged Tree
Noise - Residential
Blocked Driveway
Illegal Parking
Traffic Signal Condition


In [27]:
for i in range(5):
    label = labels[i]
    plt.gca().text(2, i, label, color = 'w', fontsize = 10)

In [28]:
# Pass booleans; the old 'on'/'off' strings are deprecated in Matplotlib
plt.tick_params(top=False, bottom=False, left=False, right=False, labelleft=False, labelbottom=False)

In [29]:
for spine in plt.gca().spines.values():
    spine.set_visible(False)
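The chart above is built across several cells, each of which modifies whichever axes is current. The same steps can be collected into one object-oriented cell that keeps a handle on the axes. This sketch assumes the `Agg` backend and uses hypothetical counts in place of `BK_COMPLAIN`; only the complaint names come from the output above.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical counts standing in for BK_COMPLAIN (already sorted, largest first)
counts = pd.Series({"HEAT/HOT WATER": 60, "Damaged Tree": 45, "Noise - Residential": 40,
                    "Blocked Driveway": 35, "Illegal Parking": 30})
top = counts.head(5)

fig, ax = plt.subplots(figsize=(9, 7))
ax.barh(range(len(top)), top.values)
ax.set_title("Top 5 311 Complaints in Brooklyn", loc="left", fontsize=12)

# Write each label in white inside its bar, starting just right of x = 0
for i, label in enumerate(top.index):
    ax.text(2, i, label, color="w", fontsize=10, va="center")

# Hide ticks, tick labels, and the frame so only the bars and labels remain
ax.tick_params(top=False, bottom=False, left=False, right=False,
               labelleft=False, labelbottom=False)
for spine in ax.spines.values():
    spine.set_visible(False)
```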

### Titanic Manipulation

In [30]:
titanic.head()

| | survived | pclass | name | sex | age | sibsp | parch | ticket | fare | cabin | embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.25 | | S |
| 1 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925 | | S |
| 3 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1 | C123 | S |
| 4 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.05 | | S |


In [31]:
titanic[titanic.pclass == 3][:5]

| | survived | pclass | name | sex | age | sibsp | parch | ticket | fare | cabin | embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.25 | | S |
| 2 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925 | | S |
| 4 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.05 | | S |
| 5 | 0 | 3 | Moran, Mr. James | male | | 0 | 0 | 330877 | 8.4583 | | Q |
| 7 | 0 | 3 | Palsson, Master. Gosta Leonard | male | 2.0 | 3 | 1 | 349909 | 21.075 | | S |
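A boolean mask such as `titanic.pclass == 3` can be combined with others using `&` (and) or `|` (or). Each comparison needs its own parentheses, because `&` and `|` bind more tightly than `==` in Python. A sketch on a few rows copied from the Titanic output:

```python
import pandas as pd

titanic = pd.DataFrame({
    "name": ["Braund, Mr. Owen Harris", "Heikkinen, Miss. Laina",
             "Futrelle, Mrs. Jacques Heath (Lily May Peel)", "Allen, Mr. William Henry"],
    "pclass": [3, 3, 1, 3],
    "sex": ["male", "female", "female", "male"],
})

# Parentheses around each comparison are required
third_class_women = titanic[(titanic.pclass == 3) & (titanic.sex == "female")]
third_or_female = titanic[(titanic.pclass == 3) | (titanic.sex == "female")]

print(third_class_women["name"].tolist())  # ['Heikkinen, Miss. Laina']
```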


In [32]:
titanic.sample(frac=0.1)[:5]

| | survived | pclass | name | sex | age | sibsp | parch | ticket | fare | cabin | embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 553 | 1 | 3 | Leeni, Mr. Fahim ("Philip Zenni") | male | 22.0 | 0 | 0 | 2620 | 7.225 | | C |
| 90 | 0 | 3 | Christmann, Mr. Emil | male | 29.0 | 0 | 0 | 343276 | 8.05 | | S |
| 170 | 0 | 1 | Van der hoef, Mr. Wyckoff | male | 61.0 | 0 | 0 | 111240 | 33.5 | B19 | S |
| 267 | 1 | 3 | Persson, Mr. Ernst Ulrik | male | 25.0 | 1 | 0 | 347083 | 7.775 | | S |
| 59 | 0 | 3 | Goodwin, Master. William Frederick | male | 11.0 | 5 | 2 | CA 2144 | 46.9 | | S |
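`sample(frac=0.1)` draws a different 10% of the rows each time, so the rows above will change on every run. Passing `random_state` makes the draw repeatable. The sketch uses a hypothetical 20-row frame, so `frac=0.1` returns 2 rows; the seed value 0 is arbitrary.

```python
import pandas as pd

df = pd.DataFrame({"passenger_id": range(20)})  # hypothetical stand-in for titanic

first = df.sample(frac=0.1, random_state=0)
second = df.sample(frac=0.1, random_state=0)

print(len(first))            # 2 rows: 10% of 20
print(first.equals(second))  # True: same seed, same rows
```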


In [33]:
titanic.iloc[4:10]

| | survived | pclass | name | sex | age | sibsp | parch | ticket | fare | cabin | embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 4 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.05 | | S |
| 5 | 0 | 3 | Moran, Mr. James | male | | 0 | 0 | 330877 | 8.4583 | | Q |
| 6 | 0 | 1 | McCarthy, Mr. Timothy J | male | 54.0 | 0 | 0 | 17463 | 51.8625 | E46 | S |
| 7 | 0 | 3 | Palsson, Master. Gosta Leonard | male | 2.0 | 3 | 1 | 349909 | 21.075 | | S |
| 8 | 1 | 3 | Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) | female | 27.0 | 0 | 2 | 347742 | 11.1333 | | S |
| 9 | 1 | 2 | Nasser, Mrs. Nicholas (Adele Achem) | female | 14.0 | 1 | 0 | 237736 | 30.0708 | | C |
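`iloc` selects by integer position and, like Python slicing, excludes the end point, so `titanic.iloc[4:10]` returns rows 4 through 9. `loc` selects by index label and includes the end point. With the default 0-based integer index the two are easy to confuse; a sketch on a hypothetical 11-row frame:

```python
import pandas as pd

df = pd.DataFrame({"row": range(11)})  # hypothetical frame with index labels 0..10

by_position = df.iloc[4:10]  # positions 4..9, end excluded
by_label = df.loc[4:10]      # labels 4..10, end included

print(len(by_position), len(by_label))  # 6 7
```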


In [34]:
titanic.nlargest(10, 'age')

| | survived | pclass | name | sex | age | sibsp | parch | ticket | fare | cabin | embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 630 | 1 | 1 | Barkworth, Mr. Algernon Henry Wilson | male | 80.0 | 0 | 0 | 27042 | 30.0 | A23 | S |
| 851 | 0 | 3 | Svensson, Mr. Johan | male | 74.0 | 0 | 0 | 347060 | 7.775 | | S |
| 96 | 0 | 1 | Goldschmidt, Mr. George B | male | 71.0 | 0 | 0 | PC 17754 | 34.6542 | A5 | C |
| 493 | 0 | 1 | Artagaveytia, Mr. Ramon | male | 71.0 | 0 | 0 | PC 17609 | 49.5042 | | C |
| 116 | 0 | 3 | Connors, Mr. Patrick | male | 70.5 | 0 | 0 | 370369 | 7.75 | | Q |
| 672 | 0 | 2 | Mitchell, Mr. Henry Michael | male | 70.0 | 0 | 0 | C.A. 24580 | 10.5 | | S |
| 745 | 0 | 1 | Crosby, Capt. Edward Gifford | male | 70.0 | 1 | 1 | WE/P 5735 | 71.0 | B22 | S |
| 33 | 0 | 2 | Wheadon, Mr. Edward H | male | 66.0 | 0 | 0 | C.A. 24579 | 10.5 | | S |
| 54 | 0 | 1 | Ostby, Mr. Engelhart Cornelius | male | 65.0 | 0 | 1 | 113509 | 61.9792 | B30 | C |
| 280 | 0 | 3 | Duane, Mr. Frank | male | 65.0 | 0 | 0 | 336439 | 7.75 | | Q |


In [35]:
titanic.nsmallest(10, 'age')

| | survived | pclass | name | sex | age | sibsp | parch | ticket | fare | cabin | embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 803 | 1 | 3 | Thomas, Master. Assad Alexander | male | 0.42 | 0 | 1 | 2625 | 8.5167 | | C |
| 755 | 1 | 2 | Hamalainen, Master. Viljo | male | 0.67 | 1 | 1 | 250649 | 14.5 | | S |
| 469 | 1 | 3 | Baclini, Miss. Helene Barbara | female | 0.75 | 2 | 1 | 2666 | 19.2583 | | C |
| 644 | 1 | 3 | Baclini, Miss. Eugenie | female | 0.75 | 2 | 1 | 2666 | 19.2583 | | C |
| 78 | 1 | 2 | Caldwell, Master. Alden Gates | male | 0.83 | 0 | 2 | 248738 | 29.0 | | S |
| 831 | 1 | 2 | Richards, Master. George Sibley | male | 0.83 | 1 | 1 | 29106 | 18.75 | | S |
| 305 | 1 | 1 | Allison, Master. Hudson Trevor | male | 0.92 | 1 | 2 | 113781 | 151.55 | C22 C26 | S |
| 164 | 0 | 3 | Panula, Master. Eino Viljami | male | 1.0 | 4 | 1 | 3101295 | 39.6875 | | S |
| 172 | 1 | 3 | Johnson, Miss. Eleanor Ileen | female | 1.0 | 1 | 1 | 347742 | 11.1333 | | S |
| 183 | 1 | 2 | Becker, Master. Richard F | male | 1.0 | 2 | 1 | 230136 | 39.0 | F4 | S |
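The last three rows above share an age of 1.0, so if more passengers are tied at the cutoff than fit in the top 10, tie-breaking decides who appears. By default (`keep='first'`) pandas returns exactly *n* rows; `keep='all'` keeps every row tied with the cutoff value, even if that yields more than *n*. Missing ages are skipped either way. A sketch on hypothetical ages:

```python
import pandas as pd

# Hypothetical ages: three passengers tied at 1.0, one missing value
df = pd.DataFrame({"age": [0.42, 1.0, None, 1.0, 30.0, 1.0]})

default = df.nsmallest(3, "age")               # exactly 3 rows, NaN ignored
all_ties = df.nsmallest(3, "age", keep="all")  # every row tied at the cutoff (1.0)

print(len(default), len(all_ties))  # 3 4
```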


In [36]:
gender = titanic[['survived', 'sex']]

In [37]:
gender[gender['survived'] == 0].groupby('sex').size()

sex
female     81
male      468
dtype: int64

In [38]:
gender[gender['survived'] == 1].groupby('sex').size()

sex
female    233
male      109
dtype: int64
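The two cells above count deaths and survivals by sex separately. `pd.crosstab` puts both in one table, and because `survived` is coded 0/1, its group mean is the survival rate. This sketch rebuilds a `sex`/`survived` frame from the counts printed above rather than reading `titanic.csv`.

```python
import pandas as pd

# Counts taken from the two outputs above: (sex, survived) -> number of passengers
counts = {("female", 0): 81, ("male", 0): 468, ("female", 1): 233, ("male", 1): 109}
rows = [(sex, survived) for (sex, survived), n in counts.items() for _ in range(n)]
gender = pd.DataFrame(rows, columns=["sex", "survived"])

# Rows: sex; columns: survived (0/1); margins=True adds an "All" total row and column
table = pd.crosstab(gender["sex"], gender["survived"], margins=True)

# The mean of a 0/1 column is the fraction of 1s, i.e. the survival rate
rate = gender.groupby("sex")["survived"].mean()

print(table)
print(rate.round(3))  # female 0.742, male 0.189
```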