# Presentation
## Processing Data with Python
### Topics

* Obesity
* Dogs
* Playgrounds
* Crime

# Importing Libraries and Initializing Data

In [1]:
import numpy as np
import pandas as pd
obesity_pandas = pd.read_csv('Obesity.csv')
df = obesity_pandas[['City Neighborhood', '2006-2010 estimate of obesity']] 
df.info()
df.sort_values(by=['City Neighborhood'])

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 416 entries, 0 to 415
Data columns (total 2 columns):
City Neighborhood                140 non-null object
2006-2010 estimate of obesity    416 non-null float64
dtypes: float64(1), object(1)
memory usage: 6.6+ KB


| | City Neighborhood | 2006-2010 estimate of obesity |
|---|---|---|
| 379 | Allegheny Center | 0.313581 |
| 378 | Allegheny West | 0.206399 |
| 66 | Allentown | 0.353925 |
| 364 | Arlington | 0.334520 |
| 365 | Arlington Heights | 0.473120 |
| 78 | Banksville | 0.239786 |
| 11 | Bedford Dwellings | 0.636222 |
| 76 | Beechview | 0.281814 |
| 72 | Beechview | 0.295785 |
| 373 | Beltzhoover | 0.479978 |


# Convert the Categorical Neighborhood Column to Dummy Variables

In [2]:
df_dummies = pd.get_dummies(df['City Neighborhood'], dtype=float)
del df_dummies[df_dummies.columns[-1]]
df_new = pd.concat([df, df_dummies], axis=1)
del df_new['City Neighborhood']
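
To make the dummy-variable step concrete, here is a minimal sketch on a small made-up frame. The four rows and their values are hypothetical and only the column names are taken from the notebook; the real data comes from `Obesity.csv`. It mirrors the cell above: one indicator column per neighborhood, the last indicator dropped, and the original text column removed.

```python
import pandas as pd

# Hypothetical toy data; the notebook reads the real columns from Obesity.csv.
toy = pd.DataFrame({
    "City Neighborhood": ["Allentown", "Banksville", "Allentown", "Beechview"],
    "2006-2010 estimate of obesity": [0.35, 0.24, 0.33, 0.28],
})

# One indicator column per neighborhood; dtype=float keeps the result numeric.
dummies = pd.get_dummies(toy["City Neighborhood"], dtype=float)

# Drop the last indicator, as the cell above does, so the remaining
# indicator columns are not perfectly collinear.
dummies = dummies.iloc[:, :-1]

# Replace the text column with the indicator columns.
toy_new = pd.concat([toy.drop(columns="City Neighborhood"), dummies], axis=1)
print(toy_new)
```

Rows whose neighborhood was dropped (here `Beechview`) are represented by zeros in every remaining indicator column.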

In [3]:
x = df_new.values

# Print Correlation Matrix

In [14]:
correlation_matrix = np.corrcoef(x.T)
print(correlation_matrix)
pearsoncorr = df_new.corr(method='pearson')
pearsoncorr

[[ 1.          0.00610784 -0.04040271 ... -0.00503468  0.02902387
  -0.01362569]
 [ 0.00610784  1.         -0.00240964 ... -0.00240964 -0.00240964
  -0.00240964]
 [-0.04040271 -0.00240964  1.         ... -0.00240964 -0.00240964
  -0.00240964]
 ...
 [-0.00503468 -0.00240964 -0.00240964 ...  1.         -0.00240964
  -0.00240964]
 [ 0.02902387 -0.00240964 -0.00240964 ... -0.00240964  1.
  -0.00240964]
 [-0.01362569 -0.00240964 -0.00240964 ... -0.00240964 -0.00240964
   1.        ]]


| | 2006-2010 estimate of obesity | Allegheny Center | Allegheny West | Allentown | Arlington | Arlington Heights | Banksville | Bedford Dwellings | Beechview | Beltzhoover | ... | Strip District | Summer Hill | Swisshelm Park | Terrace Village | Troy Hill/Herr's Island | Upper Hill | Upper Lawrenceville | West End | West oakland | Westwood |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2006-2010 estimate of obesity | 1.000000 | 0.006108 | -0.040403 | 0.023614 | 0.015194 | 0.075338 | -0.025915 | 0.146114 | -0.006578 | 0.078314 | ... | 0.030589 | 0.001194 | -0.021733 | 0.230161 | 0.005350 | 0.067816 | 0.015269 | -0.005035 | 0.029024 | -0.013626 |
| Allegheny Center | 0.006108 | 1.000000 | -0.002410 | -0.002410 | -0.002410 | -0.002410 | -0.002410 | -0.002410 | -0.003412 | -0.002410 | ... | -0.002410 | -0.002410 | -0.002410 | -0.003412 | -0.002410 | -0.002410 | -0.002410 | -0.002410 | -0.002410 | -0.002410 |
| Allegheny West | -0.040403 | -0.002410 | 1.000000 | -0.002410 | -0.002410 | -0.002410 | -0.002410 | -0.002410 | -0.003412 | -0.002410 | ... | -0.002410 | -0.002410 | -0.002410 | -0.003412 | -0.002410 | -0.002410 | -0.002410 | -0.002410 | -0.002410 | -0.002410 |
| Allentown | 0.023614 | -0.002410 | -0.002410 | 1.000000 | -0.002410 | -0.002410 | -0.002410 | -0.002410 | -0.003412 | -0.002410 | ... | -0.002410 | -0.002410 | -0.002410 | -0.003412 | -0.002410 | -0.002410 | -0.002410 | -0.002410 | -0.002410 | -0.002410 |
| Arlington | 0.015194 | -0.002410 | -0.002410 | -0.002410 | 1.000000 | -0.002410 | -0.002410 | -0.002410 | -0.003412 | -0.002410 | ... | -0.002410 | -0.002410 | -0.002410 | -0.003412 | -0.002410 | -0.002410 | -0.002410 | -0.002410 | -0.002410 | -0.002410 |
| Arlington Heights | 0.075338 | -0.002410 | -0.002410 | -0.002410 | -0.002410 | 1.000000 | -0.002410 | -0.002410 | -0.003412 | -0.002410 | ... | -0.002410 | -0.002410 | -0.002410 | -0.003412 | -0.002410 | -0.002410 | -0.002410 | -0.002410 | -0.002410 | -0.002410 |
| Banksville | -0.025915 | -0.002410 | -0.002410 | -0.002410 | -0.002410 | -0.002410 | 1.000000 | -0.002410 | -0.003412 | -0.002410 | ... | -0.002410 | -0.002410 | -0.002410 | -0.003412 | -0.002410 | -0.002410 | -0.002410 | -0.002410 | -0.002410 | -0.002410 |
| Bedford Dwellings | 0.146114 | -0.002410 | -0.002410 | -0.002410 | -0.002410 | -0.002410 | -0.002410 | 1.000000 | -0.003412 | -0.002410 | ... | -0.002410 | -0.002410 | -0.002410 | -0.003412 | -0.002410 | -0.002410 | -0.002410 | -0.002410 | -0.002410 | -0.002410 |
| Beechview | -0.006578 | -0.003412 | -0.003412 | -0.003412 | -0.003412 | -0.003412 | -0.003412 | -0.003412 | 1.000000 | -0.003412 | ... | -0.003412 | -0.003412 | -0.003412 | -0.004831 | -0.003412 | -0.003412 | -0.003412 | -0.003412 | -0.003412 | -0.003412 |
| Beltzhoover | 0.078314 | -0.002410 | -0.002410 | -0.002410 | -0.002410 | -0.002410 | -0.002410 | -0.002410 | -0.003412 | 1.000000 | ... | -0.002410 | -0.002410 | -0.002410 | -0.003412 | -0.002410 | -0.002410 | -0.002410 | -0.002410 | -0.002410 | -0.002410 |


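This correlation matrix shows the correlation coefficient between each neighborhood's dummy variable and the obesity estimate (first row and first column), along with the correlations among the dummy variables themselves. Neighborhoods with the largest positive coefficients tend to have the highest incidence of obesity; a large absolute value on its own only indicates that a neighborhood differs strongly from the rest, in either direction. Assuming our ordered set of to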
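
As a rough illustration of how the first row of such a matrix could be read, the sketch below ranks neighborhoods by the correlation between their dummy variable and the obesity estimate. The data and the short column names (`neighborhood`, `obesity`) are hypothetical, and the ranking is an assumed way to use the coefficients rather than a step taken in this notebook.

```python
import pandas as pd

# Hypothetical toy data: three neighborhoods with two estimates each.
toy = pd.DataFrame({
    "neighborhood": ["A", "A", "B", "B", "C", "C"],
    "obesity": [0.50, 0.54, 0.30, 0.32, 0.20, 0.22],
})

# Indicator column per neighborhood (none dropped here, for readability).
dummies = pd.get_dummies(toy["neighborhood"], dtype=float)

# Correlate every indicator column with the obesity estimate, highest first.
corr_with_obesity = dummies.corrwith(toy["obesity"]).sort_values(ascending=False)
print(corr_with_obesity)

# Per-neighborhood means, shown for comparison with the coefficients.
group_means = toy.groupby("neighborhood")["obesity"].mean()
print(group_means)
```

In this toy case the neighborhood with the highest mean estimate (`A`) has a positive coefficient, while the one with the lowest mean (`C`) has the most negative coefficient, which is why the sign matters and not only the absolute value.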

# Introducing Playgrounds

In [11]:
playgrounds_pandas = pd.read_csv('Playgrounds.csv')
df2 = playgrounds_pandas[['neighborhood']] 
df2.info()
df2.sort_values(by=['neighborhood'])

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 121 entries, 0 to 120
Data columns (total 1 columns):
neighborhood    121 non-null object
dtypes: object(1)
memory usage: 1.0+ KB


| | neighborhood |
|---|---|
| 26 | Allegheny Center |
| 119 | Allegheny Center |
| 79 | Allegheny Center |
| 103 | Allegheny Center |
| 46 | Allentown |
| 9 | Banksville |
| 4 | Bedford Dwellings |
| 85 | Beechview |
| 0 | Beechview |
| 106 | Beechview |


In [15]:
df.corrwith(df2.count())

2006-2010 estimate of obesity   NaN
dtype: float64
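
The `NaN` above most likely arises because `df2.count()` returns a one-element Series labelled `neighborhood`, which shares no index labels with the rows of `df`, so there is nothing to correlate. One possible way to compare the two datasets is sketched below on hypothetical toy data: aggregate both to one row per neighborhood, align them by name, and then compute a single Pearson coefficient. Counting neighborhoods without a listed playground as 0, and assuming neighborhood names are spelled identically in both files, are assumptions of this sketch.

```python
import pandas as pd

# Hypothetical stand-ins for df (obesity) and df2 (playgrounds).
obesity = pd.DataFrame({
    "City Neighborhood": ["A", "A", "B", "C", "D"],
    "2006-2010 estimate of obesity": [0.40, 0.44, 0.30, 0.25, 0.35],
})
playgrounds = pd.DataFrame({"neighborhood": ["B", "B", "C", "C", "C", "D"]})

# Mean obesity estimate per neighborhood; groupby skips rows with no name.
obesity_by_hood = (
    obesity.groupby("City Neighborhood")["2006-2010 estimate of obesity"].mean()
)

# Number of playground rows per neighborhood.
playground_counts = playgrounds["neighborhood"].value_counts()

# Align on neighborhood name; assumption: no playground row means a count of 0.
combined = pd.DataFrame({"obesity": obesity_by_hood})
combined["playgrounds"] = playground_counts.reindex(combined.index).fillna(0)

# Pearson correlation between playground count and mean obesity estimate.
r = combined["obesity"].corr(combined["playgrounds"])
print(combined)
print(r)
```

With the real files, names that differ only in spelling or capitalization would need to be normalized before aligning, otherwise those neighborhoods would silently receive a count of 0.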