# Playground Analysis
*Kiana Kazemi*

Our goal is to find the neighborhood with the highest ratio of children to playgrounds. A high ratio suggests very busy playgrounds and highlights a lack of playgrounds in that neighborhood.

First, we will load our data. Because of the size of this dataset and some formatting that was interfering with the code, unnecessary rows were deleted. The edited CSV was saved as "Playgrounds-Edited.csv" and is loaded into a dataframe named "playgrounds":

In [53]:
# load pandas
import pandas as pd
import numpy as np

# load data
playgrounds = pd.read_csv("Playgrounds-Edited.csv")

Now, we will look at the basic statistics:

In [54]:
playgrounds['name'].value_counts()

Able Long Playground             1
North Ave. Playground            1
Robert E. WIlliams Playground    1
Riverview Playground             1
Rhododendron Playground          1
                                ..
Farmhouse Playground             1
Eric Guy Kelly Playground        1
Esplen Playground                1
Enright Playground               1
Spring Garden Ave Playground     1
Name: name, Length: 125, dtype: int64

In [55]:
playgrounds['neighborhood'].value_counts()

Squirrel Hill South    8
Beechview              5
South Side Slopes      5
Highland Park          4
Sheraden               4
                      ..
Esplen                 1
Fairywood              1
Regent Square          1
Allentown              1
East Allegheny         1
Name: neighborhood, Length: 68, dtype: int64

We find there are 125 playgrounds and 68 neighborhoods in this dataset.

Next, we find the number of playgrounds within each neighborhood:

In [56]:
#create a new dataframe with the number of distinct playground names per neighborhood
playfreq = playgrounds.groupby('neighborhood')['name'].nunique().reset_index(name = 'playgrounds')

#create a descending list based on the number of playgrounds per neighborhood
playfreq.sort_values(by = 'playgrounds', ascending = False)

           neighborhood  playgrounds
56  Squirrel Hill South            8
4             Beechview            5
52    South Side Slopes            5
0      Allegheny Center            4
5           Beltzhoover            4
..                  ...          ...
41        New Homestead            1
42              Oakwood            1
43          Perry North            1
44          Perry South            1
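
The grouping step above counts distinct playground names in each neighborhood. As a minimal sketch on made-up toy data (the names `toy`, `distinct` and `rows` are illustrative, not part of this analysis), the difference between `nunique()` and a plain row count only shows up when a name is listed more than once:

```python
import pandas as pd

# Toy data with made-up names (not from the real dataset);
# "Park A" is listed twice on purpose.
toy = pd.DataFrame({
    "name": ["Park A", "Park A", "Park B", "Park C"],
    "neighborhood": ["North", "North", "North", "South"],
})

# nunique() counts distinct playground names within each neighborhood
distinct = toy.groupby("neighborhood")["name"].nunique().reset_index(name="playgrounds")

# size() counts rows, so the duplicated "Park A" row is counted twice
rows = toy.groupby("neighborhood").size().reset_index(name="rows")

print(distinct)
print(rows)
```

In the real dataset every name appears once (125 distinct names), so both approaches should give the same counts there.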


Now let's take a quick look at the top 10 neighborhoods with the most playgrounds:

In [57]:
playfreq.sort_values(by = 'playgrounds', ascending = False).head(10)

           neighborhood  playgrounds
56  Squirrel Hill South            8
4             Beechview            5
52    South Side Slopes            5
0      Allegheny Center            4
5           Beltzhoover            4
29        Highland Park            4
49             Sheraden            4
40     Mount Washington            3
21              Elliott            3
20         East Liberty            3


We now know how many playgrounds each neighborhood has and which neighborhoods have the most, but counts alone are not strong evidence. A more useful measure is the number of children per playground, which indicates where playgrounds are most likely to be overcrowded.

To do so, we need to import a second dataset with the number of children in each neighborhood. This data was not readily available in CSV format, so I created my own data file and entered the data into it.

In [58]:
# load data
children = pd.read_csv("Children-Population.csv")

Here is a quick look at our data, looking at the neighborhoods with the highest population of children:

In [59]:
children.sort_values(by = 'under18', ascending = False).head(10)

    id         neighborhood  under18
12  13            Brookline     2540
14  15  Squirrel Hill South     2333
45  46              Carrick     2153
66  67  Squirrel Hill North     1569
50  51             Sheraden     1468
32  33        Highland Park     1332
0    1            Beechview     1314
11  12     Brighton Heights     1298
8    9       Homewood North     1151
65  66   Point Breeze North     1133
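
To make the children-per-playground idea concrete, here is a small worked example using only figures that already appear in the two tables above. It covers four neighborhoods that show up in both top-10 lists, so it is an illustration of the arithmetic rather than the full result; the variable names are illustrative.

```python
# Figures copied from the tables in this notebook:
# neighborhood -> (children under 18, number of playgrounds)
figures = {
    "Squirrel Hill South": (2333, 8),
    "Sheraden": (1468, 4),
    "Highland Park": (1332, 4),
    "Beechview": (1314, 5),
}

# children per playground for each neighborhood
ratios = {name: kids / parks for name, (kids, parks) in figures.items()}

# print from the highest (most crowded) ratio to the lowest
for name, ratio in sorted(ratios.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {ratio:.1f} children per playground")
```

Among these four, Sheraden comes out highest at 367 children per playground, even though Squirrel Hill South has more children overall. This is why the ratio tells us more than either count on its own.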


Next, we need to find the ratios. The two dataframes list neighborhoods in different orders, so we first match them on the "neighborhood" column, then divide the "under18" column from the "children" dataframe by the "playgrounds" column from the "playfreq" dataframe:

In [65]:
#find ratios: match the dataframes on neighborhood (an inner merge keeps only names found in both)
fullframe = playfreq.merge(children[['neighborhood', 'under18']], on = 'neighborhood')
fullframe['ratio'] = fullframe['under18'] / fullframe['playgrounds']
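
Dividing children by playgrounds only makes sense when both values belong to the same neighborhood, so rows have to be paired by name rather than by position. The sketch below uses made-up toy data (`playfreq_toy` and `children_toy` are hypothetical stand-ins, not the real dataframes) to show one way to pair rows and, at the same time, spot neighborhoods that appear in only one of the two files:

```python
import pandas as pd

# Toy stand-ins for playfreq and children (made-up values, different row order on purpose)
playfreq_toy = pd.DataFrame({
    "neighborhood": ["Alpha", "Beta", "Gamma"],
    "playgrounds": [2, 1, 4],
})
children_toy = pd.DataFrame({
    "neighborhood": ["Gamma", "Alpha", "Delta"],
    "under18": [800, 300, 150],
})

# An outer merge keeps every neighborhood; indicator=True adds a "_merge" column
# recording whether each row came from the left table, the right table, or both
merged = playfreq_toy.merge(children_toy, on="neighborhood", how="outer", indicator=True)

# Neighborhoods present in only one table (an inner merge would drop these silently)
unmatched = merged.loc[merged["_merge"] != "both", "neighborhood"].tolist()

# Keep the matched rows and compute children per playground
matched = merged[merged["_merge"] == "both"].copy()
matched["ratio"] = matched["under18"] / matched["playgrounds"]

print(sorted(unmatched))
print(matched[["neighborhood", "ratio"]])
```

Since the children data was entered by hand, a check like this may help catch spelling differences between the two files before the ratios are interpreted.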

In [61]:
#show the 5 neighborhoods with the highest (worst) ratios on a bar chart