# Group 29: Final Project

We set out to find the best neighborhood in Pittsburgh, which we defined as the safest neighborhood for kids. The three of us each analyzed one dataset: fire incidents, speed humps, and playground equipment. Across all neighborhoods, Squirrel Hill South was at or near the top in every category, so we conclude that Squirrel Hill South is the safest neighborhood for kids in Pittsburgh.

## Playground Equipment

First, let's import pandas and our data file. The data file, "Playground-Equipment.csv", tells us about every piece of (recorded) playground equipment in Pittsburgh.

In [1]:
# load pandas
import pandas as pd
import numpy as np

# load data
playground_equipment = pd.read_csv("Playground-Equipment.csv")

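
If `Playground-Equipment.csv` is not in the notebook's working directory, `pd.read_csv` raises a `FileNotFoundError`. Here is a minimal sketch of a guard that fails with a more helpful message. The helper name `load_csv_or_explain` is hypothetical and not part of pandas.

```python
from pathlib import Path

import pandas as pd


def load_csv_or_explain(path):
    """Load a CSV with pandas, raising a clearer error if the file is missing.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    csv_path = Path(path)
    if not csv_path.is_file():
        raise FileNotFoundError(
            f"{csv_path.name} was not found (working directory: {Path.cwd()}). "
            "Download the dataset or move it next to the notebook."
        )
    return pd.read_csv(csv_path)
```

In the notebook this could be used as `playground_equipment = load_csv_or_explain("Playground-Equipment.csv")`.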

Let's take a look at the first 10 rows of the data.

In [None]:
playground_equipment.head(10)

How many pieces of playground equipment are in each neighborhood? We can use .value_counts() for that!

In [None]:
playground_equipment["neighborhood"].value_counts()
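
To see what `.value_counts()` does, here is a small sketch on made-up data (the neighborhood names `A`, `B`, and `C` are placeholders, not real rows from the dataset). It counts how often each value appears and sorts the result from most to least frequent.

```python
import pandas as pd

# Toy data, made up for illustration (not the real Pittsburgh dataset)
toy = pd.DataFrame({"neighborhood": ["A", "B", "A", "C", "A", "B", None]})

# Count rows per neighborhood, sorted from most to least frequent
counts = toy["neighborhood"].value_counts()
print(counts)

# Rows with a missing neighborhood are dropped by default;
# dropna=False keeps them as their own category
counts_with_missing = toy["neighborhood"].value_counts(dropna=False)
print(counts_with_missing)
```

Note that a neighborhood with no equipment at all never shows up in the result, because there is no row to count.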

Using the pure number of pieces of playground equipment as our metric, Squirrel Hill South is the best neighborhood. But what about other metrics, like disability accessibility and actual quality of equipment? Let's count the number of disability-accessible pieces of playground equipment in each neighborhood.

In [None]:
query_mask = playground_equipment["ada_accessible"] == "t"
disability_equipment = playground_equipment[query_mask]
disability_equipment["neighborhood"].value_counts()
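
The query mask is a Series of `True`/`False` values, one per row, and indexing with it keeps only the `True` rows. Here is a small sketch on made-up data. The share calculation at the end is an optional extra that the original analysis does not use.

```python
import pandas as pd

# Toy data, made up for illustration
toy = pd.DataFrame({
    "neighborhood": ["A", "A", "B", "C"],
    "ada_accessible": ["t", "f", "t", "f"],
})

# True where the piece of equipment is marked accessible
mask = toy["ada_accessible"] == "t"

# Keep only the accessible rows, then count them per neighborhood
accessible = toy[mask]
accessible_counts = accessible["neighborhood"].value_counts()

# Optional: fraction of each neighborhood's equipment that is accessible
accessible_share = mask.groupby(toy["neighborhood"]).mean()
print(accessible_counts)
print(accessible_share)
```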

Once again, Squirrel Hill South comes out on top. The only ranking changes among the top 5 are Allegheny Center, which drops way down, and Elliott, which ties East Liberty, Beechview, and South Side Slopes with 4 pieces of disability-accessible playground equipment.

Lastly, let's weight each piece of playground equipment based on its quality. To judge quality, we will use the "manufacturer" column. I have ranked the manufacturers based on a quick Google image search of some of their equipment. This ranking is somewhat arbitrary, but I did my best to give an incredibly *detailed* and *thorough* analysis of each (not).

First, let's print out all the unique types of manufacturers.

In [None]:
playground_equipment["manufacturer"].unique()

Here are my totally not arbitrary rankings:
1. Kompan (WTF is this an amusement park?) ![Kompan](https://dk22sb66g7qaa.cloudfront.net/aesir-dam-viewports/castle-and-nature-playground-made-from-robinia-1366.jpeg?rel=2020-12-15+10%3A49%3A27)
2. Burke (big and well-developed) ![Burke](https://www.bciburke.com/Portals/0/adam/Products%20Slider/NrWfuy30tkiBvIXviywlwQ/Image/field-of-dreams.jpg)
3. Landscape Structures (futuristic, would definitely play on it) ![Landscape](https://www.rossrec.com/wp-content/uploads/bfi_thumb/Hedra-playsystem-1-ojo8m76h9u248v59brz47n5r3kub5gv3acjmf2z6co.jpg)
4. Playworld (big slide) ![Playworld](https://playworld.com/sites/default/files/refresh-intro-image.jpg)
5. Little Tykes (car is cheap, fuel-efficient, and environmentally friendly) ![Little Tykes](https://m.media-amazon.com/images/I/71VEtPLgBxL._AC_SL1500_.jpg)
6. Park Structures (bland color scheme, but cool slide) ![Park Structures](https://www.miracle-recreation.com/content/uploads/2018/09/MREC_2018_OH_Westfork-Park_Structure-301.jpg)
7. Miracle (fairly standard, nothing exceptional) ![Miracle](https://hasley-recreation.com/wp-content/uploads/Destination-Park-Loganville-Georgia-Playground-Miracle-1024x682.jpg)
8. Gametime (terrible color scheme, but cool slide) ![Gametime](https://www.gametime.com/images/sized/GameTime-Playground-Tower-Rendering-18861-1621263048-3ecff71e66a3f640cb051d8d5d39bc69.jpg)
9. Iron Mountain Forge (lame) ![Iron Mountain Forge](https://ww1.prweb.com/prfiles/2011/04/21/8334129/green.jpg)
10. Big Toys (lame and too much exercise) ![Big Toys](https://www.bigtoys.com/images/homepage/category-traditionalstructures-img1.jpg)

I've created a point system based upon the manufacturer rankings above. Rank k gets 11 - k points. For example, Rank 1 gets 10 points, Rank 5 gets 6 points, and Rank 10 gets 1 point. With this point system in mind, I've calculated the total number of points for each neighborhood in the code block below.
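
As a quick check of the 11 - k rule, this sketch builds the points table straight from the ranked list. The name `points_per_manufacturer` is made up for this illustration.

```python
# Manufacturers in ranked order, best first (same order as the list above)
equipment_types = [
    "Kompan", "Burke", "Landscape Structures", "Playworld", "Little Tykes",
    "Park Structures", "Miracle", "Gametime", "Iron Mountain Forge", "Big Toys",
]

# Rank k (1-based) is worth 11 - k points
points_per_manufacturer = {
    name: 11 - rank for rank, name in enumerate(equipment_types, start=1)
}

for name, pts in points_per_manufacturer.items():
    print(f"{name}: {pts}")
```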

In [None]:
# Manufacturers ordered from best (rank 1) to worst (rank 10)
equipment_types = ["Kompan", "Burke", "Landscape Structures", "Playworld", "Little Tykes", "Park Structures", "Miracle", "Gametime", "Iron Mountain Forge", "Big Toys"]

# Total points per neighborhood: each piece is worth 11 - rank points
points = {}
for i, manufacturer in enumerate(equipment_types):
    query_mask = playground_equipment["manufacturer"] == manufacturer
    # Number of pieces from this manufacturer in each neighborhood
    counts = playground_equipment[query_mask]["neighborhood"].value_counts()
    for nbhd, num_of_type in counts.items():
        # i is zero-based, so 10 - i equals 11 - rank
        points[nbhd] = points.get(nbhd, 0) + (10 - i) * num_of_type

# Sort neighborhoods from most to fewest points
ser = pd.Series(points).sort_values(ascending=False)
print(ser)
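
The same totals can also be computed without explicit loops. This is a sketch on made-up data, assuming a shortened points table. `Unknown Co` stands in for any manufacturer that is not in the ranking and therefore scores 0.

```python
import pandas as pd

# Toy data, made up for illustration
toy = pd.DataFrame({
    "neighborhood": ["A", "A", "B", "B", "C"],
    "manufacturer": ["Kompan", "Burke", "Big Toys", "Unknown Co", "Burke"],
})

# Shortened points table (rank 1 -> 10, rank 2 -> 9, rank 10 -> 1)
points_per_manufacturer = {"Kompan": 10, "Burke": 9, "Big Toys": 1}

# Look up the points for every row; unranked manufacturers become 0
row_points = toy["manufacturer"].map(points_per_manufacturer).fillna(0)

# Add up the points per neighborhood and sort from highest to lowest
neighborhood_points = (
    row_points.groupby(toy["neighborhood"]).sum().sort_values(ascending=False)
)
print(neighborhood_points)
```

One difference from the loop version: a neighborhood whose equipment is all unranked appears here with 0 points instead of being left out.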

This data makes sense, as the rankings are largely the same. However, East Liberty is the sole owner of Kompan equipment, which is marked as the highest rated at 10 points each. This explains why East Liberty wins with a large margin on this sub-metric.

In [None]:
query_mask = playground_equipment["manufacturer"] == "Kompan"
kompan_equipment = playground_equipment[query_mask]
kompan_equipment["neighborhood"].value_counts()

Now, let's combine all three sub-metrics: total count, disability accessibility, and the weighted point system. To do so, let's normalize each metric (i.e. divide by the total sum) and then sum them for each neighborhood.
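
Written out, the combined score for a neighborhood $n$ looks like this. The symbols are just labels for this note: $c_n$ is the total equipment count, $d_n$ is the disability-accessible count, and $w_n$ is the weighted points total, with each sum running over all neighborhoods $m$.

$$
S(n) = \frac{c_n}{\sum_m c_m} + \frac{d_n}{\sum_m d_m} + \frac{w_n}{\sum_m w_m}
$$

Each term lies between 0 and 1, so every sub-metric carries equal weight and the scores of all neighborhoods add up to 3.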

In [None]:
# Per-neighborhood counts, computed once instead of inside the loop
pge_counts = playground_equipment["neighborhood"].value_counts() #playground equipment counts
de_counts = disability_equipment["neighborhood"].value_counts() #disability equipment counts

total_pge = pge_counts.sum() #total pieces of playground equipment
total_de = de_counts.sum() #total pieces of disability equipment
total_wps = ser.sum() #total points in the weighted point system

totals = {}
for nbhd in pge_counts.index:
    # A neighborhood missing from a sub-metric contributes 0 for it
    pge_count = pge_counts.get(nbhd, 0)
    de_count = de_counts.get(nbhd, 0)
    wps_count = ser.get(nbhd, 0)
    totals[nbhd] = pge_count / total_pge + de_count / total_de + wps_count / total_wps
totals_series = pd.Series(totals)
totals_series = totals_series.sort_values(ascending=False)
print(totals_series)
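
For comparison, here is a sketch of the same normalize-and-sum step done with a DataFrame on made-up numbers. The three toy Series stand in for the total counts, the accessible counts, and the weighted points.

```python
import pandas as pd

# Toy sub-metrics, made up for illustration
total_counts = pd.Series({"A": 6, "B": 3, "C": 1})
accessible_counts = pd.Series({"A": 2, "B": 2})  # C has no accessible equipment
weighted_points = pd.Series({"A": 30, "B": 20, "C": 50})

# One column per sub-metric; neighborhoods missing from a metric get 0
metrics = pd.concat(
    {"total": total_counts, "accessible": accessible_counts, "points": weighted_points},
    axis=1,
).fillna(0)

# Divide each column by its own sum, then add the columns row by row
normalized = metrics / metrics.sum()
combined = normalized.sum(axis=1).sort_values(ascending=False)
print(combined)
```

Here `A` scores 0.6 + 0.5 + 0.3 = 1.4, and the scores across all neighborhoods add up to 3, one for each sub-metric.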

As expected, Squirrel Hill South comes out on top with the most equipment in total, the most disability-accessible equipment, and the second-highest quality rating. East Liberty is a close second, but most of its boost came from the ranking system, which is kind of arbitrary. Let's make a bar chart of the top 5 neighborhoods judged by our final normalized, combined metric.

In [None]:
# Create a bar chart of the top 5, using the index as the category labels
totals_series.head(5).plot.bar()

And, let's take a look at the top 10.

In [None]:
# Create a bar chart of the top 10, using the index as the category labels
totals_series.head(10).plot.bar()
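
The charts are easier to read with axis labels and a title. Here is a sketch on made-up scores. The label text is only a suggestion, and the `matplotlib.use("Agg")` line is there so the sketch runs without a display; leave it out inside a notebook.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook

import matplotlib.pyplot as plt
import pandas as pd

# Toy combined scores, made up for illustration
scores = pd.Series({"A": 1.4, "B": 1.0, "C": 0.6, "D": 0.2})

# Plot the top 3 and label the axes
top = scores.sort_values(ascending=False).head(3)
ax = top.plot.bar()
ax.set_xlabel("Neighborhood")
ax.set_ylabel("Combined normalized score")
ax.set_title("Top 3 neighborhoods by combined playground score")
plt.tight_layout()
```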

With all of this data in mind, **Squirrel Hill South** is clearly the best neighborhood for playground equipment!

## Fire Incidents

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

fire_Incidents = pd.read_csv("fireIncidents.csv")

Next, we build query masks that keep only the passenger vehicle fires and the outside rubbish fires.

In [None]:
fire_data = pd.read_csv('fireIncidents.csv')

# True for rows whose incident type matches the description
passenger_mask = fire_data['type_description'] == "Passenger vehicle fire"
rubbish_mask = fire_data['type_description'] == "Outside rubbish, trash or waste fire"

# Keep only the matching incidents
vehicle_Fires = fire_data[passenger_mask]
rubbish_Fires = fire_data[rubbish_mask]

In [None]:
vehicle_Fires['neighborhood'].value_counts()

The five safest neighborhoods for passenger vehicle fires are Hays, Swisshelm Park, Chartiers City, Arlington Heights, and Regent Square.

In [None]:
rubbish_Fires['neighborhood'].value_counts()

The five safest neighborhoods for rubbish and trash fires are Windgap, Friendship, Lower Lawrenceville, Regent Square, and Upper Hill.
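
One caveat when reading off the "safest" end of `.value_counts()`: the result is sorted from most to fewest, and a neighborhood with zero matching incidents does not appear at all. This sketch on made-up data fills in those zeros before picking the lowest counts. The list `all_neighborhoods` is an assumption; in practice it would have to come from a complete list of Pittsburgh neighborhoods.

```python
import pandas as pd

# Toy incidents, made up for illustration
toy_fires = pd.DataFrame({
    "neighborhood": ["A", "A", "B", "C"],
    "type_description": [
        "Passenger vehicle fire",
        "Passenger vehicle fire",
        "Passenger vehicle fire",
        "Outside rubbish, trash or waste fire",
    ],
})

# Assumed complete list of neighborhoods (D has no incidents at all)
all_neighborhoods = ["A", "B", "C", "D"]

vehicle_mask = toy_fires["type_description"] == "Passenger vehicle fire"
vehicle_counts = toy_fires[vehicle_mask]["neighborhood"].value_counts()

# Add the neighborhoods with zero vehicle fires, then take the lowest counts
vehicle_counts = vehicle_counts.reindex(all_neighborhoods, fill_value=0)
fewest = vehicle_counts.nsmallest(2)
print(fewest)
```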

METRIC #2 - Number of Alarms (The lower the number of alarms, the safer the neighborhood)

In [None]:
alarm_data = pd.read_csv("fireIncidents.csv")
alarm_mask = alarm_data['alarms'] == 0
number_of_alarms = alarm_data[alarm_mask]


In [None]:
# Neighborhoods with the most zero-alarm incidents
number_of_alarms['neighborhood'].value_counts().head().plot.bar()

In [None]:
number_of_alarms['neighborhood'].value_counts().tail().plot.bar()

The neighborhoods with the fewest alarms (the safest) are Glen Hazel, Oakwood, Regent Square, Arlington Heights, and East Carnegie.
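
The cells above count incidents where `alarms` equals 0. Another way to read "number of alarms" is to add up the `alarms` column for each neighborhood. This is a sketch on made-up data and is offered as an alternative reading, not as the method used above.

```python
import pandas as pd

# Toy incidents, made up for illustration
toy_alarms = pd.DataFrame({
    "neighborhood": ["A", "A", "B", "C", "C", "C"],
    "alarms": [0, 1, 0, 2, 0, 1],
})

# Total alarms per neighborhood, sorted from fewest to most
alarm_totals = toy_alarms.groupby("neighborhood")["alarms"].sum().sort_values()
print(alarm_totals)
```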

FINAL CONCLUSION - The safest neighborhood according to these metrics would be Squirrel Hill South. I actually used to visit it all the time and loved every bit of it. Additionally, according to these metrics, it's the safest for children, which, I believe, means that the entire neighborhood is safe in general.

## Speed Humps

In [None]:
import pandas as pd
# import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
speed = pd.read_csv("speed.csv")

In [None]:
speed.groupby("neighborhood").count()

In [None]:
speed["neighborhood"].value_counts()

In [None]:
speed["neighborhood"].value_counts().plot.bar()

Squirrel Hill South has by far the most speed humps.

In [None]:
speed["locator_street"].value_counts().plot.bar()

In [None]:
speed["locator_street"].value_counts()

No single street seems to have far more speed humps than the rest.

After looking over the data, the neighborhoods with the most speed humps have the roads with the most speed humps, as you would expect.

Based on the number of speed humps, the safest neighborhood for kids in Pittsburgh is Squirrel Hill South, with the safest road in the neighborhood being Saline St.
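
To find the street with the most speed humps inside one neighborhood, the two columns can be grouped together. Here is a sketch on made-up data (the neighborhood and street names are placeholders), using the same `neighborhood` and `locator_street` column names as the cells above.

```python
import pandas as pd

# Toy speed hump records, made up for illustration
toy_speed = pd.DataFrame({
    "neighborhood": ["A", "A", "A", "B", "B", "C"],
    "locator_street": ["Elm St", "Elm St", "Oak St", "Pine St", "Pine St", "Ash St"],
})

# Number of speed humps for every (neighborhood, street) pair
humps_per_street = toy_speed.groupby(["neighborhood", "locator_street"]).size()

# Street with the most speed humps within neighborhood A
top_street_in_a = humps_per_street.loc["A"].idxmax()
print(top_street_in_a)
```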

In conclusion, I think Squirrel Hill South is the best neighborhood in Pittsburgh. It is the safest neighborhood as far as speed humps go, it has fewer fires, and it has a large amount of playground equipment for kids.