## Population

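```python
import pandas as pd
import numpy as np
%matplotlib inline
# pyplot is the plotting interface; `import matplotlib as plt` would bind the top-level package instead
import matplotlib.pyplot as plt
```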

Here we have census data for Pittsburgh's neighborhoods, including figures from the 2010 census. The dataset has some flaws that make it difficult to use in pandas. For example, commas are used as thousands separators, which causes numeric columns to be read as strings. We're only going to clean up these flaws in the columns we need.
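As a hedged alternative to stripping the commas by hand, `pd.read_csv` accepts a `thousands` argument that parses values such as `"2,500"` as numbers while the file is read. The sketch below uses a tiny inline CSV instead of the real file; its column name `Pop 2010` is a guess at the raw header, and the two values are copied from the tables in this section with commas added on the assumption that the raw file stores them that way.

```python
import io

import pandas as pd

# Hypothetical two-row extract; the quotes keep each comma inside its field
raw_csv = io.StringIO(
    'Neighborhood,Pop 2010\n'
    'Allentown,"2,500"\n'
    'Brookline,"13,214"\n'
)

# thousands="," makes the parser drop the separator and read the column as integers
df = pd.read_csv(raw_csv, thousands=",")
print(df.dtypes)
print(df["Pop 2010"].sum())
```

With the column already numeric, neither the `.str.replace(',', '')` step nor the later string-to-float conversion would be needed for that column.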

```python
# Read the data
pgh_census_data = pd.read_csv("population-density.csv")

# Normalize the column names so they are easier to reference in code
pgh_census_data.columns = [c.replace(' ', '_') for c in pgh_census_data.columns]
pgh_census_data.columns = [c.replace('.', '') for c in pgh_census_data.columns]
pgh_census_data.columns = [c.replace('%', 'Pct') for c in pgh_census_data.columns]
pgh_census_data.columns = [c.replace('-', '_') for c in pgh_census_data.columns]
pgh_census_data.columns = [c.replace('(2010)', '2010') for c in pgh_census_data.columns]

# Strip the thousands separators so Pop_2010 can be converted to a number later
pgh_census_data['Pop_2010'] = pgh_census_data['Pop_2010'].str.replace(',', '')

# The data is already sorted alphabetically by neighborhood and does not need to be sorted further
pgh_census_data.head()
```

| Unnamed: 0 | Neighborhood | Sector_# | Pop_1940 | Pop_1950 | Pop_1960 | Pop_1970 | Pop_1980 | Pop_1990 | Pop_2000 | Pop_2010 | ... | Pct_Other_2010 | Pct_White_2010 | Pct_2+_Races_2010 | Pct_Hispanic_(of_any_race)_2010 | Pct_Pop_Age_<_5_2010 | Pct_Pop_Age_5_19_2010 | Pct_Pop_Age_20_34_2010 | Pct_Pop_Age_35_59_2010 | Pct_Pop_Age_60_74_2010 | Pct_Pop_Age_>_75_2010 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Allegheny Center | 3 | 4521 | 3862 | 2512 | 632 | 1586 | 1262 | 886 | 933 | ... | 0.64% | 40.84% | 0.0397 | 0.029 | 0.0419 | 0.217 | 0.2757 | 0.2243 | 0.1761 | 0.065 |
| 1 | Allegheny West | 3 | 3210 | 3313 | 2170 | 1124 | 820 | 654 | 508 | 462 | ... | 0.65% | 76.62% | 0.0303 | 0.028 | 0.0 | 0.0 | 0.0837 | 0.682 | 0.1255 | 0.1088 |
| 2 | Allentown | 6 | 8227 | 7487 | 6416 | 5361 | 4292 | 3600 | 3220 | 2500 | ... | 0.80% | 59.40% | 0.0464 | 0.023 | 0.0366 | 0.1485 | 0.2411 | 0.353 | 0.144 | 0.0767 |
| 3 | Arlington | 7 | 2702 | 3203 | 4430 | 3949 | 2294 | 2210 | 1999 | 1869 | ... | 0.37% | 76.46% | 0.0316 | 0.014 | 0.0691 | 0.1889 | 0.1945 | 0.3153 | 0.0888 | 0.1433 |
| 4 | Arlington Heights | 7 | 2413 | 2860 | 2272 | 2037 | 1466 | 1497 | 238 | 244 | ... | 1.64% | 9.43% | 0.0492 | 0.008 | 0.041 | 0.3925 | 0.1638 | 0.3072 | 0.0341 | 0.0614 |
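The five chained `replace` passes in the cell above can be folded into one function and passed to `DataFrame.rename`, which accepts a callable for `columns`. This is only a sketch: the raw header strings below are guesses at what `population-density.csv` contains, reverse-engineered from the cleaned names in the output, and the helper name `clean_column_name` is hypothetical. The substitutions run in the same order as in the original cell, which matters because `(2010)` is only matched after the spaces have become underscores.

```python
import pandas as pd

# Same (old, new) substitutions, in the same order, as the original list comprehensions
REPLACEMENTS = [(" ", "_"), (".", ""), ("%", "Pct"), ("-", "_"), ("(2010)", "2010")]


def clean_column_name(name: str) -> str:
    """Hypothetical helper: apply each substitution to a single column name."""
    for old, new in REPLACEMENTS:
        name = name.replace(old, new)
    return name


# Hypothetical raw headers; the real file's spelling may differ
raw = pd.DataFrame(columns=["Sector #", "Pop. 2010", "% Pop. Age 60-74 (2010)"])
cleaned = raw.rename(columns=clean_column_name)
print(list(cleaned.columns))
```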


There's a lot of good data here, but one column in particular is of use to us. We're going to look at the "Pct_Pop_Age_60_74_2010" column, which gives the share of each neighborhood's population that is between 60 and 74 years old. These people pose an annoyance on the road due to their unbelievably slow driving habits. The fewer of them there are, the better. So first things first, we need to make a new column that shows the number of people in this age group in each neighborhood.

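```python
# fillna returns a new DataFrame, so the result must be assigned back to take effect
pgh_census_data = pgh_census_data.fillna(value=0)

# Converting strings to floats so they can be multiplied
pgh_census_data = pgh_census_data.astype({"Pop_2010": float})
# Estimated head count = total 2010 population x share of residents aged 60-74
pgh_census_data['Boomers'] = pgh_census_data['Pop_2010'] * pgh_census_data['Pct_Pop_Age_60_74_2010']
pgh_census_data.head()
```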

| Unnamed: 0 | Neighborhood | Sector_# | Pop_1940 | Pop_1950 | Pop_1960 | Pop_1970 | Pop_1980 | Pop_1990 | Pop_2000 | Pop_2010 | ... | Pct_White_2010 | Pct_2+_Races_2010 | Pct_Hispanic_(of_any_race)_2010 | Pct_Pop_Age_<_5_2010 | Pct_Pop_Age_5_19_2010 | Pct_Pop_Age_20_34_2010 | Pct_Pop_Age_35_59_2010 | Pct_Pop_Age_60_74_2010 | Pct_Pop_Age_>_75_2010 | Boomers |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Allegheny Center | 3 | 4521 | 3862 | 2512 | 632 | 1586 | 1262 | 886 | 933.0 | ... | 40.84% | 0.0397 | 0.029 | 0.0419 | 0.217 | 0.2757 | 0.2243 | 0.1761 | 0.065 | 164.3013 |
| 1 | Allegheny West | 3 | 3210 | 3313 | 2170 | 1124 | 820 | 654 | 508 | 462.0 | ... | 76.62% | 0.0303 | 0.028 | 0.0 | 0.0 | 0.0837 | 0.682 | 0.1255 | 0.1088 | 57.981 |
| 2 | Allentown | 6 | 8227 | 7487 | 6416 | 5361 | 4292 | 3600 | 3220 | 2500.0 | ... | 59.40% | 0.0464 | 0.023 | 0.0366 | 0.1485 | 0.2411 | 0.353 | 0.144 | 0.0767 | 360.0 |
| 3 | Arlington | 7 | 2702 | 3203 | 4430 | 3949 | 2294 | 2210 | 1999 | 1869.0 | ... | 76.46% | 0.0316 | 0.014 | 0.0691 | 0.1889 | 0.1945 | 0.3153 | 0.0888 | 0.1433 | 165.9672 |
| 4 | Arlington Heights | 7 | 2413 | 2860 | 2272 | 2037 | 1466 | 1497 | 238 | 244.0 | ... | 9.43% | 0.0492 | 0.008 | 0.041 | 0.3925 | 0.1638 | 0.3072 | 0.0341 | 0.0614 | 8.3204 |
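The new column is an element-wise product: each neighborhood's 2010 population times its share of residents aged 60 to 74. For Allegheny Center that is 933 × 0.1761 = 164.3013, matching the first row of the output. The sketch below reproduces this on rows copied from the output, plus one clearly hypothetical row with a missing share, to show two pandas details: `fillna` returns a new DataFrame and has no effect unless its result is assigned, and `pd.to_numeric` is another way to turn the comma-free strings into numbers.

```python
import pandas as pd

# Three rows copied from the output above, plus one hypothetical row with a missing share.
# Pop_2010 is kept as strings, as it is after the comma cleanup.
df = pd.DataFrame({
    "Neighborhood": ["Allegheny Center", "Allegheny West", "Allentown", "Hypothetical"],
    "Pop_2010": ["933", "462", "2500", "100"],
    "Pct_Pop_Age_60_74_2010": [0.1761, 0.1255, 0.144, None],
})

# fillna returns a new DataFrame; calling it without assignment leaves df unchanged
df.fillna(value=0)
assert df["Pct_Pop_Age_60_74_2010"].isna().sum() == 1
df = df.fillna(value=0)

# Convert the strings to numbers, then multiply element-wise (one product per row)
df["Pop_2010"] = pd.to_numeric(df["Pop_2010"])
df["Boomers"] = df["Pop_2010"] * df["Pct_Pop_Age_60_74_2010"]
print(df[["Neighborhood", "Boomers"]])
```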


People usually don't take too kindly to being represented as floats, so we're just going to turn all of our Boomers into integers so we have some nice whole numbers.

```python
# Converting the Boomers column to integers because we can't have partial people.
# Note: astype(int) truncates toward zero (57.981 becomes 57) rather than rounding.
pgh_census_data = pgh_census_data.astype({"Boomers": int})
pgh_census_data.head()
```

| Unnamed: 0 | Neighborhood | Sector_# | Pop_1940 | Pop_1950 | Pop_1960 | Pop_1970 | Pop_1980 | Pop_1990 | Pop_2000 | Pop_2010 | ... | Pct_White_2010 | Pct_2+_Races_2010 | Pct_Hispanic_(of_any_race)_2010 | Pct_Pop_Age_<_5_2010 | Pct_Pop_Age_5_19_2010 | Pct_Pop_Age_20_34_2010 | Pct_Pop_Age_35_59_2010 | Pct_Pop_Age_60_74_2010 | Pct_Pop_Age_>_75_2010 | Boomers |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Allegheny Center | 3 | 4521 | 3862 | 2512 | 632 | 1586 | 1262 | 886 | 933.0 | ... | 40.84% | 0.0397 | 0.029 | 0.0419 | 0.217 | 0.2757 | 0.2243 | 0.1761 | 0.065 | 164 |
| 1 | Allegheny West | 3 | 3210 | 3313 | 2170 | 1124 | 820 | 654 | 508 | 462.0 | ... | 76.62% | 0.0303 | 0.028 | 0.0 | 0.0 | 0.0837 | 0.682 | 0.1255 | 0.1088 | 57 |
| 2 | Allentown | 6 | 8227 | 7487 | 6416 | 5361 | 4292 | 3600 | 3220 | 2500.0 | ... | 59.40% | 0.0464 | 0.023 | 0.0366 | 0.1485 | 0.2411 | 0.353 | 0.144 | 0.0767 | 360 |
| 3 | Arlington | 7 | 2702 | 3203 | 4430 | 3949 | 2294 | 2210 | 1999 | 1869.0 | ... | 76.46% | 0.0316 | 0.014 | 0.0691 | 0.1889 | 0.1945 | 0.3153 | 0.0888 | 0.1433 | 165 |
| 4 | Arlington Heights | 7 | 2413 | 2860 | 2272 | 2037 | 1466 | 1497 | 238 | 244.0 | ... | 9.43% | 0.0492 | 0.008 | 0.041 | 0.3925 | 0.1638 | 0.3072 | 0.0341 | 0.0614 | 8 |
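Converting with `astype(int)` drops the fractional part instead of rounding, which is why Allegheny West's 57.981 became 57 and Arlington's 165.9672 became 165. If rounding to the nearest whole person is preferred, a minimal sketch using the float values copied from the previous output:

```python
import pandas as pd

# Float Boomers values copied from the output before the integer conversion
boomers = pd.Series([164.3013, 57.981, 360.0, 165.9672, 8.3204])

truncated = boomers.astype(int)        # drops the fraction: 57.981 -> 57
rounded = boomers.round().astype(int)  # nearest whole number: 57.981 -> 58

print(pd.DataFrame({"truncated": truncated, "rounded": rounded}))
```

Either choice moves each count by less than one person, so the neighborhood rankings are rarely affected; rounding just avoids a systematic downward bias.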


Much better. Now all that's left to do is sort the dataset by our "Boomers" column so we can see which neighborhoods have the most and the fewest people in this age range.

```python
# Sorting the neighborhoods by the number of boomers, largest first
pgh_census_data.sort_values("Boomers", ascending=False)
```

| Unnamed: 0 | Neighborhood | Sector_# | Pop_1940 | Pop_1950 | Pop_1960 | Pop_1970 | Pop_1980 | Pop_1990 | Pop_2000 | Pop_2010 | ... | Pct_White_2010 | Pct_2+_Races_2010 | Pct_Hispanic_(of_any_race)_2010 | Pct_Pop_Age_<_5_2010 | Pct_Pop_Age_5_19_2010 | Pct_Pop_Age_20_34_2010 | Pct_Pop_Age_35_59_2010 | Pct_Pop_Age_60_74_2010 | Pct_Pop_Age_>_75_2010 | Boomers |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 13 | Brookline | 5 | 14721 | 16559 | 20381 | 20336 | 17231 | 15488 | 14318 | 13214.0 | ... | 91.43% | 0.0210 | 0.016 | 0.0738 | 0.1290 | 0.2557 | 0.3395 | 0.1263 | 0.0756 | 1668 |
| 76 | Squirrel Hill South | 10 | 20203 | 20737 | 18517 | 16669 | 15165 | 14968 | 14507 | 15110.0 | ... | 82.03% | 0.0230 | 0.032 | 0.0543 | 0.1180 | 0.2936 | 0.3250 | 0.1034 | 0.1058 | 1562 |
| 15 | Carrick | 5 | 16534 | 16530 | 16480 | 15855 | 12930 | 11625 | 10685 | 10113.0 | ... | 85.99% | 0.0267 | 0.016 | 0.0584 | 0.1525 | 0.1655 | 0.4063 | 0.1237 | 0.0935 | 1250 |
| 67 | Shadyside | 12 | 17680 | 19279 | 18177 | 15848 | 13945 | 13385 | 13754 | 13915.0 | ... | 71.90% | 0.0193 | 0.033 | 0.0452 | 0.0588 | 0.5035 | 0.2269 | 0.0849 | 0.0807 | 1181 |
| 75 | Squirrel Hill North | 10 | 10435 | 13009 | 13778 | 13576 | 12353 | 11471 | 10408 | 11363.0 | ... | 75.02% | 0.0253 | 0.041 | 0.0512 | 0.2827 | 0.2992 | 0.2078 | 0.0982 | 0.0610 | 1115 |
| 9 | Bloomfield | 12 | 20708 | 20074 | 16715 | 14411 | 11761 | 10405 | 9089 | 8442.0 | ... | 81.57% | 0.0222 | 0.026 | 0.0369 | 0.0865 | 0.4022 | 0.2747 | 0.1256 | 0.0740 | 1060 |
| 52 | Mount Washington | 6 | 20013 | 19060 | 17415 | 14787 | 11795 | 10700 | 9878 | 8799.0 | ... | 85.93% | 0.0226 | 0.016 | 0.0451 | 0.1039 | 0.3567 | 0.3010 | 0.1195 | 0.0739 | 1051 |
| 36 | Greenfield | 9 | 14111 | 13952 | 12984 | 12234 | 9873 | 8485 | 7832 | 7294.0 | ... | 88.29% | 0.0215 | 0.029 | 0.0447 | 0.1276 | 0.3079 | 0.3089 | 0.1346 | 0.0763 | 981 |
| 7 | Beechview | 5 | 10853 | 11994 | 14032 | 14360 | 11911 | 9311 | 8772 | 7974.0 | ... | 80.94% | 0.0265 | 0.056 | 0.0562 | 0.1530 | 0.2389 | 0.3318 | 0.1203 | 0.0998 | 959 |
| 39 | Highland Park | 12 | 8960 | 10239 | 9805 | 9223 | 8032 | 7029 | 6749 | 6395.0 | ... | 66.51% | 0.0266 | 0.029 | 0.0680 | 0.1177 | 0.2641 | 0.3864 | 0.1118 | 0.0519 | 714 |

As the data above shows, the top 5 neighborhoods with the most people in this age range are Brookline, Squirrel Hill South, Carrick, Shadyside, and Squirrel Hill North. Each of these has more than a thousand residents in this age range, which is far too many for our ideal driving neighborhood.

On the other hand, the 5 neighborhoods with the fewest people in this age range are South Shore, Chateau, Arlington Heights, St. Clair, and California - Kirkbride. South Shore and Chateau have no residents in this age range at all.
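The sorted output above displays only the top of the table, so the bottom five neighborhoods have to be read from the end of the sorted frame. A sketch that pulls both ends directly with `DataFrame.nlargest` and `DataFrame.nsmallest`, shown on the ten rows from the sorted output (so the "smallest" here are only the smallest of those ten, not of all neighborhoods):

```python
import pandas as pd

# The ten rows shown in the sorted output; the full dataset has more neighborhoods
df = pd.DataFrame({
    "Neighborhood": ["Brookline", "Squirrel Hill South", "Carrick", "Shadyside",
                     "Squirrel Hill North", "Bloomfield", "Mount Washington",
                     "Greenfield", "Beechview", "Highland Park"],
    "Boomers": [1668, 1562, 1250, 1181, 1115, 1060, 1051, 981, 959, 714],
})

top_five = df.nlargest(5, "Boomers")       # sorted largest first
bottom_five = df.nsmallest(5, "Boomers")   # sorted smallest first
print(top_five["Neighborhood"].tolist())
print(bottom_five["Neighborhood"].tolist())
```

On the full frame the same calls would be `pgh_census_data.nlargest(5, "Boomers")` and `pgh_census_data.nsmallest(5, "Boomers")`. When counts tie, as with two neighborhoods at zero, the default `keep="first"` keeps whichever tied row appears first.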

One potential issue with this metric is that it largely reflects each neighborhood's overall population: a neighborhood with more residents will usually have more residents aged 60 to 74 as well. While many people might see this as a downside, it is ideal for our purposes. The fewer people there are on the road, the easier it is to drive.
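One hedged way to see how much population size drives the count is to rank neighborhoods by the share itself, `Pct_Pop_Age_60_74_2010`, alongside the `Boomers` count. On the five neighborhoods from the first output, the two rankings disagree:

```python
import pandas as pd

# Five rows copied from the earlier output (Boomers after the integer conversion)
df = pd.DataFrame({
    "Neighborhood": ["Allegheny Center", "Allegheny West", "Allentown",
                     "Arlington", "Arlington Heights"],
    "Pop_2010": [933, 462, 2500, 1869, 244],
    "Pct_Pop_Age_60_74_2010": [0.1761, 0.1255, 0.144, 0.0888, 0.0341],
    "Boomers": [164, 57, 360, 165, 8],
})

# Rank 1 = most residents aged 60-74, by head count and by share of the population
df["rank_by_count"] = df["Boomers"].rank(ascending=False).astype(int)
df["rank_by_share"] = df["Pct_Pop_Age_60_74_2010"].rank(ascending=False).astype(int)
print(df[["Neighborhood", "rank_by_count", "rank_by_share"]])
```

Allegheny Center ranks third by count but first by share, while Allentown leads by count largely because it is the most populous of the five. The count suits the argument above that fewer people overall means easier driving; the share is the alternative if the mix of drivers matters more than the total.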