**Since we are trying to find out the most fun neighborhood in Pittsburgh to drive in**, we want the fewest bus stops! Who wants to stop for a bus or deal with people walking across the street (or rude bus drivers...)? Here is a dataset, updated monthly, that contains bus stop usage throughout Pittsburgh.

My goal is to find out which neighborhood has the lowest stop usage. Since this is focused on which neighborhood is the most "fun" to drive in, I am not accounting for the fact that there are probably more drivers where fewer people use the bus. I am simply focusing on the enjoyment of driving with as few stops as possible.

First I am going to import pandas and read in the data.
Then we can look at the data I will use to determine which neighborhood has the fewest people getting on the bus.

In [2]:
import pandas as pd
import geopandas as gpd
import fpsnippets
import operator

URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1007)>

In [None]:
df = pd.read_csv('data/Bus_Stop_Data.csv')

Before I can compare anything, the data must be grouped by **neighborhood**. Right now each row is only **labeled by latitude and longitude**.

In [None]:
df.head()

Unnamed: 0,clever_id,stop_id,stop_name,direction,routes_ser,latitude,longitude,mode,shelter,stop_type,datekey,time_period,route_name,serviceday,total_ons,total_offs,days,avg_ons,avg_offs
0,7858,E02110,5TH ST AT CAVIT AVE,Inbound,"69, P69",40.3858,-79.76,Bus,No Shelter,Bus Stop,201909,Pre-pandemic,69,Sat,12.0,0.0,4,3.0,0.0
1,7858,E02110,5TH ST AT CAVIT AVE,Inbound,"69, P69",40.3858,-79.76,Bus,No Shelter,Bus Stop,201909,Pre-pandemic,69,Sun,14.0,0.0,6,2.333333,0.0
2,7858,E02110,5TH ST AT CAVIT AVE,Inbound,"69, P69",40.3858,-79.76,Bus,No Shelter,Bus Stop,201909,Pre-pandemic,69,Weekday,64.0,1.0,20,3.2,0.05
3,7858,E02110,5TH ST AT CAVIT AVE,Inbound,"69, P69",40.3858,-79.76,Bus,No Shelter,Bus Stop,201909,Pre-pandemic,P69,Weekday,39.0,0.0,20,1.95,0.0
4,7858,E02110,5TH ST AT CAVIT AVE,Inbound,"69, P69",40.3858,-79.76,Bus,No Shelter,Bus Stop,202001,Pre-pandemic,69,Sat,11.0,0.0,4,2.75,0.0


Printing the column names, non-null counts, and data types!

In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 107611 entries, 0 to 107610
Data columns (total 19 columns):
 #   Column       Non-Null Count   Dtype  
---  ------       --------------   -----  
 0   clever_id    107611 non-null  int64  
 1   stop_id      107611 non-null  object 
 2   stop_name    107611 non-null  object 
 3   direction    107611 non-null  object 
 4   routes_ser   107611 non-null  object 
 5   latitude     107611 non-null  float64
 6   longitude    107611 non-null  float64
 7   mode         107596 non-null  object 
 8   shelter      107596 non-null  object 
 9   stop_type    107596 non-null  object 
 10  datekey      107611 non-null  int64  
 11  time_period  107611 non-null  object 
 12  route_name   107611 non-null  object 
 13  serviceday   107611 non-null  object 
 14  total_ons    107405 non-null  float64
 15  total_offs   107404 non-null  float64
 16  days         107611 non-null  int64  
 17  avg_ons      107405 non-null  float64
 18  avg_offs     107404 non-

Separating the columns containing latitude & longitude, then printing them.

In [None]:
# Pull latitude and longitude out as single-column DataFrames.
datax = df[["latitude"]]
datay = df[["longitude"]]

In [None]:
datax

Unnamed: 0,latitude
0,40.3858
1,40.3858
2,40.3858
3,40.3858
4,40.3858
...,...
107606,40.3925
107607,40.3925
107608,40.3925
107609,40.3925


In [None]:
datay

Unnamed: 0,longitude
0,-79.7600
1,-79.7600
2,-79.7600
3,-79.7600
4,-79.7600
...,...
107606,-79.8086
107607,-79.8086
107608,-79.8086
107609,-79.8086


In [None]:
# Build a dictionary of neighborhood -> row count: look up the neighborhood for
# each row's latitude and longitude, and add one each time that name comes up.
dict1 = {}

for index, row in df.iterrows():
    try:
        x = fpsnippets.geo_to_neighborhood(row['latitude'], row['longitude'])
        if x is not None:
            dict1[x] = dict1.get(x, 0) + 1
    except Exception:
        # Catch lookup errors only; a bare except would also swallow Ctrl+C.
        print("Exception")

print(dict1)

{'Lincoln-Lemington-Belmar': 1349, 'Perry North': 420, 'Mt. Oliver': 112, 'Oakwood': 112, 'Spring Garden': 128, 'Troy Hill': 313, 'Homewood North': 406, 'East Hills': 540, 'Crafton Heights': 360, 'Brookline': 991, 'Overbrook': 393, 'Friendship': 288, 'East Liberty': 1914, 'Allegheny West': 44, 'Chartiers City': 72, 'Arlington Heights': 12, 'Bedford Dwellings': 132, 'Regent Square': 160, 'Point Breeze': 756, 'Homewood West': 352, 'West End': 270, 'South Shore': 911, 'West Oakland': 2046, 'California-Kirkbride': 279, 'Manchester': 376, 'Allegheny Center': 1046, 'Esplen': 315, 'East Allegheny': 723, 'Crawford-Roberts': 822, 'Polish Hill': 580, 'Central Northside': 569, 'Fineview': 156, 'Central Oakland': 760, 'Point Breeze North': 369, 'Allentown': 263, 'South Side Slopes': 660, 'Knoxville': 428, 'Middle Hill': 408, 'North Shore': 1242, 'St. Clair': 13, 'Northview Heights': 432, 'Spring Hill-City View': 432, 'Upper Hill': 264, 'Bon Air': 132, 'Bluff': 1684, 'Terrace Village': 420, 'Ridgem
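The same stop shows up in many rows (one per route, service day, and month), so the loop above repeats the same coordinate lookup many times. Here is a minimal sketch of caching the lookup so each unique coordinate pair is resolved only once. The function `cached_neighborhood`, the coordinates, and the neighborhood names are all made up so the sketch runs on its own; in the notebook the function body would call `fpsnippets.geo_to_neighborhood` instead.

```python
from collections import Counter
from functools import lru_cache

calls = 0  # how many times the (slow) lookup actually runs


@lru_cache(maxsize=None)
def cached_neighborhood(lat, lon):
    """Hypothetical stand-in for fpsnippets.geo_to_neighborhood."""
    global calls
    calls += 1
    made_up = {
        (40.45, -79.95): "Neighborhood A",
        (40.43, -79.99): "Neighborhood B",
    }
    return made_up.get((lat, lon))  # None when the point matches nothing


# Made-up rows: the same coordinates repeat, as they do in the real table.
rows = [
    (40.45, -79.95), (40.45, -79.95), (40.45, -79.95),
    (40.43, -79.99), (40.43, -79.99),
    (40.00, -80.50),  # matches no neighborhood
]

counts = Counter()
for lat, lon in rows:
    name = cached_neighborhood(lat, lon)
    if name is not None:
        counts[name] += 1

print(dict(counts))  # row count per neighborhood
print(calls)         # lookups performed (one per unique coordinate pair)
```

The counts come out the same as with the plain loop; only the number of lookups changes.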

Now that every row has been labeled by **neighborhood**, the totals can point me to the best neighborhood according to the data. I am now going to sort them.

In [None]:
# Sorting the dictionary by count, from least to greatest.
sort = dict(sorted(dict1.items(), key=operator.itemgetter(1)))
print(sort)

{'Arlington Heights': 12, 'Ridgemont': 12, 'St. Clair': 13, 'Swisshelm Park': 24, 'New Homestead': 36, 'Allegheny West': 44, 'East Carnegie': 48, 'Chartiers City': 72, 'Arlington': 84, 'Hays': 84, 'Banksville': 96, 'Mt. Oliver': 112, 'Oakwood': 112, 'Fairywood': 120, 'Spring Garden': 128, 'Bedford Dwellings': 132, 'Bon Air': 132, 'Beechview': 132, 'Duquesne Heights': 144, 'Fineview': 156, 'Regent Square': 160, 'Chateau': 168, 'Windgap': 252, 'Westwood': 252, 'Allentown': 263, 'Upper Hill': 264, 'West End': 270, 'Glen Hazel': 270, 'California-Kirkbride': 279, 'Friendship': 288, 'Stanton Heights': 304, 'Troy Hill': 313, 'Esplen': 315, 'Homewood West': 352, 'Crafton Heights': 360, 'Summer Hill': 360, 'Point Breeze North': 369, 'Lincoln Place': 369, 'Manchester': 376, 'Overbrook': 393, 'Elliott': 404, 'Homewood North': 406, 'Middle Hill': 408, 'Perry North': 420, 'Terrace Village': 420, 'Knoxville': 428, 'Northview Heights': 432, 'Spring Hill-City View': 432, 'Carrick': 472, 'Highland Park
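The sorted output starts with two neighborhoods at 12, so the first entry alone does not tell the whole story. A small sketch of listing every neighborhood that shares the lowest count, using the first four entries copied from the output above (the variable names are my own):

```python
# First few entries copied from the sorted output.
counts = {
    "Arlington Heights": 12,
    "Ridgemont": 12,
    "St. Clair": 13,
    "Swisshelm Park": 24,
}

lowest = min(counts.values())
# Keep every neighborhood whose count equals the minimum, not just the first.
tied = sorted(name for name, n in counts.items() if n == lowest)

print(lowest, tied)
```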

In [None]:
# I am treating the neighborhood with the lowest count as the winner, but this can be changed.

So... according to the data, the Pittsburgh neighborhood with the fewest bus stop usage records is **Arlington Heights**! (Ridgemont ties it at 12 and simply comes second in the sorted output.)
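One thing to keep in mind: the dictionary counts rows of the table per neighborhood, not riders. If the goal is the fewest people boarding, one option is to add up the `avg_ons` column per neighborhood instead. This is only a rough sketch on made-up rows. The `neighborhood` column is an assumption (it would hold the result of the latitude/longitude lookup for each row), and summing across service days and months is a crude measure, not a true ridership figure.

```python
import pandas as pd

# Made-up rows; "neighborhood" is a hypothetical column holding the
# looked-up neighborhood name for each row of the bus stop table.
sample = pd.DataFrame({
    "neighborhood": ["A", "A", "B", "B", "B"],
    "serviceday": ["Weekday", "Sat", "Weekday", "Sat", "Sun"],
    "avg_ons": [3.0, 1.0, 0.5, 0.25, None],  # avg_ons has some missing values
})

# Sum average boardings per neighborhood (missing values are skipped),
# then sort from least to greatest.
boardings = sample.groupby("neighborhood")["avg_ons"].sum().sort_values()

print(boardings)
```

With this measure, a neighborhood with many lightly used stops could rank below one with a few busy stops, which may be closer to what "fun to drive in" is after.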