# Peer-graded Assignment: Capstone Project - The Battle of Neighborhoods

Now that you have been equipped with the skills and the tools to use location data to explore a geographical location, over the course of two weeks, you will have the opportunity to be as creative as you want and come up with an idea to leverage the Foursquare location data to explore or compare neighborhoods or cities of your choice or to come up with a problem that you can use the Foursquare location data to solve. If you cannot think of an idea or a problem, here are some ideas to get you started:

1. In Module 3, we explored New York City and the city of Toronto and segmented and clustered their neighborhoods. Both cities are very diverse and are the financial capitals of their respective countries. One interesting idea would be to compare the neighborhoods of the two cities and determine how similar or dissimilar they are. Is New York City more like Toronto or Paris or some other multicultural city? I will leave it to you to refine this idea.
2. In a city of your choice, if someone is looking to open a restaurant, where would you recommend that they open it? Similarly, if a contractor is trying to start their own business, where would you recommend that they set up their office?

These are just a couple of the many ideas and problems that can be solved using location data in addition to other datasets. No matter what you decide to do, make sure to provide sufficient justification of why you think what you want to do or solve is important and why a client or a group of people would be interested in your project.

# Week 1
## 1.) Business Problem

People who rely on public transportation often go to places within walking distance of bus or train stops. They care not only about getting to and from particular neighborhoods but also about which businesses they can reach. My goal is to determine which types of restaurants are within walking distance of the Port Authority bus stops in Pittsburgh, PA. Beyond listing the restaurants near each stop, I will look at which bus routes serve which cuisines (that is, which neighborhoods offer which types of cuisine). I will also examine which neighborhoods are most accessible by public transportation: for each neighborhood, I will count how many bus routes run through it and rank the neighborhoods accordingly.

## 2.) Data

For the data, I will use the Foursquare API to look up the venues near a given location.

From the API, I will use the explore endpoint to search near each bus stop. I will record each business returned; if a business has already been recorded from a previous bus stop, I will add the current stop to the list of stops close to that business. Below is an example of using the Foursquare API to search within a 2-kilometer radius of a point in a Pittsburgh neighborhood.

### Foursquare API example data

In [19]:
import json, requests
import pandas as pd

In [7]:
CLIENT_ID = 'UZNFARAB5RPK3VRKCEPIBJIU0YUBE4EMDMUOWL3FK2OBNLK4' # your Foursquare ID
CLIENT_SECRET = 'R3KLBEUMMU5A3FTOX4WW1SRJHKED1SQXGMSYTFXHFJ1PEXRE' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

LIMIT = 100    # maximum number of venues to return
radius = 2000  # search radius in meters

lat, lng = 40.442540, -79.914514  # center of the search

# Build the request URL for the explore endpoint
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
results = requests.get(url).json()

Your credentials:
CLIENT_ID: UZNFARAB5RPK3VRKCEPIBJIU0YUBE4EMDMUOWL3FK2OBNLK4
CLIENT_SECRET: R3KLBEUMMU5A3FTOX4WW1SRJHKED1SQXGMSYTFXHFJ1PEXRE


In [16]:
results = results['response']['groups'][0]['items']

In [25]:
# Collect (name, category, latitude, longitude) for each returned venue
rows = [(v['venue']['name'],
         v['venue']['categories'][0]['name'],
         v['venue']['location']['lat'],
         v['venue']['location']['lng'])
        for v in results]

nearby_venues = pd.DataFrame(rows,
                             columns=['Venue','Category','Venue Latitude','Venue Longitude'])

In [26]:
nearby_venues

| | Venue | Category | Venue Latitude | Venue Longitude |
|---|---|---|---|---|
| 0 | Five Points Artisan Bakeshop | Bakery | 40.444477 | -79.916964 |
| 1 | CAFE 33 | Asian Restaurant | 40.437643 | -79.919236 |
| 2 | Hidden Harbor | Cocktail Bar | 40.437673 | -79.919409 |
| 3 | Everyday Noodles 天天見麵 | Noodle House | 40.438159 | -79.920124 |
| 4 | Point Brugge Café | New American Restaurant | 40.450181 | -79.913901 |
| 5 | The Manor | Indie Movie Theater | 40.437247 | -79.922673 |
| 6 | Independent Brewing Co | Bistro | 40.437791 | -79.919299 |
| 7 | Rita's Italian Ice & Frozen Custard | Ice Cream Shop | 40.438030 | -79.919771 |
| 8 | Rose Tea Cafe | Taiwanese Restaurant | 40.437924 | -79.920080 |
| 9 | Commonplace Coffee Co. | Coffee Shop | 40.438217 | -79.921907 |
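
The bookkeeping described at the start of this Data section (record each business once and keep track of every stop close to it) could be sketched roughly as follows. This is only an illustration under stated assumptions: the `haversine_m` and `stops_by_venue` helpers, the 400-meter walking threshold, and the three stop locations (`STOP_A`, `STOP_B`, `STOP_C`) are hypothetical and introduced here for the example. The venue coordinates are copied from the sample output above. The real workflow would query the explore endpoint once per stop; this sketch uses straight-line distance instead so that it runs without API access.

```python
import math

WALK_RADIUS_M = 400  # assumed walking-distance threshold (not from the source)

def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in meters between two (lat, lng) points."""
    r = 6371000.0  # mean Earth radius in meters
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def stops_by_venue(stops, venues, radius_m=WALK_RADIUS_M):
    """Map each venue name to the sorted list of stop IDs within radius_m."""
    nearby = {}
    for stop_id, s_lat, s_lng in stops:
        for name, v_lat, v_lng in venues:
            if haversine_m(s_lat, s_lng, v_lat, v_lng) <= radius_m:
                # A venue seen from an earlier stop simply gains another stop ID
                nearby.setdefault(name, set()).add(stop_id)
    return {name: sorted(ids) for name, ids in nearby.items()}

# Hypothetical stop locations, chosen only to exercise the logic
stops = [
    ("STOP_A", 40.442540, -79.914514),
    ("STOP_B", 40.438000, -79.920000),
    ("STOP_C", 40.438500, -79.921500),
]

# Venue coordinates taken from the sample output above
venues = [
    ("Five Points Artisan Bakeshop", 40.444477, -79.916964),
    ("Rose Tea Cafe", 40.437924, -79.920080),
    ("Commonplace Coffee Co.", 40.438217, -79.921907),
]

venue_stops = stops_by_venue(stops, venues)
for name, ids in venue_stops.items():
    print(name, ids)
```

Storing the stop IDs in a set means a venue that shows up near several stops is kept as a single entry, which is the deduplication the text calls for.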


### Bus stop data

The other data required is the location of each bus stop. For this, I use the Port Authority transit stop data, available at https://data.wprdc.org/dataset/port-authority-of-allegheny-county-transit-stops. Unfortunately, the data is distributed in DBF format rather than CSV, so it must be converted first. To do so, I import the csv and dbfread packages, write the table out as a CSV file, and read that file back in.

In [29]:
import csv
from dbfread import DBF

def dbf_to_csv(dbf_table_pth):
    """Convert a DBF file to a CSV with the same name and path (extension aside)."""
    csv_fn = dbf_table_pth[:-4] + ".csv"  # swap the '.dbf' extension for '.csv'
    table = DBF(dbf_table_pth)  # DBF object that iterates over the records
    with open(csv_fn, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(table.field_names)  # header row with the column names
        for record in table:
            writer.writerow(list(record.values()))  # one CSV row per record
    return csv_fn  # path of the CSV file that was written

dbf_path = 'PAAC_Stops_1611.dbf'
new_csv = dbf_to_csv(dbf_path)

stops_df = pd.read_csv(new_csv)
stops_df.head(10)

| | InternalID | Name | ExternalID | Direction | Lat | Long | Time_Point | NewZone | No_Rts_Ser | Routes_161 | Mode | Public_She | Public_Sto |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | S00010 | 10TH AVE AT ANN ST | 11652.0 | Inbound | 40.406334 | -79.908116 | No | 1A | 1.0 | 53L | Bus | No Shelter | Bus Stop |
| 1 | N71237 | 10TH AVE AT GARFIELD ST | 20654.0 | Outbound | 40.606766 | -79.753111 | No | 2 | 1.0 | P10 | Bus | No Shelter | Bus Stop |
| 2 | N71239 | 10TH AVE AT ORMOND ST | 20656.0 | Outbound | 40.608314 | -79.7505 | No | 2 | 1.0 | P10 | Bus | No Shelter | Bus Stop |
| 3 | N71238 | 10TH AVE AT SUMMIT ST | 20655.0 | Outbound | 40.607561 | -79.75176 | No | 2 | 1.0 | P10 | Bus | No Shelter | Bus Stop |
| 4 | S00080 | 12TH AVE AT AMITY ST | 11650.0 | Inbound | 40.404325 | -79.908139 | No | 1A | 2.0 | 53, 53L | Bus | No Shelter | Bus Stop |
| 5 | S00090 | 12TH AVE AT AMITY ST FS | 19384.0 | Outbound | 40.404339 | -79.908339 | No | 1A | 2.0 | 53, 53L | Bus | No Shelter | Bus Stop |
| 6 | E70002 | 16TH ST AT PENN AVE | 18641.0 | Inbound | 40.448759 | -79.987314 | No | 1 | 1.0 | 54 | Bus | No Shelter | Bus Stop |
| 7 | N71245 | 1ST AVE AT BOYD ST | 20649.0 | Inbound | 40.599348 | -79.75546 | No | 2 | 1.0 | P10 | Bus | No Shelter | Bus Stop |
| 8 | N71244 | 1ST AVE AT LOCK ST | 20648.0 | Inbound | 40.600002 | -79.752677 | No | 2 | 1.0 | P10 | Bus | No Shelter | Bus Stop |
| 9 | S00530 | 22ND AVE AT MAIN ST | 11036.0 | Outbound | 40.397642 | -79.901809 | No | 1A | 2.0 | 53, 53L | Bus | No Shelter | Bus Stop |
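
Because the `Routes_161` column packs several routes into one comma-separated string (for example `53, 53L`), it has to be split before routes can be tied to stops. The sketch below shows one possible way to do that in plain Python. The `parse_routes` helper and the `stops_by_route` name are assumptions made for this example, and the rows are a subset of the sample output above.

```python
from collections import defaultdict

# (InternalID, Routes_161) pairs copied from the sample rows above
sample_stops = [
    ("S00010", "53L"),
    ("N71237", "P10"),
    ("S00080", "53, 53L"),
    ("E70002", "54"),
    ("S00530", "53, 53L"),
]

def parse_routes(routes_field):
    """Split a comma-separated Routes_161 value into a list of route names."""
    return [r.strip() for r in routes_field.split(",") if r.strip()]

# Build a route -> list of stop IDs lookup
stops_by_route = defaultdict(list)
for stop_id, routes_field in sample_stops:
    for route in parse_routes(routes_field):
        stops_by_route[route].append(stop_id)

for route in sorted(stops_by_route):
    print(route, stops_by_route[route])
```

With a lookup like this, the venues found near each stop can later be attributed to every route that serves the stop.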


In [31]:
stops_data_dict = pd.read_csv('stopsdatadictionary.csv')
stops_data_dict

| | Field | Description |
|---|---|---|
| 0 | InternalID | ID |
| 1 | Name | Stop Name |
| 2 | ExternalID | External ID |
| 3 | Direction | Direction |
| 4 | Lat | Latitude (y Coordinate) |
| 5 | Long | Longitude (x coordinate) |
| 6 | Time_Point | Is this stop a time point? |
| 7 | NewZone | New Fare Zone |
| 8 | No_Rts_Ser | Routes served by this stop |
| 9 | Routes_161 | Routes |


### Neighborhood Boundaries

The neighborhood boundaries are in the file 'Neighborhoods_with_SNAP_Data.geojson', available at:
https://data.wprdc.org/dataset/neighborhoods-with-snap-data
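
To rank neighborhoods by the number of bus routes passing through them, each stop first has to be assigned to the neighborhood polygon that contains it. The sketch below illustrates the idea with a ray-casting point-in-polygon test in plain Python. Everything in it is hypothetical: the two square "neighborhoods", the `name` property key, the stop coordinates, and the helper names are invented for the example, and the actual GeoJSON file may use different property names and geometry types. It also handles only simple `Polygon` features (outer ring, no holes); a GIS library would probably be a better choice for the real data.

```python
def point_in_ring(lng, lat, ring):
    """Ray-casting test: is (lng, lat) inside the closed ring of [lng, lat] pairs?"""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        # Does the edge (j -> i) cross the horizontal line through the point?
        if (yi > lat) != (yj > lat):
            x_cross = xi + (lat - yi) * (xj - xi) / (yj - yi)
            if lng < x_cross:
                inside = not inside
        j = i
    return inside

def rank_neighborhoods(geojson, stops, name_key="name"):
    """Count distinct routes per neighborhood and sort from most to fewest."""
    routes = {f["properties"][name_key]: set() for f in geojson["features"]}
    for _stop_id, lat, lng, routes_field in stops:
        for feature in geojson["features"]:
            outer_ring = feature["geometry"]["coordinates"][0]  # GeoJSON order is [lng, lat]
            if point_in_ring(lng, lat, outer_ring):
                names = [r.strip() for r in routes_field.split(",") if r.strip()]
                routes[feature["properties"][name_key]].update(names)
                break  # a stop belongs to at most one neighborhood
    return sorted(((n, len(r)) for n, r in routes.items()), key=lambda t: (-t[1], t[0]))

# Hypothetical toy data: two adjacent square neighborhoods
toy_geojson = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"name": "Hood A"},
         "geometry": {"type": "Polygon",
                      "coordinates": [[[0, 0], [2, 0], [2, 2], [0, 2], [0, 0]]]}},
        {"type": "Feature", "properties": {"name": "Hood B"},
         "geometry": {"type": "Polygon",
                      "coordinates": [[[2, 0], [4, 0], [4, 2], [2, 2], [2, 0]]]}},
    ],
}

# Hypothetical stops: (stop ID, latitude, longitude, routes string)
toy_stops = [
    ("S1", 1.0, 1.0, "53, 53L"),
    ("S2", 1.5, 0.5, "53L, 54"),
    ("S3", 1.0, 3.0, "P10"),
    ("S4", 5.0, 5.0, "61A"),  # falls outside both neighborhoods
]

ranking = rank_neighborhoods(toy_geojson, toy_stops)
print(ranking)
```

Collecting routes in a set per neighborhood ensures that a route serving several stops in the same neighborhood is counted only once.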