# Capstone Project - Toronto Subway Analysis
### Applied Data Science Capstone by IBM/Coursera

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)



## Introduction: Business Problem <a name="introduction"></a>

In this project, we will analyze the **Subway Stations of Toronto**, Canada, on the basis of the **Venues** around them. Toronto is Canada’s largest city and a world leader in areas such as business, finance, technology, entertainment and culture. With a recorded population of 2,731,571 in 2016, it is the most populous city in Canada and the fourth most populous city in North America. The Toronto subway is a rapid transit system serving Toronto and the neighbouring city of Vaughan in Ontario, Canada, operated by the Toronto Transit Commission (TTC).

This report will be helpful to people who are new to Toronto, such as immigrants and people who have moved from other cities.

The Toronto subway has 75 stations. As there are many venues around any subway station, and some stations are a short distance from each other, we will focus on the top 100 venues within a radius of 300 meters of each station.

Toronto residents use the subway frequently, but people who are new to the city don't know what kinds of places they can expect to find around a subway station. Moreover, such information is not readily available.

Taking all of this into account, we can use data science tools and machine learning algorithms to create a map and a chart in which all subway stations are clustered according to the venues around them.
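The text does not yet name a clustering algorithm. Assuming k-means, a common choice for this kind of problem, one way to cluster stations is to describe each station by the share of its nearby venues that falls into each category, then group stations with similar profiles. The stdlib-only sketch below is illustrative: the helper names `category_profiles` and `kmeans` and the sample rows are hypothetical, and a real notebook would more likely use scikit-learn's `KMeans`.

```python
import math
import random
from collections import Counter


def category_profiles(rows):
    """Turn (station, category) pairs into per-station category-share vectors."""
    counts = {}
    for station, category in rows:
        counts.setdefault(station, Counter())[category] += 1
    categories = sorted({category for _, category in rows})
    stations = sorted(counts)
    vectors = []
    for station in stations:
        total = sum(counts[station].values())
        # Each entry is the fraction of this station's venues in that category.
        vectors.append([counts[station][c] / total for c in categories])
    return stations, categories, vectors


def kmeans(vectors, k, max_iter=100, seed=0):
    """Lloyd's k-means: return one cluster label per vector."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]  # initial centroids
    labels = None
    for _ in range(max_iter):
        # Assignment step: each vector joins its nearest centroid (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda j: math.dist(v, centroids[j])) for v in vectors
        ]
        if new_labels == labels:
            break  # converged: no vector changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [v for v, label in zip(vectors, labels) if label == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical (station, venue category) rows, not real Foursquare data.
rows = [
    ("Station A", "Korean Restaurant"), ("Station A", "Korean Restaurant"),
    ("Station A", "Café"),
    ("Station B", "Korean Restaurant"), ("Station B", "Café"),
    ("Station C", "Hotel"), ("Station C", "Hotel"), ("Station C", "Coffee Shop"),
]
stations, categories, vectors = category_profiles(rows)
labels = kmeans(vectors, k=2)
print(dict(zip(stations, labels)))
```

Stations A and B, which are dominated by restaurants and cafés, end up in one cluster and the hotel-heavy Station C in the other. The number of clusters `k` is a modelling choice that would need to be tuned on the real data.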

## Data <a name="data"></a>

Based on the definition of our problem, we need the following data:
* List of all subway stations in Toronto - we will get this data by web scraping the following **Wikipedia** page: https://en.wikipedia.org/wiki/List_of_Toronto_subway_stations
* Geographic coordinates of all subway stations - we will get this data using the **Nominatim API**
* Popular venues around the stations - we will get this data using the **Foursquare API**

### Subway Stations

Let's get the latitude and longitude coordinates of all subway stations.

```python
import pandas as pd
url = 'https://en.wikipedia.org/wiki/List_of_Toronto_subway_stations'
df_list = pd.read_html(url)
df = df_list[1]
Subwaystn = df[['Station']] + ', Toronto'
Subwaystn.head()
```

| | Station |
|---|---|
| 0 | Finch, Toronto |
| 1 | North York Centre, Toronto |
| 2 | Sheppard–Yonge, Toronto |
| 3 | York Mills, Toronto |
| 4 | Lawrence, Toronto |


```python
# Highway 407 station is in Vaughan, not in Toronto, so we update its name before geocoding
Subwaystn.loc[Subwaystn['Station'] == 'Highway 407, Toronto', 'Station'] = 'Highway 407, Vaughan'
```

```python
!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim
```

```
Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.
```



```python
from geopy.extra.rate_limiter import RateLimiter
from geopy.geocoders import Nominatim

geolocator = Nominatim(user_agent="ttc_subway")
# 1 - convenient wrapper that waits at least 1 second between geocoding calls
geocode = RateLimiter(geolocator.geocode, min_delay_seconds=1)
# 2 - create location column
Subwaystn['location'] = Subwaystn['Station'].apply(geocode)
# 3 - create a (latitude, longitude, altitude) tuple from the location column
Subwaystn['point'] = Subwaystn['location'].apply(lambda loc: tuple(loc.point) if loc else None)
# 4 - split point column into latitude, longitude and altitude columns
Subwaystn[['latitude', 'longitude', 'altitude']] = pd.DataFrame(Subwaystn['point'].tolist(), index=Subwaystn.index)
Subwaystn.head()
```

| | Station | location | point | latitude | longitude | altitude |
|---|---|---|---|---|---|---|
| 0 | Finch, Toronto | (Finch, Yonge Street, Willowdale, North York, ... | (43.7812974, -79.4158993, 0.0) | 43.781297 | -79.415899 | 0.0 |
| 1 | North York Centre, Toronto | (North York Centre, Yonge Street, Willowdale, ... | (43.7686787, -79.4126298, 0.0) | 43.768679 | -79.41263 | 0.0 |
| 2 | Sheppard–Yonge, Toronto | (Sheppard-Yonge, 20, Sheppard Avenue West, Wil... | (43.7614518, -79.4109148, 0.0) | 43.761452 | -79.410915 | 0.0 |
| 3 | York Mills, Toronto | (York Mills, Wilson Avenue, St. John, Don Vall... | (43.7440391, -79.406657, 0.0) | 43.744039 | -79.406657 | 0.0 |
| 4 | Lawrence, Toronto | (Lawrence, Wanless Avenue, Lawrence Park, Don ... | (43.7263483, -79.4024743, 0.0) | 43.726348 | -79.402474 | 0.0 |
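Nominatim's public usage policy asks for no more than one request per second, which is why the geocoding cell wraps `geolocator.geocode` in geopy's `RateLimiter` with `min_delay_seconds=1`. The sketch below shows the core idea in plain Python. `MinDelay` is a hypothetical stand-in for illustration only; geopy's own `RateLimiter` also handles retries and errors, which this sketch omits.

```python
import time


class MinDelay:
    """Wrap a function so consecutive calls start at least min_delay_seconds apart."""

    def __init__(self, func, min_delay_seconds=1.0):
        self.func = func
        self.min_delay_seconds = min_delay_seconds
        self._last_call = None  # monotonic timestamp of the previous call

    def __call__(self, *args, **kwargs):
        if self._last_call is not None:
            elapsed = time.monotonic() - self._last_call
            if elapsed < self.min_delay_seconds:
                time.sleep(self.min_delay_seconds - elapsed)
        self._last_call = time.monotonic()
        return self.func(*args, **kwargs)


# Usage with a stand-in "geocoder" so the example runs offline (no network calls).
def fake_geocode(query):
    return query.upper()


limited = MinDelay(fake_geocode, min_delay_seconds=0.05)
start = time.monotonic()
results = [limited(q) for q in ["Finch, Toronto", "York Mills, Toronto", "Union, Toronto"]]
elapsed_total = time.monotonic() - start
print(results, f"{elapsed_total:.2f} s")
```

With three calls and a 0.05 s minimum delay, the loop takes at least about 0.1 s, because only the second and third calls have to wait.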


```python
stn_geo = Subwaystn.drop(['location', 'point', 'altitude'], axis=1)
stn_geo.head()
```

| | Station | latitude | longitude |
|---|---|---|---|
| 0 | Finch, Toronto | 43.781297 | -79.415899 |
| 1 | North York Centre, Toronto | 43.768679 | -79.41263 |
| 2 | Sheppard–Yonge, Toronto | 43.761452 | -79.410915 |
| 3 | York Mills, Toronto | 43.744039 | -79.406657 |
| 4 | Lawrence, Toronto | 43.726348 | -79.402474 |


```python
stn_geo.count()
```

```
Station      75
latitude     75
longitude    75
dtype: int64
```
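`count()` only counts non-null values, so 75 in every column means every station name returned a match. It does not show that each match is the *right* place: a short query such as "Museum, Toronto" could in principle resolve to an unrelated location. A cheap sanity check is to flag points that are missing or fall outside a rough box around Toronto and Vaughan. The box limits below are approximate values assumed for illustration, not taken from the source data, and `outside_box` is a hypothetical helper.

```python
import math

# Rough bounding box around Toronto and Vaughan (assumed, approximate values).
LAT_MIN, LAT_MAX = 43.58, 43.86
LNG_MIN, LNG_MAX = -79.64, -79.11


def outside_box(stations):
    """Return names of stations whose coordinates are missing or outside the box."""
    suspicious = []
    for name, lat, lng in stations:
        # None, or NaN as pandas uses for missing values, counts as missing.
        if lat is None or lng is None or math.isnan(lat) or math.isnan(lng):
            suspicious.append(name)
        elif not (LAT_MIN <= lat <= LAT_MAX and LNG_MIN <= lng <= LNG_MAX):
            suspicious.append(name)
    return suspicious


sample = [
    ("Finch, Toronto", 43.781297, -79.415899),  # coordinates from the output above
    ("Missing example", None, None),            # hypothetical failed geocode
    ("Wrong-city example", 45.4215, -75.6972),  # hypothetical match in Ottawa
]
print(outside_box(sample))
```

In the notebook, the same check could be run as `outside_box(zip(stn_geo['Station'], stn_geo['latitude'], stn_geo['longitude']))`; any flagged station would be worth inspecting on the map.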

```python
address = 'Toronto, Canada'
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))
```

```
The geographical coordinates of Toronto are 43.6534817, -79.3839347.
```


```python
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

!conda install -c conda-forge folium=0.5.0 --yes
import folium # map rendering library
```

```
Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.
```



```python
# create map of Toronto using latitude and longitude values
map_ttc = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(stn_geo['latitude'], stn_geo['longitude'], stn_geo['Station']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='black',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.5,
        parse_html=False).add_to(map_ttc)

map_ttc
```

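```python
# Foursquare credentials (the author's credentials were removed; replace the placeholders with your own)
CLIENT_ID = 'your-client-id'  # placeholder
CLIENT_SECRET = 'your-client-secret'  # placeholder
VERSION = '20210101'  # Foursquare API version
LIMIT = 100  # A default Foursquare API limit value
```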
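The explore request in the next cell assembles its query string by hand with `str.format`. A slightly more robust alternative, sketched below, is to let `urllib.parse.urlencode` escape each parameter and to strip whitespace from the version string, since an accidental space would otherwise be sent inside the `v` parameter. `build_explore_url` is a hypothetical helper; the endpoint and parameter names are copied from the notebook's own request URL.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def build_explore_url(client_id, client_secret, version, lat, lng, radius=300, limit=100):
    """Build a Foursquare v2 venues/explore URL with each parameter URL-encoded."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version.strip(),  # drop stray whitespace such as '20210101 '
        "ll": f"{lat},{lng}",
        "radius": radius,
        "limit": limit,
    }
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)


url = build_explore_url("your-client-id", "your-client-secret", "20210101 ",
                        43.781297, -79.415899)
# Decode the query string again to see exactly what would be sent.
query = parse_qs(urlparse(url).query)
print(query["v"], query["ll"], query["radius"])
```

With `requests`, passing the same dictionary as `requests.get(base_url, params=params)` performs equivalent encoding without building the string manually.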

### Let's explore Venues around Subway Stations

```python
import pandas as pd
import requests  # used for the Foursquare HTTP requests

# Function to get up to LIMIT (100) venues within `radius` meters (default 300) of each subway station
def getNearbyVenues(names, latitudes, longitudes, radius=300):

    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)

        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lng,
            radius,
            LIMIT)

        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']

        # return only relevant information for each nearby venue
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Subway Station',
                  'Station Latitude',
                  'Station Longitude',
                  'Venue',
                  'Venue Latitude',
                  'Venue Longitude',
                  'Venue Category']

    return nearby_venues
```
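The chained indexing in `getNearbyVenues` assumes a specific nesting of the explore response: `response`, then `groups[0]`, then `items`, where each item holds a `venue` with `name`, `location.lat`, `location.lng` and a `categories` list. The sketch below walks a small hand-written payload containing only those keys. The payload is hypothetical (its first venue reuses values from the Finch output), and `extract_venues` is a hypothetical helper that, unlike the original function, records `None` instead of raising `IndexError` if a venue has an empty `categories` list.

```python
def extract_venues(station, lat, lng, payload):
    """Flatten one explore response into rows matching the notebook's seven columns."""
    rows = []
    for item in payload["response"]["groups"][0]["items"]:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Use None rather than failing when a venue lists no category.
        category = categories[0]["name"] if categories else None
        rows.append((
            station, lat, lng,
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            category,
        ))
    return rows


# Hypothetical payload containing only the keys read above.
payload = {
    "response": {
        "groups": [{
            "items": [
                {"venue": {"name": "Pyung Won House",
                           "location": {"lat": 43.779386, "lng": -79.415511},
                           "categories": [{"name": "Korean Restaurant"}]}},
                {"venue": {"name": "Uncategorised venue",
                           "location": {"lat": 43.78, "lng": -79.416},
                           "categories": []}},
            ]
        }]
    }
}
rows = extract_venues("Finch, Toronto", 43.781297, -79.415899, payload)
for row in rows:
    print(row)
```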

```python
Subway_venues = getNearbyVenues(names=stn_geo['Station'],
                                latitudes=stn_geo['latitude'],
                                longitudes=stn_geo['longitude']
                               )
```

```
Finch, Toronto
North York Centre, Toronto
Sheppard–Yonge, Toronto
York Mills, Toronto
Lawrence, Toronto
Eglinton, Toronto
Davisville, Toronto
St. Clair, Toronto
Summerhill, Toronto
Rosedale, Toronto
Bloor–Yonge, Toronto
Wellesley, Toronto
College, Toronto
Dundas, Toronto
Queen, Toronto
King, Toronto
Union, Toronto
St. Andrew, Toronto
Osgoode, Toronto
St. Patrick, Toronto
Queen's Park, Toronto
Museum, Toronto
St. George, Toronto
Spadina, Toronto
Dupont, Toronto
St. Clair West, Toronto
Eglinton West, Toronto
Glencairn, Toronto
Lawrence West, Toronto
Yorkdale, Toronto
Wilson, Toronto
Sheppard West, Toronto
Downsview Park, Toronto
Finch West, Toronto
York University, Toronto
Pioneer Village, Toronto
Highway 407, Vaughan
Vaughan Metropolitan Centre, Toronto
Kipling, Toronto
Islington, Toronto
Royal York, Toronto
Old Mill, Toronto
Jane, Toronto
Runnymede, Toronto
High Park, Toronto
Keele, Toronto
Dundas West, Toronto
Lansdowne, Toronto
Dufferin, Toronto
Ossington, Toronto
Christie, Toronto
B
```

```python
print(Subway_venues.shape)
Subway_venues.head()
```

```
(1756, 7)
```


| | Subway Station | Station Latitude | Station Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Finch, Toronto | 43.781297 | -79.415899 | Pyung Won House | 43.779386 | -79.415511 | Korean Restaurant |
| 1 | Finch, Toronto | 43.781297 | -79.415899 | Burrito Place | 43.781258 | -79.415801 | Burrito Place |
| 2 | Finch, Toronto | 43.781297 | -79.415899 | Jungsoo Nae 정수네 | 43.78359 | -79.41648 | Korean Restaurant |
| 3 | Finch, Toronto | 43.781297 | -79.415899 | Toronto a la Cart - Korean | 43.780563 | -79.415801 | Food Stand |
| 4 | Finch, Toronto | 43.781297 | -79.415899 | Huh Ga Ne 허가네 | 43.779362 | -79.417366 | Korean Restaurant |
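Since every request used `radius=300`, a quick check is to compute each venue's distance from its station. The haversine formula below gives the great-circle distance, assuming a spherical Earth with a mean radius of 6,371 km, which is accurate enough at this scale. `haversine_m` is a hypothetical helper name.

```python
import math


def haversine_m(lat1, lng1, lat2, lng2, earth_radius_m=6_371_000):
    """Great-circle distance in metres between two (latitude, longitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lng2 - lng1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * earth_radius_m * math.asin(math.sqrt(a))


# Finch station and the first venue in the output above (Pyung Won House).
d = haversine_m(43.781297, -79.415899, 43.779386, -79.415511)
print(f"{d:.0f} m, within 300 m: {d <= 300}")
```

For this pair the distance comes out at roughly 215 m. Applied to every row of `Subway_venues`, the same function would show whether any returned venue lies beyond the requested radius.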


```python
Subway_venues.groupby('Subway Station').count()
```

| Subway Station | Station Latitude | Station Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|
| Bathurst, Toronto | 38 | 38 | 38 | 38 | 38 | 38 |
| Bay, Toronto | 88 | 88 | 88 | 88 | 88 | 88 |
| Bayview, Toronto | 22 | 22 | 22 | 22 | 22 | 22 |
| Bessarion, Toronto | 8 | 8 | 8 | 8 | 8 | 8 |
| Bloor–Yonge, Toronto | 23 | 23 | 23 | 23 | 23 | 23 |
| ... | ... | ... | ... | ... | ... | ... |
| Wilson, Toronto | 8 | 8 | 8 | 8 | 8 | 8 |
| Woodbine, Toronto | 22 | 22 | 22 | 22 | 22 | 22 |
| York Mills, Toronto | 9 | 9 | 9 | 9 | 9 | 9 |
| York University, Toronto | 18 | 18 | 18 | 18 | 18 | 18 |


```python
Subway_venues['Venue Category'].unique()
```

```
array(['Korean Restaurant', 'Burrito Place', 'Food Stand',
       'Smoothie Shop', 'Café', 'Coffee Shop', 'Bank', 'Sandwich Place',
       'Restaurant', 'Hot Dog Joint', 'Bubble Tea Shop', 'Gym',
       'Salad Place', 'Pharmacy', 'Food Court', 'Bus Station', 'Bus Line',
       'Japanese Restaurant', 'Ramen Restaurant', 'Steakhouse',
       'Movie Theater', 'Grocery Store', 'Plaza', 'Theater',
       'Ice Cream Shop', 'Juice Bar', 'Shopping Mall', 'Pool',
       'Sushi Restaurant', 'Arts & Crafts Store', 'Pet Store',
       'Electronics Store', 'Fast Food Restaurant', 'Pizza Place',
       'Vietnamese Restaurant', 'Discount Store', 'Hotel',
       'History Museum', 'Park', 'Burger Joint', 'Thai Restaurant',
       'Bakery', 'Poke Place', 'Poutine Place', 'Department Store',
       'Fried Chicken Joint', 'Pub', 'Bar', 'Karaoke Bar', 'Gas Station',
       'Breakfast Spot', 'Italian Restaurant', 'Hobby Shop',
       'Seafood Restaurant', 'Gastropub', 'Spa', 'Cosmetics Shop',
       'Deli /
```

```python
print('There are {} unique categories.'.format(Subway_venues['Venue Category'].nunique()))
```

```
There are 244 unique categories.
```
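With this many categories, a compact way to summarise each station is to list its most frequent venue categories. The sketch below does this with `collections.Counter`; `top_categories` is a hypothetical helper, and the input rows are the five Finch venues from the `head()` output above.

```python
from collections import Counter


def top_categories(rows, n=3):
    """Map each station to its n most common venue categories."""
    per_station = {}
    for station, category in rows:
        per_station.setdefault(station, Counter())[category] += 1
    # most_common orders ties by first appearance, so results are deterministic.
    return {station: [category for category, _ in counts.most_common(n)]
            for station, counts in per_station.items()}


# The five Finch venues from the head() output above.
rows = [
    ("Finch, Toronto", "Korean Restaurant"),
    ("Finch, Toronto", "Burrito Place"),
    ("Finch, Toronto", "Korean Restaurant"),
    ("Finch, Toronto", "Food Stand"),
    ("Finch, Toronto", "Korean Restaurant"),
]
print(top_categories(rows, n=2))
```

Run on the full `Subway_venues` table (for example with `zip(Subway_venues['Subway Station'], Subway_venues['Venue Category'])`), this would give a per-station summary that makes the later clusters easier to interpret.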


## Methodology <a name="methodology"></a>

## Analysis <a name="analysis"></a>

## Results and Discussion <a name="results"></a>

## Conclusion <a name="conclusion"></a>