# Opening a College
### For the city of Toronto

This project considers a hypothetical scenario: an organisation wants to open a college in Toronto, and its management has hired a team of data scientists to find the best location. The team must weigh several characteristics of each neighborhood to decide where the college should be built.

### Description of the School

The college is built around complete self-sufficiency. It provides everything a student needs on campus: a stationery shop, a general store, a gymnasium, sports facilities, a hostel, playing fields, a mess, and so on.

The factors that I would consider while opening up the College are:
* The number of colleges in the neighborhood. (The fewer the better.) - A dominant factor.
* The number of hotels, restaurants and bars. (The fewer the better.) - A dominant factor.
* The transport services (airports, bus stations, train stations etc.) - The more the better.
* The number of active entertainment areas (sports stadiums, basketball stadiums etc.) - The more the better.
* The number of passive entertainment areas (shopping malls, movie theatres etc.) - The fewer the better.
* The number of hard drinks shops. (The fewer the better.) - A very dominant factor.

# Data for the Problem
### Foursquare Locations Data

This problem can be addressed using Foursquare data alone. The Foursquare API returns the venues around any given location in Toronto. By classifying those venues into the categories described above, we can count how many venues of each type are present in each neighborhood.

The last thing to determine is the objective function used to score each location. Not all venue types carry the same weight or have a similar impact on how suitable a location is, so the objective function must attach an appropriate weight, positive or negative, to each category. All of the data it requires can be obtained from Foursquare.

# Code

### Necessary Packages for the Problem

In [1]:
import pandas as pd
from bs4 import BeautifulSoup
import requests
import folium 
import numpy as np

### Getting the Data Frame containing the Postal Code, Borough and Neighborhood from Wikipedia

In [2]:
URL = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
response = requests.get(URL)
soup = BeautifulSoup(response.text, 'html.parser')
table = soup.find('table', {'class' : 'wikitable sortable'})
rows = table.find_all('tr')
columns = ['Postal Code', 'Borough', 'Neighborhood']
df = pd.DataFrame(columns = columns)
for i in range(1, len(rows)):
    row = rows[i].text.split('\n')[1:6:2]
    df.loc[i] = row
df.head()

Unnamed: 0,Postal Code,Borough,Neighborhood
1,M1A,Not assigned,Not assigned
2,M2A,Not assigned,Not assigned
3,M3A,North York,Parkwoods
4,M4A,North York,Victoria Village
5,M5A,Downtown Toronto,"Regent Park, Harbourfront"


### Removing all entries with 'Not Assigned' Borough

In [3]:
# Keep only rows with an assigned borough and renumber the index from 0
df_processed = df[df['Borough'] != 'Not assigned'].reset_index(drop=True)
df_processed.head(12)

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
5,M9A,Etobicoke,"Islington Avenue, Humber Valley Village"
6,M1B,Scarborough,"Malvern, Rouge"
7,M3B,North York,Don Mills
8,M4B,East York,"Parkview Hill, Woodbine Gardens"
9,M5B,Downtown Toronto,"Garden District, Ryerson"


### Reading a CSV file for Longitudes and Latitudes

In [4]:
df2 = pd.read_csv('https://cocl.us/Geospatial_data')
df_processed2 = df_processed.set_index('Postal Code')
df2 = df2.set_index('Postal Code')
df_processed2['Latitude'] = df2['Latitude']
df_processed2['Longitude'] = df2['Longitude']
df_processed2 = df_processed2.reset_index()
df_processed2.head(12)

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
5,M9A,Etobicoke,"Islington Avenue, Humber Valley Village",43.667856,-79.532242
6,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
7,M3B,North York,Don Mills,43.745906,-79.352188
8,M4B,East York,"Parkview Hill, Woodbine Gardens",43.706397,-79.309937
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
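
The same join can also be written with `pd.merge`, which makes the key explicit and can check that each postal code appears only once on each side. The sketch below is a minimal illustration on two rows copied from the table above plus one made-up code (`M0X`, hypothetical) that has no coordinates. The variable names are assumptions, not the notebook's own.

```python
import pandas as pd

# Two rows copied from the neighborhood table, plus a hypothetical code with no coordinates.
neighborhoods = pd.DataFrame({
    'Postal Code': ['M3A', 'M4A', 'M0X'],
    'Borough': ['North York', 'North York', 'Hypothetical'],
    'Neighborhood': ['Parkwoods', 'Victoria Village', 'Nowhere'],
})
coordinates = pd.DataFrame({
    'Postal Code': ['M3A', 'M4A'],
    'Latitude': [43.753259, 43.725882],
    'Longitude': [-79.329656, -79.315572],
})

# A left join keeps every neighborhood; validate raises if a postal code is duplicated.
merged = pd.merge(neighborhoods, coordinates, on='Postal Code',
                  how='left', validate='one_to_one')

# Postal codes that did not find coordinates end up with NaN and can be listed explicitly.
missing = merged[merged['Latitude'].isna()]['Postal Code'].tolist()
print(missing)
```

Listing the unmatched codes before calling the API helps avoid sending requests with missing coordinates.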


### Authentication Information

In [5]:
CLIENT_ID = 'QG11RSZ4WFZK203ONXXRESILIC1EUAWKRA4GTF25VZBH3WPG' # your Foursquare ID
CLIENT_SECRET = 'I3X0AQJGB32S0YS0HF4FGH5R2LI2RLTKL4WLQWOEWDK3DP53' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100
radius = 500
latitude = 43.753259
longitude = -79.329656
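
Credentials written directly into a notebook are visible to anyone who reads it. One common alternative, sketched here under assumptions, is to read them from environment variables and to let `urllib.parse.urlencode` build the query string. The environment variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `build_explore_url` are hypothetical; the endpoint and query parameters are the ones used by the request function in the next section.

```python
import os
from urllib.parse import urlencode

# Hypothetical environment variable names; placeholders keep the sketch runnable.
CLIENT_ID = os.environ.get('FOURSQUARE_CLIENT_ID', 'your-client-id')
CLIENT_SECRET = os.environ.get('FOURSQUARE_CLIENT_SECRET', 'your-client-secret')


def build_explore_url(lat, lng, radius=500, limit=100, version='20180605'):
    """Build the venues/explore URL with URL-encoded query parameters."""
    params = {
        'client_id': CLIENT_ID,
        'client_secret': CLIENT_SECRET,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params)


url = build_explore_url(43.753259, -79.329656)
```

Encoding the parameters this way also avoids malformed URLs if a value ever contains a reserved character.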

### Function for getting the nearby venues from Foursquare API

In [6]:
def getNearbyVenues(names, latitudes, longitudes, postal_codes, boroughs, radius=500):
    
    venues_list=[]
    for name, lat, lng, postal_code, borough in zip(names, latitudes, longitudes, postal_codes, boroughs):
        
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
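        # make the GET request; skip this neighborhood if the request
        # fails or the response does not have the expected structure
        try:
            results = requests.get(url).json()["response"]['groups'][0]['items']
            print(name)
        except (requests.RequestException, KeyError, IndexError, ValueError):
            continue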
        
        venues_list.append([(
            postal_code,
            name,
            borough,
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Postal Code','Neighborhood', 'Borough', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
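
The list comprehension in `getNearbyVenues` assumes that every venue has at least one category. A more defensive flattening step might look like the following sketch. The helper name `extract_venues` and the sample payload are made up for illustration; the payload only mimics the fields the function reads (`response`, `groups`, `items`, `venue`).

```python
def extract_venues(payload):
    """Flatten an explore-style payload into (name, lat, lng, category) tuples."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        # Venues without a category get None instead of raising IndexError.
        category = categories[0]['name'] if categories else None
        rows.append((venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     category))
    return rows


# Hand-written payload shaped like the fields used above (not a real API response).
sample_payload = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Brookbanks Park',
               'location': {'lat': 43.751976, 'lng': -79.33214},
               'categories': [{'name': 'Park'}]}},
    {'venue': {'name': 'Uncategorized Venue',
               'location': {'lat': 43.75, 'lng': -79.33},
               'categories': []}},
]}]}}

rows = extract_venues(sample_payload)
empty = extract_venues({})  # an empty or failed response yields no rows
```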

In [7]:
toronto_venues = getNearbyVenues(df_processed2['Neighborhood'], df_processed2['Latitude'], df_processed2['Longitude'], df_processed2['Postal Code'], df_processed2['Borough'])

Parkwoods
Victoria Village
Regent Park, Harbourfront
Lawrence Manor, Lawrence Heights
Queen's Park, Ontario Provincial Government
Islington Avenue, Humber Valley Village
Malvern, Rouge
Don Mills
Parkview Hill, Woodbine Gardens
Garden District, Ryerson
Glencairn
West Deane Park, Princess Gardens, Martin Grove, Islington, Cloverdale
Rouge Hill, Port Union, Highland Creek
Don Mills
Woodbine Heights
St. James Town
Humewood-Cedarvale
Eringate, Bloordale Gardens, Old Burnhamthorpe, Markland Wood
Guildwood, Morningside, West Hill
The Beaches
Berczy Park
Caledonia-Fairbanks
Woburn
Leaside
Central Bay Street
Christie
Cedarbrae
Hillcrest Village
Bathurst Manor, Wilson Heights, Downsview North
Thorncliffe Park
Richmond, Adelaide, King
Dufferin, Dovercourt Village
Scarborough Village
Fairview, Henry Farm, Oriole
Northwood Park, York University
East Toronto, Broadview North (Old East York)
Harbourfront East, Union Station, Toronto Islands
Little Portugal, Trinity
Kennedy Park, Ionview, East Birchmo

In [8]:
toronto_venues.head(16)

Unnamed: 0,Postal Code,Neighborhood,Borough,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,M3A,Parkwoods,North York,43.753259,-79.329656,Brookbanks Park,43.751976,-79.33214,Park
1,M3A,Parkwoods,North York,43.753259,-79.329656,Variety Store,43.751974,-79.333114,Food & Drink Shop
2,M3A,Parkwoods,North York,43.753259,-79.329656,Corrosion Service Company Limited,43.752432,-79.334661,Construction & Landscaping
3,M4A,Victoria Village,North York,43.725882,-79.315572,Victoria Village Arena,43.723481,-79.315635,Hockey Arena
4,M4A,Victoria Village,North York,43.725882,-79.315572,Portugril,43.725819,-79.312785,Portuguese Restaurant
5,M4A,Victoria Village,North York,43.725882,-79.315572,Tim Hortons,43.725517,-79.313103,Coffee Shop
6,M4A,Victoria Village,North York,43.725882,-79.315572,The Frig,43.727051,-79.317418,French Restaurant
7,M4A,Victoria Village,North York,43.725882,-79.315572,Pizza Nova,43.725824,-79.31286,Pizza Place
8,M5A,"Regent Park, Harbourfront",Downtown Toronto,43.65426,-79.360636,Roselle Desserts,43.653447,-79.362017,Bakery
9,M5A,"Regent Park, Harbourfront",Downtown Toronto,43.65426,-79.360636,Tandem Coffee,43.653559,-79.361809,Coffee Shop


Now that we have the venue Data Frame, we need an objective function to calculate a score for each neighborhood.

First we classify the venue categories into the 6 groups listed at the beginning: Colleges, Hotels (hotels and restaurants), Transport, AE (Active Entertainment Areas), PE (Passive Entertainment Areas) and Bars (bars and hard drinks shops).
To do that, let us first get the list of all venue categories returned by Foursquare.

In [9]:
toronto_venues['Venue Category'].unique()

array(['Park', 'Food & Drink Shop', 'Construction & Landscaping',
       'Hockey Arena', 'Portuguese Restaurant', 'Coffee Shop',
       'French Restaurant', 'Pizza Place', 'Bakery',
       'Distribution Center', 'Spa', 'Restaurant', 'Pub',
       'Breakfast Spot', 'Gym / Fitness Center', 'Historic Site',
       'Farmers Market', 'Dessert Shop', 'Performing Arts Venue',
       'Chocolate Shop', 'Café', 'Mexican Restaurant', 'Yoga Studio',
       'Theater', 'Event Space', 'Ice Cream Shop', 'Shoe Store',
       'Electronics Store', 'Art Gallery', 'Cosmetics Shop', 'Bank',
       'Beer Store', 'Wine Shop', 'Antique Shop', 'Boutique',
       'Furniture / Home Store', 'Vietnamese Restaurant',
       'Clothing Store', 'Accessories Store', "Women's Store",
       'Miscellaneous Shop', 'Italian Restaurant', 'Creperie',
       'Arts & Crafts Store', 'Beer Bar', 'Sushi Restaurant',
       'Burrito Place', 'Japanese Restaurant', 'Hobby Shop', 'Diner',
       'Fried Chicken Joint', 'Smoothie Shop',

### Now we classify all the venue types into 6 categories

In [10]:
hotels = ['Food & Drink Shop', 'Coffee Shop', 'Pizza Place', 'Portuguese Restaurant', 'Asian Restaurant', 
          'Bakery', 'Beer Store', 'Breakfast Spot', 'Café', 'Chocolate Shop','Dessert Shop', 
          'French Restaurant', 'Ice Cream Shop', 'Mexican Restaurant', 'Restaurant', 'Creperie',
          'Vietnamese Restaurant', 'Burrito Place', 'Diner','Fried Chicken Joint', 'Italian Restaurant',
          'Sandwich Place', 'Smoothie Shop','Sushi Restaurant', 'Fast Food Restaurant','Caribbean Restaurant', 
          'Japanese Restaurant', 'Bubble Tea Shop', 'Burger Joint', 'Chinese Restaurant', 
          'Ethiopian Restaurant', 'Hotel','Middle Eastern Restaurant','Modern European Restaurant', 
          'New American Restaurant', 'Ramen Restaurant', 'Seafood Restaurant', 'Steakhouse', 'Tea Room',
          'Thai Restaurant', 'Dim Sum Restaurant','American Restaurant', 'BBQ Joint', 'Belgian Restaurant',
          'Comfort Food Restaurant', 'Food Truck', 'German Restaurant', 'Moroccan Restaurant', 'Poke Place',
          'Vegetarian / Vegan Restaurant', 'Bagel Shop','Cheese Shop', 'Eastern European Restaurant',
          'Fish Market', 'Gourmet Shop', 'Greek Restaurant','Indian Restaurant', 'Korean Restaurant', 
          'Fish & Chips Shop', 'Deli / Bodega','Donut Shop', 'Falafel Restaurant', 'Salad Place','Candy Store',
          'Hakka Restaurant','Mediterranean Restaurant', 'Brazilian Restaurant','Colombian Restaurant', 
          'Cupcake Shop', 'Food Court','Gluten-free Restaurant', 'Juice Bar','Latin American Restaurant', 
          'Noodle House', 'Soup Place', 'Cuban Restaurant', 'Southern / Soul Food Restaurant',
          'Frozen Yogurt Shop', 'Fruit & Vegetable Store','Taco Place', 'Motel', 'Indonesian Restaurant', 
          'Cajun / Creole Restaurant', 'Bed & Breakfast', 'Doner Restaurant', 'Dumpling Restaurant',
          'Filipino Restaurant', 'Snack Place', 'Taiwanese Restaurant', 'Afghan Restaurant','Theme Restaurant']

AE = ['Park', 'Pool', 'Hockey Arena', 'Gym / Fitness Center', 'Historic Site', 'Performing Arts Venue', 
      'Yoga Studio', 'Athletics & Sports', 'Arts & Crafts Store', 'Gym', 'Hobby Shop', 'Music Venue', 
      'Baseball Field', 'Bookstore', 'Comic Shop', 'Lake', 'Other Great Outdoors', 'Sporting Goods Shop', 
      'Golf Course', 'Skating Rink', 'Salon / Barbershop', 'Tailor Shop', 'Field', 'Basketball Stadium',
      'Jazz Club', 'Soccer Field', 'Sports Bar', 'Playground', 'Baseball Stadium', 'Dance Studio', 
      'Climbing Gym', 'Stadium', 'Piano Bar', 'Swim School', 'Garden', 'Tennis Court','Summer Camp', 
      'Opera House', 'River', 'Martial Arts Dojo', 'Skate Park'] 

bars = ['Pub', 'Wine Shop', 'Bar', 'Beer Bar', 'Gastropub', 'Hookah Bar', 'Wine Bar', 'Cocktail Bar', 
        'Irish Pub', 'Liquor Store', 'Nightclub', 'Smoke Shop', 'Hotel Bar', 'Gay Bar', 'Sake Bar',
       'Strip Club']

PE = ['Theater', 'Shopping Mall', 'Toy / Game Store', 'Beach', 'Concert Hall', 'Movie Theater', 
      'Video Game Store', 'Indie Movie Theater', 'Gaming Cafe']

colleges = ['College Auditorium' , 'College Rec Center', 'College Stadium','College Arts Building', 
            'College Gym']

transport = ['Bus Stop', 'Rental Car Location', 'Bus Station', 'Metro Station', 'Train Station', 'Airport',
            'Bus Line', 'Light Rail Station']
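
Membership tests against six separate lists can also be expressed as a single lookup table. The sketch below inverts abbreviated versions of the lists above into a dictionary from venue category to group, records any category that appears in more than one group, and labels a column with `Series.map`. The short group names and the `groups` dictionary are assumptions made for illustration.

```python
import pandas as pd

# Abbreviated versions of the lists above, for illustration only.
groups = {
    'Hotels': ['Coffee Shop', 'Pizza Place', 'Hotel'],
    'AE': ['Park', 'Hockey Arena', 'Gym'],
    'PE': ['Theater', 'Shopping Mall'],
    'Bars': ['Pub', 'Wine Shop', 'Bar'],
    'Colleges': ['College Gym'],
    'Transport': ['Bus Stop', 'Train Station'],
}

# Invert to venue category -> group, noting categories listed in more than one group.
category_to_group = {}
overlaps = []
for group, names in groups.items():
    for name in names:
        if name in category_to_group:
            overlaps.append(name)
        category_to_group[name] = group

venue_categories = pd.Series(['Park', 'Pub', 'Coffee Shop', 'Bank'])
# Categories that belong to no group (here 'Bank') become NaN.
labels = venue_categories.map(category_to_group)
```

A non-empty `overlaps` list would indicate a category assigned to two groups, which would otherwise be resolved silently by whichever list is checked first.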

### Creating the Data Frame with the count of each type of Categorical Venues

In [11]:
venues_count = toronto_venues.copy()

venues_count['Category'] = [np.nan]*len(venues_count)

my_dict = {'Hotel or Restaurant' : 'Hotels', 'Active Entertainment Site' : 'AE', 'College Site' : 'Colleges',
           'Passive Entertainment Site' : 'PE', 'Bars' : 'Bars', 'Transport Facility' : 'Transport'}

for i in range(0, len(venues_count)):
    
    if venues_count.loc[i, 'Venue Category'] in hotels:
        venues_count.loc[i, 'Category'] = 'Hotel or Restaurant'
    elif venues_count.loc[i, 'Venue Category'] in AE:
        venues_count.loc[i, 'Category'] = 'Active Entertainment Site'
    elif venues_count.loc[i, 'Venue Category'] in PE:
        venues_count.loc[i, 'Category'] = 'Passive Entertainment Site'
    elif venues_count.loc[i, 'Venue Category'] in bars:
        venues_count.loc[i, 'Category'] = 'Bars'
    elif venues_count.loc[i, 'Venue Category'] in colleges:
        venues_count.loc[i, 'Category'] = 'College Site'
    elif venues_count.loc[i, 'Venue Category'] in transport:
        venues_count.loc[i, 'Category'] = 'Transport Facility'

venues_count = venues_count.dropna()      
venues_count['Count'] = [1]*len(venues_count)
venues_count = venues_count[['Postal Code', 'Category', 'Count']]
venues_count = venues_count.groupby(['Postal Code', 'Category']).sum().reset_index()

venue_counts = pd.DataFrame(columns = ['Postal Code', 'Hotels', 'AE', 'PE', 'Bars', 'Transport', 'Colleges'])
postal_codes = venues_count['Postal Code'].unique()
venues_count = venues_count.set_index(['Postal Code'])
venue_counts['Postal Code'] = postal_codes
venue_counts[['Hotels', 'AE', 'PE', 'Bars', 'Transport', 'Colleges']] = 0
venue_counts = venue_counts.set_index(['Postal Code'])
for i in range(0, len(postal_codes)):
    postal_code = postal_codes[i]
    if type(venues_count.loc[postal_code, 'Category']) == str:
        venue_counts.loc[postal_code, my_dict[venues_count.loc[postal_code, 'Category']]] = venues_count.loc[postal_code, 'Count']
    else:
        for j,k in zip(list(venues_count.loc[postal_code , 'Category']), list(venues_count.loc[postal_code , 'Count'])):
            venue_counts.loc[postal_code, my_dict[j]] = k
venue_counts = venue_counts.reset_index()
venue_counts = pd.merge(df_processed2, venue_counts, on = 'Postal Code', how = 'inner')
venue_counts.head(16)

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude,Hotels,AE,PE,Bars,Transport,Colleges
0,M3A,North York,Parkwoods,43.753259,-79.329656,1,1,0,0,0,0
1,M4A,North York,Victoria Village,43.725882,-79.315572,4,1,0,0,0,0
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,22,7,2,4,0,0
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763,2,0,0,0,0,0
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,21,5,1,2,0,1
5,M9A,Etobicoke,"Islington Avenue, Humber Valley Village",43.667856,-79.532242,1,0,0,0,0,0
6,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353,1,0,0,0,0,0
7,M3B,North York,Don Mills,43.745906,-79.352188,3,1,0,0,0,0
8,M4B,East York,"Parkview Hill, Woodbine Gardens",43.706397,-79.309937,3,2,0,1,0,0
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937,51,10,4,4,0,1
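
A count table of this shape can also be produced with `pd.crosstab`. The following is a minimal sketch on toy data, assuming one row per classified venue and the short group names as labels; it is not the notebook's own code.

```python
import pandas as pd

# Toy input: one row per classified venue (values are illustrative).
classified = pd.DataFrame({
    'Postal Code': ['M3A', 'M3A', 'M4A', 'M4A', 'M4A'],
    'Category': ['Hotels', 'AE', 'Hotels', 'Hotels', 'AE'],
})
columns = ['Hotels', 'AE', 'PE', 'Bars', 'Transport', 'Colleges']

# Count venues per postal code and group; groups that never occur get a zero column.
counts = (pd.crosstab(classified['Postal Code'], classified['Category'])
            .reindex(columns=columns, fill_value=0))
print(counts)
```

The `reindex` call guarantees that all six columns exist even when a group has no venues in the data.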


### The Objective Function
Each category receives a positive or negative weight according to whether it helps or hurts a location, and the magnitude of the weight reflects how dominant that category is in deciding where to build a college.

Objective Function = 10 x AE + 20 x Transport - 5 x PE - 60 x Bars - Colleges

The Hotels count, although tabulated, does not enter the score computed in the next cell.
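
To make the weighting concrete, the sketch below applies the weights used in the scoring cell (10 for AE, 20 for Transport, -5 for PE, -60 for Bars, -1 for Colleges) to two rows taken from the notebook's tables. The `WEIGHTS` dictionary and the `score` helper are illustrative names, not part of the notebook.

```python
import pandas as pd

# Weights as applied in the scoring cell.
WEIGHTS = {'AE': 10, 'Transport': 20, 'PE': -5, 'Bars': -60, 'Colleges': -1}


def score(counts, weights):
    """Weighted sum of the venue counts, one score per row."""
    return counts[list(weights)].mul(pd.Series(weights), axis='columns').sum(axis=1)


# Counts for M1L and M5A copied from the notebook's tables.
sample = pd.DataFrame(
    {'AE': [2, 7], 'Transport': [4, 0], 'PE': [0, 2], 'Bars': [0, 4], 'Colleges': [0, 0]},
    index=['M1L', 'M5A'],
)
scores = score(sample, WEIGHTS)
# M1L: 10*2 + 20*4 = 100
# M5A: 10*7 - 5*2 - 60*4 = -180
```

Keeping the weights in one dictionary makes it easy to try alternative weightings without rewriting the formula.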

### Applying the Objective Function to the Data and Getting the top 5 Neighborhoods for College Location

In [12]:
venue_counts['Score'] = 10*venue_counts['AE'] + 20*venue_counts['Transport'] - 5*venue_counts['PE'] - 60*venue_counts['Bars'] - venue_counts['Colleges']
venue_counts.sort_values(by = 'Score', ascending = False).head()

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude,Hotels,AE,PE,Bars,Transport,Colleges,Score
44,M1L,Scarborough,"Golden Mile, Clairlea, Oakridge",43.711112,-79.284577,3,2,0,0,4,0,100
98,M7Y,East Toronto,"Business reply mail Processing Centre, South C...",43.662744,-79.321558,4,6,0,0,2,0,100
72,M4R,Central Toronto,"North Toronto West, Lawrence Park",43.715383,-79.405678,10,5,0,0,0,0,50
14,M4C,East York,Woodbine Heights,43.695344,-79.318389,1,3,0,0,1,0,50
60,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879,0,2,0,0,1,0,40


### Getting the Top 10 Neighborhoods for College Location

In [13]:
df_final = venue_counts.sort_values(by = 'Score', ascending = False).head(10)[['Postal Code','Neighborhood', 'Borough', 'Latitude', 'Longitude', 'Score']]
df_final = df_final.reset_index()
df_final = df_final.drop(columns = ['index'])
df_final

Unnamed: 0,Postal Code,Neighborhood,Borough,Latitude,Longitude,Score
0,M1L,"Golden Mile, Clairlea, Oakridge",Scarborough,43.711112,-79.284577,100
1,M7Y,"Business reply mail Processing Centre, South C...",East Toronto,43.662744,-79.321558,100
2,M4R,"North Toronto West, Lawrence Park",Central Toronto,43.715383,-79.405678,50
3,M4C,Woodbine Heights,East York,43.695344,-79.318389,50
4,M4N,Lawrence Park,Central Toronto,43.72802,-79.38879,40
5,M4S,Davisville,Central Toronto,43.704324,-79.38879,35
6,M6E,Caledonia-Fairbanks,York,43.689026,-79.453512,30
7,M6C,Humewood-Cedarvale,York,43.693781,-79.428191,30
8,M2P,York Mills West,North York,43.752758,-79.400049,30
9,M3K,Downsview,North York,43.737473,-79.464763,30
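
Several neighborhoods in this ranking share the same score, so a fixed-size cut may fall inside a group of ties. One way to make that visible, sketched here on scores copied from the table above, is `DataFrame.nlargest` with `keep='all'`, which returns every row tied with the last selected score. The frame name `ranking` is an assumption for illustration.

```python
import pandas as pd

# Postal codes and scores copied from the first six rows of the table above.
ranking = pd.DataFrame({
    'Postal Code': ['M1L', 'M7Y', 'M4R', 'M4C', 'M4N', 'M4S'],
    'Score': [100, 100, 50, 50, 40, 35],
})

# Strict top 3: one of the two neighborhoods scoring 50 is dropped arbitrarily.
top_strict = ranking.nlargest(3, 'Score')

# Top 3 with ties kept: both neighborhoods scoring 50 are returned.
top_with_ties = ranking.nlargest(3, 'Score', keep='all')
```

With `keep='all'` the result can contain more rows than requested, which signals that the cutoff score is shared.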


### Map Visualization

In [14]:
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to the map
markers_colors = []
for lat, lon, post, bor, poi, cluster in zip(df_final['Latitude'], df_final['Longitude'], df_final['Postal Code'], df_final['Borough'], df_final['Neighborhood'], df_final.index):
    label = folium.Popup('{} ({}): {} - Position {}'.format(bor, post, poi, cluster + 1), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        fill=True,
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

# Conclusion

The results of this project are clear. By counting the venues around each location, we identified the 10 areas in Toronto best suited for setting up a college and ranked them in order of preference. The ranking follows directly from the objective function defined above, which weights each venue category by its impact.