#### *IBM Data Science Certification Capstone Project - authored by Julien Girault, data scientist scholar*
# Find your Favourite Tube Station before Moving to London!

## Project introduction

Many French people move from Paris to London to improve their English and to experience life abroad.

But where should they settle down?

When moving to a new city, people usually choose their new address based on public transport: they look for the lines that can take them quickly and directly from their new home to their office.

__The "London tube" and the "Paris métro" networks are comparable in size, with roughly 400 stations each__... So the next question is: how can newcomers know what kind of surroundings to expect around each station? How can they decide where to settle down? Which station should they prefer among so many possibilities?

This project will give French newcomers a list of the London tube stations that best fit their taste, based on a comparison with something they already know: Paris stations!

For each London tube station, we will list the Paris stations that are most similar.

To reach that goal, we will train a clustering model to find which London stations are similar to each Paris station.
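The clustering step itself is not implemented in this part of the notebook. As a minimal sketch, assuming k-means is applied to per-station category-frequency vectors (scikit-learn's `KMeans` would be the usual tool; plain NumPy is used here to keep the example dependency-free), the London-to-Paris matching could look like this. The stations, their two-column (Bar, Park) vectors and the helpers `kmeans` and `similar_paris_stations` are all hypothetical.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means on the rows of X; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # assign every row to its nearest centroid (squared Euclidean distance)
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # move each centroid to the mean of its rows (an empty cluster keeps its centroid)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical stations described by (Bar, Park) frequencies
cities = np.array(["London", "London", "Paris", "Paris"])
stations = np.array(["L-bar", "L-park", "P-bar", "P-park"])
X = np.array([[0.9, 0.1], [0.1, 0.9], [0.8, 0.2], [0.2, 0.8]])

# Cluster both cities together so that the clusters are shared by London and Paris
labels, _ = kmeans(X, k=2)


def similar_paris_stations(london_station):
    """Paris stations that fall in the same cluster as the given London station."""
    label = labels[(cities == "London") & (stations == london_station)][0]
    return stations[(cities == "Paris") & (labels == label)].tolist()


print(similar_paris_stations("L-bar"))
```

Because both cities are clustered together, stations that land in the same cluster share a similar venue profile regardless of their city; the number of clusters `k` remains a design choice to tune.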


## Data Description

For this project, I downloaded the station coordinates published by the public transport services of London [1] and Paris [2].

In the main section of the project, I will query the Foursquare API with each station's coordinates to list the venues within roughly a five-minute walk (a 300 m radius).
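To relate a search radius to a "five-minute walk", the sketch below computes the straight-line (haversine) distance between a station and a venue, and the matching walking time. The 5 km/h walking speed and the helper names `haversine_m` and `walking_minutes` are assumptions; real walking routes are longer than straight lines. The two points are Temple station and the "Two Temple Place" venue, taken from the tables later in this notebook.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle (straight-line) distance in metres between two WGS84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def walking_minutes(distance_m, speed_kmh=5.0):
    """Walking time in minutes at an assumed constant walking speed."""
    return distance_m / (speed_kmh * 1000 / 60)


# Temple station -> Two Temple Place
d = haversine_m(51.510474, -0.112644, 51.511523, -0.112236)
print(round(d), "m,", round(walking_minutes(d), 1), "min")
print(round(walking_minutes(300), 1), "min for a 300 m radius")
```

At 5 km/h, a 300 m radius corresponds to about 3.6 minutes in a straight line, which leaves some margin for the detours of real streets.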

It may be necessary to match some venue categories manually, for instance to turn "pubs" into "bars" or "bed & breakfast" into "hotel".

In [1]:
import io

import pandas as pd
import requests

# London data (CSV file):
req = requests.get("https://www.whatdotheyknow.com/request/512947/response/1238210/attach/3/Stations%2020180921.csv.txt?cookie_passthrough=1").content
df_london = pd.read_csv(io.StringIO(req.decode('utf-8')))

# Paris data (JSON export, metro stations only):
req = requests.get("https://data.iledefrance-mobilites.fr/explore/dataset/emplacement-des-gares-idf/download/?format=json&refine.mode=Metro&timezone=Europe/Berlin&lang=fr").content
df_paris = pd.read_json(io.StringIO(req.decode('utf-8')))
df_london  # only the last expression of a cell is displayed


# next steps: retrieve the coordinates,
# then run the clustering,
# and see what it gives without modifying anything

Unnamed: 0,FID,OBJECTID,NAME,EASTING,NORTHING,LINES,NETWORK,Zone,x,y
0,0,78,Temple,530959,180803,"District, Circle",London Underground,1,-0.112644,51.510474
1,1,79,Blackfriars,531694,180893,"District, Circle",London Underground,1,-0.102020,51.511114
2,2,80,Mansion House,532354,180932,"District, Circle",London Underground,1,-0.092495,51.511306
3,3,81,Cannon Street,532611,180900,"District, Circle",London Underground,1,-0.088801,51.510963
4,4,82,Monument,532912,180824,"District, Circle",London Underground,1,-0.084502,51.510209
...,...,...,...,...,...,...,...,...,...,...
474,474,381,Crystal Palace,534111,170555,,London Overground,0,-0.071128,51.417633
475,475,393,Brentwood,559339,193033,,TfL Rail,0,0.301589,51.613071
476,476,394,Shenfield,561361,194981,,TfL Rail,0,0.331671,51.630001
477,477,363,Woodgrange Park,541821,185350,,London Overground,0,0.045631,51.548716


In [2]:
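save = df_paris.copy()  # backup of the raw Paris data (without .copy(), `save` would just be another name for the same object)
#df_paris = save.copy()  # uncomment to restore the raw data

df_paris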

Unnamed: 0,datasetid,recordid,fields,geometry,record_timestamp
0,emplacement-des-gares-idf,723289fe50c959f7e63d75b17870762aa8eaddd4,"{'res_stif': 110.0, 'cod_ligf': 14.0, 'nom_iv'...","{'type': 'Point', 'coordinates': [2.3891158073...",2020-01-15T11:22:48.576+01:00
1,emplacement-des-gares-idf,2f98f2e1ee73e414cf64bae428caa96ba114be23,"{'res_stif': 110.0, 'cod_ligf': 4.0, 'nom_iv':...","{'type': 'Point', 'coordinates': [2.3209981919...",2020-01-15T11:22:48.576+01:00
2,emplacement-des-gares-idf,dafc950d65ec51317aa65aaba7a12fb5a0cfc396,"{'res_stif': 110.0, 'cod_ligf': 15.0, 'nom_iv'...","{'type': 'Point', 'coordinates': [2.2781616712...",2020-01-15T11:22:48.576+01:00
3,emplacement-des-gares-idf,5bc1c5091428bb56801455343b0cd58fca8d4179,"{'res_stif': 110.0, 'cod_ligf': 1.0, 'nom_iv':...","{'type': 'Point', 'coordinates': [2.3693205849...",2020-01-15T11:22:48.576+01:00
4,emplacement-des-gares-idf,57e3f6dfb550022b892dba19b4215ce6fad943d2,"{'res_stif': 110.0, 'cod_ligf': 12.0, 'nom_iv'...","{'type': 'Point', 'coordinates': [2.4009185381...",2020-01-15T11:22:48.576+01:00
...,...,...,...,...,...
378,emplacement-des-gares-idf,f36ebb69c7054f9269845b16b0bf74b5b0a13b55,"{'res_stif': 110.0, 'cod_ligf': 16.0, 'nom_iv'...","{'type': 'Point', 'coordinates': [2.3807187858...",2020-01-15T11:22:48.576+01:00
379,emplacement-des-gares-idf,32f66e415cedd37378d7f27c49db63c94b95d477,"{'res_stif': 110.0, 'cod_ligf': 8.0, 'nom_iv':...","{'type': 'Point', 'coordinates': [2.3046744405...",2020-01-15T11:22:48.576+01:00
380,emplacement-des-gares-idf,6ae65004ec961e1c9ec0c0f1df90993dfdbf81b0,"{'res_stif': 110.0, 'cod_ligf': 6.0, 'nom_iv':...","{'type': 'Point', 'coordinates': [2.3242560326...",2020-01-15T11:22:48.576+01:00
381,emplacement-des-gares-idf,e1305b572bd8551817c872857a8931a2b1ae2d0f,"{'res_stif': 110.0, 'cod_ligf': 4.0, 'nom_iv':...","{'type': 'Point', 'coordinates': [2.3325430273...",2020-01-15T11:22:48.576+01:00


In [3]:
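# Flatten the nested JSON: keep the station name, and split the GeoJSON
# point [longitude, latitude] into two columns (renamed in the next cell).
# Latitude is extracted first, before "geometry" is overwritten.
df_paris["fields"] = df_paris["fields"].apply(lambda f: f["nom_iv"])
df_paris["record_timestamp"] = df_paris["geometry"].apply(lambda g: g["coordinates"][1])
df_paris["geometry"] = df_paris["geometry"].apply(lambda g: g["coordinates"][0])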
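The cell above flattens the nested `fields` and `geometry` dictionaries column by column. A minimal alternative sketch uses `pandas.json_normalize` on the raw list of records (for instance the parsed response of `requests.get(...).json()`, assuming the export is a list of records, as the `read_json` output suggests). The two records below are hypothetical: they keep only the keys used in this notebook, with station names and rounded coordinates taken from the combined table.

```python
import pandas as pd

# Hypothetical records shaped like the Île-de-France Mobilités export
records = [
    {"recordid": "a1", "fields": {"nom_iv": "Voltaire"},
     "geometry": {"type": "Point", "coordinates": [2.38072, 48.8575]}},
    {"recordid": "b2", "fields": {"nom_iv": "Wagram"},
     "geometry": {"type": "Point", "coordinates": [2.30467, 48.8838]}},
]

flat = pd.json_normalize(records)  # nested dicts become dotted column names
flat["Longitude"] = flat["geometry.coordinates"].str[0]  # GeoJSON order is [lon, lat]
flat["Latitude"] = flat["geometry.coordinates"].str[1]
paris = flat.rename(columns={"fields.nom_iv": "Station"})[["Station", "Longitude", "Latitude"]]
print(paris)
```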

In [4]:
# Harmonise both dataframes to the same columns: City, Station, Longitude, Latitude
df_paris = df_paris.drop(columns=["datasetid"])
df_paris = df_paris.rename(columns={"recordid": "City", "fields": "Station",
                                    "geometry": "Longitude", "record_timestamp": "Latitude"})
df_london = df_london.rename(columns={"FID": "City", "x": "Longitude", "y": "Latitude", "NAME": "Station"})
df_london = df_london.drop(columns=["NORTHING", "Zone", "LINES", "OBJECTID", "EASTING", "NETWORK"])
df_london["City"] = "London"
df_paris["City"] = "Paris"
# DataFrame.append was removed in pandas 2.0; pd.concat replaces it
df = pd.concat([df_london, df_paris[df_london.columns]], ignore_index=True)

In [5]:
df

Unnamed: 0,City,Station,Longitude,Latitude
0,London,Temple,-0.112644,51.5105
1,London,Blackfriars,-0.10202,51.5111
2,London,Mansion House,-0.0924953,51.5113
3,London,Cannon Street,-0.088801,51.511
4,London,Monument,-0.0845023,51.5102
...,...,...,...,...
857,Paris,Voltaire,2.38072,48.8575
858,Paris,Wagram,2.30467,48.8838
859,Paris,Saint-Lazare,2.32426,48.8758
860,Paris,Trinité d'Estienne d'Orves,2.33254,48.8763


In [6]:
# check for stations that have the same name in both cities:
res=df_paris.merge(df_london, on="Station", how="inner")
res

Unnamed: 0,City_x,Station,Longitude_x,Latitude_x,City_y,Longitude_y,Latitude_y
0,Paris,Temple,2.36154,48.8667,London,-0.112644,51.510474
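Temple is the only name shared by both cities, but it matters: any step that groups venues by `Station` alone merges London's Temple and Paris' Temple into a single row. A minimal sketch on a toy table (hypothetical venues) shows why grouping on the `(City, Station)` pair is safer.

```python
import pandas as pd

# Toy venue table: "Temple" exists in both cities
venues = pd.DataFrame({
    "City": ["London", "London", "Paris", "Paris", "Paris"],
    "Station": ["Temple", "Temple", "Temple", "Temple", "Voltaire"],
    "Venue Category": ["Park", "Bar", "Café", "Café", "Bar"],
})

# Grouping by name alone merges the two Temple stations into one group
by_name = venues.groupby("Station").size()
print(by_name)

# Grouping by (City, Station) keeps them apart
by_key = venues.groupby(["City", "Station"]).size()
print(by_key)
```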


## Methodology

## References

[1] [London Tube Stations List (CSV)](https://www.whatdotheyknow.com/request/512947/response/1238210/attach/3/Stations%2020180921.csv.txt?cookie_passthrough=1)

[2] [Paris Transport Stations List (JSON)](https://data.iledefrance-mobilites.fr/explore/dataset/emplacement-des-gares-idf/download/?format=json&refine.mode=Metro&timezone=Europe/Berlin&lang=fr)


### 1. Load Foursquare credentials and define the getNearbyVenues function (borrowed from a previous lab)


In [8]:
# Read the Foursquare credentials from a local text file
# (client ID on line 5, client secret on line 8)
with open("/resources/IBM Capstone Project/Coursera_Capstone/credentials.txt", "r") as f:
    lines = f.readlines()

CLIENT_ID = lines[4].strip()
CLIENT_SECRET = lines[7].strip()
VERSION = '20180605'  # Foursquare API version
LIMIT = 200  # maximum number of venues requested per station

def getNearbyVenues(stations_done_list, venues_list, names, cities, latitudes, longitudes, radius=300):
    """Query the Foursquare 'explore' endpoint around each station.

    Results are appended to venues_list (one list of tuples per station).
    Stations already processed (keyed by city + name) are skipped, so the
    function can be re-run after an interruption without repeating calls.
    """
    for name, city, lat, lng in zip(names, cities, latitudes, longitudes):
        print(name)  # progress indicator
        if (city + name) in stations_done_list:
            continue
        url = 'https://api.foursquare.com/v2/venues/explore'
        params = {
            'client_id': CLIENT_ID,
            'client_secret': CLIENT_SECRET,
            'v': VERSION,
            'll': '{},{}'.format(lat, lng),
            'radius': radius,
            'limit': LIMIT,
        }
        # make the GET request and keep the recommended items
        results = requests.get(url, params=params).json()["response"]['groups'][0]['items']
        # keep only the relevant information for each nearby venue
        venues_list.append([(
            city,
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])
        stations_done_list.append(city + name)
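The list comprehension in `getNearbyVenues` reads `v['venue']['categories'][0]`, which raises an `IndexError` for a venue without any category. A defensive variant of the parsing step could look like the sketch below; `parse_explore_items` is a hypothetical helper, and the item layout it expects is inferred from the keys used in `getNearbyVenues`, not from the Foursquare documentation.

```python
def parse_explore_items(items, city, name, lat, lng):
    """Turn 'explore' items into venue tuples, skipping venues without a category.

    Hypothetical helper; expects items shaped like
    {'venue': {'name': ..., 'location': {'lat': ..., 'lng': ...}, 'categories': [...]}}
    """
    rows = []
    for item in items:
        venue = item.get("venue", {})
        categories = venue.get("categories") or []
        if not categories:
            continue  # categories[0] would raise IndexError here
        location = venue.get("location", {})
        rows.append((city, name, lat, lng, venue.get("name"),
                     location.get("lat"), location.get("lng"), categories[0]["name"]))
    return rows


sample = [
    {"venue": {"name": "Temple Gardens", "location": {"lat": 51.511154, "lng": -0.111472},
               "categories": [{"name": "Park"}]}},
    {"venue": {"name": "Uncategorised spot", "location": {"lat": 51.51, "lng": -0.11},
               "categories": []}},
]
rows = parse_explore_items(sample, "London", "Temple", 51.510474, -0.112644)
print(rows)
```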

    

<a id='item2'></a>


### 2. Get top venues for each Station


In [9]:
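# Create the accumulators only if they do not exist yet, so that
# getNearbyVenues can resume where it stopped after an interruption
if 'venues_list' not in globals():
    venues_list = []
    stations_done_list = []

In [10]:
# Reset the accumulators to restart the collection from scratch
venues_list = []
stations_done_list = []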

In [13]:
getNearbyVenues(stations_done_list, venues_list,
                names=df['Station'],
                cities=df['City'],
                latitudes=df['Latitude'],
                longitudes=df['Longitude'],
                radius=300)

Temple
Blackfriars
Mansion House
Cannon Street
Monument
Tower Hill
Aldgate
Liverpool Street
Moorgate
Barbican
Farringdon
King's Cross St. Pancras
Euston Square
Great Portland Street
Baker Street
Edgware Road (Circle Line)
Paddington
Bayswater
High Street Kensington
Gloucester Road
South Kensington
Victoria
Pimlico
Warren Street
Queensway
Hyde Park Corner
Knightsbridge
Leicester Square
Covent Garden
Russell Square
Earl's Court
Notting Hill Gate
Lancaster Gate
Marble Arch
Bond Street
Bank
Oxford Circus
Holborn
Chancery Lane
St. Paul's
Paddington
Embankment
Westminster
Euston
Waterloo
Green Park
Paddington (H&C Line)
Piccadilly Circus
Charing Cross
Lambeth North
Edgware Road (Bakerloo)
Marylebone
Regent's Park
Sloane Square
St. James's Park
Goodge Street
Tottenham Court Road
Borough
Old Street
Angel
Shoreditch High Street
Aldgate East
Southwark
London Bridge
Liverpool Street
Liverpool Street
TOWER GATEWAY - DLR
BANK - DLR
Euston
East Putney
Putney Bridge
Parsons Green
White City
Shepherd'

In [None]:
#stations_done_list

In [14]:
# flatten the per-station lists into a single dataframe
nearby_venues = pd.DataFrame(
    [item for venue_list in venues_list for item in venue_list],
    columns=['City', 'Station', 'Station Latitude', 'Station Longitude',
             'Venue', 'Venue Latitude', 'Venue Longitude', 'Venue Category'])

In [15]:
nearby_venues

Unnamed: 0,City,Station,Station Latitude,Station Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,London,Temple,51.510474,-0.112644,Two Temple Place,51.511523,-0.112236,History Museum
1,London,Temple,51.510474,-0.112644,Temple Gardens,51.511154,-0.111472,Park
2,London,Temple,51.510474,-0.112644,The Southbank Observation Point,51.508297,-0.111180,Scenic Lookout
3,London,Temple,51.510474,-0.112644,HQS Wellington,51.510679,-0.112214,Boat or Ferry
4,London,Temple,51.510474,-0.112644,The Queen's Walk,51.508308,-0.110853,Scenic Lookout
...,...,...,...,...,...,...,...,...
17348,Paris,Trinité d'Estienne d'Orves,48.876318,2.332543,It Rocks,48.874297,2.335070,Burger Joint
17349,Paris,Créteil-Pointe du Lac,48.768715,2.464565,Stade Dominique-Duvauchelle,48.768030,2.461323,Soccer Stadium
17350,Paris,Créteil-Pointe du Lac,48.768715,2.464565,Arrêt Pointe du Lac [393],48.768706,2.463673,Bus Stop
17351,Paris,Créteil-Pointe du Lac,48.768715,2.464565,Ligue de Tennis du Val de Marne,48.767096,2.462937,Stadium


Check the size of the resulting dataframe


In [17]:
print(nearby_venues.shape)
nearby_venues.head(10)

(17353, 8)


Unnamed: 0,City,Station,Station Latitude,Station Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,London,Temple,51.510474,-0.112644,Two Temple Place,51.511523,-0.112236,History Museum
1,London,Temple,51.510474,-0.112644,Temple Gardens,51.511154,-0.111472,Park
2,London,Temple,51.510474,-0.112644,The Southbank Observation Point,51.508297,-0.11118,Scenic Lookout
3,London,Temple,51.510474,-0.112644,HQS Wellington,51.510679,-0.112214,Boat or Ferry
4,London,Temple,51.510474,-0.112644,The Queen's Walk,51.508308,-0.110853,Scenic Lookout
5,London,Temple,51.510474,-0.112644,The Edgar Wallace,51.512585,-0.112819,Pub
6,London,Temple,51.510474,-0.112644,Middle Temple,51.512441,-0.111619,Building
7,London,Temple,51.510474,-0.112644,Temple Brew House,51.51294,-0.113029,Pub
8,London,Temple,51.510474,-0.112644,180 The Strand,51.512671,-0.115009,Art Gallery
9,London,Temple,51.510474,-0.112644,Inner Temple Garden,51.512594,-0.110355,Garden


In [27]:
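savevenues = nearby_venues.copy()  # real copy: the category merging below modifies nearby_venues in place
#nearby_venues = savevenues.copy()  # uncomment to restore the raw venues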

## Without any processing, we can see that the two cities do not have enough venue categories in common

In [20]:
pd.set_option('display.max_rows', 500)
nearby_venues.groupby('Venue Category').count().sort_values("City").tail(50)

Unnamed: 0_level_0,City,Station,Station Latitude,Station Longitude,Venue,Venue Latitude,Venue Longitude
Venue Category,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
Dessert Shop,73,73,73,73,73,73,73
Korean Restaurant,74,74,74,74,74,74,74
Ice Cream Shop,76,76,76,76,76,76,76
Breakfast Spot,79,79,79,79,79,79,79
Gym,80,80,80,80,80,80,80
Art Gallery,80,80,80,80,80,80,80
Middle Eastern Restaurant,84,84,84,84,84,84,84
Garden,84,84,84,84,84,84,84
Tea Room,85,85,85,85,85,85,85
Creperie,86,86,86,86,86,86,86
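One way to quantify "not enough in common" is the Jaccard index of the two cities' category sets: the number of categories found in both cities divided by the number found in either. The sketch below computes it on toy data; `category_overlap` is a hypothetical helper, and it only looks at distinct category names, not at how many venues each category has.

```python
import pandas as pd


def category_overlap(venues):
    """Return (shared categories, Jaccard index) between the cities in `venues`."""
    cats = {city: set(group["Venue Category"]) for city, group in venues.groupby("City")}
    shared = set.intersection(*cats.values())
    union = set.union(*cats.values())
    return shared, len(shared) / len(union)


toy = pd.DataFrame({
    "City": ["London", "London", "London", "Paris", "Paris", "Paris"],
    "Venue Category": ["Pub", "Café", "Park", "Café", "Park", "Brasserie"],
})
shared, jaccard = category_overlap(toy)
print(sorted(shared), round(jaccard, 2))
```

Running such a check on `nearby_venues` before and after merging close categories would show whether the merging actually brings the two cities closer.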


In [28]:
pd.set_option('display.max_rows', 500)

# Merge close Foursquare categories so that London and Paris venues become comparable
merge_into = {
    "Restaurant": ["French Restaurant", "Creperie", "English Restaurant",
                   "Auvergne Restaurant", "Alsatian Restaurant"],
    "Gastropub": ["Brasserie"],
    "Gym": ["Gym / Fitness Center"],
    "Bar": ["Pub", "Wine Bar", "Bistro", "Cocktail Bar"],
    "Café": ["Coffee Shop"],
    "Supermarket": ["Grocery Store"],
    "Italian Restaurant": ["Pizza Place"],
    "Japanese Restaurant": ["Sushi Restaurant"],
    "Asian Restaurant": ["Chinese Restaurant", "Vietnamese Restaurant", "Thai Restaurant",
                         "Korean Restaurant", "Cambodian Restaurant", "Dim Sum Restaurant",
                         "Cantonese Restaurant", "Szechuan Restaurant", "Ramen Restaurant"],
    "Mediterranean Restaurant": ["Lebanese Restaurant", "Israeli Restaurant", "Corsican Restaurant",
                                 "Falafel Restaurant", "Greek Restaurant"],
    "Middle Eastern Restaurant": ["Turkish Restaurant", "Moroccan Restaurant",
                                  "Kebab Restaurant", "Persian Restaurant"],
    "Spanish Restaurant": ["Tapas Restaurant"],
}
# invert to {old category: new category}
category_map = {old: new for new, olds in merge_into.items() for old in olds}
nearby_venues["Venue Category"] = nearby_venues["Venue Category"].replace(category_map)

# drop transport-related venues: bus, metro and gas stations, platforms and bus stops
nearby_venues = nearby_venues[~nearby_venues["Venue Category"].str.contains('Station')]
nearby_venues = nearby_venues[~nearby_venues["Venue Category"].isin(["Platform", "Bus Stop"])]

nearby_venues.groupby(['Venue Category', 'City']).count().sort_values("Station", ascending=False).head(500)
#nearby_venues.groupby(['Venue Category', 'City']).count().sort_values("Station", ascending=False).to_csv("categories_to_merge")




Unnamed: 0_level_0,Unnamed: 1_level_0,Station,Station Latitude,Station Longitude,Venue,Venue Latitude,Venue Longitude
Venue Category,City,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
Restaurant,Paris,1277,1277,1277,1277,1277,1277
Café,London,1110,1110,1110,1110,1110,1110
Bar,London,806,806,806,806,806,806
Bar,Paris,720,720,720,720,720,720
Hotel,Paris,659,659,659,659,659,659
Italian Restaurant,Paris,470,470,470,470,470,470
Asian Restaurant,Paris,408,408,408,408,408,408
Italian Restaurant,London,405,405,405,405,405,405
Supermarket,London,390,390,390,390,390,390
Café,Paris,361,361,361,361,361,361


## So, let's make it better

Check how many venues were returned for each Station



In [29]:
venues=nearby_venues
venues.groupby('Station').count().head()

Unnamed: 0_level_0,City,Station Latitude,Station Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Station,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
ABBEY ROAD - DLR,1,1,1,1,1,1,1
ALL SAINTS - DLR,13,13,13,13,13,13,13
Abbesses,70,70,70,70,70,70,70
Acton Central,5,5,5,5,5,5,5
Acton Town,10,10,10,10,10,10,10


In [36]:
len(venues['Station'].unique())

751

In [37]:
# vs

len(df['Station'])

862

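<p style="color:red;">IMPORTANT: 111 fewer station names appear in the venue table than there are rows in the station table. Part of this gap comes from duplicate names (Paddington, Liverpool Street and Euston appear several times in the London list, and Temple exists in both cities); the rest corresponds to stations that returned no venue, or only transport venues that were filtered out above. We will need to create a "no venue" cluster for those Stations.</p>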
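To count the stations without any venue exactly, the comparison has to be made on unique `(City, Station)` pairs on both sides. A minimal sketch on hypothetical toy tables, using a left merge with `indicator=True`; in the notebook, the same logic would run on `df` and `venues`.

```python
import pandas as pd

# Hypothetical station list with a repeated name and a name shared by both cities
stations = pd.DataFrame({
    "City": ["London", "London", "London", "Paris"],
    "Station": ["Paddington", "Paddington", "Temple", "Temple"],
})
# Hypothetical venue table (one row per venue)
venues = pd.DataFrame({
    "City": ["London", "Paris", "Paris"],
    "Station": ["Paddington", "Temple", "Temple"],
    "Venue": ["Venue A", "Venue B", "Venue C"],
})

# Count stations on the (City, Station) key rather than on raw rows or bare names
unique_stations = stations.drop_duplicates(subset=["City", "Station"])
with_venues = venues[["City", "Station"]].drop_duplicates()

# A left merge with indicator=True flags the stations that have no matching venue
check = unique_stations.merge(with_venues, on=["City", "Station"], how="left", indicator=True)
no_venue = check.loc[check["_merge"] == "left_only", ["City", "Station"]]

print(len(stations), "rows,", len(unique_stations), "unique stations,", len(no_venue), "without venues")
print(no_venue)
```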

#### Let's find out how many unique categories can be curated from all the returned venues

In [38]:
print('There are {} unique categories, here\'s the list:'.format(venues['Venue Category'].nunique()))
venues['Venue Category'].unique()

There are 406 unique categories, here's the list:


array(['History Museum', 'Park', 'Scenic Lookout', 'Boat or Ferry', 'Bar',
       'Building', 'Art Gallery', 'Garden', 'Mediterranean Restaurant',
       'Café', 'Asian Restaurant', 'South American Restaurant',
       'Modern European Restaurant', 'Burger Joint', 'Butcher',
       'Scandinavian Restaurant', 'Hotel', 'Seafood Restaurant',
       'Udon Restaurant', 'Indian Restaurant', 'Bakery', 'Gym',
       'Pedestrian Plaza', 'Restaurant', 'Bookstore',
       'Italian Restaurant', 'Steakhouse', 'Boutique', 'Donut Shop',
       'Nightclub', 'New American Restaurant', 'Sandwich Place',
       'Fast Food Restaurant', 'Juice Bar', 'Japanese Restaurant',
       'Burrito Place', 'Monument / Landmark', 'Trail', 'Beer Bar',
       'Historic Site', 'Spanish Restaurant', 'Market',
       'Middle Eastern Restaurant', 'Castle', 'Exhibit',
       'General Entertainment', 'Hotel Bar', 'Argentinian Restaurant',
       'Gastropub', 'American Restaurant', 'Boxing Gym',
       'Organic Grocery', 'Break

<a id='item3'></a>


### 3. Analyze the typology of venues found in each Station


We will now describe each Station by the venue categories found around it.
First, we one-hot encode the venue categories; then, for each Station, we compute the relative frequency of each category, i.e. the share of the Station's venues that belong to it.
Finally, we will keep the 10 most common categories of each Station as 10 new columns of the Station dataframe we built at first.

e.g. If 29 of the 100 venues found around a Station were Italian restaurants, the frequency of "Italian Restaurant" for that Station would be 0.29.
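The encoding, averaging and top-N steps described above can be sketched on a toy table. Stations A and B and the helper `top_categories` are hypothetical; the notebook would use `n=10`.

```python
import pandas as pd

venues = pd.DataFrame({
    "Station": ["A", "A", "A", "B", "B"],
    "Venue Category": ["Bar", "Bar", "Café", "Park", "Café"],
})

# One-hot encode the categories (dtype=int gives 0/1 instead of booleans)
onehot = pd.get_dummies(venues["Venue Category"], dtype=int)
onehot.insert(0, "Station", venues["Station"])

# The mean of the 0/1 columns per station is the relative frequency of each category
freq = onehot.groupby("Station").mean()
print(freq.round(2))


def top_categories(row, n=2):
    """Names of the n most frequent categories for one station (ties keep column order)."""
    return list(row.nlargest(n).index)


top = freq.apply(top_categories, axis=1)
print(top)
```

Each row of `freq` sums to 1, so the frequencies of stations with many or few venues stay comparable.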

In [39]:
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

# one hot encoding (dtype=int keeps 0/1 values; pandas 2.0 and later return booleans by default)
stations_onehot = pd.get_dummies(venues[['Venue Category']], prefix="", prefix_sep="", dtype=int)
# add Station column back to dataframe
stations_onehot['Station'] = venues['Station']

# move Station column to the first column
fixed_columns = [stations_onehot.columns[-1]] + list(stations_onehot.columns[:-1])
stations_onehot = stations_onehot[fixed_columns]

stations_onehot.head()

Unnamed: 0,Station,Acai House,Accessories Store,Afghan Restaurant,African Restaurant,Airport,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Aquarium,Arcade,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Arts & Entertainment,Asian Restaurant,Athletics & Sports,Australian Restaurant,Austrian Restaurant,Auto Dealership,Auto Garage,Automotive Shop,BBQ Joint,Baby Store,Badminton Court,Bagel Shop,Bakery,Bank,Bar,Basketball Court,Basque Restaurant,Bed & Breakfast,Beer Bar,Beer Garden,Beer Store,Belgian Restaurant,Betting Shop,Big Box Store,Bike Rental / Bike Share,Bike Shop,Boarding House,Boat or Ferry,Bookstore,Border Crossing,Botanical Garden,Boutique,Bowling Alley,Boxing Gym,Brazilian Restaurant,Breakfast Spot,Breton Restaurant,Brewery,Bridal Shop,Bridge,Bubble Tea Shop,Buddhist Temple,Buffet,Building,Bulgarian Restaurant,Burger Joint,Burgundian Restaurant,Burrito Place,Bus Line,Business Service,Butcher,Cafeteria,Café,Cajun / Creole Restaurant,Camera Store,Canal,Canal Lock,Candy Store,Caribbean Restaurant,Casino,Castle,Caucasian Restaurant,Ch'ti Restaurant,Champagne Bar,Cheese Shop,Chocolate Shop,Church,Cigkofte Place,Circus,Climbing Gym,Clothing Store,College Cafeteria,College Gym,College Quad,College Rec Center,College Residence Hall,College Science Building,Colombian Restaurant,Comedy Club,Comfort Food Restaurant,Comic Shop,Community Center,Concert Hall,Construction & Landscaping,Convenience Store,Convention Center,Cosmetics Shop,Costume Shop,Cricket Ground,Cuban Restaurant,Cultural Center,Cupcake Shop,Currency Exchange,Cycle Studio,Czech Restaurant,Dance Studio,Deli / Bodega,Dentist's Office,Department Store,Design Studio,Dessert Shop,Diner,Discount Store,Dive Bar,Doner Restaurant,Donut Shop,Dry Cleaner,Dumpling Restaurant,Duty-free Shop,Eastern European Restaurant,Electronics Store,Embassy / Consulate,Empanada Restaurant,Entertainment Service,Escape Room,Ethiopian Restaurant,Event Service,Event 
Space,Exhibit,Fabric Shop,Farm,Farmers Market,Fast Food Restaurant,Field,Filipino Restaurant,Film Studio,Fish & Chips Shop,Fish Market,Flea Market,Flower Shop,Fondue Restaurant,Food,Food & Drink Shop,Food Court,Food Service,Food Stand,Food Truck,Fountain,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gastropub,Gay Bar,Gelato Shop,General College & University,General Entertainment,German Restaurant,Gift Shop,Gluten-free Restaurant,Golf Course,Gourmet Shop,Gym,Gym Pool,Gymnastics Gym,Halal Restaurant,Harbor / Marina,Hardware Store,Hawaiian Restaurant,Health & Beauty Service,Health Food Store,Herbs & Spices Store,Hill,Himalayan Restaurant,Historic Site,History Museum,Hobby Shop,Hockey Field,Hookah Bar,Hostel,Hot Dog Joint,Hotel,Hotel Bar,Hotel Pool,IT Services,Ice Cream Shop,Indian Chinese Restaurant,Indian Restaurant,Indie Movie Theater,Indie Theater,Indonesian Restaurant,Indoor Play Area,Insurance Office,Intersection,Iraqi Restaurant,Irish Pub,Italian Restaurant,Japanese Curry Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Jiangxi Restaurant,Juice Bar,Karaoke Bar,Kids Store,Kitchen Supply Store,Korean BBQ Restaurant,Lake,Laser Tag,Latin American Restaurant,Laundromat,Lawyer,Leather Goods Store,Library,Lighting Store,Lingerie Store,Liquor Store,Locksmith,Lounge,Lyonese Bouchon,Mac & Cheese Joint,Malay Restaurant,Mamak Restaurant,Market,Martial Arts School,Massage Studio,Mediterranean Restaurant,Memorial Site,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Mini Golf,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Monument / Landmark,Motel,Motorcycle Shop,Movie Theater,Multiplex,Museum,Music Store,Music Venue,Nail Salon,Nature Preserve,Neighborhood,New American Restaurant,Newsstand,Night Market,Nightclub,Noodle House,North Indian Restaurant,Office,Okonomiyaki Restaurant,Opera House,Optical Shop,Organic Grocery,Other Nightlife,Other 
Repair Shop,Outdoor Sculpture,Outdoor Supply Store,Outdoors & Recreation,Pakistani Restaurant,Paper / Office Supplies Store,Park,Parking,Pastry Shop,Pedestrian Plaza,Performing Arts Venue,Perfume Shop,Peruvian Restaurant,Pet Café,Pet Store,Pharmacy,Photography Lab,Photography Studio,Piano Bar,Pie Shop,Pier,Pilates Studio,Planetarium,Playground,Plaza,Poke Place,Polish Restaurant,Pool,Pool Hall,Pop-Up Shop,Portuguese Restaurant,Post Office,Print Shop,Provençal Restaurant,Record Shop,Recording Studio,Recreation Center,Rental Car Location,Reservoir,Residential Building (Apartment / Condo),Resort,Restaurant,River,Road,Rock Club,Roller Rink,Romanian Restaurant,Roof Deck,Rugby Pitch,Russian Restaurant,Sake Bar,Salad Place,Salon / Barbershop,Sandwich Place,Savoyard Restaurant,Scandinavian Restaurant,Scenic Lookout,Science Museum,Sculpture Garden,Seafood Restaurant,Shaanxi Restaurant,Shabu-Shabu Restaurant,Shandong Restaurant,Shanxi Restaurant,Shoe Store,Shop & Service,Shopping Mall,Shopping Plaza,Skate Park,Smoke Shop,Smoothie Shop,Snack Place,Soba Restaurant,Soccer Field,Soccer Stadium,Social Club,Soup Place,South American Restaurant,South Indian Restaurant,Southern / Soul Food Restaurant,Southwestern French Restaurant,Souvenir Shop,Souvlaki Shop,Spa,Spanish Restaurant,Speakeasy,Sporting Goods Shop,Sports Bar,Sports Club,Sri Lankan Restaurant,Stables,Stadium,Steakhouse,Street Art,Street Fair,Street Food Gathering,Student Center,Supermarket,Taco Place,Tailor Shop,Taiwanese Restaurant,Tattoo Parlor,Taxi Stand,Tea Room,Tech Startup,Tennis Court,Tennis Stadium,Theater,Theme Park Ride / Attraction,Theme Restaurant,Thrift / Vintage Store,Tourist Information Center,Toy / Game Store,Track,Track Stadium,Trail,Trattoria/Osteria,Tunnel,Udon Restaurant,Vape Store,Vegetarian / Vegan Restaurant,Venezuelan Restaurant,Veterinarian,Video Game Store,Vineyard,Warehouse Store,Watch Shop,Water Park,Waterfront,Whisky Bar,Windmill,Wine Shop,Winery,Wings Joint,Women's Store,Yoga Studio,Zoo 
Exhibit
0,Temple,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Temple,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Temple,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Temple,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Temple,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


Let's examine the size of the new dataframe.


In [31]:
stations_onehot.shape

(16648, 414)

#### Next, let's group the rows by Station, taking the mean frequency of occurrence of each category


In [32]:
stations_grouped = stations_onehot.groupby('Station').mean().reset_index()
stations_grouped.head(3)

Unnamed: 0,Station,Acai House,Accessories Store,Afghan Restaurant,African Restaurant,Airport,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Aquarium,Arcade,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Arts & Entertainment,Asian Restaurant,Athletics & Sports,Australian Restaurant,Austrian Restaurant,Auto Dealership,Auto Garage,Auto Workshop,Automotive Shop,BBQ Joint,Baby Store,Badminton Court,Bagel Shop,Bakery,Bank,Bar,Basketball Court,Basque Restaurant,Bed & Breakfast,Beer Bar,Beer Garden,Beer Store,Belgian Restaurant,Betting Shop,Big Box Store,Bike Rental / Bike Share,Bike Shop,Boarding House,Boat Rental,Boat or Ferry,Bookstore,Botanical Garden,Boutique,Bowling Alley,Boxing Gym,Brasserie,Brazilian Restaurant,Breakfast Spot,Breton Restaurant,Brewery,Bridal Shop,Bridge,Bubble Tea Shop,Buddhist Temple,Buffet,Building,Bulgarian Restaurant,Burger Joint,Burgundian Restaurant,Burrito Place,Bus Line,Business Service,Butcher,Cafeteria,Café,Cajun / Creole Restaurant,Camera Store,Canal,Canal Lock,Candy Store,Caribbean Restaurant,Casino,Castle,Caucasian Restaurant,Ch'ti Restaurant,Champagne Bar,Cheese Shop,Chocolate Shop,Church,Cigkofte Place,Circus,Climbing Gym,Clothing Store,College Cafeteria,College Gym,College Rec Center,College Residence Hall,College Science Building,College Theater,Colombian Restaurant,Comedy Club,Comfort Food Restaurant,Comic Shop,Community Center,Concert Hall,Construction & Landscaping,Convenience Store,Convention Center,Cosmetics Shop,Costume Shop,Cuban Restaurant,Cultural Center,Cupcake Shop,Currency Exchange,Cycle Studio,Czech Restaurant,Dance Studio,Deli / Bodega,Dentist's Office,Department Store,Design Studio,Dessert Shop,Diner,Discount Store,Dive Bar,Doner Restaurant,Donut Shop,Dry Cleaner,Dumpling Restaurant,Duty-free Shop,Eastern European Restaurant,Electronics Store,Embassy / Consulate,Empanada Restaurant,Escape Room,Ethiopian Restaurant,Event Space,Exhibit,Fabric Shop,Farm,Farmers Market,Fast Food Restaurant,Field,Filipino Restaurant,Film Studio,Fish & Chips Shop,Fish Market,Flea Market,Flower Shop,Fondue Restaurant,Food,Food & Drink Shop,Food Court,Food Service,Food Stand,Food Truck,Fountain,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Funeral Home,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gastropub,Gay Bar,Gelato Shop,General College & University,General Entertainment,German Restaurant,Gift Shop,Gluten-free Restaurant,Golf Course,Gourmet Shop,Gym,Gym Pool,Gymnastics Gym,Halal Restaurant,Harbor / Marina,Hardware Store,Hawaiian Restaurant,Health & Beauty Service,Health Food Store,Herbs & Spices Store,Hill,Himalayan Restaurant,Historic Site,History Museum,Hobby Shop,Hockey Field,Home Service,Hookah Bar,Hostel,Hot Dog Joint,Hotel,Hotel Bar,Housing Development,IT Services,Ice Cream Shop,Indian Chinese Restaurant,Indian Restaurant,Indie Movie Theater,Indie Theater,Indonesian Restaurant,Indoor Play Area,Insurance Office,Intersection,Iraqi Restaurant,Irish Pub,Italian Restaurant,Japanese Curry Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Jiangxi Restaurant,Juice Bar,Karaoke Bar,Kids Store,Kitchen Supply Store,Korean BBQ Restaurant,Lake,Laser Tag,Latin American Restaurant,Laundromat,Lawyer,Leather Goods Store,Library,Lighting Store,Lingerie Store,Liquor Store,Locksmith,Lounge,Luggage Store,Lyonese Bouchon,Mac & Cheese Joint,Malay Restaurant,Mamak Restaurant,Market,Martial Arts School,Massage Studio,Medical Center,Medical Supply Store,Mediterranean Restaurant,Memorial Site,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Mini Golf,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Monument / Landmark,Motorcycle Shop,Movie Theater,Multiplex,Museum,Music Store,Music Venue,Nail Salon,Nature Preserve,Neighborhood,New American Restaurant,Newsstand,Nightclub,Noodle House,North Indian Restaurant,Office,Okonomiyaki Restaurant,Opera House,Optical Shop,Organic Grocery,Other Nightlife,Other Repair Shop,Outdoor Sculpture,Outdoors & Recreation,Pakistani Restaurant,Paper / Office Supplies Store,Park,Parking,Pastry Shop,Pedestrian Plaza,Performing Arts Venue,Perfume Shop,Peruvian Restaurant,Pet Café,Pet Store,Pharmacy,Photography Lab,Photography Studio,Piano Bar,Pie Shop,Pier,Pilates Studio,Planetarium,Playground,Plaza,Poke Place,Polish Restaurant,Pool,Pool Hall,Pop-Up Shop,Portuguese Restaurant,Post Office,Print Shop,Provençal Restaurant,Record Shop,Recording Studio,Recreation Center,Recycling Facility,Rental Car Location,Rental Service,Reservoir,Residential Building (Apartment / Condo),Resort,Restaurant,River,Road,Rock Club,Romanian Restaurant,Roof Deck,Rugby Pitch,Russian Restaurant,Sake Bar,Salad Place,Salon / Barbershop,Sandwich Place,Savoyard Restaurant,Scandinavian Restaurant,Scenic Lookout,Science Museum,Sculpture Garden,Seafood Restaurant,Shaanxi Restaurant,Shabu-Shabu Restaurant,Shandong Restaurant,Shanxi Restaurant,Shipping Store,Shoe Repair,Shoe Store,Shop & Service,Shopping Mall,Shopping Plaza,Skate Park,Skating Rink,Smoke Shop,Smoothie Shop,Snack Place,Soba Restaurant,Soccer Field,Soccer Stadium,Social Club,Soup Place,South American Restaurant,South Indian Restaurant,Southern / Soul Food Restaurant,Southwestern French Restaurant,Souvenir Shop,Souvlaki Shop,Spa,Spanish Restaurant,Speakeasy,Sporting Goods Shop,Sports Bar,Sports Club,Sri Lankan Restaurant,Stables,Stadium,Steakhouse,Street Art,Street Fair,Street Food Gathering,Strip Club,Student Center,Supermarket,Taco Place,Tailor Shop,Taiwanese Restaurant,Tattoo Parlor,Taxi,Taxi Stand,Tea Room,Tech Startup,Tennis Court,Tennis Stadium,Theater,Theme Park Ride / Attraction,Theme Restaurant,Thrift / Vintage Store,Tourist Information Center,Toy / Game Store,Track,Track Stadium,Trail,Trattoria/Osteria,Tunnel,Udon Restaurant,Vape Store,Vegetarian / Vegan Restaurant,Venezuelan Restaurant,Video Game Store,Vineyard,Warehouse Store,Watch Shop,Water Park,Waterfall,Waterfront,Whisky Bar,Windmill,Wine Shop,Winery,Wings Joint,Women's Store,Yoga Studio,Zoo Exhibit
0,ALL SAINTS - DLR,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.133333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.133333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.133333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.133333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Abbesses,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014286,0.0,0.0,0.0,0.057143,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014286,0.028571,0.0,0.185714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014286,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014286,0.014286,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014286,0.0,0.0,0.0,0.014286,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014286,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.014286,0.0,0.0,0.014286,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.057143,0.0,0.014286,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014286,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014286,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.271429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014286,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014286,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014286,0.0,0.0,0.0,0.0,0.0
2,Acton Central,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
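To see why grouping the one-hot rows and taking the mean yields frequencies, here is a minimal sketch on a made-up table (the stations `A` and `B` and their venues are invented for illustration, not taken from the Foursquare data). Each one-hot row marks exactly one venue, so the per-station mean is the share of that station's venues falling in each category, and every row of the result sums to 1.

```python
import pandas as pd

# Hypothetical venues: one row per (station, venue category) pair
venues = pd.DataFrame({
    "Station": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Bar", "Bar", "Café", "Park", "Bakery", "Bar"],
})

# One-hot encode the categories (0/1 integers), then put the station name in front
onehot = pd.get_dummies(venues[["Venue Category"]], prefix="", prefix_sep="", dtype=int)
onehot.insert(0, "Station", venues["Station"])

# Mean of the 0/1 indicators per station = frequency of each category
grouped = onehot.groupby("Station").mean().reset_index()
print(grouped)

# Each station's frequencies sum to 1
row_sums = grouped.drop(columns="Station").sum(axis=1)
print(row_sums.tolist())
```

Station `A` has 4 venues, 2 of them bars, so its `Bar` frequency is 0.5; the same reasoning applies to the 751 real stations and 413 venue categories.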


#### Let's confirm the new size


In [33]:
stations_grouped.shape

(751, 414)

#### To illustrate the process, let's print 3 example stations along with their most common venues


In [36]:
num_top_venues = 7

In [37]:
# print the most frequent venue categories around the first 3 stations
for station in stations_grouped['Station'].head(3):
    print("----" + station + "----")
    temp = stations_grouped[stations_grouped['Station'] == station].T.reset_index()
    temp.columns = ['venue', 'freq']
    temp = temp.iloc[1:]  # drop the 'Station' row
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----ALL SAINTS - DLR----
                venue  freq
0    Asian Restaurant  0.13
1         Supermarket  0.13
2  Italian Restaurant  0.13
3                Café  0.13
4               Hotel  0.07


----Abbesses----
                venue  freq
0          Restaurant  0.27
1                 Bar  0.19
2  Italian Restaurant  0.06
3    Asian Restaurant  0.06
4  Seafood Restaurant  0.03


----Acton Central----
        venue  freq
0         Bar   0.4
1      Bakery   0.2
2        Park   0.2
3   Mini Golf   0.2
4  Acai House   0.0




#### Let's put that into a _pandas_ dataframe


First, let's write a function that sorts a station's venue categories by descending frequency and returns the most common ones.


In [38]:
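def return_most_common_venues(row, num_top_venues):
    # skip the first entry (the station name) and sort the category frequencies
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)

    # return the names of the num_top_venues most frequent categories
    return row_categories_sorted.index.values[0:num_top_venues]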
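A quick usage example on a hand-made row (the station name and frequencies are invented). The row must carry the station name in its first position, which `row.iloc[1:]` skips, and the function returns category names, not their frequencies.

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    # skip the first entry (the station name) and sort the category frequencies
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

# Hypothetical row shaped like one line of stations_grouped
row = pd.Series({"Station": "Example", "Bar": 0.4, "Bakery": 0.2, "Park": 0.3, "Café": 0.1})
top = return_most_common_venues(row, 2)
print(top)
```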

Now let's create a new dataframe holding the top `num_top_venues` venue categories for each station.


In [47]:
import numpy as np

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues (1st, 2nd, 3rd, then 4th, 5th, ...)
columns = ['Station']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
Stations_venues_sorted = pd.DataFrame(columns=columns)
Stations_venues_sorted['Station'] = stations_grouped['Station']

# fill each row with the station's most common venue categories
for ind in np.arange(stations_grouped.shape[0]):
    Stations_venues_sorted.iloc[ind, 1:] = return_most_common_venues(stations_grouped.iloc[ind, :], num_top_venues)
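The loop above calls `return_most_common_venues` once per station. As an alternative sketch (not what this notebook does), the same table can be built in one step with `numpy.argsort` on the frequency matrix. The toy frequencies below are made up; on the real data `grouped` would be `stations_grouped`. When two categories tie, the order may differ from the `sort_values` version.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for stations_grouped: station name + category frequencies
grouped = pd.DataFrame({
    "Station": ["A", "B"],
    "Bar": [0.5, 0.1],
    "Café": [0.2, 0.6],
    "Park": [0.3, 0.3],
})
num_top_venues = 2

freqs = grouped.drop(columns="Station")
# argsort on negated values gives column positions in descending frequency order
top_idx = np.argsort(-freqs.to_numpy(), axis=1)[:, :num_top_venues]
top_names = freqs.columns.to_numpy()[top_idx]

# ordinal suffixes: correct for up to 20 columns
suffixes = {1: "st", 2: "nd", 3: "rd"}
columns = [f"{i}{suffixes.get(i, 'th')} Most Common Venue" for i in range(1, num_top_venues + 1)]

venues_sorted = pd.DataFrame(top_names, columns=columns)
venues_sorted.insert(0, "Station", grouped["Station"].to_numpy())
print(venues_sorted)
```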

<a id='item4'></a>


### 4. Cluster Stations


Run _k_-means to cluster the stations into 15 clusters.


In [48]:
# import k-means from clustering stage
from sklearn.cluster import KMeans

# set number of clusters
kclusters = 15

# keep only the venue-frequency columns as features
stations_grouped_clustering = stations_grouped.drop(columns='Station')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(stations_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:30] 

array([ 6, 11, 10,  1, 11,  0,  6,  6, 11, 11, 10, 11,  0,  6, 11, 10,  1,
       11,  6, 13,  4,  7,  0, 11,  0, 11, 11,  1,  4, 11], dtype=int32)
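For readers new to scikit-learn's `KMeans`, here is a minimal sketch on synthetic data (two invented groups of venue profiles, not the station data). `labels_` holds one cluster id per input row, in the same order as the rows, which is what allows the labels to be attached back to `Stations_venues_sorted` by position; `cluster_centers_` holds the mean profile of each cluster.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic frequency profiles: 3 "bar-heavy" rows and 3 "park-heavy" rows
X = np.array([
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.9, 0.0, 0.1],
    [0.1, 0.1, 0.8],
    [0.0, 0.2, 0.8],
    [0.1, 0.0, 0.9],
])

# n_init is set explicitly to avoid version-dependent defaults
kmeans = KMeans(n_clusters=2, random_state=0, n_init=10).fit(X)

print(kmeans.labels_)                 # one label per row, same order as X
print(kmeans.cluster_centers_.shape)  # (n_clusters, n_features)
```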

Let's add the cluster label of each station to the dataframe of its most common venue categories.


In [49]:
# add clustering labels
Stations_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)
Stations_venues_sorted.head(10)
Stations_venues_sorted['Cluster Labels'].value_counts()


1     171
6     143
11    136
4      84
0      69
10     52
5      41
12     20
7      16
2       5
9       4
14      3
3       3
13      2
8       2
Name: Cluster Labels, dtype: int64
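The cluster sizes are very uneven: several clusters hold only 2 to 5 stations. The notebook does not say how 15 was chosen; one common way to compare candidate values of k is scikit-learn's silhouette score. The sketch below runs it on synthetic blobs, an assumption standing in for `stations_grouped_clustering`; on the real data you would pass that dataframe instead and read the scores as one input among others, not as a definitive answer.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the station/venue frequency matrix
X, _ = make_blobs(n_samples=300, centers=4, n_features=5, random_state=0)

scores = {}
for k in range(2, 9):
    labels = KMeans(n_clusters=k, random_state=0, n_init=10).fit_predict(X)
    # silhouette ranges from -1 to 1; higher means tighter, better-separated clusters
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(scores)
print("best k:", best_k)
```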

In [50]:
Stations_venues_sorted.head(500).sort_values("Cluster Labels")


Unnamed: 0,Cluster Labels,Station,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
137,0,Cheshunt,Convenience Store,Breakfast Spot,River,Indian Restaurant,Hostel
401,0,Mairie des Lilas,Concert Hall,Supermarket,Italian Restaurant,Restaurant,Flea Market
400,0,Mairie de Saint-Ouen,Italian Restaurant,Mediterranean Restaurant,Fast Food Restaurant,Zoo Exhibit,Filipino Restaurant
198,0,EAST INDIA - DLR,Convenience Store,Nature Preserve,Italian Restaurant,Gym,Sandwich Place
391,0,MONTGALLET,Asian Restaurant,Garden,Playground,Supermarket,Sandwich Place
206,0,East Ham,Indian Restaurant,Fast Food Restaurant,Bakery,Sporting Goods Shop,Gym
41,0,Barking,Supermarket,Fast Food Restaurant,Bar,Italian Restaurant,Discount Store
42,0,Barkingside,Soccer Field,Construction & Landscaping,Empanada Restaurant,Escape Room,Ethiopian Restaurant
214,0,Elephant & Castle,Bar,Gym,Asian Restaurant,Supermarket,Music Venue
219,0,Enfield Town,Clothing Store,Café,Bar,Supermarket,Optical Shop


In [None]:
# left-join the station list (df, which holds latitude/longitude) with the cluster label and top venues of each Station
stations_merged = df.join(Stations_venues_sorted.set_index('Station'), on='Station')
stations_merged.head()

Stations without any venues are missing from `Stations_venues_sorted`, so the join leaves their `Cluster Labels` as NaN. Let's put them in one extra cluster, labelled `kclusters`, so that we end up with `kclusters + 1` clusters.

In [None]:
# stations without venues get the next free label, kclusters
stations_merged['Cluster Labels'] = stations_merged['Cluster Labels'].fillna(kclusters).astype('int32')
kclusters = kclusters + 1
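Here is a minimal sketch of what the join and the extra cluster do, on invented data: `DataFrame.join` performs a left join by default, so a station missing from the venue table keeps its row but gets NaN as its label, and `fillna` then assigns it the next free label.

```python
import pandas as pd

# Hypothetical station list (with coordinates) and clustered venue table
stations = pd.DataFrame({"Station": ["A", "B", "C"], "Latitude": [51.5, 48.9, 51.4]})
venues_sorted = pd.DataFrame({"Station": ["A", "C"], "Cluster Labels": [0, 1]})
kclusters = 2

# Left join on Station: B has no venues, so its label becomes NaN
merged = stations.join(venues_sorted.set_index("Station"), on="Station")
print(merged["Cluster Labels"].isna().tolist())

# Give venue-less stations their own cluster, numbered kclusters
merged["Cluster Labels"] = merged["Cluster Labels"].fillna(kclusters).astype("int32")
kclusters += 1
print(merged["Cluster Labels"].tolist(), kclusters)
```

Note that after the increment, the extra cluster's label is `kclusters - 1`, not `kclusters`.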

### 5. Examine Clusters and Add Descriptive Names



In [None]:
stations_merged['Cluster explicit']=""
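Naming clusters by looking at a few rows is subjective. One complementary check, sketched here on made-up centroids, is to list the highest-weighted venue categories of each k-means centroid. In the notebook these would presumably come from `kmeans.cluster_centers_`, with column names from `stations_grouped_clustering.columns`; the categories and values below are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Toy centroids: rows are clusters, columns are venue categories
categories = ["Bar", "Café", "Park", "Supermarket"]
centers = np.array([
    [0.40, 0.30, 0.05, 0.25],
    [0.05, 0.10, 0.70, 0.15],
])

centroids = pd.DataFrame(centers, columns=categories)
top_n = 2

# For each cluster, keep the names of its top_n heaviest categories
top_categories = {}
for label, profile in centroids.iterrows():
    top_categories[label] = list(profile.sort_values(ascending=False).head(top_n).index)
print(top_categories)
```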

Cluster 0 seems to be a "Cafes & restaurants district".


In [None]:
stations_merged.loc[stations_merged['Cluster Labels'] == 0, ['Cluster explicit']] = "Cafes & restaurants district"

stations_merged.loc[stations_merged['Cluster Labels'] == 0, stations_merged.columns[[1] + list(range(5, stations_merged.shape[1]))]].head(6)


Cluster 1 seems to be a "suburban district" with big facilities such as distribution centers or baseball fields.


In [None]:
stations_merged.loc[stations_merged['Cluster Labels'] == 1, ['Cluster explicit']] = "suburban district"

stations_merged.loc[stations_merged['Cluster Labels'] == 1, stations_merged.columns[[1] + list(range(5, stations_merged.shape[1]))]].head(6)

Cluster 2 seems to be a "Fastfood and distribution district".


In [None]:
stations_merged.loc[stations_merged['Cluster Labels'] == 2, ['Cluster explicit']] = "Fastfood and distribution district"

stations_merged.loc[stations_merged['Cluster Labels'] == 2, stations_merged.columns[[1] + list(range(5, stations_merged.shape[1]))]].head(6)

Cluster 3 seems to be a "Parks district".


In [None]:
stations_merged.loc[stations_merged['Cluster Labels'] == 3, ['Cluster explicit']] = "Parks district"

stations_merged.loc[stations_merged['Cluster Labels'] == 3, stations_merged.columns[[1] + list(range(5, stations_merged.shape[1]))]].head(6)

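The extra cluster for stations with no venues carries the label `kclusters - 1` (15 here, since `kclusters` was incremented above), not 4, so let's call that one "no venues district".

In [None]:
# kclusters was incremented after adding the extra cluster, so its label is kclusters - 1
stations_merged.loc[stations_merged['Cluster Labels'] == kclusters - 1, ['Cluster explicit']] = "no venues district"

stations_merged.loc[stations_merged['Cluster Labels'] == kclusters - 1, stations_merged.columns[[1] + list(range(5, stations_merged.shape[1]))]].head(6)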
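The naming cells above repeat the same two statements for each cluster. A small helper could set the name and return the preview in one call; `name_cluster` is hypothetical and not part of the original notebook, and its `preview_columns` argument stands in for the column selection used above.

```python
import pandas as pd

def name_cluster(df, label, name, preview_columns, n=6):
    """Hypothetical helper: tag every row of cluster `label` with `name`
    and return the first `n` rows of that cluster for inspection."""
    mask = df["Cluster Labels"] == label
    df.loc[mask, "Cluster explicit"] = name
    return df.loc[mask, preview_columns].head(n)

# Toy data standing in for stations_merged
df = pd.DataFrame({
    "Station": ["A", "B", "C"],
    "Cluster Labels": [0, 1, 0],
    "Cluster explicit": ["", "", ""],
})
preview = name_cluster(df, 0, "Cafes & restaurants district", ["Station", "Cluster explicit"])
print(preview)
```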

Finally, let's visualize the resulting clusters on a map.


In [None]:
#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if needed
import folium # map rendering library
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# create a map centred on the stations (London and Paris), then zoom to fit all of them
map_clusters = folium.Map(location=[stations_merged['Latitude'].mean(), stations_merged['Longitude'].mean()], zoom_start=7)
map_clusters.fit_bounds([[stations_merged['Latitude'].min(), stations_merged['Longitude'].min()],
                         [stations_merged['Latitude'].max(), stations_merged['Longitude'].max()]])

# set color scheme for the clusters: one distinct color per label 0..kclusters-1
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster, expl in zip(stations_merged['Latitude'], stations_merged['Longitude'], stations_merged['Station'], stations_merged['Cluster Labels'], stations_merged['Cluster explicit']):
    label = folium.Popup(str(poi) + " - " + expl, parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters