# IBM Applied Data Science Capstone

## Week 5 Final Report - Opening a coffee shop in Germany
1/ Build a dataframe of cities in Germany by combining 2 sources of data (loading a csv file & web scraping the data from Wikipedia page)
<br>
2/ Get the geographical coordinates of the cities
<br>
3/ Obtain the venue data for the cities from Foursquare API
<br>
4/ Explore and cluster the neighborhoods
<br>
5/ Select the best cluster to open a new coffee shop
<br>

In [1]:
# import libraries
import pandas as pd
import numpy as np
import requests
!conda install -c conda-forge folium=0.5.0 --yes
import folium
from geopy.geocoders import Nominatim
import json
from pandas.io.json import json_normalize
from bs4 import BeautifulSoup
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs  # the samples_generator submodule was removed in newer scikit-learn
print('Libraries imported.')

Solving environment: done

# All requested packages already installed.

Libraries imported.


### 1/ Build a dataframe of cities in Germany by combining 2 sources of data (loading a csv file & web scraping the data from Wikipedia page)
#### Load the CSV file from GitHub & clean the data

In [2]:
url = 'https://raw.githubusercontent.com/mtuantruong/coursera.capstone/master/de.csv'
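# error_bad_lines was deprecated in pandas 1.3 and removed in 2.0; on newer versions use on_bad_lines='skip'
df1 = pd.read_csv(url, error_bad_lines=False)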
df1.head()

Unnamed: 0,city,lat,lng,country,iso2,admin,capital,population,population_proper
0,Berlin,52.516667,13.4,Germany,DE,Berlin,primary,3406000.0,3094014.0
1,Stuttgart,48.782343,9.180819,Germany,DE,Baden-Württemberg,admin,2944700.0,606588.0
2,Frankfurt,50.11552,8.684167,Germany,DE,Hesse,minor,2895000.0,679664.0
3,Mannheim,49.496706,8.479547,Germany,DE,Baden-Württemberg,minor,2362000.0,313174.0
4,Hamburg,53.575323,10.01534,Germany,DE,Hamburg,admin,1757000.0,1739117.0


In [3]:
columns = ['country', 'iso2','capital','population']
df1.drop(columns, inplace=True, axis=1)

In [4]:
df1.columns = ["city", "latitude", "longitude", "state", "population"]

In [5]:
df1 = df1.dropna(how='any',axis=0) 
df1.shape

(57, 5)

#### Web scraping the data from Wikipedia page using BeautifulSoup package & data cleaning

In [6]:
website_url = requests.get('https://en.wikipedia.org/wiki/List_of_German_cities_by_GDP').text

#Using BeautifulSoup package
soup = BeautifulSoup(website_url,'lxml')
table = soup.find('table',{'class':'wikitable sortable'})
df2 = pd.read_html(str(table))[0]

In [7]:
columns = ['Rank', 'State','Gross Domestic Productin million €','Gross Domestic Productper employee in €']
df2.drop(columns, inplace=True, axis=1)

In [8]:
df2.columns = ["city", "gdp per capita"]
df2.head()

Unnamed: 0,city,gdp per capita
0,Berlin,36798
1,Hamburg,62793
2,Munich,75186
3,Frankfurt am Main,91099
4,Cologne,59407


#### Combining 2 sources of data

In [9]:
df1['city'] = df1['city'].astype(str)
df2['city'] = df2['city'].astype(str)
# inner join on the city name; sort expects a boolean, not the string 'True'
de_df = pd.merge(df1, df2, on="city", sort=True)
de_df

Unnamed: 0,city,latitude,longitude,state,population,gdp per capita
0,Augsburg,48.371538,10.898514,Bavaria,259196.0,48824
1,Berlin,52.516667,13.4,Berlin,3094014.0,36798
2,Bielefeld,52.028332,8.542002,North Rhine-Westphalia,291573.0,38588
3,Bonn,50.73438,7.095485,North Rhine-Westphalia,313125.0,71222
4,Braunschweig,52.265939,10.526726,Lower Saxony,235054.0,46928
5,Bremen,53.073789,8.826754,Bremen,546501.0,50052
6,Bremerhaven,53.550209,8.576735,Bremen,117446.0,34771
7,Coburg,50.259366,10.963843,Bavaria,41901.0,83501
8,Cologne,50.933333,6.95,North Rhine-Westphalia,963395.0,59407
9,Cottbus,51.757691,14.328875,Brandenburg,84754.0,33067
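
Because `pd.merge` defaults to an inner join, a city whose name is spelled differently in the two sources (for example `Frankfurt` in the CSV versus `Frankfurt am Main` on Wikipedia) silently drops out of `de_df`. The sketch below is not part of the original notebook. It uses small sample frames, with values copied from the two `head()` outputs above, to show one way such mismatches could be listed with an outer join and `indicator=True`.

```python
import pandas as pd

# Mini-samples mirroring the two sources (values taken from the heads shown above)
left = pd.DataFrame({"city": ["Berlin", "Frankfurt", "Hamburg"],
                     "population": [3094014, 679664, 1739117]})
right = pd.DataFrame({"city": ["Berlin", "Frankfurt am Main", "Hamburg"],
                      "gdp per capita": [36798, 91099, 62793]})

# An outer join with indicator=True adds a "_merge" column saying where each row came from
check = pd.merge(left, right, on="city", how="outer", indicator=True)

# Rows that are not "both" would be lost by the default inner join
unmatched = check.loc[check["_merge"] != "both", ["city", "_merge"]]
print(unmatched)
```

Renaming the mismatched cities in one of the frames before merging would keep them in the analysis.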


### 2/ Get the geographical coordinates of the cities & create a map of Germany

In [10]:
# get the coordinates of Germany
address = 'Germany'

geolocator = Nominatim(user_agent="to_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Germany are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Germany are 51.0834196, 10.4234469.


In [11]:
# create map of Germany using latitude and longitude values
map_de = folium.Map(location=[latitude, longitude], zoom_start=6)

# add one marker per city, labelled "city, state"
for lat, lng, state, city in zip(de_df['latitude'], de_df['longitude'], de_df['state'], de_df['city']):
    label = '{}, {}'.format(city, state)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_de)
    
map_de

### 3/ Obtain the venue data for the cities from Foursquare API 

In [12]:
# The code was removed by Watson Studio for sharing.

Your credentials:
CLIENT_ID: <redacted>
CLIENT_SECRET: <redacted>


#### Now, let's get the top 100 venues within a 5 km radius of each city center.

In [13]:
radius = 5000
LIMIT = 100

venues = []

for lat, long, neighborhood in zip(de_df['latitude'], de_df['longitude'], de_df['city']):
    
    # create the API request URL
    url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        lat,
        long,
        radius, 
        LIMIT)
    
    # make the GET request
    results = requests.get(url).json()["response"]['groups'][0]['items']
    
    # return only relevant information for each nearby venue
    for venue in results:
        venues.append((
            neighborhood,
            lat, 
            long, 
            venue['venue']['name'], 
            venue['venue']['location']['lat'], 
            venue['venue']['location']['lng'],  
            venue['venue']['categories'][0]['name']))
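
The loop above assumes that every response contains `groups[0]['items']` and that every venue has at least one category. A missing key or an empty category list would raise an exception partway through the run. As a minimal sketch (not the notebook's code), a hypothetical helper `extract_venues` could do the parsing more defensively. The sample payload is made up for illustration and only mirrors the nesting that the loop reads.

```python
def extract_venues(payload, city, lat, lng):
    """Hypothetical helper: turn one explore-style response into row tuples.

    Assumes the same nesting the loop above reads
    (response -> groups[0] -> items -> venue) and skips malformed entries.
    """
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories") or []
        if not categories or "lat" not in location or "lng" not in location:
            continue  # skip venues without a category or coordinates
        rows.append((city, lat, lng, venue.get("name"),
                     location["lat"], location["lng"], categories[0]["name"]))
    return rows


# Made-up payload for illustration only (not real API output)
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Stadtmarkt",
               "location": {"lat": 48.368509, "lng": 10.894675},
               "categories": [{"name": "Farmers Market"}]}},
    {"venue": {"name": "No category",
               "location": {"lat": 0.0, "lng": 0.0},
               "categories": []}},
]}]}}

rows = extract_venues(sample, "Augsburg", 48.371538, 10.898514)
print(rows)
```

Inside the loop, `venues.extend(extract_venues(requests.get(url).json(), neighborhood, lat, long))` would then replace the nested indexing.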

In [14]:
# convert the venues list into a new DataFrame
venues_df = pd.DataFrame(venues)

# define the column names
venues_df.columns = ['city', 'latitude', 'longitude', 'venuename', 'venuelatitude', 'venuelongitude', 'venuecategory']

print(venues_df.shape)
venues_df.head()

(4114, 7)


Unnamed: 0,city,latitude,longitude,venuename,venuelatitude,venuelongitude,venuecategory
0,Augsburg,48.371538,10.898514,ShuShu Falafel,48.370752,10.897762,Falafel Restaurant
1,Augsburg,48.371538,10.898514,Stadtmarkt,48.368509,10.894675,Farmers Market
2,Augsburg,48.371538,10.898514,Rathausplatz,48.368974,10.897766,Plaza
3,Augsburg,48.371538,10.898514,Liliom,48.372056,10.900675,Indie Movie Theater
4,Augsburg,48.371538,10.898514,Hofgarten,48.372415,10.893419,Garden


In [15]:
venues_df.groupby(["city"]).count()

city,latitude,longitude,venuename,venuelatitude,venuelongitude,venuecategory
Augsburg,100,100,100,100,100,100
Berlin,100,100,100,100,100,100
Bielefeld,73,73,73,73,73,73
Bonn,100,100,100,100,100,100
Braunschweig,100,100,100,100,100,100
Bremen,100,100,100,100,100,100
Bremerhaven,48,48,48,48,48,48
Coburg,47,47,47,47,47,47
Cologne,100,100,100,100,100,100
Cottbus,52,52,52,52,52,52


In [16]:
print('There are {} unique categories.'.format(len(venues_df['venuecategory'].unique())))
# print out the first 50 categories
venues_df['venuecategory'].unique()[:50]

There are 308 unique categories.


array(['Falafel Restaurant', 'Farmers Market', 'Plaza',
       'Indie Movie Theater', 'Garden', 'Roof Deck', 'Café',
       'Pedestrian Plaza', 'Indian Restaurant', 'Bar', 'Taverna', 'Hotel',
       'German Restaurant', 'Clothing Store', 'Historic Site',
       'Electronics Store', 'Burger Joint', 'Cocktail Bar', 'Bakery',
       'Italian Restaurant', 'French Restaurant', 'Steakhouse',
       'Gym Pool', 'Brewery', 'Seafood Restaurant', 'Drugstore',
       'Gym / Fitness Center', 'Beer Garden', 'Ice Cream Shop',
       'Trattoria/Osteria', 'Park', 'Theater', 'Pub', 'Hockey Arena',
       'Greek Restaurant', 'Lounge', 'Turkish Restaurant',
       'Botanical Garden', 'Restaurant', 'Multiplex',
       'Japanese Restaurant', 'Climbing Gym',
       'Modern European Restaurant', 'Zoo', 'Forest', 'Department Store',
       'Amphitheater', 'Gastropub', 'Museum', 'Grocery Store'],
      dtype=object)

In [17]:
# check if the results contain "city"
"city" in venues_df['venuecategory'].unique()

False

In [18]:
# one hot encoding
de_onehot = pd.get_dummies(venues_df[['venuecategory']], prefix="", prefix_sep="")

# add city column back to dataframe
de_onehot['city'] = venues_df['city'] 

# move city column to the first column
fixed_columns = [de_onehot.columns[-1]] + list(de_onehot.columns[:-1])
de_onehot = de_onehot[fixed_columns]

print(de_onehot.shape)
de_onehot.head()

(4114, 309)


Unnamed: 0,city,Afghan Restaurant,African Restaurant,Airport,American Restaurant,Amphitheater,Aquarium,Argentinian Restaurant,Art Gallery,Art Museum,...,Vietnamese Restaurant,Vineyard,Water Park,Waterfront,Whisky Bar,Wine Bar,Wine Shop,Yoga Studio,Zoo,Zoo Exhibit
0,Augsburg,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Augsburg,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Augsburg,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Augsburg,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Augsburg,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [19]:
de_grouped = de_onehot.groupby(["city"]).mean().reset_index()

print(de_grouped.shape)
de_grouped

(47, 309)


Unnamed: 0,city,Afghan Restaurant,African Restaurant,Airport,American Restaurant,Amphitheater,Aquarium,Argentinian Restaurant,Art Gallery,Art Museum,...,Vietnamese Restaurant,Vineyard,Water Park,Waterfront,Whisky Bar,Wine Bar,Wine Shop,Yoga Studio,Zoo,Zoo Exhibit
0,Augsburg,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0
1,Berlin,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.01,...,0.0,0.0,0.0,0.01,0.0,0.03,0.0,0.01,0.0,0.0
2,Bielefeld,0.0,0.0,0.0,0.0,0.0,0.0,0.013699,0.0,0.0,...,0.0,0.0,0.013699,0.0,0.0,0.0,0.0,0.0,0.013699,0.0
3,Bonn,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,...,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.0
4,Braunschweig,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Bremen,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,...,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01
6,Bremerhaven,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.020833,0.0
7,Coburg,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Cologne,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.03,...,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0
9,Cottbus,0.0,0.0,0.0,0.019231,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019231,0.0
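
The values in `de_grouped` are category frequencies: averaging the 0/1 indicator columns per city gives the fraction of that city's returned venues that fall into each category. The toy example below uses made-up data to illustrate the idea on six venues.

```python
import pandas as pd

# Toy venue list (hypothetical): 4 venues in city A, 2 in city B
toy = pd.DataFrame({
    "city": ["A", "A", "A", "A", "B", "B"],
    "venuecategory": ["Café", "Bar", "Café", "Park", "Bar", "Bar"],
})

# One indicator column per category, as in the one-hot cell above
onehot = pd.get_dummies(toy["venuecategory"], dtype=float)
onehot.insert(0, "city", toy["city"])

# The mean of 0/1 indicators per city is the share of venues in each category
shares = onehot.groupby("city").mean()
print(shares)
```

Each row of `shares` sums to 1, which is why cities with fewer than 100 venues, such as Bielefeld, show values like 0.013699 (1/73) rather than multiples of 0.01.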


In [20]:
len(de_grouped[de_grouped["Café"] > 0])

46

In [22]:
list(de_grouped.columns.values)

['city',
 'Afghan Restaurant',
 'African Restaurant',
 'Airport',
 'American Restaurant',
 'Amphitheater',
 'Aquarium',
 'Argentinian Restaurant',
 'Art Gallery',
 'Art Museum',
 'Arts & Crafts Store',
 'Arts & Entertainment',
 'Asian Restaurant',
 'Athletics & Sports',
 'Australian Restaurant',
 'Austrian Restaurant',
 'Auto Dealership',
 'Automotive Shop',
 'BBQ Joint',
 'Bagel Shop',
 'Bakery',
 'Bank',
 'Bar',
 'Baseball Stadium',
 'Basketball Stadium',
 'Bavarian Restaurant',
 'Beach',
 'Beach Bar',
 'Bed & Breakfast',
 'Beer Bar',
 'Beer Garden',
 'Beer Store',
 'Belgian Restaurant',
 'Big Box Store',
 'Bike Rental / Bike Share',
 'Bistro',
 'Boat or Ferry',
 'Bookstore',
 'Botanical Garden',
 'Boutique',
 'Bowling Alley',
 'Brazilian Restaurant',
 'Breakfast Spot',
 'Brewery',
 'Bridge',
 'Building',
 'Burger Joint',
 'Burrito Place',
 'Bus Stop',
 'Business Service',
 'Butcher',
 'Cable Car',
 'Café',
 'Camera Store',
 'Canal',
 'Canal Lock',
 'Candy Store',
 'Capitol Building'

In [23]:
#create a new column named coffee, which is the sum of all coffee-related columns
de_grouped['coffee'] = de_grouped['Café'] + de_grouped['Coffee Roaster']+ de_grouped['Coffee Shop']+ de_grouped['College Cafeteria']
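
The sum above raises a `KeyError` if any of the four categories is missing from the Foursquare results. The sketch below uses a small made-up frequency table to show one way the `coffee` column could be built from whichever of the wanted columns are actually present.

```python
import pandas as pd

# Hypothetical frequency table; "Coffee Roaster" is deliberately absent
grouped = pd.DataFrame({
    "city": ["A", "B"],
    "Café": [0.05, 0.10],
    "Coffee Shop": [0.01, 0.02],
    "College Cafeteria": [0.00, 0.01],
    "Bar": [0.20, 0.10],
})

wanted = ["Café", "Coffee Roaster", "Coffee Shop", "College Cafeteria"]
present = [col for col in wanted if col in grouped.columns]  # ignore categories not returned

# Row-wise sum over the coffee-related columns that exist
grouped["coffee"] = grouped[present].sum(axis=1)
print(grouped[["city", "coffee"]])
```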

In [24]:
de_coffee = de_grouped[["city","coffee"]]
de_coffee

Unnamed: 0,city,coffee
0,Augsburg,0.06
1,Berlin,0.07
2,Bielefeld,0.054795
3,Bonn,0.08
4,Braunschweig,0.16
5,Bremen,0.09
6,Bremerhaven,0.041667
7,Coburg,0.106383
8,Cologne,0.11
9,Cottbus,0.076923


In [25]:
# take an explicit copy so the column assignment does not trigger SettingWithCopyWarning
de_coffee = de_coffee.copy()
de_coffee['city'] = de_coffee['city'].astype(str)
de_df2 = pd.merge(de_df, de_coffee, on="city", sort=True)
de_df2


Unnamed: 0,city,latitude,longitude,state,population,gdp per capita,coffee
0,Augsburg,48.371538,10.898514,Bavaria,259196.0,48824,0.06
1,Berlin,52.516667,13.4,Berlin,3094014.0,36798,0.07
2,Bielefeld,52.028332,8.542002,North Rhine-Westphalia,291573.0,38588,0.054795
3,Bonn,50.73438,7.095485,North Rhine-Westphalia,313125.0,71222,0.08
4,Braunschweig,52.265939,10.526726,Lower Saxony,235054.0,46928,0.16
5,Bremen,53.073789,8.826754,Bremen,546501.0,50052,0.09
6,Bremerhaven,53.550209,8.576735,Bremen,117446.0,34771,0.041667
7,Coburg,50.259366,10.963843,Bavaria,41901.0,83501,0.106383
8,Cologne,50.933333,6.95,North Rhine-Westphalia,963395.0,59407,0.11
9,Cottbus,51.757691,14.328875,Brandenburg,84754.0,33067,0.076923


In [26]:
de_df2['population'] = de_df2['population'].astype(int)
de_df3 = de_df2[["city","coffee","population","gdp per capita"]]
de_df3

Unnamed: 0,city,coffee,population,gdp per capita
0,Augsburg,0.06,259196,48824
1,Berlin,0.07,3094014,36798
2,Bielefeld,0.054795,291573,38588
3,Bonn,0.08,313125,71222
4,Braunschweig,0.16,235054,46928
5,Bremen,0.09,546501,50052
6,Bremerhaven,0.041667,117446,34771
7,Coburg,0.106383,41901,83501
8,Cologne,0.11,963395,59407
9,Cottbus,0.076923,84754,33067


### 4/ Explore and cluster the neighborhoods 

In [27]:
# set number of clusters
kclusters = 4

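# keep only the numeric features; the positional axis argument to drop() was removed in pandas 2.0
de_clustering = de_df3.drop(columns=["city"])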

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(de_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([0, 2, 0, 0, 0, 3, 0, 0, 3, 0], dtype=int32)
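
K-means groups rows by Euclidean distance, so with raw values the `population` column (up to millions) outweighs `coffee` (below 1) by many orders of magnitude. The notebook clusters the unscaled features. The sketch below shows an alternative rather than the original method: z-score standardization, which one might apply beforehand. It uses a few rows copied from `de_df3`.

```python
import pandas as pd

# A few rows copied from de_df3 above
sample = pd.DataFrame({
    "coffee": [0.06, 0.07, 0.16, 0.11],
    "population": [259196, 3094014, 235054, 963395],
    "gdp per capita": [48824, 36798, 46928, 59407],
}, index=["Augsburg", "Berlin", "Braunschweig", "Cologne"])

# Z-score each column: subtract the column mean, divide by the column standard deviation
scaled = (sample - sample.mean()) / sample.std(ddof=0)

print(sample.std(ddof=0).round(3))  # raw spreads differ by orders of magnitude
print(scaled.round(2))              # after scaling, every column has mean 0 and std 1
# `scaled` could then be passed to KMeans(...).fit(...) in place of the raw frame
```

With standardized inputs the resulting clusters would likely differ from the ones reported here, so the conclusion in this report applies to the unscaled clustering only.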

In [28]:
# create a new dataframe that holds the features together with the cluster label of each city
de_merged = de_df3.copy()

# add clustering labels
de_merged["Cluster Labels"] = kmeans.labels_

de_merged

Unnamed: 0,city,coffee,population,gdp per capita,Cluster Labels
0,Augsburg,0.06,259196,48824,0
1,Berlin,0.07,3094014,36798,2
2,Bielefeld,0.054795,291573,38588,0
3,Bonn,0.08,313125,71222,0
4,Braunschweig,0.16,235054,46928,0
5,Bremen,0.09,546501,50052,3
6,Bremerhaven,0.041667,117446,34771,0
7,Coburg,0.106383,41901,83501,0
8,Cologne,0.11,963395,59407,3
9,Cottbus,0.076923,84754,33067,0


In [29]:
de_df4=de_df2[['city','latitude','longitude']]
de_df4

Unnamed: 0,city,latitude,longitude
0,Augsburg,48.371538,10.898514
1,Berlin,52.516667,13.4
2,Bielefeld,52.028332,8.542002
3,Bonn,50.73438,7.095485
4,Braunschweig,52.265939,10.526726
5,Bremen,53.073789,8.826754
6,Bremerhaven,53.550209,8.576735
7,Coburg,50.259366,10.963843
8,Cologne,50.933333,6.95
9,Cottbus,51.757691,14.328875


In [30]:
de_df5 = pd.merge(de_merged, de_df4, on="city", sort=True)
de_df5

Unnamed: 0,city,coffee,population,gdp per capita,Cluster Labels,latitude,longitude
0,Augsburg,0.06,259196,48824,0,48.371538,10.898514
1,Berlin,0.07,3094014,36798,2,52.516667,13.4
2,Bielefeld,0.054795,291573,38588,0,52.028332,8.542002
3,Bonn,0.08,313125,71222,0,50.73438,7.095485
4,Braunschweig,0.16,235054,46928,0,52.265939,10.526726
5,Bremen,0.09,546501,50052,3,53.073789,8.826754
6,Bremerhaven,0.041667,117446,34771,0,53.550209,8.576735
7,Coburg,0.106383,41901,83501,0,50.259366,10.963843
8,Cologne,0.11,963395,59407,3,50.933333,6.95
9,Cottbus,0.076923,84754,33067,0,51.757691,14.328875


In [31]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=6)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(de_df5['latitude'], de_df5['longitude'], de_df5['city'], de_df5['Cluster Labels']):
    label = folium.Popup(str(poi) + ' - Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

### 5/ Select the best cluster to open a new coffee shop 

In [32]:
de_df5.loc[de_df5['Cluster Labels'] == 0]

Unnamed: 0,city,coffee,population,gdp per capita,Cluster Labels,latitude,longitude
0,Augsburg,0.06,259196,48824,0,48.371538,10.898514
2,Bielefeld,0.054795,291573,38588,0,52.028332,8.542002
3,Bonn,0.08,313125,71222,0,50.73438,7.095485
4,Braunschweig,0.16,235054,46928,0,52.265939,10.526726
6,Bremerhaven,0.041667,117446,34771,0,53.550209,8.576735
7,Coburg,0.106383,41901,83501,0,50.259366,10.963843
9,Cottbus,0.076923,84754,33067,0,51.757691,14.328875
14,Emden,0.03125,48046,70448,0,53.367454,7.207776
15,Erfurt,0.11,175476,38284,0,50.978695,11.032831
17,Flensburg,0.1,85838,42827,0,54.78431,9.439611


In [33]:
de_df5.loc[de_df5['Cluster Labels'] == 1]

Unnamed: 0,city,coffee,population,gdp per capita,Cluster Labels,latitude,longitude
20,Hamburg,0.14,1739117,62793,1,53.575323,10.01534
33,Munich,0.15,1260391,75186,1,48.15,11.583333


In [34]:
de_df5.loc[de_df5['Cluster Labels'] == 2]

Unnamed: 0,city,coffee,population,gdp per capita,Cluster Labels,latitude,longitude
1,Berlin,0.07,3094014,36798,2,52.516667,13.4


In [35]:
de_df5.loc[de_df5['Cluster Labels'] == 3]

Unnamed: 0,city,coffee,population,gdp per capita,Cluster Labels,latitude,longitude
5,Bremen,0.09,546501,50052,3,53.073789,8.826754
8,Cologne,0.11,963395,59407,3,50.933333,6.95
10,Dortmund,0.11,588462,36781,3,51.514942,7.465997
11,Dresden,0.07,486854,37993,3,51.048562,13.745794
12,Duisburg,0.093333,488005,33634,3,51.432469,6.765161
13,Düsseldorf,0.11,592393,79619,3,51.228304,6.793849
16,Essen,0.1,573468,41512,3,51.45657,7.012282
28,Leipzig,0.08,504971,35123,3,51.34419,12.386504
42,Stuttgart,0.08,606588,82397,3,48.782343,9.180819


##### Conclusion:
Examining the data, we can conclude that cluster 2, which contains only Berlin, is the best place to open a new coffee shop in Germany. Berlin looks coffee-underserved: it has the largest population in Germany, yet coffee-related venues make up only 0.07 of the top venues within a 5 km radius of the city center. In cluster 1, by contrast, Hamburg and Munich are not ideal cities to start a coffee business, despite their large populations and high GDP per capita. Their coffee shares (0.14 and 0.15) are among the highest in the data, which means much more competition. Cluster 3 is also worth mentioning. Within it, cities such as Bremen and Stuttgart are promising locations for a coffee start-up, given their lower competition, mid-sized populations, and relatively high GDP per capita.
<br> 
<br> If I had to describe each of the 4 clusters in a short phrase, using the cluster labels from the tables above, they would be:
<br>
Cluster 0: 50/50 chance cities
<br>
Cluster 1: most competitive cities
<br>
Cluster 2: under-served cities
<br>
Cluster 3: potential cities
<br>
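
The cluster comparison behind this conclusion can also be condensed into a single table of per-cluster averages. The sketch below is illustrative only: it uses a handful of rows copied from the cluster tables above, so its averages do not match those of the full 47-city dataset.

```python
import pandas as pd

# A handful of rows copied from the cluster tables above
rows = pd.DataFrame({
    "city": ["Augsburg", "Bonn", "Hamburg", "Munich", "Berlin", "Bremen", "Stuttgart"],
    "coffee": [0.06, 0.08, 0.14, 0.15, 0.07, 0.09, 0.08],
    "population": [259196, 313125, 1739117, 1260391, 3094014, 546501, 606588],
    "gdp per capita": [48824, 71222, 62793, 75186, 36798, 50052, 82397],
    "Cluster Labels": [0, 0, 1, 1, 2, 3, 3],
})

# Average each feature per cluster and count the cities behind each average
profile = rows.groupby("Cluster Labels").agg(
    cities=("city", "count"),
    coffee=("coffee", "mean"),
    population=("population", "mean"),
    gdp_per_capita=("gdp per capita", "mean"),
)
print(profile.round(3))
```

Running the same aggregation on `de_df5` would give the profile for all cities.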