# Ancient and rare books store relocation

Guillaume Borquet

## 1 - Introduction

For this assignment, I have created an imaginary situation in which the owner of a bookstore that sells ancient and rare books wants to relocate his store to a new city. To ensure his business is successful, he needs to find a place where the potential customer base is at least as large as the one he currently has. 
His current store is in the Paris city center, near the Louvre museum. He is considering 6 possible locations for his new "rare and ancient books" store: Munich, Berlin, Oslo, Trondheim, Budapest, and Barcelona. We know that his customers are educated people interested in culture, and that they usually come and browse his shop after visiting a nearby museum or other bookstores. Business at his current store in Paris is good, so he should try to open his new store in a place that is similar to where he is now: a "cultural" area with museums and other bookstores.  
To relocate him to a place that benefits his business, we need to compare the candidate city centers and find the one most similar to his current location. Because his bookstore is in a cultural neighborhood of Paris, we look for the city that offers the most cultural venues around its center, since his customers are most likely people attracted to a "cultural" neighborhood.
We will use data science and machine learning to identify the best business location. The methods developed here should be useful not only to the owner of the "rare and antique books" store but to any business that might profit from being located in a "cultural" area.

It is possible that not all cultural places are equal, so we will cluster the cities according to the types of cultural venues they have and see which ones are similar to Paris, where our original bookshop is. 
We will create the clusters using the K-means method.

## 2 - Data  

I used the Foursquare API to extract venues within a 2 km radius of the center of each listed city. 
Based on our study problem, the venues of interest are cultural places such as museums and other bookstores. 

### 2.1 - Find the coordinates of the city centers of interest.


In [7]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (top-level import, pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


In [None]:
# this cell is hidden to protect my Foursquare credentials

In [9]:
# First I create the list of cities the client has selected as possible book shop locations

cities = ["Munich, GERMANY", "Barcelona, SPAIN", "Budapest, HUNGARY", "Berlin, GERMANY", "Paris, FRANCE", "Oslo, NORWAY", "Trondheim, NORWAY"]

cities

['Munich, GERMANY',
 'Barcelona, SPAIN',
 'Budapest, HUNGARY',
 'Berlin, GERMANY',
 'Paris, FRANCE',
 'Oslo, NORWAY',
 'Trondheim, NORWAY']

In [10]:
# Here I get the coordinates for each city

geolocator = Nominatim(user_agent="world")  # create the geocoder once

rows = []
for city in cities:
    location = geolocator.geocode(city)
    rows.append([city, location.latitude, location.longitude])

# build the dataframe once, after the loop
dataset = pd.DataFrame(rows, columns=["City", "Lat", "long"])
dataset = dataset.astype({'Lat': 'float', "long": "float"})

dataset




Unnamed: 0,City,Lat,long
0,"Munich, GERMANY",48.137108,11.575382
1,"Barcelona, SPAIN",41.382894,2.177432
2,"Budapest, HUNGARY",47.498382,19.040471
3,"Berlin, GERMANY",52.517037,13.38886
4,"Paris, FRANCE",48.856697,2.351462
5,"Oslo, NORWAY",59.91333,10.73897
6,"Trondheim, NORWAY",63.430566,10.395193
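
The loop above assumes that every query resolves. geopy's `geocode` returns `None` when nothing is found, so a more defensive variant could collect unresolved names instead of failing on `location.latitude`. The sketch below is only an illustration: `geocode_cities` and `fake_geocode` are hypothetical names, and the fake geocoder stands in for `Nominatim(...).geocode` so the example runs offline.

```python
from types import SimpleNamespace


def geocode_cities(cities, geocode):
    """Return (rows, missing): coordinates of resolved cities and names of unresolved ones."""
    rows, missing = [], []
    for city in cities:
        location = geocode(city)
        if location is None:  # the geocoder found nothing for this query
            missing.append(city)
            continue
        rows.append((city, float(location.latitude), float(location.longitude)))
    return rows, missing


# Hypothetical stand-in for Nominatim(user_agent="world").geocode (no network needed).
def fake_geocode(query):
    known = {"Paris, FRANCE": SimpleNamespace(latitude=48.856697, longitude=2.351462)}
    return known.get(query)


rows, missing = geocode_cities(["Paris, FRANCE", "Atlantis"], fake_geocode)
print(rows)     # resolved cities with their coordinates
print(missing)  # queries that could not be geocoded
```

In the notebook, the same helper could be called as `geocode_cities(cities, geolocator.geocode)`.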


Let's plot them on a map!

In [11]:
# I plot the possible cities on the map to verify the coordinates

address = 'Europe'

geolocator = Nominatim(user_agent="world")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(' {}, {}.'.format(latitude, longitude))



city_maps = folium.Map(location=[latitude, longitude],zoom_start=4)

for lat, lng, cit in zip(dataset['Lat'], dataset['long'], dataset['City']):
    label = '{}'.format(cit)  # the popup shows the city name once
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=10,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(city_maps)  
    
city_maps

 51.0, 10.0.


So here we have the city centers of the candidate cities in Europe where the shop could be opened.

### 2.2 - Use Foursquare to get the venues in a 2km radius of the city centers.


In [12]:
# I get all the venues in a 2km Radius around the city centers

def getNearbyVenues(names, latitudes, longitudes, radius=2000):
    
    venues_list=[]
    
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['City', 
                  'City Latitude', 
                  'City Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
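
The list comprehension in `getNearbyVenues` indexes `categories[0]`, which would raise an `IndexError` for a venue whose category list is empty. Whether the API actually returns such venues is an assumption here; the sketch below only shows one way to skip incomplete items. `extract_venue_rows` is a hypothetical helper, and `sample_items` is hand-written data shaped like the response fields used above, not real API output.

```python
def extract_venue_rows(city, city_lat, city_lng, items):
    """Flatten response items into rows, skipping venues that lack a category or coordinates."""
    rows = []
    for item in items:
        venue = item.get("venue", {})
        categories = venue.get("categories") or []
        location = venue.get("location", {})
        if not categories or "lat" not in location or "lng" not in location:
            continue  # incomplete venue: leave it out rather than crash
        rows.append((
            city, city_lat, city_lng,
            venue.get("name"),
            location["lat"], location["lng"],
            categories[0]["name"],
        ))
    return rows


# Hand-written sample in the same shape as results[...]['items'] (illustrative only).
sample_items = [
    {"venue": {"name": "Marienplatz",
               "location": {"lat": 48.137125, "lng": 11.575483},
               "categories": [{"name": "Plaza"}]}},
    {"venue": {"name": "Uncategorized place",
               "location": {"lat": 48.1, "lng": 11.5},
               "categories": []}},
]

rows = extract_venue_rows("Munich, GERMANY", 48.137108, 11.575382, sample_items)
print(rows)
```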

In [13]:
# Get the venues
cities_venues = getNearbyVenues(names=dataset['City'],
                                   latitudes=dataset['Lat'],
                                   longitudes=dataset['long']
                                  )

Munich, GERMANY
Barcelona, SPAIN
Budapest, HUNGARY
Berlin, GERMANY
Paris, FRANCE
Oslo, NORWAY
Trondheim, NORWAY


In [14]:
cities_venues.head()

Unnamed: 0,City,City Latitude,City Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Munich, GERMANY",48.137108,11.575382,Marienplatz,48.137125,11.575483,Plaza
1,"Munich, GERMANY",48.137108,11.575382,Viktualienmarkt,48.135296,11.576368,Farmers Market
2,"Munich, GERMANY",48.137108,11.575382,Fischbrunnen,48.137211,11.576047,Fountain
3,"Munich, GERMANY",48.137108,11.575382,Alois Dallmayr,48.138554,11.57675,Gourmet Shop
4,"Munich, GERMANY",48.137108,11.575382,Kustermann,48.136242,11.574897,Department Store
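
As a quick sanity check on the 2 km radius, the distance between a city center and a returned venue can be computed with the haversine formula. This is a minimal sketch using three of the Munich rows shown above; `haversine_m` is a hypothetical helper and the Earth radius is the usual spherical approximation.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (spherical approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Coordinates copied from the rows displayed above.
munich_center = (48.137108, 11.575382)
venues = {
    "Marienplatz": (48.137125, 11.575483),
    "Viktualienmarkt": (48.135296, 11.576368),
    "Alois Dallmayr": (48.138554, 11.57675),
}

distances = {name: haversine_m(*munich_center, lat, lng) for name, (lat, lng) in venues.items()}
for name, d in distances.items():
    print(f"{name}: {d:.0f} m from the city center")
```

The same function could be applied row by row to `cities_venues` to confirm that every venue lies within the requested radius.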


In [15]:

# I look at the venues category to select only the cultural venues
cities_venues["Venue Category"].unique()

array(['Plaza', 'Farmers Market', 'Fountain', 'Gourmet Shop',
       'Department Store', 'Organic Grocery', 'Church',
       'Falafel Restaurant', 'Beer Garden', 'Fish Market', 'Coffee Shop',
       'Bavarian Restaurant', 'Opera House', 'German Restaurant',
       'Irish Pub', 'Café', 'Art Museum', 'Snack Place', 'Hotel',
       "Men's Store", 'Vegetarian / Vegan Restaurant', 'Bookstore',
       'Boutique', 'Cocktail Bar', 'Neighborhood', 'Spanish Restaurant',
       'Hot Dog Joint', 'Bridge', 'Camera Store', 'Italian Restaurant',
       'Wine Bar', 'Tapas Restaurant', 'Ice Cream Shop', 'Hotel Pool',
       'Dessert Shop', 'Market', 'Concert Hall', 'Monument / Landmark',
       'Tea Room', 'Castle', 'Scenic Lookout', 'Wine Shop',
       'French Restaurant', 'Pub', 'Historic Site', 'Bar',
       'Sushi Restaurant', 'Waterfront', 'Hungarian Restaurant', 'Bistro',
       'Chocolate Shop', 'Modern European Restaurant',
       'Outdoor Sculpture', 'Cosmetics Shop', 'Sandwich Place', 'Theate

Here we see all the venue categories that were fetched. We have to decide which ones matter for our bookstore location. Using our business knowledge, we chose the venues likely to attract potential customers for our bookshop.

In [16]:
# List of venues I consider as cultural
cultural_venues = ["Plaza", "Fountain", "Church", "Opera House", "Art Museum", "Bookstore", "Bridge", "Wine Bar", 
                  "Concert Hall", "Monument / Landmark", "Tea Room", "Castle", "Wine Shop", "Historic Site", "Outdoor Sculpture", 
                  "Theater", "History Museum", "Museum", "Art Gallery", "Pedestrian Plaza"]

print("The most cultural venues in our dataset that are likely to attract the right kind of customers are:", cultural_venues)

The most cultural venues in our dataset that are likely to attract the right kind of customers are: ['Plaza', 'Fountain', 'Church', 'Opera House', 'Art Museum', 'Bookstore', 'Bridge', 'Wine Bar', 'Concert Hall', 'Monument / Landmark', 'Tea Room', 'Castle', 'Wine Shop', 'Historic Site', 'Outdoor Sculpture', 'Theater', 'History Museum', 'Museum', 'Art Gallery', 'Pedestrian Plaza']


Now we can get rid of all the useless venues in our dataset.

In [17]:
# Keep only cultural venues
cultural_df=cities_venues[cities_venues["Venue Category"].isin(cultural_venues)]

cultural_df.head()



Unnamed: 0,City,City Latitude,City Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Munich, GERMANY",48.137108,11.575382,Marienplatz,48.137125,11.575483,Plaza
2,"Munich, GERMANY",48.137108,11.575382,Fischbrunnen,48.137211,11.576047,Fountain
6,"Munich, GERMANY",48.137108,11.575382,St. Peter,48.13653,11.575615,Church
13,"Munich, GERMANY",48.137108,11.575382,Bayerische Staatsoper,48.139639,11.578933,Opera House
17,"Munich, GERMANY",48.137108,11.575382,Kunsthalle München,48.139967,11.57592,Art Museum


Now we have a dataset ready for analysis, that contains our city centers, and all the venues we are interested in.


## 3 - Methods

We want to select for the new bookshop the city that has the most cultural venues within a 2 km radius of its center and that is the most similar to Paris. I counted the cultural venues and the bookstores in each city and calculated the ratio of cultural venues to bookstores per city. To measure similarity between cities in terms of cultural venue categories, I clustered the cities with the K-means method on the per-category venue counts. Because of the small number of cities in the data, I used k=2 to separate cities that are similar to Paris from those that are not. I used Folium to map the cities and the clusters. All analysis was done in a Jupyter Notebook using Python.
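
To make the clustering step concrete, here is a minimal sketch on made-up data: four hypothetical cities (A to D) and a handful of invented venues, none of which come from the study. It builds a city-by-category count table with `pd.crosstab` (a shortcut used here in place of one-hot encoding followed by a group-by sum) and runs K-means with k=2 on it.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical toy data: four made-up cities, not the study's venues.
toy = pd.DataFrame({
    "City": ["A", "A", "A", "B", "B", "B", "C", "C", "C", "D", "D", "D"],
    "Venue Category": ["Museum", "Museum", "Bookstore",
                       "Museum", "Bookstore", "Bookstore",
                       "Concert Hall", "Concert Hall", "Opera House",
                       "Concert Hall", "Opera House", "Opera House"],
})

# One row per city, one column per category, values are venue counts.
counts = pd.crosstab(toy["City"], toy["Venue Category"])

# Cluster the cities on their category counts (Euclidean distance, k=2).
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(counts)
labels = dict(zip(counts.index, kmeans.labels_))
print(counts)
print(labels)
```

Cities A and B share museum and bookstore venues while C and D share music venues, so the two pairs should end up in different clusters. The cluster numbers themselves (0 or 1) are arbitrary.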

## 4 - Results

### 4.1 - Number of cultural venues and proportion of bookstores

In [19]:

# I get the number of cultural venues per city

# count venues per city; the column names are set explicitly so the result
# does not depend on the default names used by the installed pandas version
venuecount = (cultural_df["City"].value_counts()
              .rename_axis("Cities")
              .reset_index(name="Cultural Venues"))
venuecount


Unnamed: 0,Cities,Cultural Venues
0,"Berlin, GERMANY",17
1,"Budapest, HUNGARY",11
2,"Barcelona, SPAIN",9
3,"Paris, FRANCE",9
4,"Munich, GERMANY",7
5,"Oslo, NORWAY",5
6,"Trondheim, NORWAY",2




We see that the city with the most cultural venues around its center is Berlin, followed by Budapest. 


In [20]:
# Number of bookstores per city
bookstores = cities_venues[cities_venues["Venue Category"].str.contains("Bookstore")]

# name the columns explicitly so the result does not depend on pandas' default names
bookstorescount = (bookstores["City"].value_counts()
                   .rename_axis("Cities")
                   .reset_index(name="Bookstores"))
bookstorescount



Unnamed: 0,Cities,Bookstores
0,"Berlin, GERMANY",2
1,"Paris, FRANCE",2
2,"Oslo, NORWAY",2
3,"Munich, GERMANY",1


In [21]:
# Number of cultural venues per bookstore in each city (stored in the "proportion" column)
venuecount = pd.merge(venuecount, bookstorescount, how='left', on="Cities")
venuecount["proportion"] = venuecount["Cultural Venues"] / venuecount["Bookstores"]



Berlin, Paris and Oslo each have 2 bookstores within the radius, and Munich has one. No bookstores were returned for the other three cities.

In [22]:
# I just make a clear dataframe with all the info
# (cities without bookstores get 0 for both "Bookstores" and "proportion")
venuecount.fillna(0, inplace=True)

venuecount.sort_values(by=["Cultural Venues","proportion"], ascending=[False,True])


Unnamed: 0,Cities,Cultural Venues,Bookstores,proportion
0,"Berlin, GERMANY",17,2.0,8.5
1,"Budapest, HUNGARY",11,0.0,0.0
2,"Barcelona, SPAIN",9,0.0,0.0
3,"Paris, FRANCE",9,2.0,4.5
4,"Munich, GERMANY",7,1.0,7.0
5,"Oslo, NORWAY",5,2.0,2.5
6,"Trondheim, NORWAY",2,0.0,0.0


We see that although Berlin has the highest number of cultural venues, it also has the highest ratio of cultural venues to bookstores (8.5), which means it has the fewest bookstores relative to its cultural venues among the cities that have any. Oslo has the lowest ratio (2.5). Budapest has no bookstores and ranks second in number of cultural venues; its ratio is shown as 0 only because missing values were filled with 0. Paris ranks fourth, after Barcelona, which has the same number of cultural venues but no bookstores in the area.
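
The `proportion` column in the table is the number of cultural venues divided by the number of bookstores. As a worked check, the sketch below recomputes that ratio and also its inverse, the share of bookstores among cultural venues, from the counts in the table. The `ratios` helper is hypothetical, and returning `None` for cities without bookstores (instead of 0) is a choice made for this illustration.

```python
# Counts copied from the summary table: (cultural venues, bookstores)
counts = {
    "Berlin, GERMANY": (17, 2),
    "Budapest, HUNGARY": (11, 0),
    "Barcelona, SPAIN": (9, 0),
    "Paris, FRANCE": (9, 2),
    "Munich, GERMANY": (7, 1),
    "Oslo, NORWAY": (5, 2),
    "Trondheim, NORWAY": (2, 0),
}


def ratios(venues, bookstores):
    """Return (cultural venues per bookstore, bookstore share of cultural venues)."""
    # The first ratio is undefined when a city has no bookstore.
    venues_per_bookstore = venues / bookstores if bookstores else None
    bookstore_share = bookstores / venues if venues else None
    return venues_per_bookstore, bookstore_share


summary = {city: ratios(v, b) for city, (v, b) in counts.items()}
for city, (per_store, share) in summary.items():
    print(city, per_store, None if share is None else round(share, 3))
```

Reading both ratios side by side makes it easier to see which cities have many bookstores relative to their cultural venues and which have few.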

### 4.2 - Similarities between cities (Clustering)

In [25]:
# one hot encoding
cities_onehot = pd.get_dummies(cultural_df[['Venue Category']], prefix="", prefix_sep="")

cities_onehot['City'] = cultural_df['City'] 

fixed_columns = [cities_onehot.columns[-1]] + list(cities_onehot.columns[:-1])
cities_onehot = cities_onehot[fixed_columns]

cities_onehot.head()

Unnamed: 0,City,Art Gallery,Art Museum,Bookstore,Bridge,Castle,Church,Concert Hall,Fountain,Historic Site,History Museum,Monument / Landmark,Museum,Opera House,Outdoor Sculpture,Pedestrian Plaza,Plaza,Tea Room,Theater,Wine Bar,Wine Shop
0,"Munich, GERMANY",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0
2,"Munich, GERMANY",0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0
6,"Munich, GERMANY",0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0
13,"Munich, GERMANY",0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0
17,"Munich, GERMANY",0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [67]:
# I make a dataframe for clustering, with the number of each category of cultural venue per city, to compare them
cities_grouped = cities_onehot.groupby('City').sum().reset_index()
cities_grouped

Unnamed: 0,City,Art Gallery,Art Museum,Bookstore,Bridge,Castle,Church,Concert Hall,Fountain,Historic Site,History Museum,Monument / Landmark,Museum,Opera House,Outdoor Sculpture,Pedestrian Plaza,Plaza,Tea Room,Theater,Wine Bar,Wine Shop
0,"Barcelona, SPAIN",0,0,0,1,0,0,1,0,0,0,1,0,1,0,0,4,0,0,1,0
1,"Berlin, GERMANY",1,0,2,0,0,0,2,0,1,2,2,1,1,0,0,2,0,2,1,0
2,"Budapest, HUNGARY",0,1,0,0,1,0,0,0,2,0,0,0,0,1,0,2,1,0,2,1
3,"Munich, GERMANY",0,1,1,0,0,1,0,1,0,0,0,0,2,0,0,1,0,0,0,0
4,"Oslo, NORWAY",1,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0
5,"Paris, FRANCE",0,2,2,0,0,1,0,0,0,0,0,0,0,0,1,2,0,1,0,0
6,"Trondheim, NORWAY",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1


In [68]:
# Clustering
# set number of clusters
kclusters = 2

cities_grouped_clustering = cities_grouped.drop(columns='City')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(cities_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 0, 1, 1, 1, 1, 1], dtype=int32)

In [69]:
# add clustering labels
cities_grouped.insert(0, 'Cluster Labels', kmeans.labels_)

cities_merged = dataset

# merge cities_grouped with dataset to add latitude/longitude for each city
cities_merged = cities_merged.join(cities_grouped.set_index('City'), on='City')



In [70]:
cities_merged

Unnamed: 0,City,Lat,long,Cluster Labels,Art Gallery,Art Museum,Bookstore,Bridge,Castle,Church,Concert Hall,Fountain,Historic Site,History Museum,Monument / Landmark,Museum,Opera House,Outdoor Sculpture,Pedestrian Plaza,Plaza,Tea Room,Theater,Wine Bar,Wine Shop
0,"Munich, GERMANY",48.137108,11.575382,1,0,1,1,0,0,1,0,1,0,0,0,0,2,0,0,1,0,0,0,0
1,"Barcelona, SPAIN",41.382894,2.177432,0,0,0,0,1,0,0,1,0,0,0,1,0,1,0,0,4,0,0,1,0
2,"Budapest, HUNGARY",47.498382,19.040471,1,0,1,0,0,1,0,0,0,2,0,0,0,0,1,0,2,1,0,2,1
3,"Berlin, GERMANY",52.517037,13.38886,0,1,0,2,0,0,0,2,0,1,2,2,1,1,0,0,2,0,2,1,0
4,"Paris, FRANCE",48.856697,2.351462,1,0,2,2,0,0,1,0,0,0,0,0,0,0,0,1,2,0,1,0,0
5,"Oslo, NORWAY",59.91333,10.73897,1,1,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0
6,"Trondheim, NORWAY",63.430566,10.395193,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1


Berlin and Barcelona are placed together in a cluster separate from the other cities. This seems to be because they are the only cities with Monument / Landmark and Concert Hall venues; both also have an Opera House in the area of the potential book shop.

In [71]:
# I make a clear dataframe with all the results in the study

venuecount.rename(columns={'Cities':'City'}, 
                 inplace=True)




cititest = cities_merged[["City","Cluster Labels"]]


venuecount.join(cititest.set_index("City"), on='City')



Unnamed: 0,City,Cultural Venues,Bookstores,proportion,Cluster Labels
0,"Berlin, GERMANY",17,2.0,8.5,0
1,"Budapest, HUNGARY",11,0.0,0.0,1
2,"Barcelona, SPAIN",9,0.0,0.0,0
3,"Paris, FRANCE",9,2.0,4.5,1
4,"Munich, GERMANY",7,1.0,7.0,1
5,"Oslo, NORWAY",5,2.0,2.5,1
6,"Trondheim, NORWAY",2,0.0,0.0,1


Among the cities that fall in the same cluster as Paris, Budapest has the most cultural venues.
Let's see that on a map.

In [73]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start= 4 )

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(cities_merged['Lat'], cities_merged['long'], cities_merged['City'], cities_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## 5 - Discussion


We found that two cities, Berlin and Budapest, have more cultural venues than Paris (where the current shop is) in the vicinity of the potential new bookshop, and that Barcelona has the same number as Paris. Berlin has the most cultural venues in the area and two bookstores, which is few relative to its number of cultural venues. We also found that Berlin and Barcelona are clustered together, separately from the other cities, apparently because they are the only cities with Concert Hall and Monument / Landmark venues. In this respect, the city that has more cultural venues than Paris and also falls in the same cluster as Paris is Budapest. However, this result should be interpreted with caution because of the small sample size and because I created only two clusters.
In addition to being in the same cluster as Paris in terms of cultural venue categories, Budapest has no bookshop in the area of the potential new bookshop. This could be either beneficial or harmful, depending on whether potential customers like to browse several bookshops one after the other or come to a specific bookshop for a particular item. It is also possible that opening the bookstore in an area rich in music venues, like Berlin and Barcelona, would be better for the business than opening it in the area most similar to Paris. To identify the best possible area for the new bookstore, two questions should be addressed in a new study: 1/ Are customers more likely to buy when they browse several bookshops during the day, or when there is only one bookshop in the area? 2/ Does the presence of music venues such as an Opera House or Concert Hall increase the flow of potential customers for "rare and antique" books? 

## 6 - Conclusion

In conclusion, because our client specifically asked for the location that is most similar to Paris in venue categories but has more cultural venues around it, we recommend Budapest as a first choice, with the caveat that further research on consumer habits and on the best neighboring venue categories could help increase sales. Because Budapest is similar to Paris in terms of cultural venue categories and has no bookshops in the area, it appears to be a safe place to open a new bookstore.