# Michelin Rated Restaurants

In [1]:
import pandas as pd

## Initial Exploration & Preparation of Data

In [2]:
michelin = pd.read_csv("../data/Michelin/michelin_data_2023.csv")
michelin.head(10)

,Name,Address,Location,Price,Cuisine,Longitude,Latitude,PhoneNumber,Url,WebsiteUrl,Award,FacilitiesAndServices
0,noma,"Refshalevej 96, Copenhagen, 1432 K, Denmark","Copenhagen, Denmark",€€€€,Creative,12.610618,55.683312,4532963000.0,https://guide.michelin.com/en/capital-region/c...,https://noma.dk,3 MICHELIN Stars,"American Express credit card,Credit card / Deb..."
1,Maaemo,"Dronning Eufemias gate 23, Oslo, 0194, Norway","Oslo, Norway",€€€€,"Modern Cuisine, Creative",10.758636,59.907529,4722180000.0,https://guide.michelin.com/en/oslo-region/oslo...,http://www.maaemo.no,3 MICHELIN Stars,"Air conditioning,American Express credit card,..."
2,Frantzén,"Klara Norra Kyrkogata 26, Stockholm, 111 22, S...","Stockholm, Sweden",€€€€,Modern Cuisine,18.059757,59.334167,468208600.0,https://guide.michelin.com/en/stockholm-region...,https://www.restaurantfrantzen.com/,3 MICHELIN Stars,"Air conditioning,American Express credit card,..."
3,Geranium,"Per Henrik Lings Allé, Parken National Stadium...","Copenhagen, Denmark",€€€€,"Creative, Contemporary",12.572529,55.704085,4569960000.0,https://guide.michelin.com/en/capital-region/c...,https://www.geranium.dk/,3 MICHELIN Stars,"Air conditioning,American Express credit card,..."
4,Ta Vie,"2F, The Pottinger Hotel, 21 Stanley Street, Ho...",Hong Kong,$$$$,Innovative,114.15529,22.282766,85226690000.0,https://guide.michelin.com/en/hong-kong-region...,https://www.tavie.com.hk,3 MICHELIN Stars,"Air conditioning,American Express credit card,..."
5,Caprice,"6F, Four Seasons Hotel, 8 Finance Street, Hong...",Hong Kong,$$$$,French Contemporary,114.15905,22.285715,85231970000.0,https://guide.michelin.com/en/hong-kong-region...,https://www.fourseasons.com/hongkong,3 MICHELIN Stars,"Air conditioning,American Express credit card,..."
6,Sushi Shikon,"7F, The Landmark Mandarin Oriental Hotel, 15 Q...",Hong Kong,$$$$,Sushi,114.157416,22.280857,85226440000.0,https://guide.michelin.com/en/hong-kong-region...,https://sushi-shikon.com,3 MICHELIN Stars,"Air conditioning,American Express credit card,..."
7,T'ang Court,"1-2F, The Langham Hotel, 8 Peking Road, Hong Kong",Hong Kong,$$$,Cantonese,114.1698,22.296572,85221330000.0,https://guide.michelin.com/en/hong-kong-region...,https://www.langhamhotels.com/en/the-langham/h...,3 MICHELIN Stars,"Air conditioning,American Express credit card,..."
8,Robuchon au Dôme,"43F, Grand Lisboa Hotel, Avenida de Lisboa, Macau",Macau,$$$$,French Contemporary,113.54395,22.189949,85388040000.0,https://guide.michelin.com/en/macau-region/mac...,https://www.grandlisboahotels.com,3 MICHELIN Stars,"Air conditioning,American Express credit card,..."
9,8 1/2 Otto e Mezzo - Bombana,"Shop 202, 2F, Alexandra House, 18 Chater Road,...",Hong Kong,$$$$,Italian,114.15867,22.281464,85225380000.0,https://guide.michelin.com/en/hong-kong-region...,https://www.ottoemezzobombana.com,3 MICHELIN Stars,"Air conditioning,American Express credit card,..."


In [3]:
print(f"Columns:\n{michelin.columns.to_list()}")

Columns:
['Name', 'Address', 'Location', 'Price', 'Cuisine', 'Longitude', 'Latitude', 'PhoneNumber', 'Url', 'WebsiteUrl', 'Award', 'FacilitiesAndServices']


We aim to plot the coordinates on a map, look for a correlation with population density, and compare the UK with France.

We drop `Url`, `PhoneNumber` and `FacilitiesAndServices`, keeping `WebsiteUrl`. `Url` (the restaurant's Michelin Guide page) could perhaps become useful later.

In [4]:
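# Keep only the columns we need; .copy() gives an independent DataFrame so later assignments do not trigger SettingWithCopyWarning
michelin = michelin[['Name', 'Address', 'Location', 'Price', 'Cuisine', 'WebsiteUrl', 'Award', 'Longitude', 'Latitude']].copy()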
michelin.head()

,Name,Address,Location,Price,Cuisine,WebsiteUrl,Award,Longitude,Latitude
0,noma,"Refshalevej 96, Copenhagen, 1432 K, Denmark","Copenhagen, Denmark",€€€€,Creative,https://noma.dk,3 MICHELIN Stars,12.610618,55.683312
1,Maaemo,"Dronning Eufemias gate 23, Oslo, 0194, Norway","Oslo, Norway",€€€€,"Modern Cuisine, Creative",http://www.maaemo.no,3 MICHELIN Stars,10.758636,59.907529
2,Frantzén,"Klara Norra Kyrkogata 26, Stockholm, 111 22, S...","Stockholm, Sweden",€€€€,Modern Cuisine,https://www.restaurantfrantzen.com/,3 MICHELIN Stars,18.059757,59.334167
3,Geranium,"Per Henrik Lings Allé, Parken National Stadium...","Copenhagen, Denmark",€€€€,"Creative, Contemporary",https://www.geranium.dk/,3 MICHELIN Stars,12.572529,55.704085
4,Ta Vie,"2F, The Pottinger Hotel, 21 Stanley Street, Ho...",Hong Kong,$$$$,Innovative,https://www.tavie.com.hk,3 MICHELIN Stars,114.15529,22.282766
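The same column selection can also be expressed by naming the columns to drop. A minimal sketch on a made-up toy frame (only the column names come from the dataset): `drop` keeps the remaining columns in their existing order, whereas selecting a list, as in the cell above, also lets you reorder them.

```python
import pandas as pd

# Toy frame reusing some of the original column names; the values are made up
raw = pd.DataFrame({
    "Name": ["A", "B"],
    "Url": ["guide-page-a", "guide-page-b"],
    "WebsiteUrl": ["site-a", None],
    "PhoneNumber": [1.0, 2.0],
    "FacilitiesAndServices": ["Air conditioning", "Terrace"],
})

# Drop the unwanted columns by name; a misspelled name raises KeyError
dropped = raw.drop(columns=["Url", "PhoneNumber", "FacilitiesAndServices"])
print(dropped.columns.tolist())  # ['Name', 'WebsiteUrl']

# Selecting a list of columns also fixes their order
selected = raw[["WebsiteUrl", "Name"]].copy()
print(selected.columns.tolist())  # ['WebsiteUrl', 'Name']
```

Because `drop` raises on unknown names by default, a typo in a column name fails loudly instead of silently keeping the column.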


In [5]:
# Columns are converted to lowercase for convenience
michelin.columns = michelin.columns.str.lower()

In [6]:
michelin.rename({'websiteurl': 'url'}, axis=1, inplace=True)
print(f"Columns:\n{michelin.columns.tolist()}")

Columns:
['name', 'address', 'location', 'price', 'cuisine', 'url', 'award', 'longitude', 'latitude']
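The two renaming steps above can also be chained without `inplace=True`. A minimal sketch on an empty toy frame that only reuses three of the original column names:

```python
import pandas as pd

df = pd.DataFrame(columns=["Name", "WebsiteUrl", "Award"])

# Lower-case every column name, then rename websiteurl -> url,
# returning a new DataFrame instead of modifying one in place
tidy = df.rename(columns=str.lower).rename(columns={"websiteurl": "url"})
print(tidy.columns.tolist())  # ['name', 'url', 'award']
```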


In [7]:
michelin.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6832 entries, 0 to 6831
Data columns (total 9 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   name       6832 non-null   object 
 1   address    6832 non-null   object 
 2   location   6832 non-null   object 
 3   price      6783 non-null   object 
 4   cuisine    6832 non-null   object 
 5   url        5557 non-null   object 
 6   award      6832 non-null   object 
 7   longitude  6832 non-null   float64
 8   latitude   6832 non-null   float64
dtypes: float64(2), object(7)
memory usage: 480.5+ KB


There are missing values in `price` (49 entries) and `url` (1,275 entries); these will be dealt with once the data has been partitioned by location.
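The missing counts can be read off `info()`, or computed directly. A minimal sketch on a made-up toy frame; on the full `michelin` frame the same two lines should report the gaps in `price` and `url` that `info()` shows.

```python
import pandas as pd

# Toy stand-in for the Michelin frame; the values are made up
df = pd.DataFrame({
    "name": ["A", "B", "C"],
    "price": ["€€€€", None, "$$"],
    "url": [None, None, "https://a.example"],
})

# Count missing values per column and keep only columns that have any
missing = df.isna().sum()
missing = {col: int(n) for col, n in missing.items() if n > 0}
print(missing)  # {'price': 1, 'url': 2}
```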

Unique values of selected columns:

In [8]:
ignore = ['longitude', 'latitude', 'url', 'cuisine']

# Print the unique values of every column except those in `ignore`
for column in michelin:
    if column in ignore:
        continue
    print(f"\nUnique {column}s: {michelin[column].unique()}\nTotal Unique: {len(michelin[column].unique())} values")


Unique names: ['noma' 'Maaemo' 'Frantzén' ... 'Zazie' 'Butchery & Wine' 'alewino']
Total Unique: 6687 values

Unique addresss: ['Refshalevej 96, Copenhagen, 1432 K, Denmark'
 'Dronning Eufemias gate 23, Oslo, 0194, Norway'
 'Klara Norra Kyrkogata 26, Stockholm, 111 22, Sweden' ...
 'ul. Józefa 34, Cracow, 32 056, Poland'
 'ul. Żurawia 22, Warsaw, 00 515, Poland'
 'ul. Mokotowska 48, Warsaw, 00 543, Poland']
Total Unique: 6706 values

Unique locations: ['Copenhagen, Denmark' 'Oslo, Norway' 'Stockholm, Sweden' ...
 'Almuñécar, Spain' 'Vitoria-Gasteiz, Spain' 'Warsaw, Poland']
Total Unique: 2645 values

Unique prices: ['€€€€' '$$$$' '$$$' '££££' '¥¥¥' '¥¥¥¥' '₩₩₩₩' '$$' '€€€' '¥¥' '฿฿฿฿'
 '₺₺₺₺' nan '€€' '₫₫' '₫₫₫₫' '£££' '££' '฿฿฿' '฿฿' '¥' '₩₩₩' '₺₺' '₺₺₺'
 '$' '€' '₫' '£' '฿' '₩' '₩₩' '₺']
Total Unique: 32 values

Unique awards: ['3 MICHELIN Stars' '2 MICHELIN Stars' '1 MICHELIN Star' 'Bib Gourmand']
Total Unique: 4 values
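Note that `len(unique())` counts a missing value as one of the unique values, so the 32 prices above include `nan`. A short sketch on a toy Series showing how this compares with `nunique()`:

```python
import pandas as pd

prices = pd.Series(["€€€€", "$$", None, "€€€€"])

# unique() keeps the missing value, so len(unique()) counts it
print(len(prices.unique()))          # 3
# nunique() drops missing values by default; dropna=False matches len(unique())
print(prices.nunique())              # 2
print(prices.nunique(dropna=False))  # 3
```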


----
&nbsp;
## Separate `location` column

- `city` column, e.g. 'San Francisco'
- `country` column, e.g. 'USA'

`location` is a comma-separated "city, country" string, e.g. 'Copenhagen, Denmark'.
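A minimal sketch of the whole procedure on three toy entries. It uses `str.rsplit` with `n=1`, a variant of the split performed below: splitting on the last comma only means the result has at most two columns, and a city name that itself contained a comma would stay intact. Whether such names occur in the full dataset is not checked here.

```python
import pandas as pd

# Toy locations in the "city, country" format, plus one city-state entry
location = pd.Series(["Copenhagen, Denmark", "Hong Kong", "Oslo, Norway"])

# Give single-name entries an explicit country, like the special_cases dictionary
special_cases = {"Hong Kong": "Hong Kong, Hong Kong SAR China"}
location = location.replace(special_cases)

# Split on the last comma only, then strip surrounding whitespace
parts = location.str.rsplit(",", n=1, expand=True)
parts.columns = ["city", "country"]
parts = parts.apply(lambda col: col.str.strip())
print(parts.values.tolist())
# [['Copenhagen', 'Denmark'], ['Hong Kong', 'Hong Kong SAR China'], ['Oslo', 'Norway']]
```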

In [9]:
# Are there non-comma separated entries in `location`?
no_comma = michelin[~michelin['location'].str.contains(',')]
no_comma['location'].unique().tolist()

['Hong Kong', 'Macau', 'Singapore', 'Dubai', 'Luxembourg', 'Abu Dhabi']

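These are city-states, special administrative regions, emirates or small countries in which the city and the territory share a name, so no separate country is given.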

In [10]:
# Create a dictionary for special cases
special_cases = {'Hong Kong': 'Hong Kong, Hong Kong SAR China',
                 'Macau': 'Macau, Macau SAR China',
                 'Singapore': 'Singapore, Singapore',
                 'Dubai': 'Dubai, United Arab Emirates',
                 'Luxembourg': 'Luxembourg, Luxembourg',
                 'Abu Dhabi': 'Abu Dhabi, United Arab Emirates'}

In [11]:
# Apply special cases to the 'location' column
michelin['location'] = michelin['location'].replace(special_cases)

# Now split the 'location' column
locations = michelin['location'].str.split(',', expand=True)
locations.columns = ['city', 'country']

# Remove leading or trailing whitespace from 'city' and 'country' columns
locations['city'] = locations['city'].str.strip()
locations['country'] = locations['country'].str.strip()

# Replace the original 'location' column with the new 'country' and 'city' columns
michelin = michelin.drop('location', axis=1).join(locations)

In [12]:
print(f"New Columns: {michelin.columns.tolist()}")

New Columns: ['name', 'address', 'price', 'cuisine', 'url', 'award', 'longitude', 'latitude', 'city', 'country']


In [13]:
print(f"Unique Countries: {michelin['country'].unique()}"
      f"\nTotal Unique = {len(michelin['country'].unique())} values")

Unique Countries: ['Denmark' 'Norway' 'Sweden' 'Hong Kong SAR China' 'Macau SAR China'
 'Netherlands' 'Germany' 'United Kingdom' 'Belgium' 'France' 'Austria'
 'China Mainland' 'USA' 'Spain' 'Japan' 'Italy' 'Switzerland'
 'South Korea' 'Taiwan' 'Singapore' 'Finland' 'Estonia'
 'United Arab Emirates' 'Ireland' 'Luxembourg' 'Thailand' 'Portugal'
 'Hungary' 'Türkiye' 'Greece' 'Canada' 'Slovenia' 'Brazil' 'Iceland'
 'Vietnam' 'Malta' 'Malaysia' 'Andorra' 'Croatia' 'Czech Republic'
 'Poland' 'Serbia']
Total Unique = 42 values


In [14]:
print(f"Unique Cities: {michelin['city'].unique()}"
      f"\nTotal Unique = {len(michelin['city'].unique())} values")

Unique Cities: ['Copenhagen' 'Oslo' 'Stockholm' ... 'Almuñécar' 'Vitoria-Gasteiz'
 'Warsaw']
Total Unique = 2642 values


In [15]:
# Reorder the columns; .copy() avoids SettingWithCopyWarning when 'stars' is added below
michelin = michelin[['name', 'address', 'city', 'country', 'price', 'cuisine', 'url', 'award', 'longitude', 'latitude']].copy()
michelin.head()

,name,address,city,country,price,cuisine,url,award,longitude,latitude
0,noma,"Refshalevej 96, Copenhagen, 1432 K, Denmark",Copenhagen,Denmark,€€€€,Creative,https://noma.dk,3 MICHELIN Stars,12.610618,55.683312
1,Maaemo,"Dronning Eufemias gate 23, Oslo, 0194, Norway",Oslo,Norway,€€€€,"Modern Cuisine, Creative",http://www.maaemo.no,3 MICHELIN Stars,10.758636,59.907529
2,Frantzén,"Klara Norra Kyrkogata 26, Stockholm, 111 22, S...",Stockholm,Sweden,€€€€,Modern Cuisine,https://www.restaurantfrantzen.com/,3 MICHELIN Stars,18.059757,59.334167
3,Geranium,"Per Henrik Lings Allé, Parken National Stadium...",Copenhagen,Denmark,€€€€,"Creative, Contemporary",https://www.geranium.dk/,3 MICHELIN Stars,12.572529,55.704085
4,Ta Vie,"2F, The Pottinger Hotel, 21 Stanley Street, Ho...",Hong Kong,Hong Kong SAR China,$$$$,Innovative,https://www.tavie.com.hk,3 MICHELIN Stars,114.15529,22.282766


----

## `award` column

In the Michelin dataset, the `award` column designates the level of recognition a restaurant has achieved in Michelin's rating system. Its values are '3 MICHELIN Stars', '2 MICHELIN Stars', '1 MICHELIN Star' and 'Bib Gourmand', the last being a separate award for good-quality, good-value restaurants.

To make the analysis more tractable and place the awards on a uniform scale, we map them to numerical values in a new `stars` column, keeping the original `award` column. A numerical scale allows quantitative analysis, such as averaging or summing ratings by city or country, which is not possible with the textual labels.

The three star awards map directly to 3, 2 and 1. The 'Bib Gourmand' does not fit the star system; given the prestige and value attached to it, we map it to 0.5. This choice is somewhat arbitrary: the 'Bib Gourmand' recognises a different aspect of restaurant quality and is not strictly comparable to the star awards.
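A minimal sketch of the award-to-stars mapping on made-up toy rows (only the award labels come from the dataset), using `Series.map`, which returns NaN for any label missing from the dictionary. It ends with the kind of aggregate, mean stars per country, that the numerical scale makes possible.

```python
import pandas as pd

# Toy rows; the award strings match the dataset, the rows are made up
df = pd.DataFrame({
    "country": ["France", "France", "United Kingdom"],
    "award": ["3 MICHELIN Stars", "Bib Gourmand", "1 MICHELIN Star"],
})

award_dict = {"3 MICHELIN Stars": 3, "2 MICHELIN Stars": 2,
              "1 MICHELIN Star": 1, "Bib Gourmand": 0.5}

# Unmapped labels would become NaN, so check that every award was recognised
df["stars"] = df["award"].map(award_dict)
assert df["stars"].notna().all()

# Mean stars per country, converted to plain floats for printing
mean_stars = {k: float(v) for k, v in df.groupby("country")["stars"].mean().items()}
print(mean_stars)  # {'France': 1.75, 'United Kingdom': 1.0}
```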

In [16]:
award_dict = {'3 MICHELIN Stars': 3, '2 MICHELIN Stars': 2, '1 MICHELIN Star': 1, 'Bib Gourmand': 0.5}
# map() returns NaN for any award not in award_dict, so unexpected labels are easy to spot
michelin['stars'] = michelin['award'].map(award_dict)

In [17]:
cols = michelin.columns.tolist()

cols.remove('stars')
# insert 'stars' immediately after 'award', which we retain
cols.insert(cols.index('award') + 1, 'stars')

# reindex the DataFrame
michelin = michelin.reindex(columns=cols)

In [18]:
michelin.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6832 entries, 0 to 6831
Data columns (total 11 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   name       6832 non-null   object 
 1   address    6832 non-null   object 
 2   city       6832 non-null   object 
 3   country    6832 non-null   object 
 4   price      6783 non-null   object 
 5   cuisine    6832 non-null   object 
 6   url        5557 non-null   object 
 7   award      6832 non-null   object 
 8   stars      6832 non-null   float64
 9   longitude  6832 non-null   float64
 10  latitude   6832 non-null   float64
dtypes: float64(3), object(8)
memory usage: 587.2+ KB


----
&nbsp;
## `price` column

The `price` column also needs organising. It uses symbols from several currencies (€, $, £, ¥, ₩, ฿, ₺, ₫), and it is unclear whether, for example, $ refers to USD, HKD, etc.

This attribute is easier to deal with country by country.
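One possible starting point, sketched under stated assumptions: treat the number of repeated symbols as a relative price tier and keep the first character as the currency symbol. This assumes each symbol is a single Unicode character (true for €, $, £, ¥, ₩, ฿, ₺ and ₫) and that tiers are only compared within a country, which sidesteps rather than resolves the ambiguity of $.

```python
import pandas as pd

# Toy price strings using symbols that appear in the dataset
price = pd.Series(["€€€€", "$$", None, "₩₩₩", "£"])

# Assumption: the count of symbols is a relative price tier (1-4),
# meaningful only within one country; missing prices stay <NA>
tier = price.str.len().astype("Int64")

# Currency symbol taken as the first character (missing stays missing)
symbol = price.str[0]

print(tier.tolist())
print(symbol.tolist())
```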

----
&nbsp;
## Export `UK` and `France` datasets for further analysis

In [19]:
# Filter the DataFrame for records where country == 'United Kingdom'
uk_data = michelin[michelin['country'] == 'United Kingdom']
uk_data.head()

,name,address,city,country,price,cuisine,url,award,stars,longitude,latitude
25,Sketch (The Lecture Room & Library),"9 Conduit Street, London, W1S 2XG, United Kingdom",London,United Kingdom,££££,Modern French,https://sketch.london/,3 MICHELIN Stars,3.0,-0.141537,51.512678
26,L'Enclume,"Cavendish Street, Cartmel, LA11 6PZ, United Ki...",Cartmel,United Kingdom,££££,Creative British,https://www.lenclume.co.uk/,3 MICHELIN Stars,3.0,-2.953857,54.201725
27,Alain Ducasse at The Dorchester,"Park Lane, London, W1K 1QA, United Kingdom",London,United Kingdom,££££,French,https://www.alainducasse-dorchester.com/,3 MICHELIN Stars,3.0,-0.152575,51.507338
28,Hélène Darroze at The Connaught,"Carlos Place, London, W1K 2AL, United Kingdom",London,United Kingdom,££££,Modern Cuisine,https://www.the-connaught.co.uk/restaurants-ba...,3 MICHELIN Stars,3.0,-0.14929,51.510188
29,Fat Duck,"High Street, Bray, SL6 2AQ, United Kingdom",Bray,United Kingdom,££££,Creative,https://thefatduck.co.uk/,3 MICHELIN Stars,3.0,-0.701753,51.507858


In [20]:
# Export the UK data to a csv file
uk_data.to_csv('../data/UK/uk_data.csv', index=False)

In [22]:
# Filter the DataFrame for records where country == 'France'
france_data = michelin[michelin['country'] == 'France']
france_data.head()

,name,address,city,country,price,cuisine,url,award,stars,longitude,latitude
36,La Vague d'Or - Cheval Blanc St-Tropez,"Plage de la Bouillabaisse, Saint-Tropez, 83990...",Saint-Tropez,France,€€€€,"Creative, Modern Cuisine",https://www.chevalblanc.com/fr/maison/st-tropez/,3 MICHELIN Stars,3.0,6.626154,43.266585
37,René et Maxime Meilleur,"Hameau de Saint-Marcel, Saint-Martin-de-Belle...",Saint-Martin-de-Belleville,France,€€€€,"Creative, Regional Cuisine",https://www.la-bouitte.com/fr/,3 MICHELIN Stars,3.0,6.513306,45.369046
38,Kei,"5 rue du Coq-Héron, Paris, 75001, France",Paris,France,€€€€,"Modern Cuisine, Creative",https://www.restaurant-kei.fr/,3 MICHELIN Stars,3.0,2.342285,48.864395
39,Auberge du Vieux Puits,"5 avenue Saint-Victor, Fontjoncouse, 11360, Fr...",Fontjoncouse,France,€€€€,Creative,https://www.aubergeduvieuxpuits.fr/fr/,3 MICHELIN Stars,3.0,2.789329,43.048173
40,Régis et Jacques Marcon,"Larsiallas, Saint-Bonnet-le-Froid, 43290, France",Saint-Bonnet-le-Froid,France,€€€€,Creative,https://www.lesmaisonsmarcon.fr/,3 MICHELIN Stars,3.0,4.434268,45.138673


In [23]:
# Define and export Monaco
monaco = france_data[france_data['city'] == 'Monaco']
monaco.to_csv('../data/France/monaco.csv', index=False)

Monaco is listed under France in this dataset but is a sovereign state in its own right, so we remove it from the France data.

In [24]:
france_data = france_data[france_data['city'] != 'Monaco']
france_data.shape

(1033, 11)

In [25]:
# Export the France data to a csv file
france_data.to_csv('../data/France/france_master.csv', index=False)

----
&nbsp;
## Restaurants could be further partitioned by country from this point
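A minimal sketch of such a partition: iterate over `groupby("country")` and write one CSV per country. The toy frame stands in for the cleaned `michelin` DataFrame, and the temporary output directory and the lower-case, underscore file-naming scheme are hypothetical choices, not the notebook's own paths.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Toy frame standing in for the cleaned michelin DataFrame
df = pd.DataFrame({
    "name": ["A", "B", "C"],
    "country": ["France", "United Kingdom", "France"],
    "stars": [3.0, 1.0, 0.5],
})

out_dir = Path(tempfile.mkdtemp())  # hypothetical output location
written = {}
for country, group in df.groupby("country"):
    # Hypothetical naming scheme: "United Kingdom" -> "united_kingdom.csv"
    path = out_dir / f"{country.lower().replace(' ', '_')}.csv"
    group.to_csv(path, index=False)
    written[country] = len(group)

print(written)  # {'France': 2, 'United Kingdom': 1}
```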