# Exploration

An exploration of the MICHELIN Guide restaurant dataset that Jerry Ng published on [Kaggle](https://www.kaggle.com/datasets/ngshiheng/michelin-guide-restaurants-2021).

Available columns:
* Name
* Address
* Location
* MinPrice
* MaxPrice
* Currency
* Longitude
* Latitude
* PhoneNumber
* Url (Link to the restaurant on guide.michelin.com)
* WebsiteUrl (The restaurant's website)
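* Award (1 to 3 MICHELIN Stars, or Bib Gourmand)
* Cuisine (one or more cuisines, comma-separated)
* GreenStar (0 or 1)
* FacilitiesAndServices (comma-separated list of facilities)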

## Data loading

In [None]:
import pandas as pd
from plotly.offline import init_notebook_mode, iplot
from plotly.graph_objs import Bar

# Load plotly.js from a CDN so charts render inline in the notebook
init_notebook_mode(connected=True)

In [None]:
df = pd.read_csv("https://raw.githubusercontent.com/plotly/datasets/master/michelin_by_Jerry_Ng.csv")

In [None]:
df.head()

## Data cleaning

First, the comma-delimited strings in the 'FacilitiesAndServices' column are converted to lists.

In [None]:
df["FacilitiesAndServices"] = df["FacilitiesAndServices"].str.split(",")
df["FacilitiesAndServices"]
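To make the effect of `str.split` concrete, here is a minimal sketch on a few made-up values (not rows from the dataset). Missing values stay `NaN` rather than becoming lists. The whitespace-stripping step is an assumption, useful only if some rows put a space after the comma.

```python
import numpy as np
import pandas as pd

# Hypothetical sample values; real 'FacilitiesAndServices' strings may differ
facilities = pd.Series(["Air conditioning,Car park", np.nan, "Terrace, Garden"])

# str.split returns one list per row and leaves missing values as NaN
facility_lists = facilities.str.split(",")
print(facility_lists.tolist())

# Optional (assumed) clean-up: strip stray whitespace around each item
cleaned = facility_lists.apply(
    lambda items: [item.strip() for item in items] if isinstance(items, list) else items
)
print(cleaned.tolist())
```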

Next, we convert the 'GreenStar' column values (0 or 1) to booleans.

In [None]:
df["GreenStar"] = df["GreenStar"].astype(bool)
df["GreenStar"].dtype
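A caveat worth knowing: `astype(bool)` maps every non-zero value to `True`, and `NaN` counts as non-zero. The toy series below (not dataset values) shows this. Filling missing values with 0 first is one option, assuming a missing flag should mean "no Green Star".

```python
import numpy as np
import pandas as pd

green = pd.Series([0, 1, 0])
print(green.astype(bool).tolist())  # [False, True, False]

# NaN is truthy, so a missing value silently becomes True
with_missing = pd.Series([0, 1, np.nan])
print(with_missing.astype(bool).tolist())  # [False, True, True]

# Assumed semantics: treat a missing flag as False
print(with_missing.fillna(0).astype(bool).tolist())  # [False, True, False]
```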

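Finally, the 'Cuisine' column, which can list several cuisines separated by ", ", is also split into lists.

In [None]: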
df["Cuisine"] = df["Cuisine"].str.split(", ")

In [None]:
# Lists are unhashable, so explode() gives each cuisine its own row before counting
df["Cuisine"].explode().value_counts()
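Counting a column of lists needs one extra step: `value_counts()` hashes each value, and lists cannot be hashed. A minimal sketch with hypothetical cuisine strings shows how `explode()` turns each list element into its own row so that individual cuisines are counted.

```python
import pandas as pd

# Hypothetical 'Cuisine' values, split the same way as in the notebook
cuisine = pd.Series(["French, Modern Cuisine", "Japanese", "French"]).str.split(", ")

# explode() repeats the index once per list element, one cuisine per row
cuisine_counts = cuisine.explode().value_counts()
print(cuisine_counts)
```

Here "French" is counted twice because it appears in two different rows, even though only one row lists it alone.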

## Features

To simplify the visualizations later on, we derive a few new columns.

The dataset stores the city and country together in a single 'Location' column, in the form "City, Country". We use it to create two new columns, 'City' and 'Country', after first checking how many commas each location contains.

In [None]:
df["Location"].str.count(",").value_counts()

In [None]:
df[["City", "Country"]] = df["Location"].str.split(", ", n=1, expand=True)
df[["Location", "City", "Country"]]
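The comma check and the split can be tried on a small, made-up frame (the location strings are assumptions, not dataset rows). With `n=1` only the first ", " is split, and `expand=True` returns one column per part, leaving a missing value where a location has no comma.

```python
import pandas as pd

# Hypothetical 'Location' values in the "City, Country" form the split assumes
locations = pd.DataFrame({"Location": ["Paris, France", "Tokyo, Japan", "Singapore"]})

# Commas per row: 1 means "City, Country", 0 means a single name
comma_counts = locations["Location"].str.count(",").value_counts()
print(comma_counts)

# Split on the first ", " only; rows without a comma get a missing 'Country'
locations[["City", "Country"]] = locations["Location"].str.split(", ", n=1, expand=True)
print(locations)
```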

In [None]:
df[df["Country"].isna()]["Location"].unique()

As shown above, the split leaves 'Country' empty for locations without a comma, i.e. where the city and country share the same name (e.g. Singapore). For these rows, we copy the 'Location' value into 'Country':

In [None]:
df.loc[df["Country"].isnull(), "Country"] = df["Location"]
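The `.loc` assignment works because pandas aligns the right-hand Series on the index, so only the masked rows receive their own 'Location'. `fillna` with a Series does the same alignment and is a common alternative; the toy frame below (hypothetical values) shows that both give identical results.

```python
import pandas as pd

df_demo = pd.DataFrame({
    "Location": ["Paris, France", "Singapore"],
    "Country": ["France", None],
})

# Notebook approach: masked .loc assignment, aligned on the index
via_loc = df_demo.copy()
via_loc.loc[via_loc["Country"].isnull(), "Country"] = via_loc["Location"]

# Alternative: fill each missing value from the same row of 'Location'
via_fillna = df_demo.copy()
via_fillna["Country"] = via_fillna["Country"].fillna(via_fillna["Location"])

print(via_loc["Country"].tolist())     # ['France', 'Singapore']
print(via_fillna["Country"].tolist())  # ['France', 'Singapore']
```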

Re-running the check should now return an empty array:

In [None]:
df[df["Country"].isna()]["Location"].unique()

## Stats

In [None]:
# nunique() ignores missing values by default
number_of_countries = df["Country"].nunique()
print(f"Number of countries: {number_of_countries}")
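Filtering out missing values and then taking `len(...unique())` is equivalent to `nunique()`, which skips missing values unless `dropna=False` is passed. A quick check on a made-up series:

```python
import numpy as np
import pandas as pd

countries = pd.Series(["France", "Japan", "France", np.nan])

# Drop missing values, then count the distinct ones
explicit = len(countries[countries.notna()].unique())
# nunique() excludes NaN by default, giving the same count
concise = countries.nunique()
print(explicit, concise)  # 2 2
```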

In [None]:
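# Count distinct name/address pairs, since different restaurants can share a name
number_of_restaurants = len(df.drop_duplicates(subset=["Name", "Address"]))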
print(f"Number of restaurants: {number_of_restaurants}")
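Counting distinct names can undercount when two different restaurants share a name. The rows below are hypothetical; they only illustrate how deduplicating on name and address together differs from counting names alone.

```python
import pandas as pd

# Hypothetical rows: two different restaurants with the same name
restaurants = pd.DataFrame({
    "Name": ["Le Jardin", "Le Jardin", "Sakura"],
    "Address": ["1 Rue A, Paris", "2 Main St, London", "3 Chome, Tokyo"],
})

print(restaurants["Name"].nunique())  # 2: distinct names only
print(len(restaurants.drop_duplicates(subset=["Name", "Address"])))  # 3: distinct name/address pairs
```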

In [None]:
# Count each cuisine separately, then take the most frequent one
top_cuisine = df["Cuisine"].explode().value_counts().index[0]
print(f"Top cuisine: {top_cuisine}")

## Visualization

The number of restaurants per award:

In [None]:
award_counts = df["Award"].value_counts()

iplot([Bar(x=award_counts.index, y=award_counts)])