# Serving Data with Dash: Exploring the Michelin Star Restaurant Guide

The Michelin Guide has long been synonymous with culinary excellence, serving as a global benchmark for top-tier dining experiences. In this blog post, I take on the [Plotly Autumn App Challenge](https://community.plotly.com/t/autumn-app-challenge/87373) of exploring Michelin Star Restaurant Guide data using Dash, a powerful Python framework for building analytical web applications. The dataset is provided by Jerry Ng on [Kaggle](https://www.kaggle.com/datasets/ngshiheng/michelin-guide-restaurants-2021). As per the challenge, the goal is to create a dashboard that reveals insights from the dataset, has great UI/UX design and creative usage of Plotly maps. I’ll walk you through my approach to data exploration, visualization, and the design decisions that went into building the final app.

## Data exploration

The first step in exploring the dataset was checking which columns are available:

Columns in dataset:
* **Name**: Name of the restaurant.
* **Address**: Address of the restaurant.
* **Location**: Location of the restaurant.
* **Price**: Price range of the restaurant (e.g., $, $$, $$$).
* **Cuisine**: Type of cuisines served at the restaurant.
* **Longitude**: Longitude coordinates of the restaurant.
* **Latitude**: Latitude coordinates of the restaurant.
* **PhoneNumber**: Contact phone number of the restaurant.
* **Url**: MICHELIN Guide URL of the restaurant's listing.
* **WebsiteUrl**: URL of the restaurant's official website.
* **Award**: The culinary distinctions.
* **GreenStar**: Award for sustainable restaurant practices.
* **FacilitiesAndServices**: A list of facilities and services offered by the restaurant.
* **Description**: A short description of the restaurant.

### Data loading

After checking the initial CSV dataset, I'll load the dataset into a Pandas DataFrame.

In [None]:
import pandas as pd
from plotly.offline import init_notebook_mode
import plotly.express as px

init_notebook_mode(connected=True)

In [None]:
df = pd.read_csv("https://raw.githubusercontent.com/plotly/datasets/master/michelin_by_Jerry_Ng.csv")

In [None]:
df.head()

### Data cleaning

After loading the data, I'll perform a few data cleaning operations. First, the comma-delimited string in the 'FacilitiesAndServices' column is converted to a list.

In [None]:
df["FacilitiesAndServices"] = df["FacilitiesAndServices"].str.split(",")
df["FacilitiesAndServices"]
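One thing to watch for with a plain `split(",")`: if the raw strings put a space after each comma, every item after the first keeps a leading space. I have not checked whether that applies to this dataset, so the sketch below uses made-up values to show how the items could be stripped.

```python
import pandas as pd

# Made-up values for illustration; not taken from the dataset.
sample = pd.Series(["Air conditioning, Terrace,Wheelchair access", None])

# Split on commas, then strip whitespace from every item.
# Rows without a value are left untouched.
facilities = sample.str.split(",").apply(
    lambda items: [item.strip() for item in items] if isinstance(items, list) else items
)

print(facilities[0])
```

If the real column turns out to be cleanly delimited, the extra stripping step is harmless.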

Next, I convert the GreenStar column values (0,1) to a boolean field.

In [None]:
df["GreenStar"] = df["GreenStar"].astype(bool)
df["GreenStar"].dtype
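A small caveat with `astype(bool)`: a missing value is converted to `True`, not `False`. I am assuming here that the column could contain gaps, which I have not verified for this dataset. The sketch uses made-up values to show the difference and one possible guard.

```python
import pandas as pd

# Made-up values for illustration: the last row has no GreenStar value.
green = pd.Series([0, 1, None])

naive = green.astype(bool)           # the missing value becomes True
safe = green.fillna(0).astype(bool)  # treat a missing value as "no Green Star"

print(naive.tolist())
print(safe.tolist())
```

Whether a missing value should count as "no Green Star" is a judgment call; the point is only that the default conversion silently picks `True`.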

In [None]:
# df["Cuisine"] = df["Cuisine"].str.split(", ")

In [None]:
df["Cuisine"].value_counts()

### Features

To simplify filtering and visualizing the data later on, I will add a number of features. To start, the dataset only contains a 'Location' column. This column will be used to create two new columns: 'City' and 'Country'.

In [None]:
df["Location"].str.count(",").value_counts()

In [None]:
df[["City", "Country"]] = df["Location"].str.split(", ", n=1, expand=True)
df[["Location", "City", "Country"]]

In [None]:
df[df["Country"].isna()]["Location"].unique()

As shown above, the current implementation doesn't account for locations where the city and country are the same value (e.g. Singapore). We will fix this with the following line of code:

In [None]:
df.loc[df["Country"].isnull(), "Country"] = df["Location"]

In [None]:
df[df["Country"].isna()]["Location"].unique()
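To make the split and the fallback easier to follow, here is the same idea on three made-up locations. It uses `fillna` instead of the `.loc` assignment above, which is an equivalent alternative and not a change to the approach.

```python
import pandas as pd

# Made-up locations for illustration; "Singapore" has no comma.
sample = pd.DataFrame({"Location": ["Paris, France", "Singapore", "New York, USA"]})

# Split once on ", " into City and Country.
sample[["City", "Country"]] = sample["Location"].str.split(", ", n=1, expand=True)

# Where no comma was found, fall back to the full location as the country.
sample["Country"] = sample["Country"].fillna(sample["Location"])

print(sample)
```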

Next, I map each award to a marker size for the map, as shown in the Plotly example: the higher the distinction, the larger the marker.

In [None]:
def size_mapping(award):
    if award == "3 Stars":
        return 30
    elif award == "2 Stars":
        return 15
    elif award == "1 Star":
        return 10
    elif award == "Bib Gourmand":
        return 5
    else:
        return 2


df["Award (Map Size)"] = df["Award"].apply(size_mapping)
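The same mapping could also be written as a lookup table, which keeps the sizes in one place. This is a sketch on made-up award values; `AWARD_SIZES` is a name I introduce here, and the sizes are copied from `size_mapping` above.

```python
import pandas as pd

# Same sizes as in size_mapping above, expressed as a lookup table.
AWARD_SIZES = {"3 Stars": 30, "2 Stars": 15, "1 Star": 10, "Bib Gourmand": 5}

# Made-up award values for illustration.
awards = pd.Series(["3 Stars", "Bib Gourmand", "Selected Restaurants"])

# Awards that are not in the table fall back to the smallest size (2).
sizes = awards.map(AWARD_SIZES).fillna(2).astype(int)

print(sizes.tolist())
```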

During the exploration, I also noticed that the price is shown in different currencies. This makes it difficult to visualize the data. Adding a new column, in which I will 'normalize' the price ranges, will make this easier.

In [None]:
def price_mapping(price):
    if pd.isna(price):
        return price

    length = len(price)

    if length == 1:
        return "Budget-Friendly"
    elif length == 2:
        return "Moderate"
    elif length == 3:
        return "Premium"
    elif length == 4:
        return "Luxury"
    raise ValueError("Unknown price range")


df["Price (normalized)"] = df["Price"].apply(price_mapping)
df["Price (normalized)"].value_counts()
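The idea behind the normalization is that only the number of currency symbols matters, not the currency itself. A minimal sketch on made-up prices shows this; `PRICE_LABELS` is a hypothetical helper name. Unlike `price_mapping` above, this variant returns `None` for an unexpected length instead of raising an error.

```python
import pandas as pd

# Labels keyed by the number of currency symbols.
PRICE_LABELS = {1: "Budget-Friendly", 2: "Moderate", 3: "Premium", 4: "Luxury"}

# Made-up prices in different currencies, plus one missing value.
prices = pd.Series(["$$", "€€€€", "¥", None])

# Look up the label by string length; leave missing values as they are.
labels = prices.map(lambda p: PRICE_LABELS.get(len(p)) if isinstance(p, str) else p)

print(labels.tolist())
```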

## Stats

After cleaning the data and creating some features, we dive into the analysis.

In [None]:
number_of_countries = df["Country"].nunique()  # nunique() ignores missing values
print(f"Number of countries: {number_of_countries}")

In [None]:
number_of_restaurants = len(df.index)
print(f"Number of restaurants: {number_of_restaurants}")

In [None]:
top_cuisine = df["Cuisine"].value_counts().index[:5]
print("Top 5 cuisines:")
for i, cuisine in enumerate(top_cuisine, start=1):
    # Print the full cuisine name; cuisine[0] would print only its first character.
    print(f"{i}. {cuisine}")

There are 15,520 restaurants in the Michelin Guide, located in 49 different countries. The top cuisine is 'Modern Cuisine', followed by 'Traditional Cuisine'.

### Visualizations

#### Awards Distribution

The following distinctions are available for restaurants to receive:

* **3 Stars**: Exceptional cuisine
* **2 Stars**: Excellent cooking
* **1 Star**: High quality cooking
* **Bib Gourmand**: Good quality, good value cooking
* **Selected Restaurants**: Good cooking


In [None]:
award_counts = df["Award"].value_counts()
award_counts

In [None]:
fig = px.bar(award_counts, labels={"value": "Number of restaurants"}, text_auto=True)
fig.update(layout_showlegend=False)
fig.show()

The visualization shows that only 145 of the restaurants in the Michelin Guide have a 3 Star distinction. This is less than 1 percent!
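The percentage follows directly from the two counts quoted in this post, the 145 three-star restaurants and the 15,520 restaurants in total:

```python
# Counts quoted in the text of this post.
three_star = 145
total = 15_520

# Share of three-star restaurants as a percentage.
share = three_star / total * 100
print(f"{share:.2f}%")
```

On the full DataFrame, `df["Award"].value_counts(normalize=True)` should give the same kind of share for every award at once.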

#### Cuisine popularity

In [None]:
cuisine_counts = df["Cuisine"].value_counts()[:10]
cuisine_counts

In [None]:
fig = px.bar(cuisine_counts, labels={"value": "Number of restaurants"}, text_auto=True)
fig.update(layout_showlegend=False)
fig.show()

We see that 'Modern Cuisine' is the most popular cuisine by a large margin.

#### Price Range Distribution

In [None]:
price_counts = df["Price (normalized)"].value_counts()
price_counts

In [None]:
fig = px.bar(
    price_counts,
    labels={"value": "Number of restaurants"},
    text_auto=True,
    category_orders={"Price (normalized)": ["Budget-Friendly", "Moderate", "Premium", "Luxury"]},
)
fig.update(layout_showlegend=False)
fig.show()
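The `category_orders` argument only takes effect when its key matches the name Plotly Express uses for the axis. An alternative that does not depend on the axis name is to store the order in the data itself with an ordered categorical. This is a sketch on made-up labels; `PRICE_ORDER` is a name I introduce here.

```python
import pandas as pd

# Desired order, from cheapest to most expensive.
PRICE_ORDER = ["Budget-Friendly", "Moderate", "Premium", "Luxury"]

# Made-up labels for illustration.
labels = pd.Series(["Premium", "Moderate", "Premium", "Luxury", "Budget-Friendly"])

# An ordered categorical carries the order with the data.
ordered = pd.Categorical(labels, categories=PRICE_ORDER, ordered=True)

# sort_index() now sorts by category order instead of alphabetically.
counts = pd.Series(ordered).value_counts().sort_index()

print(counts)
```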

#### Green Stars

In [None]:
greenstar_counts = df["GreenStar"].value_counts()
greenstar_counts

In [None]:
fig = px.bar(greenstar_counts, labels={"value": "Number of restaurants"}, text_auto=True)
fig.update(layout_showlegend=False)
fig.show()

#### Top Locations

In [None]:
top_locations = df["City"].value_counts()[:10]
top_locations

In [None]:
fig = px.bar(top_locations, labels={"value": "Number of restaurants"}, text_auto=True)
fig.update(layout_showlegend=False)
fig.show()

#### Top Countries

In [None]:
top_countries = df["Country"].value_counts()[:10]
top_countries

In [None]:
fig = px.bar(top_countries, labels={"value": "Number of restaurants"}, text_auto=True)
fig.update(layout_showlegend=False)
fig.show()