# Michelin Star Restaurant Guide Dashboard

## Dataset Attributes

- **Name**: The name of the Michelin-starred restaurant.
- **Address**: The full street address of the restaurant.
- **Location**: The city and country where the restaurant is located.
- **Price**: Price range indicator, using `$` symbols (e.g. `$$$$` for very expensive).
- **Cuisine**: The type or style of cuisine served at the restaurant.
- **Longitude**: The geographic longitude coordinate of the restaurant's location.
- **Latitude**: The geographic latitude coordinate of the restaurant's location.
- **PhoneNumber**: The contact phone number for the restaurant.
- **Url**: The URL of the restaurant's page on the official Michelin Guide website.
- **WebsiteUrl**: The URL of the restaurant's own official website.
- **Award**: The Michelin star rating awarded to the restaurant (e.g. "3 Stars").
- **GreenStar**: A binary indicator (0 or 1) of whether the restaurant has received a Michelin Green Star for sustainability.
- **FacilitiesAndServices**: A list of amenities and services offered by the restaurant.
- **Description**: A brief description of the restaurant, often including details about the chef and cuisine.
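As a rough illustration of how a few of these attributes might be typed once loaded, the sketch below builds a two-row frame and converts `Price` to a numeric level and `GreenStar` to a boolean. The rows are invented, and the helper `price_level` and the `PriceLevel` column are hypothetical names used only here; they are not part of the dataset or of `src`.

```python
import pandas as pd

# Invented sample rows; real values come from the dataset CSV.
sample = pd.DataFrame(
    {
        "Name": ["Example One", "Example Two"],
        "Price": ["$$$$", "$$"],
        "Award": ["3 Stars", "1 Star"],
        "GreenStar": [1, 0],
    }
)


def price_level(price: str) -> int:
    """Hypothetical helper: count the symbols in a price string."""
    return len(price.strip())


# Numeric price level (assumed column name) and a boolean Green Star flag.
sample["PriceLevel"] = sample["Price"].map(price_level)
sample["GreenStar"] = sample["GreenStar"].astype(bool)
```

A numeric level sorts and aggregates more naturally than the raw symbol string, which is the main reason to derive it.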

## Dependencies

In [None]:
# %pip install -r .\requirements-dev.txt
# %pip install -q pandas plotly dash dash-bootstrap-components pyarrow python-dotenv ipykernel nbformat

# %pip freeze > requirements.txt  # WARNING: run this only on a Linux distro or WSL, in an environment with only production dependencies installed

### Imports

In [None]:
import pandas as pd

# import pyarrow as pa
import plotly.express as px
import plotly.io as pio

from pandas import DataFrame
from IPython.core.interactiveshell import InteractiveShell

pio.renderers.default = "notebook_connected"


InteractiveShell.ast_node_interactivity = "all"

pd.set_option("display.max_columns", None)
pd.options.mode.copy_on_write = True

## Dataset

In [None]:
from src.data_cleaning import CSV_PATH, read_csv


df = read_csv(CSV_PATH)

In [None]:
df.head()

## Data cleaning

In [None]:
df.info()

In [None]:
from src.data_cleaning import (
    clean_data,
    select_unique_location_city_where_location_country_is_missing,
)


df_clean = clean_data(df.copy())
df_clean.head()


missing_countries = select_unique_location_city_where_location_country_is_missing(
    df_clean
)
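if missing_countries.size > 0:
    # The bare expression is displayed only because
    # ast_node_interactivity is set to "all" in the imports cell.
    missing_countries
    raise ValueError("Missing countries found")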
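The `Location_city` and `Location_country` column names suggest that `Location` is split on a separator during cleaning. A minimal sketch of that idea follows, assuming a `", "` separator and using invented rows; the actual logic lives in `src.data_cleaning` and may differ.

```python
import pandas as pd

# Invented locations; the second one has no country part.
locations = pd.DataFrame(
    {"Location": ["Lyon, France", "Examplecity", "Kyoto, Japan"]}
)

# Split "city, country" once; rows without a separator get a missing country.
parts = locations["Location"].str.split(", ", n=1, expand=True)
locations["Location_city"] = parts[0]
locations["Location_country"] = parts[1]

# Unique cities whose country could not be derived from the string.
cities_without_country = locations.loc[
    locations["Location_country"].isna(), "Location_city"
].unique()
```

Cities surfaced this way would then need a country assigned by hand or from a lookup table before any country-level aggregation.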

### FacilitiesAndServices columns

In [None]:
from src.data_cleaning import get_facilitiesandservices_df


df_facilitiesandservices = get_facilitiesandservices_df(df_clean)
df_facilitiesandservices.head()
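For readers without access to `src`, one plausible way to turn a list-like string column into one row per item is to split and then explode it. The sketch below assumes a comma separator and uses invented rows; it is not necessarily what `get_facilitiesandservices_df` does.

```python
import pandas as pd

# Invented rows with comma-separated facilities (the separator is an assumption).
restaurants = pd.DataFrame(
    {
        "Name": ["Example One", "Example Two"],
        "FacilitiesAndServices": ["Air conditioning,Terrace", "Terrace"],
    }
)

# One row per (restaurant, facility): split the string, then explode the lists.
facilities_long = (
    restaurants.assign(
        FacilitiesAndServices=restaurants["FacilitiesAndServices"].str.split(",")
    )
    .explode("FacilitiesAndServices")
    .reset_index(drop=True)
)

# Remove stray whitespace left around each item.
facilities_long["FacilitiesAndServices"] = facilities_long[
    "FacilitiesAndServices"
].str.strip()
```

The long format makes per-facility counts a plain `value_counts()` call.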

### Cuisine columns

In [None]:
from src.data_cleaning import get_cuisine_df


df_cuisine = get_cuisine_df(df_clean)
df_cuisine.head()

## Data quality

### Duplicate rows

#### Primary key columns

In [None]:
# Count occurrences of each (Name, Address) pair, which should be unique.
primary_col = df_clean[["Name", "Address"]].value_counts()
duplicates = primary_col[primary_col > 1]

if duplicates.size > 0:
    duplicates
    raise ValueError("Duplicate records found")
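The check above reports which (Name, Address) pairs repeat. To inspect the full rows behind those pairs, `DataFrame.duplicated` with `keep=False` is one option. A minimal sketch on invented rows:

```python
import pandas as pd

# Invented rows; the first two share the same Name and Address.
rows = pd.DataFrame(
    {
        "Name": ["Example One", "Example One", "Example Two"],
        "Address": ["1 Sample St", "1 Sample St", "2 Sample Ave"],
        "Award": ["1 Star", "2 Stars", "1 Star"],
    }
)

# keep=False marks every member of a duplicate group, not just the repeats.
duplicate_rows = rows[rows.duplicated(subset=["Name", "Address"], keep=False)]
```

Seeing the other columns side by side helps decide which of the conflicting records to keep.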

### Missing values

In [None]:
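# Use a descriptive name: in IPython, `_` normally holds the last output.
missing_counts = df_clean.isna().sum()
missing_counts[missing_counts > 0]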
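Raw counts are easier to judge next to the share of rows they represent. A small sketch, using an invented frame, of the same check expressed as fractions:

```python
import pandas as pd

# Invented frame with gaps in two columns.
frame = pd.DataFrame(
    {
        "Name": ["Example One", "Example Two", "Example Three", "Example Four"],
        "PhoneNumber": ["+00 000", None, None, "+00 111"],
        "WebsiteUrl": [
            "https://example.com",
            None,
            "https://example.org",
            "https://example.net",
        ],
    }
)

# isna().mean() gives the fraction of missing values per column.
missing_share = frame.isna().mean()
missing_share = missing_share[missing_share > 0].sort_values(ascending=False)
```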

## EDA

In [None]:
pd.concat(
    [
        df_clean.describe(include=["object"]).loc[
            :,
            [
                "Location_city",
                "Location_country",
                "Standardized_Price",
                "Award",
            ],
        ],
        df["GreenStar"].astype("object").describe(),
        df_cuisine.describe()["Cuisine"],
        df_facilitiesandservices.describe()["FacilitiesAndServices"],
    ],
    axis=1,
)


### Awards

In [None]:
df_clean.groupby("Award")["Name"].count().sort_values(ascending=False)

## Plots

### Scatter map

In [None]:
from src.figures import award_by_city_scattermap

fig = award_by_city_scattermap(df_clean, "Dubai")
fig.show()


### Bar

In [None]:
from src.figures import awards_by_city_bar


fig = awards_by_city_bar(df_clean, "Dubai")
fig.show()

### Table

In [None]:
# TODO def facilitiesandservices_by_city_table():
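One possible starting point for this table is a per-city count of restaurants offering each facility. The sketch below is only an assumption about the intended output: the helper name `facilities_count_for_city`, the `Restaurants` column, and the presence of `Location_city` in the long-format frame are all hypothetical, and the rows are invented.

```python
import pandas as pd


def facilities_count_for_city(
    facilities_long: pd.DataFrame, city: str
) -> pd.DataFrame:
    """Hypothetical helper: count restaurants per facility within one city."""
    in_city = facilities_long[facilities_long["Location_city"] == city]
    return (
        in_city.groupby("FacilitiesAndServices")["Name"]
        .nunique()
        .sort_values(ascending=False)
        .rename("Restaurants")
        .reset_index()
    )


# Invented rows in long format (one row per restaurant and facility).
demo = pd.DataFrame(
    {
        "Name": ["Example One", "Example One", "Example Two", "Example Three"],
        "Location_city": ["Dubai", "Dubai", "Dubai", "Lyon"],
        "FacilitiesAndServices": ["Terrace", "Car park", "Terrace", "Terrace"],
    }
)
table = facilities_count_for_city(demo, "Dubai")
```

The resulting frame could then be handed to whichever table component the dashboard ends up using.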