<a href="https://colab.research.google.com/github/renan-peres/mfin-python-restaurant-data-analysis/blob/main/restaurant-data-analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Restaurant Data Analysis**
![Restaurant Analysis Introduction](https://github.com/renan-peres/mfin-python-restaurant-data-analysis/blob/main/images/intro.jpeg?raw=1)

### **Overview**
This is a project developed by Team 5 for the **Python for Data Analysts: Methods & Tools - DAT-7466 - BMFIN1** Course (Spring 2025) at Hult International Business School -- led by Professor [Michael de la Maza](https://www.linkedin.com/in/michaeldelamaza/).

The purpose of the analysis is to evaluate the performance of a restaurant chain located in **New York City** during 2018 and to identify trends and potential areas into which the chain could expand.

### **Team 5 Members**
- [Daniela Salgari](https://www.linkedin.com/in/daniela-salgar/)
- [Alessandro Frullani](https://www.linkedin.com/in/alessandro-frullani-8526b4132/)
- [Gianmaria Betta](https://www.linkedin.com/in/gianmariabetta/)
- [Marco Primatesta](https://www.linkedin.com/in/marco-primatesta/)
- [Renan Peres](https://www.linkedin.com/in/renanperes/)

### **Contents**
- [Prepare Environment](#prepare-environment)
- [Download & Import Datasets](#download--import-datasets)
- [Exploratory Data Analysis (EDA)](#exploratory-data-analysis-eda)
- [Data Cleaning & Transformation](#data-cleaning--transformation)
- [Data Analysis & Visualization](#data-analysis--visualization)
  - [Orders](#orders)
  - [Items](#items)
  - [Order Type](#order-type)
- [Insights & Recommendations](#insights--recommendations)
  - [Where should the new restaurant be located?](#q1-where-should-the-new-restaurant-be-located)
  - [Which items should be included in the new restaurant?](#q2-which-items-should-be-included-in-the-new-restaurant)
  - [What type of restaurant should it be?](#q3-what-type-of-restaurant-should-it-be)

## **Prepare Environment**

Have a Jupyter environment ready, and `pip install` these libraries:


In [50]:
!pip install -q gdown -q highcharts-core -q kaleido
import gdown # Google Drive Connector

import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import plotly.express as px
import plotly.graph_objects as go
from highcharts_core.chart import Chart

from typing import Tuple, List, Dict
import os
import warnings
from IPython.display import HTML
from IPython.display import Image
import IPython

# Suppress FutureWarning and FormatterWarning
warnings.filterwarnings('ignore', category=FutureWarning)
warnings.filterwarnings('ignore', category=IPython.core.formatters.FormatterWarning)

default_colors = [
    '#2E86C1',  # Ocean Blue
    '#E74C3C',  # Bright Red
    '#2ECC71',  # Emerald Green
    '#F39C12',  # Orange
    '#9B59B6',  # Amethyst Purple
    '#1ABC9C',  # Turquoise
    '#E67E22',  # Carrot Orange
    '#34495E',  # Dark Slate
    '#27AE60',  # Nephritis Green
    '#8E44AD',  # Wisteria Purple
    '#3498DB',  # Bright Blue
    '#D35400',  # Pumpkin Orange
    '#16A085',  # Green Sea
    '#C0392B',  # Dark Red
    '#7D3C98',  # Deep Purple
    '#2980B9',  # Belize Blue
    '#F1C40F',  # Sunflower Yellow
    '#17A589',  # Light Sea Green
    '#E91E63',  # Pink
    '#5D6D7E'   # Slate Gray
    ]

# Sort months chronologically
month_order = ['January',
               'February',
               'March',
               'April',
               'May',
               'June',
               'July',
               'August',
               'September',
               'October',
               'November',
               'December'
               ]

## **Download & Import Datasets**
The datasets used in this report were retrieved from the following directory: https://drive.google.com/drive/folders/1GtIfSS0K3wyBkOPD0jOy7V8fIXW3JEfj

In [None]:
# Define the directory URL
directory_url = 'https://drive.google.com/drive/folders/1GtIfSS0K3wyBkOPD0jOy7V8fIXW3JEfj'

# Try to download files first, if fails, use local path
try:
    # Attempt to download all files within the directory
    gdown.download_folder(url=directory_url, quiet=False, use_cookies=False)
    base_path = '/content/Student Data/Copy of'
except Exception as e:
    print(f"Failed to download files: {e}")
    print("Using local path instead...")
    base_path = '/content'

# Dictionary of file paths and their corresponding dataframe names
files = {
    'items.pickle': 'df_items',
    'restaurants.pickle': 'df_restaurants',
    'orders.pickle': 'df_orders',
    'orders_7.pickle': 'df_orders_7',
    'students.pickle': 'df_students',
    'summarized_orders.pickle': 'df_summarized_orders'
}

# Load pickle files
for file, df_name in files.items():
    try:
        if base_path == '/content/Student Data/Copy of':
            globals()[df_name] = pd.read_pickle(f'{base_path} {file}')
        else:
            globals()[df_name] = pd.read_pickle(f'{base_path}/{file}')
    except Exception as e:
        print(f"Error loading {file}: {e}")

# Load Excel file separately since it has different extension
try:
    if base_path == '/content/Student Data/Copy of':
        df_university = pd.read_excel(f'{base_path} university.xlsx')
    else:
        df_university = pd.read_excel(f'{base_path}/university.xlsx')
except Exception as e:
    print(f"Error loading university.xlsx: {e}")


## **Exploratory Data Analysis (EDA)**

### Profile Initial Data

In [None]:
def profile_table(table_name: str, df: pd.DataFrame) -> Tuple[str, int, int, int, int]:
    print(f"\n{'='*80}\nStarting to process table: {table_name}\n{'='*80}")

    # Calculate metrics and gather information
    total_rows = df.shape[0]
    total_columns = df.shape[1]
    null_rows = df.isnull().any(axis=1).sum()
    duplicate_rows = df.duplicated().sum()

    # Return the gathered information as a tuple
    return table_name, total_rows, total_columns, null_rows, duplicate_rows

# List of your DataFrames
dataframes = [df_items, df_restaurants, df_orders, df_orders_7, df_students, df_summarized_orders, df_university]
dataframe_names = ['df_items', 'df_restaurants', 'df_orders', 'df_orders_7', 'df_students', 'df_summarized_orders', 'df_university']

# Create an empty list to store the results
results = []

# Loop through all tables and profile them
for i, df in enumerate(dataframes):
    result = profile_table(dataframe_names[i], df)
    results.append(result)
    print(f"Completed processing table: {dataframe_names[i]}")
    print("---\n")

# Create a DataFrame from the results
result_df = pd.DataFrame({
    "table_name": [r[0] for r in results],
    "total_rows": [r[1] for r in results],
    "total_columns": [r[2] for r in results],
    "null_rows": [r[3] for r in results],
    "duplicate_rows": [r[4] for r in results]
})

# Display the results
display(result_df)
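
The two data-quality metrics computed by `profile_table` can be illustrated on a small table. The sketch below uses a hypothetical DataFrame (`toy` and its columns are invented for this example and are not part of the project datasets) to show what `null_rows` and `duplicate_rows` count.

```python
import numpy as np
import pandas as pd

# Hypothetical toy table (not one of the project datasets)
toy = pd.DataFrame({
    "ORDER_ID": [1, 2, 2, 3],
    "ITEM": ["Salad", "Rice", "Rice", np.nan],
})

# Rows containing at least one missing value
null_rows = int(toy.isnull().any(axis=1).sum())

# Rows that are exact copies of an earlier row
duplicate_rows = int(toy.duplicated().sum())

print(f"null_rows={null_rows}, duplicate_rows={duplicate_rows}")
```

Note that `duplicated()` flags only the second and later copies, so a pair of identical rows counts as one duplicate.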

### Inspect Data Types

In [None]:
def get_dataframe_info(dataframes, dataframe_names):
  """
  Gets column names and data types for multiple dataframes.

  Args:
      dataframes: A list of pandas DataFrames.
      dataframe_names: A list of strings, where each string is the name of the corresponding DataFrame.

  Returns:
      A pandas DataFrame containing the dataframe name, column name, and data type.
  """

  all_info = []
  for df, name in zip(dataframes, dataframe_names):
    if isinstance(df, pd.DataFrame):  # Check if it's actually a DataFrame
      for col in df.columns:
        all_info.append([name, col, df[col].dtype])
    else:
      print(f"Warning: {name} is not a DataFrame. Skipping...")

  return pd.DataFrame(all_info, columns=["dataframe", "column_name", "type"])

# Example usage (replace with your actual dataframes and names)
dataframes = [df_items, df_restaurants, df_orders, df_orders_7, df_students, df_summarized_orders, df_university]
dataframe_names = ['df_items', 'df_restaurants', 'df_orders', 'df_orders_7', 'df_students', 'df_summarized_orders', 'df_university']

# Display the results
get_dataframe_info(dataframes, dataframe_names)

## **Data Cleaning & Transformation**

### Drop Null Rows

In [None]:
def clean_dataframes(result_df: pd.DataFrame, dataframe_names: List[str], dataframes: List[pd.DataFrame]) -> None:
    """
    Cleans the dataframes in place by dropping rows that contain null values, based on the profiling results.

    Parameters:
        result_df (pd.DataFrame): The DataFrame containing profiling results.
        dataframe_names (List[str]): List of names corresponding to the DataFrames.
        dataframes (List[pd.DataFrame]): List of DataFrames to be cleaned.
    """
    for i in range(len(dataframes)):
        df_name = dataframe_names[i]
        df = dataframes[i]
        # Get the corresponding row from result_df (same order)
        row = result_df.iloc[i]

        # Drop rows with any null values if there were null rows
        if row['null_rows'] > 0:
            df.dropna(axis=0, how='any', inplace=True)

# Usage after profiling
clean_dataframes(result_df, dataframe_names, dataframes)

# Optionally, reprofile to verify cleaning
results_after_cleaning = []
for i, df in enumerate(dataframes):
    result = profile_table(dataframe_names[i], df)
    results_after_cleaning.append(result)

result_after_clean_df = pd.DataFrame({
    "table_name": [r[0] for r in results_after_cleaning],
    "total_rows": [r[1] for r in results_after_cleaning],
    "total_columns": [r[2] for r in results_after_cleaning],
    "null_rows": [r[3] for r in results_after_cleaning],
    "duplicate_rows": [r[4] for r in results_after_cleaning]
})

display(result_after_clean_df)

### Cast Data Types

In [None]:
# Convert 'OPENING_DATE' and 'DELIVERY_START' columns to datetime objects
for df_name in ['df_restaurants']:
    try:
        df = globals()[df_name]
        for col in ['OPENING_DATE', 'DELIVERY_START']:
            if col in df.columns:
                df[col] = pd.to_datetime(df[col], errors='coerce')  # Use errors='coerce' to handle invalid dates
    except KeyError:
        print(f"DataFrame '{df_name}' or column not found.")
    except Exception as e:
        print(f"An error occurred while converting '{df_name}' column: {e}")

# Display the results
get_dataframe_info(dataframes, dataframe_names)

### Add Date Dimensions

In [None]:
# Parse the timestamp column once, then derive each date dimension from it
order_datetime = pd.to_datetime(df_orders_7['DATETIME'])
df_orders_7['DATE'] = order_datetime.dt.date
df_orders_7['HOUR'] = order_datetime.dt.hour
df_orders_7['QUARTER'] = order_datetime.dt.quarter
df_orders_7['YEAR'] = order_datetime.dt.year

df_orders = df_orders_7
df_orders.head()

### Normalize Data (Orders)

In [None]:
# Assuming 'df_orders' is your DataFrame
def normalize_item_column(df):
    """Normalizes 'MAIN_NAME', 'BASE_NAME', 'SIDE_1_NAME', 'SIDE_2_NAME' into an 'ITEM_NAME'."""

    # Melt the DataFrame to combine the columns into a single column
    df_melted = pd.melt(df,
                        id_vars=df.columns.difference(['MAIN_NAME', 'BASE_NAME', 'SIDE_1_NAME', 'SIDE_2_NAME']),
                        value_vars=['MAIN_NAME', 'BASE_NAME', 'SIDE_1_NAME', 'SIDE_2_NAME'],
                        var_name='ITEM_CATEGORY',
                        value_name='ITEM_NAME')

    # Remove rows where 'ITEM_NAME' is NaN (the melted 'ITEM_CATEGORY' label itself is never NaN)
    df_melted = df_melted.dropna(subset=['ITEM_NAME'])

    return df_melted

# Select Only Columns Needed
selected_columns = ['ORDER_ID', 'DATE', 'HOUR', 'QUARTER', 'YEAR', 'TYPE', 'RESTAURANT_NAME', 'MAIN_NAME', 'BASE_NAME', 'SIDE_1_NAME', 'SIDE_2_NAME']
df_orders_norm = df_orders[selected_columns]

# Display the results
df_orders_norm = normalize_item_column(df_orders_norm).sort_values(by='ORDER_ID').reset_index(drop=True)
df_orders_norm.head()
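
The wide-to-long reshaping done by `pd.melt` is easier to see on a tiny example. The sketch below assumes a hypothetical two-order sample (`wide` and its values are invented) that reuses two of the notebook's column names. Each item column becomes its own row, labelled by the column it came from.

```python
import pandas as pd

# Hypothetical two-order sample using the notebook's column names
wide = pd.DataFrame({
    "ORDER_ID": ["O1", "O2"],
    "MAIN_NAME": ["Chicken", "Tofu"],
    "BASE_NAME": ["Rice", "Salad"],
})

# One row per (order, item category, item name) instead of one row per order
long_df = (
    pd.melt(
        wide,
        id_vars=["ORDER_ID"],
        value_vars=["MAIN_NAME", "BASE_NAME"],
        var_name="ITEM_CATEGORY",
        value_name="ITEM_NAME",
    )
    .sort_values("ORDER_ID")
    .reset_index(drop=True)
)

print(long_df)
```

The long layout is what lets later cells count items with a single `groupby` on `ITEM_NAME`, regardless of which slot of the order the item occupied.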

## DataFrames Available for Analysis

In [None]:
# Print the names of all dataframes in the current namespace, excluding those starting with "_" and "dataframe_info"
for var_name in dir():
    if isinstance(globals()[var_name], pd.DataFrame) and not var_name.startswith("_") and var_name != "dataframe_info":
        # Print the variable name if it meets the conditions
        print(f"DataFrame name: {var_name}")
        # Display the top 6 rows of the DataFrame
        display(globals()[var_name].head(6))
        print("-" * 20)  # Print a separator line for clarity

## **Data Analysis & Visualization**

### Orders

#### Data Preparation

In [None]:
orders_total = (df_summarized_orders
                     .groupby('DATE')
                     .agg({'NUM_ORDERS': 'sum', 'PERC_DELIVERY': 'mean'})
                     .reset_index())

# Convert 'DATE' column to datetime objects if not already
orders_total['DATE'] = pd.to_datetime(orders_total['DATE'])

# Extract month names and aggregate orders by month
monthly_orders = (orders_total
                 .assign(MONTH_NAME=orders_total['DATE'].dt.strftime('%B'),
                        MONTH_NUM=orders_total['DATE'].dt.month)  # Add month number for sorting
                 .groupby('MONTH_NAME')['NUM_ORDERS']
                 .sum()
                 .reset_index())

# Order months chronologically using an ordered categorical (a plain string sort would be alphabetical)
monthly_orders['MONTH_NAME'] = pd.Categorical(monthly_orders['MONTH_NAME'],
                                            categories=month_order,
                                            ordered=True)

monthly_orders = monthly_orders.sort_values('MONTH_NAME').reset_index(drop=True)
monthly_orders
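
The ordered categorical is needed because month names sort alphabetically as plain strings. A minimal sketch of the difference follows, using hypothetical totals and a shortened month list (`demo_month_order` and `toy_months` are invented for the example):

```python
import pandas as pd

demo_month_order = ["January", "February", "March"]  # shortened list for the example

# Hypothetical monthly totals, deliberately out of order
toy_months = pd.DataFrame({
    "MONTH_NAME": ["March", "January", "February"],
    "NUM_ORDERS": [30, 10, 20],
})

# Sorting plain strings gives alphabetical order
alphabetical = toy_months.sort_values("MONTH_NAME")["MONTH_NAME"].tolist()

# Sorting an ordered categorical follows the category order instead
toy_months["MONTH_NAME"] = pd.Categorical(
    toy_months["MONTH_NAME"], categories=demo_month_order, ordered=True
)
chronological = toy_months.sort_values("MONTH_NAME")["MONTH_NAME"].tolist()

print(alphabetical)
print(chronological)
```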

In [None]:
orders_restaurant = (df_summarized_orders
                     .sort_values(by=['DATE', 'NUM_ORDERS'], ascending=[True, False])
                     .reset_index(drop=True))

# Convert 'DATE' column to datetime objects
orders_restaurant['DATE'] = pd.to_datetime(orders_restaurant['DATE'])

# Extract month names and aggregate orders by month and restaurant
monthly_restaurant_orders = (orders_restaurant
    .assign(MONTH_NAME=orders_restaurant['DATE'].dt.strftime('%B'))
    .groupby(['MONTH_NAME', 'RESTAURANT_NAME'])
    .agg({
        'NUM_ORDERS': 'sum',
        'PERC_DELIVERY': 'mean'  # Taking average of delivery percentage
    })
    .reset_index())

monthly_restaurant_orders['MONTH_NAME'] = pd.Categorical(
    monthly_restaurant_orders['MONTH_NAME'],
    categories=month_order,
    ordered=True
)

monthly_restaurant_orders = monthly_restaurant_orders.sort_values(['MONTH_NAME', 'RESTAURANT_NAME']).reset_index(drop=True)

# First, calculate total orders per restaurant across all months
total_orders_per_restaurant = orders_restaurant.groupby('RESTAURANT_NAME')['NUM_ORDERS'].sum().reset_index()

# Merge with the restaurant locations and monthly data
restaurant_orders_map = (orders_restaurant
    .merge(df_restaurants, left_on='RESTAURANT_NAME', right_on='NAME', how='left')
    .merge(total_orders_per_restaurant,
           on='RESTAURANT_NAME',
           suffixes=('_monthly', '_total'))
)

# Group by month and restaurant while keeping the total orders
monthly_restaurant_map = (restaurant_orders_map
    .assign(MONTH_NAME=lambda x: x['DATE'].dt.strftime('%B'))
    .groupby(['RESTAURANT_NAME', 'LAT', 'LONG', 'MONTH_NAME', 'NUM_ORDERS_total'])
    .agg({'NUM_ORDERS_monthly': 'sum'})
    .reset_index())

monthly_restaurant_map
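
The `suffixes` argument in the merge above is what produces the `NUM_ORDERS_monthly` and `NUM_ORDERS_total` columns: both inputs carry a `NUM_ORDERS` column, so pandas renames each side. A small sketch with hypothetical data (`daily` and `totals` are invented stand-ins for the notebook's tables):

```python
import pandas as pd

# Hypothetical daily order rows for two restaurants
daily = pd.DataFrame({
    "RESTAURANT_NAME": ["A", "A", "B"],
    "NUM_ORDERS": [5, 7, 3],
})

# Per-restaurant totals, which share the NUM_ORDERS column name
totals = daily.groupby("RESTAURANT_NAME")["NUM_ORDERS"].sum().reset_index()

# The overlapping column is renamed on each side of the merge
merged = daily.merge(totals, on="RESTAURANT_NAME", suffixes=("_monthly", "_total"))

print(merged)
```

In this sketch, as in the cell above, the `_monthly` column still holds per-row values right after the merge. It becomes a true monthly figure only once the subsequent `groupby` sums it.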

In [None]:
orders_restaurant_total = (monthly_restaurant_map
                         [['RESTAURANT_NAME', 'NUM_ORDERS_total']]
                         .reset_index(drop=True)
                         .drop_duplicates()
                         .sort_values(by=['NUM_ORDERS_total'], ascending=False)
                         .reset_index(drop=True)
                         .rename(columns={'NUM_ORDERS_total': 'TOTAL_ORDERS'}))
orders_restaurant_total

In [None]:
orders_restaurant_monthly_avg = (monthly_restaurant_map
                                 .groupby('RESTAURANT_NAME')[['NUM_ORDERS_monthly']]
                                 .mean()
                                 .sort_values(by=['NUM_ORDERS_monthly'], ascending=False)
                                 .reset_index()
                                 .rename(columns={'NUM_ORDERS_monthly': 'MONTHLY_AVG'}))
orders_restaurant_monthly_avg

In [None]:
orders_restaurant_daily_avg = (orders_restaurant
                               .groupby('RESTAURANT_NAME')[['NUM_ORDERS']]
                               .mean()
                               .sort_values(by=['NUM_ORDERS'], ascending=False)
                               .reset_index()
                               .rename(columns={'NUM_ORDERS': 'DAILY_AVG'}))
orders_restaurant_daily_avg

#### Visualizations

In [None]:
# Create color mapping for restaurants
restaurant_color_map = dict(zip(
    orders_restaurant_total['RESTAURANT_NAME'].unique(),
    [default_colors[i % len(default_colors)] for i in range(len(orders_restaurant_total['RESTAURANT_NAME'].unique()))]
))

display(restaurant_color_map)
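
The modulo index in the mapping above makes the palette wrap around when there are more names than colors. A minimal sketch, assuming a hypothetical three-color palette and five invented names:

```python
# Hypothetical three-color palette and five names (more names than colors)
demo_colors = ["#2E86C1", "#E74C3C", "#2ECC71"]
demo_names = ["A", "B", "C", "D", "E"]

# i % len(demo_colors) restarts at the first color once the palette is exhausted
demo_color_map = {
    name: demo_colors[i % len(demo_colors)]
    for i, name in enumerate(demo_names)
}

print(demo_color_map)
```

With the 20-color `default_colors` list, colors repeat only once there are more than 20 distinct names.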

##### Total Orders (by Month)

In [None]:
# Create line chart using Plotly
fig = go.Figure()
fig.add_trace(go.Scatter(
    x=monthly_orders['MONTH_NAME'],
    y=monthly_orders['NUM_ORDERS'],
    mode='lines+markers'
))

# Update layout
fig.update_layout(
    title='2018 Total Orders (by Month)',
    xaxis_title='Month',
    yaxis_title='Number of Orders',
    width=1000,
    height=500
)

# Show the plot
fig.show()

##### Total Orders (by Restaurant)

In [None]:
orders_restaurant_total = orders_restaurant_total.sort_values('TOTAL_ORDERS', ascending=True)

# Create the horizontal bar chart with custom colors
plt.figure(figsize=(12, 6))
bars = plt.barh(
    orders_restaurant_total['RESTAURANT_NAME'],
    orders_restaurant_total['TOTAL_ORDERS'],
    color=[restaurant_color_map[restaurant] for restaurant in orders_restaurant_total['RESTAURANT_NAME']]
)

plt.ylabel('Restaurant Name')
plt.xlabel('Total Orders')
plt.title('2018 Total Orders per Restaurant')

# Format x-axis ticks with commas
plt.gca().xaxis.set_major_formatter(ticker.FuncFormatter(lambda x, p: format(int(x), ',')))

# Add values inside the bars, at the end
for bar in bars:
    width = bar.get_width()
    plt.text(
        width - 50,  # Position text near the end of the bar
        bar.get_y() + bar.get_height() / 2,
        f'{int(width):,}',  # Format value with commas
        ha='right',  # Align text to the right
        va='center',  # Vertically center the text
        color='white',  # Set text color to white
        fontweight='bold',  # Make text bold for better visibility
        fontsize=9
    )

plt.tight_layout()
plt.show()

##### Monthly Orders (by Restaurant)

In [None]:
# Group the data by restaurant and calculate total orders
restaurant_totals = monthly_restaurant_map.groupby('RESTAURANT_NAME')['NUM_ORDERS_monthly'].sum().reset_index()
restaurant_totals = restaurant_totals.sort_values('NUM_ORDERS_monthly', ascending=False)
sorted_restaurants = restaurant_totals['RESTAURANT_NAME'].tolist()

# Create subplots for each restaurant
num_restaurants = len(sorted_restaurants)
num_cols = 3  # Number of columns in the subplot grid
num_rows = (num_restaurants + num_cols - 1) // num_cols

fig, axes = plt.subplots(num_rows, num_cols, figsize=(15, 5 * num_rows))
axes = axes.ravel()  # Flatten the axes array for easy iteration

for i, restaurant in enumerate(sorted_restaurants):
    # Copy the slice so the column assignment below does not trigger SettingWithCopyWarning
    restaurant_data = monthly_restaurant_map[monthly_restaurant_map['RESTAURANT_NAME'] == restaurant].copy()

    # Sort by month name for correct order on the x-axis
    restaurant_data['MONTH_NAME'] = pd.Categorical(restaurant_data['MONTH_NAME'],
                                                categories=month_order,
                                                ordered=True)
    restaurant_data = restaurant_data.sort_values('MONTH_NAME')

    # Create bar plot with custom color
    bars = axes[i].bar(
        restaurant_data['MONTH_NAME'],
        restaurant_data['NUM_ORDERS_monthly'],
        color=restaurant_color_map[restaurant]
    )

    # Add value labels on top of each bar with smaller font size
    for bar in bars:
        height = bar.get_height()
        axes[i].text(
            bar.get_x() + bar.get_width()/2.,
            height,
            f'{int(height):,}',
            ha='center',
            va='bottom',
            fontsize=9
        )

    # Add total orders to the title
    total_orders = restaurant_totals.loc[restaurant_totals['RESTAURANT_NAME'] == restaurant, 'NUM_ORDERS_monthly'].iloc[0]
    axes[i].set_title(f'Monthly Orders - {restaurant}\nTotal Orders: {int(total_orders):,}')
    axes[i].set_xlabel('Month')
    axes[i].set_ylabel('Number of Orders')
    axes[i].tick_params(axis='x', rotation=45)

    # Format y-axis with comma separator
    axes[i].yaxis.set_major_formatter(ticker.FuncFormatter(lambda x, p: format(int(x), ',')))

# Add a tight layout to prevent overlapping labels
plt.tight_layout()

# Hide any unused subplots
for j in range(i + 1, len(axes)):
    fig.delaxes(axes[j])

plt.show()

In [None]:
monthly_restaurant_map = monthly_restaurant_map.sort_values('NUM_ORDERS_total', ascending=False)

# Apply categorical ordering to 'MONTH_NAME'
monthly_restaurant_map['MONTH_NAME'] = pd.Categorical(
    monthly_restaurant_map['MONTH_NAME'],
    categories=month_order,
    ordered=True
)

# Sort the DataFrame by 'MONTH_NAME'
monthly_restaurant_map = monthly_restaurant_map.sort_values(['MONTH_NAME']).reset_index(drop=True)

# Create animated map
fig = px.scatter_mapbox(
    monthly_restaurant_map.sort_values('MONTH_NAME'),
    lat='LAT',
    lon='LONG',
    size='NUM_ORDERS_total',
    color='RESTAURANT_NAME',
    animation_frame='MONTH_NAME',
    hover_name='RESTAURANT_NAME',
    hover_data={
        'LAT': False,
        'LONG': False,
        'NUM_ORDERS_monthly': True,
        'NUM_ORDERS_total': True,
        'MONTH_NAME': True
    },
    labels={
        'MONTH_NAME': 'Month ',
        'NUM_ORDERS_monthly': 'Month Orders ',
        'NUM_ORDERS_total': 'Total Orders ',
        'RESTAURANT_NAME': 'Restaurant'
    },
    color_discrete_map=restaurant_color_map,
    zoom=11.5,
    title='2018 Orders (by Restaurant)'
)

# Update the map style and layout
fig.update_layout(
    mapbox_style='carto-positron',
    width=1200,
    height=800,
    margin={"r":0,"t":30,"l":0,"b":0}
)

fig.update_traces(
    hovertemplate="<b>%{hovertext}</b><br>" +
                  "%{customdata[4]} Orders: %{customdata[2]:,}<br>" +
                  "Total Yearly Orders: %{customdata[3]:,}<br>"
)

# Show the animated map
fig.show()

##### Monthly Average (by Restaurant)

In [None]:
orders_restaurant_monthly_avg = orders_restaurant_monthly_avg.sort_values(by='MONTHLY_AVG', ascending=True)

plt.figure(figsize=(12, 6))

# Create horizontal bars with custom colors
bars = plt.barh(
    orders_restaurant_monthly_avg['RESTAURANT_NAME'],
    orders_restaurant_monthly_avg['MONTHLY_AVG'],
    color=[restaurant_color_map[restaurant] for restaurant in orders_restaurant_monthly_avg['RESTAURANT_NAME']]
)

plt.xlabel('Monthly Average Orders')
plt.ylabel('Restaurant Name')
plt.title('2018 Monthly Average Orders per Restaurant')

# Format x-axis ticks with commas
plt.gca().xaxis.set_major_formatter(ticker.FuncFormatter(lambda x, p: format(int(x), ',')))

# Add values on the bars
for bar in bars:
    width = bar.get_width()
    plt.text(
        width - 10,  # Adjust position slightly to the left of the bar end
        bar.get_y() + bar.get_height() / 2,
        f'{int(width):,}',  # Format value with commas
        ha='right',  # Align text to the right
        va='center',  # Vertically center the text
        color='white',  # Set text color to white
        fontweight='bold',  # Make text bold for better visibility
        fontsize=9
    )

plt.tight_layout()
plt.show()

##### Daily Average (by Restaurant)

In [None]:
orders_restaurant_daily_avg = orders_restaurant_daily_avg.sort_values(by='DAILY_AVG', ascending=True)
plt.figure(figsize=(12, 6))

# Create horizontal bars with custom colors
bars = plt.barh(
    orders_restaurant_daily_avg['RESTAURANT_NAME'],
    orders_restaurant_daily_avg['DAILY_AVG'],
    color=[restaurant_color_map[restaurant] for restaurant in orders_restaurant_daily_avg['RESTAURANT_NAME']]
)

plt.xlabel('Daily Average Orders')
plt.ylabel('Restaurant Name')
plt.title('2018 Daily Average Orders per Restaurant')

# Format x-axis ticks with commas
plt.gca().xaxis.set_major_formatter(ticker.FuncFormatter(lambda x, p: format(int(x), ',')))

# Add values on the bars
for bar in bars:
    width = bar.get_width()
    plt.text(
        width - 10,  # Adjust position slightly to the left of the bar end
        bar.get_y() + bar.get_height() / 2,
        f'{int(width):,}',  # Format value with commas
        ha='right',  # Align text to the right
        va='center',  # Vertically center the text
        color='white',  # Set text color to white
        fontweight='bold',  # Make text bold for better visibility
        fontsize=9
    )

plt.tight_layout()
plt.show()

### Items

#### Data Preparation

In [None]:
orders_item_restaurant = (df_orders_norm
                          .groupby(['DATE', 'RESTAURANT_NAME', 'ITEM_NAME'])['ORDER_ID']
                          .count()
                          .reset_index()
                          .rename(columns={'ORDER_ID': 'COUNT'}))

orders_item_restaurant

In [None]:
monthly_item_restaurant = (orders_item_restaurant
    .assign(MONTH_NAME=pd.to_datetime(orders_item_restaurant['DATE']).dt.strftime('%B'))
    .groupby(['MONTH_NAME', 'RESTAURANT_NAME', 'ITEM_NAME'])
    .agg({'COUNT': 'sum'})
    .reset_index())

# Sort months chronologically
monthly_item_restaurant['MONTH_NAME'] = pd.Categorical(
    monthly_item_restaurant['MONTH_NAME'],
    categories=month_order,
    ordered=True
)

monthly_item_restaurant = monthly_item_restaurant.sort_values(['MONTH_NAME', 'RESTAURANT_NAME']).reset_index(drop=True)
monthly_item_restaurant

In [None]:
avg_orders_by_restaurant_item = (monthly_item_restaurant
                                 .groupby(['RESTAURANT_NAME', 'ITEM_NAME'])['COUNT']
                                 .mean()
                                 .reset_index()
                                 .rename(columns={'COUNT': 'AVG_COUNT'})
                                 .sort_values(by=['AVG_COUNT'], ascending=True)
                                 .reset_index(drop=True)
                                )
avg_orders_by_restaurant_item

In [None]:
orders_item_restaurant_total = (monthly_item_restaurant
                                .groupby(['RESTAURANT_NAME', 'ITEM_NAME'])['COUNT']
                                .sum()
                                .reset_index())
orders_item_restaurant_total

In [None]:
monthly_item_counts = (monthly_item_restaurant
                       .groupby(['MONTH_NAME', 'RESTAURANT_NAME', 'ITEM_NAME'])['COUNT']
                       .sum()
                       .reset_index())

monthly_item_counts['MONTH_NAME'] = pd.Categorical(monthly_item_counts['MONTH_NAME'],
                                                 categories=month_order,
                                                 ordered=True)
monthly_item_counts

In [None]:
item_counts = (monthly_item_restaurant
               .groupby(['ITEM_NAME'])['COUNT']
               .sum()
               .reset_index()
               .sort_values(by='COUNT', ascending=True)
               .reset_index(drop=True))
item_counts

In [None]:
orders_drinks = (df_orders
                 .groupby(['DATE'])[['DRINKS', 'COOKIES']]
                 .sum()
                 .sort_values(by=['DATE'], ascending=True)
                 .reset_index())
orders_drinks

#### Visualizations

In [None]:
# Create color mapping for items
item_color_map = dict(zip(
    item_counts['ITEM_NAME'].unique(),
    [default_colors[i % len(default_colors)] for i in range(len(item_counts['ITEM_NAME'].unique()))]
))

display(item_color_map)

##### Total Items Sold

In [None]:
# Sort the dataframe by COUNT in descending order
item_counts_sorted = item_counts.sort_values('COUNT', ascending=False)

# Create the horizontal bar chart using Plotly with custom tooltip
fig = px.bar(
    item_counts_sorted,
    y='ITEM_NAME',
    x='COUNT',
    title='Total Items Sold in 2018',
    orientation='h',
    color='ITEM_NAME',
    color_discrete_map=item_color_map,
    custom_data=['ITEM_NAME', 'COUNT'],
    text=item_counts_sorted['COUNT'].apply(lambda x: f'{x:,.0f}')
)

# Update layout and configure hover template
fig.update_layout(
    yaxis_title='Item Name',
    xaxis_title='Total Count',
    margin=dict(l=200),
    width=1200,
    height=800,
    xaxis=dict(
        tickformat=',.0f',
    ),
    showlegend=False
)

# Customize the text position and appearance
fig.update_traces(
    textposition='auto',
    textfont=dict(
        color='white',
        size=12
    )
)

# Convert to static image
fig.update_layout(
    dragmode=False,  # Disable drag mode
    hovermode=False  # Disable hover tooltips
)

# Remove all interactivity
config = {
    'staticPlot': True,  # Make the plot static
    'displayModeBar': False  # Remove the mode bar
}


# Display the static chart
fig.show(config=config)

# Save plot to display it in the recommendations later
fig.write_image("total_items_sold.png")

##### Monthly Item Sales

In [None]:
# 1. Prepare data for Highcharts
series_data = []
for item_name in monthly_item_counts['ITEM_NAME'].unique():
    item_data = []
    for month in month_order:
        monthly_data = monthly_item_counts[
            (monthly_item_counts['ITEM_NAME'] == item_name) & (monthly_item_counts['MONTH_NAME'] == month)
        ]
        if not monthly_data.empty:
            # Get unique restaurant names
            restaurants = monthly_data['RESTAURANT_NAME'].unique()
            restaurant_list = '<br/>'.join(restaurants)

            total_count = monthly_data['COUNT'].sum()

            item_data.append({
                'y': int(total_count),
                'restaurants': restaurant_list
            })
        else:
            item_data.append(None)

    series_data.append({
        'name': item_name,
        'data': item_data,
        'type': 'line',
        'color': item_color_map.get(item_name, '#000000'),  # Default to black if color not found
        'tooltip': {
            'headerFormat': '<span style="font-size: 14px">{point.key}</span><br/>',
            'pointFormat': (
                '<span style="color:{series.color}">{series.name}</span>: <b>{point.y}</b><br/>'
            )
        }
    })

# 2. Define chart options
options = {
    'title': {'text': 'Monthly Sales by Item Name'},
    'xAxis': {'categories': month_order},
    'yAxis': {'title': {'text': 'Number of Orders'}},
    'series': series_data,
    'chart': {
        'height': 700,
        'width': 1200
    },
    'colors': default_colors  # Apply color scheme
}

# 3. Create and display the chart
chart = Chart(options=options)
chart

##### Monthly Average Item Sales (by Restaurant)

In [None]:
# Create the horizontal bar chart with custom colors
fig = px.bar(
    avg_orders_by_restaurant_item,
    y='ITEM_NAME',
    x='AVG_COUNT',
    color='RESTAURANT_NAME',
    orientation='h',
    title='Monthly Average Item Sales (by Item and Restaurant)',
    labels={
        'ITEM_NAME': 'Item Name',
        'AVG_COUNT': 'Monthly Average Orders',
        'RESTAURANT_NAME': 'Restaurant'
    },
    height=800,
    # Pass the 'restaurant_color_map' directly to color_discrete_map
    color_discrete_map=restaurant_color_map
)

# Update layout for better readability
fig.update_layout(
    showlegend=True,
    legend_title='Restaurant',
    barmode='group',
    yaxis={'categoryorder': 'total ascending'},
    xaxis_title='Monthly Average Orders',
    yaxis_title='Item Name',
    margin=dict(l=200),
    width=1200,
    height=800,
    font=dict(size=12),
    title_font_size=20,
    xaxis=dict(
        tickformat=',.0f'
    )
)

# Add formatted tooltips; each trace is one restaurant (color='RESTAURANT_NAME'),
# so the trace name is used instead of assigning the full column to every trace
fig.update_traces(
    hovertemplate="<b>Restaurant:</b> %{fullData.name}<br>" +
                  "<b>Average Orders:</b> %{x:,.0f}<br>" +
                  "<b>Item:</b> %{y}<br>" +
                  "<extra></extra>"
)

fig.show()

##### Drinks & Cookies Sold

In [None]:
# Create the line chart using Plotly
fig = go.Figure()

fig.add_trace(go.Scatter(x=orders_drinks['DATE'], y=orders_drinks['DRINKS'], mode='lines+markers', name='Drinks'))
fig.add_trace(go.Scatter(x=orders_drinks['DATE'], y=orders_drinks['COOKIES'], mode='lines+markers', name='Cookies'))

# Update layout
fig.update_layout(
    title='Drinks and Cookies Sold',
    xaxis_title='Date',
    yaxis_title='Quantity',
    width=1000,
    height=500
)

# Show the plot
fig.show()

### Order Type

#### Data Preparation

In [None]:
order_type_count = (df_orders
                    .groupby(['DATE', 'TYPE'])['TYPE']
                    .count()
                    .reset_index(name='COUNT'))
order_type_count

In [None]:
order_type_drinks = (df_orders
                     .groupby(['DATE', 'TYPE'])[['DRINKS', 'COOKIES']]
                     .sum()
                     .sort_values(by=['DATE', 'TYPE'], ascending=True)
                     .reset_index())

order_type_drinks

In [None]:
order_type_restaurant = (df_orders
                         .groupby(['DATE', 'RESTAURANT_NAME', 'TYPE'])['TYPE']
                         .count()
                         .reset_index(name='COUNT'))
order_type_restaurant

In [None]:
order_type_restaurant_monthly = (order_type_restaurant
                               .assign(MONTH_NAME=pd.to_datetime(order_type_restaurant['DATE']).dt.strftime('%B'))
                               .groupby(['MONTH_NAME', 'RESTAURANT_NAME', 'TYPE'])['COUNT']
                               .sum()
                               .reset_index())

# Convert 'MONTH_NAME' to an ordered Categorical so months sort chronologically, then sort
order_type_restaurant_monthly['MONTH_NAME'] = pd.Categorical(
    order_type_restaurant_monthly['MONTH_NAME'],
    categories=month_order,
    ordered=True
)

order_type_restaurant_monthly = order_type_restaurant_monthly.sort_values(
    ['MONTH_NAME', 'RESTAURANT_NAME', 'COUNT'], ascending=[True, True, False]
).reset_index(drop=True)

order_type_restaurant_monthly
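
The ordered `Categorical` step above matters because month names stored as plain strings sort alphabetically. A minimal sketch on toy data illustrates the difference; the frame `toy` and the four-month `month_order` list here are assumptions for illustration, not the notebook's actual data.

```python
import pandas as pd

# Hypothetical toy data; the real frame is order_type_restaurant_monthly.
month_order = ['January', 'February', 'March', 'April']  # assumed subset of the notebook's month_order
toy = pd.DataFrame({
    'MONTH_NAME': ['March', 'January', 'April', 'February'],
    'COUNT': [30, 10, 40, 20],
})

# Plain strings sort alphabetically, which scrambles the calendar order.
alphabetical = toy.sort_values('MONTH_NAME')['MONTH_NAME'].tolist()

# An ordered Categorical sorts by position in `categories` instead.
toy['MONTH_NAME'] = pd.Categorical(toy['MONTH_NAME'], categories=month_order, ordered=True)
chronological = toy.sort_values('MONTH_NAME')['MONTH_NAME'].tolist()

print(alphabetical)
print(chronological)
```

Any month name missing from `categories` would become `NaN`, so the category list should cover every value produced by `strftime('%B')`.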

In [None]:
order_type_restaurant_monthly_avg = (order_type_restaurant_monthly
                                     .groupby(['RESTAURANT_NAME', 'TYPE'])['COUNT']
                                     .mean()
                                     .reset_index()
                                     .sort_values(['RESTAURANT_NAME', 'COUNT'], ascending=[True, False])
                                     .rename(columns={'COUNT': 'MONTHLY_AVG'})
                                     .reset_index(drop=True))
order_type_restaurant_monthly_avg

In [None]:
order_type_restaurant_daily_avg = (order_type_restaurant
                                   .groupby(['RESTAURANT_NAME', 'TYPE'])['COUNT']
                                   .mean()
                                   .reset_index()
                                   .sort_values(['RESTAURANT_NAME', 'COUNT'], ascending=[True, False])
                                   .rename(columns={'COUNT': 'DAILY_AVG'})
                                   .reset_index(drop=True))
order_type_restaurant_daily_avg
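
Note that the daily average above is taken only over days on which a given restaurant and order type actually appear, since the grouped counts contain no rows for days with zero orders. If zero-order days should count toward the average, one possible approach is to rebuild the full date grid first. The sketch below uses hypothetical toy data and placeholder type labels; it is not the notebook's own method.

```python
import pandas as pd

# Hypothetical toy data shaped like order_type_restaurant (DATE, RESTAURANT_NAME, TYPE, COUNT).
toy = pd.DataFrame({
    'DATE': ['2018-01-01', '2018-01-01', '2018-01-02', '2018-01-03'],
    'RESTAURANT_NAME': ['A', 'A', 'A', 'A'],
    'TYPE': ['DELIVERY', 'IN_STORE', 'IN_STORE', 'IN_STORE'],  # placeholder labels
    'COUNT': [6, 10, 20, 30],
})

# Mean over observed rows only: DELIVERY appears on a single day.
observed_avg = toy.groupby(['RESTAURANT_NAME', 'TYPE'])['COUNT'].mean()

# Rebuild the full DATE x RESTAURANT_NAME x TYPE grid and fill missing combinations with 0.
full_index = pd.MultiIndex.from_product(
    [toy['DATE'].unique(), toy['RESTAURANT_NAME'].unique(), toy['TYPE'].unique()],
    names=['DATE', 'RESTAURANT_NAME', 'TYPE'],
)
filled = (toy
          .set_index(['DATE', 'RESTAURANT_NAME', 'TYPE'])['COUNT']
          .reindex(full_index, fill_value=0)
          .reset_index())

# Mean over all three days, counting the zero-order days for DELIVERY.
zero_filled_avg = filled.groupby(['RESTAURANT_NAME', 'TYPE'])['COUNT'].mean()

print(observed_avg)
print(zero_filled_avg)
```

Which definition is appropriate depends on the question: the observed-only mean describes a typical active day, while the zero-filled mean describes a typical calendar day.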

In [None]:
order_type_item = (df_orders_norm
                   .groupby(['DATE', 'ITEM_NAME', 'TYPE'])['TYPE']
                   .count()
                   .reset_index(name='COUNT'))
order_type_item

In [None]:
order_type_item_daily_avg = (order_type_item
                             .groupby(['ITEM_NAME', 'TYPE'])['COUNT']
                             .mean()
                             .reset_index()
                             .sort_values(['ITEM_NAME', 'COUNT'], ascending=[True, False])
                             .rename(columns={'COUNT': 'DAILY_AVG'})
                             .reset_index(drop=True))
order_type_item_daily_avg

In [None]:
order_type_item_monthly = (order_type_item
                           .assign(MONTH_NAME=pd.to_datetime(order_type_item['DATE']).dt.strftime('%B'))
                           .groupby(['MONTH_NAME', 'ITEM_NAME', 'TYPE'])['COUNT']
                           .sum()
                           .reset_index())

# Convert 'MONTH_NAME' to an ordered Categorical so months sort chronologically, then sort
order_type_item_monthly['MONTH_NAME'] = pd.Categorical(
    order_type_item_monthly['MONTH_NAME'],
    categories=month_order,
    ordered=True
)

order_type_item_monthly = order_type_item_monthly.sort_values(
    ['MONTH_NAME', 'ITEM_NAME', 'COUNT'], ascending=[True, True, False]
).reset_index(drop=True)

order_type_item_monthly

In [None]:
order_type_item_monthly_avg = (order_type_item_monthly
                               .groupby(['ITEM_NAME', 'TYPE'])['COUNT']
                               .mean()
                               .reset_index()
                               .sort_values(['ITEM_NAME', 'COUNT'], ascending=[True, False])
                               .rename(columns={'COUNT': 'MONTHLY_AVG'})
                               .reset_index(drop=True))
order_type_item_monthly_avg

#### Visualizations

##### Order Type (Total)

In [None]:
fig = go.Figure()

for order_type in order_type_count['TYPE'].unique():
    subset = order_type_count[order_type_count['TYPE'] == order_type]
    fig.add_trace(go.Scatter(x=subset['DATE'], y=subset['COUNT'], mode='lines+markers', name=order_type))

fig.update_layout(
    title='Order Type Counts Over Time',
    xaxis_title='Date',
    yaxis_title='Count',
    width=1000,
    height=500
)

fig.show()

##### Order Type (Drinks & Cookies)

##### Order Type (by Restaurant)

##### Order Type (Daily Average by Restaurant)

##### Order Type (by Item)

##### Order Type (Daily Average by Item)

## **Insights & Recommendations**

### **Q1: Where should the new restaurant be located?**

**Recommendation**: Long Island City, Queens, NY.

**Rationale**:
- Untapped Market: This area presents an expansion opportunity as the ***restaurant chain currently has no presence*** in the region.
- High Density and Diversity: Long Island City is a densely populated and diverse area, offering a large potential customer base.
- Attractive Destination: The area boasts popular attractions and cultural venues, indicating a vibrant local scene.

**Supporting Evidence**:

Long Island City offers various unique attractions including Gantry Plaza State Park, MoMA PS1, a thriving brewery scene, and Michelin-starred restaurants (source: [NewYorkSimply](https://newyorksimply.com/lic-things-to-do-long-island-city/)).

![New Restaurant Location](https://github.com/renan-peres/mfin-python-restaurant-data-analysis/blob/main/images/new-restaurant-location.png?raw=1)

### **Q2: Which items should be included in the new restaurant?**

In [None]:
Image(filename='/content/total_items_sold.png')

### **Q3: What type of restaurant should it be?**
- In-Store
- Delivery
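
One way to compare these two formats is to compute each order type's share of total orders from a frame shaped like `order_type_count`. The sketch below runs on hypothetical numbers and placeholder type labels, so its output says nothing about the actual dataset.

```python
import pandas as pd

# Hypothetical daily counts shaped like order_type_count (DATE, TYPE, COUNT);
# the type labels and numbers are placeholders, not results from the dataset.
toy = pd.DataFrame({
    'DATE': ['2018-01-01', '2018-01-01', '2018-01-02', '2018-01-02'],
    'TYPE': ['IN_STORE', 'DELIVERY', 'IN_STORE', 'DELIVERY'],
    'COUNT': [30, 10, 45, 15],
})

# Total orders per type, then each type's share of all orders, largest first.
totals = toy.groupby('TYPE')['COUNT'].sum()
share = (totals / totals.sum()).sort_values(ascending=False)

print(share)
```

Replacing `toy` with `order_type_count` would give the real shares, assuming that frame keeps the `TYPE` and `COUNT` columns built in the data preparation step.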

## Conclusion