# Day 13: New Milkshake Flavor Selection for Launch

You are a Product Analyst working with the Shake Shack R&D team to evaluate customer ratings for experimental milkshake flavors. Your team has collected ratings data from a small sampling test. Your task is to systematically analyze and clean the ratings data to identify top-performing flavors.

In [1]:
import pandas as pd
import numpy as np

milkshake_ratings_data = [
  {
    "flavor": "Classic Chocolate",
    "rating": 4.5,
    "customer_id": "CUST001",
    "rating_date": "2024-07-05"
  },
  {
    "flavor": "Strawberry Swirl",
    "rating": 3.8,
    "customer_id": "CUST002",
    "rating_date": "2024-07-10"
  },
  {
    "flavor": "Vanilla Bean",
    "rating": 4.2,
    "customer_id": "CUST003",
    "rating_date": "2024-07-15"
  },
  {
    "flavor": "Caramel Delight",
    "rating": 3.5,
    "customer_id": "CUST004",
    "rating_date": "2024-07-20"
  },
  {
    "flavor": "Mocha Bean",
    "rating": None,
    "customer_id": "CUST005",
    "rating_date": "2024-07-25"
  },
  {
    "flavor": "Classic Chocolate",
    "rating": 4.5,
    "customer_id": "CUST001",
    "rating_date": "2024-07-05"
  },
  {
    "flavor": "Classic Chocolate",
    "rating": 5,
    "customer_id": "CUST006",
    "rating_date": "2024-08-01"
  },
  {
    "flavor": "Strawberry Swirl",
    "rating": 4,
    "customer_id": "CUST007",
    "rating_date": "2024-08-02"
  },
  {
    "flavor": "Vanilla Bean",
    "rating": 3.9,
    "customer_id": "CUST008",
    "rating_date": "2024-08-03"
  },
  {
    "flavor": "Caramel Delight",
    "rating": 4.8,
    "customer_id": "CUST009",
    "rating_date": "2024-10-04"
  },
  {
    "flavor": "Mocha Bean",
    "rating": 2.5,
    "customer_id": "CUST010",
    "rating_date": "2024-09-05"
  },
  {
    "flavor": "Classic Chocolate",
    "rating": 4.7,
    "customer_id": "CUST011",
    "rating_date": "2024-10-06"
  },
  {
    "flavor": "Strawberry Swirl",
    "rating": None,
    "customer_id": "CUST012",
    "rating_date": "2024-10-07"
  },
  {
    "flavor": "Vanilla Bean",
    "rating": 4.3,
    "customer_id": "CUST013",
    "rating_date": "2024-10-08"
  },
  {
    "flavor": "Caramel Delight",
    "rating": 4.9,
    "customer_id": "CUST014",
    "rating_date": "2024-10-09"
  },
  {
    "flavor": "Mocha Bean",
    "rating": 3.3,
    "customer_id": "CUST015",
    "rating_date": "2024-08-10"
  },
  {
    "flavor": "Classic Chocolate",
    "rating": 1,
    "customer_id": "CUST016",
    "rating_date": "2024-08-11"
  },
  {
    "flavor": "Strawberry Swirl",
    "rating": 6,
    "customer_id": "CUST017",
    "rating_date": "2024-08-12"
  },
  {
    "flavor": "Vanilla Bean",
    "rating": 3,
    "customer_id": "CUST018",
    "rating_date": "2024-08-13"
  },
  {
    "flavor": "Caramel Delight",
    "rating": 4.2,
    "customer_id": "CUST019",
    "rating_date": "2024-08-14"
  },
  {
    "flavor": "Mocha Bean",
    "rating": 4.1,
    "customer_id": "CUST020",
    "rating_date": "2024-08-15"
  },
  {
    "flavor": "Classic Chocolate",
    "rating": 3.7,
    "customer_id": "CUST021",
    "rating_date": "2024-08-16"
  },
  {
    "flavor": "Strawberry Swirl",
    "rating": 3.9,
    "customer_id": "CUST022",
    "rating_date": "2024-08-17"
  },
  {
    "flavor": "Vanilla Bean",
    "rating": 4.4,
    "customer_id": "CUST023",
    "rating_date": "2024-08-18"
  },
  {
    "flavor": "Caramel Delight",
    "rating": 3.6,
    "customer_id": "CUST024",
    "rating_date": "2024-08-19"
  },
  {
    "flavor": "Mocha Bean",
    "rating": None,
    "customer_id": "CUST025",
    "rating_date": "2024-08-20"
  },
  {
    "flavor": "Classic Chocolate",
    "rating": 4.8,
    "customer_id": "CUST026",
    "rating_date": "2024-08-21"
  },
  {
    "flavor": "Strawberry Swirl",
    "rating": 4.6,
    "customer_id": "CUST027",
    "rating_date": "2024-08-22"
  },
  {
    "flavor": "Vanilla Bean",
    "rating": 4,
    "customer_id": "CUST028",
    "rating_date": "2024-08-23"
  },
  {
    "flavor": "Caramel Delight",
    "rating": 4.4,
    "customer_id": "CUST029",
    "rating_date": "2024-08-24"
  },
  {
    "flavor": "Mocha Bean",
    "rating": 3.2,
    "customer_id": "CUST030",
    "rating_date": "2024-11-25"
  },
  {
    "flavor": "Classic Chocolate",
    "rating": 4.9,
    "customer_id": "CUST031",
    "rating_date": "2024-11-26"
  },
  {
    "flavor": "Strawberry Swirl",
    "rating": 4.1,
    "customer_id": "CUST032",
    "rating_date": "2024-11-27"
  },
  {
    "flavor": "Vanilla Bean",
    "rating": 3.3,
    "customer_id": "CUST033",
    "rating_date": "2024-11-28"
  },
  {
    "flavor": "Caramel Delight",
    "rating": 3.8,
    "customer_id": "CUST034",
    "rating_date": "2024-11-29"
  },
  {
    "flavor": "Mocha Bean",
    "rating": 4,
    "customer_id": "CUST035",
    "rating_date": "2024-11-30"
  },
  {
    "flavor": "Classic Chocolate",
    "rating": 4.3,
    "customer_id": "CUST036",
    "rating_date": "2024-12-01"
  },
  {
    "flavor": "Strawberry Swirl",
    "rating": None,
    "customer_id": "CUST037",
    "rating_date": "2024-12-02"
  },
  {
    "flavor": "Vanilla Bean",
    "rating": 3.7,
    "customer_id": "CUST038",
    "rating_date": "2024-12-03"
  },
  {
    "flavor": "Caramel Delight",
    "rating": 4.5,
    "customer_id": "CUST039",
    "rating_date": "2024-12-04"
  },
  {
    "flavor": "Mocha Bean",
    "rating": 3.9,
    "customer_id": "CUST040",
    "rating_date": "2024-12-05"
  },
  {
    "flavor": "Classic Chocolate",
    "rating": 4.4,
    "customer_id": "CUST041",
    "rating_date": "2024-12-06"
  },
  {
    "flavor": "Strawberry Swirl",
    "rating": 3.5,
    "customer_id": "CUST042",
    "rating_date": "2024-12-07"
  },
  {
    "flavor": "Vanilla Bean",
    "rating": 4.6,
    "customer_id": "CUST043",
    "rating_date": "2024-12-08"
  },
  {
    "flavor": "Caramel Delight",
    "rating": 4.2,
    "customer_id": "CUST044",
    "rating_date": "2025-02-09"
  },
  {
    "flavor": "Mocha Bean",
    "rating": 3.4,
    "customer_id": "CUST045",
    "rating_date": "2025-02-10"
  },
  {
    "flavor": "Classic Chocolate",
    "rating": None,
    "customer_id": "CUST046",
    "rating_date": "2025-02-11"
  },
  {
    "flavor": "Strawberry Swirl",
    "rating": 4,
    "customer_id": "CUST047",
    "rating_date": "2025-02-12"
  },
  {
    "flavor": "Vanilla Bean",
    "rating": 4.1,
    "customer_id": "CUST048",
    "rating_date": "2025-02-13"
  },
  {
    "flavor": "Caramel Delight",
    "rating": 4.3,
    "customer_id": "CUST049",
    "rating_date": "2025-04-14"
  },
  {
    "flavor": "Mocha Bean",
    "rating": 3.7,
    "customer_id": "CUST050",
    "rating_date": "2025-04-15"
  },
  {
    "flavor": "Classic Chocolate",
    "rating": 4.6,
    "customer_id": "CUST051",
    "rating_date": "2025-04-16"
  },
  {
    "flavor": "Strawberry Swirl",
    "rating": 4.3,
    "customer_id": "CUST052",
    "rating_date": "2025-04-17"
  },
  {
    "flavor": "Vanilla Bean",
    "rating": 3.8,
    "customer_id": "CUST053",
    "rating_date": "2025-04-18"
  },
  {
    "flavor": "Caramel Delight",
    "rating": None,
    "customer_id": "CUST054",
    "rating_date": "2025-06-19"
  },
  {
    "flavor": "Mocha Bean",
    "rating": 4.7,
    "customer_id": "CUST055",
    "rating_date": "2025-06-20"
  },
  {
    "flavor": "Classic Chocolate",
    "rating": 4,
    "customer_id": "CUST056",
    "rating_date": "2025-06-21"
  },
  {
    "flavor": "Strawberry Swirl",
    "rating": 4.2,
    "customer_id": "CUST057",
    "rating_date": "2025-06-22"
  },
  {
    "flavor": "Vanilla Bean",
    "rating": 3.6,
    "customer_id": "CUST058",
    "rating_date": "2025-06-23"
  },
  {
    "flavor": "Caramel Delight",
    "rating": 4,
    "customer_id": "CUST059",
    "rating_date": "2025-06-24"
  }
]
milkshake_ratings = pd.DataFrame(milkshake_ratings_data)


## Question 1

An error in our data collection process introduced duplicate rows into the data without our noticing. Remove any duplicate entries from the customer ratings data to ensure the accuracy of the analysis.

In [2]:
milkshake_ratings = milkshake_ratings.drop_duplicates()

print(milkshake_ratings)

               flavor  rating customer_id rating_date
0   Classic Chocolate     4.5     CUST001  2024-07-05
1    Strawberry Swirl     3.8     CUST002  2024-07-10
2        Vanilla Bean     4.2     CUST003  2024-07-15
3     Caramel Delight     3.5     CUST004  2024-07-20
4          Mocha Bean     NaN     CUST005  2024-07-25
6   Classic Chocolate     5.0     CUST006  2024-08-01
7    Strawberry Swirl     4.0     CUST007  2024-08-02
8        Vanilla Bean     3.9     CUST008  2024-08-03
9     Caramel Delight     4.8     CUST009  2024-10-04
10         Mocha Bean     2.5     CUST010  2024-09-05
11  Classic Chocolate     4.7     CUST011  2024-10-06
12   Strawberry Swirl     NaN     CUST012  2024-10-07
13       Vanilla Bean     4.3     CUST013  2024-10-08
14    Caramel Delight     4.9     CUST014  2024-10-09
15         Mocha Bean     3.3     CUST015  2024-08-10
16  Classic Chocolate     1.0     CUST016  2024-08-11
17   Strawberry Swirl     6.0     CUST017  2024-08-12
18       Vanilla Bean     3.
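
As a quick check on the deduplication step, the sketch below counts exact duplicate rows before dropping them. It uses a hypothetical three-row `sample` frame (not the full dataset) so that it runs on its own; the column names mirror those in `milkshake_ratings`.

```python
import pandas as pd

# Hypothetical sample: the second row repeats the first exactly.
sample = pd.DataFrame(
    {
        "flavor": ["Classic Chocolate", "Classic Chocolate", "Mocha Bean"],
        "rating": [4.5, 4.5, None],
        "customer_id": ["CUST001", "CUST001", "CUST005"],
        "rating_date": ["2024-07-05", "2024-07-05", "2024-07-25"],
    }
)

# duplicated() flags each row that repeats an earlier row (keep="first" by default).
n_duplicates = int(sample.duplicated().sum())

# drop_duplicates() removes the flagged rows and keeps the original index labels.
deduped = sample.drop_duplicates()

print(n_duplicates)         # 1
print(list(deduped.index))  # [0, 2]
```

Because the original index labels are kept, the printed output above jumps from index 4 to index 6: row 5 was the duplicate. If a continuous index is preferred, `drop_duplicates(ignore_index=True)` or a follow-up `reset_index(drop=True)` should renumber the rows.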

## Question 2

For each milkshake flavor, calculate the average customer rating and append it as a new column to the milkshake_ratings DataFrame. Don't forget to clean the DataFrame first by dropping duplicate rows.

In [3]:
# Remove duplicate rows
milkshake_ratings = milkshake_ratings.drop_duplicates()

# Per-flavor mean rating, broadcast back to every row of that flavor
milkshake_ratings['avg_rating'] = milkshake_ratings.groupby('flavor')['rating'].transform('mean')

print(milkshake_ratings)

               flavor  rating customer_id rating_date  avg_rating
0   Classic Chocolate     4.5     CUST001  2024-07-05    4.172727
1    Strawberry Swirl     3.8     CUST002  2024-07-10    4.240000
2        Vanilla Bean     4.2     CUST003  2024-07-15    3.908333
3     Caramel Delight     3.5     CUST004  2024-07-20    4.200000
4          Mocha Bean     NaN     CUST005  2024-07-25    3.644444
6   Classic Chocolate     5.0     CUST006  2024-08-01    4.172727
7    Strawberry Swirl     4.0     CUST007  2024-08-02    4.240000
8        Vanilla Bean     3.9     CUST008  2024-08-03    3.908333
9     Caramel Delight     4.8     CUST009  2024-10-04    4.200000
10         Mocha Bean     2.5     CUST010  2024-09-05    3.644444
11  Classic Chocolate     4.7     CUST011  2024-10-06    4.172727
12   Strawberry Swirl     NaN     CUST012  2024-10-07    4.240000
13       Vanilla Bean     4.3     CUST013  2024-10-08    3.908333
14    Caramel Delight     4.9     CUST014  2024-10-09    4.200000
15        
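
The choice of `transform('mean')` rather than a plain grouped mean matters here, so a small sketch may help. It uses a hypothetical four-row `sample` frame and compares the two: the grouped mean returns one value per flavor, while `transform` returns one value per row and can therefore be assigned directly as a column.

```python
import pandas as pd

# Hypothetical sample with one missing rating.
sample = pd.DataFrame(
    {
        "flavor": ["Mocha Bean", "Mocha Bean", "Mocha Bean", "Vanilla Bean"],
        "rating": [None, 2.5, 3.5, 4.0],
    }
)

# Grouped mean: one value per flavor, indexed by flavor.
per_flavor = sample.groupby("flavor")["rating"].mean()

# transform: same length and index as the input, so it fits as a new column.
sample["avg_rating"] = sample.groupby("flavor")["rating"].transform("mean")

print(per_flavor.to_dict())           # {'Mocha Bean': 3.0, 'Vanilla Bean': 4.0}
print(sample["avg_rating"].tolist())  # [3.0, 3.0, 3.0, 4.0]
```

The mean skips missing ratings, so the Mocha Bean average is taken over the two rated rows only, and the unrated row still receives that average. This matches the output above, where the unrated Mocha Bean row (CUST005) carries an `avg_rating` of 3.644444.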

## Question 3

For each row in the dataset, calculate the difference between that customer's rating and the average rating for that flavor. Don't forget to clean the DataFrame first by dropping duplicate rows.

In [4]:
# Remove duplicate rows
milkshake_ratings = milkshake_ratings.drop_duplicates()

# Per-flavor mean rating, broadcast back to every row of that flavor
milkshake_ratings['avg_rating'] = milkshake_ratings.groupby('flavor')['rating'].transform('mean')

# Difference between each customer's rating and the flavor's average rating
milkshake_ratings['diff'] = milkshake_ratings['rating'] - milkshake_ratings['avg_rating']

print(milkshake_ratings)

               flavor  rating customer_id rating_date  avg_rating      diff
0   Classic Chocolate     4.5     CUST001  2024-07-05    4.172727  0.327273
1    Strawberry Swirl     3.8     CUST002  2024-07-10    4.240000 -0.440000
2        Vanilla Bean     4.2     CUST003  2024-07-15    3.908333  0.291667
3     Caramel Delight     3.5     CUST004  2024-07-20    4.200000 -0.700000
4          Mocha Bean     NaN     CUST005  2024-07-25    3.644444       NaN
6   Classic Chocolate     5.0     CUST006  2024-08-01    4.172727  0.827273
7    Strawberry Swirl     4.0     CUST007  2024-08-02    4.240000 -0.240000
8        Vanilla Bean     3.9     CUST008  2024-08-03    3.908333 -0.008333
9     Caramel Delight     4.8     CUST009  2024-10-04    4.200000  0.600000
10         Mocha Bean     2.5     CUST010  2024-09-05    3.644444 -1.144444
11  Classic Chocolate     4.7     CUST011  2024-10-06    4.172727  0.527273
12   Strawberry Swirl     NaN     CUST012  2024-10-07    4.240000       NaN
13       Van
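
One way to sanity-check the `diff` column is to confirm that the deviations within each flavor sum to roughly zero, which should hold whenever the average is computed from the same rows. The sketch below does this on a hypothetical four-row `sample` frame; `diff_sums` is a name introduced here for illustration.

```python
import pandas as pd

# Hypothetical sample with one missing rating.
sample = pd.DataFrame(
    {
        "flavor": ["Mocha Bean", "Mocha Bean", "Mocha Bean", "Vanilla Bean"],
        "rating": [None, 2.5, 3.5, 4.0],
    }
)

# Same two steps as in the cell above.
sample["avg_rating"] = sample.groupby("flavor")["rating"].transform("mean")
sample["diff"] = sample["rating"] - sample["avg_rating"]

# A missing rating produces a missing diff; it is not treated as zero.
n_missing_diff = int(sample["diff"].isna().sum())

# Deviations from a group's own mean cancel out (up to float rounding).
diff_sums = sample.groupby("flavor")["diff"].sum()

print(n_missing_diff)                      # 1
print(bool(diff_sums.abs().max() < 1e-9))  # True
```

A per-flavor sum that is clearly non-zero would suggest that `avg_rating` was computed on a different set of rows than `diff`, for example before duplicates were dropped.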

Made with ❤️ by [Interview Master](https://www.interviewmaster.ai)