In [19]:
import pandas as pd


You are a Product Analyst working with the Shake Shack R&D team to evaluate customer ratings for experimental milkshake flavors. Your team has collected ratings data from a small sampling test. Your task is to systematically analyze and clean the ratings data to identify top-performing flavors.

In [20]:
# Load the data
milkshake_ratings = pd.read_csv('milkshake_ratings.csv')

# Display the dataset
print(milkshake_ratings)


               flavor  rating customer_id rating_date
0   Classic Chocolate     4.5     CUST001  2024-07-05
1    Strawberry Swirl     3.8     CUST002  2024-07-10
2        Vanilla Bean     4.2     CUST003  2024-07-15
3     Caramel Delight     3.5     CUST004  2024-07-20
4          Mocha Bean     NaN     CUST005  2024-07-25
5   Classic Chocolate     4.5     CUST001  2024-07-05
6   Classic Chocolate     5.0     CUST006  2024-08-01
7    Strawberry Swirl     4.0     CUST007  2024-08-02
8        Vanilla Bean     3.9     CUST008  2024-08-03
9     Caramel Delight     4.8     CUST009  2024-10-04
10         Mocha Bean     2.5     CUST010  2024-09-05
11  Classic Chocolate     4.7     CUST011  2024-10-06
12   Strawberry Swirl     NaN     CUST012  2024-10-07
13       Vanilla Bean     4.3     CUST013  2024-10-08
14    Caramel Delight     4.9     CUST014  2024-10-09
15         Mocha Bean     3.3     CUST015  2024-08-10
16  Classic Chocolate     1.0     CUST016  2024-08-11
17   Strawberry Swirl     6.
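A quick data-quality pass before answering the questions can help. The printed output above already shows missing ratings (rows 4 and 12) and an exact duplicate (row 5 repeats row 0). The sketch below counts those issues. It rebuilds only the visible rows 0 to 16 inline so that it runs without the CSV. `quality_summary` is a hypothetical helper name, and the valid rating range of 1 to 5 (`low`, `high`) is an assumption that the source does not state.

```python
import pandas as pd

nan = float("nan")

# Rows 0-16 as printed above, rebuilt inline so the sketch runs without the CSV
rows = [
    ("Classic Chocolate", 4.5, "CUST001", "2024-07-05"),
    ("Strawberry Swirl", 3.8, "CUST002", "2024-07-10"),
    ("Vanilla Bean", 4.2, "CUST003", "2024-07-15"),
    ("Caramel Delight", 3.5, "CUST004", "2024-07-20"),
    ("Mocha Bean", nan, "CUST005", "2024-07-25"),
    ("Classic Chocolate", 4.5, "CUST001", "2024-07-05"),
    ("Classic Chocolate", 5.0, "CUST006", "2024-08-01"),
    ("Strawberry Swirl", 4.0, "CUST007", "2024-08-02"),
    ("Vanilla Bean", 3.9, "CUST008", "2024-08-03"),
    ("Caramel Delight", 4.8, "CUST009", "2024-10-04"),
    ("Mocha Bean", 2.5, "CUST010", "2024-09-05"),
    ("Classic Chocolate", 4.7, "CUST011", "2024-10-06"),
    ("Strawberry Swirl", nan, "CUST012", "2024-10-07"),
    ("Vanilla Bean", 4.3, "CUST013", "2024-10-08"),
    ("Caramel Delight", 4.9, "CUST014", "2024-10-09"),
    ("Mocha Bean", 3.3, "CUST015", "2024-08-10"),
    ("Classic Chocolate", 1.0, "CUST016", "2024-08-11"),
]
sample = pd.DataFrame(rows, columns=["flavor", "rating", "customer_id", "rating_date"])


def quality_summary(df, low=1.0, high=5.0):
    """Count common data-quality problems (hypothetical helper; 1-5 scale is assumed)."""
    rating = df["rating"]
    return {
        # A row counts as a duplicate only if every column matches an earlier row
        "exact_duplicates": int(df.duplicated().sum()),
        "missing_ratings": int(rating.isna().sum()),
        # between() returns False for NaN, so exclude missing values explicitly
        "out_of_range": int((rating.notna() & ~rating.between(low, high)).sum()),
    }


print(quality_summary(sample))
# {'exact_duplicates': 1, 'missing_ratings': 2, 'out_of_range': 0}
```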

### Question 1 of 3

Because of an error in our data collection process, we unknowingly introduced duplicate rows into the dataset. Remove any duplicate entries from the customer ratings data so the analysis is accurate.

In [21]:
# Remove duplicate rows
milkshake_ratings = milkshake_ratings.drop_duplicates()
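Called with no arguments, `drop_duplicates()` treats a row as a duplicate only when every column matches, and it keeps the first occurrence (`keep='first'` is the default). The sketch below uses three of the printed rows to show how to inspect duplicates with `duplicated(keep=False)` before dropping them. The source does not say whether a customer who rates the same flavor twice on different dates should also count as a duplicate, so the `subset=` variant is included only as an assumption.

```python
import pandas as pd

# Three rows copied from the printed output: row 5 repeats row 0 exactly
sample = pd.DataFrame(
    {
        "flavor": ["Classic Chocolate", "Strawberry Swirl", "Classic Chocolate"],
        "rating": [4.5, 3.8, 4.5],
        "customer_id": ["CUST001", "CUST002", "CUST001"],
        "rating_date": ["2024-07-05", "2024-07-10", "2024-07-05"],
    },
    index=[0, 1, 5],
)

# keep=False marks every copy, which makes the offending rows easy to inspect
print(sample[sample.duplicated(keep=False)])

# Default behaviour: all columns must match, and the first occurrence (index 0) is kept
deduped = sample.drop_duplicates()
print(deduped.index.tolist())  # [0, 1]

# Assumption: count one customer rating one flavor more than once as a duplicate,
# regardless of date or score
deduped_by_key = sample.drop_duplicates(subset=["customer_id", "flavor"])
```

The original index labels survive deduplication, which is why index 5 is missing from the later outputs. If a fresh 0..n-1 index is preferred, pass `ignore_index=True` (pandas 1.0+).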


### Question 2 of 3

For each milkshake flavor, calculate the average customer rating and append it as a new column to the milkshake_ratings DataFrame. Make sure the DataFrame has already been cleaned by dropping duplicate rows.

In [22]:
# Compute average rating per flavor and append as a new column
milkshake_ratings['avg_flavor_rating'] = (
    milkshake_ratings
        .groupby('flavor')['rating']
        .transform('mean')
)

# Show results
print(milkshake_ratings)


               flavor  rating customer_id rating_date  avg_flavor_rating
0   Classic Chocolate     4.5     CUST001  2024-07-05           4.172727
1    Strawberry Swirl     3.8     CUST002  2024-07-10           4.240000
2        Vanilla Bean     4.2     CUST003  2024-07-15           3.908333
3     Caramel Delight     3.5     CUST004  2024-07-20           4.200000
4          Mocha Bean     NaN     CUST005  2024-07-25           3.644444
6   Classic Chocolate     5.0     CUST006  2024-08-01           4.172727
7    Strawberry Swirl     4.0     CUST007  2024-08-02           4.240000
8        Vanilla Bean     3.9     CUST008  2024-08-03           3.908333
9     Caramel Delight     4.8     CUST009  2024-10-04           4.200000
10         Mocha Bean     2.5     CUST010  2024-09-05           3.644444
11  Classic Chocolate     4.7     CUST011  2024-10-06           4.172727
12   Strawberry Swirl     NaN     CUST012  2024-10-07           4.240000
13       Vanilla Bean     4.3     CUST013  2024-10-
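`groupby(...).transform('mean')` returns a Series aligned to the original rows, so it can be assigned directly as a column. Because `mean` skips NaN by default, the row with a missing Mocha Bean rating (row 4) still receives its flavor's average instead of NaN, as the output above shows. The sketch below uses a handful of the printed rows to check that this one-step approach matches the equivalent two-step `agg` + `map` approach. The averages differ from the notebook's because only a subset of rows is used.

```python
import pandas as pd

nan = float("nan")

# A few printed rows (not the full dataset); row 4 has a missing rating
sample = pd.DataFrame(
    {
        "flavor": ["Mocha Bean", "Mocha Bean", "Mocha Bean", "Vanilla Bean", "Vanilla Bean"],
        "rating": [nan, 2.5, 3.3, 4.2, 3.9],
    },
    index=[4, 10, 15, 2, 8],
)

# One step: broadcast each group's mean back onto its rows (NaN is skipped)
via_transform = sample.groupby("flavor")["rating"].transform("mean")

# Two-step equivalent: one mean per flavor, then look it up for each row
flavor_means = sample.groupby("flavor")["rating"].mean()
via_map = sample["flavor"].map(flavor_means)

print(flavor_means)
# The Series names differ ("rating" vs "flavor"), so ignore names when comparing
pd.testing.assert_series_equal(via_transform, via_map, check_names=False)
```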

### Question 3 of 3

For each row in the dataset, calculate the difference between that customer's rating and the average rating for its flavor. Make sure the DataFrame has already been cleaned by dropping duplicate rows.

In [23]:
# Calculate per-row difference from the flavor's average rating
milkshake_ratings['rating_vs_flavor_avg'] = (
    milkshake_ratings['rating'] - milkshake_ratings['avg_flavor_rating']
)

# Show results
print(milkshake_ratings)


               flavor  rating customer_id rating_date  avg_flavor_rating  \
0   Classic Chocolate     4.5     CUST001  2024-07-05           4.172727   
1    Strawberry Swirl     3.8     CUST002  2024-07-10           4.240000   
2        Vanilla Bean     4.2     CUST003  2024-07-15           3.908333   
3     Caramel Delight     3.5     CUST004  2024-07-20           4.200000   
4          Mocha Bean     NaN     CUST005  2024-07-25           3.644444   
6   Classic Chocolate     5.0     CUST006  2024-08-01           4.172727   
7    Strawberry Swirl     4.0     CUST007  2024-08-02           4.240000   
8        Vanilla Bean     3.9     CUST008  2024-08-03           3.908333   
9     Caramel Delight     4.8     CUST009  2024-10-04           4.200000   
10         Mocha Bean     2.5     CUST010  2024-09-05           3.644444   
11  Classic Chocolate     4.7     CUST011  2024-10-06           4.172727   
12   Strawberry Swirl     NaN     CUST012  2024-10-07           4.240000   
13       Van
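The difference column is NaN wherever `rating` is missing (rows 4 and 12), because any arithmetic involving NaN produces NaN. The stated goal is to identify top-performing flavors, and one option for that is to rank flavors by average rating alongside the number of non-missing ratings behind each average, since averages from a small sampling test may rest on very few scores. The sketch below is illustrative only: it runs on a subset of the printed rows, not the team's data. The aggregate column names `avg_rating` and `n_ratings` are hypothetical.

```python
import pandas as pd

nan = float("nan")

# Subset of the printed rows (not the full dataset), duplicates already removed
sample = pd.DataFrame(
    {
        "flavor": ["Classic Chocolate", "Classic Chocolate", "Classic Chocolate",
                   "Strawberry Swirl", "Strawberry Swirl", "Strawberry Swirl",
                   "Caramel Delight", "Caramel Delight", "Caramel Delight"],
        "rating": [4.5, 5.0, 4.7, 3.8, 4.0, nan, 3.5, 4.8, 4.9],
    }
)

sample["avg_flavor_rating"] = sample.groupby("flavor")["rating"].transform("mean")
# NaN minus a number is NaN, so a missing rating yields a missing difference
sample["rating_vs_flavor_avg"] = sample["rating"] - sample["avg_flavor_rating"]

# Rank flavors by mean of non-missing ratings, and show how many ratings back each mean
ranking = (
    sample.groupby("flavor")["rating"]
    .agg(avg_rating="mean", n_ratings="count")
    .sort_values("avg_rating", ascending=False)
)
print(ranking)
```

A useful sanity check is that, within each flavor, the non-missing differences sum to approximately zero, because they are deviations from that flavor's own mean.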