# 🌸☀️🍂❄️ Seasonal Coffee Drink Classifier (Student Version)

Welcome to **Starbrews ML Lab**! ☕💻

You're going to help a fictional coffee shop classify its drinks into seasons:

- 🌸 **Spring drink**
- ☀️ **Summer drink**
- 🍂 **Fall drink**
- ❄️ **Winter drink**

Each drink has these features:

- `spice` (0–10)
- `temperature` (0–10, 0 = iced, 10 = very hot)
- `flavor_notes` (0–10, overall depth / sweetness / richness)
- `fruitiness` (0–10)
- `color_tone` (0–10, 0 = light/pastel, 10 = dark/rich)
- `foaminess` (0–10)

### Your goals (about 20 minutes)

1. **Load and explore** the fake drink dataset.
2. Use **PCA** to reduce from 6 features down to 2 components and visualize the drinks.
3. Train an **SVM classifier** to predict the drink's season.
4. **Invent 3 new drinks**, classify them, and decide if you agree with the model.
5. Answer some short reflection questions.

There are no strict right or wrong answers, so explain your reasoning and have fun! ✨

## 1️⃣ Setup and Dataset

First we'll import some libraries and create a fake seasonal drink dataset.

Run the cell below. You don't need to change anything yet.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

np.random.seed(42)

def make_season_drinks(n_per_season=20):
    seasons = []
    rows = []

    # Helper to sample and clip between 0 and 10
    def sample_vals(means, std=1.0):
        vals = np.random.normal(loc=means, scale=std)
        return np.clip(vals, 0, 10)

    # Spring: floral, medium temp, fairly fruity, pastel
    for _ in range(n_per_season):
        spice, temperature, flavor_notes, fruitiness, color_tone, foaminess = sample_vals(
            [3, 6, 6, 7, 3, 4], std=1.2
        )
        rows.append([spice, temperature, flavor_notes, fruitiness, color_tone, foaminess])
        seasons.append("spring")

    # Summer: cold, fruity, light in color, low spice
    for _ in range(n_per_season):
        spice, temperature, flavor_notes, fruitiness, color_tone, foaminess = sample_vals(
            [1, 2, 5, 8, 2, 2], std=1.2
        )
        rows.append([spice, temperature, flavor_notes, fruitiness, color_tone, foaminess])
        seasons.append("summer")

    # Fall: warm, very spicy, rich, foamy, darker
    for _ in range(n_per_season):
        spice, temperature, flavor_notes, fruitiness, color_tone, foaminess = sample_vals(
            [8, 8, 8, 3, 8, 7], std=1.2
        )
        rows.append([spice, temperature, flavor_notes, fruitiness, color_tone, foaminess])
        seasons.append("fall")

    # Winter: very hot, cozy, rich, medium spice, very foamy, darkest
    for _ in range(n_per_season):
        spice, temperature, flavor_notes, fruitiness, color_tone, foaminess = sample_vals(
            [6, 9, 9, 2, 9, 8], std=1.2
        )
        rows.append([spice, temperature, flavor_notes, fruitiness, color_tone, foaminess])
        seasons.append("winter")

    cols = ["spice", "temperature", "flavor_notes", "fruitiness", "color_tone", "foaminess"]
    df = pd.DataFrame(rows, columns=cols)
    df["season"] = seasons
    return df

drinks = make_season_drinks(n_per_season=20)
drinks.head()

### 👉 Quick check

- How many total drinks are there?
- Which seasons are in the dataset?

Use the cell below to explore a bit.

In [None]:
# TODO: Explore the dataset
# Try things like drinks.shape, drinks["season"].value_counts(), drinks.describe()

print("Number of rows and columns:")
print(drinks.shape)

print("\nSeasons in the dataset:")
print(drinks["season"].value_counts())

print("\nSummary statistics:")
print(drinks.describe())
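
One more way to explore is to compare the average feature values per season. The sketch below is only an illustration: `SEASON_MEANS`, `make_demo_drinks`, and `demo_drinks` are made-up names, and the helper is a compact stand-in that reuses the same per-season means as `make_season_drinks`, so its random values will differ a little from your `drinks` table.

```python
import numpy as np
import pandas as pd

FEATURES = ["spice", "temperature", "flavor_notes", "fruitiness", "color_tone", "foaminess"]

# Per-season means copied from make_season_drinks in the setup cell
SEASON_MEANS = {
    "spring": [3, 6, 6, 7, 3, 4],
    "summer": [1, 2, 5, 8, 2, 2],
    "fall":   [8, 8, 8, 3, 8, 7],
    "winter": [6, 9, 9, 2, 9, 8],
}

def make_demo_drinks(n_per_season=20, std=1.2, seed=42):
    """Hypothetical compact stand-in for make_season_drinks (assumption, not the notebook's data)."""
    rng = np.random.default_rng(seed)
    frames = []
    for season, means in SEASON_MEANS.items():
        # Sample around the season's means and clip to the 0-10 scale
        vals = np.clip(rng.normal(means, std, size=(n_per_season, len(FEATURES))), 0, 10)
        frame = pd.DataFrame(vals, columns=FEATURES)
        frame["season"] = season
        frames.append(frame)
    return pd.concat(frames, ignore_index=True)

demo_drinks = make_demo_drinks()

# Average of every feature within each season
season_means = demo_drinks.groupby("season")[FEATURES].mean().round(1)
print(season_means)
```

With your own data, the same idea is `drinks.groupby("season")[feature_cols].mean()` once `feature_cols` is defined, or `drinks.groupby("season").mean()` before that.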

## 2️⃣ PCA: Reduce to 2 Dimensions

We have **6 features**, which are hard to picture all at once. We'll use **PCA** (Principal Component Analysis) to reduce them to **2 dimensions** so we can plot the drinks.

Steps:
1. Split features (X) and labels (y).
2. Fit PCA with 2 components.
3. Make a scatterplot colored by season.

Fill in the TODOs below where needed.

In [None]:
# Separate features (X) and target (y)
feature_cols = ["spice", "temperature", "flavor_notes", "fruitiness", "color_tone", "foaminess"]
X = drinks[feature_cols].values
y = drinks["season"].values

# TODO: Create a PCA object with 2 components and fit_transform X
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)

print("Original shape:", X.shape)
print("PCA shape:", X_pca.shape)

In [None]:
# Plot PCA results colored by season
plt.figure(figsize=(6, 5))

# Fix the season order and give each season its own color
seasons = ["spring", "summer", "fall", "winter"]
colors = ["tab:green", "tab:orange", "tab:red", "tab:blue"]
season_to_color = dict(zip(seasons, colors))

for season in seasons:
    mask = (y == season)
    plt.scatter(X_pca[mask, 0], X_pca[mask, 1], color=season_to_color[season],
                label=season.capitalize(), alpha=0.8)

plt.xlabel("PCA Component 1")
plt.ylabel("PCA Component 2")
plt.title("Seasonal Drinks in PCA Space")
plt.legend()
plt.tight_layout()
plt.show()

### ✏️ Question: Interpreting PCA

Look at the plot and answer in your own words (you can write answers in a markdown cell or on paper):

1. Do you see **clusters** for different seasons? Which seasons seem close together?
2. Based on what you know about the features, what do you think PCA **Component 1** and **Component 2** might represent? Give them fun names, like:
   - "cozy vs refreshing axis"
   - "spice & warmth axis"
   - "light vs dark drinks"
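
If you want a hint for naming the components, you can look at how much variance each one explains and how strongly each original feature contributes to it (the "loadings"). This is a rough sketch on stand-in data: `SEASON_MEANS`, `X_demo`, `pca_demo`, and `loadings` are illustrative names, and the data is regenerated from the same season means as the setup cell rather than taken from your `drinks` table.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

FEATURES = ["spice", "temperature", "flavor_notes", "fruitiness", "color_tone", "foaminess"]

# Per-season means copied from make_season_drinks in the setup cell
SEASON_MEANS = {
    "spring": [3, 6, 6, 7, 3, 4],
    "summer": [1, 2, 5, 8, 2, 2],
    "fall":   [8, 8, 8, 3, 8, 7],
    "winter": [6, 9, 9, 2, 9, 8],
}

# Stand-in data (assumption): 20 drinks per season, clipped to the 0-10 scale
rng = np.random.default_rng(42)
X_demo = np.vstack([
    np.clip(rng.normal(means, 1.2, size=(20, len(FEATURES))), 0, 10)
    for means in SEASON_MEANS.values()
])

pca_demo = PCA(n_components=2).fit(X_demo)

# Fraction of the total variance captured by each component
explained = pca_demo.explained_variance_ratio_
print("Explained variance ratio:", explained.round(2))

# Loadings: one row per component, one column per original feature.
# Large positive or negative values mark the features that drive a component.
loadings = pd.DataFrame(pca_demo.components_, index=["PC1", "PC2"], columns=FEATURES)
print(loadings.round(2))
```

In your notebook, the same attributes are available on the fitted `pca` object as `pca.explained_variance_ratio_` and `pca.components_`. Keep in mind that the sign of a component is arbitrary, so only the relative sizes and the contrast between positive and negative loadings carry meaning.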


## 3️⃣ Train an SVM to Predict the Season

Now we'll train an **SVM (Support Vector Machine)** classifier to predict the season from the drink features.

We'll:
1. Split the data into training and test sets.
2. Train an SVM on the **original features** (6D) or on the **PCA features** (2D). We'll start with the original features.
3. Look at how well it does.

Fill in the TODOs where necessary.

In [None]:
# Train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0, stratify=y)

# TODO: Create an SVM classifier (try kernel="rbf" to start)
svm_clf = SVC(kernel="rbf", gamma="scale", C=1.0)

# TODO: Fit the model on the training data
svm_clf.fit(X_train, y_train)

# Evaluate on the test set
y_pred = svm_clf.predict(X_test)
print(classification_report(y_test, y_pred))
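
Step 2 above also mentions training on the **PCA features** (2D). One possible way to try that is to put PCA and the SVM in a scikit-learn pipeline, so PCA is fitted on the training drinks only. The sketch below uses stand-in data with illustrative names (`SEASON_MEANS`, `X_demo`, `y_demo`, `svm_6d`, `svm_2d`); it is not the notebook's own model.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Per-season means copied from make_season_drinks in the setup cell
SEASON_MEANS = {
    "spring": [3, 6, 6, 7, 3, 4],
    "summer": [1, 2, 5, 8, 2, 2],
    "fall":   [8, 8, 8, 3, 8, 7],
    "winter": [6, 9, 9, 2, 9, 8],
}

# Stand-in data (assumption): 20 drinks per season, clipped to the 0-10 scale
rng = np.random.default_rng(42)
X_demo = np.vstack([
    np.clip(rng.normal(means, 1.2, size=(20, 6)), 0, 10)
    for means in SEASON_MEANS.values()
])
y_demo = np.repeat(list(SEASON_MEANS), 20)  # labels in the same order as the rows

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=0, stratify=y_demo
)

# SVM on all 6 original features
svm_6d = SVC(kernel="rbf", gamma="scale", C=1.0).fit(X_tr, y_tr)

# PCA to 2 components followed by the same SVM; PCA is fitted on the training split only
svm_2d = make_pipeline(
    PCA(n_components=2),
    SVC(kernel="rbf", gamma="scale", C=1.0),
).fit(X_tr, y_tr)

acc_6d = accuracy_score(y_te, svm_6d.predict(X_te))
acc_2d = accuracy_score(y_te, svm_2d.predict(X_te))
print(f"6D accuracy: {acc_6d:.2f}")
print(f"2D (PCA) accuracy: {acc_2d:.2f}")
```

The test set here is small, so the two accuracies can shift noticeably with a different `random_state`; treat any gap between them as a hint rather than a verdict.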

### 🔁 (Optional) Try different kernels

You can try other SVM kernels like `"linear"` or `"poly"` and see if the performance changes.

For example:
- `SVC(kernel="linear")`
- `SVC(kernel="poly", degree=3)`

Which one seems to work best for this dataset?
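
A single train/test split on 80 drinks is noisy, so one way to compare kernels more fairly is cross-validation. The sketch below is an assumption-heavy illustration: it uses 5-fold stratified cross-validation on stand-in data, and `SEASON_MEANS`, `X_demo`, `y_demo`, and `kernel_scores` are made-up names.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

# Per-season means copied from make_season_drinks in the setup cell
SEASON_MEANS = {
    "spring": [3, 6, 6, 7, 3, 4],
    "summer": [1, 2, 5, 8, 2, 2],
    "fall":   [8, 8, 8, 3, 8, 7],
    "winter": [6, 9, 9, 2, 9, 8],
}

# Stand-in data (assumption): 20 drinks per season, clipped to the 0-10 scale
rng = np.random.default_rng(42)
X_demo = np.vstack([
    np.clip(rng.normal(means, 1.2, size=(20, 6)), 0, 10)
    for means in SEASON_MEANS.values()
])
y_demo = np.repeat(list(SEASON_MEANS), 20)

candidates = {
    "rbf": SVC(kernel="rbf", gamma="scale", C=1.0),
    "linear": SVC(kernel="linear"),
    "poly": SVC(kernel="poly", degree=3),
}

# Each fold keeps the same share of every season
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

kernel_scores = {}
for name, clf in candidates.items():
    scores = cross_val_score(clf, X_demo, y_demo, cv=cv)  # accuracy per fold
    kernel_scores[name] = scores.mean()
    print(f"{name:>6}: mean accuracy {scores.mean():.2f} (+/- {scores.std():.2f})")
```

To run this on your own data, pass `X` and `y` from the PCA section in place of `X_demo` and `y_demo`.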

## 4️⃣ Invent Your Own Drinks and Classify Them

Now it's your turn to be a **drink designer**.

Create at least **3 new imaginary drinks**, like:

- `Pumpkin Spice Daydream`
- `Berry Sunrise Refresher`
- `Snowfall Vanilla Latte`
- `Blossom Matcha Cooler`

For each one, choose values (0–10) for:

- spice
- temperature
- flavor_notes
- fruitiness
- color_tone
- foaminess

Then use your SVM model to predict their seasons.

👉 Do you agree with the model's guesses? Why or why not?

In [None]:
# Example: define your own custom drinks here
# Feel free to change the values and add more drinks!

custom_drinks = pd.DataFrame([
    {"name": "Pumpkin Spice Daydream", "spice": 9, "temperature": 9, "flavor_notes": 8, "fruitiness": 2, "color_tone": 9, "foaminess": 8},
    {"name": "Berry Sunrise Refresher", "spice": 1, "temperature": 2, "flavor_notes": 5, "fruitiness": 9, "color_tone": 2, "foaminess": 2},
    {"name": "Snowfall Vanilla Latte", "spice": 5, "temperature": 9, "flavor_notes": 9, "fruitiness": 1, "color_tone": 8, "foaminess": 9},
])

print("Your custom drinks:")
display(custom_drinks)

# Use the same feature columns as before
X_custom = custom_drinks[feature_cols].values
custom_preds = svm_clf.predict(X_custom)

custom_drinks["predicted_season"] = custom_preds
print("\nModel predictions:")
display(custom_drinks[["name", "predicted_season"]])

### ✏️ Reflection

Answer in your own words (in a markdown cell or on paper):

1. Did the model classify your custom drinks into the seasons you expected?
2. Which features (spice, temperature, fruitiness, etc.) seem most important for deciding the season?
3. Do you think this model would work well on **real** coffee shop menus? Why or why not?
4. What was one surprising thing you noticed from the PCA plot or SVM predictions?
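
For question 2, one optional way to get a rough feel for feature importance is permutation importance: shuffle one feature at a time and see how much the test accuracy drops. This is a sketch under assumptions, not a definitive answer. It uses stand-in data and illustrative names (`SEASON_MEANS`, `X_demo`, `y_demo`, `importance`), and because several features in this dataset move together, shuffling one of them may hurt less than you would expect.

```python
import numpy as np
import pandas as pd
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

FEATURES = ["spice", "temperature", "flavor_notes", "fruitiness", "color_tone", "foaminess"]

# Per-season means copied from make_season_drinks in the setup cell
SEASON_MEANS = {
    "spring": [3, 6, 6, 7, 3, 4],
    "summer": [1, 2, 5, 8, 2, 2],
    "fall":   [8, 8, 8, 3, 8, 7],
    "winter": [6, 9, 9, 2, 9, 8],
}

# Stand-in data (assumption): 20 drinks per season, clipped to the 0-10 scale
rng = np.random.default_rng(42)
X_demo = np.vstack([
    np.clip(rng.normal(means, 1.2, size=(20, len(FEATURES))), 0, 10)
    for means in SEASON_MEANS.values()
])
y_demo = np.repeat(list(SEASON_MEANS), 20)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=0, stratify=y_demo
)
clf = SVC(kernel="rbf", gamma="scale", C=1.0).fit(X_tr, y_tr)

# Shuffle each feature 20 times and record the average drop in test accuracy
result = permutation_importance(clf, X_te, y_te, n_repeats=20, random_state=0)

importance = pd.Series(result.importances_mean, index=FEATURES).sort_values(ascending=False)
print(importance.round(3))
```

In your notebook, the equivalent call would be `permutation_importance(svm_clf, X_test, y_test, n_repeats=20, random_state=0)`, with `feature_cols` as the index.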