---
title: "PA 3.2 — Iteration (Shiqi Wu)"
format:
  html:
    self-contained: true
    toc: false
    code-fold: false
jupyter: python3
---

**Repository:** [GitHub – PA 3.2](https://github.com/shiqiwu212/GSB-S544-01/tree/55e919efa93cf5f0a1cd7f1ee7d32ce049b2fd16/Week%203/Practice%20Activities/Practice%20Activity%203.2)

0. Load the `penguins` dataset from the `palmerpenguins` library, as well as any other libraries you need.



In [5]:
# This cell sets up the minimal environment for PA 3.2 and loads the penguins
# dataset from the palmerpenguins package into a pandas DataFrame.
# Install the dataset package
%pip -q install palmerpenguins

import numpy as np
import pandas as pd
from palmerpenguins import load_penguins

# Load as a DataFrame
penguins = load_penguins().copy()

print("penguins shape:", penguins.shape)
print("columns:", list(penguins.columns))
penguins.head()

Note: you may need to restart the kernel to use updated packages.
penguins shape: (344, 8)
columns: ['species', 'island', 'bill_length_mm', 'bill_depth_mm', 'flipper_length_mm', 'body_mass_g', 'sex', 'year']


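|   | species | island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex | year |
|---|---------|--------|----------------|---------------|-------------------|-------------|-----|------|
| 0 | Adelie | Torgersen | 39.1 | 18.7 | 181.0 | 3750.0 | male | 2007 |
| 1 | Adelie | Torgersen | 39.5 | 17.4 | 186.0 | 3800.0 | female | 2007 |
| 2 | Adelie | Torgersen | 40.3 | 18.0 | 195.0 | 3250.0 | female | 2007 |
| 3 | Adelie | Torgersen | NaN | NaN | NaN | NaN | NaN | 2007 |
| 4 | Adelie | Torgersen | 36.7 | 19.3 | 193.0 | 3450.0 | female | 2007 |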


1. Write a function that takes in information about a penguin, and returns one of the following definitions:

*   "Big Mouth Billy": Male penguins with bill length times bill depth greater than 800.
*   "Dainty Daisy": Female penguins with flipper length less than 5% of body mass.
*   "Average Adelie": Any Adelie penguins that do not fall into either category.
*   "Other": Any penguins that do not fall into any of the categories.
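
As a quick hand check of the two numeric rules, the sketch below applies them to the first two rows of the preview printed in step 0 (the values are copied from that output). The variable names are illustrative only and are not part of the assignment.

```python
# Row 0 from the preview: male, bill 39.1 mm long and 18.7 mm deep
bill_product = 39.1 * 18.7          # about 731.17
is_big_mouth = bill_product > 800   # the product must exceed 800

# Row 1 from the preview: female, flipper 186.0 mm, body mass 3800.0 g
flipper_cutoff = 0.05 * 3800.0      # 5% of body mass, about 190
is_dainty = 186.0 < flipper_cutoff  # the flipper must be below the cutoff

print(is_big_mouth, is_dainty)
```

Row 0 misses the "Big Mouth Billy" threshold and, being an Adelie, would fall through to "Average Adelie"; row 1 meets the "Dainty Daisy" rule.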



In [6]:
# AI Assistance
# Text fields (sex/species) can hold NaN, which is a float, so calling .strip() on
# them directly raises AttributeError. norm_text() below maps None/NaN to "" first.

import numpy as np
import pandas as pd

def classify_penguin(
    bill_length_mm=None,
    bill_depth_mm=None,
    sex=None,
    flipper_length_mm=None,
    body_mass_g=None,
    species=None,
):
    """Return one of the four categories based on the prompt rules."""
    # ---- numeric coercion ----
    def to_float(x):
        try:
            return float(x)
        except (TypeError, ValueError):
            return np.nan

    # ---- text normalization ----
    def norm_text(x):
        # Treat None/NaN as an empty string; otherwise strip whitespace and lowercase
        if x is None or pd.isna(x):
            return ""
        return str(x).strip().lower()

    bl = to_float(bill_length_mm)
    bd = to_float(bill_depth_mm)
    fl = to_float(flipper_length_mm)
    bm = to_float(body_mass_g)

    # Missing sex/species become "" and so never match a category below
    sex_norm = norm_text(sex)
    species_norm = norm_text(species)

    # ---- Big Mouth Billy ----
    prod = bl * bd if (np.isfinite(bl) and np.isfinite(bd)) else np.nan
    if sex_norm == "male" and np.isfinite(prod) and prod > 800:
        return "Big Mouth Billy"

    # ---- Dainty Daisy ----
    threshold = 0.05 * bm if np.isfinite(bm) else np.nan
    if sex_norm == "female" and np.isfinite(fl) and np.isfinite(threshold) and fl < threshold:
        return "Dainty Daisy"

    # ---- Average Adelie ----
    if species_norm == "adelie":
        return "Average Adelie"

    # ---- Other ----
    return "Other"

2. Use an iterable function to create a new variable called `category_name` that adds these labels.

In [7]:
# AI Assistance
# map() walks the six columns in parallel, passing one row's values at a time to
# classify_penguin(). The returned labels are stored in a new column, `category_name`,
# and the counts are then printed in the order the categories are listed for Canvas.

category_labels = list(map(
    lambda bl, bd, sx, fl, bm, sp: classify_penguin(
        bill_length_mm=bl,
        bill_depth_mm=bd,
        sex=sx,
        flipper_length_mm=fl,
        body_mass_g=bm,
        species=sp
    ),
    penguins["bill_length_mm"],
    penguins["bill_depth_mm"],
    penguins["sex"],
    penguins["flipper_length_mm"],
    penguins["body_mass_g"],
    penguins["species"]
))

penguins["category_name"] = category_labels

order = ["Big Mouth Billy", "Dainty Daisy", "Average Adelie", "Other"]
counts = penguins["category_name"].value_counts()
print("Counts for Canvas:")
for name in order:
    print(f"{name:<16} -> {int(counts.get(name, 0))}")
print("Total classified:", int(counts.sum()))

Counts for Canvas:
Big Mouth Billy  -> 71
Dainty Daisy     -> 62
Average Adelie   -> 127
Other            -> 84
Total classified: 344
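
For comparison, the same rules can also be applied to whole columns at once instead of row by row. The sketch below uses `numpy.select`, which is a different technique from the `map()` approach above; `classify_vectorized` and `sample` are hypothetical names, and the five rows are the preview rows from step 0. It assumes the `sex` and `species` columns hold strings (or missing values).

```python
import numpy as np
import pandas as pd

# The five preview rows from step 0 (row 3 has missing measurements and sex).
sample = pd.DataFrame({
    "species": ["Adelie"] * 5,
    "bill_length_mm": [39.1, 39.5, 40.3, np.nan, 36.7],
    "bill_depth_mm": [18.7, 17.4, 18.0, np.nan, 19.3],
    "flipper_length_mm": [181.0, 186.0, 195.0, np.nan, 193.0],
    "body_mass_g": [3750.0, 3800.0, 3250.0, np.nan, 3450.0],
    "sex": ["male", "female", "female", np.nan, "female"],
})

def classify_vectorized(df):
    """Label every row at once (hypothetical helper, not part of the assignment)."""
    sex = df["sex"].str.strip().str.lower()
    species = df["species"].str.strip().str.lower()
    # Comparisons involving NaN evaluate to False, so incomplete rows skip the
    # numeric rules, mirroring the NaN handling in classify_penguin().
    conditions = [
        sex.eq("male") & (df["bill_length_mm"] * df["bill_depth_mm"] > 800),
        sex.eq("female") & (df["flipper_length_mm"] < 0.05 * df["body_mass_g"]),
        species.eq("adelie"),
    ]
    labels = ["Big Mouth Billy", "Dainty Daisy", "Average Adelie"]
    # np.select takes the first matching condition per row, so the list order
    # reproduces the priority of the if-statements.
    return np.select(conditions, labels, default="Other")

sample["category_name"] = classify_vectorized(sample)
print(sample["category_name"].tolist())
```

On these five rows only row 1 is labelled "Dainty Daisy"; the other four, including the row with missing values, fall through to "Average Adelie".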


3. Run the following code to find the counts for each type.

In [8]:
penguins.value_counts("category_name")

category_name
Average Adelie     127
Other               84
Big Mouth Billy     71
Dainty Daisy        62
Name: count, dtype: int64
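
`value_counts()` sorts by frequency, so its order differs from the order in which the categories are listed in step 1. One possible way to line the two up is `Series.reindex`; the sketch below rebuilds the counts from the output above rather than from the full dataset, and `ordered` is an illustrative name.

```python
import pandas as pd

# Counts copied from the value_counts() output above.
counts = pd.Series(
    {"Average Adelie": 127, "Other": 84, "Big Mouth Billy": 71, "Dainty Daisy": 62},
    name="count",
)

# Order in which the categories are listed in step 1.
order = ["Big Mouth Billy", "Dainty Daisy", "Average Adelie", "Other"]

# reindex() reorders by label; fill_value=0 covers a category with no rows.
ordered = counts.reindex(order, fill_value=0)
print(ordered)
print("Total:", int(ordered.sum()))
```

The total matches the 344 rows reported when the dataset was loaded.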