---
title: "Lab 3 — Singing a Song (Shiqi Wu)"
format:
  html:
    self-contained: true   
    toc: false
    code-fold: false
jupyter: python3
---

**Repository:** [GitHub – Lab 3](https://github.com/shiqiwu212/GSB-S544-01/tree/892bf19fb5bc2f29c66e410edb65fc661b1c30a4/Week%203/Lab)



## Step 0 — Setup:

In [126]:
# AI Assistance:
# Uses requests+certifi to load HTTPS CSVs reliably in local environments where 
# the system CA store may be incomplete. 

import io
import requests
import certifi
import pandas as pd

def read_csv_https(url: str) -> pd.DataFrame:
    resp = requests.get(url, verify=certifi.where(), timeout=30)
    resp.raise_for_status()
    return pd.read_csv(io.BytesIO(resp.content))

xmas = read_csv_https(
    "https://www.dropbox.com/scl/fi/qxaslqqp5p08i1650rpc4/xmas.csv?rlkey=erdxi7jbh7pqf9fh4lv4cayp5&dl=1"
)

## Function 1: `pluralize_gift()`:


In [127]:
# AI Assistance:
# Implements noun pluralization: irregulars, consonant+y→ies, sibilants→es, default +s.
# Accepts either a single string or a pandas Series (vectorized via .apply).

def pluralize_gift(gift):
    """Return the plural form of a noun or noun phrase.
       If a pandas Series is provided, return a Series with elementwise pluralization.
    """
    # Vectorized case: if a Series is passed, apply elementwise
    if isinstance(gift, pd.Series):
        return gift.apply(pluralize_gift)

    if not isinstance(gift, str) or gift.strip() == "":
        return gift

    words = gift.strip().split()
    last = words[-1].lower()

    irregular = {
        "goose": "geese",   # day 6
        "lady":  "ladies"   # day 9
    }

    if last in irregular:
        words[-1] = irregular[last]
        return " ".join(words)

    vowels = set("aeiou")
    if last.endswith("y") and len(last) > 1 and last[-2] not in vowels:
        words[-1] = last[:-1] + "ies"
        return " ".join(words)

    if any(last.endswith(suf) for suf in ["s", "sh", "ch", "x", "z"]):
        words[-1] = last + "es"
        return " ".join(words)

    words[-1] = last + "s"
    return " ".join(words)

## Test Function 1:

In [128]:
# Should work
pluralize_gift("goose")

# Will work if your function is vectorized! 
pluralize_gift(xmas['Gift.Item'])

0     partridges
1          doves
2           hens
3          birds
4          rings
5          geese
6          swans
7          maids
8         ladies
9          lords
10        pipers
11      drummers
Name: Gift.Item, dtype: object
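
The rule order in `pluralize_gift()` (irregular forms, then consonant + y, then sibilant endings, then a default `s`) can be illustrated on single words. The following is a minimal sketch, not the lab's function: `pluralize_word` and the one-entry `IRREGULAR` table are hypothetical names introduced here, and the input is assumed to be a single lowercase noun.

```python
# Hypothetical standalone helper; it mirrors the rule order described above.
IRREGULAR = {"goose": "geese"}  # assumed minimal table for this sketch


def pluralize_word(word: str) -> str:
    """Pluralize one lowercase English noun using four ordered rules."""
    if word in IRREGULAR:  # 1) irregular forms first
        return IRREGULAR[word]
    if len(word) > 1 and word.endswith("y") and word[-2] not in "aeiou":
        return word[:-1] + "ies"  # 2) consonant + y -> ies
    if word.endswith(("s", "sh", "ch", "x", "z")):
        return word + "es"  # 3) sibilant endings -> es
    return word + "s"  # 4) default: append s


for w in ["goose", "lady", "day", "box", "ring"]:
    print(w, "->", pluralize_word(w))
```

In this sketch `lady` is already handled by the consonant + y rule, which suggests the `"lady"` entry in the irregular table above acts as a safeguard rather than a necessity.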

## Function 2: `make_phrase()` and create `Full.Phrase`:


In [129]:
# Implementation notes:
# 1) Fill NAs to blanks for verb/adjective/location
# 2) Pluralize item only when num > 1 (calls pluralize_gift)
# 3) Detect vowel start on the head token (adjective if non-blank else item)
# 4) For day 1, choose a/an; otherwise use num_word
# 5) Assemble pieces per row and return a Series of phrases

# AI Assistance:
# Assistance focused on matching the function signature to the lab template, 
# vectorizing over pandas Series, and handling edge cases (missing values and a/an decisions).
# The function prefers `num_word` when it is usable; otherwise it safely falls back to
# deriving cardinal words from `num`.

def make_phrase(
    num: pd.Series,
    num_word: pd.Series,
    item: pd.Series,
    verb: pd.Series,
    adjective: pd.Series,
    location: pd.Series
) -> pd.Series:
    """Return a Series of gift phrases for all rows (vectorized)."""

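    # Coerce inputs to Series if a scalar/list/array sneaks in (keeps tests robust).
    # Checking the type directly also wraps NumPy arrays, which have a dtype but no .fillna/.loc.
    to_series = lambda x: x if isinstance(x, pd.Series) else pd.Series(x)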
    num        = to_series(num)
    num_word   = to_series(num_word)
    item       = to_series(item)
    verb       = to_series(verb)
    adjective  = to_series(adjective)
    location   = to_series(location)

    ## Step 1: Replace NAs with blank strings
    verb = verb.fillna("")
    adjective = adjective.fillna("")
    location = location.fillna("")

    ## Step 2: If the day number is larger than 1, the gift item needs to be pluralized.
    # Uses Function 1 `pluralize_gift`, which returns a Series when given a Series.
    plural_all = pluralize_gift(item)
    n_int = pd.to_numeric(num, errors="coerce").astype("Int64")
    # Treat unparseable day numbers (NA) as "not plural" so the mask is strictly boolean
    mask_plural = (n_int > 1).fillna(False).astype(bool)
    # Keep the singular item where the mask is False; take the plural form otherwise
    item_out = item.where(~mask_plural, plural_all)

    ## Step 3: Figure out if a gift item starts with a vowel (use adjective if present)
    head = adjective.where(adjective.astype(str).str.strip() != "", item_out)
    starts_vowel = head.astype(str).str.strip().str[0].str.lower().isin(list("aeiou"))

    ## Step 4: For the first day, if the gift item starts with a vowel, replace the day with a/an
    # Prefer `num_word` (cardinal or convertible ordinal) else fall back to mapping from `num`.
    CARDINAL = {
        1:"one", 2:"two", 3:"three", 4:"four", 5:"five", 6:"six",
        7:"seven", 8:"eight", 9:"nine", 10:"ten", 11:"eleven", 12:"twelve"
    }
    ORD2CARD = {
        "first":"one", "second":"two", "third":"three", "fourth":"four",
        "fifth":"five", "sixth":"six", "seventh":"seven", "eighth":"eight",
        "ninth":"nine", "tenth":"ten", "eleventh":"eleven", "twelfth":"twelve"
    }

    nw = num_word.astype("string").str.strip().str.lower()
    # Convert ordinals to cardinals when applicable; otherwise keep as-is
    nw_cand = nw.map(ORD2CARD).fillna(nw)
    # Accept if it already matches a valid cardinal word; else fall back to `num`
    valid_cards = pd.Series(CARDINAL.values(), dtype="string")
    is_card = nw_cand.isin(valid_cards)
    number_part = nw_cand.where(is_card, n_int.map(CARDINAL)).astype("string")

    # Day 1 overrides with a/an based on vowel start of the head token
    mask_day1 = (n_int == 1)
    number_part.loc[mask_day1 & starts_vowel] = "an"
    number_part.loc[mask_day1 & ~starts_vowel] = "a"

    ## Step 5: Put all of the pieces together into one string and return!
    parts = pd.concat(
        [
            number_part.str.strip(),
            adjective.astype(str).str.strip(),
            item_out.astype(str).str.strip(),
            verb.astype(str).str.strip(),
            location.astype(str).str.strip(),
        ],
        axis=1,
    )
    phrases = parts.apply(
        lambda row: " ".join([s for s in row if isinstance(s, str) and s.strip() != ""]).strip(),
        axis=1,
    )
    return phrases

## Test Function 2:

In [130]:
# AI Assistance:
# Create the required column Full.Phrase on xmas using make_phrase (vectorized).

# Pick columns (handle slight naming variations)
item_col = xmas["Item"] if "Item" in xmas.columns else xmas["Gift.Item"]
verb_col = xmas["Verb"] if "Verb" in xmas.columns else pd.Series([""] * len(xmas), index=xmas.index)
adj_col  = xmas["Adjective"] if "Adjective" in xmas.columns else pd.Series([""] * len(xmas), index=xmas.index)
loc_col  = xmas["Location"] if "Location" in xmas.columns else pd.Series([""] * len(xmas), index=xmas.index)

# Required: build the Full.Phrase column on xmas
xmas["Full.Phrase"] = make_phrase(
    num       = xmas["Day"],
    num_word  = xmas["Day.in.Words"],
    item      = item_col,
    verb      = verb_col,
    adjective = adj_col,
    location  = loc_col
)

# Quick preview (first 12 rows)
xmas[["Day", "Full.Phrase"]].head(12)

|   | Day | Full.Phrase |
|---|-----|-------------|
| 0 | 1 | a partridge in a pear tree |
| 1 | 2 | two turtle doves |
| 2 | 3 | three french hens |
| 3 | 4 | four calling birds |
| 4 | 5 | five golden rings |
| 5 | 6 | six geese a-laying |
| 6 | 7 | seven swans a-swimming |
| 7 | 8 | eight maids a-milking |
| 8 | 9 | nine ladies dancing |
| 9 | 10 | ten lords a-leaping |
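
The five implementation steps of `make_phrase()` can also be read one row at a time. The following is a minimal scalar sketch under stated assumptions: `phrase_for_row` is a hypothetical name, the item is assumed to be already pluralized by the caller (Step 2), and the example column values are guesses at what the data rows contain.

```python
VOWELS = set("aeiou")


def phrase_for_row(num, num_word, item, verb="", adjective="", location=""):
    """Hypothetical scalar version of the phrase-building steps for one row."""
    # Step 1: treat missing values (None) as blank strings
    verb, adjective, location = (x or "" for x in (verb, adjective, location))
    # Step 3: the head token is the adjective if present, otherwise the item
    head = adjective.strip() or item.strip()
    # Membership in a set (not a string) so an empty head is never a "vowel"
    starts_vowel = head[:1].lower() in VOWELS
    # Step 4: day 1 uses a/an; every other day uses the number word
    if num == 1:
        number_part = "an" if starts_vowel else "a"
    else:
        number_part = num_word
    # Step 5: join the non-blank pieces with single spaces
    pieces = [number_part, adjective, item, verb, location]
    return " ".join(p.strip() for p in pieces if p.strip())


print(phrase_for_row(1, "one", "partridge", location="in a pear tree"))
print(phrase_for_row(1, "one", "email", location="from Cal Poly"))
print(phrase_for_row(6, "six", "geese", verb="a-laying"))
```

Filtering out blank pieces before joining is what prevents double spaces when a row has no adjective or verb.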


## Function 3: `sing_day()`:


In [131]:
# AI Assistance:
# Assistance focused on matching the scaffolded steps, mapping an integer to its
# ordinal word, formatting the special "and" on day 2 and the final period on day 1,
# and returning a single multi-line string. 

def sing_day(dataset: pd.DataFrame, num: int, phrase_col: str) -> str:
    """
    Build and return the stanza for a given day.

    Parameters
    ----------
    dataset : pd.DataFrame
        Data containing a 'Day' column and the phrase column.
    num : int
        The day number to sing.
    phrase_col : str
        Name of the column that holds the per-day gift phrase.

    Returns
    -------
    str
        A multi-line stanza for the specified day.
    """

    # Step 1: Setup the intro line (convert 1 -> "first", etc.)
    ORDINAL = {
        1: "first",  2: "second", 3: "third",   4: "fourth",
        5: "fifth",  6: "sixth",  7: "seventh", 8: "eighth",
        9: "ninth", 10: "tenth", 11: "eleventh",12: "twelfth"
    }
    num_word = ORDINAL.get(int(num), str(num))
    intro = f"On the {num_word} day of Christmas, my true love sent to me:"

    # Step 2: Sing the gift phrases (from num down to 1)
    lines = []
    for d in range(int(num), 0, -1):
        match = dataset.loc[dataset["Day"] == d, phrase_col]
        phrase = "" if match.empty else str(match.iloc[0])

        if num >= 2 and d == 2:
            lines.append(phrase + " and")
        elif d == 1:
            lines.append(phrase + ".")
        else:
            lines.append(phrase)

    # Step 3: Put it all together and return
    return "\n".join([intro] + lines)

## Test Function 3:

In [132]:
# Test Function 3 (pretty print)
print(sing_day(xmas, 3, "Full.Phrase"))

On the third day of Christmas, my true love sent to me:
three french hens
two turtle doves and
a partridge in a pear tree.
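
The countdown logic in this first version of `sing_day()` does not depend on pandas. As a rough illustration, the sketch below reads phrases from a plain dict keyed by day; `sing_day_from_dict` and the three-entry `ORDINAL_SKETCH` table are hypothetical names used only here.

```python
ORDINAL_SKETCH = {1: "first", 2: "second", 3: "third"}  # truncated table for the sketch


def sing_day_from_dict(phrases: dict, num: int) -> str:
    """Hypothetical variant of sing_day that looks phrases up in a dict."""
    # Fall back to the bare number if the day has no ordinal word
    num_word = ORDINAL_SKETCH.get(num, str(num))
    intro = f"On the {num_word} day of Christmas, my true love sent to me:"
    lines = []
    for d in range(num, 0, -1):  # count down from num to 1
        phrase = phrases.get(d, "")
        if d == 2:
            phrase += " and"  # day 2 always precedes the final line
        elif d == 1:
            phrase += "."  # day 1 closes the stanza
        lines.append(phrase)
    return "\n".join([intro] + lines)


phrases = {
    1: "a partridge in a pear tree",
    2: "two turtle doves",
    3: "three french hens",
}
print(sing_day_from_dict(phrases, 3))
```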


## Use Your Functions! — sing the full 12 days on xmas


In [133]:
# AI Assistance:
# Build/format stanzas for all 12 days on xmas, with correct commas/“and”/periods
# and a blank line between stanzas.

# Helpers:
ORDINAL = {1:"first",2:"second",3:"third",4:"fourth",5:"fifth",6:"sixth",
           7:"seventh",8:"eighth",9:"ninth",10:"tenth",11:"eleventh",12:"twelfth"}

def ensure_full_phrase(df: pd.DataFrame, phrase_col: str = "Full.Phrase") -> pd.DataFrame:
    """Create Full.Phrase via make_phrase if it's missing."""
    if phrase_col not in df.columns:
        item = df["Item"] if "Item" in df.columns else df["Gift.Item"]
        verb = df["Verb"] if "Verb" in df.columns else pd.Series([""]*len(df), index=df.index)
        adj  = df["Adjective"] if "Adjective" in df.columns else pd.Series([""]*len(df), index=df.index)
        loc  = df["Location"] if "Location" in df.columns else pd.Series([""]*len(df), index=df.index)
        df[phrase_col] = make_phrase(
            num       = df["Day"],
            num_word  = df["Day.in.Words"],
            item      = item,
            verb      = verb,
            adjective = adj,
            location  = loc
        )
    return df

def stanza(df: pd.DataFrame, day: int, phrase_col: str = "Full.Phrase") -> str:
    """One stanza (intro + lines) with punctuation and 'and' handled."""
    intro = f"On the {ORDINAL[day]} day of Christmas, my true love sent to me:"
    lines = (
        df.loc[df["Day"].between(1, day)]
          .sort_values("Day", ascending=False)[phrase_col]
          .tolist()
    )
    out = []
    for i, txt in enumerate(lines):
        last = len(lines) - 1
        if i == last:            # last line -> period
            out.append(f"{txt}.")
        elif i == last - 1:      # second to last -> comma + and
            out.append(f"{txt}, and")
        else:                    # others -> comma
            out.append(f"{txt},")
    return intro + "\n" + "\n".join(out)

def sing_song(df: pd.DataFrame, phrase_col: str = "Full.Phrase") -> None:
    """Print 12 stanzas with a blank line between stanzas."""
    for d in range(1, 13):
        print(stanza(df, d, phrase_col))
        if d < 12:
            print()

# ---------- run on xmas ----------
xmas = ensure_full_phrase(xmas, "Full.Phrase")
sing_song(xmas, "Full.Phrase")

On the first day of Christmas, my true love sent to me:
a partridge in a pear tree.

On the second day of Christmas, my true love sent to me:
two turtle doves, and
a partridge in a pear tree.

On the third day of Christmas, my true love sent to me:
three french hens,
two turtle doves, and
a partridge in a pear tree.

On the fourth day of Christmas, my true love sent to me:
four calling birds,
three french hens,
two turtle doves, and
a partridge in a pear tree.

On the fifth day of Christmas, my true love sent to me:
five golden rings,
four calling birds,
three french hens,
two turtle doves, and
a partridge in a pear tree.

On the sixth day of Christmas, my true love sent to me:
six geese a-laying,
five golden rings,
four calling birds,
three french hens,
two turtle doves, and
a partridge in a pear tree.

On the seventh day of Christmas, my true love sent to me:
seven swans a-swimming,
six geese a-laying,
five golden rings,
four calling birds,
three french hens,
two turtle doves, and
a 

## Use Your Functions! — load xmas2 and sing again


In [134]:
XMAS2_LINK = "https://www.dropbox.com/scl/fi/p9x9k8xwuzs9rhp582vfy/xmas_2.csv?rlkey=kvc3j3lmyn4opcidsrhcmrof1&dl=1"

# Prefer the Step 0 helper; fall back to pandas' own reader if it has not been defined
try:
    xmas2 = read_csv_https(XMAS2_LINK)
except NameError:
    xmas2 = pd.read_csv(XMAS2_LINK)

xmas2 = ensure_full_phrase(xmas2, "Full.Phrase")
sing_song(xmas2, "Full.Phrase")

On the first day of Christmas, my true love sent to me:
an email from Cal Poly.

On the second day of Christmas, my true love sent to me:
two meal points, and
an email from Cal Poly.

On the third day of Christmas, my true love sent to me:
three lost pens,
two meal points, and
an email from Cal Poly.

On the fourth day of Christmas, my true love sent to me:
four course reviews,
three lost pens,
two meal points, and
an email from Cal Poly.

On the fifth day of Christmas, my true love sent to me:
five practice exams,
four course reviews,
three lost pens,
two meal points, and
an email from Cal Poly.

On the sixth day of Christmas, my true love sent to me:
six graders grading,
five practice exams,
four course reviews,
three lost pens,
two meal points, and
an email from Cal Poly.

On the seventh day of Christmas, my true love sent to me:
seven seniors stressing,
six graders grading,
five practice exams,
four course reviews,
three lost pens,
two meal points, and
an email from Cal Poly.

On t

## Make it nice — Whitespace

In [135]:
# AI Assistance: 
# Normalize spaces to ensure one space between tokens and no leading/trailing spaces.

def normalize_space_series(s: pd.Series) -> pd.Series:
    return s.astype(str).str.replace(r"\s+", " ", regex=True).str.strip()

xmas["Full.Phrase"]  = normalize_space_series(xmas["Full.Phrase"])
xmas2["Full.Phrase"] = normalize_space_series(xmas2["Full.Phrase"])

xmas[["Day", "Full.Phrase"]].head(5)

|   | Day | Full.Phrase |
|---|-----|-------------|
| 0 | 1 | a partridge in a pear tree |
| 1 | 2 | two turtle doves |
| 2 | 3 | three french hens |
| 3 | 4 | four calling birds |
| 4 | 5 | five golden rings |
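
The effect of the `\s+` replacement is easiest to see on strings that actually contain irregular whitespace. The sketch below is a scalar equivalent using the standard-library `re` module rather than the pandas `.str` accessor; `normalize_space` is a hypothetical name and the sample strings are made up for illustration.

```python
import re


def normalize_space(text: str) -> str:
    """Collapse every run of whitespace to one space and trim both ends."""
    return re.sub(r"\s+", " ", text).strip()


samples = ["  two   turtle doves ", "three\tfrench\nhens", "a partridge"]
for s in samples:
    print(repr(normalize_space(s)))
```

Because `\s` matches tabs and newlines as well as spaces, the same call also removes stray line breaks inside a phrase.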


## Make it nice — New lines 

In [136]:
print(sing_day(xmas, 2, "Full.Phrase"))

On the second day of Christmas, my true love sent to me:
two turtle doves and
a partridge in a pear tree.


## Make it nice — Separating lines

In [137]:
print(sing_day(xmas, 2, "Full.Phrase"))
print() 
print(sing_day(xmas, 1, "Full.Phrase"))

On the second day of Christmas, my true love sent to me:
two turtle doves and
a partridge in a pear tree.

On the first day of Christmas, my true love sent to me:
a partridge in a pear tree.
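
An alternative to calling `print()` between stanzas is to collect the stanzas in a list and join them with a blank line. This is a small sketch with placeholder strings standing in for the return values of `sing_day()`; the variable names are assumptions.

```python
# Placeholder stanzas; in the lab these would come from sing_day().
stanzas = [
    "On the second day of Christmas, my true love sent to me:\n"
    "two turtle doves and\n"
    "a partridge in a pear tree.",
    "On the first day of Christmas, my true love sent to me:\n"
    "a partridge in a pear tree.",
]

# "\n\n" ends the previous stanza's last line and adds one empty line
song = "\n\n".join(stanzas)
print(song)
```

Joining places separators only between stanzas, so no trailing blank line follows the last one.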


## Make it nice — Grammar 

In [138]:
# AI Assistance:
# Provided guidance to add the ordinal dictionary and structure `sing_day()`.
# Assisted with punctuation rules (commas on inner lines, ", and" on the penultimate line,
# and a period on the final line) and with printing order from day N down to 1.

ORDINAL = {1:"first",2:"second",3:"third",4:"fourth",5:"fifth",6:"sixth",
           7:"seventh",8:"eighth",9:"ninth",10:"tenth",11:"eleventh",12:"twelfth"}

def sing_day(dataset, num: int, phrase_col: str) -> str:
    intro = f"On the {ORDINAL[num]} day of Christmas, my true love gave to me:"
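    # Sort by Day explicitly so the countdown does not depend on the row order of `dataset`
    lines = (
        dataset.loc[dataset["Day"].between(1, num)]
               .sort_values("Day", ascending=False)[phrase_col]
               .tolist()
    )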
    out = []
    for i, txt in enumerate(lines):
        last = len(lines) - 1
        if i == last:
            out.append(f"{txt}.")
        elif i == last - 1:
            out.append(f"{txt}, and")
        else:
            out.append(f"{txt},")
    return intro + "\n" + "\n".join(out)

In [139]:
stanza_day3 = sing_day(xmas, 3, "Full.Phrase")
print(stanza_day3)

On the third day of Christmas, my true love gave to me:
three french hens,
two turtle doves, and
a partridge in a pear tree.
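
The punctuation rules (commas on inner lines, ", and" on the penultimate line, a period on the final line) depend only on each line's position. The following minimal sketch isolates that logic; `punctuate` is a hypothetical helper, not part of the lab template.

```python
def punctuate(lines):
    """Hypothetical helper: add stanza punctuation based on line position."""
    last = len(lines) - 1
    out = []
    for i, txt in enumerate(lines):
        if i == last:
            out.append(f"{txt}.")  # final line ends the stanza
        elif i == last - 1:
            out.append(f"{txt}, and")  # penultimate line leads into the last
        else:
            out.append(f"{txt},")  # all earlier lines get a comma
    return out


print("\n".join(punctuate(["three french hens", "two turtle doves",
                           "a partridge in a pear tree"])))
print("\n".join(punctuate(["a partridge in a pear tree"])))
```

With a single line, only the period rule applies, so the first day needs no special case.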
