# Pandas I — Introduction to Data Manipulation (DataFrame-focused)

**Course:** Data Science 100 — Spring 2025  
**Author:** Wook-Shin Han (POSTECH)  
**Last generated:** 2025-09-17 00:14

This notebook is a *hands-on* companion to the *Pandas I* lecture.
Run cells one-by-one and observe the outputs. Explanations are in English.
Each topic ends with **Exercises** and an **empty code cell** for you to type and run your answer.


## Setup

- Import pandas and numpy
- Use friendly display options
- Confirm library versions


In [1]:
import pandas as pd
import numpy as np

pd.set_option("display.max_rows", 20)
pd.set_option("display.max_columns", 20)
pd.set_option("display.width", 120)

print("pandas:", pd.__version__)
print("numpy :", np.__version__)

pandas: 2.3.3
numpy : 2.2.6


## 1) DataFrame Anatomy & Diagnostics

A **DataFrame** is a 2D table with labeled rows (index) and columns. Each column is a `Series` with its own dtype.
Key attributes and methods you should know:
- `df.index`, `df.columns`, `df.dtypes`, `df.shape`
- `df.info()` for memory and non-null counts
- `df.select_dtypes(include=...)` to pick numeric, string, category, etc.
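
`select_dtypes` is listed above but not exercised in the cells that follow. The sketch below is a minimal illustration on a small frame that mirrors the city table with default dtypes; the name `cities` is hypothetical and not part of the lecture.

```python
import pandas as pd

# Hypothetical frame mirroring the city table, with default dtypes
cities = pd.DataFrame({
    "city": ["Oslo", "Vienna", "Tokyo"],
    "population": [698_660, 1_911_191, 14_043_239],
    "area": [480.8, 414.8, 2194.1],
})

numeric = cities.select_dtypes(include="number")      # integer and float columns
non_numeric = cities.select_dtypes(exclude="number")  # everything else
floats_only = cities.select_dtypes(include="float64") # one specific dtype

print(list(numeric.columns))
print(list(non_numeric.columns))
print(list(floats_only.columns))
```

Passing `include="number"` is usually the quickest way to isolate the columns that support arithmetic before computing summaries.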


In [2]:
# Example city table with explicit dtypes
df = pd.DataFrame({
    "city":       pd.Series(["Oslo", "Vienna", "Tokyo"], dtype="string"),
    "population": pd.Series([698_660, 1_911_191, 14_043_239], dtype="Int64"),
    "area":       pd.Series([480.8, 414.8, 2194.1],     dtype="float64"),
})
df

|   | city | population | area |
|---|------|-----------:|-----:|
| 0 | Oslo | 698660 | 480.8 |
| 1 | Vienna | 1911191 | 414.8 |
| 2 | Tokyo | 14043239 | 2194.1 |


In [3]:
print("Index:", df.index)
print("Columns:", list(df.columns))
print("\nDtypes:")
print(df.dtypes)
print("\nInfo:")
df.info()  # info() prints its report directly and returns None, so print() is not needed

Index: RangeIndex(start=0, stop=3, step=1)
Columns: ['city', 'population', 'area']

Dtypes:
city          string[python]
population             Int64
area                 float64
dtype: object

Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   city        3 non-null      string 
 1   population  3 non-null      Int64  
 2   area        3 non-null      float64
dtypes: Int64(1), float64(1), string(1)
memory usage: 203.0 bytes


<a id='ex1'></a>
### Exercise — Anatomy: density & index

Use the `df` above and do the following:
1. Create a new column `density = population / area` (people per km²).
2. Set the index to `city`.
3. Show the dtypes and the first 3 rows.


In [10]:
# Type your answer here
df['density'] = df['population'] / df['area']
df = df.set_index('city')  # set_index returns a new DataFrame, so assign the result
df.dtypes, df.head(3)

(population      Int64
 area          float64
 density       Float64
 dtype: object,
         population    area      density
 city
 Oslo        698660   480.8    1453.1198
 Vienna     1911191   414.8       4607.5
 Tokyo     14043239  2194.1  6400.455312)

<details>
<summary><strong>▶ Show solution (click)</strong></summary>

**Solution code:**
```python
df2 = df.copy()
df2["density"] = df2["population"] / df2["area"]
df2 = df2.set_index("city")
display(df2.dtypes)
display(df2.head(3))
```

**Explanation:**
We compute density by vectorized division of two numeric columns.
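Setting the index to `city` replaces the default integer row labels, which enables label-based selection such as `df2.loc["Oslo"]`.
After `set_index`, `city` is the index, so `dtypes` lists only `population` (`Int64`, nullable integer), `area` (`float64`), and `density` (`Float64`, the nullable float produced by dividing an `Int64` column by a float column).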
</details>
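
A common slip in this exercise is calling `set_index` without keeping its result. The sketch below uses a small hypothetical frame (`cities`, not the lecture's `df`) to show that the call returns a new DataFrame and leaves the original untouched.

```python
import pandas as pd

# Hypothetical two-row frame used only for this illustration
cities = pd.DataFrame({"city": ["Oslo", "Vienna"], "area": [480.8, 414.8]})

cities.set_index("city")                # result is discarded; cities is unchanged
columns_after_call = list(cities.columns)

by_city = cities.set_index("city")      # keep the returned frame instead
oslo_area = by_city.loc["Oslo", "area"] # label-based lookup now works

print(columns_after_call)
print(oslo_area)
```

Assigning the result to a new name, as the solution does with `df2`, also keeps the original frame available for later cells.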


## 2) Creating DataFrames

Typical ways to build a DataFrame:
- From **dict** of columns
- From **lists** / list of records
- From one or more **Series**
- From **CSV / files** with `pd.read_csv(...)`

Below we demonstrate each and inspect outcomes.


In [11]:
# From a dictionary of columns
student = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Math": [85, 92, 78],
    "Science": [88, 95, 82]
})
student

|   | Name | Math | Science |
|---|------|-----:|--------:|
| 0 | Alice | 85 | 88 |
| 1 | Bob | 92 | 95 |
| 2 | Charlie | 78 | 82 |


In [12]:
# From a list of rows (one inner list per record) + custom column names
data = [["Alice", 85, 88], ["Bob", 92, 95], ["Charlie", 78, 82]]
student2 = pd.DataFrame(data, columns=["Name", "Math", "Science"])
student2

|   | Name | Math | Science |
|---|------|-----:|--------:|
| 0 | Alice | 85 | 88 |
| 1 | Bob | 92 | 95 |
| 2 | Charlie | 78 | 82 |


In [13]:
# From multiple Series
names = pd.Series(["Alice", "Bob", "Charlie"], name="Name")
math  = pd.Series([85, 92, 78], name="Math")
student3 = pd.DataFrame({"Name": names, "Math": math})
student3

|   | Name | Math |
|---|------|-----:|
| 0 | Alice | 85 |
| 1 | Bob | 92 |
| 2 | Charlie | 78 |


In [22]:
# From CSV (a sample is provided at /mnt/data/elections.csv; this cell reads "elections.csv" from the working directory)
elections = pd.read_csv("elections.csv")
elections.head(5)

|   | Year | Candidate | Party | Popular vote | Result | Percentage |
|---|-----:|-----------|-------|-------------:|--------|-----------:|
| 0 | 1824 | Andrew Jackson | Democratic-Republican | 151271 | loss | 41.4 |
| 1 | 1824 | John Quincy Adams | Democratic-Republican | 113142 | win | 31.0 |
| 2 | 1828 | Andrew Jackson | Democratic | 642806 | win | 56.2 |
| 3 | 2016 | Donald Trump | Republican | 62984828 | win | 46.1 |
| 4 | 2016 | Hillary Clinton | Democratic | 65853514 | loss | 48.2 |


<a id='ex2'></a>
### Exercise — Create & filter from CSV

Read `elections.csv` (provided at `/mnt/data/elections.csv`) and do the following:
1. Keep only rows where `Result == "win"` and `Year >= 2000`.
2. Show the subset with columns `["Year","Candidate","Party"]` sorted by `Year` ascending.
3. Compute the number of wins per `Party` using `value_counts()` or `groupby().size()`.


In [47]:
# Type your answer here
# Keep the filtered rows in a variable; a bare expression is computed and then discarded
wins = elections[(elections['Result'] == "win") & (elections['Year'] >= 2000)]
st = wins[["Year","Candidate","Party"]].sort_values(by='Year', ascending=True)
wins["Party"].value_counts(), wins.groupby("Party").size()

<details>
<summary><strong>▶ Show solution (click)</strong></summary>

**Solution code:**
```python
e = pd.read_csv("/mnt/data/elections.csv")
w = e[(e["Result"] == "win") & (e["Year"] >= 2000)]
subset = w[["Year","Candidate","Party"]].sort_values("Year")
wins_by_party = w["Party"].value_counts()
display(subset)
display(wins_by_party)
```

**Explanation:**
We filter with a boolean mask combining conditions using `&`.
Selecting columns keeps the output focused, and sorting makes the chronology clear.
`value_counts()` quickly yields party win frequencies; `groupby("Party").size()` returns the same counts, ordered by party name rather than by frequency.
</details>
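
The two counting approaches agree on the numbers but not on the order. The sketch below uses made-up party labels (not rows from `elections.csv`) to make the difference visible.

```python
import pandas as pd

# Made-up labels, chosen so that every count is distinct
party = pd.Series(["R", "D", "R", "W", "R", "D"], name="Party")

by_frequency = party.value_counts()       # sorted by count, descending
by_key = party.groupby(party).size()      # sorted by group key

print(by_frequency)
print(by_key)
```

If a fixed order matters for a report, calling `sort_index()` on the `value_counts()` result gives the same ordering as the `groupby` version.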


## 3) Selection — `.loc`, `.iloc`, and `[]`

- Use `.loc[row_labels, col_labels]` for **label-based** selection (inclusive slices).
- Use `.iloc[row_positions, col_positions]` for **position-based** selection (exclusive slices).
- Use `[]` for quick **column** selection, e.g., `df["col"]` (returns a Series).
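
Two details in the list above are easy to miss: single versus double brackets, and inclusive versus exclusive slice ends. The sketch below is a minimal check on a hypothetical three-row frame `g`, which is separate from the `grades` table built in the next cell.

```python
import pandas as pd

# Hypothetical frame used only for this illustration
g = pd.DataFrame(
    {"Math": [85, 92, 78], "Sci": [88, 95, 82]},
    index=["Alice", "Bob", "Charlie"],
)

as_series = g["Math"]          # single brackets -> Series
as_frame = g[["Math"]]         # list of columns -> one-column DataFrame

by_label = g.loc["Alice":"Bob"]  # label slice includes the stop label -> 2 rows
by_position = g.iloc[0:1]        # position slice excludes the stop -> 1 row

print(type(as_series).__name__, type(as_frame).__name__)
print(len(by_label), len(by_position))
```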


In [48]:
grades = pd.DataFrame({
    "Name": ["Alice","Bob","Charlie","Diana"],
    "Math": [85, 92, 78, 96],
    "Sci" : [88, 95, 82, 91],
    "Team": ["A","B","A","B"]
}).set_index("Name")
grades

| Name | Math | Sci | Team |
|------|-----:|----:|------|
| Alice | 85 | 88 | A |
| Bob | 92 | 95 | B |
| Charlie | 78 | 82 | A |
| Diana | 96 | 91 | B |


In [49]:
# Label-based: inclusive slice on index
grades.loc["Alice":"Charlie", ["Math","Sci"]]

| Name | Math | Sci |
|------|-----:|----:|
| Alice | 85 | 88 |
| Bob | 92 | 95 |
| Charlie | 78 | 82 |


In [50]:
# Position-based: exclusive stop (rows 0..1, cols 0..1)
grades.iloc[0:2, 0:2]

| Name | Math | Sci |
|------|-----:|----:|
| Alice | 85 | 88 |
| Bob | 92 | 95 |


In [51]:
# Quick column take
grades[["Math","Sci"]].head(2)

| Name | Math | Sci |
|------|-----:|----:|
| Alice | 85 | 88 |
| Bob | 92 | 95 |


<a id='ex3'></a>
### Exercise — Label vs Position Selection

Using the `grades` DataFrame with `Name` as the index:
1. Select `Bob` and `Diana` rows, columns `["Math","Team"]` (label-based).
2. Select the last 2 rows and the first 2 columns (position-based).
3. Return the subtable where `Sci >= 90` and show only `["Sci","Team"]`.


In [68]:
# Type your answer here
grades.loc[['Bob','Diana'], ["Math","Team"]]
grades.iloc[-2:, :2]  # last 2 rows AND first 2 columns in a single position-based selection
grades.loc[grades['Sci']>=90, ['Sci','Team']]

| Name | Sci | Team |
|------|----:|------|
| Bob | 95 | B |
| Diana | 91 | B |


<details>
<summary><strong>▶ Show solution (click)</strong></summary>

**Solution code:**
```python
# 1) label-based
part1 = grades.loc[["Bob","Diana"], ["Math","Team"]]
# 2) position-based
part2 = grades.iloc[-2:, :2]
# 3) conditional filter
part3 = grades.loc[grades["Sci"] >= 90, ["Sci","Team"]]
display(part1); display(part2); display(part3)
```

**Explanation:**
`.loc` uses index labels and is inclusive for slices; `.iloc` uses integer positions and exclusive slicing.
Boolean masks (`grades["Sci"] >= 90`) filter rows by value conditions; column subsetting focuses the output.
</details>


## 4) Boolean Filtering & Across-Columns Tests

- Build masks with comparisons, combine via `&`, `|`, `~` (use parentheses).
- Use `df.isna().any(axis=1)` / `all(axis=1)` to test across columns per row.
- `Series.isin([...])` supports set membership.
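
The `isin` and `all(axis=1)` forms from the list above do not appear in the worked cell of this section. A minimal sketch follows; the frame `p` and its values are hypothetical and chosen so that one row is missing in both numeric columns.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: Dan is missing both age and score
p = pd.DataFrame({
    "name": ["Ann", "Ben", "Cat", "Dan"],
    "age": [25, np.nan, 31, np.nan],
    "score": [88, 77, np.nan, np.nan],
})

picked = p[p["name"].isin(["Ann", "Dan"])]    # set membership
others = p[~p["name"].isin(["Ann", "Dan"])]   # ~ negates the mask

any_missing = p[["age", "score"]].isna().any(axis=1)  # at least one NaN in the row
all_missing = p[["age", "score"]].isna().all(axis=1)  # every listed column is NaN

print(picked["name"].tolist(), others["name"].tolist())
print(any_missing.tolist(), all_missing.tolist())
```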


In [70]:
people = pd.DataFrame({
    "name": ["Ann","Ben","Cat","Dan","Eve"],
    "age": [25, 19, 31, np.nan, 28],
    "score": [88, 77, 93, 85, np.nan]
})
people

|   | name | age | score |
|---|------|----:|------:|
| 0 | Ann | 25.0 | 88.0 |
| 1 | Ben | 19.0 | 77.0 |
| 2 | Cat | 31.0 | 93.0 |
| 3 | Dan | NaN | 85.0 |
| 4 | Eve | 28.0 | NaN |


In [71]:
mask = (people["age"] >= 21) & (people["score"] >= 85)
adults_good = people.loc[mask, ["name","age","score"]]
has_missing = people.isna().any(axis=1)
adults_good, has_missing

(  name   age  score
 0  Ann  25.0   88.0
 2  Cat  31.0   93.0,
 0    False
 1    False
 2    False
 3     True
 4     True
 dtype: bool)

<a id='ex4'></a>
### Exercise — Boolean masks & missing tests

With the `people` DataFrame above:
1. Keep rows where `age >= 21` **or** `score >= 90`, show `["name","age","score"]`.
2. Add a new boolean column `good = (score >= 85)` (treat missing as `False`).
3. Select rows where **any** column is missing. Then compute the per-column missing counts.


In [90]:
# Type your answer here
people.loc[(people['age']>=21) | (people['score']>=90), ["name","age","score"]]
people['good'] = people['score'].fillna(-np.inf)>=85
people.loc[people.isna().any(axis=1)]
people.isna().sum(axis=0)

name     0
age      1
score    1
good     0
dtype: int64

<details>
<summary><strong>▶ Show solution (click)</strong></summary>

**Solution code:**
```python
p = people.copy()
part1 = p.loc[(p["age"] >= 21) | (p["score"] >= 90), ["name","age","score"]]
p["good"] = (p["score"].fillna(-np.inf) >= 85)
missing_rows = p[p.isna().any(axis=1)]
missing_counts = p.isna().sum()
display(part1); display(p); display(missing_rows); display(missing_counts)
```

**Explanation:**
Use `|` for logical OR, `&` for AND, and wrap each side in parentheses.
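For the `good` column, we fill missing scores with `-inf` so the comparison is explicitly `False` for them. With a `float64` column, `NaN >= 85` is already `False`; the explicit fill states the intent and also covers nullable dtypes, where the comparison would return `<NA>`.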
Row-level missing detection uses `isna().any(axis=1)`; column totals use `isna().sum()`.
</details>
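
How a missing score behaves in a comparison depends on the column's dtype. The sketch below contrasts a `float64` column with a nullable `Int64` column; the series names and values are hypothetical.

```python
import numpy as np
import pandas as pd

# Same hypothetical scores stored with two different dtypes
score_float = pd.Series([88, np.nan, 70], dtype="float64")
score_nullable = pd.Series([88, pd.NA, 70], dtype="Int64")

good_float = score_float >= 85         # NaN compares as False
good_nullable = score_nullable >= 85   # pd.NA propagates: the result is <NA>
good_filled = good_nullable.fillna(False)  # decide explicitly how to treat missing

print(good_float.tolist())
print(good_nullable.isna().tolist())
print(good_filled.tolist())
```

Filling the boolean result with `False` is an alternative to filling the scores with `-inf` before comparing; both make the treatment of missing values explicit.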


## 5) Missing Values — `fillna`, `ffill/bfill`, `interpolate`

- `np.nan` is the NumPy floating NaN; `pd.NA` is pandas' missing sentinel for *nullable* dtypes (`Int64`, `boolean`, `string`).
- `interpolate()` defaults to `method="linear"`, which ignores the index and treats values as equally spaced; `method="time"` weights by elapsed time and requires a `DatetimeIndex`.
- Choose fills deliberately; document assumptions.
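
The choice of interpolation method only matters when the index is unevenly spaced. The sketch below uses a hypothetical series `u` with a three-day gap to show the difference between the default linear method and `method="time"`.

```python
import numpy as np
import pandas as pd

# Hypothetical series: the gap after the missing point is three days long
idx = pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-05"])
u = pd.Series([1.0, np.nan, 9.0], index=idx)

linear = u.interpolate()              # equal spacing assumed: midpoint of 1 and 9
timed = u.interpolate(method="time")  # 1 of 4 days elapsed: 1 + 8 * (1/4)

print(linear.iloc[1], timed.iloc[1])
```

With the evenly spaced daily index used in the next cell, both methods give the same values.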


In [91]:
s = pd.Series([1.0, np.nan, np.nan, 4.0, np.nan], name="s")
t_idx = pd.date_range("2024-01-01", periods=5, freq="D")
t = pd.Series([1.0, np.nan, 4.0, np.nan, 9.0], index=t_idx, name="t")
display(s)
display(t.head())

0    1.0
1    NaN
2    NaN
3    4.0
4    NaN
Name: s, dtype: float64

2024-01-01    1.0
2024-01-02    NaN
2024-01-03    4.0
2024-01-04    NaN
2024-01-05    9.0
Freq: D, Name: t, dtype: float64

In [92]:
display(s.fillna(0))
display(s.ffill())
display(s.bfill(limit=1))
display(t.interpolate(method="time"))

0    1.0
1    0.0
2    0.0
3    4.0
4    0.0
Name: s, dtype: float64

0    1.0
1    1.0
2    1.0
3    4.0
4    4.0
Name: s, dtype: float64

0    1.0
1    NaN
2    4.0
3    4.0
4    NaN
Name: s, dtype: float64

2024-01-01    1.0
2024-01-02    2.5
2024-01-03    4.0
2024-01-04    6.5
2024-01-05    9.0
Freq: D, Name: t, dtype: float64

<a id='ex5'></a>
### Exercise — Fill strategies & residual NaNs

Create a DataFrame:
```
df = pd.DataFrame({
    "a": [1, np.nan, 3, np.nan],
    "b": [np.nan, 2.5, np.nan, 5.0],
})
```
Tasks:
1. Count the missing values per column and overall.
2. Fill `a` with its **median** and `b` with **forward-fill**.
3. Create `c = a + b` and explain any `NaN` that remains.


In [102]:
# Type your answer here
df = pd.DataFrame({
    "a": [1, np.nan, 3, np.nan],
    "b": [np.nan, 2.5, np.nan, 5.0],
})
df.isna().sum()
df['a'] = df['a'].fillna(df['a'].median())
df['b'] = df['b'].ffill()
df['c'] = df['a'] + df['b']
df

|   | a | b | c |
|---|--:|--:|--:|
| 0 | 1.0 | NaN | NaN |
| 1 | 2.0 | 2.5 | 4.5 |
| 2 | 3.0 | 2.5 | 5.5 |
| 3 | 2.0 | 5.0 | 7.0 |


<details>
<summary><strong>▶ Show solution (click)</strong></summary>

**Solution code:**
```python
df = pd.DataFrame({
    "a": [1, np.nan, 3, np.nan],
    "b": [np.nan, 2.5, np.nan, 5.0],
})
miss_cols = df.isna().sum()
miss_total = int(df.isna().sum().sum())
df2 = df.copy()
df2["a"] = df2["a"].fillna(df2["a"].median())
df2["b"] = df2["b"].ffill()
df2["c"] = df2["a"] + df2["b"]
display(miss_cols, miss_total)
display(df2)
```

**Explanation:**
Column `a` uses the column median for a robust fill; `b` uses forward-fill to propagate last seen values.
The remaining `NaN` in `c` is at row 0: `b` starts with a missing value, forward-fill has no earlier value to propagate, and a `NaN` in either operand makes the sum `NaN`.
</details>
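
Forward-fill cannot repair a value that is missing at the very start of a column. One possible remedy, shown here as a sketch rather than as part of the exercise, is to chain a backward-fill after the forward-fill; whether that is appropriate depends on the data.

```python
import numpy as np
import pandas as pd

a = pd.Series([1, np.nan, 3, np.nan])
b = pd.Series([np.nan, 2.5, np.nan, 5.0])

forward = b.ffill()          # the leading NaN stays: nothing earlier to copy
both = b.ffill().bfill()     # the leading gap takes the next valid value

a_filled = a.fillna(a.median())  # median of [1, 3] is 2.0
c = a_filled + both              # no NaN left in either operand

print(forward.isna().tolist())
print(c.tolist())
```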


## 6) Alignment in Arithmetic

Binary operations align on **labels**. Non-overlapping labels produce `NaN` unless you use method forms with `fill_value`.


In [103]:
s1 = pd.Series({"a": 1.0, "b": 2.0})
s2 = pd.Series({"b": 10.0, "c": 100.0})
display(s1 + s2)
display(s1.add(s2, fill_value=0))

a     NaN
b    12.0
c     NaN
dtype: float64

a      1.0
b     12.0
c    100.0
dtype: float64

<a id='ex6'></a>
### Exercise — Alignment & fill_value

Given:
```
x = pd.Series({"k1": 10.0, "k2": 20.0})
y = pd.Series({"k2": 1.5, "k3": 2.0})
```
Tasks:
1. Compute `x + y` and explain the `NaN`s.
2. Recompute with `.add(..., fill_value=0)` and compare.
3. Convert the result to integers: fill NaNs with 0, round, and cast to `Int64` (the cast rejects non-integer floats such as `21.5`).


In [112]:
# Type your answer here
x = pd.Series({"k1": 10.0, "k2": 20.0})
y = pd.Series({"k2": 1.5, "k3": 2.0})
x.add(y).fillna(0).astype(int)

k1     0
k2    21
k3     0
dtype: int64

<details>
<summary><strong>▶ Show solution (click)</strong></summary>

**Solution code:**
```python
x = pd.Series({"k1": 10.0, "k2": 20.0})
y = pd.Series({"k2": 1.5, "k3": 2.0})
r1 = x + y
r2 = x.add(y, fill_value=0)
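r3 = r2.fillna(0).round().astype("Int64")  # round first: Int64 refuses floats with a fractional part
display(r1); display(r2); display(r3)
```

**Explanation:**
`x + y` produces a union of keys; entries without counterparts become `NaN`.
Method form with `fill_value=0` treats missing as zero during the operation.
Finally, we fill any remaining `NaN`, round, and cast to nullable integer `Int64`. Rounding is required because `k2` is `21.5`, and the `Int64` cast raises an error for floats that are not whole numbers.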
</details>
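
The cast to `Int64` is stricter than a plain `astype(int)`, which silently truncates. The sketch below shows the failure and the rounding workaround; the series `r` is a hypothetical copy of the step-2 result, and the exact exception type is caught broadly because it may differ between pandas versions.

```python
import pandas as pd

# Hypothetical copy of x.add(y, fill_value=0)
r = pd.Series({"k1": 10.0, "k2": 21.5, "k3": 2.0})

try:
    r.astype("Int64")          # 21.5 cannot be represented as an integer
    direct_cast_ok = True
except (TypeError, ValueError):
    direct_cast_ok = False

# 21.5 rounds to 22 because NumPy rounds halves to the nearest even integer
as_int = r.round().astype("Int64")

print(direct_cast_ok)
print(as_int.tolist())
```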


## 7) Sorting, Ranking, Top-k, Quantiles

- `df.sort_values(by=[...])` for ordering by columns
- `df.nlargest(k, "col")`/`nsmallest` for quick top/bottom by a column
- `df.rank(method="dense", ascending=False)` for ranks with ties
- `df.quantile([0.25, 0.5, 0.75])` for quartiles
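
`nlargest` and `nsmallest` are listed above but the cells below rely on `sort_values(...).head(k)`. The sketch below rebuilds the same `scores` values and shows the shortcut form, with a list of columns acting as the tie-breaker order.

```python
import pandas as pd

scores = pd.DataFrame({
    "Name": ["Ann", "Ben", "Cat", "Dan", "Eve"],
    "Math": [81, 95, 95, 77, 89],
    "Sci":  [79, 92, 91, 85, 83],
})
scores["Total"] = scores["Math"] + scores["Sci"]

top3 = scores.nlargest(3, ["Total", "Math"])  # ties on Total fall back to Math
bottom2 = scores.nsmallest(2, "Total")

print(top3["Name"].tolist())
print(bottom2["Name"].tolist())
```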


In [113]:
scores = pd.DataFrame({
    "Name": ["Ann","Ben","Cat","Dan","Eve"],
    "Math": [81, 95, 95, 77, 89],
    "Sci":  [79, 92, 91, 85, 83],
})
scores["Total"] = scores["Math"] + scores["Sci"]
scores

|   | Name | Math | Sci | Total |
|---|------|-----:|----:|------:|
| 0 | Ann | 81 | 79 | 160 |
| 1 | Ben | 95 | 92 | 187 |
| 2 | Cat | 95 | 91 | 186 |
| 3 | Dan | 77 | 85 | 162 |
| 4 | Eve | 89 | 83 | 172 |


In [114]:
scores.sort_values(by=["Total","Math"], ascending=[False, False]).head(3)  # tie-breaker by Math

|   | Name | Math | Sci | Total |
|---|------|-----:|----:|------:|
| 1 | Ben | 95 | 92 | 187 |
| 2 | Cat | 95 | 91 | 186 |
| 4 | Eve | 89 | 83 | 172 |


<a id='ex7'></a>
### Exercise — Top-k & ranks

Using `scores` (with a `Total` column):
1. Get the **top-3** rows by `Total`. If ties occur, break them by `Math` descending.
2. Add a column `Rank = dense rank of Total (descending)`.
3. Compute the quartiles (25%, 50%, 75%) of `Total`.


In [122]:
# Type your answer here
scores.sort_values(by=['Total','Math'], ascending=[False, False]).head(3)
scores['Rank'] = scores['Total'].rank(method='dense', ascending=False)  # column name as requested in the task
scores['Total'].quantile([0.25,0.5,0.75])

0.25    162.0
0.50    172.0
0.75    186.0
Name: Total, dtype: float64

<details>
<summary><strong>▶ Show solution (click)</strong></summary>

**Solution code:**
```python
top3 = scores.sort_values(by=["Total","Math"], ascending=[False, False]).head(3)
scores["Rank"] = scores["Total"].rank(method="dense", ascending=False)
quart = scores["Total"].quantile([0.25, 0.5, 0.75])
display(top3); display(scores); display(quart)
```

**Explanation:**
We sort by multiple keys to break ties deterministically.
`dense` ranking gives tied values the same rank and leaves no gap afterwards (for example 1, 2, 2, 3).
Quantiles summarize distribution; the median is the 0.5 quantile.
</details>
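
The `scores` table has no tied totals, so the effect of the ranking method is not visible there. The sketch below uses a made-up series with one tie to compare three `method` options side by side.

```python
import pandas as pd

# Made-up totals with a tie between q and r
total = pd.Series([90, 85, 85, 70], index=["p", "q", "r", "s"])

ranks = pd.DataFrame({
    "dense": total.rank(method="dense", ascending=False),      # 1, 2, 2, 3
    "min": total.rank(method="min", ascending=False),          # 1, 2, 2, 4
    "average": total.rank(method="average", ascending=False),  # 1, 2.5, 2.5, 4
})
print(ranks)
```

`dense` is usually the natural choice for leaderboards, while `min` matches the "joint second, then fourth" convention used in sports.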


## 8) Windows — Rolling, Expanding, EWM

- `rolling(w, min_periods)` computes statistics over the last `w` rows
- `expanding()` uses a growing window from the start
- `ewm(alpha, adjust=False)` applies exponentially decaying weights


In [123]:
s = pd.Series([5, 2, 7, 4, 9], index=list("abcde"))
roll3 = s.rolling(3, min_periods=1).mean()
expd  = s.expanding().sum()
expo  = s.ewm(alpha=0.5, adjust=False).mean()
display(roll3.round(4)); display(expd); display(expo.round(4))

a    5.0000
b    3.5000
c    4.6667
d    4.3333
e    6.6667
dtype: float64

a     5.0
b     7.0
c    14.0
d    18.0
e    27.0
dtype: float64

a    5.0000
b    3.5000
c    5.2500
d    4.6250
e    6.8125
dtype: float64

<a id='ex8'></a>
### Exercise — Windowed stats

For `s = [5,2,7,4,9]` (index a..e):
1. Compute a 3-point rolling **mean** with `min_periods=2` and round to 4 decimals.
2. Compute the **expanding mean**.
3. Compute `ewm(alpha=0.3, adjust=False).mean()` and explain how `alpha` affects smoothing.


In [132]:
# Type your answer here
s.rolling(3, min_periods=2).mean().round(4)
s.expanding().mean().round(4)
s.ewm(alpha=0.3, adjust=False).mean().round(4)

a    5.0000
b    4.1000
c    4.9700
d    4.6790
e    5.9753
dtype: float64

<details>
<summary><strong>▶ Show solution (click)</strong></summary>

**Solution code:**
```python
s = pd.Series([5, 2, 7, 4, 9], index=list("abcde"))
r = s.rolling(3, min_periods=2).mean().round(4)
e = s.expanding().mean().round(4)
w = s.ewm(alpha=0.3, adjust=False).mean().round(4)
display(r); display(e); display(w)
```

**Explanation:**
`min_periods=2` requires at least 2 observations in the window before producing a value, so only the first entry is `NaN` here.
`expanding().mean()` averages from the start to each step.
For EWM, smaller `alpha` places more weight on the past (heavier smoothing); larger `alpha` reacts more to recent values.
</details>
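
To see where the smoothing comes from, the `adjust=False` form of EWM can be written as a recursion: the first output equals the first input, and each later output is `(1 - alpha) * previous_output + alpha * current_input`. The sketch below reproduces the `alpha=0.5` column from the example cell by hand; the variable names are illustrative.

```python
import pandas as pd

s = pd.Series([5, 2, 7, 4, 9], index=list("abcde"))
alpha = 0.5

# Manual recursion for adjust=False
smoothed = [float(s.iloc[0])]
for x in s.iloc[1:]:
    smoothed.append((1 - alpha) * smoothed[-1] + alpha * x)
manual = pd.Series(smoothed, index=s.index)

builtin = s.ewm(alpha=alpha, adjust=False).mean()

print(manual.tolist())
print(builtin.tolist())
```

Because each step keeps a fraction `1 - alpha` of the previous output, a smaller `alpha` lets old values linger longer, which is the heavier smoothing described above.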


## 9) GroupBy Basics — Sum, Count, Mean

- `df.groupby(keys).agg({...})` for grouped aggregations
- Use `transform` to broadcast per-group stats back to rows (same shape as original)
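
The practical difference between `agg` and `transform` is the shape of the result. The sketch below rebuilds the `sales` values from this section and uses named aggregation; the output column names `total` and `mean` are arbitrary choices.

```python
import pandas as pd

sales = pd.DataFrame({
    "store": ["A", "A", "A", "B", "B", "B"],
    "day":   ["Mon", "Tue", "Wed", "Mon", "Tue", "Wed"],
    "revenue": [100, 120, 130, 80, 70, 150],
})

# agg: one row per group
summary = sales.groupby("store").agg(
    total=("revenue", "sum"),
    mean=("revenue", "mean"),
)

# transform: one value per original row, aligned to the original index
store_total = sales.groupby("store")["revenue"].transform("sum")
share = sales["revenue"] / store_total  # each row's share of its store's revenue

print(summary)
print(share.round(3).tolist())
```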


In [134]:
sales = pd.DataFrame({
    "store": ["A","A","A","B","B","B"],
    "day":   ["Mon","Tue","Wed","Mon","Tue","Wed"],
    "revenue": [100, 120, 130, 80, 70, 150]
})
sales

|   | store | day | revenue |
|---|-------|-----|--------:|
| 0 | A | Mon | 100 |
| 1 | A | Tue | 120 |
| 2 | A | Wed | 130 |
| 3 | B | Mon | 80 |
| 4 | B | Tue | 70 |
| 5 | B | Wed | 150 |


In [135]:
total_by_store = sales.groupby("store")["revenue"].sum()
mean_by_store  = sales.groupby("store")["revenue"].mean().round(2)
display(total_by_store, mean_by_store)

store
A    350
B    300
Name: revenue, dtype: int64

store
A    116.67
B    100.00
Name: revenue, dtype: float64

<a id='ex9'></a>
### Exercise — Per-store normalization & pivot

With `sales` above:
1. Create a column `rev_norm` = revenue **divided by the store's mean revenue** (hint: `transform("mean")`).
2. Get a pivot table with rows=store, columns=day, values=revenue (use `pivot`).
3. For each store, compute the day with **max revenue** (return a Series mapping store -> day).


In [148]:
# Type your answer here
s2 = sales.copy()
s2["rev_norm"] = s2["revenue"] / s2.groupby(by=['store'])['revenue'].transform('mean')
piv = s2.pivot(index='store',columns='day',values='revenue')
argmax_day = s2.loc[s2.groupby('store')['revenue'].idxmax()].set_index("store")["day"]
argmax_day

store
A    Wed
B    Wed
Name: day, dtype: object

In [190]:
s4 = sales.copy()
s5 = s4.groupby('store').agg(temp=('revenue', 'mean'))
s4 = s4.merge(s5, on='store', how='left')
s4['rev_norm'] = s4['revenue'] / s4['temp']
s4 = s4.drop('temp', axis=1)
s4

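|   | store | day | revenue | rev_norm |
|---|-------|-----|--------:|---------:|
| 0 | A | Mon | 100 | 0.857143 |
| 1 | A | Tue | 120 | 1.028571 |
| 2 | A | Wed | 130 | 1.114286 |
| 3 | B | Mon | 80 | 0.8 |
| 4 | B | Tue | 70 | 0.7 |
| 5 | B | Wed | 150 | 1.5 |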


<details>
<summary><strong>▶ Show solution (click)</strong></summary>

**Solution code:**
```python
s2 = sales.copy()
s2["rev_norm"] = s2["revenue"] / s2.groupby("store")["revenue"].transform("mean")
piv = s2.pivot(index="store", columns="day", values="revenue")
argmax_day = s2.loc[s2.groupby("store")["revenue"].idxmax()].set_index("store")["day"]
display(s2, piv, argmax_day)
```

**Explanation:**
`transform` returns a Series aligned to the original shape, enabling row-wise normalization.
`pivot` reshapes long -> wide (one value per (store,day)).
`idxmax` returns the row label of the maximum within each group; selecting those rows with `.loc` and setting `store` as the index yields store -> day.
</details>
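
`pivot` works here because each (store, day) pair occurs exactly once. The sketch below uses a made-up frame with a duplicated pair to show what happens otherwise, and how `pivot_table` with an aggregation function handles the same data.

```python
import pandas as pd

# Made-up data: store A has two Monday rows
dup = pd.DataFrame({
    "store": ["A", "A", "B"],
    "day": ["Mon", "Mon", "Mon"],
    "revenue": [100, 20, 80],
})

try:
    dup.pivot(index="store", columns="day", values="revenue")
    pivot_failed = False
except ValueError:
    pivot_failed = True   # duplicate (store, day) entries cannot be reshaped

# pivot_table resolves duplicates by aggregating them
wide = dup.pivot_table(index="store", columns="day", values="revenue", aggfunc="sum")

print(pivot_failed)
print(wide)
```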


---
## Wrap-up

- You practiced DataFrame anatomy, creation, selection, filtering, missing-value handling, alignment, sorting/ranking, windows, and GroupBy.
- Each section included runnable examples and exercises.
- Keep this notebook handy as a template for future labs.
