<img src="./intro_images/logo.png" width="100%" align="left" />

<table style="float:right;">
    <tr>
        <td>                      
            <div style="text-align: right">Dr Ali Sarrami Foroushani</div>
            <div style="text-align: right">Lecturer in Cardiovascular Biomechanics</div>
            <div style="text-align: right">School of Health Sciences</div>
            <div style="text-align: right">University of Manchester</div>
         </td>
     </tr>
</table>

# Python Visualisation - Part II

This notebook teaches plotting with **Seaborn** using only **tiny, in-notebook data** (lists and dictionaries).

### What you'll learn
- Create a small dataset with lists/dicts and convert to a tiny DataFrame
- Core Seaborn plots: **countplot**, **barplot**, **box/violin**, **scatterplot (with hue)**, **lineplot**, **histplot/KDE**
- Add titles, labels, palettes, and simple styling
- Save a figure

Each section explains the idea, shows a tiny example, then gives you a **YOUR TURN** task.

## 0) Setup
We import Seaborn (and friends), set a clean theme, and slightly increase font sizes for readability.

In [1]:
%matplotlib inline
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd

sns.set_theme(context="notebook", style="whitegrid")
plt.rcParams.update({
    'figure.dpi': 120,
    'font.size': 12
})



## 1) Tiny datasets right inside Python
Seaborn works best with **DataFrames**. We'll create a few **small lists/dicts**, then convert them to tiny DataFrames.

In [None]:
# --- Tiny student dataset ---
names   = ["Alice","Bob","Cara","Dan","Eli","Fay"]
gender  = ["F","M","F","M","M","F"]
math    = [62, 55, 78, 71, 66, 85]
bio     = [58, 60, 80, 68, 64, 88]
study_h = [2, 1, 3, 2, 2, 4]  # study hours per day

df_students = pd.DataFrame({
    "name": names,
    "gender": gender,
    "math": math,
    "bio": bio,
    "study_hours": study_h
})
df_students

In [3]:
# --- Tiny categorical dataset (snack counts) ---
snack_counts = {"Apples": 12, "Bananas": 7, "Carrots": 5, "Dates": 3}
df_snacks = pd.DataFrame({"snack": list(snack_counts.keys()), "count": list(snack_counts.values())})

# --- Tiny time-like dataset (line plot) ---
days  = ["Mon","Tue","Wed","Thu","Fri"]
steps = [3000, 4500, 4000, 5000, 6000]
df_daily = pd.DataFrame({"day": days, "steps": steps})

# --- Tiny numeric list for distributions ---
heights_cm = [150, 152, 153, 155, 156, 158, 160, 162, 163, 165, 168, 170, 172, 175, 178]
df_heights = pd.DataFrame({"height_cm": heights_cm})

df_snacks, df_daily.head(), df_heights.head()

(     snack  count
 0   Apples     12
 1  Bananas      7
 2  Carrots      5
 3    Dates      3,
    day  steps
 0  Mon   3000
 1  Tue   4500
 2  Wed   4000
 3  Thu   5000
 4  Fri   6000,
    height_cm
 0        150
 1        152
 2        153
 3        155
 4        156)
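If your data arrives as one record per person rather than one list per column, the same kind of table can be built from a list of dictionaries. This is a minimal sketch; the `records` list and the name `df_records` are hypothetical, and simply rearrange the first three students from above.

```python
import pandas as pd

# Hypothetical "one dict per row" version of the first three students
records = [
    {"name": "Alice", "gender": "F", "math": 62, "bio": 58, "study_hours": 2},
    {"name": "Bob",   "gender": "M", "math": 55, "bio": 60, "study_hours": 1},
    {"name": "Cara",  "gender": "F", "math": 78, "bio": 80, "study_hours": 3},
]

# Each dictionary becomes a row; the keys become the column names
df_records = pd.DataFrame(records)

print(df_records.shape)
print(list(df_records.columns))
```

Either layout ends up as the same kind of DataFrame, so the Seaborn calls in the rest of the notebook do not change.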

---
## 2) Countplot - counts of a category
A **countplot** shows how many rows belong to each category. Great for checking class balance or simple frequencies.

**Idea:** `sns.countplot(data=..., x='column')`

In [None]:
ax = sns.countplot(data=df_students, x="gender")
ax.set_title("Count by Gender")
ax.set_xlabel("Gender")
ax.set_ylabel("Count")
plt.show()
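To check the bar heights numerically, you can tally the same column with pandas. A small sketch, assuming the gender list from the student dataset; `df_gender` is a hypothetical name used only here.

```python
import pandas as pd

gender = ["F", "M", "F", "M", "M", "F"]
df_gender = pd.DataFrame({"gender": gender})

# value_counts() tallies rows per category: the same numbers countplot draws as bars
counts = df_gender["gender"].value_counts()
print(counts)
```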

### ✅ YOUR TURN 1 - Countplot for snacks
Use `df_snacks` to draw a bar chart of the snack totals. (Tip: `countplot` counts rows, one row per item, but `df_snacks` already stores one total per snack. Use **barplot** with `x='snack', y='count'`, or expand the totals into one row per item yourself.)

In [None]:
# Your code here

In [None]:
# Solution (we'll use barplot because we already have counts per category)
ax = sns.barplot(data=df_snacks, x="snack", y="count")
ax.set_title("Snack Counts")
ax.set_xlabel("Snack")
ax.set_ylabel("Count")
plt.xticks(rotation=15)
plt.show()
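The other route mentioned in the tip, expanding the totals into one row per item, might look like the sketch below. The names `expanded_index` and `df_expanded` are hypothetical; the approach assumes the counts are whole numbers.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

snack_counts = {"Apples": 12, "Bananas": 7, "Carrots": 5, "Dates": 3}
df_snacks = pd.DataFrame({"snack": list(snack_counts.keys()),
                          "count": list(snack_counts.values())})

# Repeat each row label "count" times: Apples appears 12 times, Bananas 7 times, ...
expanded_index = df_snacks.index.repeat(df_snacks["count"])
df_expanded = df_snacks.loc[expanded_index, ["snack"]].reset_index(drop=True)

print(len(df_expanded))  # 12 + 7 + 5 + 3 rows

# countplot now recovers the original totals by counting rows
ax = sns.countplot(data=df_expanded, x="snack")
ax.set_title("Snack Counts (from expanded rows)")
plt.show()
```

For data that is already summarised, `barplot` is the shorter option; expanding rows is mainly useful for seeing what `countplot` expects as input.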

---
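## 3) Barplot - average (or other aggregate) by group
A **barplot** shows a statistic (by default, the **mean**) of a numeric variable for each category. The thin line on top of each bar is an error bar, by default a 95% confidence interval for that statistic.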

**Idea:** `sns.barplot(data=..., x='group', y='value')`

In [None]:
ax = sns.barplot(data=df_students, x="gender", y="math")
ax.set_title("Average Math Score by Gender")
ax.set_xlabel("Gender")
ax.set_ylabel("Average Math")
plt.show()
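The bar heights can be reproduced with a pandas `groupby`, which is a handy sanity check. A sketch using only the gender and math columns; `df_scores` is a hypothetical name. The median is included because `barplot` accepts a different `estimator` (for example `estimator=np.median`) if the mean is not what you want.

```python
import pandas as pd

df_scores = pd.DataFrame({
    "gender": ["F", "M", "F", "M", "M", "F"],
    "math":   [62, 55, 78, 71, 66, 85],
})

# Mean per group: the default bar heights in sns.barplot
group_means = df_scores.groupby("gender")["math"].mean()
print(group_means)

# Median per group: what the bars would show with a median estimator
group_medians = df_scores.groupby("gender")["math"].median()
print(group_medians)
```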

### ✅ YOUR TURN 2 - Another barplot
Plot the **average biology score** by gender using `df_students`.

In [None]:
# Your code here

In [None]:
# Solution
ax = sns.barplot(data=df_students, x="gender", y="bio")
ax.set_title("Average Biology Score by Gender")
ax.set_xlabel("Gender")
ax.set_ylabel("Average Biology")
plt.show()

---
## 4) Boxplot & Violin - distribution by group
Both plots show how a numeric variable is distributed within each category. A **boxplot** summarises it with the median, quartiles, and whiskers; a **violin** plot adds a smoothed density shape.

**Ideas:**
- `sns.boxplot(data=..., x='group', y='value')`
- `sns.violinplot(data=..., x='group', y='value', inner='quartile')`

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(9, 3.5))
sns.boxplot(data=df_students, x="gender", y="math", ax=axes[0])
axes[0].set_title("Math by Gender - Boxplot")

sns.violinplot(data=df_students, x="gender", y="math", inner="quartile", ax=axes[1])
axes[1].set_title("Math by Gender - Violin")

plt.tight_layout()
plt.show()
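The box in a boxplot spans the first to the third quartile, with a line at the median. The sketch below computes those three numbers per group with pandas so you can compare them with the plot; `df_scores` and `quartiles` are hypothetical names. With only three values per group, treat the result as an illustration rather than a meaningful summary.

```python
import pandas as pd

df_scores = pd.DataFrame({
    "gender": ["F", "M", "F", "M", "M", "F"],
    "math":   [62, 55, 78, 71, 66, 85],
})

# 25%, 50% (median) and 75% points per group, one row per gender
quartiles = (
    df_scores.groupby("gender")["math"]
    .quantile([0.25, 0.5, 0.75])
    .unstack()
)
print(quartiles)
```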

### ✅ YOUR TURN 3 - Boxplot for Biology
Draw a **boxplot** of biology scores by gender using `df_students`.

In [None]:
# Your code here

In [None]:
# Solution
ax = sns.boxplot(data=df_students, x="gender", y="bio")
ax.set_title("Biology by Gender - Boxplot")
plt.show()

---
## 5) Scatterplot - relationship between two numbers
A **scatterplot** draws one point per row to show how two numeric variables relate. Add **hue** to color the points by category (e.g., gender), which helps you see patterns by group.

**Idea:** `sns.scatterplot(data=..., x='X', y='Y', hue='group')`

In [None]:
ax = sns.scatterplot(data=df_students, x="study_hours", y="math", hue="gender")
ax.set_title("Study Hours vs Math (colored by Gender)")
ax.set_xlabel("Study Hours per Day")
ax.set_ylabel("Math Score")
plt.grid(True)
plt.show()
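If you want a single number to go with the picture, a Pearson correlation is one option. This is an optional sketch, not part of the plotting workflow itself; `df_scores` and `r_all` are hypothetical names, and with six students the value is only a rough guide.

```python
import pandas as pd

df_scores = pd.DataFrame({
    "gender":      ["F", "M", "F", "M", "M", "F"],
    "study_hours": [2, 1, 3, 2, 2, 4],
    "math":        [62, 55, 78, 71, 66, 85],
})

# Pearson correlation across all students (close to +1 means a strong upward trend)
r_all = df_scores["study_hours"].corr(df_scores["math"])
print(round(r_all, 2))

# The same calculation within each hue group
for label, group in df_scores.groupby("gender"):
    print(label, round(group["study_hours"].corr(group["math"]), 2))
```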

### ✅ YOUR TURN 4 - Change variables
Make a scatterplot of **study_hours vs biology** colored by gender. Give it a title and axis labels.

In [None]:
# Your code here

In [None]:
# Solution
ax = sns.scatterplot(data=df_students, x="study_hours", y="bio", hue="gender")
ax.set_title("Study Hours vs Biology (colored by Gender)")
ax.set_xlabel("Study Hours per Day")
ax.set_ylabel("Biology Score")
plt.grid(True)
plt.show()

---
## 6) Lineplot - simple trend over an order
Use **lineplot** to show how a value changes across an ordered x variable (like days of the week). You can add markers so each data point stands out.

**Idea:** `sns.lineplot(data=..., x='x', y='y', marker='o')`

In [None]:
ax = sns.lineplot(data=df_daily, x="day", y="steps", marker="o")
ax.set_title("Steps by Day")
ax.set_xlabel("Day")
ax.set_ylabel("Steps")
plt.grid(True)
plt.show()
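The example above works because the rows are already in weekday order. If your rows arrive in a different order, one way to restore it is an ordered categorical column. A sketch with a hypothetical shuffled copy of the step data (`df_shuffled`, `day_order`, and `df_sorted` are illustrative names):

```python
import pandas as pd

day_order = ["Mon", "Tue", "Wed", "Thu", "Fri"]

# Hypothetical shuffled version of the step data
df_shuffled = pd.DataFrame({
    "day":   ["Wed", "Mon", "Fri", "Tue", "Thu"],
    "steps": [4000, 3000, 6000, 4500, 5000],
})

# An ordered categorical tells pandas the intended order of the labels
df_shuffled["day"] = pd.Categorical(df_shuffled["day"],
                                    categories=day_order, ordered=True)

# Sorting now follows weekday order instead of alphabetical order
df_sorted = df_shuffled.sort_values("day").reset_index(drop=True)
print(df_sorted)
```

`df_sorted` can then be passed to `sns.lineplot(data=df_sorted, x="day", y="steps", marker="o")` exactly as in the cell above.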

---
## 7) Distributions - histplot and KDE
Use **histplot** for a histogram; add `kde=True` for a smooth curve. Or use **kdeplot** directly.

**Ideas:**
- `sns.histplot(data=..., x='value', bins=..., kde=True)`
- `sns.kdeplot(data=..., x='value', fill=True)`

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(9, 3.5))
sns.histplot(data=df_heights, x="height_cm", bins=6, kde=True, ax=axes[0])
axes[0].set_title("Height - Histogram + KDE")

sns.kdeplot(data=df_heights, x="height_cm", fill=True, ax=axes[1])
axes[1].set_title("Height - KDE (filled)")

plt.tight_layout()
plt.show()

### ✅ YOUR TURN 5 - Tweak the histogram
Change the number of `bins` (try 4, 8, 10) and observe how the shape looks different.

In [None]:
# Your code here

In [None]:
# Solution
ax = sns.histplot(data=df_heights, x="height_cm", bins=8, kde=True)
ax.set_title("Height - Histogram (bins=8) + KDE")
plt.show()
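To see why the shape changes with `bins`, it can help to look at the counts behind the bars. The sketch below uses NumPy rather than Seaborn, assuming equal-width bins that span the data from the smallest to the largest height; `bin_counts` is a hypothetical name.

```python
import numpy as np

heights_cm = [150, 152, 153, 155, 156, 158, 160, 162, 163,
              165, 168, 170, 172, 175, 178]

bin_counts = {}
for n_bins in (4, 6, 8, 10):
    # counts: how many heights fall in each bin; edges: the n_bins + 1 boundaries
    counts, edges = np.histogram(heights_cm, bins=n_bins)
    bin_counts[n_bins] = counts.tolist()
    print(n_bins, "bins ->", counts.tolist())
```

Every setting accounts for the same 15 heights; only the way they are grouped changes, which is why fewer bins look smoother and more bins look spikier.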

---
## 8) Quick styling tips in Seaborn
- Change the overall look: `sns.set_theme(style='whitegrid', palette='pastel')`
- Change the palette on a single plot: `sns.barplot(..., palette='Set2')` (recent Seaborn versions warn unless `hue` is also set, e.g. `hue='snack'`)
- Add labels/titles with Matplotlib: `ax.set_title(...)`, `ax.set_xlabel(...)`, etc.

In [None]:
sns.set_theme(style="whitegrid", palette="Set2")
ax = sns.barplot(data=df_snacks, x="snack", y="count")
ax.set_title("Snack Counts - Styled")
plt.xticks(rotation=15)
plt.show()

# Restore default theme for consistency later
sns.set_theme(context="notebook", style="whitegrid")
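For a palette on a single plot, one approach is to build an explicit category-to-color mapping and pass it together with `hue`. A sketch under the assumption that you want one Set2 color per snack; `snack_palette` is a hypothetical name, and the legend handling is written defensively because whether a legend is drawn can differ between Seaborn versions.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

snack_counts = {"Apples": 12, "Bananas": 7, "Carrots": 5, "Dates": 3}
df_snacks = pd.DataFrame({"snack": list(snack_counts.keys()),
                          "count": list(snack_counts.values())})

# One Set2 color per snack, as an explicit category -> color mapping
colors = sns.color_palette("Set2", n_colors=len(df_snacks))
snack_palette = dict(zip(df_snacks["snack"], colors))

# hue repeats the x variable so each bar gets its own color; dodge=False keeps bars centered
ax = sns.barplot(data=df_snacks, x="snack", y="count",
                 hue="snack", palette=snack_palette, dodge=False)

# A legend would only repeat the x-axis labels, so remove it if one was drawn
legend = ax.get_legend()
if legend is not None:
    legend.remove()

ax.set_title("Snack Counts - one color per snack")
plt.show()
```

A dictionary palette keeps each category tied to the same color across several plots, which a bare palette name does not guarantee if the category order changes.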

---
## 9) Save a Seaborn figure
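Save a plot with `plt.savefig('filename.png', dpi=300)` **before** `plt.show()`. In a notebook, `plt.show()` displays the figure and then closes it, so saving afterwards typically produces a blank image.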

In [None]:
ax = sns.lineplot(data=df_daily, x="day", y="steps", marker="o")
ax.set_title("Steps by Day (Saved)")
plt.grid(True)
plt.savefig("steps_by_day_seaborn.png", dpi=300)
plt.show()
print("Saved file: steps_by_day_seaborn.png")

---
## 🎉 Wrap-up
You learned how to build Seaborn charts from tiny in-notebook data:
- **countplot**, **barplot**, **box/violin**, **scatter (hue)**, **lineplot**, **histplot/KDE**
- Basic styling and saving figures

**Next steps:**
- Adjust palettes (`palette='muted'`, `'Set1'`, `'coolwarm'`)
- Add `hue` to more plots to compare groups
- Try `sns.pairplot(df_students, hue='gender')` on numeric columns for a quick overview