

```
# Jadyn Dangerfield
# Assignment: 02 Summary Statistics
```



# 📈 Summary Statistics

Summary statistics help us describe a dataset with just a few numbers.
Instead of staring at thousands of rows, we can quickly understand **center, spread, and shape**.

This notebook covers:
- Measures of central tendency (mean, median, mode)
- Measures of spread (range, variance, standard deviation, IQR)
- Skewness and kurtosis (just enough to impress your friends)
- Quick summaries with Pandas


## 1. Central Tendency

These describe the 'middle' of the data:
- **Mean** – average
- **Median** – middle value
- **Mode** – most frequent value

In [9]:
import numpy as np
import pandas as pd
from scipy import stats

data = [5, 7, 8, 5, 10, 12, 7, 7, 6, 9, 100]

print("Mean:", np.mean(data))
print("Median:", np.median(data))
print("Mode:", stats.mode(data, keepdims=True)[0][0])

Mean: 16.0
Median: 7.0
Mode: 7


👉 **Question:** Which is more robust to outliers, the mean or the median?

**The median.** In the data above, the single value 100 pulls the mean up to 16.0, while the median stays at 7.0.

## 2. Spread

Spread tells us how variable the data are.
- **Range**: max – min
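- **Variance**: average squared deviation from the mean (the code below passes `ddof=1`, which gives the sample variance and divides by n – 1 instead of n)
- **Standard Deviation**: square root of the variance (in the original units of the data)
- **Interquartile Range (IQR)**: width of the middle 50% of the data (Q3 – Q1)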

In [10]:
print("Range:", np.max(data) - np.min(data))
print("Variance:", np.var(data, ddof=1))
print("Standard Deviation:", np.std(data, ddof=1))
print("IQR:", stats.iqr(data))

Range: 95
Variance: 780.6
Standard Deviation: 27.939219745726614
IQR: 3.0
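
To connect the definitions above to the numbers NumPy reports, here is a minimal sketch that computes the variance by hand and compares the two divisors. The variable names (`squared_dev`, `sample_var`, `population_var`) are illustrative choices, not part of the assignment.

```python
import numpy as np

data = [5, 7, 8, 5, 10, 12, 7, 7, 6, 9, 100]
x = np.asarray(data, dtype=float)
n = x.size

# Squared deviation of every value from the mean
squared_dev = (x - x.mean()) ** 2

sample_var = squared_dev.sum() / (n - 1)   # same as np.var(x, ddof=1)
population_var = squared_dev.sum() / n     # same as np.var(x), i.e. ddof=0

# IQR from the 25th and 75th percentiles
q1, q3 = np.percentile(x, [25, 75])
iqr = q3 - q1

print("Sample variance:", sample_var)
print("Population variance:", population_var)
print("Sample std:", np.sqrt(sample_var))
print("Q1, Q3, IQR:", q1, q3, iqr)
```

The sample variance should agree with the `ddof=1` result printed above; the population variance is smaller because it divides by n rather than n – 1.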


👉 **Exercise:** Add an extreme outlier (e.g., 100) to the dataset and see how the mean, median, and standard deviation change.

**The mean and the standard deviation increased, and the median remained the same.**
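
The dataset in section 1 already contains the value 100, so one way to check this answer is to compare the ten remaining values against the full list. This is a small sketch; the helper `summarize` and the names `base` and `with_outlier` are made up for illustration.

```python
import numpy as np

base = [5, 7, 8, 5, 10, 12, 7, 7, 6, 9]   # section 1 data without the outlier
with_outlier = base + [100]               # the full section 1 dataset

def summarize(values):
    """Return mean, median, and sample standard deviation as plain floats."""
    return {
        "mean": float(np.mean(values)),
        "median": float(np.median(values)),
        "std": float(np.std(values, ddof=1)),
    }

before = summarize(base)
after = summarize(with_outlier)

# Print each statistic before and after adding the outlier
for name in before:
    print(f"{name}: {before[name]:.2f} -> {after[name]:.2f}")
```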

## 3. Shape: Skewness & Kurtosis

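- **Skewness**: measures asymmetry; positive values indicate a longer right tail, negative values a longer left tail
- **Kurtosis**: measures how heavy the tails are (often loosely called 'peakedness'); `stats.kurtosis` reports *excess* kurtosis by default, so a normal distribution scores 0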

Most real-life datasets are *not* perfectly normal, so these help describe the difference.

In [11]:
print("Skewness:", stats.skew(data))
print("Kurtosis:", stats.kurtosis(data))

Skewness: 2.8167193409568494
Kurtosis: 5.999883906484753


👉 **Note:** High kurtosis means heavier tails and more extreme outliers; low kurtosis means lighter tails and fewer extreme values.
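
The sketch below shows two things that are easy to miss when reading these numbers: SciPy's two kurtosis conventions, and how much of the skewness comes from the single outlier. The list `symmetric` is a made-up example.

```python
import numpy as np
from scipy import stats

data = [5, 7, 8, 5, 10, 12, 7, 7, 6, 9, 100]

# Fisher definition (SciPy's default): a normal distribution scores 0
excess = stats.kurtosis(data)
# Pearson definition: a normal distribution scores 3
pearson = stats.kurtosis(data, fisher=False)

symmetric = [1, 2, 3, 4, 5]   # perfectly symmetric, so skewness is zero

print("Excess kurtosis:", excess)
print("Pearson kurtosis:", pearson)
print("Skewness of symmetric data:", stats.skew(symmetric))
print("Skewness with outlier:", stats.skew(data))
print("Skewness without outlier:", stats.skew(data[:-1]))
```

The two kurtosis values differ by exactly 3, so it is worth checking which convention a tool uses before comparing results.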

## 4. Quick Summaries with Pandas

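Instead of calling each function separately, you can use Pandas' `.describe()`, which reports the count, mean, standard deviation, minimum, quartiles, and maximum in one call.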

In [12]:
df = pd.DataFrame({"Values": data})
df.describe()

| | Values |
|---|---|
| count | 11.0 |
| mean | 16.0 |
| std | 27.93922 |
| min | 5.0 |
| 25% | 6.5 |
| 50% | 7.0 |
| 75% | 9.5 |
| max | 100.0 |
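
`.describe()` does not report the range or IQR from section 2, but both can be derived from the rows it returns. A minimal sketch, where the row labels `"iqr"` and `"range"` are names chosen here for illustration:

```python
import pandas as pd

data = [5, 7, 8, 5, 10, 12, 7, 7, 6, 9, 100]
df = pd.DataFrame({"Values": data})

summary = df.describe()

# Add derived rows computed from the rows describe() already provides
summary.loc["iqr"] = summary.loc["75%"] - summary.loc["25%"]
summary.loc["range"] = summary.loc["max"] - summary.loc["min"]

print(summary)
```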


👉 **Task:** Use `.describe()` on another dataset (e.g., `penguins` from Seaborn or your own CSV).

In [14]:
# read_csv already returns a DataFrame, so no extra pd.DataFrame() wrapper is needed
penguins_df = pd.read_csv("penguins.csv")
penguins_df.describe()

| | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g |
|---|---|---|---|---|
| count | 342.0 | 342.0 | 342.0 | 342.0 |
| mean | 43.92193 | 17.15117 | 200.915205 | 4201.754386 |
| std | 5.459584 | 1.974793 | 14.061714 | 801.954536 |
| min | 32.1 | 13.1 | 172.0 | 2700.0 |
| 25% | 39.225 | 15.6 | 190.0 | 3550.0 |
| 50% | 44.45 | 17.3 | 197.0 | 4050.0 |
| 75% | 48.5 | 18.7 | 213.0 | 4750.0 |
| max | 59.6 | 21.5 | 231.0 | 6300.0 |
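
Two behaviors of `.describe()` matter when reading a table like this: by default it summarizes only numeric columns, and `count` includes only non-missing values, so it can be smaller than the number of rows. The sketch below demonstrates both on a tiny made-up frame (`toy`, with hypothetical columns `species` and `mass_g`); it does not use the real penguins file.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data for illustration, NOT the real penguins dataset
toy = pd.DataFrame({
    "species": ["A", "A", "B", "B"],
    "mass_g": [3500.0, np.nan, 4200.0, 5000.0],
})

# Default: numeric columns only; the NaN is left out of count
numeric_summary = toy.describe()

# include="all" also summarizes the text column (unique, top, freq)
full_summary = toy.describe(include="all")

# Per-group summary of one numeric column
by_species = toy.groupby("species")["mass_g"].describe()

print(numeric_summary)
print(full_summary)
print(by_species)
```

The same `groupby(...).describe()` pattern could be applied to a grouping column in a real dataset, assuming the file has one.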


---
✅ That’s it for summary stats! Next up → [Matplotlib Basics](03-Matplotlib_Basics.ipynb)