In [3]:
import pandas as pd
import seaborn as sns

titanic = sns.load_dataset("titanic")
iris = sns.load_dataset("iris")

titanic.head()


,survived,pclass,sex,age,sibsp,parch,fare,embarked,class,who,adult_male,deck,embark_town,alive,alone
0,0,3,male,22.0,1,0,7.25,S,Third,man,True,,Southampton,no,False
1,1,1,female,38.0,1,0,71.2833,C,First,woman,False,C,Cherbourg,yes,False
2,1,3,female,26.0,0,0,7.925,S,Third,woman,False,,Southampton,yes,True
3,1,1,female,35.0,1,0,53.1,S,First,woman,False,C,Southampton,yes,False
4,0,3,male,35.0,0,0,8.05,S,Third,man,True,,Southampton,no,True


In [4]:
# Preview your ingredients
iris.head()

,sepal_length,sepal_width,petal_length,petal_width,species
0,5.1,3.5,1.4,0.2,setosa
1,4.9,3.0,1.4,0.2,setosa
2,4.7,3.2,1.3,0.2,setosa
3,4.6,3.1,1.5,0.2,setosa
4,5.0,3.6,1.4,0.2,setosa


In [5]:
titanic_summary = titanic.groupby(["sex", "pclass"])["survived"].mean().unstack()
titanic_summary

sex / pclass,1,2,3
female,0.968085,0.921053,0.5
male,0.368852,0.157407,0.135447


## Survival Analysis by Sex and Passenger Class

This summary table shows that survival rates on the Titanic varied sharply with both sex and ticket class (used here as a proxy for socioeconomic status), and that the size of the class effect depends on sex.

| Class | Female Survival Rate | Male Survival Rate |
|-------|----------------------|--------------------|
| 1st   | ~96.8%               | ~36.9%             |
| 2nd   | ~92.1%               | ~15.7%             |
| 3rd   | 50.0%                | ~13.5%             |

### Key Observations:

- **Gender Disparity**: Survival rates were substantially higher for females in every class, consistent with historical reports of a “women and children first” policy.
- **Class Effect**: First-class passengers had the highest survival rates for both sexes, but the drop-off differs by sex. For females, survival stays above 92% in second class and falls sharply only in third class (50.0%). For males, it falls from ~36.9% in first class to ~15.7% in second and ~13.5% in third.
- **Compounded Risk**: Third-class males had the **lowest survival probability (13.5%)**, only slightly below second-class males (15.7%). This is consistent with structural disadvantages such as physical cabin location, delayed evacuation access, or lower social priority, although the table alone cannot distinguish among these explanations.
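
The size of the gender gap in each class can be read off by differencing the two rows of the summary. The sketch below re-enters the rates printed above by hand instead of loading the dataset, so `rates` is only a stand-in for `titanic_summary`, and `gap_pp` is a name introduced here for illustration.

```python
import pandas as pd

# Survival rates copied from the titanic_summary output above
# (rows: sex, columns: pclass); a hand-entered stand-in for the real object.
rates = pd.DataFrame(
    {1: [0.968085, 0.368852], 2: [0.921053, 0.157407], 3: [0.5, 0.135447]},
    index=pd.Index(["female", "male"], name="sex"),
)
rates.columns.name = "pclass"

# Female-minus-male survival gap per class, in percentage points
gap_pp = ((rates.loc["female"] - rates.loc["male"]) * 100).round(1)
print(gap_pp)
```

With these values the gap is 59.9 points in first class, 76.4 in second, and 36.5 in third, so the female advantage is largest in second class and smallest in third.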

### Implications:

This pattern illustrates how demographic variables such as gender and class can interact to influence survival likelihood: the effect of class is not the same for women as for men. It is a classic example of **intersectional risk** in human-centered data, and it shows why multi-dimensional grouping matters in exploratory data analysis (EDA), since grouping by either variable alone would hide the interaction.
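
As a rough sketch of multi-dimensional grouping, the same two-way table can be built either with `groupby(...).unstack()` or with `pivot_table`, and a matching table of group sizes shows how many rows each rate rests on. The frame `toy` below is a tiny hand-made example, not the Titanic data.

```python
import pandas as pd

# Tiny hand-made frame for illustration only; not the Titanic data.
toy = pd.DataFrame({
    "sex":      ["female", "female", "male", "male", "male", "female"],
    "pclass":   [1, 3, 1, 3, 3, 1],
    "survived": [1, 0, 0, 0, 1, 1],
})

# Two-key groupby, then move the inner key (pclass) into columns
by_groupby = toy.groupby(["sex", "pclass"])["survived"].mean().unstack()

# The same table built in one call
by_pivot = toy.pivot_table(
    index="sex", columns="pclass", values="survived", aggfunc="mean"
)

# Group sizes: how many rows stand behind each rate
sizes = toy.pivot_table(
    index="sex", columns="pclass", values="survived", aggfunc="count"
)
print(by_pivot)
print(sizes)
```

Reporting the counts next to the means helps guard against over-reading a rate that comes from only a handful of rows.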

In [6]:
iris_summary = iris.groupby("species")["petal_width"].mean()
iris_summary

species
setosa        0.246
versicolor    1.326
virginica     2.026
Name: petal_width, dtype: float64

## Iris Petal Width Analysis by Species

This summary presents the **mean petal width** for each of the three Iris species in the dataset:

| Species     | Avg. Petal Width (cm) |
|-------------|------------------------|
| Setosa      | 0.246                  |
| Versicolor  | 1.326                  |
| Virginica   | 2.026                  |

### Key Observations:

- **Setosa** has **notably narrower petals** than the other two species, suggesting clear morphological differentiation.
- **Virginica**, on the other hand, shows the **widest petals on average**, which may contribute to its separation in clustering and classification models.
- The progression from Setosa → Versicolor → Virginica aligns with a **natural gradient in petal size**, useful for species classification using simple statistical or machine learning models.

### Analytical Note:

The large differences in mean petal width across species suggest that it is a **highly discriminative feature**, although the means alone do not show how much the species overlap; that also depends on the spread within each species. In practice, this kind of group-by analysis is a common first step in **feature exploration** in early-stage data science workflows.
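
One way to check whether a feature separates groups is to look at the within-group spread alongside the means. The following is a minimal sketch on made-up values (the species labels `a` and `b` and all measurements are hypothetical, not rows from the iris dataset).

```python
import pandas as pd

# Illustrative values only (hypothetical, not rows from the iris dataset)
toy = pd.DataFrame({
    "species": ["a", "a", "a", "b", "b", "b"],
    "petal_width": [0.2, 0.3, 0.2, 1.3, 1.5, 1.2],
})

# Mean alone hides the spread; add std and range per group
stats = toy.groupby("species")["petal_width"].agg(["mean", "std", "min", "max"])

# The ranges overlap if the narrower group's max reaches the wider group's min
overlap = stats.loc["a", "max"] >= stats.loc["b", "min"]
print(stats)
print("ranges overlap:", overlap)
```

On the real data, the same `agg` call on `iris.groupby("species")["petal_width"]` would show whether the three species' ranges overlap.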

In [10]:
# Keep the aggregated DataFrame separate from its styled display
titanic_age_fare_summary = (
    titanic.groupby("sex")[["age", "fare"]].agg(["mean", "median", "count"])
)
titanic_age_fare_summary.style.background_gradient(cmap="Blues")

sex,age mean,age median,age count,fare mean,fare median,fare count
female,27.915709,27.0,261,44.479818,23.0,314
male,30.726645,29.0,453,25.523893,10.5,577


## Summary Statistics by Passenger Sex

This multi-metric aggregation compares key demographic and ticketing information by gender:

| Metric            | Female           | Male             |
|-------------------|------------------|------------------|
| Age (mean)        | 27.92            | 30.73            |
| Age (median)      | 27.00            | 29.00            |
| Age (count)       | 261              | 453              |
| Fare (mean)       | 44.48            | 25.52            |
| Fare (median)     | 23.00            | 10.50            |
| Fare (count)      | 314              | 577              |

### Observations:

- **Age Distributions**:
  - Female and male passengers had **similar median ages** (~27–29), though males were slightly older on average.
  - The slightly higher mean age for males may be influenced by outliers (e.g., older high-fare male passengers).

- **Fare Disparities**:
  - **Females paid substantially higher fares** on average (44.48 vs. 25.52, roughly 74% more than males).
  - The median fares show the same pattern (23.00 vs. 10.50), so it is not driven by outliers alone; it points to females being booked into **higher fare classes** more often.
  - This would be consistent with **females being overrepresented in first and second class**, but the earlier survival table reports rates rather than group sizes, so the class mix by sex should be checked directly (for example, with a cross-tabulation of `sex` against `pclass`).

- **Passenger Counts**:
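  - Male passengers outnumber female passengers (577 vs. 314). The fare counts equal these group sizes (together they account for all 891 rows), so fare has no missing values. Age is missing for 124 males (577 - 453, about 21%) and 53 females (314 - 261, about 17%).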
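
Missing values per group can also be counted directly instead of being inferred from differences in `count`. A minimal sketch on a hand-made frame (`toy` and its values are hypothetical, not the Titanic data):

```python
import numpy as np
import pandas as pd

# Hand-made frame for illustration only; not the Titanic data.
toy = pd.DataFrame({
    "sex":  ["female", "female", "male", "male", "male"],
    "age":  [29.0, np.nan, 40.0, np.nan, np.nan],
    "fare": [30.0, 12.5, 8.0, 7.9, 50.0],
})

# isna() marks missing cells; grouping the boolean frame by sex and
# summing gives the number of missing values per column and group
missing = toy[["age", "fare"]].isna().groupby(toy["sex"]).sum()

# Taking the mean instead gives the share of missing values
missing_share = toy[["age", "fare"]].isna().groupby(toy["sex"]).mean()
print(missing)
print(missing_share)
```

Applied to `titanic`, the same two lines would report the missing counts and shares for `age` and `fare` by sex.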

### Takeaway:

This grouped summary shows that **sex was strongly associated with cost of travel** aboard the Titanic and, by implication, probably with class. In a modeling context, these features may be important predictors, but they are also likely to be **confounded** with one another: sex, class, and fare are correlated, so the separate effect of each on survival is hard to isolate.
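
Whether sex and class are associated can be checked with a row-normalized cross-tabulation, which gives the class mix within each sex. This is a sketch on hypothetical rows, not the Titanic data; `class_mix` is a name introduced here.

```python
import pandas as pd

# Hypothetical rows for illustration only; not the Titanic data.
toy = pd.DataFrame({
    "sex":    ["female", "female", "female", "male", "male", "male", "male"],
    "pclass": [1, 1, 3, 1, 3, 3, 3],
})

# normalize="index" divides each row by its total, so every row
# shows the share of that sex travelling in each class
class_mix = pd.crosstab(toy["sex"], toy["pclass"], normalize="index")
print(class_mix)
```

If the rows of `class_mix` differ noticeably, sex and class are associated, and a model that includes both should be interpreted with that overlap in mind.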

In [None]:
# Group by sex and class, then summarize survival
titanic.groupby(["sex", "pclass"])["survived"].agg(["mean", "count"])

# Standardize age within each passenger class (z-score per pclass);
# transform returns one value per original row, and missing ages stay NaN
titanic["age_zscore"] = titanic.groupby("pclass")["age"].transform(
    lambda x: (x - x.mean()) / x.std()
)
titanic["age_zscore"]

0     -0.251342
1     -0.015770
2      0.068776
3     -0.218434
4      0.789041
         ...   
886   -0.205529
887   -1.299306
888         NaN
889   -0.826424
890    0.548953
Name: age_zscore, Length: 891, dtype: float64
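
The z-score step works because `transform` returns a result aligned to the original rows, not one row per group. The sketch below shows an equivalent formulation that uses the built-in `"mean"` and `"std"` reductions in place of a lambda; the frame `toy` is hand-made for illustration and is not the Titanic data.

```python
import numpy as np
import pandas as pd

# Hand-made frame for illustration only; not the Titanic data.
toy = pd.DataFrame({
    "pclass": [1, 1, 1, 3, 3, 3],
    "age":    [30.0, 40.0, 50.0, 20.0, np.nan, 30.0],
})

grouped = toy.groupby("pclass")["age"]

# transform broadcasts each group's mean and sample std (ddof=1)
# back to every row of that group, so the arithmetic is row-aligned
toy["age_zscore"] = (toy["age"] - grouped.transform("mean")) / grouped.transform("std")
print(toy)
```

The result has the same length as the input, each class is centered on zero, and the row with a missing age stays `NaN`, which matches the `NaN` at index 888 in the output above.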