# 🚢 Pandas High-Level Exercises – Titanic Dataset
These exercises practice advanced pandas data manipulation on the Titanic dataset.

Note: Start by loading the dataset:

import pandas as pd
df = pd.read_csv('train.csv')
1. Title Extraction and Analysis
Task: Extract the title (e.g., Mr., Miss, Master) from the Name column and determine the survival rate for each title.

Hint: Use string methods and .groupby().

2. Cabin Grouping
Task:
Group passengers based on the first letter of their Cabin (e.g., A, B, C...) and compute:

- Total passengers
- Survival rate per cabin group

Note: Handle missing cabins by labeling them as "Unknown".

3. Family Size Impact
Task:
Create a new column FamilySize = SibSp + Parch + 1 and categorize family size as:

- 'Single' (1)
- 'Small' (2–4)
- 'Large' (5+)

Then compute survival rates for each category.

4. Age Binning and Survival Analysis
Task:
Create age bins: [0–10], [11–20], ..., [71+] and use pd.cut() to assign each passenger to an age group. Plot the survival rate by age group.

5. Cross Tab: Class vs. Sex vs. Survival
Task:
Create a pivot table showing survival counts and rates grouped by both Pclass and Sex.

6. Impute Missing Age Using Median per Title
Task:
Use the Title extracted in Exercise 1 to impute missing Age values with the median age of each title group.

7. Correlation Heatmap of Encoded Features
Task:
Encode Sex, Embarked, and Title using .map() or pd.get_dummies(), and compute a correlation matrix to see which features most strongly correlate with Survived.

8. Identify Duplicate Ticket Holders
Task:
Group passengers by Ticket number and identify tickets shared by multiple passengers. For these, compute the survival rate of all passengers sharing the same ticket.

9. Analyze Fare per Person
Task:
Calculate a new column FarePerPerson = Fare / FamilySize. Analyze the correlation between FarePerPerson and Survived.

10. Feature Interaction: Class and Fare
Task:
Create a new categorical feature that combines Pclass and fare quantiles (e.g., '1_HighFare', '2_LowFare', etc.). Analyze survival across these new categories.





In [5]:
import pandas as pd
df = pd.read_csv('titanic.csv')

In [6]:
df

,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.2500,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.9250,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1000,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.0500,,S
...,...,...,...,...,...,...,...,...,...,...,...,...
886,887,0,2,"Montvila, Rev. Juozas",male,27.0,0,0,211536,13.0000,,S
887,888,1,1,"Graham, Miss. Margaret Edith",female,19.0,0,0,112053,30.0000,B42,S
888,889,0,3,"Johnston, Miss. Catherine Helen ""Carrie""",female,,1,2,W./C. 6607,23.4500,,S
889,890,1,1,"Behr, Mr. Karl Howell",male,26.0,0,0,111369,30.0000,C148,C


## Exercise 1: Title Extraction and Survival Analysis

In [7]:
# Raw string so that "\." is passed to the regex engine unchanged
df['Title'] = df['Name'].str.extract(r' ([A-Za-z]+)\.', expand=False)
df.groupby('Title')['Survived'].mean().sort_values(ascending=False)

Title,Survived
Countess,1.0
Ms,1.0
Lady,1.0
Mme,1.0
Mlle,1.0
Sir,1.0
Mrs,0.792
Miss,0.697802
Master,0.575
Major,0.5
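
Several titles above show a survival rate of exactly 1.0, which may simply reflect a very small group. A minimal sketch of reporting the group size next to the rate follows. The rows in `toy` are made up for illustration and are not taken from the dataset, and the names `toy` and `title_stats` are assumptions.

```python
import pandas as pd

# Toy rows, invented for illustration (not real passengers).
toy = pd.DataFrame({
    "Name": [
        "Smith, Mr. John",
        "Smith, Mrs. Jane (Anna Lee)",
        "Brown, Miss. Ada",
        "Brown, Mr. Tom",
        "Grey, Countess. of Example (Eve Grey)",
    ],
    "Survived": [0, 1, 1, 0, 1],
})

# Same pattern as in the exercise: the word that precedes the first period.
toy["Title"] = toy["Name"].str.extract(r" ([A-Za-z]+)\.", expand=False)

# Named aggregation keeps the passenger count beside the survival rate,
# so rates that rest on one or two passengers are easy to spot.
title_stats = (
    toy.groupby("Title")["Survived"]
    .agg(passengers="count", survival_rate="mean")
    .sort_values("survival_rate", ascending=False)
)
print(title_stats)
```

On the real data, the same `agg` call can replace `.mean()` in the cell above.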


## Exercise 2: Cabin Grouping

In [8]:
# Missing cabins become 'Unknown'; taking the first character then labels that group 'U'
df['CabinGroup'] = df['Cabin'].fillna('Unknown').str[0]
df.groupby('CabinGroup')['Survived'].agg(['count', 'mean'])

CabinGroup,count,mean
A,15,0.466667
B,47,0.744681
C,59,0.59322
D,33,0.757576
E,32,0.75
F,13,0.615385
G,4,0.5
T,1,0.0
U,687,0.299854
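
The task asks for the label "Unknown", while the output above shows `U` because the first character is taken after filling. One possible variant, sketched on made-up rows, slices first and fills second so the full label survives. `toy` and `cabin_stats` are hypothetical names.

```python
import pandas as pd

# Toy rows, invented for illustration.
toy = pd.DataFrame({
    "Cabin": ["C85", None, "E46", None, "C123"],
    "Survived": [1, 0, 1, 0, 1],
})

# .str[0] leaves missing cabins as NaN, so fillna can apply the full label.
toy["CabinGroup"] = toy["Cabin"].str[0].fillna("Unknown")

cabin_stats = toy.groupby("CabinGroup")["Survived"].agg(
    passengers="count", survival_rate="mean"
)
print(cabin_stats)
```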


## Exercise 3: Family Size Impact

In [9]:
df['FamilySize'] = df['SibSp'] + df['Parch'] + 1
df['FamilyCategory'] = pd.cut(df['FamilySize'], bins=[0,1,4,11], labels=['Single','Small','Large'])
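# observed=False is stated explicitly to avoid the FutureWarning for categorical keys
df.groupby('FamilyCategory', observed=False)['Survived'].mean()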



FamilyCategory,Survived
Single,0.303538
Small,0.578767
Large,0.16129
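
The bin edges `[0, 1, 4, 11]` work because `pd.cut` builds right-closed intervals by default: (0, 1], (1, 4], (4, 11]. The sketch below checks a few hand-picked family sizes against the intended categories. The values in `sizes` are chosen for illustration only.

```python
import pandas as pd

labels = ["Single", "Small", "Large"]

# Hand-picked family sizes, including the boundary values 1, 4 and 5.
sizes = pd.Series([1, 2, 4, 5, 8, 11], name="FamilySize")
categories = pd.cut(sizes, bins=[0, 1, 4, 11], labels=labels)
mapping = dict(zip(sizes, categories.astype(str)))
print(mapping)

# A size above the last edge would fall outside every bin and become NaN.
# An open-ended last bin avoids that.
open_ended = pd.cut(pd.Series([12]), bins=[0, 1, 4, float("inf")], labels=labels)
print(open_ended.iloc[0])
```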


## Exercise 4: Age Binning and Survival Analysis

In [10]:
age_bins = [0, 10, 20, 30, 40, 50, 60, 70, 100]
df['AgeGroup'] = pd.cut(df['Age'], bins=age_bins)
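# observed=False is stated explicitly to avoid the FutureWarning for categorical keys
df.groupby('AgeGroup', observed=False)['Survived'].mean()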



AgeGroup,Survived
"(0, 10]",0.59375
"(10, 20]",0.382609
"(20, 30]",0.365217
"(30, 40]",0.445161
"(40, 50]",0.383721
"(50, 60]",0.404762
"(60, 70]",0.235294
"(70, 100]",0.2

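The exercise also asks for a plot, which the cell above does not produce. A minimal sketch of one way to draw it is shown here, assuming matplotlib is installed. The rows in `toy`, the helper `plot_survival_by_age`, and the output file name are all hypothetical.

```python
import pandas as pd

# Toy rows, invented for illustration.
toy = pd.DataFrame({
    "Age": [4, 8, 15, 19, 25, 28, 35, 45, 62, 80],
    "Survived": [1, 1, 0, 1, 0, 1, 1, 0, 0, 0],
})

age_bins = [0, 10, 20, 30, 40, 50, 60, 70, 100]
toy["AgeGroup"] = pd.cut(toy["Age"], bins=age_bins)

# observed=False keeps empty bins (as NaN) so the x-axis stays complete.
rates = toy.groupby("AgeGroup", observed=False)["Survived"].mean()
print(rates)


def plot_survival_by_age(rates, path="survival_by_age.png"):
    """Draw a bar chart of survival rate per age group and save it to `path`."""
    import matplotlib
    matplotlib.use("Agg")  # non-interactive backend, writes to file only
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(8, 4))
    rates.plot(kind="bar", ax=ax)
    ax.set_xlabel("Age group")
    ax.set_ylabel("Survival rate")
    ax.set_ylim(0, 1)
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)
    return path
```

Calling `plot_survival_by_age(rates)` writes the chart to disk. In a notebook, `rates.plot(kind="bar")` alone is usually enough.
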

## Exercise 5: Cross Tab - Class vs. Sex vs. Survival

In [11]:
pd.pivot_table(df, values='Survived', index='Pclass', columns='Sex', aggfunc=['count','mean'])

,count,count,mean,mean
Sex,female,male,female,male
Pclass,,,,
1,94,122,0.968085,0.368852
2,76,108,0.921053,0.157407
3,144,347,0.5,0.135447
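
Since the exercise title mentions a cross tab, `pd.crosstab` is a possible alternative to `pivot_table`. The sketch below, on made-up rows, builds the counts and then the row-normalised rates. `toy`, `counts`, and `rates` are hypothetical names.

```python
import pandas as pd

# Toy rows, invented for illustration.
toy = pd.DataFrame({
    "Pclass":   [1, 1, 1, 2, 3, 3, 3, 3],
    "Sex":      ["female", "male", "male", "female", "female", "male", "male", "female"],
    "Survived": [1, 0, 1, 1, 0, 0, 0, 1],
})

# Rows: (Pclass, Sex) pairs. Columns: Survived = 0 or 1.
counts = pd.crosstab([toy["Pclass"], toy["Sex"]], toy["Survived"])

# normalize="index" divides each row by its total, giving rates per group.
rates = pd.crosstab([toy["Pclass"], toy["Sex"]], toy["Survived"], normalize="index")

print(counts)
print(rates)
```

Column `1` of `rates` is the survival rate for each class and sex combination.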


## Exercise 6: Impute Missing Age by Title Median

In [12]:
# Raw string so that "\." is passed to the regex engine unchanged
df['Title'] = df['Name'].str.extract(r' ([A-Za-z]+)\.', expand=False)
df['Age'] = df.groupby('Title')['Age'].transform(lambda x: x.fillna(x.median()))
df['Age'].isna().sum()

np.int64(0)
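
The result above shows no missing ages, so every title group in this dataset had at least one known age. On other data, a title whose ages are all missing would have a NaN median and stay unfilled. A hedged sketch of a fallback to the overall median, on made-up rows with hypothetical names, could look like this:

```python
import numpy as np
import pandas as pd

# Toy rows, invented for illustration; "Dr" has no known age at all.
toy = pd.DataFrame({
    "Title": ["Mr", "Mr", "Mr", "Miss", "Miss", "Dr"],
    "Age":   [30.0, 40.0, np.nan, 20.0, np.nan, np.nan],
})

overall_median = toy["Age"].median()

# Median age of each row's title group, aligned to the original index.
title_median = toy.groupby("Title")["Age"].transform("median")

# First fill from the title median, then fall back to the overall median.
toy["AgeImputed"] = toy["Age"].fillna(title_median).fillna(overall_median)
print(toy)
```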

## Exercise 7: Correlation Heatmap of Encoded Features

In [14]:
df_encoded = df.copy()
df_encoded['Sex'] = df_encoded['Sex'].map({'male':0, 'female':1})
df_encoded['Embarked'] = df_encoded['Embarked'].map({'S':0, 'C':1, 'Q':2})
df_encoded['Title'] = df_encoded['Title'].astype('category').cat.codes
df_encoded.corr(numeric_only=True)['Survived'].sort_values(ascending=False)

,Survived
Survived,1.0
Sex,0.543351
Fare,0.257307
Embarked,0.108669
Parch,0.081629
FamilySize,0.016639
PassengerId,-0.005007
SibSp,-0.035322
Age,-0.078816
Title,-0.201345
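
Integer codes such as `S=0, C=1, Q=2` or `cat.codes` impose an arbitrary order on unordered categories, which can distort a linear correlation. The task also allows `pd.get_dummies()`. A minimal sketch of that option on made-up rows follows. `toy`, `encoded`, and `corr_with_survived` are hypothetical names.

```python
import pandas as pd

# Toy rows, invented for illustration.
toy = pd.DataFrame({
    "Survived": [1, 0, 1, 0, 1, 0],
    "Sex":      ["female", "male", "female", "male", "female", "male"],
    "Embarked": ["S", "S", "C", "Q", "C", "S"],
})

# One 0/1 indicator column per category, so no ordering is implied.
encoded = pd.get_dummies(toy, columns=["Sex", "Embarked"], dtype=int)

corr_with_survived = (
    encoded.corr()["Survived"]
    .drop("Survived")
    .sort_values(ascending=False)
)
print(corr_with_survived)
```

Each indicator then gets its own correlation with `Survived`, at the cost of more columns.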


## Exercise 8: Duplicate Ticket Holders

In [15]:
ticket_groups = df.groupby('Ticket').filter(lambda x: len(x) > 1)
ticket_groups.groupby('Ticket')['Survived'].mean().head()

Ticket,Survived
110152,1.0
110413,0.666667
110465,0.0
111361,1.0
113505,1.0
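
The rates above do not show how many passengers share each ticket. One possible variant, sketched on made-up rows, uses `transform("size")` to attach the group size to every row and then reports size and rate together. All names in the block are hypothetical.

```python
import pandas as pd

# Toy rows, invented for illustration.
toy = pd.DataFrame({
    "Ticket":   ["A1", "A1", "B2", "C3", "C3", "C3"],
    "Survived": [1, 0, 1, 0, 0, 1],
})

# Number of passengers on each row's ticket, aligned to the original rows.
toy["TicketSize"] = toy.groupby("Ticket")["Ticket"].transform("size")

# Keep only tickets held by more than one passenger.
shared = toy[toy["TicketSize"] > 1]

shared_stats = shared.groupby("Ticket")["Survived"].agg(
    passengers="size", survival_rate="mean"
)
print(shared_stats)
```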


## Exercise 9: Fare Per Person

In [None]:
df['FarePerPerson'] = df['Fare'] / df['FamilySize']
df[['FarePerPerson', 'Survived']].corr()
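
The cell above was not executed, so no result is shown for it. As a self-contained illustration of the same calculation, the sketch below runs on made-up rows and uses `Series.corr` to get a single coefficient instead of a 2x2 matrix. The numbers are invented and say nothing about the real dataset.

```python
import pandas as pd

# Toy rows, invented for illustration.
toy = pd.DataFrame({
    "Fare":     [80.0, 60.0, 20.0, 8.0],
    "SibSp":    [1, 0, 2, 0],
    "Parch":    [0, 0, 1, 0],
    "Survived": [1, 1, 0, 0],
})

toy["FamilySize"] = toy["SibSp"] + toy["Parch"] + 1
toy["FarePerPerson"] = toy["Fare"] / toy["FamilySize"]

# Pearson correlation between a continuous column and a 0/1 column.
r = toy["FarePerPerson"].corr(toy["Survived"])
print(round(r, 3))
```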

## Exercise 10: Feature Interaction - Class and Fare

In [None]:
df['FareBin'] = pd.qcut(df['Fare'], 4, labels=['Low', 'Mid', 'High', 'VeryHigh'])
df['ClassFare'] = df['Pclass'].astype(str) + '_' + df['FareBin'].astype(str)
df.groupby('ClassFare')['Survived'].mean().sort_values(ascending=False)
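
This cell was also left unexecuted. The sketch below applies the same three steps to made-up rows and adds the passenger count per category, since some class and fare combinations may be rare. The data and the name `class_fare_stats` are assumptions.

```python
import pandas as pd

# Toy rows, invented for illustration.
toy = pd.DataFrame({
    "Pclass":   [3, 3, 3, 2, 2, 1, 1, 1],
    "Fare":     [7.0, 8.0, 9.0, 13.0, 26.0, 50.0, 80.0, 200.0],
    "Survived": [0, 0, 1, 0, 1, 1, 1, 1],
})

# Four equal-frequency fare bins (quartiles).
toy["FareBin"] = pd.qcut(toy["Fare"], 4, labels=["Low", "Mid", "High", "VeryHigh"])

# Combine class and fare bin into one label such as "1_VeryHigh".
toy["ClassFare"] = toy["Pclass"].astype(str) + "_" + toy["FareBin"].astype(str)

class_fare_stats = (
    toy.groupby("ClassFare")["Survived"]
    .agg(passengers="count", survival_rate="mean")
    .sort_values("survival_rate", ascending=False)
)
print(class_fare_stats)
```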