# Pandas Coding Questions on Seaborn's Titanic Dataset

### Question 1: Import the pandas library and load the Titanic dataset from Seaborn into a DataFrame. Display the first 5 rows of the DataFrame.

### Question 2: How many passengers are there in each class?

### Question 3: What is the average age of passengers who survived?

### Question 4: Display the survival count of passengers based on gender and class.

### Question 5: What is the average fare paid by passengers in each class?

### Question 6: Create a new column 'child' that indicates whether the passenger is a child (age < 18).

### Question 7: What is the survival rate for children versus adults?

### Question 8: How many passengers embarked from each port?

### Question 9: Find the number of missing values in each column.

### Question 10: Display the age distribution of passengers.

### Question 11: Calculate the survival rate based on the fare quartiles.

### Question 12: Does the cabin type affect the survival rate? Assume the first letter of the cabin indicates the cabin type.

### Question 13: Find the average age of passengers, grouped by survival and class.

### Question 14: Calculate the number of surviving and non-surviving passengers, grouped by gender and class.

### Question 15: Identify the most common embarkation port among the survivors.

### Question 16: Determine the proportion of passengers by gender.

### Question 17: Calculate the average fare and age, grouped by survival and gender.

### Question 18: Identify the passenger with the highest fare.

### Question 19: Calculate the total number of family members onboard for each passenger and create a new column 'family_size'.

### Question 20: Find the survival rate of passengers based on the number of family members onboard.

In [1]:
import pandas as pd
import seaborn as sns

# Load the Titanic dataset
titanic = sns.load_dataset('titanic')
titanic.head()

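|   | survived | pclass | sex | age | sibsp | parch | fare | embarked | class | who | adult_male | deck | embark_town | alive | alone |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 3 | male | 22.0 | 1 | 0 | 7.25 | S | Third | man | True | NaN | Southampton | no | False |
| 1 | 1 | 1 | female | 38.0 | 1 | 0 | 71.2833 | C | First | woman | False | C | Cherbourg | yes | False |
| 2 | 1 | 3 | female | 26.0 | 0 | 0 | 7.925 | S | Third | woman | False | NaN | Southampton | yes | True |
| 3 | 1 | 1 | female | 35.0 | 1 | 0 | 53.1 | S | First | woman | False | C | Southampton | yes | False |
| 4 | 0 | 3 | male | 35.0 | 0 | 0 | 8.05 | S | Third | man | True | NaN | Southampton | no | True |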


In [2]:
# Question 2: Passengers in each class
titanic['class'].value_counts()

Third     491
First     216
Second    184
Name: class, dtype: int64

In [3]:
# Question 3: Average age of passengers who survived
titanic[titanic['survived'] == 1]['age'].mean()

28.343689655172415

In [4]:
# Question 4: Survival count based on gender and class
titanic.groupby(['sex', 'class'])['survived'].sum()

sex     class 
female  First     91
        Second    70
        Third     72
male    First     45
        Second    17
        Third     47
Name: survived, dtype: int64

In [5]:
# Question 5: Average fare paid by passengers in each class
titanic.groupby('class')['fare'].mean()

class
First     84.154687
Second    20.662183
Third     13.675550
Name: fare, dtype: float64

In [6]:
# Question 6: Create a new column 'child' indicating if the passenger is a child
# Note: a missing age compares as False, so passengers of unknown age are not flagged as children
titanic['child'] = titanic['age'] < 18

In [7]:
# Question 7: Survival rate for children (True) vs. adults (False)
# The False group also contains the passengers whose age is missing
titanic.groupby('child')['survived'].mean()

child
False    0.361183
True     0.539823
Name: survived, dtype: float64
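
Because `NaN < 18` evaluates to `False`, the comparison above counts passengers of unknown age with the adults. The following is a minimal sketch of one way to keep them out of both groups. It runs on a small hand-made frame (`toy`, with invented values), not on the real dataset, and the names `naive` and `strict` are illustrative only.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the Titanic frame; the values are made up for illustration.
toy = pd.DataFrame({
    "age": [5.0, 40.0, np.nan, 16.0, np.nan, 30.0],
    "survived": [1, 0, 1, 1, 1, 1],
})

# NaN < 18 is False, so unknown ages are counted with the adults here.
toy["child"] = toy["age"] < 18
naive = toy.groupby("child")["survived"].mean()

# Restrict to rows with a known age before comparing the two groups.
known = toy.dropna(subset=["age"])
strict = known.groupby("child")["survived"].mean()

print(naive)
print(strict)
```

On the toy data the adult rate changes once the unknown ages are dropped, which is the effect to check for on the real frame.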

In [8]:
# Question 8: Passengers embarked from each port
titanic['embark_town'].value_counts()

Southampton    644
Cherbourg      168
Queenstown      77
Name: embark_town, dtype: int64

In [9]:
# Question 9: Number of missing values in each column
titanic.isnull().sum()

survived         0
pclass           0
sex              0
age            177
sibsp            0
parch            0
fare             0
embarked         2
class            0
who              0
adult_male       0
deck           688
embark_town      2
alive            0
alone            0
child            0
dtype: int64
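
Raw counts are easier to judge next to the share of rows they represent. As a sketch on a hand-made toy frame (invented values; `missing` is an illustrative name), `isna().mean()` gives the fraction of missing entries per column:

```python
import numpy as np
import pandas as pd

# Toy frame with made-up values; only the pattern of NaNs matters here.
toy = pd.DataFrame({
    "age": [22.0, np.nan, 26.0, np.nan],
    "deck": [np.nan, "C", np.nan, np.nan],
    "fare": [7.25, 71.28, 7.93, 53.10],
})

# Count and share of missing values per column, worst column first.
missing = pd.DataFrame({
    "n_missing": toy.isna().sum(),
    "share_missing": toy.isna().mean(),
}).sort_values("share_missing", ascending=False)

print(missing)
```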

In [10]:
# Question 10: Age distribution of passengers
titanic['age'].describe()

count    714.000000
mean      29.699118
std       14.526497
min        0.420000
25%       20.125000
50%       28.000000
75%       38.000000
max       80.000000
Name: age, dtype: float64
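
`describe()` summarises the distribution in a few numbers. For a view closer to a histogram, the ages can be binned and counted. The sketch below uses a hand-made series and arbitrarily chosen bin edges; both are assumptions for illustration, not values taken from the notebook.

```python
import pandas as pd

# Toy ages (made up); on the real data this would be titanic['age'].
ages = pd.Series([0.42, 4, 17, 22, 28, 35, 35, 54, 80], name="age")

# Illustrative bin edges; intervals are right-closed, e.g. (0, 12].
bins = [0, 12, 18, 40, 60, 80]
age_groups = pd.cut(ages, bins=bins)

# Count passengers per bin, keeping the bins in ascending order.
distribution = age_groups.value_counts().sort_index()

print(distribution)
```

Missing ages fall outside every bin and are left out of the counts by default.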

In [11]:
# Question 11: Survival rate based on fare quartiles
titanic['fare_quartile'] = pd.qcut(titanic['fare'], 4)
titanic.groupby('fare_quartile')['survived'].mean()

fare_quartile
(-0.001, 7.91]     0.197309
(7.91, 14.454]     0.303571
(14.454, 31.0]     0.454955
(31.0, 512.329]    0.581081
Name: survived, dtype: float64
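
The interval labels produced by `pd.qcut` can be hard to read in a report. As a hedged sketch on toy data (invented fares and outcomes), `qcut` accepts a `labels` argument, and passing `observed=True` to `groupby` restricts the result to categories that actually occur:

```python
import pandas as pd

# Toy fares and outcomes, made up for illustration.
toy = pd.DataFrame({
    "fare": [5.0, 7.0, 8.0, 10.0, 15.0, 20.0, 40.0, 100.0],
    "survived": [0, 0, 0, 1, 0, 1, 1, 1],
})

# Four equal-sized fare groups with short labels instead of intervals.
toy["fare_quartile"] = pd.qcut(toy["fare"], 4, labels=["Q1", "Q2", "Q3", "Q4"])

# Mean of the 0/1 'survived' flag is the survival rate per quartile.
rates = toy.groupby("fare_quartile", observed=True)["survived"].mean()

print(rates)
```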

In [12]:
# Question 12: Effect of cabin type on survival rate
# Seaborn's Titanic data has no 'cabin' column (hence the KeyError);
# its 'deck' column already holds the first letter of the cabin
titanic.groupby('deck')['survived'].mean()
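
Seaborn's copy of the data exposes the cabin letter as `deck` rather than a raw `cabin` string. If a raw cabin column is available from some other copy of the data (an assumption here, as are the sample values such as `C85`), the first letter can be extracted with `.str[0]`. The sketch reports group sizes next to the rates, since rates from very small groups are unreliable.

```python
import pandas as pd

# Toy frame with a hypothetical raw 'cabin' column; values are made up.
toy = pd.DataFrame({
    "cabin": ["C85", None, "C123", "E46", None, "B42"],
    "survived": [1, 0, 1, 0, 1, 1],
})

# First letter of the cabin string; missing cabins stay missing.
toy["cabin_type"] = toy["cabin"].str[0]

# Survival rate and number of passengers per cabin type
# (rows without a cabin are dropped by groupby).
summary = toy.groupby("cabin_type")["survived"].agg(["mean", "size"])

print(summary)
```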

In [13]:
# Question 13: Average age of passengers, grouped by survival and class
titanic.groupby(['survived', 'class'])['age'].mean()

survived  class 
0         First     43.695312
          Second    33.544444
          Third     26.555556
1         First     35.368197
          Second    25.901566
          Third     20.646118
Name: age, dtype: float64

In [14]:
# Question 14: Number of surviving and non-surviving passengers, grouped by gender and class
titanic.groupby(['sex', 'class', 'survived']).size()

sex     class   survived
female  First   0             3
                1            91
        Second  0             6
                1            70
        Third   0            72
                1            72
male    First   0            77
                1            45
        Second  0            91
                1            17
        Third   0           300
                1            47
dtype: int64
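
The same counts can be laid out with one column per outcome, which makes the survived and not-survived figures easier to compare side by side. A minimal sketch with `pd.crosstab` on toy data (invented rows; `table` is an illustrative name):

```python
import pandas as pd

# Toy passengers, made up for illustration.
toy = pd.DataFrame({
    "sex": ["female", "female", "male", "male", "male", "female"],
    "class": ["First", "Third", "First", "Third", "Third", "First"],
    "survived": [1, 0, 0, 0, 1, 1],
})

# Rows: (sex, class) pairs; columns: survived = 0 / 1; cells: passenger counts.
table = pd.crosstab([toy["sex"], toy["class"]], toy["survived"])

print(table)
```

Combinations that never occur in the data do not appear as rows; cells with no passengers are filled with 0.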

In [15]:
# Question 15: Most common embarkation port among survivors
titanic[titanic['survived'] == 1]['embark_town'].mode()[0]

'Southampton'

In [16]:
# Question 16: Proportion of passengers by gender
titanic['sex'].value_counts(normalize=True)

male      0.647587
female    0.352413
Name: sex, dtype: float64

In [17]:
# Question 17: Average fare and age, grouped by survival and gender
titanic.groupby(['survived', 'sex']).agg({'fare': 'mean', 'age': 'mean'})

| survived | sex | fare | age |
|---|---|---|---|
| 0 | female | 23.024385 | 25.046875 |
| 0 | male | 21.960993 | 31.618056 |
| 1 | female | 51.938573 | 28.847716 |
| 1 | male | 40.821484 | 27.276022 |


In [18]:
# Question 18: Passenger with the highest fare
titanic[titanic['fare'] == titanic['fare'].max()]

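|   | survived | pclass | sex | age | sibsp | parch | fare | embarked | class | who | adult_male | deck | embark_town | alive | alone | child | fare_quartile |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 258 | 1 | 1 | female | 35.0 | 0 | 0 | 512.3292 | C | First | woman | False | NaN | Cherbourg | yes | True | False | (31.0, 512.329] |
| 679 | 1 | 1 | male | 36.0 | 0 | 1 | 512.3292 | C | First | man | True | B | Cherbourg | yes | False | False | (31.0, 512.329] |
| 737 | 1 | 1 | male | 35.0 | 0 | 0 | 512.3292 | C | First | man | True | B | Cherbourg | yes | True | False | (31.0, 512.329] |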
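
Several passengers share the maximum fare, which is why the boolean mask is used here. As a sketch on toy data (invented names and fares), `idxmax` would return only the first of the tied rows, while the mask and `nlargest(..., keep="all")` return all of them:

```python
import pandas as pd

# Toy fares with a tie at the top; values are made up.
toy = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "fare": [10.0, 512.33, 30.0, 512.33],
})

# idxmax gives the label of the FIRST maximum only.
first_only = toy.loc[toy["fare"].idxmax()]

# Boolean mask keeps every row that equals the maximum.
all_tied = toy[toy["fare"] == toy["fare"].max()]

# nlargest with keep="all" also keeps ties for the top position.
top = toy.nlargest(1, "fare", keep="all")

print(first_only["name"], all_tied["name"].tolist(), top["name"].tolist())
```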


In [19]:
# Question 19: Total number of family members onboard for each passenger
titanic['family_size'] = titanic['sibsp'] + titanic['parch']
titanic['family_size'].head()

0    1
1    1
2    0
3    1
4    0
Name: family_size, dtype: int64

In [20]:
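# Question 20: Survival rate by family presence (with family vs. alone),
# a two-group summary of the number of family members onboard
# This overwrites seaborn's existing 'alone' column with a flag derived from family_size
titanic['alone'] = titanic['family_size'] == 0
titanic.groupby('alone')['survived'].mean()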

alone
False    0.505650
True     0.303538
Name: survived, dtype: float64
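
The alone/not-alone split collapses every family size above zero into one group. To answer the question per number of family members, one option is to group on `family_size` directly. The sketch below uses a hand-made toy frame (invented values), and the result column names `rate` and `passengers` are illustrative choices.

```python
import pandas as pd

# Toy passengers, made up for illustration.
toy = pd.DataFrame({
    "sibsp": [1, 0, 0, 1, 3, 0],
    "parch": [0, 0, 0, 2, 1, 1],
    "survived": [1, 0, 1, 1, 0, 1],
})

# Family members onboard = siblings/spouses + parents/children.
toy["family_size"] = toy["sibsp"] + toy["parch"]

# Survival rate and group size for each family size.
by_size = toy.groupby("family_size")["survived"].agg(rate="mean", passengers="size")

print(by_size)
```

Keeping the group size next to the rate helps flag family sizes with only a handful of passengers.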