# Titanic Data Analysis

In this analysis, we explore the Titanic dataset to find out which passenger characteristics were associated with survival. By examining features such as gender, age, passenger class, embarkation point, and family size, we aim to uncover key patterns in the data.

The specific questions we will address are:

1. What is the overall survival rate of passengers?
2. How does survival rate vary by passenger class?
3. What is the survival rate by gender?
4. How does age influence the survival rate?
5. What is the impact of embarking at different locations on survival?
6. How does the number of family members onboard (SibSp + Parch) influence survival?


In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load Titanic dataset
t = pd.read_csv('titanic.csv')

# Display the first few rows of the dataset
t.head()


## Data Exploration and Cleaning

Before diving into the analysis, we inspect the data to check for missing values and understand the structure of the dataset. We then fill missing ages with the mean age and missing embarkation points with the most common one; other columns are left untouched.


In [None]:
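# Check for missing values (print, since only a cell's last expression is displayed)
print(t.isnull().sum())

# Fill missing 'Age' with the mean age and missing 'Embarked' with the most common value.
# Assigning the result back avoids chained `inplace=True` calls, which recent pandas versions warn about.
t['Age'] = t['Age'].fillna(t['Age'].mean())
t['Embarked'] = t['Embarked'].fillna(t['Embarked'].mode()[0])

# Verify which columns still have missing values
t.isnull().sum()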
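
As a minimal sketch of the imputation step, the toy example below fills a small made-up table and compares the mean with the median, a common alternative that is less sensitive to extreme ages. The `impute` helper and the toy values are assumptions for illustration only; the notebook itself uses the mean.

```python
import numpy as np
import pandas as pd

# Toy data (made up for illustration), not real Titanic records
toy = pd.DataFrame({
    "Age": [22.0, np.nan, 38.0, 4.0, np.nan, 71.0],
    "Embarked": ["S", "C", None, "S", "Q", "S"],
})

def impute(df, age_strategy="mean"):
    """Return a copy with 'Age' and 'Embarked' filled (hypothetical helper)."""
    out = df.copy()
    if age_strategy == "mean":
        fill_age = out["Age"].mean()
    else:
        fill_age = out["Age"].median()
    out["Age"] = out["Age"].fillna(fill_age)
    # mode() ignores missing values; [0] picks the most frequent category
    out["Embarked"] = out["Embarked"].fillna(out["Embarked"].mode()[0])
    return out

mean_filled = impute(toy, "mean")
median_filled = impute(toy, "median")

print(mean_filled["Age"].tolist())
print(median_filled["Age"].tolist())
```

Because `impute` works on a copy, the original frame keeps its missing values, which makes it easy to compare strategies side by side.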


## Question 1: What is the overall survival rate of passengers?

Let's calculate the overall survival rate of passengers in the Titanic dataset.


In [None]:
# 'Survived' is coded 0/1, so its mean is the fraction of passengers who survived
survival_rate = t['Survived'].mean()
print(f"Overall survival rate: {survival_rate:.1%}")
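
The calculation relies on the mean of a 0/1 column being the same as survivors divided by total passengers. A small sketch with made-up outcomes (not taken from the dataset) illustrates the equivalence:

```python
import pandas as pd

# Made-up 0/1 outcomes for ten hypothetical passengers
survived = pd.Series([0, 1, 1, 0, 0, 0, 1, 0, 0, 1], name="Survived")

n_total = len(survived)
n_survived = int(survived.sum())

rate_from_counts = n_survived / n_total   # survivors / passengers
rate_from_mean = survived.mean()          # mean of the 0/1 column

print(f"{n_survived} of {n_total} survived ({rate_from_mean:.1%})")
```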


## Question 2: How does survival rate vary by passenger class?

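We will now examine how the survival rate changes by passenger class (Pclass). In the bar plot, each bar's height is the mean of Survived for that class, and the error bars show seaborn's default 95% confidence interval.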


In [None]:
# Survival rate by passenger class
plt.figure(figsize=(8, 6))
sns.barplot(x='Pclass', y='Survived', data=t)
plt.title('Survival Rate by Passenger Class')
plt.show()
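
A bar plot shows the rates visually but hides the group sizes. One possible complement is a small summary table, sketched here on made-up data; the `rate_by` helper is a hypothetical name, not part of the notebook.

```python
import pandas as pd

# Toy data (made up for illustration)
toy = pd.DataFrame({
    "Pclass":   [1, 1, 1, 2, 2, 3, 3, 3, 3, 3],
    "Survived": [1, 1, 0, 1, 0, 0, 0, 1, 0, 0],
})

def rate_by(df, column):
    """Group size, survivor count and survival rate per category (hypothetical helper)."""
    return df.groupby(column)["Survived"].agg(
        passengers="count",
        survivors="sum",
        rate="mean",
    )

summary = rate_by(toy, "Pclass")
print(summary)
```

The same helper could be pointed at other categorical columns such as `Sex` or `Embarked`.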



## Question 3: What is the survival rate by gender?

We will analyze how survival rates differ between male and female passengers.


In [None]:
# Survival rate by gender
plt.figure(figsize=(8, 6))
sns.barplot(x='Sex', y='Survived', data=t)
plt.title('Survival Rate by Gender')
plt.show()
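
Gender and passenger class may interact, so it can help to look at both at once. The sketch below uses made-up rows to show how a pivot table gives the survival rate for each combination; it is one possible follow-up, not something the notebook computes.

```python
import pandas as pd

# Toy data (made up for illustration)
toy = pd.DataFrame({
    "Sex":      ["female", "female", "female", "female", "male", "male", "male", "male"],
    "Pclass":   [1, 1, 3, 3, 1, 1, 3, 3],
    "Survived": [1, 1, 1, 0, 1, 0, 0, 0],
})

# Rows: gender, columns: passenger class, cells: mean of the 0/1 'Survived' column
rates = toy.pivot_table(index="Sex", columns="Pclass", values="Survived", aggfunc="mean")
print(rates)
```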



## Question 4: How does age influence the survival rate?

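Let's investigate the relationship between age and survival. We'll plot survival rates across different age groups. Note that passengers whose missing age was filled with the mean all land in the same age group, which can distort the rate for that group.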



In [None]:
# Create age bins; pd.cut uses right-inclusive intervals by default: (0, 12], (12, 18], ...
age_bins = [0, 12, 18, 30, 40, 50, 60, 100]
age_labels = ['0-12', '13-18', '19-30', '31-40', '41-50', '51-60', '60+']
t['AgeGroup'] = pd.cut(t['Age'], bins=age_bins, labels=age_labels)

# Survival rate by age group
plt.figure(figsize=(8, 6))
sns.barplot(x='AgeGroup', y='Survived', data=t)
plt.title('Survival Rate by Age Group')
plt.show()
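
To see how `pd.cut` treats values at the bin edges, the sketch below applies the same bins and labels to a few hand-picked ages (illustrative values, not rows from the dataset). With the default settings an age of exactly 0 falls outside the first interval and becomes missing; `include_lowest=True` is one way to keep it.

```python
import pandas as pd

age_bins = [0, 12, 18, 30, 40, 50, 60, 100]
age_labels = ['0-12', '13-18', '19-30', '31-40', '41-50', '51-60', '60+']

# Hand-picked ages that sit on or near bin edges
ages = pd.Series([0, 0.42, 12, 12.5, 60, 60.5])

# Default: intervals are open on the left and closed on the right
default_groups = pd.cut(ages, bins=age_bins, labels=age_labels)

# include_lowest=True also closes the first interval on the left
with_lowest = pd.cut(ages, bins=age_bins, labels=age_labels, include_lowest=True)

print(pd.DataFrame({
    "Age": ages,
    "default": default_groups,
    "include_lowest": with_lowest,
}))
```

Here an age of 12 lands in '0-12' while 12.5 lands in '13-18', and 60 lands in '51-60' while 60.5 lands in '60+'.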


## Question 5: What is the impact of embarking at different locations on survival?

Let's analyze the survival rate based on the embarkation location (Embarked). The codes stand for Cherbourg (C), Queenstown (Q), and Southampton (S).


In [None]:
# Survival rate by embarkation point
plt.figure(figsize=(8, 6))
sns.barplot(x='Embarked', y='Survived', data=t)
plt.title('Survival Rate by Embarkation Location')
plt.show()
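
Differences between embarkation points may partly reflect which passenger classes boarded at each one. A possible check, sketched here on made-up data, is to look at the class mix per embarkation point with a row-normalised cross-tabulation:

```python
import pandas as pd

# Toy data (made up for illustration)
toy = pd.DataFrame({
    "Embarked": ["C", "C", "C", "C", "S", "S", "S", "S", "Q", "Q"],
    "Pclass":   [1, 1, 1, 3, 1, 3, 3, 3, 3, 3],
})

# normalize="index" turns each row into shares that sum to 1
class_mix = pd.crosstab(toy["Embarked"], toy["Pclass"], normalize="index")
print(class_mix.round(2))
```

If one embarkation point has a much larger share of first-class passengers, its higher survival rate may be driven by class rather than by the port itself.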


## Question 6: How does the number of family members onboard (SibSp + Parch) influence survival?

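Here we examine how the number of family members aboard relates to survival. Family size is defined as SibSp (siblings and spouses) plus Parch (parents and children), so a passenger travelling alone has a family size of 0.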


In [None]:
# Create a new column for family size
t['FamilySize'] = t['SibSp'] + t['Parch']

# Survival rate by family size
plt.figure(figsize=(8, 6))
sns.barplot(x='FamilySize', y='Survived', data=t)
plt.title('Survival Rate by Family Size')
plt.show()
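
Large family sizes usually contain few passengers, so their bars can be noisy. One option is to group family size into broader categories before comparing rates. The sketch below uses made-up rows, and the cut-offs (alone, 1-2, 3 or more) are an assumption chosen for illustration.

```python
import pandas as pd

# Toy data (made up for illustration)
toy = pd.DataFrame({
    "SibSp":    [0, 1, 0, 1, 3, 0, 4, 0],
    "Parch":    [0, 0, 2, 1, 2, 0, 2, 1],
    "Survived": [0, 1, 1, 0, 0, 1, 0, 1],
})
toy["FamilySize"] = toy["SibSp"] + toy["Parch"]

# Assumed cut-offs: 0 -> Alone, 1-2 -> Small, 3 or more -> Larger
toy["FamilyGroup"] = pd.cut(
    toy["FamilySize"],
    bins=[-1, 0, 2, 20],
    labels=["Alone", "Small (1-2)", "Larger (3+)"],
)

summary = toy.groupby("FamilyGroup", observed=False)["Survived"].agg(
    passengers="count",
    rate="mean",
)
print(summary)
```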


## Conclusion

Based on our analysis, we observed that:

- **Overall Survival Rate**: Around 38% of passengers survived.
- **Passenger Class**: First-class passengers had the highest survival rate.
- **Gender**: Women had a significantly higher survival rate than men.
- **Age**: Children had the highest survival rate, while adults in the 31-40 age group had the lowest.
- **Embarkation Location**: Passengers embarking from Cherbourg had the highest survival rate.
- **Family Size**: Passengers with a small family size (1-2 members) had a slightly higher survival rate.

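These findings show which passenger characteristics were associated with survival during the Titanic disaster. Because factors such as class, gender, and age overlap, they describe patterns in the data rather than isolate individual causes.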
