# <h1><center>**Titanic Analysis**</center></h1>
<h1><center>*************** Data exploration Checkpoint ***************</center></h1>

Data set link: [Titanic Data Set](https://drive.google.com/file/d/1YdbRKJZ0Kz742yDxIStLZIPIEUGlc1Cc/view?usp=sharing)

Data set information:

The titanic and titanic2 data frames describe the survival status of individual passengers
on the Titanic. The titanic data frame does not contain information from the crew, but it
does contain actual ages of half of the passengers. The principal source for data about
Titanic passengers is the Encyclopedia Titanica. The datasets used here were begun by
a variety of researchers. One of the original sources is Eaton & Haas (1994) Titanic:
Triumph and Tragedy, Patrick Stephens Ltd, which includes a passenger list created by
many researchers and edited by Michael A. Findlay.

More information is available at this [link](http://campus.lakeforest.edu/frank/FILES/MLFfiles/Bio150/Titanic/TitanicMETA.pdf)

<img src = "https://mcdn.wallpapersafari.com/medium/9/99/g7mtvV.jpg" class="Center"  width="900" height="420">

**Columns description:**

  PassengerId: Passenger identifier

  Pclass: Passenger Class (1 = 1st; 2 = 2nd; 3 = 3rd)

  Survived: Survival (No / Yes in the raw file; encoded as 0 / 1 in the feature engineering step)

  Name: Name

  Sex: Sex

  Age: Age

  SibSp: Number of Siblings/Spouses Aboard

  Parch: Number of Parents/Children Aboard

  Ticket: Ticket Number

  Fare: Passenger Fare (British pound)

  Cabin: Cabin

  Embarked: Port of Embarkation (C = Cherbourg; Q = Queenstown; S = Southampton)

  

The analysis is structured as follows:
  1. Data Ingestion
  2. Descriptive Statistics and Initial Data Exploration
  3. Data Cleaning and Preprocessing
  4. Data Visualization and Analysis

In [None]:
import pandas as pd
import plotly.express as px
from sklearn.preprocessing import LabelEncoder

# Data Ingestion

In [None]:
data = pd.read_csv('/titanic-passengers.csv', sep=';')
data.head()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,343,No,2,"Collander, Mr. Erik Gustaf",male,28.0,0,0,248740,13.0,,S
1,76,No,3,"Moen, Mr. Sigurd Hansen",male,25.0,0,0,348123,7.65,F G73,S
2,641,No,3,"Jensen, Mr. Hans Peder",male,20.0,0,0,350050,7.8542,,S
3,568,No,3,"Palsson, Mrs. Nils (Alma Cornelia Berglund)",female,29.0,0,4,349909,21.075,,S
4,672,No,1,"Davidson, Mr. Thornton",male,31.0,1,0,F.C. 12750,52.0,B71,S


# Descriptive Statistics and Initial Data Exploration

In [None]:
data.describe()

Unnamed: 0,PassengerId,Pclass,Age,SibSp,Parch,Fare
count,891.0,891.0,714.0,891.0,891.0,891.0
mean,446.0,2.308642,29.699118,0.523008,0.381594,32.204208
std,257.353842,0.836071,14.526497,1.102743,0.806057,49.693429
min,1.0,1.0,0.42,0.0,0.0,0.0
25%,223.5,2.0,20.125,0.0,0.0,7.9104
50%,446.0,3.0,28.0,0.0,0.0,14.4542
75%,668.5,3.0,38.0,1.0,0.0,31.0
max,891.0,3.0,80.0,8.0,6.0,512.3292


In [None]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   PassengerId  891 non-null    int64  
 1   Survived     891 non-null    object 
 2   Pclass       891 non-null    int64  
 3   Name         891 non-null    object 
 4   Sex          891 non-null    object 
 5   Age          714 non-null    float64
 6   SibSp        891 non-null    int64  
 7   Parch        891 non-null    int64  
 8   Ticket       891 non-null    object 
 9   Fare         891 non-null    float64
 10  Cabin        204 non-null    object 
 11  Embarked     889 non-null    object 
dtypes: float64(2), int64(4), object(6)
memory usage: 83.7+ KB


In [None]:
data.isnull().mean()

PassengerId    0.000000
Survived       0.000000
Pclass         0.000000
Name           0.000000
Sex            0.000000
Age            0.198653
SibSp          0.000000
Parch          0.000000
Ticket         0.000000
Fare           0.000000
Cabin          0.771044
Embarked       0.002245
dtype: float64
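
The missing-value shares above are easier to act on when counts and percentages sit side by side. The following is a minimal sketch on a hypothetical four-row frame (`sample` and its values are illustrative, not taken from the dataset); the same expression should work on `data`.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature frame standing in for `data`
sample = pd.DataFrame({
    "Age": [22.0, np.nan, 35.0, np.nan],
    "Cabin": [np.nan, np.nan, np.nan, "C85"],
    "Fare": [7.25, 71.28, 8.05, 53.10],
})

# Count and percentage of missing values per column, most affected first
missing = pd.DataFrame({
    "missing_count": sample.isnull().sum(),
    "missing_pct": (sample.isnull().mean() * 100).round(1),
}).sort_values("missing_pct", ascending=False)

print(missing)
```

Sorting by share makes it obvious which columns are candidates for dropping (here `Cabin`) and which for imputation.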

# Data Cleaning and Preprocessing

## Handling the missing values

In [None]:
# Create a histogram plot using Plotly
fig = px.histogram(data, x='Age', nbins=10, title='Age Distribution on Titanic')

# Show the plot
fig.show()

In [None]:
data['Embarked'].mode()

0    S
Name: Embarked, dtype: object

In [None]:
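# Assign the filled columns back: calling fillna(inplace=True) on a column
# selection triggers warnings in recent pandas and may not modify `data`
data['Age'] = data['Age'].fillna(data['Age'].mean())
most_frequent_value = data['Embarked'].mode()[0]
data['Embarked'] = data['Embarked'].fillna(most_frequent_value)
# Cabin is about 77% missing, so the column is dropped rather than imputed
data.drop(columns=['Cabin'], inplace=True)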
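
Filling `Age` with the mean keeps the column mean unchanged but narrows its spread, because every imputed row lands on the same value. The sketch below illustrates this on a hypothetical six-value series (the numbers are made up for the example) and shows median imputation as a possible alternative; it is a comparison aid, not a replacement for the cell above.

```python
import numpy as np
import pandas as pd

# Hypothetical ages with two missing entries
ages = pd.Series([10.0, 20.0, 30.0, 80.0, np.nan, np.nan], name="Age")

mean_filled = ages.fillna(ages.mean())      # the approach used in this notebook
median_filled = ages.fillna(ages.median())  # alternative, less sensitive to outliers

# The mean is preserved, but the standard deviation shrinks
print("mean before/after:", ages.mean(), mean_filled.mean())
print("std  before/after:", round(ages.std(), 2), round(mean_filled.std(), 2))
print("median-filled median:", median_filled.median())
```

The reduced spread is worth remembering when reading the age histogram or the age-group charts, since the imputed passengers all fall into a single bin.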

## Feature engineering

In [None]:
encoder = LabelEncoder()
data['Survived'] = encoder.fit_transform(data['Survived'])
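
`LabelEncoder` assigns integers to the sorted class labels, so `No` becomes 0 and `Yes` becomes 1. An explicit mapping makes that correspondence visible in the code. The sketch below uses a hypothetical five-value series and plain pandas; the `mapping` dictionary is an illustrative name, not part of the notebook.

```python
import pandas as pd

# Hypothetical sample of the raw Survived column
survived = pd.Series(["No", "Yes", "No", "Yes", "Yes"], name="Survived")

# Explicit mapping: documents which label becomes which integer
mapping = {"No": 0, "Yes": 1}
encoded = survived.map(mapping)

# Labels missing from the mapping would become NaN, so check for them
unmapped = int(encoded.isnull().sum())
print(encoded.tolist(), "unmapped:", unmapped)
```

With an explicit mapping, an unexpected label (for example a misspelled value) shows up as a missing value instead of silently receiving its own integer.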

In [None]:
data['AgeGroup'] = pd.cut(data.Age, [0, 18, 30, 50, 80])
survival_age = data.groupby(["Sex", 'AgeGroup'])["Survived"].mean().reset_index()

survival_age['AgeGroup'] = survival_age['AgeGroup'].astype(str)
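
By default `pd.cut` builds right-closed intervals, so the bins `[0, 18, 30, 50, 80]` produce the groups (0, 18], (18, 30], (30, 50] and (50, 80]. The sketch below checks the edge cases on a few hypothetical ages; the label strings in the second call are illustrative and are not used elsewhere in the notebook.

```python
import pandas as pd

# Hypothetical ages chosen to sit on or near the bin edges
ages = pd.Series([0.42, 18.0, 18.5, 30.0, 50.0, 80.0])
bins = [0, 18, 30, 50, 80]

# Default right=True: each bin includes its right edge and excludes its left edge
groups = pd.cut(ages, bins)
print(groups.astype(str).tolist())

# Optional readable labels (illustrative names)
labeled = pd.cut(ages, bins, labels=["0-18", "18-30", "30-50", "50-80"])
print(labeled.tolist())
```

An age of exactly 18 therefore belongs to the first group, and an age of exactly 0 would fall outside all bins unless `include_lowest=True` is passed.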

# Data Visualization and Analysis

## Univariate analysis

In [None]:
survived_people = len(data[data['Survived']==1])
dead_people = len(data[data['Survived']==0])
fig = px.pie(names=['Survived', 'Deaths'], values=[survived_people, dead_people], title="Survivors and Deaths")
fig.update_traces(textposition='inside', textinfo='percent+label')
fig.update_layout(uniformtext_minsize=16, uniformtext_mode='hide')
fig.show()
#Chart showing distribution of survivors and deaths

This pie chart portrays the destiny of Titanic passengers. 61.6% did not survive the tragedy, while 38.4% managed to survive, highlighting the importance of safety measures and preparedness in critical situations.
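
The same percentages can be obtained without a chart through `value_counts(normalize=True)`. The sketch below rebuilds the column from the counts implied by the percentages above for 891 passengers (549 and 342, an assumption derived from those figures rather than read from the file).

```python
import pandas as pd

# Counts implied by 61.6% / 38.4% of 891 passengers (assumption)
survived = pd.Series([0] * 549 + [1] * 342, name="Survived")

# Share of each outcome, in percent
shares = survived.value_counts(normalize=True).mul(100).round(1)
print(shares)
```

On the notebook's own frame the equivalent call would be `data['Survived'].value_counts(normalize=True)`.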

In [None]:
fig1 = px.histogram(data, x='Pclass', nbins =3, color="Pclass", text_auto=True)
fig1.update_layout(bargap=0.4)
fig1.show()

#Passenger class distribution

This histogram displays the distribution of Titanic passengers among the three classes:

First Class: 216 passengers \
Second Class: 184 passengers \
Third Class: 491 passengers

Third class accounts for more than half of the 891 passengers, a socioeconomic imbalance that may have influenced how the disaster unfolded for different groups.


## Bivariate analysis

In [None]:
Survived_gender= (data.groupby("Sex")["Survived"].mean()*100)

fig2 = px.pie(names=Survived_gender.index, values=Survived_gender.values,
             title='Survival rate based on gender',
             labels={'names': 'Gender', 'values': 'Survival Rate'},
             color_discrete_sequence=['lightpink','lightblue'],
             hover_data=[Survived_gender.values])

fig2.update_traces(textposition='inside', textinfo='percent+label')
fig2.update_layout(uniformtext_minsize=16, uniformtext_mode='hide')
fig2.show()

##survival rate based on gender


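This pie chart compares the survival rates of female and male Titanic passengers. Because a pie chart rescales its values to sum to 100%, each slice shows a gender's share of the two rates combined, not the survival rate itself:

Female: 79.7% of the combined rate \
Male: 20.3% of the combined rate

The underlying rates are the values stored in `Survived_gender`; the female rate is roughly four times the male rate. This disparity sheds light on societal norms and evacuation dynamics during the disaster.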
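
A pie chart rescales its inputs to sum to 100%, so feeding it two survival rates displays each rate's share of their sum rather than the rate itself. The sketch below works through that arithmetic. The group sizes and survivor counts are an assumption: they are the figures usually reported for the 891-row Titanic training file and are not printed in this notebook, so they should be checked against `data`.

```python
import pandas as pd

# Assumed counts per gender (verify against the actual data)
counts = pd.DataFrame(
    {"passengers": [314, 577], "survivors": [233, 109]},
    index=pd.Index(["female", "male"], name="Sex"),
)

# Survival rate within each gender, in percent
rate = counts["survivors"] / counts["passengers"] * 100

# What a pie chart of those two rates displays: each rate's share of their sum
share_of_rates = rate / rate.sum() * 100

print(rate.round(1))
print(share_of_rates.round(1))
```

Under these assumed counts the rates come out near 74% and 19%, while the pie shares are 79.7% and 20.3%. A bar chart of `rate` would show the rates directly.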

In [None]:
#survival rate for each port departure
survival_port= data.groupby("Embarked")["Survived"].mean()
survival_port

Embarked
C    0.553571
Q    0.389610
S    0.339009
Name: Survived, dtype: float64

In [None]:
# Build the chart data from survival_port (computed in the previous cell)
# instead of hard-coding the rates
df1 = survival_port.reset_index()
df1.columns = ['Port Departure', 'Survival Rate']
fig5 = px.bar(df1, x='Port Departure', y='Survival Rate', labels={'Survival Rate': 'Mean Survival Rate'},
             title='Survival Rate for Each Port Departure')
fig5.show()

Cherbourg ('C') Departure: Mean Survival Rate of 0.55 \
Queenstown ('Q') Departure: Mean Survival Rate of 0.39 \
Southampton ('S') Departure: Mean Survival Rate of 0.34

Implications:

Departure Port Influence: Cherbourg passengers had the highest mean survival rate, followed by Queenstown and Southampton.

Socioeconomic Factors: Socioeconomic disparities or boarding sequences could have contributed to the survival rate differences.

Evacuation Dynamics: Variations in evacuation procedures or boarding practices may have affected survival rates.
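
One way to probe the socioeconomic explanation is to look at the class mix of each port. The sketch below uses `pd.crosstab` on a hypothetical ten-row sample (the rows are invented for illustration and do not reflect the real proportions); on the notebook's frame the call would take `data['Embarked']` and `data['Pclass']`.

```python
import pandas as pd

# Hypothetical miniature sample standing in for `data`
sample = pd.DataFrame({
    "Embarked": ["C", "C", "C", "Q", "Q", "S", "S", "S", "S", "S"],
    "Pclass":   [1,   1,   3,   3,   3,   1,   2,   3,   3,   3],
})

# Row-normalized table: each row gives the class composition of one port
class_mix = pd.crosstab(sample["Embarked"], sample["Pclass"], normalize="index")
print(class_mix.round(2))
```

If a port with a high survival rate also carries a large share of first-class passengers, class rather than the port itself may explain much of the difference.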

## Multivariate analysis

In [None]:
# Aggregate first: px.bar does not average, so on the raw rows it would draw
# one bar segment per passenger instead of the group survival rate
survival_class = data.groupby(['Pclass', 'Sex'])['Survived'].mean().reset_index()
fig3 = px.bar(survival_class, x='Pclass', y='Survived', color='Sex', barmode='group',
             title='Survival rate based on class and gender',
             labels={'Pclass': 'Passenger Class', 'Survived': 'Survival Rate'},
             color_discrete_map={'female': 'lightpink', 'male': 'lightblue'})
fig3.show()
##survival based on passenger class and gender

The chart categorizes survival rates by passenger class and color-codes them by gender:

**First Class:**
Male: about 37%
Female: about 97%

**Second Class:**
Male: about 16%
Female: about 92%

**Third Class:**
Male: about 14%
Female: about 50%

Implications:

Gender Disparity: Survival rates consistently favored females across all classes.\
Class Effect: First-class passengers had the highest survival rates within each gender.\
Complex Interaction: The chart underscores the interplay between gender and class in determining survival outcomes.
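
To keep survivor counts and survival rates apart, both can be tabulated per class and sex. The sketch below starts from aggregated counts. These counts are an assumption: they are the figures usually reported for the 891-row Titanic training file (their class totals match the 216 / 184 / 491 split shown earlier) and should be verified with `data.groupby(['Pclass', 'Sex'])['Survived'].agg(['size', 'sum'])`.

```python
import pandas as pd

# Assumed passengers and survivors per class and sex (verify against the data)
groups = pd.DataFrame({
    "Pclass": [1, 1, 2, 2, 3, 3],
    "Sex": ["female", "male"] * 3,
    "passengers": [94, 122, 76, 108, 144, 347],
    "survivors": [91, 45, 70, 17, 72, 47],
})

# Survival rate in percent for each class/sex combination
groups["rate_pct"] = (groups["survivors"] / groups["passengers"] * 100).round(1)

# One row per class, one column per sex
rates = groups.pivot(index="Pclass", columns="Sex", values="rate_pct")
print(rates)
```

Reading the table by column shows the class effect within each gender; reading it by row shows the gender gap within each class.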

In [None]:
fig4 = px.bar(survival_age, x='AgeGroup', y='Survived', color='Sex', barmode='group',
             labels={'AgeGroup': 'Age Group', 'Survived': 'Survival Rate'},
             title='Survival rate based on age and sex distribution',
             color_discrete_sequence=['lightpink','lightblue'])
fig4.show()
##survival rate based on age and sex

The chart categorizes survival rates by age group and color-codes them by gender. The groups are the right-closed intervals produced by `pd.cut`:

**Age (0, 18]**:
Female: 67% survival rate,
Male: 38% survival rate

**Age (18, 30]**:
Female: 72% survival rate,
Male: 14% survival rate

**Age (30, 50]**:
Female: 77% survival rate,
Male: 22% survival rate

**Age (50, 80]**:
Female: 94% survival rate,
Male: 12% survival rate

Implications:

*Age-Gender Dynamics*:
The chart reveals significant age and gender disparities in survival rates across different age groups.

*Female Survival Advantage:*
Females consistently had higher survival rates across all age groups, suggesting a prioritization of women in evacuation procedures.

*Age-Based Patterns:*
Survival rates varied across age groups. Among females they rose steadily with age, from 67% in the youngest group to 94% in the oldest; among males the youngest group had the highest rate (38%).

# Conclusion

The analysis of the Titanic dataset reveals a tapestry of intertwined factors that shaped the disaster's outcome. Visualizations depict the social dynamics of class, gender, and age, offering a glimpse into the past and the intricate web of decisions and circumstances that played a role.

Beyond the numbers, these visualizations evoke the societal norms and inequalities of the time. The disparities in survival rates, whether based on class, gender, or age, reflect the complex interplay of privilege and vulnerability.

"The Titanic stands as a haunting reminder of human overconfidence and its devastating aftermath." This quote encapsulates the broader lesson: disasters amplify social divides. It's a call to foster social equity, prioritize safety preparedness, and challenge the norms that perpetuate inequality.

In the modern era, the Titanic's story reverberates, urging us to consider how our own societies handle crises. Just as then, we're reminded to value every life, promote fairness, and ensure access to resources for all, as we navigate the unpredictable waters of our own times.