# Python Programming - In-Class Assignment


Step 0: Import necessary libraries

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

# Load the 'titanic' dataset from seaborn
titanic_data = sns.load_dataset('titanic')

titanic_data.head()

Task 1: Calculate overall survival rate

In [None]:
# Calculate overall survival rate
overall_survival_rate = titanic_data['survived'].mean() * 100
print(f'Overall Survival Rate: {overall_survival_rate:.2f}%')

Task 2: Calculate and print the survival rates by gender ('sex'), passenger class ('class'), and embarkation point ('embarked').



In [None]:
# Calculate survival rates by gender
survival_rate_by_gender = titanic_data.groupby('sex')['survived'].mean() * 100
print('Survival Rate by Gender:')
print(survival_rate_by_gender)

survival_rate_by_gender.plot(kind='bar')

In [None]:
# Calculate survival rates by passenger class
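# 'class' is a categorical column; observed=False keeps every category and
# avoids the FutureWarning that recent pandas versions raise when it is omitted
survival_rate_by_class = titanic_data.groupby('class', observed=False)['survived'].mean() * 100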
print('Survival Rate by Passenger Class:')
print(survival_rate_by_class)

survival_rate_by_class.plot(kind='bar', color='green')

In [None]:
# Calculate survival rates by embarkation point
survival_rate_by_embarked = titanic_data.groupby('embarked')['survived'].mean() * 100
print('Survival Rate by Embarkation Point:')
print(survival_rate_by_embarked)

survival_rate_by_embarked.plot(kind='bar', color='red')

Task 3: Calculate the survival rate by age group (e.g., 12 and under as child, 13-20 as teen, 21-40 as adult, 41-60 as middle-aged, 61 and above as senior).

In [None]:
# Bin ages into groups; pd.cut intervals are right-closed: (0, 12], (12, 20], (20, 40], (40, 60], (60, inf)
bins = [0, 12, 20, 40, 60, np.inf]
labels = ['Child', 'Teen', 'Adult', 'Middle-Aged', 'Senior']
titanic_data['age_group'] = pd.cut(titanic_data['age'], bins=bins, labels=labels)

# Calculate survival rates by age group
# 'age_group' is categorical; observed=False keeps every label in the result
survival_rate_by_age_group = titanic_data.groupby('age_group', observed=False)['survived'].mean() * 100
print('Survival Rate by Age Group:')
print(survival_rate_by_age_group)

survival_rate_by_age_group.plot(kind='bar', color='orange')
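
The bin edges decide which group a boundary age lands in. As a quick check, the sketch below runs `pd.cut` with the same `bins` and `labels` on a handful of made-up boundary ages (not rows from the dataset). It relies on `pd.cut`'s default of right-closed intervals, so 12 counts as a child and 20 as a teen, and missing ages stay missing.

```python
import numpy as np
import pandas as pd

# Same bin edges and labels as the cell above
bins = [0, 12, 20, 40, 60, np.inf]
labels = ['Child', 'Teen', 'Adult', 'Middle-Aged', 'Senior']

# Made-up boundary ages for illustration only (not taken from the dataset)
boundary_ages = pd.Series([0.5, 12, 13, 20, 21, 60, 61, np.nan])

# Intervals are right-closed by default: (0, 12], (12, 20], ...
groups = pd.cut(boundary_ages, bins=bins, labels=labels)

print(pd.DataFrame({'age': boundary_ages, 'age_group': groups}))
```

Passengers with a missing age get no group, so they are left out of the age-group survival rates.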


Task 4: Create a box plot for age distribution by survival.

In [None]:
# Create a box plot for age distribution by survival
plt.figure(figsize=(10, 6))
sns.boxplot(x='survived', y='age', data=titanic_data)
plt.title('Age Distribution of Survivors vs. Non-Survivors')
plt.xlabel('Survived')
plt.ylabel('Age')
plt.xticks([0, 1], ['Did Not Survive', 'Survived'])  # Rename x-ticks for clarity
plt.show()

Task 5: Create a stacked bar chart to visualize the distribution of Titanic passengers by class, using different colors to represent survival status within each class.

In [None]:
# Count passengers in each class, split by survival status
class_survival_dist = titanic_data.groupby(['class', 'survived'], observed=False).size().unstack()

# Plot the stacked bar chart
class_survival_dist.plot(kind='bar', stacked=True)
plt.title('Distribution of Passengers by Class and Survival')
plt.ylabel('Number of Passengers')
plt.xlabel('Class')
plt.xticks(rotation=0)
plt.show()

**Open-Ended Questions:** From our analysis, some more in-depth questions can be raised. Please discuss and attempt to answer them by using Python to perform data analysis and visualizations to support your claims. Keep in mind that there are no definitive answers to these questions.

Question 1: Given, from task 2, the survival rates by embarkation point are as follows:

| Embarkation Point | Survival Rate (%) |
|-------------------|--------------------|
| C                 | 55.36              |
| Q                 | 38.96              |
| S                 | 33.70              |

Does this indicate that, if you were boarding the Titanic, you should choose port C as your embarkation point to increase your chances of survival?

No. The table shows a correlation, not causation. Other factors, such as gender, passenger class, strength, and fitness, can correlate with both the embarkation point and survival, so the port alone cannot be credited with the difference.
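
One way to probe this is to check whether embarkation point is mixed up with another factor such as passenger class. The sketch below is only an illustration under stated assumptions: `survival_rate_table` is a hypothetical helper, and the `toy` frame holds made-up rows that copy the column names of `titanic_data` so the block runs on its own.

```python
import pandas as pd

def survival_rate_table(df, keys):
    """Survival rate (%) and passenger count for each combination of `keys`."""
    grouped = df.groupby(keys, observed=True)['survived']
    return pd.DataFrame({
        'survival_rate_pct': grouped.mean() * 100,
        'passengers': grouped.size(),
    })

# Made-up toy rows (NOT real Titanic records), used only to keep the sketch self-contained
toy = pd.DataFrame({
    'embarked': ['C', 'C', 'C', 'S', 'S', 'S', 'S', 'Q'],
    'class':    ['First', 'First', 'Third', 'Third', 'Third', 'First', 'Third', 'Third'],
    'survived': [1, 1, 0, 0, 0, 1, 1, 0],
})

# Share of each class within each port: a port with more first-class
# passengers may show a higher raw survival rate for that reason alone
class_mix = pd.crosstab(toy['embarked'], toy['class'], normalize='index') * 100

# Survival rate within each (port, class) cell, which holds class fixed
by_port_and_class = survival_rate_table(toy, ['embarked', 'class'])

print(class_mix.round(1))
print(by_port_and_class.round(1))
```

To run it on the real data, pass `titanic_data` in place of `toy`. If the gap between ports shrinks once class is held fixed, the port itself is probably not the cause.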

In [None]:
# Optional: if you have code to support your Q1 discussion, please add it here.

Question 2: Given, from task 3, the survival rates by age group are as follows:

| Age Group   | Survival Rate (%) |
|-------------|-------------------|
| Child       | 57.97             |
| Teen        | 38.18             |
| Adult       | 39.74             |
| Middle-Aged | 39.06             |
| Senior      | 22.73             |

Does this indicate that senior passengers were less taken care of?

No. The table shows a correlation, not causation. Seniors may have been weaker and less fit, which would lower their odds regardless of how they were treated, while children may have been prioritized during the evacuation ("Women and children first!").
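
Before reading much into the senior rate, it helps to see how many passengers each group contains, since a rate computed from a small group is noisy. The sketch below is a rough illustration, not a definitive analysis: `rate_with_uncertainty` is a hypothetical helper, the `toy` rows are made up, and the standard error uses the simple binomial formula sqrt(p(1 - p) / n).

```python
import numpy as np
import pandas as pd

def rate_with_uncertainty(df, key):
    """Survival rate (%) per group, with group size and binomial standard error."""
    grouped = df.groupby(key, observed=True)['survived']
    n = grouped.size()
    p = grouped.mean()
    std_error = np.sqrt(p * (1 - p) / n)
    return pd.DataFrame({
        'n': n,
        'survival_rate_pct': p * 100,
        'std_error_pct': std_error * 100,
    })

# Made-up toy rows (NOT real Titanic records), used only to keep the sketch self-contained
toy = pd.DataFrame({
    'age': [5, 10, 30, 35, 70, 65, 80, 62],
    'survived': [1, 1, 1, 0, 0, 0, 1, 0],
})

# Same bins and labels as in Task 3
bins = [0, 12, 20, 40, 60, np.inf]
labels = ['Child', 'Teen', 'Adult', 'Middle-Aged', 'Senior']
toy['age_group'] = pd.cut(toy['age'], bins=bins, labels=labels)

summary = rate_with_uncertainty(toy, 'age_group')
print(summary.round(1))
```

To run it on the real data, call `rate_with_uncertainty(titanic_data, 'age_group')`. A large standard error next to a small `n` suggests the group's rate should be read with caution.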

In [1]:
# Optional: if you have code to support your Q2 discussion, please add it here.
