<font color='darkorange'> Unless otherwise noted, **this notebook will not be reviewed or autograded.**</font> You are welcome to use it for scratchwork, but **only the files listed in the exercises will be checked.**

---

# Exercises

For these exercises, add your functions to the *apputil\.py* file and *app\.py* file as instructed. *These exercises use the same [Titanic dataset](https://www.kaggle.com/competitions/titanic/data) as the lab.*


## Exercise 1: Survival Patterns


For this exercise you will analyze survival patterns on the Titanic by looking at passenger class, sex, and age group. Name the function `survival_demographics()`.

1. Create a new column in the Titanic dataset that classifies passengers into age categories (i.e., a pandas `category` series). The categories should be:
    - Child (up to 12)
    - Teen (13–19)
    - Adult (20–59)
    - Senior (60+)  
  
	Hint: The `pd.cut()` function might come in handy here.

2. Group the passengers by class, sex, and age group.  

3. For each group, calculate:  
    - The total number of passengers, `n_passengers`
    - The number of survivors, `n_survivors`
    - The survival rate, `survival_rate`

4. Return a table that includes the results for *all* combinations of class, sex, and age group.  

5. Order the results so they are easy to interpret.  

6. Come up with a clear question that your results table makes you curious about (e.g., “Did women in first class have a higher survival rate than men in other classes?”). Write this question in your `app.py` file above the call to your visualization function, using `st.write("Your Question Here")`.
   
7. Create a Plotly visualization in a function named `visualize_demographic()` that directly addresses your question by returning a Plotly figure (e.g., `fig = px. ...`). You are free to choose the chart type that you think best communicates the findings. Be creative — try different approaches, compare them, and ensure that your chart clearly answers the question you posed.
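
Steps 1 to 5 above can be sketched roughly as follows. This is only one possible shape for `survival_demographics()`, not a reference solution: the `df` parameter, the `age_group` column name, the bin edges, and the small `demo` frame are assumptions made for illustration. Because `pd.cut` intervals are right-closed by default, the edges `[0, 12, 19, 59, inf]` put age 12 in Child, 19 in Teen, and 59 in Adult; fractional ages such as 19.5 fall into the next group up.

```python
import pandas as pd


def survival_demographics(df: pd.DataFrame) -> pd.DataFrame:
    """Summarize survival by class, sex, and age group (illustrative sketch)."""
    df = df.copy()

    # Right-closed bins: (0, 12], (12, 19], (19, 59], (59, inf]
    df["age_group"] = pd.cut(
        df["Age"],
        bins=[0, 12, 19, 59, float("inf")],
        labels=["Child", "Teen", "Adult", "Senior"],
    )

    # observed=False keeps combinations that have no passengers;
    # rows with a missing Age have no age group and are dropped by groupby.
    summary = (
        df.groupby(["Pclass", "Sex", "age_group"], observed=False)
        .agg(n_passengers=("Survived", "size"), n_survivors=("Survived", "sum"))
        .reset_index()
    )
    summary["survival_rate"] = summary["n_survivors"] / summary["n_passengers"]

    return summary.sort_values(["Pclass", "Sex", "age_group"]).reset_index(drop=True)


# Hypothetical toy data, only to show the call
demo = pd.DataFrame({
    "Pclass": [1, 1, 3, 3, 3],
    "Sex": ["female", "male", "male", "female", "female"],
    "Age": [10, 19.5, 60, 30, None],
    "Survived": [1, 0, 1, 0, 1],
})
result = survival_demographics(demo)
print(result)
```

Combinations with no passengers stay in the table, and their `survival_rate` is undefined (NaN) because there is nothing to divide by.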


In [15]:
import pandas as pd

# Read in the Titanic dataset
df = pd.read_csv('https://raw.githubusercontent.com/leontoddjohnson/datasets/main/data/titanic.csv')

# Use pd.cut to create age groups based on the Age column.
# pd.cut intervals are right-closed by default, so these bins are
# (0, 12], (12, 20], (20, 60], (60, 120]: age 20 lands in "teen" and age 60 in "adult".
df['Age_Group'] = pd.cut(df['Age'], bins=[0, 12, 20, 60, 120], labels=['child', 'teen', 'adult', 'senior'])

# Helper column of ones, so that summing it gives the number of passengers
df['Count'] = 1

df.head()



|   | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | Age_Group | Count |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.25 |  | S | adult | 1 |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C | adult | 1 |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925 |  | S | adult | 1 |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1 | C123 | S | adult | 1 |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.05 |  | S | adult | 1 |


In [30]:
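# Group the passengers by class, sex, and age group.
# observed=False keeps every class/sex/age-group combination; passing it explicitly
# avoids the FutureWarning that newer pandas versions raise about the changing default.
df_grouped = df.groupby(['Pclass', 'Sex', 'Age_Group'], observed=False).agg({
    'Survived': 'sum',
    'Count': 'sum'
}).reset_index()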

# Calculate survival percentage
df_grouped['Survival_Percentage'] = ((df_grouped['Survived'] / df_grouped['Count']) * 100).round(2)

df_grouped.head()



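|   | Pclass | Sex | Age_Group | Survived | Count | Survival_Percentage |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | 1 | female | child | 0 | 1 | 0.0 |
| 1 | 1 | female | teen | 13 | 13 | 100.0 |
| 2 | 1 | female | adult | 67 | 69 | 97.1 |
| 3 | 1 | female | senior | 2 | 2 | 100.0 |
| 4 | 1 | male | child | 3 | 3 | 100.0 |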


In [None]:
import streamlit as st
import plotly.express as px

# Work on a copy of the grouped dataframe so that df_grouped itself is left unchanged
df = df_grouped.copy()

# Create the two categories for visualization analysis.
df['Category'] = "Other"
df.loc[(df['Pclass'] == 1) & (df['Sex'] == 'male'), 'Category'] = '1st Class Male'
df.loc[(df['Pclass'] == 3) & (df['Age_Group'] == 'child'), 'Category'] = '3rd Class Child'

# Filter the Dataframe for the visualization
df_viz = df[df['Category'] != "Other"]

df_viz


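|   | Pclass | Sex | Age_Group | Survived | Count | Survival_Percentage | Category |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 4 | 1 | male | child | 3 | 3 | 100.0 | 1st Class Male |
| 5 | 1 | male | teen | 1 | 4 | 25.0 | 1st Class Male |
| 6 | 1 | male | adult | 35 | 82 | 42.68 | 1st Class Male |
| 7 | 1 | male | senior | 1 | 12 | 8.33 | 1st Class Male |
| 16 | 3 | female | child | 11 | 23 | 47.83 | 3rd Class Child |
| 20 | 3 | male | child | 9 | 25 | 36.0 | 3rd Class Child |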


In [54]:
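# Horizontal bars: one bar per Category.
# Note: histfunc='avg' takes an unweighted mean of the per-row percentages,
# so a group of 3 passengers counts as much as a group of 82.
fig = px.histogram(
    df_viz,
    y='Category',
    x='Survival_Percentage',
    histfunc='avg',
    hover_data=['Category'],
    template='plotly_white',
    color_discrete_sequence=px.colors.qualitative.D3,
)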
fig.show()
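
One caveat about the chart: averaging the per-row `Survival_Percentage` values gives every age group the same weight, however many passengers it contains. A possible alternative, sketched below, is to pool survivors and passengers within each category before dividing. The counts are copied from the `df_viz` table above; the `pooled` name is an assumption made for illustration.

```python
import pandas as pd

# Survived / Count values copied from the df_viz table shown above
df_viz = pd.DataFrame({
    "Category": ["1st Class Male"] * 4 + ["3rd Class Child"] * 2,
    "Survived": [3, 1, 35, 1, 11, 9],
    "Count": [3, 4, 82, 12, 23, 25],
})

# Sum survivors and passengers per category, then compute a single pooled rate
pooled = df_viz.groupby("Category")[["Survived", "Count"]].sum()
pooled["Survival_Percentage"] = (pooled["Survived"] / pooled["Count"] * 100).round(2)
print(pooled)
```

On these numbers the two approaches disagree. The unweighted means are about 44.0% for first-class men and 41.9% for third-class children, while the pooled rates are about 39.6% (40 of 101) and 41.7% (20 of 48). The answer to the question therefore depends on which definition of "survival rate" the chart uses.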

In [None]:
import streamlit as st

from apputil import *

st.write(
'''
# Did men in the first class have a higher survival rate than children in the third class?

'''
)
# Generate and display the figure
fig1 = visualize_demographic()
st.plotly_chart(fig1, use_container_width=True)





## Exercise 2: Family Size and Wealth

Using the Titanic dataset, write a function named `family_groups()` to explore the relationship between family size, passenger class, and ticket fare.  

1. Create a new column in the Titanic dataset that represents the total family size for each passenger, `family_size`. Family size is defined as the number of siblings/spouses aboard plus the number of parents/children aboard, plus the passenger themselves.

2. Group the passengers by family size and passenger class. For each group, calculate:  
   - The total number of passengers, `n_passengers`
   - The average ticket fare, `avg_fare`
   - The minimum and maximum ticket fares (to capture variation in wealth), `min_fare` and `max_fare`

3. Return a table with these results, sorted so that the values are clear and easy to interpret (for example, by class and then family size).

4. Write a function called `last_names()` that extracts the last name of each passenger from the `Name` column, and returns the count for each last name (i.e., a pandas series with last name as index, and count as value). Does this result agree with that of the data table above? Share your findings in your app using `st.write`.

5. Just like you did in Exercise 1, come up with a clear question that your results make you curious about. Write this question in your `app.py` file above the call to your visualization function. Then, create a Plotly visualization in a function named `visualize_families()` that directly addresses your question. As in Exercise 1, you are free to choose the chart type that you think best communicates the findings.
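
A minimal sketch of the two table-building functions, under stated assumptions: the `df` parameter is hypothetical (the exercise only fixes the function names), and `last_names()` assumes every `Name` follows the "Last, Title. First" pattern seen in the data preview. The `demo` rows are the five passengers from the `df.head()` output earlier in this notebook.

```python
import pandas as pd


def family_groups(df: pd.DataFrame) -> pd.DataFrame:
    """Fare summary by passenger class and family size (illustrative sketch)."""
    df = df.copy()
    # Siblings/spouses + parents/children + the passenger themselves
    df["family_size"] = df["SibSp"] + df["Parch"] + 1
    return (
        df.groupby(["Pclass", "family_size"])
        .agg(
            n_passengers=("Fare", "size"),
            avg_fare=("Fare", "mean"),
            min_fare=("Fare", "min"),
            max_fare=("Fare", "max"),
        )
        .reset_index()
        .sort_values(["Pclass", "family_size"])
        .reset_index(drop=True)
    )


def last_names(df: pd.DataFrame) -> pd.Series:
    """Count passengers per last name (the text before the first comma)."""
    return df["Name"].str.split(",").str[0].str.strip().value_counts()


# First five passengers, copied from the df.head() output above
demo = pd.DataFrame({
    "Pclass": [3, 1, 3, 1, 3],
    "Name": [
        "Braund, Mr. Owen Harris",
        "Cumings, Mrs. John Bradley (Florence Briggs Th...",
        "Heikkinen, Miss. Laina",
        "Futrelle, Mrs. Jacques Heath (Lily May Peel)",
        "Allen, Mr. William Henry",
    ],
    "SibSp": [1, 1, 0, 1, 0],
    "Parch": [0, 0, 0, 0, 0],
    "Fare": [7.25, 71.2833, 7.925, 53.1, 8.05],
})
groups = family_groups(demo)
names = last_names(demo)
print(groups)
print(names)
```

When comparing the two results, keep in mind that a shared last name does not necessarily mean a shared family, and that relatives can travel under different last names.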

## Bonus Question

Add a new column, `older_passenger`, to the Titanic dataset that indicates whether each passenger’s age is above the median age for *their* passenger class. So, suppose row $x$ is in passenger class 2. Then, a value of `True` at row $x$ would indicate that the passenger is older than 50% of class 2 passengers, and `False` would indicate that they are younger (or exactly at the median).

- You should use pandas functions to accomplish this.
- The new column should contain Boolean values (`True` if the age is above the median, `False` if it is less than or equal to the median).
- Return the updated table in the function `determine_age_division()`.
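
One way to approach this, sketched under assumptions: the `df` parameter and the toy `demo` frame are hypothetical. `groupby(...).transform("median")` returns a series aligned with the original rows, so each passenger can be compared directly with the median of their own class.

```python
import pandas as pd


def determine_age_division(df: pd.DataFrame) -> pd.DataFrame:
    """Flag passengers older than the median age of their class (sketch)."""
    df = df.copy()
    # One median per row, taken over that row's passenger class
    class_median = df.groupby("Pclass")["Age"].transform("median")
    # Comparisons against a missing Age evaluate to False
    df["older_passenger"] = df["Age"] > class_median
    return df


# Hypothetical toy data: the median age is 30 in both classes
demo = pd.DataFrame({
    "Pclass": [1, 1, 1, 2, 2],
    "Age": [20, 30, 40, 10, 50],
})
divided = determine_age_division(demo)
print(divided)
```

Note that passengers with a missing age end up as `False` in this sketch, which may or may not be the behavior you want.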

Once you’ve created this column, consider how this age division relates to your analysis above. Try to visualize this analysis in Plotly using the function name `visualize_age_division()`.