# Try 1.5.2: Grouped Bar Charts

This notebook demonstrates creating grouped bar charts using the seaborn library.

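Setting the `hue` parameter to a grouping variable splits each category's bar into side-by-side bars, one per group. With `sns.countplot()`, the height of each bar is the number of observations in that category and group.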
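Each bar in such a chart is one cell of a contingency table of category by group. Here is a minimal sketch that computes those counts with `pd.crosstab`. It uses a small made-up DataFrame, and the `category` and `group` column names are hypothetical:

```python
import pandas as pd

# Hypothetical toy data: one row per observation
df = pd.DataFrame({
    "category": ["A", "A", "B", "B", "B", "C"],
    "group":    ["x", "y", "x", "x", "y", "y"],
})

# Rows = categories on the x-axis, columns = hue groups.
# Each cell is the height of one bar in sns.countplot(x="category", hue="group", data=df)
counts = pd.crosstab(df["category"], df["group"])
print(counts)
```

`df.groupby(["category", "group"]).size()` gives the same numbers as a long Series. However, it leaves out combinations that never occur, whereas `crosstab` fills them with 0 (category `C` has no `x` rows here).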

**Example**: Example 1 below counts the number of Titanic passengers who did and did not survive the sinking, broken down by passenger class.

In [None]:
# Import required libraries
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Set the style for better-looking plots
sns.set_style("whitegrid")

## Example 1: Titanic Survival by Class

We'll use the famous Titanic dataset to demonstrate grouped bar charts.

In [None]:
# Load Titanic dataset (available in seaborn)
titanic_df = sns.load_dataset('titanic')

# Display first few rows
print("Titanic dataset preview:")
print(titanic_df[['class', 'survived', 'sex', 'age']].head(10))
print(f"\nTotal records: {len(titanic_df)}")

In [None]:
# Create a grouped bar chart: Survival by Class
plt.figure(figsize=(10, 6))
ax = sns.countplot(x='class', hue='survived', data=titanic_df)
plt.title('Titanic Survival by Class', fontsize=14, fontweight='bold')
plt.xlabel('Class', fontsize=12)
plt.ylabel('Count', fontsize=12)
plt.legend(title='Survived', labels=['No', 'Yes'])
plt.tight_layout()
plt.show()

print("\nInterpretation: This chart shows the number of survivors and non-survivors in each class.")
print("First class is the only class with more survivors than non-survivors, while third class had by far the most deaths.")
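The chart shows counts, so comparing survival *rates* across classes of different sizes is easier with the proportions themselves. Because `survived` is coded 0/1, its mean within each group is the survival rate. Here is a minimal sketch; the helper name `survival_rate_by` and the toy data are hypothetical:

```python
import pandas as pd

def survival_rate_by(df: pd.DataFrame, group_col: str) -> pd.Series:
    """Fraction of rows with survived == 1 within each level of group_col (hypothetical helper)."""
    # The mean of a 0/1 column is the proportion of 1s.
    # observed=True avoids a pandas FutureWarning when group_col is categorical.
    return df.groupby(group_col, observed=True)["survived"].mean()

# Hypothetical toy data with the same column names as the Titanic data used above
toy = pd.DataFrame({
    "class":    ["First", "First", "First", "Third", "Third", "Third", "Third"],
    "survived": [1, 1, 0, 0, 0, 0, 1],
})
rates = survival_rate_by(toy, "class")
print(rates)
```

On the real data, `survival_rate_by(titanic_df, 'class')` should return one rate per passenger class. In `titanic_df`, `class` is a categorical column, which is why the sketch passes `observed=True`.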

## Example 2: Titanic Survival by Sex

Let's compare the number of survivors and non-survivors by sex.

In [None]:
# Create a grouped bar chart: Survival by Sex
plt.figure(figsize=(8, 6))
ax = sns.countplot(x='sex', hue='survived', data=titanic_df)
plt.title('Titanic Survival by Sex', fontsize=14, fontweight='bold')
plt.xlabel('Sex', fontsize=12)
plt.ylabel('Count', fontsize=12)
plt.legend(title='Survived', labels=['No', 'Yes'])
plt.tight_layout()
plt.show()
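Passing `labels=['No', 'Yes']` to `plt.legend` renames the legend entries in the order seaborn drew them. This silently depends on the hue levels being ordered 0, 1. An alternative sketch puts readable labels in the data and fixes their order with `hue_order`. The `Survived` column name and the toy data are assumptions for illustration:

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data with the same 'sex' and 'survived' columns as the Titanic dataset
toy = pd.DataFrame({
    "sex":      ["male", "male", "male", "female", "female"],
    "survived": [0, 0, 1, 1, 1],
})

# Map the 0/1 codes to readable labels so the legend text comes from the data itself
toy["Survived"] = toy["survived"].map({0: "No", 1: "Yes"})

fig, ax = plt.subplots(figsize=(8, 6))
# hue_order fixes both the bar order within each group and the legend order
sns.countplot(x="sex", hue="Survived", hue_order=["No", "Yes"], data=toy, ax=ax)
ax.set_xlabel("Sex")
ax.set_ylabel("Count")

legend_labels = [t.get_text() for t in ax.get_legend().get_texts()]
print(legend_labels)
plt.show()
```

Because the labels live in the DataFrame, seaborn builds the legend (title included) without a separate `plt.legend` call.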

## Example 3: U.S. Workforce by Decade and Gender

Let's recreate a grouped bar chart similar to the one in the textbook showing men vs. women in the workforce over time.

In [None]:
# Create sample workforce data (in thousands)
workforce_data = {
    'Year': [1970, 1970, 1980, 1980, 1990, 1990, 2000, 2000, 2010, 2010],
    'Gender': ['Male', 'Female', 'Male', 'Female', 'Male', 'Female', 'Male', 'Female', 'Male', 'Female'],
    'Number': [51228, 31543, 61453, 45487, 69011, 56829, 76280, 66303, 82130, 71904]
}

df_workforce = pd.DataFrame(workforce_data)

print("U.S. Workforce Data (in thousands):")
print(df_workforce)
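`sns.barplot` with `hue` expects long ("tidy") data: one row per bar, with the group in its own column, as in `df_workforce`. If the same figures came in wide form, with one column per gender, `pd.melt` can reshape them. The sketch below assumes a hypothetical wide table built from the same numbers:

```python
import pandas as pd

# Hypothetical wide layout: one row per year, one column per gender (thousands)
wide = pd.DataFrame({
    "Year":   [1970, 1980, 1990, 2000, 2010],
    "Male":   [51228, 61453, 69011, 76280, 82130],
    "Female": [31543, 45487, 56829, 66303, 71904],
})

# Stack the gender columns into a 'Gender' column and their values into 'Number'
long_df = pd.melt(wide, id_vars="Year", value_vars=["Male", "Female"],
                  var_name="Gender", value_name="Number")

# A stable sort by year keeps Male before Female within each year, matching df_workforce
long_df = long_df.sort_values("Year", kind="stable").reset_index(drop=True)
print(long_df)
```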

In [None]:
# Create grouped bar chart for workforce
plt.figure(figsize=(12, 6))
ax = sns.barplot(x='Year', y='Number', hue='Gender', data=df_workforce, palette=['#1f77b4', '#ff7f0e'])
plt.title('Sex of the U.S. Workforce', fontsize=14, fontweight='bold')
plt.xlabel('Year', fontsize=12)
plt.ylabel('Number of workers (millions)', fontsize=12)
plt.legend(title='Gender')

# Format y-axis: values are stored in thousands, so divide by 1,000 to label ticks in millions
ax.yaxis.set_major_formatter(plt.FuncFormatter(lambda x, pos: f'{x / 1000:.0f}M'))

plt.tight_layout()
plt.show()

print("\nKey Observations:")
print("1. The number of women in the workforce more than doubled between 1970 and 2010")
print("2. The gap between men and women narrowed from about 19.7 million to about 10 million but has not closed")
print("3. The total number of workers increased in every decade")
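You can check these observations numerically by pivoting the long data back to one column per gender. The sketch below rebuilds the same sample data so it runs on its own:

```python
import pandas as pd

# Same sample data as above (in thousands)
workforce_data = {
    'Year': [1970, 1970, 1980, 1980, 1990, 1990, 2000, 2000, 2010, 2010],
    'Gender': ['Male', 'Female'] * 5,
    'Number': [51228, 31543, 61453, 45487, 69011, 56829, 76280, 66303, 82130, 71904],
}
df_workforce = pd.DataFrame(workforce_data)

# One row per year, one column per gender
wide = df_workforce.pivot(index="Year", columns="Gender", values="Number")

summary = pd.DataFrame({
    "Gap (M - F)": wide["Male"] - wide["Female"],
    "Total": wide["Male"] + wide["Female"],
})
print(summary)
```

On this sample data the gap shrinks from 19,685 thousand in 1970 to 9,977 thousand in 2000, then widens slightly to 10,226 thousand in 2010. The narrowing is therefore not strictly monotonic.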

## Example 4: Horizontal Grouped Bar Chart

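A horizontal grouped bar chart is useful when category labels are long: the labels run down the y-axis, where they can be read without rotating or overlapping. In seaborn, put the categorical variable on `y` and the numeric variable on `x`.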

In [None]:
# Create sample data for hospital locations
hospital_data = {
    'Location': ['Large Central Metro', 'Large Central Metro', 
                 'Large Fringe Metro', 'Large Fringe Metro',
                 'Small/Medium Metro', 'Small/Medium Metro',
                 'Nonmetropolitan', 'Nonmetropolitan'],
    'Provider': ['Physician', 'PA/APN', 'Physician', 'PA/APN', 'Physician', 'PA/APN', 'Physician', 'PA/APN'],
    'Percentage': [85, 15, 82, 18, 78, 22, 72, 28]
}

df_hospital = pd.DataFrame(hospital_data)

# Create horizontal grouped bar chart
plt.figure(figsize=(10, 6))
ax = sns.barplot(y='Location', x='Percentage', hue='Provider', data=df_hospital)
plt.title('Utilization of Physician Assistants and Advanced Practice Nurses', fontsize=14, fontweight='bold')
plt.xlabel('Percentage', fontsize=12)
plt.ylabel('Hospital Location', fontsize=12)
plt.legend(title='Provider Type')
plt.tight_layout()
plt.show()

print("\nObservation: In this sample data, the PA/APN share rises from 15% to 28% as hospital location becomes less metropolitan.")
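Under the hood, a grouped bar chart is several bar series drawn at the same category positions, each shifted by a fraction of the bar thickness. The plain-matplotlib sketch below redraws the horizontal chart above and makes those offsets explicit. The variable names and the `height = 0.4` choice are assumptions:

```python
import matplotlib.pyplot as plt
import numpy as np

# Same sample values as df_hospital, split into one list per provider type
locations = ["Large Central Metro", "Large Fringe Metro", "Small/Medium Metro", "Nonmetropolitan"]
physician = [85, 82, 78, 72]
pa_apn = [15, 18, 22, 28]

y = np.arange(len(locations))  # one slot per location: 0, 1, 2, 3
height = 0.4                   # thickness of each bar; two bars fill 0.8 of a slot

fig, ax = plt.subplots(figsize=(10, 6))
# Shift each series half a bar thickness to either side of the slot centre
phys_bars = ax.barh(y - height / 2, physician, height, label="Physician")
apn_bars = ax.barh(y + height / 2, pa_apn, height, label="PA/APN")

ax.set_yticks(y)
ax.set_yticklabels(locations)
ax.invert_yaxis()  # put the first location at the top, matching the seaborn chart's order
ax.set_xlabel("Percentage")
ax.set_ylabel("Hospital Location")
ax.legend(title="Provider Type")
fig.tight_layout()
plt.show()
```

With more than two groups, the same idea generalizes: give group `i` of `n` the offset `(i - (n - 1) / 2) * height`.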

## Exercise

Try creating your own grouped bar chart:
1. Create a dataset with at least 3 categories and 2 groups
2. Create a grouped bar chart
3. Experiment with both vertical and horizontal orientations
4. Add appropriate titles, labels, and legends

In [None]:
# Your code here
