# 🚀 Data Export for Dashboarding

This notebook extracts and prepares the final datasets from our cleaned and analyzed hospital data to support the creation of interactive dashboards.  

We export five key data views, each corresponding to one planned dashboard visualization, so Tableau can work from small, pre-aggregated tables instead of the full hospital-level dataset.  

These exports serve as the bridge between SQL-based analysis and visual storytelling, completing the end-to-end project workflow.

### 🧰 Import Libraries

Load the Python libraries needed for data manipulation and for writing the aggregated CSV files that feed the dashboard.

```python
# Import libraries
import pandas as pd
import numpy as np
import os
```

### 📂 Load Base Dataset

Load the cleaned hospital data from previous processing steps. We will use this dataframe to generate multiple summary tables for dashboard visualization.

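```python
# Load the cleaned hospital info dataset for aggregation and dashboard data preparation
data_path = "../data/processed/hospital_info_clean.csv"
df_hospital = pd.read_csv(data_path)

# Make sure the output folder for the dashboard CSVs exists (to_csv does not create directories)
os.makedirs("../data/dashboard", exist_ok=True)
```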
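Every aggregation in this notebook depends on six column names in the cleaned file. A fail-fast check right after loading is one way to catch a renamed or dropped column before any CSV is written. This is a sketch: `REQUIRED_COLUMNS` and `missing_columns` are hypothetical names, and the single toy row is made up.

```python
import pandas as pd

# Columns referenced by the five aggregations in this notebook
REQUIRED_COLUMNS = (
    "Facility ID",
    "State",
    "Hospital Type",
    "Hospital Ownership",
    "Emergency Services",
    "Hospital overall rating",
)


def missing_columns(df: pd.DataFrame, required=REQUIRED_COLUMNS) -> list:
    """Return the required column names absent from df, in the order listed."""
    return [col for col in required if col not in df.columns]


# Toy frame (illustrative only) that lacks the rating column
toy = pd.DataFrame({
    "Facility ID": ["A1"],
    "State": ["AL"],
    "Hospital Type": ["Acute Care Hospitals"],
    "Hospital Ownership": ["Proprietary"],
    "Emergency Services": ["Yes"],
})
print(missing_columns(toy))  # ['Hospital overall rating']
```

Against the real data, `assert not missing_columns(df_hospital)` would stop the notebook early if the cleaning step's output changed.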

### 📊 Dataset 1: Hospital Counts and Ratings by State

This dataset summarizes, for each state, the number of hospitals and their average CMS overall star rating (on a 1 to 5 scale). It will be used to build visualizations of the geographic distribution and quality of hospitals.

```python
# Aggregate hospital counts and average rating by state
df_dashboard_1 = (
    df_hospital.groupby("State")
    .agg(
        hospital_count=pd.NamedAgg(column="Facility ID", aggfunc="count"),
        avg_rating=pd.NamedAgg(column="Hospital overall rating", aggfunc="mean")
    )
    .reset_index()
)

# Round average rating to 2 decimals for clarity
df_dashboard_1["avg_rating"] = df_dashboard_1["avg_rating"].round(2)

# Save the aggregated dataset for dashboard 1
output_path_1 = "../data/dashboard/hospital_counts_ratings_by_state.csv"
df_dashboard_1.to_csv(output_path_1, index=False)
```
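The two aggregations treat missing data differently: `count` counts only non-null `Facility ID` values, and `mean` skips missing ratings rather than treating them as zero. A small sketch on made-up toy data, assuming missing ratings were stored as `NaN` during cleaning; the extra `rated_count` column is a hypothetical addition recording how many rated hospitals each average rests on.

```python
import numpy as np
import pandas as pd

# Toy data (illustrative only): one "AL" hospital has no rating
toy = pd.DataFrame({
    "State": ["AL", "AL", "AK"],
    "Facility ID": ["A1", "A2", "K1"],
    "Hospital overall rating": [4.0, np.nan, 2.0],
})

summary = (
    toy.groupby("State")
    .agg(
        # All hospitals with a Facility ID, rated or not
        hospital_count=pd.NamedAgg(column="Facility ID", aggfunc="count"),
        # NaN ratings are skipped, so AL averages over one hospital
        avg_rating=pd.NamedAgg(column="Hospital overall rating", aggfunc="mean"),
        # Hypothetical: number of hospitals behind each average
        rated_count=pd.NamedAgg(column="Hospital overall rating", aggfunc="count"),
    )
    .reset_index()
)
print(summary)
```

A state whose average is based on only a few rated hospitals can look extreme on a map, so a column like `rated_count` may be worth exporting alongside `avg_rating`.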

### 📊 Dataset 2: Hospital Counts and Emergency Services by State

This dataset aggregates hospital counts by state along with the number and percentage of hospitals offering emergency services. It will be used to visualize emergency care availability across states.

```python
# Aggregate hospital counts and emergency services by state
df_dashboard_2 = (
    df_hospital.groupby("State")
    .agg(
        hospital_count=pd.NamedAgg(column="Facility ID", aggfunc="count"),
        # Count rows where Emergency Services is exactly "Yes"
        emergency_services_count=pd.NamedAgg(
            column="Emergency Services", aggfunc=lambda x: (x == "Yes").sum()
        ),
    )
    .reset_index()
)

# Calculate percentage of hospitals with emergency services
df_dashboard_2["emergency_services_percent"] = (
    df_dashboard_2["emergency_services_count"] / df_dashboard_2["hospital_count"] * 100
).round(1)

# Save the aggregated dataset for dashboard 2
output_path_2 = "../data/dashboard/hospital_counts_emergency_by_state.csv"
df_dashboard_2.to_csv(output_path_2, index=False)
```
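The lambda counts the `"Yes"` values in each state's group. An alternative sketch, not the notebook's method: build a 0/1 flag column once, then let `sum` give the count and `mean` give the share directly, which avoids calling a Python lambda per group. It uses pandas' tuple form of named aggregation, equivalent to `pd.NamedAgg`; the toy data and the `has_er` and `emergency_services_share` names are hypothetical.

```python
import pandas as pd

# Toy data (illustrative only)
toy = pd.DataFrame({
    "State": ["AL", "AL", "AL", "AK"],
    "Facility ID": ["A1", "A2", "A3", "K1"],
    "Emergency Services": ["Yes", "No", "Yes", "No"],
})

# 1 where the hospital offers emergency services, else 0
flagged = toy.assign(has_er=toy["Emergency Services"].eq("Yes").astype(int))

summary = (
    flagged.groupby("State")
    .agg(
        hospital_count=("Facility ID", "count"),
        emergency_services_count=("has_er", "sum"),   # number of "Yes" rows
        emergency_services_share=("has_er", "mean"),  # fraction of "Yes" rows
    )
    .reset_index()
)
summary["emergency_services_percent"] = (summary["emergency_services_share"] * 100).round(1)
print(summary)
```

The mean-based share matches the notebook's count-based percentage as long as no `Facility ID` is missing, since both then divide by the same number of rows.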

### 📊 Dataset 3: Distribution of Hospitals by Type and Ownership

This dataset counts hospitals for each combination of hospital type and ownership. It supports visualizations comparing how hospital types are distributed across ownership models.

```python
# Aggregate hospital counts by type and ownership
df_dashboard_3 = (
    df_hospital.groupby(["Hospital Type", "Hospital Ownership"])
    .agg(hospital_count=pd.NamedAgg(column="Facility ID", aggfunc="count"))
    .reset_index()
)

# Save the aggregated dataset for dashboard 3
output_path_3 = "../data/dashboard/hospital_counts_by_type_ownership.csv"
df_dashboard_3.to_csv(output_path_3, index=False)
```
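By default, `groupby` drops rows whose grouping keys are missing, so a hospital with no recorded type or ownership silently disappears from this table. A sketch on made-up toy data showing the difference, assuming such gaps could survive cleaning; passing `dropna=False` (available since pandas 1.1) keeps them as their own group.

```python
import numpy as np
import pandas as pd

# Toy data (illustrative only): one hospital has no recorded type
toy = pd.DataFrame({
    "Facility ID": ["A1", "A2", "A3"],
    "Hospital Type": ["Acute Care Hospitals", "Acute Care Hospitals", np.nan],
    "Hospital Ownership": ["Proprietary", "Proprietary", "Proprietary"],
})

keys = ["Hospital Type", "Hospital Ownership"]

# Default behaviour: the row with a missing key is dropped
default_counts = (
    toy.groupby(keys)
    .agg(hospital_count=pd.NamedAgg(column="Facility ID", aggfunc="count"))
    .reset_index()
)

# dropna=False: missing keys form their own group
kept_counts = (
    toy.groupby(keys, dropna=False)
    .agg(hospital_count=pd.NamedAgg(column="Facility ID", aggfunc="count"))
    .reset_index()
)

print(default_counts["hospital_count"].sum())  # 2
print(kept_counts["hospital_count"].sum())  # 3
```

Whether to keep the missing group is a dashboard design choice; the point is that the default omits those hospitals without any warning.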

### 📊 Dataset 4: Hospital Emergency Services Availability by State

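This dataset counts the hospitals offering emergency services in each state, to help visualize emergency care accessibility across regions. Because the filter runs before grouping, states with no emergency-capable hospitals are omitted rather than listed with a count of zero; Dataset 2 carries the same counts for every state.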

```python
# Filter hospitals that offer emergency services and count by state
df_dashboard_4 = (
    df_hospital[df_hospital["Emergency Services"] == "Yes"]
    .groupby("State")
    .agg(emergency_hospital_count=pd.NamedAgg(column="Facility ID", aggfunc="count"))
    .reset_index()
)

# Save the aggregated dataset for dashboard 4
output_path_4 = "../data/dashboard/emergency_services_by_state.csv"
df_dashboard_4.to_csv(output_path_4, index=False)
```
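Filtering before grouping means a state with no emergency-capable hospitals has no rows left to count, so it never appears in the output. If the dashboard needs explicit zeros, one option is to reindex against the full list of states, sketched here on made-up toy data; `all_states` and `complete` are illustrative names.

```python
import pandas as pd

# Toy data (illustrative only): "AK" has no emergency-capable hospital
toy = pd.DataFrame({
    "State": ["AL", "AL", "AK"],
    "Facility ID": ["A1", "A2", "K1"],
    "Emergency Services": ["Yes", "No", "No"],
})

# Same pattern as the notebook: filter first, then count, so "AK" is absent
filtered = (
    toy[toy["Emergency Services"] == "Yes"]
    .groupby("State")
    .agg(emergency_hospital_count=pd.NamedAgg(column="Facility ID", aggfunc="count"))
)

# Reindex against every state in the source; missing states get 0
all_states = sorted(toy["State"].unique())
complete = (
    filtered.reindex(all_states, fill_value=0)
    .rename_axis("State")  # make sure the index name survives for reset_index
    .reset_index()
)
print(complete)
```

In the notebook, `all_states` would come from `df_hospital["State"]`; the resulting counts should then match `emergency_services_count` in Dataset 2.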

### 📊 Dataset 5: Distribution of Hospital Types by State

This dataset counts hospitals of each type within each state, which is useful for comparing the mix of healthcare facilities across states.

```python
# Count hospitals by type and state
df_dashboard_5 = (
    df_hospital.groupby(["State", "Hospital Type"])
    .agg(hospital_type_count=pd.NamedAgg(column="Facility ID", aggfunc="count"))
    .reset_index()
)

# Save the aggregated dataset for dashboard 5
output_path_5 = "../data/dashboard/hospital_types_by_state.csv"
df_dashboard_5.to_csv(output_path_5, index=False)
```
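Datasets 1 and 5 count the same hospitals at different granularity, so summing `hospital_type_count` within each state should reproduce `hospital_count`. A minimal consistency-check sketch; `state_totals_match` is a hypothetical helper, and the toy frames only mimic the shape of the exported tables.

```python
import pandas as pd


def state_totals_match(by_type: pd.DataFrame, by_state: pd.DataFrame) -> bool:
    """Check that per-state sums of hospital_type_count equal hospital_count."""
    type_totals = by_type.groupby("State")["hospital_type_count"].sum()
    state_totals = by_state.set_index("State")["hospital_count"]
    # Align on State; a state present in only one table yields NaN and fails the check
    aligned = pd.concat([type_totals, state_totals], axis=1, join="outer")
    return bool((aligned["hospital_type_count"] == aligned["hospital_count"]).all())


# Toy frames (illustrative only) shaped like Datasets 5 and 1
by_type = pd.DataFrame({
    "State": ["AL", "AL", "AK"],
    "Hospital Type": ["Acute Care Hospitals", "Critical Access Hospitals", "Acute Care Hospitals"],
    "hospital_type_count": [3, 1, 2],
})
by_state = pd.DataFrame({
    "State": ["AK", "AL"],
    "hospital_count": [2, 4],
    "avg_rating": [3.0, 2.5],
})
print(state_totals_match(by_type, by_state))  # True
```

Against the real exports, call `state_totals_match(df_dashboard_5, df_dashboard_1)`. A `False` result most likely means some hospitals have a missing `Hospital Type` and were dropped by `groupby`.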

### 🎯 Conclusion

This notebook prepares and exports five focused datasets derived from the CMS Hospital General Information data.  
They provide aggregated hospital counts by state, hospital type, and ownership, together with average overall ratings and emergency-service coverage, to support the upcoming dashboard visualizations.  

With these structured exports in place, the project can move on to building the dashboards in Tableau.