# Topic 15 – Healthcare Call Data Analysis During Emergency Times (Kaggle)

**Level:** Easy  
**Goal:** Analyze and forecast healthcare call center data during emergency periods using time series analysis. Suitable for both traditional ARIMA/SARIMA methods and ML/DL approaches.

## Dataset
- **Source:** Healthcare Call Data Analysis During Emergency Times – Kaggle
- **Link:** https://www.kaggle.com/datasets/shuvokumarbasak2030/healthcare-call-data-analysis-duringemergencytimes/data

## Download Instructions
1. Open the dataset link above.
2. Log in to Kaggle.
3. Click "Download".
4. Extract the archive to the `data/healthcare_calls/` folder.
5. Use the file `daily_and_month_call_report.csv`.


## Installation

Install the required packages if they are not already present in your environment:


In [None]:
# Install required packages (uncomment if needed)
# !pip install pandas numpy matplotlib seaborn statsmodels scikit-learn tensorflow


## Data Loading

Load the healthcare call data. The file holds monthly records, with one column per time series (total calls, doctor consultations, complaints, and so on).


In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Set plotting style (the 'seaborn-v0_8-*' style names require matplotlib >= 3.6)
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")

# Load the dataset
df = pd.read_csv("data/healthcare_calls/daily_and_month_call_report.csv")

# Display the shape, the column names, and a preview of the raw data
print(f"Dataset shape: {df.shape}")
print(f"\nColumns: {df.columns.tolist()}")
print("\nFirst few rows:")
df.head()


In [None]:
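# Create a datetime index from the Year and Month columns.
# Casting Month to str keeps the concatenation valid whether months are stored as names or numbers.
df['Date'] = pd.to_datetime(df['Year'].astype(str) + '-' + df['Month'].astype(str) + '-01')
df = df.set_index('Date').sort_index()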

# Summarize the indexed data: time span, size, and available series
print(f"Date range: {df.index.min()} to {df.index.max()}")
print(f"Number of observations: {len(df)}")
print("\nAvailable time series columns:")
print([col for col in df.columns if col not in ['Year', 'Month']])
print("\nFirst few rows:")
df.head()
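
The index above is built from the `Year` and `Month` columns, but this notebook does not state whether `Month` holds names ("January") or numbers. The sketch below shows one way to handle both cases. The helper `build_month_index` and the toy frames are hypothetical and not part of the dataset, and the name branch assumes full English month names.

```python
import pandas as pd


def build_month_index(frame, year_col="Year", month_col="Month"):
    """Return a copy of `frame` indexed by first-of-month timestamps."""
    month = frame[month_col]
    if pd.api.types.is_numeric_dtype(month):
        month_num = month.astype(int)
    else:
        # Assumption: full English month names such as "January"
        month_num = pd.to_datetime(
            month.astype(str).str.strip(), format="%B"
        ).dt.month
    # pd.to_datetime assembles dates from year/month/day columns
    parts = pd.DataFrame(
        {"year": frame[year_col].astype(int), "month": month_num, "day": 1}
    )
    out = frame.copy()
    out.index = pd.DatetimeIndex(pd.to_datetime(parts), name="Date")
    return out.sort_index()


# Toy data (not taken from the Kaggle file) covering both storage styles
toy_names = pd.DataFrame(
    {
        "Year": [2021, 2020, 2020],
        "Month": ["February", "January", "March"],
        "Calls": [30, 10, 20],
    }
)
toy_numbers = toy_names.assign(Month=[2, 1, 3])

indexed_names = build_month_index(toy_names)
indexed_numbers = build_month_index(toy_numbers)
print(indexed_names)
```

Both toy frames should end up with the same sorted monthly index, which is a quick way to confirm the helper treats names and numbers consistently.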


In [None]:
# Check for missing values
print("Missing values:")
print(df.isnull().sum())

# Basic statistics
print("\nBasic statistics:")
df.describe()
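
`isnull()` only reports empty cells in rows that exist, so a month that is absent from the file altogether would go unnoticed. Forecasting models generally expect a regular time step, so it may be worth checking for gaps as well. This is a minimal sketch, assuming the index holds first-of-month timestamps. The helper `find_missing_months` and the toy index are hypothetical.

```python
import pandas as pd


def find_missing_months(index):
    """Return month-start dates between the first and last entry that are absent."""
    expected = pd.date_range(index.min(), index.max(), freq="MS")
    return expected.difference(index)


# Toy monthly index with March 2020 deliberately left out
toy_index = pd.DatetimeIndex(["2020-01-01", "2020-02-01", "2020-04-01"])
missing = find_missing_months(toy_index)
print(missing.tolist())
```

In the notebook, the equivalent call would be `find_missing_months(df.index)`. An empty result suggests the monthly series has no gaps.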


In [None]:
# Plot the total number of calls over time
calls_col = 'Total Number of Calls'

plt.figure(figsize=(14, 6))
plt.plot(df.index, df[calls_col], linewidth=1.5, marker='o', markersize=3)
plt.title("Healthcare Call Center - Total Number of Calls Over Time", fontsize=14, fontweight='bold')
plt.xlabel("Date", fontsize=12)
plt.ylabel("Number of Calls", fontsize=12)
plt.grid(True, alpha=0.3)
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

# Other series in the dataset that can be analyzed the same way
other_series = [
    "Total Number of Doctors Consultancy",
    "Number of Total Health Information",
    "Number of Total Ambulance Information",
    "Number of Total Complaints",
    "Number of Calls To Know About The Service",
]
print("\nOther available time series to analyze:")
for name in other_series:
    print(f"- {name}")
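
To compare the other series side by side, one option is a grid of small line plots. The sketch below uses synthetic stand-in data so that it runs on its own. The helper `plot_series_grid` and the column names `Series A` to `Series C` are hypothetical.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_series_grid(frame, columns, ncols=2):
    """Draw one line plot per column in a grid and return the figure."""
    nrows = -(-len(columns) // ncols)  # ceiling division
    fig, axes = plt.subplots(
        nrows, ncols, figsize=(14, 3.5 * nrows), squeeze=False
    )
    flat_axes = axes.ravel()
    for ax, col in zip(flat_axes, columns):
        ax.plot(frame.index, frame[col], linewidth=1.5, marker="o", markersize=3)
        ax.set_title(col, fontsize=11)
        ax.grid(True, alpha=0.3)
        ax.tick_params(axis="x", rotation=45)
    # Hide any panels left over when the grid has more cells than columns
    for ax in flat_axes[len(columns):]:
        ax.set_visible(False)
    fig.tight_layout()
    return fig


# Synthetic monthly data standing in for the real columns
rng = np.random.default_rng(0)
toy = pd.DataFrame(
    rng.integers(100, 1000, size=(12, 3)),
    index=pd.date_range("2020-01-01", periods=12, freq="MS"),
    columns=["Series A", "Series B", "Series C"],
)
fig = plot_series_grid(toy, list(toy.columns))
```

In the notebook, the same helper could be called with `df` and the column names listed above, followed by `plt.show()`.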
