# Heart Failure Detection - Data Profiling Demo

This notebook shows how to use ydata-profiling on the heart failure detection dataset to generate a full profiling report, a minimal report, and a comparison of the training and testing sets.

## 1. Import Libraries

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from ydata_profiling import ProfileReport

# Set plot style (the 'seaborn-v0_8-*' style names require matplotlib >= 3.6)
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_palette('Set2')

## 2. Load the Dataset

First, let's load the heart failure dataset.

In [None]:
# Replace this path with the path to your dataset
data_path = "artifacts/2025-04-10_21-03-39/data_ingestion/feature_store/heart_failure_data.csv"

# Load the data
data = pd.read_csv(data_path)

# Display the first few rows
data.head()
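
The path above points into a timestamped artifacts directory, so it will likely differ between pipeline runs. A minimal sketch of a loader that fails early with a readable message when the file is missing; the helper name `load_dataset` is hypothetical and not part of the original notebook.

```python
from pathlib import Path

import pandas as pd


def load_dataset(path):
    """Read a CSV into a DataFrame, failing early if the file is missing.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    csv_path = Path(path)
    if not csv_path.is_file():
        raise FileNotFoundError(f"Dataset not found: {csv_path}")
    return pd.read_csv(csv_path)
```

With this helper, the loading step would read `data = load_dataset(data_path)`.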

## 3. Basic Data Exploration

In [None]:
# Check the shape of the dataset
print(f"Dataset shape: {data.shape}")

# Check data types
data.dtypes

In [None]:
# Check for missing values
missing_values = data.isnull().sum()
missing_percentage = (missing_values / len(data)) * 100

missing_df = pd.DataFrame({
    'Missing Values': missing_values,
    'Percentage': missing_percentage
})

# Display only columns with missing values
missing_df[missing_df['Missing Values'] > 0].sort_values('Missing Values', ascending=False)
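
The same summary can be wrapped in a small function and checked on a toy frame. This is a sketch under assumptions: `summarize_missing` is a hypothetical helper, and the column names and values below are made up for illustration rather than taken from the heart failure dataset.

```python
import numpy as np
import pandas as pd


def summarize_missing(df):
    """Return count and percentage of missing values per affected column."""
    missing = df.isnull().sum()
    summary = pd.DataFrame({
        "Missing Values": missing,
        "Percentage": missing / len(df) * 100,
    })
    # Keep only columns that actually have gaps, worst first
    affected = summary[summary["Missing Values"] > 0]
    return affected.sort_values("Missing Values", ascending=False)


# Toy data: made-up columns and values, for illustration only
toy = pd.DataFrame({
    "age": [63, 55, np.nan, 48],
    "cholesterol": [np.nan, 210.0, np.nan, np.nan],
    "sex": ["M", "F", "F", "M"],
})
missing_summary = summarize_missing(toy)
print(missing_summary)
```

On the real data, `summarize_missing(data)` would return the same table as the cell above.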

## 4. Generate Profiling Report

Now, let's generate a full profiling report using ydata-profiling. Passing `explorative=True` enables a deeper analysis than the default configuration.

In [None]:
# Generate a profiling report
profile = ProfileReport(data, title="Heart Failure Dataset Profiling Report", explorative=True)

# Display the report inline in the notebook
profile.to_notebook_iframe()

## 5. Save the Report

You can save the report as an HTML file for future reference.

In [None]:
# Save the report to an HTML file
profile.to_file("heart_failure_profile_report.html")
print("Report saved to 'heart_failure_profile_report.html'")

## 6. Generate Minimal Report

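You can also generate a minimal report for a quicker overview. Passing `minimal=True` disables the most expensive computations, such as correlations and duplicate-row detection, which helps on larger datasets.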

In [None]:
# Generate a minimal report
minimal_profile = ProfileReport(data, title="Heart Failure Dataset - Minimal Report", minimal=True)

# Display the minimal report
minimal_profile
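
If even the minimal report is slow, one common option is to profile a random sample instead of the full frame. The sketch below uses pandas only; `sample_for_profiling`, its `max_rows` default, and the toy data are assumptions made for illustration, not part of the original notebook.

```python
import pandas as pd


def sample_for_profiling(df, max_rows=10_000, seed=42):
    """Return df unchanged if small enough, else a reproducible random sample."""
    if len(df) <= max_rows:
        return df
    return df.sample(n=max_rows, random_state=seed)


# Toy data: made-up values, for illustration only
toy = pd.DataFrame({"age": range(100)})
sampled = sample_for_profiling(toy, max_rows=10)
print(sampled.shape)
```

The sampled frame can then be passed to `ProfileReport` in place of `data`. Keep in mind that statistics computed on a sample only approximate those of the full dataset.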

## 7. Compare Datasets

You can also use ydata-profiling to compare different datasets, such as training and testing sets.

In [None]:
# Load training and testing datasets
train_path = "artifacts/2025-04-10_21-03-39/data_ingestion/dataset/train.csv"
test_path = "artifacts/2025-04-10_21-03-39/data_ingestion/dataset/test.csv"

train_data = pd.read_csv(train_path)
test_data = pd.read_csv(test_path)

print(f"Training set shape: {train_data.shape}")
print(f"Testing set shape: {test_data.shape}")
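
A comparison is easiest to read when both frames share the same columns and dtypes, so it can help to check that before profiling. A minimal pandas-only sketch; `schema_differences` is a hypothetical helper, and the toy frames use made-up columns and values.

```python
import pandas as pd


def schema_differences(left, right):
    """List columns present on one side only, or present on both with different dtypes."""
    left_only = sorted(set(left.columns) - set(right.columns))
    right_only = sorted(set(right.columns) - set(left.columns))
    shared = [col for col in left.columns if col in right.columns]
    dtype_mismatch = {
        col: (str(left[col].dtype), str(right[col].dtype))
        for col in shared
        if left[col].dtype != right[col].dtype
    }
    return {
        "left_only": left_only,
        "right_only": right_only,
        "dtype_mismatch": dtype_mismatch,
    }


# Toy data: made-up columns and values, for illustration only
toy_train = pd.DataFrame({"age": [63, 55], "target": [1, 0]})
toy_test = pd.DataFrame({"age": [48.5, 70.0], "sex": ["M", "F"]})
diff = schema_differences(toy_train, toy_test)
print(diff)
```

On the real data, `schema_differences(train_data, test_data)` should ideally return three empty collections.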

In [None]:
# Generate profile reports for training and testing sets
train_profile = ProfileReport(train_data, title="Training Dataset", minimal=True)
test_profile = ProfileReport(test_data, title="Testing Dataset", minimal=True)

# Compare the reports
comparison_report = train_profile.compare(test_profile)

# Display the comparison report
comparison_report

## 8. Save the Comparison Report

In [None]:
# Save the comparison report
comparison_report.to_file("train_test_comparison_report.html")
print("Comparison report saved to 'train_test_comparison_report.html'")

## 9. Conclusion

In this notebook, we used ydata-profiling to generate a full report, a minimal report, and a train/test comparison for the heart failure detection dataset. These reports give an overview of the data, including:

- Basic statistics (mean, median, min, max, etc.)
- Missing values
- Correlations between variables
- Distributions of variables
- Potential data quality issues, which the report flags as alerts (for example constant columns or highly correlated variables)

You can use these insights to better understand your data and make informed decisions during the data preprocessing and modeling phases.
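
To spot-check a number from a report, several of the items listed above can be recomputed directly with pandas. This is a rough sketch on toy data with made-up columns and values; it is not how ydata-profiling computes its report internally.

```python
import pandas as pd

# Toy data: made-up columns and values, for illustration only
toy = pd.DataFrame({
    "age": [40, 50, 60, 70],
    "max_hr": [180, 170, 160, 150],
    "sex": ["M", "F", "F", "M"],
})

# Basic statistics for numeric columns: count, mean, std, min, quartiles, max
basic_stats = toy.describe()

# Missing values per column
missing_counts = toy.isnull().sum()

# Pearson correlations between numeric columns only
correlations = toy.select_dtypes("number").corr()

print(basic_stats.loc[["mean", "min", "max"]])
print(missing_counts)
print(correlations)
```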