# Patient Health and Lifestyle Dataset Guide

## Overview
This dataset contains patient-level health information, including demographics, physical measurements, lifestyle factors, preventive care, and medical history. It can be used to analyze health outcomes and to predict heart disease risk from patient characteristics.

### Key Details
- **Number of Entries**: 237,630 patient records
- **Number of Features**: 36
- **Target Variable**: Heart Disease status

### Dataset Features
| Category | Example Features | Description |
|----------|-----------|-------------|
| **Patient Information** | PatientID, State, Sex, AgeCategory | Basic demographic information |
| **Health Metrics** | HeightInMeters, WeightInKilograms, BMI | Physical measurements |
| **Medical History** | HadHeartAttack, HadStroke, HadDiabetes | Previous medical conditions |
| **Lifestyle** | SmokerStatus, ECigaretteUsage, AlcoholDrinkers | Behavioral factors |
| **Healthcare** | FluVaxLast12, PneumoVaxEver, HIVTesting | Preventive care measures |
| **COVID-19** | CovidPos | COVID-19 test status |
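
The categories in the table can be kept in code as a dictionary, which makes it easy to pull out one group of columns at a time and to spot listed columns that are absent from the loaded file. This is a minimal sketch: `FEATURE_GROUPS`, `columns_for`, and `missing_columns` are hypothetical helper names, and each list holds only the example columns from the table, not all 36 features.

```python
import pandas as pd

# Hypothetical mapping built from the table above. Each list contains only the
# example columns shown there; the full dataset has more columns per category.
FEATURE_GROUPS = {
    "Patient Information": ["PatientID", "State", "Sex", "AgeCategory"],
    "Health Metrics": ["HeightInMeters", "WeightInKilograms", "BMI"],
    "Medical History": ["HadHeartAttack", "HadStroke", "HadDiabetes"],
    "Lifestyle": ["SmokerStatus", "ECigaretteUsage", "AlcoholDrinkers"],
    "Healthcare": ["FluVaxLast12", "PneumoVaxEver", "HIVTesting"],
    "COVID-19": ["CovidPos"],
}


def columns_for(df: pd.DataFrame, category: str) -> list[str]:
    """Return the listed columns of `category` that are present in `df`, in table order."""
    return [col for col in FEATURE_GROUPS[category] if col in df.columns]


def missing_columns(df: pd.DataFrame) -> dict[str, list[str]]:
    """Map each category to the listed columns that `df` does not contain.

    Categories whose listed columns are all present are left out of the result.
    """
    gaps = {}
    for category, cols in FEATURE_GROUPS.items():
        absent = [col for col in cols if col not in df.columns]
        if absent:
            gaps[category] = absent
    return gaps
```

After loading, `df[columns_for(df, "Health Metrics")].describe()` would summarize just the physical measurements.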

## Objective
The **target variable** is the presence of heart disease. Your goal is to build a model that predicts whether a patient has heart disease based on their health metrics, lifestyle factors, and medical history.
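
Because the model is judged on how it performs on patients it has not seen, a held-out validation split that keeps the proportion of patients with and without heart disease is a sensible first step. This is a minimal sketch that assumes a binary target column; the name `HeartDisease` in the example is hypothetical, since this guide does not state the target's column name in the CSV. It also assumes the DataFrame index is unique, which holds for the default index created by `pd.read_csv`.

```python
import pandas as pd


def stratified_holdout(
    df: pd.DataFrame, target: str, valid_frac: float = 0.2, seed: int = 42
) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Split `df` into (train, valid) while preserving the class ratio of `target`.

    `valid_frac` of the rows is sampled separately within each class, so both
    splits contain (approximately) the same share of positive patients.
    Assumes `df` has a unique index, since train rows are found by dropping
    the sampled index labels.
    """
    valid = df.groupby(target, group_keys=False).sample(frac=valid_frac, random_state=seed)
    train = df.drop(index=valid.index)
    return train, valid
```

Sampling within each class, rather than over the whole table, keeps a rare positive class from being under- or over-represented in the validation set by chance.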

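The evaluation metric is **Area Under the ROC Curve (AUC-ROC)**. AUC-ROC is threshold-independent: it measures how well the model's predicted scores rank patients with heart disease above patients without it, which makes it a standard choice for binary classification, particularly when the classes are imbalanced.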
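
To make the metric concrete: AUC-ROC equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below computes it from that rank-based (Mann-Whitney U) definition using only NumPy; `auc_roc` is a hypothetical helper name, and it assumes the labels are already encoded as 0/1.

```python
import numpy as np


def auc_roc(y_true, y_score) -> float:
    """AUC-ROC via the rank (Mann-Whitney U) formulation.

    Expects 0/1 labels. Returns the probability that a random positive is
    scored above a random negative, counting tied scores as 0.5.
    """
    y_true = np.asarray(y_true).astype(bool)
    y_score = np.asarray(y_score, dtype=float)
    if y_true.shape != y_score.shape:
        raise ValueError("y_true and y_score must have the same shape")
    n_pos = int(y_true.sum())
    n_neg = y_true.size - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC-ROC needs at least one positive and one negative example")

    # Assign 1-based ranks to the scores, averaging ranks within runs of ties.
    order = np.argsort(y_score, kind="mergesort")
    sorted_scores = y_score[order]
    ranks = np.empty(y_score.size, dtype=float)
    i = 0
    while i < y_score.size:
        j = i
        while j + 1 < y_score.size and sorted_scores[j + 1] == sorted_scores[i]:
            j += 1
        # Positions i..j (0-based) share the average of ranks i+1..j+1.
        ranks[order[i:j + 1]] = (i + j) / 2 + 1
        i = j + 1

    # U statistic for the positives, normalized by the number of pos/neg pairs.
    u_stat = ranks[y_true].sum() - n_pos * (n_pos + 1) / 2
    return u_stat / (n_pos * n_neg)
```

In practice, scikit-learn's `sklearn.metrics.roc_auc_score(y_true, y_score)` computes the same quantity. Writing it out shows why only the ordering of the scores matters: any monotonic rescaling of the predictions leaves AUC-ROC unchanged.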

## Loading the Data

The dataset is provided as a CSV file. The following steps load it and run a first exploration.

### Step 1: Import Libraries
We'll use pandas to load and manipulate the data and NumPy for numerical operations.

```python
# Import necessary libraries
import pandas as pd
import numpy as np
```

### Step 2: Load the Dataset
Load the patient health data into a pandas DataFrame.

```python
# Path to the dataset
data_path = "inputs.csv"  # Update path as needed

# Load the dataset
df = pd.read_csv(data_path)

# Display the first few rows (in a notebook, the cell's last expression is rendered)
print("Dataset Preview:")
df.head()
```
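
Since the Key Details list states the expected size of the dataset, it is worth confirming that the file loaded completely. This is a minimal sketch; `check_shape` is a hypothetical helper, and flagging duplicate `PatientID` values assumes each row describes a distinct patient.

```python
import pandas as pd

# Expected size, taken from the "Key Details" list.
EXPECTED_ROWS = 237_630
EXPECTED_COLS = 36


def check_shape(df: pd.DataFrame, rows: int = EXPECTED_ROWS, cols: int = EXPECTED_COLS) -> list[str]:
    """Return human-readable mismatches between `df` and the documented size.

    An empty list means the shape matches and no duplicate PatientID was found.
    """
    problems = []
    if df.shape[0] != rows:
        problems.append(f"expected {rows:,} rows, found {df.shape[0]:,}")
    if df.shape[1] != cols:
        problems.append(f"expected {cols} columns, found {df.shape[1]}")
    # Assumption: PatientID should identify one row per patient.
    if "PatientID" in df.columns and df["PatientID"].duplicated().any():
        problems.append("PatientID contains duplicate values")
    return problems
```

In the notebook, `for msg in check_shape(df): print("Warning:", msg)` prints nothing when the loaded file matches the documented size.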

### Step 3: Data Exploration
Next, check the dataset's dimensions, column types, non-null counts, and summary statistics.

```python
# Basic dataset information
print("Dataset Dimensions:")
print(f"Number of patients: {df.shape[0]}")
# Column count includes PatientID and the target, not only predictive features
print(f"Number of columns: {df.shape[1]}")

# Check data types and non-null counts (df.info() prints its report directly)
print("\nDataset Info:")
df.info()

# Summary statistics (describe() covers only numeric columns by default)
print("\nNumerical Features Summary:")
df.describe()
```
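
`df.describe()` does not show how much data is missing in each column or how balanced the target is. Both are worth knowing early: missing values need handling before modeling, and the share of positive cases is useful context when choosing a validation split and reading AUC-ROC scores. This is a minimal sketch with hypothetical helper names (`missing_summary`, `class_balance`); the target column name in the usage note is likewise hypothetical.

```python
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values per column, largest first.

    Columns with no missing values are left out of the result.
    """
    counts = df.isna().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(df) * 100).round(2),
    })
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)


def class_balance(target: pd.Series) -> pd.DataFrame:
    """Count and share of each target value, including missing labels."""
    counts = target.value_counts(dropna=False)
    return pd.DataFrame({
        "count": counts,
        "share": (counts / counts.sum()).round(4),
    })
```

Typical calls would be `missing_summary(df)` and `class_balance(df["HeartDisease"])`, substituting the actual target column. Separately, `df.describe(include="all")` extends the built-in summary to non-numeric columns, reporting the number of unique values and the most frequent value for each.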