
# Glucose Level Prediction Project

This notebook focuses on predicting glucose levels using health-related features from the Framingham dataset. 



## Conclusion / What We Learned

- **Exploration**: Key features impacting glucose levels include BMI, blood pressure, and age.
- **Modeling**: Several models were trained and tested. Random Forest performed the best.
- **Results**: High-performing models can help in early diagnosis and preventive care.
- **Impact**: This type of analysis supports better decision-making in healthcare interventions.

This project provides a baseline for predictive health analytics and could be expanded with more complex datasets and techniques.



## What To Do

1. Import and explore the `framingham.csv` dataset.
2. Clean the data (handle nulls, correct formats, etc.).
3. Visualize the distribution of glucose and related health indicators.
4. Perform feature selection and engineering.
5. Train ML models (e.g., Logistic Regression, Decision Tree, Random Forest).
6. Evaluate models using classification metrics.
7. Predict glucose levels and draw insights.
8. Visualize the model’s important features and performance.


## Importing and Exploring the Dataset

This step also imports the required libraries.

In [None]:
import pandas as pd
import numpy as np

df = pd.read_csv('framingham.csv')

print("Shape:", df.shape)
print("\nColumns:", df.columns.tolist())
print("\nNull values:\n", df.isnull().sum())
print("\nStatistical Summary:\n", df.describe())

In [None]:
# Inspect column types before deciding how to handle missing values.
print(df.dtypes, "\n")

missing = df.isnull().sum()
print("\nMissing Values:\n", missing[missing > 0])

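# Work on a copy so the raw DataFrame stays untouched.
df_cleaned = df.copy()
num_cols = df_cleaned.select_dtypes(include=['float64', 'int64']).columns

# Use the median rather than the mean, as it is less affected by outlier values.
for col in num_cols:
    if df_cleaned[col].isnull().any():
        median_val = df_cleaned[col].median()
        # Assign the result back: calling fillna(..., inplace=True) on a column
        # selection is deprecated in pandas 2.x and may not update df_cleaned.
        df_cleaned[col] = df_cleaned[col].fillna(median_val)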

print("Any remaining nulls?\n", df_cleaned.isnull().sum().sum())


Missing values are filled rather than dropped. Dropping the rows that contain nulls had a greater impact on the dataset and would likely make the model less accurate.
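
To make the trade-off concrete, here is a minimal sketch comparing the two options. It uses a small made-up DataFrame (`toy`, with hypothetical values) in place of `framingham.csv`, so the numbers illustrate the idea only and say nothing about the real dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for framingham.csv; all values are made up.
toy = pd.DataFrame({
    "age":     [39, 46, 48, 61, 46, 43],
    "BMI":     [26.97, 28.73, np.nan, 28.58, 23.10, 30.30],
    "glucose": [77.0, 76.0, 70.0, np.nan, 85.0, 394.0],  # 394 acts as an outlier
})

# Option 1: drop every row that has at least one null.
dropped = toy.dropna()

# Option 2: fill each numeric null with its column median (keeps every row).
filled = toy.fillna(toy.median(numeric_only=True))

print("Rows after dropna:", len(dropped))
print("Rows after median fill:", len(filled))

# The single outlier pulls the mean far above the median,
# which is why the median is the safer fill value here.
print("glucose mean:", toy["glucose"].mean())
print("glucose median:", toy["glucose"].median())
```

On this toy frame, dropping nulls keeps 4 of the 6 rows, while median filling keeps all 6. The same comparison can be run on `df` to check how many rows `dropna()` would discard in the actual data.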