<h1 style="text-align: center;"><b>4. Predictive Analysis</b></h1>

<font size="4">**Q. Given a patient's current glucose, calories, heart rate, carbohydrate intake, and bolus volume delivered, can we predict their glucose level 30 minutes in the future?**</font>

<font size=3>**Explored basal insulin adequacy:**

Finding: if hypoglycemia occurs during fasting (no carbohydrate intake and no bolus), the basal insulin rate may be too high.</font>
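
The fasting rule above can be sketched as a simple row filter. This is only an illustration, not the analysis actually run on the dataset: the 70 mg/dL cutoff, the length of the fasting window, and the helper name `flag_fasting_hypo` are assumptions, and the small `demo` frame is synthetic. Column names follow the lowercase names used in the code cell below.

```python
import pandas as pd

HYPO_THRESHOLD = 70   # mg/dL; commonly used hypoglycemia cutoff (assumption)
FASTING_STEPS = 24    # 24 rows x 5 min = 2 h without carbs or bolus (assumption)

def flag_fasting_hypo(df, threshold=HYPO_THRESHOLD, steps=FASTING_STEPS):
    """Return a boolean Series that is True where glucose is below `threshold`
    and no carbs or bolus were recorded in the current row or the
    `steps - 1` rows before it."""
    had_intake = (
        (df["carb_input"].fillna(0) > 0)
        | (df["bolus_volume_delivered"].fillna(0) > 0)
    ).astype(int)
    # Rolling max is 1 if any row in the look-back window had carbs or a bolus.
    recent_intake = had_intake.rolling(window=steps, min_periods=1).max()
    return (df["glucose"] < threshold) & (recent_intake == 0)

# Synthetic example rows (not taken from the HUPA-UC dataset).
demo = pd.DataFrame({
    "glucose": [110, 95, 68, 65, 120, 66],
    "carb_input": [0, 0, 0, 0, 3, 0],
    "bolus_volume_delivered": [0, 0, 0, 0, 1.5, 0],
})
demo["fasting_hypo"] = flag_fasting_hypo(demo, steps=2)
print(demo)
```

Rows flagged `True` are candidates for a too-high basal rate. The last row is low but is not flagged, because carbs and a bolus were recorded one step earlier.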


In [None]:
import pandas as pd
import plotly.express as px

# Load dataset
df = pd.read_csv(r"C:\Python_Hackathon_Aug2025\HUPA-UC Diabetes Dataset\HUPA0003P.csv", sep=";")

# Convert all column names in DataFrame to lowercase
df.columns = df.columns.str.lower()
print(df.columns)

# Sort the data by time, which is essential for time-series analysis.
df = df.sort_values(by='time').reset_index(drop=True)

# Create the target variable: the glucose level 30 minutes ahead.
# This assumes regular 5-minute sampling (5 min * 6 steps = 30 min).
# The last 6 rows have no future value (NaN), so we drop them.
# Note: dropna() also removes rows with missing values in any other column.
df['future_glucose'] = df['glucose'].shift(-6)
df.dropna(inplace=True)

# Define the features (X) and the new target variable (y).
features = ['glucose', 'calories', 'heart_rate', 'bolus_volume_delivered', 'carb_input']
target = 'future_glucose'

X = df[features]
y = df[target]

# For time-series data, it's crucial to split the data chronologically to prevent data leakage.
# We'll use a simple index split (e.g., 80% for training, 20% for testing).
split_point = int(len(df) * 0.8)

X_train = X.iloc[:split_point]
X_test = X.iloc[split_point:]
y_train = y.iloc[:split_point]
y_test = y.iloc[split_point:]

# Print the shapes to confirm the time-based split
print(f'Training features shape: {X_train.shape}')
print(f'Testing features shape: {X_test.shape}')
print(f'Training target shape: {y_train.shape}')
print(f'Testing target shape: {y_test.shape}')

# Now, we'll create a graph to visualize the split.
# Combine the training and testing data for plotting and add a 'split' column.
# Note: 'glucose_level' below holds the target, i.e. glucose 30 minutes ahead.
df_train = X_train.copy()
df_train['glucose_level'] = y_train
df_train['split'] = 'Training'

df_test = X_test.copy()
df_test['glucose_level'] = y_test
df_test['split'] = 'Testing'

# Concatenate the dataframes to plot them together
plot_df = pd.concat([df_train, df_test])

# Plot the glucose levels over time, colored by the data split
fig = px.line(
    plot_df.reset_index(),
    x='index',
    y='glucose_level',
    color='split',
    title='Glucose Levels: Training vs. Testing Data Split',
    labels={'index': 'Time Step', 'glucose_level': 'Glucose Level'}
)

# Show the plot
fig.show()
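
The target construction and chronological split used in the cell above can be checked on a small synthetic series. The sketch below is a hedged illustration: `add_future_target`, `chronological_split`, and the `toy` frame are hypothetical names and data introduced here, and the example assumes gap-free 5-minute sampling, as the cell above does.

```python
import numpy as np
import pandas as pd

def add_future_target(df, column="glucose", steps=6):
    """Add a column holding `column` shifted `steps` rows into the future
    and drop the trailing rows that have no future value."""
    out = df.copy()
    target = f"future_{column}"
    out[target] = out[column].shift(-steps)
    return out.dropna(subset=[target]).reset_index(drop=True)

def chronological_split(df, train_fraction=0.8):
    """Split by row position so that every training row precedes every test row."""
    split_point = int(len(df) * train_fraction)
    return df.iloc[:split_point], df.iloc[split_point:]

# Synthetic series: 20 readings at 5-minute intervals, rising by 1 per step.
toy = pd.DataFrame({
    "time": pd.date_range("2025-01-01", periods=20, freq="5min"),
    "glucose": np.arange(100, 120, dtype=float),
})

labelled = add_future_target(toy, steps=6)
train, test = chronological_split(labelled)

print(len(toy), len(labelled), len(train), len(test))
print(train["time"].max() < test["time"].min())
```

Shifting by 6 steps removes exactly 6 rows from the end of the series, and the final comparison confirms that no test timestamp precedes a training timestamp, which is the property that guards against leakage.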
