# **Diabetes Risk Percentage**

## Objectives

- Load the initially cleaned cardiovascular disease dataset
- Calculate a diabetes risk percentage for each record in the dataset, using code generated by ChatGPT

## Inputs

- **Dataset:** cardio_data_processed_clean.csv
- **Required libraries:** pandas, numpy, openai

## Outputs

- **Cleaned dataset:** cardio_data_with_diabetes_risk.csv is saved to the raw folder under the dataset directory (dataset/raw)


---

# Change working directory

* We assume the notebooks are stored in a subfolder, so when running a notebook in the editor you need to change the working directory

We need to change the working directory from the notebook's folder to its parent folder
* We access the current directory with os.getcwd()

In [1]:
import os
current_dir = os.getcwd()
current_dir

'/Users/raihannasir/Documents/DA_AI/diabetes_risk/diabetes_risk/jupyter_notebooks'

We want to make the parent of the current directory the new current directory
* os.path.dirname() gets the parent directory
* os.chdir() sets the new current directory

In [2]:
os.chdir(os.path.dirname(current_dir))
print("You set a new current directory")

You set a new current directory


Confirm the new current directory

In [3]:
current_dir = os.getcwd()
current_dir

'/Users/raihannasir/Documents/DA_AI/diabetes_risk/diabetes_risk'
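
Running the `os.chdir` cell a second time would move the working directory up one more level. As a minimal sketch, assuming the notebooks live in a folder named `jupyter_notebooks` (as in the path printed earlier), a small guard can make the step safe to re-run. The helper name `resolve_project_root` is hypothetical and not part of the original notebook.

```python
import os

def resolve_project_root(cwd, notebooks_dirname="jupyter_notebooks"):
    """Return the parent of cwd if cwd is the notebooks folder, else cwd unchanged."""
    normalized = os.path.normpath(cwd)
    if os.path.basename(normalized) == notebooks_dirname:
        return os.path.dirname(normalized)
    return cwd

# Pure path logic, so it can be checked without touching the real working directory
example = os.path.join("project", "jupyter_notebooks")
once = resolve_project_root(example)
twice = resolve_project_root(once)
print(once == twice)  # True: applying the helper again does not move further up
```

In the notebook this could be used as `os.chdir(resolve_project_root(os.getcwd()))`.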

## Load necessary libraries

In [4]:
import pandas as pd
import numpy as np

In [13]:
raw_path = 'dataset/cleaned/cardio_data_processed_clean.csv'

---

# Load cleaned cardiovascular disease dataset for diabetes risk percentage calculation


In [14]:
df = pd.read_csv(raw_path)
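
The risk calculation relies on the `bmi`, `age`, and `gender` columns, and the age grouping only makes sense if `age` is recorded in years. A minimal sketch of a sanity check follows. The helper `check_risk_inputs`, the 120-year bound, and the tiny example frame are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd

REQUIRED_COLUMNS = ["bmi", "age", "gender"]  # columns used by the risk calculation

def check_risk_inputs(frame, max_age_years=120):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    missing = [c for c in REQUIRED_COLUMNS if c not in frame.columns]
    if missing:
        problems.append(f"missing columns: {missing}")
        return problems
    if frame[REQUIRED_COLUMNS].isna().any().any():
        problems.append("missing values in required columns")
    # Assumption: ages above max_age_years suggest the column is not stored in years
    if (frame["age"] > max_age_years).any():
        problems.append("age values look like they are not in years")
    return problems

# Tiny made-up frame for illustration only (not real data)
sample = pd.DataFrame({"bmi": [22.5, 31.0], "age": [45, 61], "gender": [1, 2]})
print(check_risk_inputs(sample))  # []
```

In the notebook, `check_risk_inputs(df)` could be called right after loading, before any columns are derived.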

## Diabetes risk percentage calculated with code from ChatGPT, based on the `bmi`, `age`, and `gender` columns of the cardiovascular disease dataset

In [None]:
# Code copied from the ChatGPT conversation

# Define BMI category
def get_bmi_category(bmi):
    if bmi < 18.5:
        return "Underweight"
    elif 18.5 <= bmi < 25:
        return "Normal"
    elif 25 <= bmi < 30:
        return "Overweight"
    elif 30 <= bmi < 35:
        return "Obese I"
    elif 35 <= bmi < 40:
        return "Obese II"
    else:
        return "Obese III"

# Define age group (assumes `age` is given in years)
def get_age_group(age):
    if age < 30:
        return "18-29"
    elif age < 40:
        return "30-39"
    elif age < 50:
        return "40-49"
    elif age < 60:
        return "50-59"
    else:
        return "60+"

# Risk table including underweight
risk_table = {
    "18-29": {
        "Underweight": (1, 2), "Normal": (2, 4), "Overweight": (5, 7),
        "Obese I": (10, 15), "Obese II": (15, 20), "Obese III": (25, 35)
    },
    "30-39": {
        "Underweight": (2, 3), "Normal": (5, 7), "Overweight": (10, 15),
        "Obese I": (20, 25), "Obese II": (25, 35), "Obese III": (40, 50)
    },
    "40-49": {
        "Underweight": (3, 5), "Normal": (10, 12), "Overweight": (20, 25),
        "Obese I": (35, 45), "Obese II": (45, 55), "Obese III": (60, 70)
    },
    "50-59": {
        "Underweight": (5, 7), "Normal": (15, 18), "Overweight": (30, 35),
        "Obese I": (50, 60), "Obese II": (60, 70), "Obese III": (70, 80)
    },
    "60+": {
        "Underweight": (7, 10), "Normal": (20, 25), "Overweight": (35, 45),
        "Obese I": (55, 65), "Obese II": (65, 75), "Obese III": (75, 85)
    }
}

# Add BMI category and age group
df["bmi_category"] = df["bmi"].apply(get_bmi_category)
df["age_group"] = df["age"].apply(get_age_group)

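# Compute risk: draw a random value inside the (low, high) range for the
# row's age group and BMI category, then scale it by a gender-dependent factor.
# Note: np.random is not seeded here, so each run produces different values.
def calculate_diabetes_risk(row):
    risk_range = risk_table[row["age_group"]][row["bmi_category"]]
    base_risk = np.random.uniform(*risk_range)
    if row["gender"] == 2:  # Treated as female here; verify against the dataset's gender encoding
        return round(base_risk * np.random.uniform(0.85, 0.95), 2)
    else:  # Treated as male
        return round(base_risk * np.random.uniform(0.95, 1.05), 2)

df["diab_risk_percent"] = df.apply(calculate_diabetes_risk, axis=1)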
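
Because the risk is drawn with `np.random.uniform`, each run of this cell yields different values. One way to make the column reproducible is a seeded generator. The sketch below copies only part of one row of the risk table so that it is self-contained. The seed value 42 and the function name `draw_risk` are arbitrary choices for illustration, and NumPy's `default_rng` is swapped in as an alternative to the global `np.random` calls in the original code.

```python
import numpy as np

# Part of the "50-59" row of the risk table, copied for a self-contained example
risk_table_50_59 = {"Overweight": (30, 35), "Obese I": (50, 60)}

def draw_risk(bmi_category, gender, rng):
    """Draw a risk value inside the table range, scaled by the gender factor."""
    low, high = risk_table_50_59[bmi_category]
    base_risk = rng.uniform(low, high)
    # Same factors as the original code: gender code 2 gets 0.85-0.95, others 0.95-1.05
    factor = rng.uniform(0.85, 0.95) if gender == 2 else rng.uniform(0.95, 1.05)
    return round(base_risk * factor, 2)

# Two generators created with the same seed produce the same value
first = draw_risk("Overweight", 1, np.random.default_rng(42))
second = draw_risk("Overweight", 1, np.random.default_rng(42))
print(first == second)  # True

# Worked bounds: Overweight, 50-59, gender code 1 lies between 30*0.95 and 35*1.05
print(28.5 <= first <= 36.75)  # True
```

Applied to the full dataset, a single generator would be created once and passed to every row, so that the whole column is fixed by the seed.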




## Save dataset with diabetes risk percentage to a new CSV file in the raw folder under the dataset directory

In [18]:
# Save to new CSV
df.to_csv("dataset/raw/cardio_data_with_diabetes_risk.csv", index=False)

print("✅ Diabetes risk added and file saved as 'cardio_data_with_diabetes_risk.csv'")

✅ Diabetes risk added and file saved as 'cardio_data_with_diabetes_risk.csv'
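
To confirm that a file was written as expected, it can be read back and compared with the frame in memory. The following is a minimal sketch that uses a temporary folder and a made-up two-row frame, so it does not touch the project's dataset directory. The values are illustrative and are not results from the notebook.

```python
import os
import tempfile
import pandas as pd

# Made-up rows for illustration only
demo = pd.DataFrame({"bmi": [27.3, 33.1], "diab_risk_percent": [31.42, 52.08]})

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "cardio_data_with_diabetes_risk.csv")
    demo.to_csv(out_path, index=False)  # index=False avoids writing an extra index column
    reloaded = pd.read_csv(out_path)

print(list(reloaded.columns))        # ['bmi', 'diab_risk_percent']
print(reloaded.shape == demo.shape)  # True
```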
