## **Diabetes Risk Prediction Inference**

---

### **I. Introduction**

- **Author**  : Ayudha Amari Hirtranusi
- **Dataset** : [CDC Diabetes Health Indicators](https://archive.ics.uci.edu/dataset/891/cdc+diabetes+health+indicators)
- **Hugging Face**: [Link](https://huggingface.co/spaces/amariayudha/Diabetes_Prediction)

---------------------

**Problem Statement**

The prevalence of diabetes has increased significantly, creating a substantial public health burden. The goal is to develop a machine learning model that can predict an individual's risk of developing diabetes with 80% recall within six months, enabling earlier intervention.

This notebook tests inference on new data using the previously trained model.
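For reference, recall is the share of actual positive cases that the model flags, TP / (TP + FN), so an 80% target means at most one in five true diabetes cases is missed. The following is a minimal sketch in plain Python; the `recall` helper and the example labels are assumptions made up for illustration and are not results from this project.

```python
def recall(y_true, y_pred, positive=1):
    """Return TP / (TP + FN) for the given positive label."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp + fn == 0:
        raise ValueError("recall is undefined without positive samples")
    return tp / (tp + fn)

# Hypothetical labels for illustration only (not project results)
y_true_example = [1, 1, 1, 1, 1, 0, 0, 0]
y_pred_example = [1, 1, 1, 1, 0, 0, 1, 0]
recall_example = recall(y_true_example, y_pred_example)
print(f"Recall: {recall_example:.2f}")
```

False positives do not affect recall, which is why a recall target is usually read alongside precision.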

### **II. Libraries**

The libraries used to test the model are as follows:

In [1]:
# Import libraries
import pandas as pd
import pickle

**Library Functions**
- pandas: data manipulation
- pickle: loading the saved model

### **III. Data Loading**

The first step is to load the saved model and define the inference data, both of which are kept separate from the model training file.

In [2]:
# Load model
with open('tuned_model.pkl', 'rb') as model_file:
    model = pickle.load(model_file)

In [3]:
# Define new data for inference
data = {'ID': [1, 2, 3],
        'HighBP': [1, 0, 1],
        'HighChol': [1, 0, 1],
        'CholCheck': [1, 1, 1],
        'BMI': [30, 25, 35],
        'Smoker': [1, 0, 0],
        'Stroke': [0, 0, 1],
        'HeartDiseaseorAttack': [0, 0, 1],
        'PhysActivity': [0, 1, 0],
        'Fruits': [0, 1, 0],
        'Veggies': [1, 1, 0],
        'HvyAlcoholConsump': [0, 0, 0],
        'AnyHealthcare': [1, 1, 1],
        'NoDocbcCost': [0, 0, 1],
        'GenHlth': [3, 2, 4],
        'MentHlth': [5, 0, 15],
        'PhysHlth': [10, 0, 20],
        'DiffWalk': [0, 0, 1],
        'Sex': [1, 0, 1],
        'Age': [9, 7, 11],
        'Education': [4, 6, 3],
        'Income': [5, 8, 2]}

# Create the DataFrame
df = pd.DataFrame(data)

# Display the DataFrame
df

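|   | ID | HighBP | HighChol | CholCheck | BMI | Smoker | Stroke | HeartDiseaseorAttack | PhysActivity | Fruits | ... | AnyHealthcare | NoDocbcCost | GenHlth | MentHlth | PhysHlth | DiffWalk | Sex | Age | Education | Income |
|---|----|--------|----------|-----------|-----|--------|--------|----------------------|--------------|--------|-----|---------------|-------------|---------|----------|----------|----------|-----|-----|-----------|--------|
| 0 | 1 | 1 | 1 | 1 | 30 | 1 | 0 | 0 | 0 | 0 | ... | 1 | 0 | 3 | 5 | 10 | 0 | 1 | 9 | 4 | 5 |
| 1 | 2 | 0 | 0 | 1 | 25 | 0 | 0 | 0 | 1 | 1 | ... | 1 | 0 | 2 | 0 | 0 | 0 | 0 | 7 | 6 | 8 |
| 2 | 3 | 1 | 1 | 1 | 35 | 0 | 1 | 1 | 0 | 0 | ... | 1 | 1 | 4 | 15 | 20 | 1 | 1 | 11 | 3 | 2 |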
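Before passing new records to the model, it can help to confirm that they carry the expected columns. The sketch below is one possible check, not part of the original notebook: the feature list is copied from the inference dictionary above, `check_inference_columns` is a hypothetical helper, and treating `ID` as a plain identifier is an assumption (this notebook does not show how the saved pipeline handles that column).

```python
# Feature names copied from the inference dictionary above.
EXPECTED_FEATURES = [
    'HighBP', 'HighChol', 'CholCheck', 'BMI', 'Smoker', 'Stroke',
    'HeartDiseaseorAttack', 'PhysActivity', 'Fruits', 'Veggies',
    'HvyAlcoholConsump', 'AnyHealthcare', 'NoDocbcCost', 'GenHlth',
    'MentHlth', 'PhysHlth', 'DiffWalk', 'Sex', 'Age', 'Education', 'Income',
]

def check_inference_columns(columns, expected=EXPECTED_FEATURES):
    """Return (missing, unexpected) column names.

    'ID' is assumed to be an identifier, so it is never reported as unexpected.
    """
    present = set(columns)
    missing = [name for name in expected if name not in present]
    unexpected = [name for name in columns
                  if name not in expected and name != 'ID']
    return missing, unexpected

# Hypothetical, deliberately incomplete record used only to show the check
sample_columns = ['ID', 'HighBP', 'BMI', 'Weight']
missing, unexpected = check_inference_columns(sample_columns)
print(f'{len(missing)} missing, unexpected: {unexpected}')
```

In the notebook itself the same helper could be called as `check_inference_columns(df.columns)`, which should return two empty lists for the DataFrame defined above.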


### **IV. Model Prediction**

Since the model handles preprocessing within the pipeline, there's no need to separately preprocess the data. The next step is simply using the saved model for making predictions.

In [4]:
# Predict using the tuned model
y_pred_inf = model.predict(df)

print(y_pred_inf)

[1 0 1]


In [5]:
# Print the predictions
for idx, pred in enumerate(y_pred_inf):
    if pred == 0:
        print(f'Prediction for individual {idx+1}: No Diabetes')
    elif pred == 1:
        print(f'Prediction for individual {idx+1}: Diabetes')

Prediction for individual 1: Diabetes
Prediction for individual 2: No Diabetes
Prediction for individual 3: Diabetes
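The if/elif chain above can also be written as a lookup table, which keeps the labels in one place and makes an unexpected class visible instead of silently skipping it. This is a minimal sketch under the assumption that the model only emits classes 0 and 1, as in the output above; `LABELS` and `describe_predictions` are hypothetical names, and the IDs are passed explicitly rather than derived from the row position.

```python
# Lookup table from predicted class to label (classes 0 and 1, as in the cell above)
LABELS = {0: 'No Diabetes', 1: 'Diabetes'}

def describe_predictions(ids, predictions, labels=LABELS):
    """Pair each ID with a readable label; unknown classes are flagged, not skipped."""
    return [
        f"Prediction for individual {id_}: {labels.get(int(pred), f'Unknown class {pred}')}"
        for id_, pred in zip(ids, predictions)
    ]

# IDs and predictions copied from the inference example above
messages = describe_predictions([1, 2, 3], [1, 0, 1])
for message in messages:
    print(message)
```

With the notebook's variables, the call would be `describe_predictions(df['ID'], y_pred_inf)`.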


### **V. Conclusion**

The model can predict the diabetes status of raw, unseen data, as shown by the predictions for three individuals with varying health indicators. The predictions for individuals 1, 2, and 3 are **Diabetes**, **No Diabetes**, and **Diabetes**, respectively. In other words, based on the inference inputs, the model predicts that the second individual does not have diabetes, while the first and third individuals are likely to have diabetes.

These predictions reflect the model's **ability to consider multiple factors** and their **complex interactions** when estimating diabetes risk. However, they are based on **statistical patterns** learned by a **machine learning model** and should not be treated as **definitive medical diagnoses**. Always consult healthcare professionals for accurate medical assessments.