<p align="center">
  <img src="https://raw.githubusercontent.com/pachecowillians/svg-icons/24b0ca90f467a751be9f0e7e5fa50801f89e4d17/img/diabetes.svg" alt="Diabetes" width="100px">
</p>

This notebook performs a predictive analysis of individuals' health, focusing on whether a person has diabetes. It applies machine learning to a dataset of health indicators, including gender, age, hypertension, heart disease, smoking history, BMI, HbA1c level, and blood glucose level. After a classification model is trained and its accuracy evaluated on held-out data, the results can offer insight into the factors associated with diabetes.

In [306]:
# Importing the pandas library
import pandas as pd

In [307]:
# Loading the diabetes prediction dataset
ds = pd.read_csv('dataset/diabetes_prediction_dataset.csv')

# Displaying the dataset
ds.head()

,gender,age,hypertension,heart_disease,smoking_history,bmi,HbA1c_level,blood_glucose_level,diabetes
0,Female,80.0,0,1,never,25.19,6.6,140,0
1,Female,54.0,0,0,No Info,27.32,6.6,80,0
2,Male,28.0,0,0,never,27.32,5.7,158,0
3,Female,36.0,0,0,current,23.45,5.0,155,0
4,Male,76.0,1,1,current,20.14,4.8,155,0


In [308]:
def generateIdToColumn(column):
    """Encode a categorical column as integer IDs (0, 1, 2, ...) in order of first appearance."""
    # Get the distinct values from the column, in order of first appearance
    unique_values = column.unique()

    # Map each distinct value to a numeric ID
    mapping = {value: index for index, value in enumerate(unique_values)}

    # Replace the values in the column with their corresponding IDs
    column = column.map(mapping)

    return column
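For non-missing values, `generateIdToColumn` assigns codes in order of first appearance. `pandas.factorize` does the same by default. The sketch below shows an equivalent helper under the hypothetical name `generate_ids_with_factorize`. It is checked on toy values that mirror the `smoking_history` entries in the preview above.

```python
import pandas as pd


def generate_ids_with_factorize(column: pd.Series) -> pd.Series:
    """Hypothetical alternative to generateIdToColumn built on pandas.factorize."""
    # factorize returns (codes, uniques); codes follow the order of first appearance
    codes, _uniques = pd.factorize(column)
    # Keep the original index and name so the result can replace the column directly
    return pd.Series(codes, index=column.index, name=column.name)


# Toy values mirroring the first five smoking_history entries in the preview
smoking = pd.Series(["never", "No Info", "never", "current", "current"], name="smoking_history")
encoded = generate_ids_with_factorize(smoking)
print(encoded.tolist())  # [0, 1, 0, 2, 2]
```

By default, `factorize` gives missing values the code `-1`, so check for those before training.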

In [309]:
ds['gender'] = generateIdToColumn(ds['gender'])
ds.head()

,gender,age,hypertension,heart_disease,smoking_history,bmi,HbA1c_level,blood_glucose_level,diabetes
0,0,80.0,0,1,never,25.19,6.6,140,0
1,0,54.0,0,0,No Info,27.32,6.6,80,0
2,1,28.0,0,0,never,27.32,5.7,158,0
3,0,36.0,0,0,current,23.45,5.0,155,0
4,1,76.0,1,1,current,20.14,4.8,155,0


In [310]:
ds['smoking_history'] = generateIdToColumn(ds['smoking_history'])
ds.head()

,gender,age,hypertension,heart_disease,smoking_history,bmi,HbA1c_level,blood_glucose_level,diabetes
0,0,80.0,0,1,0,25.19,6.6,140,0
1,0,54.0,0,0,1,27.32,6.6,80,0
2,1,28.0,0,0,0,27.32,5.7,158,0
3,0,36.0,0,0,2,23.45,5.0,155,0
4,1,76.0,1,1,2,20.14,4.8,155,0
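Integer codes like these impose an arbitrary order on categories. For example, `current` (2) ends up "greater than" `never` (0). Tree ensembles can usually cope with that, but one-hot encoding avoids the implied order altogether. The sketch below uses `pandas.get_dummies` on a small toy frame whose values mirror the preview rows.

```python
import pandas as pd

# Toy frame with the categorical columns as they look before integer encoding
toy = pd.DataFrame({
    "gender": ["Female", "Female", "Male", "Female", "Male"],
    "smoking_history": ["never", "No Info", "never", "current", "current"],
    "bmi": [25.19, 27.32, 27.32, 23.45, 20.14],
})

# One 0/1 indicator column per category; the numeric bmi column passes through unchanged
one_hot = pd.get_dummies(toy, columns=["gender", "smoking_history"], dtype=int)
print(one_hot.columns.tolist())
print(one_hot)
```

Each row has exactly one `1` across its `smoking_history_*` columns. Columns such as `smoking_history_never` can then be used directly as features.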


In [311]:
# Separating the target variable (goal) from the feature variables (predict)
goal = ds['diabetes']
predict = ds.drop('diabetes', axis=1)
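Before splitting, it helps to check how balanced the target is. That balance determines how informative plain accuracy will be later. Below is a minimal sketch on a hypothetical toy target; to run it on the real data, replace `toy_goal` with `goal`.

```python
import pandas as pd

# Hypothetical toy target: 9 negatives and 1 positive
toy_goal = pd.Series([0, 0, 0, 0, 0, 0, 0, 0, 0, 1], name="diabetes")

# Share of each class; normalize=True returns proportions instead of counts
class_share = toy_goal.value_counts(normalize=True)
print(class_share)

# Accuracy a model would reach by always predicting the most common class
majority_baseline = class_share.max()
print("Majority-class baseline:", majority_baseline)
```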

In [312]:
# Importing the train_test_split function
from sklearn.model_selection import train_test_split

In [313]:
# Splitting the dataset into training (70%) and testing (30%) sets
x_train, x_test, y_train, y_test = train_test_split(predict, goal, test_size=0.3)
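The split above is random on every run, so the accuracy reported below can change between executions. Passing `random_state` fixes the shuffle, and `stratify` keeps the diabetic/non-diabetic ratio the same in both sets. Here is a sketch on synthetic arrays; the seed `42` and the toy data are arbitrary choices.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-ins for predict/goal: 100 rows, 10 of them positive
X = np.arange(200).reshape(100, 2)
y = np.array([1] * 10 + [0] * 90)

x_tr, x_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

print(x_tr.shape, x_te.shape)  # (70, 2) (30, 2)
print(y_tr.sum(), y_te.sum())  # positives in each split
```

On the real data, the same arguments (`stratify=goal, random_state=...`) make the reported accuracy reproducible. Its exact value may differ from the one shown below.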

In [314]:
# Printing the shapes of the full dataset and of the training/testing feature sets
print(ds.shape, x_train.shape, x_test.shape)

(100000, 9) (70000, 8) (30000, 8)


In [315]:
# Importing the ExtraTreesClassifier
from sklearn.ensemble import ExtraTreesClassifier

In [316]:
# Creating an instance of the ExtraTreesClassifier model
model = ExtraTreesClassifier()

# Training the model
training = model.fit(x_train, y_train)
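A fitted `ExtraTreesClassifier` exposes impurity-based importances through its `feature_importances_` attribute. This is one way to explore the "predictive factors" mentioned in the introduction. The sketch below uses synthetic data in which only one column drives the label. The column names and the 6.5 threshold are illustrative. On the real model, the equivalent would be `pd.Series(model.feature_importances_, index=x_train.columns)`.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import ExtraTreesClassifier

rng = np.random.default_rng(0)
n = 500

# Synthetic features: one informative column and one pure-noise column
X = pd.DataFrame({
    "HbA1c_level": rng.normal(5.5, 1.0, n),
    "noise": rng.normal(0.0, 1.0, n),
})
# The label depends only on HbA1c_level (illustrative threshold)
y = (X["HbA1c_level"] > 6.5).astype(int)

model = ExtraTreesClassifier(n_estimators=100, random_state=0).fit(X, y)

# Importances are normalized to sum to 1; higher means more impurity reduction
importances = pd.Series(model.feature_importances_, index=X.columns).sort_values(ascending=False)
print(importances)
```

Impurity-based importances can be biased toward features with many distinct values. They are best read as a rough ranking rather than a causal statement.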

In [317]:
# Calculating the accuracy of the trained model on the test set
result = model.score(x_test, y_test)

# Printing the accuracy
print("Accuracy:", result)

Accuracy: 0.9672666666666667
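Accuracy alone can look strong on an imbalanced target, because a model that always predicts "no diabetes" already scores the share of negatives. A fuller picture comes from three checks:

- comparing against `DummyClassifier`,
- inspecting the confusion matrix,
- computing recall for the positive class.

Here is a sketch on synthetic, deliberately imbalanced data (all values are made up).

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import confusion_matrix, recall_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic imbalanced labels: 100 positives, 900 negatives
y = np.array([1] * 100 + [0] * 900)
# One informative feature (shifted for positives) plus one pure-noise feature
X = np.column_stack([y * 2.0 + rng.normal(0, 1, y.size), rng.normal(0, 1, y.size)])

x_tr, x_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)

# Baseline that always predicts the majority class (0)
baseline = DummyClassifier(strategy="most_frequent").fit(x_tr, y_tr)
baseline_acc = baseline.score(x_te, y_te)

model = ExtraTreesClassifier(random_state=0).fit(x_tr, y_tr)
pred = model.predict(x_te)

print("Baseline accuracy:", baseline_acc)
print("Model accuracy:", model.score(x_te, y_te))
# Rows are true classes [0, 1]; columns are predicted classes [0, 1]
cm = confusion_matrix(y_te, pred)
print(cm)
# Share of actual positives that the model catches
recall = recall_score(y_te, pred)
print("Recall (class 1):", recall)
```

On the real `x_test`/`y_test`, the same calls show whether the accuracy of about 0.967 above is meaningfully better than the majority-class baseline.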


In [318]:
# Importing the random module (not used below: DataFrame.sample draws its own random rows)
import random

In [319]:
# Selecting 10 random rows from the test features
random_sample = x_test.sample(n=10)

In [320]:
# Retrieving the true labels of the sampled rows, matched by index
random_sample_goal = y_test.loc[random_sample.index]

In [321]:
# Importing NumPy
import numpy as np

In [322]:
# Converting the labels to a NumPy array for element-wise comparison
random_sample_goal = np.array(random_sample_goal)
print(random_sample_goal)

[0 0 0 0 0 0 0 0 0 0]
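When one class dominates, a random sample of 10 rows can easily contain no positive cases. The comparison then says nothing about diabetic patients. One option is to draw the same number of rows from each class with `groupby(...).sample`. The sketch below uses hypothetical toy labels and features indexed like a split DataFrame. On the real data, `y_test` and `x_test` would take their place.

```python
import pandas as pd

# Hypothetical toy test labels (mostly 0, a few 1s) and matching feature rows
toy_y_test = pd.Series([0] * 45 + [1] * 5, index=range(100, 150), name="diabetes")
toy_x_test = pd.DataFrame({"age": range(50)}, index=range(100, 150))

# Draw exactly 3 rows from each class (each class needs at least 3 rows);
# random_state makes the draw reproducible
balanced_labels = toy_y_test.groupby(toy_y_test).sample(n=3, random_state=0)
print(balanced_labels.value_counts())

# The same index selects the matching feature rows
balanced_features = toy_x_test.loc[balanced_labels.index]
print(balanced_features)
```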


In [323]:
# Making predictions on the selected subset of test data
prediction = model.predict(random_sample)
print(prediction)

[0 0 0 0 0 0 0 0 0 0]
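For a forest classifier, `predict` returns the class with the highest predicted probability. `predict_proba` exposes those probabilities directly. That lets you lower the decision threshold if missing a diabetic case is costlier than a false alarm. The sketch below uses synthetic data, and the 0.3 threshold is an arbitrary illustration.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier

rng = np.random.default_rng(1)

# Synthetic imbalanced data: 50 positives, 450 negatives
y = np.array([1] * 50 + [0] * 450)
X = np.column_stack([y * 1.5 + rng.normal(0, 1, y.size), rng.normal(0, 1, y.size)])

# Simple interleaved split: even rows for training, odd rows for prediction
x_tr, y_tr, x_te = X[::2], y[::2], X[1::2]

model = ExtraTreesClassifier(random_state=0).fit(x_tr, y_tr)

# Column 1 holds the probability of class 1 (model.classes_ is [0, 1])
proba_positive = model.predict_proba(x_te)[:, 1]

# With two classes, "probability strictly above 0.5" reproduces predict()
default_pred = (proba_positive > 0.5).astype(int)

# A lower, illustrative threshold flags at least as many rows as positive
lenient_pred = (proba_positive > 0.3).astype(int)
print(default_pred.sum(), lenient_pred.sum())
```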


In [324]:
# Comparing the true labels with the predictions element-wise
print(random_sample_goal == prediction)

[ True  True  True  True  True  True  True  True  True  True]
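An element-wise comparison is easy to read for 10 rows. For larger samples, the mean of the boolean array gives the fraction of matches, which is exactly what `sklearn.metrics.accuracy_score` reports. Here is a small sketch with made-up label arrays.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Made-up true labels and predictions (two mismatches, at positions 4 and 9)
true_labels = np.array([0, 0, 1, 0, 1, 0, 0, 1, 0, 0])
predicted = np.array([0, 0, 1, 0, 0, 0, 0, 1, 0, 1])

matches = true_labels == predicted
print(matches.mean())                          # 0.8
print(accuracy_score(true_labels, predicted))  # 0.8
print(np.array_equal(true_labels, predicted))  # False
```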
