This project predicts the onset of diabetes in patients using the k-nearest neighbors (KNN) algorithm. The dataset contains several health metrics and an outcome variable indicating whether a patient developed diabetes.
The project aims to:
- Predict the onset of diabetes based on health metrics such as glucose level, blood pressure, BMI, etc.
- Implement the KNN algorithm for classification.
- Evaluate the model's performance using a confusion matrix and accuracy metric.
- Determine the optimal value of k for KNN through cross-validation.
The dataset used in this project, "diabetes.csv", contains the following columns:
- Pregnancies
- Glucose
- BloodPressure
- SkinThickness
- Insulin
- BMI
- DiabetesPedigreeFunction
- Age
- Outcome (1: diabetes, 0: no diabetes)
- Load the dataset and examine the first few rows and summary statistics.
- Normalize the explanatory variables using min-max normalization.
- Split the dataset into training and test sets (80-20 split).
- Implement the KNN algorithm to predict diabetes onset with various values of k.
- Evaluate the model's performance using a confusion matrix and accuracy metric.
- Determine the optimal value of k based on the mean squared error (MSE).
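The steps above can be sketched in R roughly as follows. This is a minimal illustration, not the project's actual code: it assumes `knn()` from the `class` package (which is not in the project's package list), an arbitrary seed, and an illustrative `k = 5`. If "diabetes.csv" is not in the working directory, it falls back to a randomly generated stand-in with the same column names so the sketch still runs; the stand-in values carry no meaning.

```r
# Sketch of the workflow: load, normalize, split, classify, evaluate.
# Assumption: knn() comes from the 'class' package.
library(class)

set.seed(42)  # arbitrary seed so the split is reproducible

# Use the real file when present; otherwise build a random stand-in
# with the same columns (values are placeholders, not real data).
if (file.exists("diabetes.csv")) {
  diabetes <- read.csv("diabetes.csv")
} else {
  n <- 200
  diabetes <- data.frame(
    Pregnancies = rpois(n, 3),
    Glucose = rnorm(n, 120, 30),
    BloodPressure = rnorm(n, 70, 12),
    SkinThickness = rnorm(n, 20, 10),
    Insulin = rnorm(n, 80, 60),
    BMI = rnorm(n, 32, 7),
    DiabetesPedigreeFunction = runif(n, 0.1, 2),
    Age = sample(21:80, n, replace = TRUE),
    Outcome = rbinom(n, 1, 0.35)
  )
}

# Examine the first few rows and summary statistics
print(head(diabetes))
print(summary(diabetes))

# Min-max normalization: rescale each explanatory variable to [0, 1]
min_max <- function(x) (x - min(x)) / (max(x) - min(x))
predictors <- setdiff(names(diabetes), "Outcome")
features <- as.data.frame(lapply(diabetes[predictors], min_max))

# 80-20 split into training and test sets
train_idx <- sample(nrow(features), size = floor(0.8 * nrow(features)))
train_x <- features[train_idx, ]
test_x <- features[-train_idx, ]
train_y <- factor(diabetes$Outcome[train_idx], levels = c(0, 1))
test_y <- factor(diabetes$Outcome[-train_idx], levels = c(0, 1))

# KNN classification with an illustrative k
k <- 5
pred <- knn(train = train_x, test = test_x, cl = train_y, k = k)

# Confusion matrix and accuracy on the test set
conf_mat <- table(Predicted = pred, Actual = test_y)
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)
print(conf_mat)
print(accuracy)
```

Normalizing before the split keeps the sketch short; computing the minimum and maximum on the training rows only and reusing them for the test rows would avoid leaking test-set information into the scaling.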
- R programming language
- Required R packages: `readr`, `tidyverse`, `ggplot2`
To replicate the analysis:
- Load the dataset "diabetes.csv" using the `read.csv` function.
- Execute the provided R code chunks step by step.
- Ensure that the required R packages are installed and loaded.
- Customize the analysis as needed, for example by changing the value of k or adding preprocessing steps.
- Analyze the results, including the confusion matrix and accuracy metric, to assess the model's performance.
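When experimenting with different values of k, one possible approach is to loop over candidate values and compare the mean squared error of the predictions. The sketch below is only an illustration under stated assumptions: it uses `knn()` from the `class` package (not in the project's package list), a hypothetical candidate range of 1 to 25, and randomly generated stand-in data in place of the normalized predictors and `Outcome` column. For 0/1 labels the MSE equals the misclassification rate, so accuracy is simply `1 - mse`.

```r
# Sketch: compare candidate values of k by mean squared error (MSE).
# Assumption: knn() comes from the 'class' package; all names are illustrative.
library(class)

set.seed(123)  # arbitrary seed for reproducibility

# Stand-in for the normalized predictors and the Outcome column
# (random values, used only so the sketch runs without diabetes.csv).
n <- 300
features <- data.frame(Glucose = runif(n), BMI = runif(n), Age = runif(n))
outcome <- as.integer(
  features$Glucose + 0.5 * features$BMI + rnorm(n, sd = 0.2) > 0.8
)

# 80-20 split into training and test sets
train_idx <- sample(n, size = floor(0.8 * n))
train_x <- features[train_idx, ]
test_x <- features[-train_idx, ]
train_y <- factor(outcome[train_idx], levels = c(0, 1))
test_y <- outcome[-train_idx]

k_values <- 1:25  # hypothetical candidate range

# MSE between predicted and actual 0/1 labels for each k
mse <- sapply(k_values, function(k) {
  pred <- knn(train = train_x, test = test_x, cl = train_y, k = k)
  pred_num <- as.integer(as.character(pred))  # factor labels back to 0/1
  mean((pred_num - test_y)^2)
})

results <- data.frame(k = k_values, mse = mse, accuracy = 1 - mse)
best_k <- k_values[which.min(mse)]  # first k with the lowest MSE

print(results)
print(best_k)
```

Choosing k on the same test set that is used to report accuracy tends to give an optimistic estimate. A separate validation split, or leave-one-out cross-validation with `knn.cv()` from the same package, is a common alternative.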
The project generates predictions of diabetes onset from the input health metrics. It also reports the model's accuracy and the optimal value of k, providing insight into how effective the KNN algorithm is for diabetes prediction.
- The data used in this project come from a publicly available diabetes dataset.
- Thanks to the R Core Team and contributors for developing the R programming language and associated packages.