This project uses the K-Nearest Neighbors algorithm to predict whether a person is likely to have diabetes based on health-related features such as glucose level, BMI, blood pressure, insulin, age, and other medical measurements.
The main goal of this project is to understand how KNN classification works and how feature scaling affects distance-based machine learning models.
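To illustrate why scaling matters for a distance-based model, here is a minimal sketch in plain Python. The rows are made-up values for two columns (Insulin and DiabetesPedigreeFunction), not records from the dataset, and `standardize` is a hypothetical helper that mimics what StandardScaler does (mean 0, standard deviation 1 per column).

```python
import math


def euclidean(a, b):
    """Straight-line distance between two rows."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def standardize(rows):
    """Rescale each column to mean 0 and (population) standard deviation 1."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c))
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


# Made-up rows: [Insulin, DiabetesPedigreeFunction]
rows = [
    [100.0, 0.2],  # query patient
    [110.0, 1.8],  # A: similar insulin, very different pedigree value
    [130.0, 0.2],  # B: different insulin, identical pedigree value
    [300.0, 1.0],
    [20.0, 0.6],
]

# Raw distances are dominated by Insulin because its numbers are much larger.
raw_a = euclidean(rows[0], rows[1])
raw_b = euclidean(rows[0], rows[2])

# After standardizing, both columns contribute on a comparable scale.
scaled = standardize(rows)
scaled_a = euclidean(scaled[0], scaled[1])
scaled_b = euclidean(scaled[0], scaled[2])

print(f"raw:    A={raw_a:.2f}  B={raw_b:.2f}")
print(f"scaled: A={scaled_a:.2f}  B={scaled_b:.2f}")
```

On the raw values, A is the nearer neighbor; after standardizing, B is. The nearest neighbor changes purely because of the units, which is why KNN is usually trained on scaled features.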
The dataset used in this project is the Pima Indians Diabetes Dataset from Kaggle.
Target column:
- Outcome: 0 = Not Diabetic, 1 = Diabetic
The dataset contains the following columns (eight input features plus the Outcome target):
- Pregnancies
- Glucose
- BloodPressure
- SkinThickness
- Insulin
- BMI
- DiabetesPedigreeFunction
- Age
- Outcome
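As a rough sketch of how these columns could be separated into inputs and target with Pandas: the two rows below are invented placeholders, and in the project the frame would instead be read from the downloaded CSV (the file name is not given in this README).

```python
import pandas as pd

# Two made-up rows using the dataset's column names (not real records).
df = pd.DataFrame({
    "Pregnancies": [2, 0],
    "Glucose": [120, 95],
    "BloodPressure": [70, 64],
    "SkinThickness": [25, 20],
    "Insulin": [80, 60],
    "BMI": [31.2, 24.5],
    "DiabetesPedigreeFunction": [0.45, 0.20],
    "Age": [45, 23],
    "Outcome": [1, 0],
})

# Input features: every column except the target.
X = df.drop(columns="Outcome")
# Target: 0 = Not Diabetic, 1 = Diabetic.
y = df["Outcome"]

print(X.shape, y.tolist())
```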
Tools and libraries used:
- Python
- Pandas
- Scikit-learn
- StandardScaler
- K-Nearest Neighbors
- Streamlit
- Joblib
Project workflow:
- Loaded the dataset
- Separated input features and target column
- Split the data into training and testing sets
- Applied feature scaling using StandardScaler
- Built a KNN classification model
- Tested different K values
- Selected the best K value
- Evaluated the final model using accuracy, confusion matrix, and classification report
- Built a simple Streamlit UI for prediction
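The modeling steps above can be sketched roughly as follows. This is not the project's actual code: it uses synthetic stand-in data from `make_classification` instead of the Kaggle CSV, and it picks K by 5-fold cross-validation on the training set, which is one common approach and may differ from how K was chosen in this project. The split ratio, K range, and `random_state` values are also assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in data with 8 features; in the project, X would be the eight
# feature columns and y the Outcome column.
X, y = make_classification(n_samples=400, n_features=8, n_informative=5,
                           random_state=42)

# Hold out a test set (assumed 80/20 split, stratified by class).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Try odd K values; the pipeline fits the scaler on each training fold only.
cv_scores = {}
for k in range(1, 22, 2):
    pipe = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    cv_scores[k] = cross_val_score(pipe, X_train, y_train, cv=5).mean()
best_k = max(cv_scores, key=cv_scores.get)

# Refit on the full training set with the selected K, then evaluate once.
final_model = make_pipeline(StandardScaler(),
                            KNeighborsClassifier(n_neighbors=best_k))
final_model.fit(X_train, y_train)
y_pred = final_model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)
print("best K:", best_k)
print("accuracy:", round(accuracy, 3))
print(cm)
print(classification_report(y_test, y_pred))
```

Wrapping the scaler and the classifier in one pipeline keeps the scaling statistics from leaking between cross-validation folds or from the test set into training.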
The KNN model was evaluated using:
- Accuracy Score
- Confusion Matrix
- Precision
- Recall
- F1-score
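For reference, these metrics all derive from the four confusion-matrix counts. The sketch below uses hypothetical counts chosen only for illustration; they are not results from this project.

```python
# Hypothetical confusion-matrix counts (not results from this project).
tn, fp, fn, tp = 85, 15, 20, 34

# Share of all predictions that were correct.
accuracy = (tp + tn) / (tp + tn + fp + fn)
# Of the patients predicted diabetic, how many actually were.
precision = tp / (tp + fp)
# Of the patients who actually were diabetic, how many were found.
recall = tp / (tp + fn)
# Harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
```

For a medical screening task, recall on the Diabetic class is often worth watching alongside accuracy, since a false negative means a missed case.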
After testing different K values, the final model performed well for a beginner-level classification project.
A simple Streamlit web app was created where users can enter health-related values and get a prediction.
To run the app:
streamlit run app.py
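A minimal sketch of what `app.py` might look like is shown below. The file names `knn_model.joblib` and `scaler.joblib`, the `predict_label` helper, and the widget layout are all assumptions for illustration; the README does not describe how the model was saved or how the real app is structured.

```python
from pathlib import Path

# Hypothetical file names; the README does not say how the model was saved.
MODEL_PATH = Path("knn_model.joblib")
SCALER_PATH = Path("scaler.joblib")

# Input columns, in the order the model is assumed to have been trained on.
FEATURES = ["Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
            "Insulin", "BMI", "DiabetesPedigreeFunction", "Age"]
LABELS = {0: "Not Diabetic", 1: "Diabetic"}


def predict_label(model, scaler, values):
    """Scale one row of raw inputs, then map the predicted class to text."""
    scaled = scaler.transform([values])
    return LABELS[int(model.predict(scaled)[0])]


def main():
    if not (MODEL_PATH.exists() and SCALER_PATH.exists()):
        print("Saved model or scaler not found; train and save them first.")
        return
    # Imported here so predict_label can be reused without Streamlit installed.
    import joblib
    import streamlit as st

    model = joblib.load(MODEL_PATH)
    scaler = joblib.load(SCALER_PATH)

    st.title("Diabetes Prediction (KNN)")
    # One numeric input per feature, collected in training-column order.
    values = [st.number_input(name, min_value=0.0, value=0.0)
              for name in FEATURES]
    if st.button("Predict"):
        st.write("Prediction:", predict_label(model, scaler, values))


if __name__ == "__main__":
    main()
```

The same scaler that was fitted on the training data has to be applied to the user's inputs, so this sketch assumes the scaler was saved alongside the model.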