DiaMeter: Diabetes Prediction & Follow-up

DiaMeter is a graduation project submitted to College of Computer and Information Technology, Shaqra University in partial fulfillment of the requirements for the degree of Bachelor in Computer Science.

Introduction

Problem Description and Motivation

Since diabetes has become one of the most common causes of death and the cause of many other serious diseases. And the significant increase in the number of diabetics in the world in general and in Saudi Arabia in particular. It become important that an advanced system be implemented and used to predict diabetes and monitor diabetes patients. In addition, this system should be accessible and usable for non-professionals.
Doctors diagnose diabetes by examining the symptoms that appear on patients and looking at the test results. In addition, specialists can determine a person's risk of developing diabetes by looking at the influencing factors. And since diabetes is one of the most diseases that need continuous routine follow-up, patients must visit the doctor continuously to see if any positive or negative complications of the disease occur and adjust their lifestyle accordingly.

Objectives and Proposed Methodology

There are two main objectives of the project. First, Diameter will allow everyone to predict the risk of developing diabetes by answering the related questions, such as age, weight, blood pressure. We used VotingClassifier model to combine conceptually different machine learning classifiers (LGBMClassifier, KNeighborsClassifier and GradientBoostingClassifier) and use the average predicted probabilities to balance out their individual weaknesses. And by training this model on The Pima Indians Diabetes dataset from The National Institute of Diabetes and Digestive and Kidney Diseases. DiaMeter will analyze these values and give an accurate prediction and the advice that must be followed to improve their lifestyle.
Additionally, DiaMeter will allow diabetics and doctors to follow-up: the blood sugar levels, change in weight, blood pressure level, and help them to adjust their lifestyle by calculating the appropriate calories for each person according to their age, weight and the goal they seek to reach.

DiaMeter's Dataset and Database

Dataset Overview

Data Source: The Pima Indians Diabetes dataset is originally from the National Institute of Diabetes and Digestive and Kidney Diseases.
The dataset has 768 samples of diabetic and healthy individuals.
The dataset consist of several medical predictor variables such as BMI, blood pressure level, age. In addition to one target variable which is the outcome.

Data Cleaning

Data cleaning is the process of fixing or removing incorrect, corrupted, incorrectly formatted, duplicate, or incomplete data within a dataset.
As we can see, the zero values in some of the variables is meaningless. It indicates missing values. Ignoring them can significantly affect the accuracy of the model we are trying to build.
There are two common ways of handling missing data, which are; entirely removing the observations from the data set or imputing a new value based on other observations.

Missing Data Imputation

We replaced the missing values with the median value for each feature after excluding the missing data. Taking into consideration the target value; to get the most accurate result possible.

Features	Records with 0 Value	% of Total Records	Median for Outcome 0	Median for Outcome 1
Glucose	5	0.65%	107.0	140.0
BloodPressure	35	4.56%	70.0	74.5
SkinThickness	227	29.56%	27.0	32.0
Insulin	374	48.7%	102.5	169.5
BMI	11	1.43%	30.1	34.3

Dataset Transformation

Data transformation is one of the fundamental steps in processing. It can significantly improve the accuracy of the machine learning model.
We used data standardization to rescale the distribution of values and export the model to use it with the new input data.

Dataset Splitting

Using K-Folds Cross Validation we split our data into k (5) different subsets (folds). We use k-1 folds to train our data and leave the last fold as test data. We then average the model against each of the folds and then finalize our model.

Database Development

The database is a basic pillar as the second goal of the project is to follow up on diabetics, where their information and medical records, clinics and doctors' data, login information for all users will be stored. In order for doctors to view the reports when needed, and for DiaMeter to give appropriate instructions to each user.
In this project we used MySQL, which is the world’s most popular open source database.

DiaMeter's Machine Learning Model

We used VotingClassifier model to combine conceptually different machine learning classifiers (LGBMClassifier, KNeighborsClassifier and GradientBoostingClassifier) and use the average predicted probabilities to balance out their individual weaknesses.

Model’s Performance
Accuracy	89.6%	Precision	86.4%
Recall	83.6%	F1 score	84.9%
Confusion Matrix
True Positives	224	False Positives	36
True Negatives	464	False Negatives	44

DiaMeter's User Interface

User interface is important to meet user expectations and support the effective functionality of DiaMeter.
That's why we used Flask which is a lightweight WSGI web application framework. Although it doesn’t include an ORM (Object Relational Manager) or such features. It does have many cool features like url routing, template engine. And it is considered as one of the most popular Python web application frameworks.

Tools and Technologies Involved

macOS: Operating System
PyCharm: Python IDE
TablePlus: Database Client
iA Writer, Pages: Documentation
Keynote: Presentation
StarUML, OmniPlan, MindNode, Pixelmator: Designing
GitHub: Version Control
MySQL: Project Database
Python: Project Backend
- Pandas: for data analysis and manipulation
- NumPy: for scientific computing
- Scikit-learn: for machine learning, including data processing, metrics and modeling
- LightGBM: for machine learning modeling
- Plotly: for data analysis and visualization
- Flask: for web application development

Design Overview (UI / UX)

Developer Perspective

User Perspective

Installation Guide

Install MySQL
Import DiaMeter's Database by running the following command line: mysql -u root -p DiaMeter < data/DiaMeter.dump
Install Python
Install the required libraries by running the following command line: pip install -r requirements.txt
(Optional) Install PyCharm and import the project and run src/main.py
Run the project using the following command line: python src/main.py
You can now access the flask server at http://127.0.0.1:5000/

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
.idea		.idea
assets/images		assets/images
data		data
src		src
.gitattributes		.gitattributes
.gitignore		.gitignore
README.md		README.md
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

DiaMeter: Diabetes Prediction & Follow-up

Table of Contents

Introduction

Problem Description and Motivation

Objectives and Proposed Methodology

DiaMeter's Dataset and Database

Dataset Overview

Data Cleaning

Missing Data Imputation

Dataset Transformation

Dataset Splitting

Database Development

DiaMeter's Machine Learning Model

DiaMeter's User Interface

Tools and Technologies Involved

Design Overview (UI / UX)

Developer Perspective

User Perspective

Installation Guide

About

Releases

Packages

Languages

iAli/DiaMeter

Folders and files

Latest commit

History

Repository files navigation

DiaMeter: Diabetes Prediction & Follow-up

Table of Contents

Introduction

Problem Description and Motivation

Objectives and Proposed Methodology

DiaMeter's Dataset and Database

Dataset Overview

Data Cleaning

Missing Data Imputation

Dataset Transformation

Dataset Splitting

Database Development

DiaMeter's Machine Learning Model

DiaMeter's User Interface

Tools and Technologies Involved

Design Overview (UI / UX)

Developer Perspective

User Perspective

Installation Guide

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages