This project is based on the Kaggle dataset. The dataset is a smaller, cleaner version of the data published by the CDC Behavioral Risk Factor Surveillance System (BRFSS) in 2015, which can be found here.
Our goal is to predict the probability that someone has diabetes from a set of features (their responses to the survey) and to compare how different models perform. The target variable is binary: 1 if the person has diabetes, 0 if not. The features are numerical and categorical. This project calculates and compares the predicted probability from 5 different ML models: Decision Trees, Logistic Regression, Random Forest, XGBoost and AdaBoostClassifier. It also reports the change in probability (delta) relative to the previously entered data, which lets you see how changes in individual features could affect the predicted probability.
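As a rough illustration of the delta described above, the sketch below subtracts the previous predicted probability from the current one for each model. The function name `probability_deltas` and the probability values are hypothetical and chosen only for illustration; they are not taken from this project's code or results.

```python
def probability_deltas(previous, current):
    """Return current minus previous probability for every model found in both dicts."""
    return {
        name: current[name] - previous[name]
        for name in current
        if name in previous
    }


# Made-up probabilities for two of the models, before and after changing an input.
previous = {"LogisticRegression": 0.25, "RandomForestClassifier": 0.5}
current = {"LogisticRegression": 0.5, "RandomForestClassifier": 0.375}

deltas = probability_deltas(previous, current)
for name, delta in deltas.items():
    # A positive delta means the predicted probability went up.
    print(f"{name}: {delta:+.3f}")
```

A positive delta means the new inputs raised the predicted probability for that model, and a negative delta means they lowered it.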
Note: If you don't know your BMI, you can calculate it from here. Disclaimer: This project is not intended to give or replace any health or medical advice provided by health professionals. It's for educational purposes only. You can follow me on GitHub, find this project here, and give it a ⭐ if you like 💙.
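For the BMI mentioned in the note above, a minimal sketch using the standard formula (weight in kilograms divided by height in metres squared) could look like this. The function name `bmi` and the sample values are assumptions for illustration and are not part of this project's code.

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight in kilograms divided by height in metres squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


# Example values only: 72 kg and 1.80 m.
example_bmi = bmi(72.0, 1.8)
print(f"BMI: {example_bmi:.1f}")
```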
├── ...
├── diabetes.ipynb
├── diabetes_app.py
├── Dockerfile
├── Pipfile
├── Pipfile.lock
├── Predict.py
├── Previous.txt
├── README.md
├── train.py
├── utils.py
├── Datasets
│ ├── diabetes_dataset.csv
│ └── ...
├── models
│ ├── DecisionTreeClassifier.bin
│ ├── LogisticRegression.bin
│ ├── RandomForestClassifier.bin
│ ├── XGBClassifier.bin
│ └── ...
Pipfile and Pipfile.lock files are provided. Copy the contents of this folder to your machine. Then, from the terminal of your preferred IDE (in the correct working directory), run the following:
pipenv install
pipenv shell
You are now in the virtual environment and can run the files locally.
From your console (in the correct working directory), once the environment has been created (previous step):
You can run the train and predict files:
python train.py
python Predict.py
or open the Jupyter notebook diabetes.ipynb from Anaconda,
or run the Streamlit app:
streamlit run diabetes_app.py
You can now view your Streamlit app in your browser. Local URL: http://localhost:8501 or Network URL: http://10.97.0.6:8501 (the network address will differ on your machine).
A Dockerfile is provided. To build and run the image, run the following from your IDE terminal (within the working directory):
- First option: Create and run the app yourself.
- Create image:
docker build -t diabetes_app_streamlit .
- Run image:
docker run -p 8501:8501 diabetes_app_streamlit
You can now access the Streamlit app in your web browser: Local URL: http://localhost:8501 or from URL: http://0.0.0.0:8501
- Second option: Run it from the Docker Hub repository.
- Pull the image from Docker Hub:
docker pull supermac789/diabetes_app_streamlit:latest
- Run the image from your terminal:
docker run -p 8501:8501 supermac789/diabetes_app_streamlit:latest
You can now access the Streamlit app in your web browser: Local URL: http://localhost:8501 or from URL: http://0.0.0.0:8501
The app can be found and run from https://maclavijo-diabetespredictionproject-diabetes-app-7rf1nc.streamlit.app/.