# Final Data Story - Table of Contents

## Team03 - Cracking the Code: Predicting Fracture Risk using Health and Lifestyle Data

### Team Members
| Name          | Pawprint | Emphasis Area             | Reason for selecting project                                     |
|---------------|----------|---------------------------|----------------------------------------------------------------|
| Josh Jaeger   | Jwj8c8   | High Performance Computing | While my background is in engineering and utility data, I have experience in the medical field and have seen what data can do to improve patient outcomes. |
| Tyler Hall    | Hallty   | Biohealth                 | Background is in physiology and previous experience at MOI.      |
| Karen Bromert | bromertk | Biohealth                 | My background is in biology but I’m interested in the medical recovery process. |
| David Turvey  | Dtfp3    | High Performance Computing | Worked at Cerner for 10 years assisting with interoperability of data. |

### Project Description
Our project scope is to identify and understand the effects of lifestyle factors, medical history, and specific biomarkers on the risk of osteoporosis-related bone fractures. The objective is to build a predictive model that can anticipate the likelihood of fractures based on identified risk factors. The project will involve the following components:

1. **Data Analysis and Exploration:**
   - Descriptive Analytics: Examine general trends, patterns, and correlations in the data to gain insights into the relationships between risk factors and fractures.
   - Diagnostic Analytics: Delve deeper into identified patterns to understand the underlying reasons behind these patterns.

2. **Predictive Modeling:**
   - Utilize machine learning algorithms to develop a predictive model that accurately predicts fracture risk based on health and lifestyle factors.
   - Employ a black box ensemble method to identify important features and refine the model's accuracy.

3. **Prescriptive Analytics:**
   - Provide prescriptive insights based on the predictive model to guide preventive measures, treatment plans, and policy decisions.
   - Recommend lifestyle changes and interventions for high-risk groups identified by the model.

The project will provide valuable insights into fracture risk assessment and enable informed decision-making regarding preventive measures, treatments, and policies related to osteoporosis-related fractures.


### Domain Question
When combined with BMD data, what specific health-related issues or lifestyle factors are associated with increased fracture risk?


### Table of Contents
These notebooks represent the incremental steps we made over the last five Sprint Increments. Each notebook contributes to our data story in a meaningful way.

#### Data Shaping and Carpentry 
1. [Data1.ipynb](../SpIn_2_Artifacts/Data1.ipynb): Conversion of our raw SAS7BDAT data files to CSV and upload into the Postgres database
1. [Data2.ipynb](../SpIn_3_Artifacts/Data2.ipynb): CSV creation of 12 patient forms from the Postgres table and insertion into our group network share
1. [Data3.ipynb](../SpIn_4_Artifacts/Data3.ipynb): Merged dataset consisting of all 12 patient forms and target variables 
1. [Data4.ipynb](../SpIn_5_Artifacts/Data4.ipynb): Updates to the Merged dataset for modeling (PCA, Correlation feature reduction, SMOTE) and accompanying train, test, and validation datasets for modeling
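
The correlation-based feature reduction mentioned for Data4.ipynb can be illustrated with a minimal sketch. This is not the notebook's code: the helper names, the 0.9 cutoff, and the toy columns are all assumptions chosen for illustration, and the real pipeline may use a different rule or library.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column has no linear relationship to report
    return cov / sqrt(var_x * var_y)


def drop_correlated(columns, threshold=0.9):
    """Keep a column only if |r| with every already-kept column is <= threshold.

    `threshold` is a hypothetical cutoff, not a value taken from the project.
    """
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical toy columns, not MrOS data.
toy = {
    "height_cm": [170, 165, 180, 175, 160],
    "height_in": [66.9, 65.0, 70.9, 68.9, 63.0],  # near-duplicate of height_cm
    "grip_kg": [40, 35, 38, 45, 30],
}
selected = drop_correlated(toy)
print(selected)
```

Because columns are visited in order, the first column of each highly correlated pair is the one that survives; a real pipeline might instead keep whichever column relates more strongly to the target.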

#### Exploratory Data Analysis and Visualization
1. [Bone Mineral Density](../SpIn_4_Artifacts/EDA4-B1.ipynb): Analysis of Bone Mineral Densities from the MrOS B1 Data
1. [Fracture Outcomes Analysis](../SpIn_4_Artifacts/EDA4_Fracture.ipynb): Analysis of our Fracture target variables

##### Health Form Analysis: examples of patient presented forms can be [found here](../SpIn_3_Artifacts/V1_ANNOTATED.pdf). 
1. [Dietary Health - DH](../SpIn_4_Artifacts/EDA4-V1-DH.ipynb)
1. [Functional Vision - FV](../SpIn_4_Artifacts/EDA4-V1-FV.ipynb)
1. [Grip Strength - GS](../SpIn_4_Artifacts/EDA4-V1-GS.ipynb)
1. [Height and Weight - HW](../SpIn_4_Artifacts/EDA4-V1-HW.ipynb)
1. [Nottingham Power Rig - NP](../SpIn_4_Artifacts/EDA4-V1-NP.ipynb)
1. [Medical History - MH](../SpIn_4_Artifacts/EDA4-V1-MH.ipynb)
1. [Medication Use - MU](../SpIn_4_Artifacts/EDA4-V1-MU.ipynb)
1. [Neuromuscular Function - NF](../SpIn_4_Artifacts/EDA4-V1-NF.ipynb)
1. [Tobacco & Alcohol Use - TU](../SpIn_4_Artifacts/EDA4-V1-TU.ipynb)
1. [Fracture History - FF](../SpIn_4_Artifacts/EDA4-V1-FF.ipynb)
1. [General Information - GI](../SpIn_4_Artifacts/EDA4-V1-GI.ipynb)

#### Modeling
Our team decided to divide and conquer with modeling. Each team member built their own model and then measured the outcomes. We discovered that the models' performance drops when they are tested on the validation set, so further optimization work is warranted. Although the current models did not meet our initial expectations, health screening applications remain viable.

1. [Random Forest](../SpIn_5_Artifacts/SPIN5_Random_Forest_Boruta.ipynb) <br>
  <img align="left" width="400" height="300" src="Random_Forest_test_performance.png">
<br><br><br><br><br><br><br><br><br>
1. [LightGBM](../SpIn_5_Artifacts/EDA5_Hyperparameter_Tuning_LightGBM.ipynb) <br>
  <img align="left" width="400" height="300" src="lightgmb_test_performance.png">
<br><br><br><br><br><br><br><br>
1. [XGBoost Model 1](../SpIn_5_Artifacts/XGBoostSPIN5.ipynb) <br>
  <img align="left" width="400" height="300" src="xgboost_model1_test_performance.png">
<br><br><br><br><br><br><br><br><br>
1. [XGBoost Model 2](../SpIn_5_Artifacts/XGBoostSPIN5_Model2.ipynb) <br>
  <img align="left" width="400" height="300" src="../SpIn_5_Artifacts/xgboost_model2_val_performance.png">
<br><br><br><br><br><br><br><br>

### Summary Table of Model Scores
Below is the summary table of the model results.
Because the models were developed to serve as a screening tool, the goal was a target Recall of 0.80, which keeps the number of false negatives at an acceptable level. Given that target, the best model is the one that also keeps Precision highest, minimizing false positives. By that rule, XGBoost Model 1 was the best-performing model on the Test set.
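
One way to work toward a Recall target is to lower the decision cutoff on the model's predicted scores until enough true fractures are flagged, then read off the Precision at that cutoff. The sketch below shows the idea in plain Python. It is an illustration under assumptions, not the procedure used in the notebooks: the function names and the toy labels and scores are hypothetical, and treating 0.80 as a minimum is our reading of the target.

```python
def precision_recall(y_true, scores, threshold):
    """Precision and recall when scores >= threshold are flagged positive."""
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
    fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < threshold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def threshold_for_recall(y_true, scores, target_recall=0.80):
    """Return the highest observed cutoff whose recall meets the target."""
    for t in sorted(set(scores), reverse=True):
        precision, recall = precision_recall(y_true, scores, t)
        if recall >= target_recall:
            return t, precision, recall
    return None  # only reached when y_true contains no positives


# Hypothetical toy labels (1 = fracture) and predicted scores.
y_true = [1, 0, 1, 0, 0, 1, 0, 1, 0, 1]
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.4, 0.35, 0.2, 0.1]

cutoff, precision, recall = threshold_for_recall(y_true, scores)
print(cutoff, precision, recall)
```

On this toy data, reaching the Recall target means accepting as many false positives as true positives, which is the same trade-off visible in the low Precision values in the summary table.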

<!DOCTYPE html>
<html>
<head>
  <style>
    table {
      border-collapse: collapse;
      width: 100%;
      margin-left: 0; /* Added to align the table to the left */
    }

    th, td {
      border: 1px solid black;
      padding: 8px;
      text-align: left;
    }

    th {
      font-weight: bold;
    }

    tr.row-names th {
      font-weight: bold;
    }
  </style>
</head>
<body>

<table>
  <tr>
    <th></th>
    <th>Precision</th>
    <th>Recall</th>
    <th>F1-Score</th>
    <th>Accuracy</th>
    <th>AUC Score</th> <!-- Added AUC Score column -->
  </tr>
  <tr class="row-names">
    <th>Random Forest</th>
    <td>0.24</td>
    <td>0.84</td>
    <td>0.38</td>
    <td>0.41</td>
    <td>0.58</td> <!-- AUC Score value for Random Forest -->
  </tr>
  <tr class="row-names">
    <th>LightGBM</th>
    <td>0.25</td>
    <td>0.82</td>
    <td>0.38</td>
    <td>0.43</td>
    <td>0.63</td> <!-- AUC Score value for Light GBM -->
  </tr>
  <tr class="row-names">
    <th>XGBoost Model 1</th>
    <td>0.26</td>
    <td>0.80</td>
    <td>0.39</td>
    <td>0.46</td>
    <td>0.63</td> <!-- AUC Score value for XGBoost Model 1 -->
  </tr>
  <tr class="row-names">
    <th>XGBoost Model 2</th>
    <td>0.23</td>
    <td>0.88</td>
    <td>0.37</td>
    <td>0.35</td>
    <td>0.62</td> <!-- AUC Score value for XGBoost Model 2 -->
  </tr>
</table>

</body>
</html>
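
The selection rule described above can be checked against the scores in the summary table. The snippet below is a small sketch that copies the Precision and Recall values from the table and applies the rule; treating the 0.80 Recall target as a minimum is an assumption on our part.

```python
# Precision and Recall copied from the summary table.
scores = {
    "Random Forest": {"precision": 0.24, "recall": 0.84},
    "LightGBM": {"precision": 0.25, "recall": 0.82},
    "XGBoost Model 1": {"precision": 0.26, "recall": 0.80},
    "XGBoost Model 2": {"precision": 0.23, "recall": 0.88},
}

TARGET_RECALL = 0.80  # assumed to be a minimum, not an exact value

# Keep only models that meet the Recall target, then pick the most precise.
eligible = {name: s for name, s in scores.items() if s["recall"] >= TARGET_RECALL}
best = max(eligible, key=lambda name: eligible[name]["precision"])
print(best)
```

All four models meet the Recall target, so the choice comes down to Precision alone.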
