<h1 align='center'>Sepsis Detection with ML API using FastAPI</h1>

## Business Objective

Early detection of sepsis is essential for patients to receive timely treatment and improve their chances of survival. Machine learning can help medical professionals identify sepsis quickly and accurately. With a machine learning (ML) API for sepsis detection, healthcare providers can submit a patient's medical data and receive a real-time prediction of whether the patient is likely to develop sepsis.

This can lead to faster diagnosis and treatment, which can save lives and reduce healthcare costs. ML therefore offers an opportunity to make a significant impact in the fight against sepsis.

`Goal:` The aim of this project is to develop an ML-powered sepsis detection system that accurately identifies early signs of sepsis in patient data, enabling timely intervention and improving survival rates.


### Hypothesis

`Null Hypothesis:` Older patients are not associated with a higher likelihood of developing sepsis.

`Alternate Hypothesis:` Older patients are associated with a higher likelihood of developing sepsis.
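
One way to examine the age hypothesis is a permutation test on the difference in mean age between Positive and Negative patients. The sketch below is illustrative and not necessarily the method used in this project: the function name `age_permutation_test`, the number of permutations, and the seed are assumptions, while `Age` and `Sepssis` are the dataset's column names. Mean age is used here only as a simple proxy for "older patients".

```python
import numpy as np
import pandas as pd


def age_permutation_test(df, n_permutations=5000, seed=0):
    """One-sided permutation test: are Positive patients older on average?

    Hypothetical helper for illustration; returns (observed_diff, p_value).
    """
    ages = df["Age"].to_numpy(dtype=float)
    is_positive = (df["Sepssis"] == "Positive").to_numpy()

    # Observed difference in mean age (Positive minus Negative).
    observed = ages[is_positive].mean() - ages[~is_positive].mean()

    rng = np.random.default_rng(seed)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Shuffle the labels to simulate "no association between age and sepsis".
        shuffled = rng.permutation(is_positive)
        diff = ages[shuffled].mean() - ages[~shuffled].mean()
        if diff >= observed:
            at_least_as_extreme += 1

    # Add-one correction keeps the estimated p-value strictly above zero.
    p_value = (at_least_as_extreme + 1) / (n_permutations + 1)
    return observed, p_value
```

Calling `age_permutation_test(train_data)` returns the observed difference in mean age and a one-sided p-value. A small p-value would suggest that Positive patients being older on average is unlikely to be due to chance alone.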


### Analytical Questions

1. How does the distribution of the clinical measurements (PRG, PL, PR, SK, TS, M11, BD2) vary across age groups?

2. Are there noticeable differences in the distribution of the clinical measurements between patients who are likely to develop sepsis and those who are not?

3. How does the proportion of patients who are likely to develop sepsis vary across patient ages?

4. How does the proportion of patients who are likely to develop sepsis vary by insurance status?

5. What is the relationship between Age and Blood Pressure (PR), with sepsis status as the hue?
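
Questions 3 and 4 can be approached by computing the share of Positive cases within each group. The following is a minimal sketch under stated assumptions: the helper names `sepsis_rate_by` and `add_age_group`, the `AgeGroup` column, and the bin edges are hypothetical choices made for illustration, not part of the dataset.

```python
import pandas as pd


def add_age_group(df, bins=(20, 30, 40, 50, 60, 90)):
    """Return a copy of df with a hypothetical AgeGroup column built from Age."""
    out = df.copy()
    # Intervals are right-inclusive, e.g. (20, 30] contains ages 21 to 30.
    out["AgeGroup"] = pd.cut(out["Age"], bins=list(bins))
    return out


def sepsis_rate_by(df, column):
    """Share of Positive cases within each level of the given column."""
    is_positive = df["Sepssis"].eq("Positive")
    # observed=True drops empty categories such as unused age bins.
    return is_positive.groupby(df[column], observed=True).mean()
```

For example, `sepsis_rate_by(add_age_group(train_data), "AgeGroup")` gives the Positive rate per age bin, and `sepsis_rate_by(train_data, "Insurance")` gives it per insurance status.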

## Setup

### Importation

In [9]:
# Data Handling
import pandas as pd

# Visualizations
import matplotlib.pyplot as plt
import seaborn as sns

# EDA
from ydata_profiling import ProfileReport

## Data Understanding

The table below describes each column in the dataset.

### Data Fields

| Column Name | Description                                                                 |
|-------------|-----------------------------------------------------------------------------|
| ID          | Unique number to represent patient ID                                       |
| PRG         | Plasma glucose                                                              |
| PL          | Blood Work Result-1 (mu U/ml)                                               |
| PR          | Blood Pressure (mm Hg)                                                      |
| SK          | Blood Work Result-2 (mm)                                                    |
| TS          | Blood Work Result-3 (mu U/ml)                                               |
| M11         | Body mass index (weight in kg/(height in m)^2)                              |
| BD2         | Blood Work Result-4 (mu U/ml)                                               |
| Age         | Patient's age (years)                                                       |
| Insurance   | Whether the patient holds a valid insurance card                            |
| Sepssis     | Positive: if a patient in ICU will develop sepsis; Negative: otherwise      |
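
A quick schema check can confirm that a loaded file matches the fields listed above before any analysis runs. This is a hedged sketch: `EXPECTED_COLUMNS` simply restates the table, and `check_schema` is a hypothetical helper, not part of the project's code.

```python
import pandas as pd

# Column names as listed in the Data Fields table.
EXPECTED_COLUMNS = [
    "ID", "PRG", "PL", "PR", "SK", "TS", "M11", "BD2",
    "Age", "Insurance", "Sepssis",
]
# Target labels as described for the Sepssis column.
EXPECTED_LABELS = {"Positive", "Negative"}


def check_schema(df):
    """Report missing columns, unexpected columns, and unknown target labels."""
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    unexpected = [c for c in df.columns if c not in EXPECTED_COLUMNS]
    bad_labels = set()
    if "Sepssis" in df.columns:
        bad_labels = set(df["Sepssis"].dropna().unique()) - EXPECTED_LABELS
    return {
        "missing": missing,
        "unexpected": unexpected,
        "bad_labels": sorted(bad_labels),
    }
```

An all-empty result from `check_schema(train_data)` means the columns and target labels match the description above.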

### Data Loading

In [6]:
train_data = pd.read_csv('../data/Paitients_Files_Train.csv')
train_data.head()

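|   | ID        | PRG | PL  | PR | SK | TS  | M11  | BD2   | Age | Insurance | Sepssis  |
|---|-----------|-----|-----|----|----|-----|------|-------|-----|-----------|----------|
| 0 | ICU200010 | 6   | 148 | 72 | 35 | 0   | 33.6 | 0.627 | 50  | 0         | Positive |
| 1 | ICU200011 | 1   | 85  | 66 | 29 | 0   | 26.6 | 0.351 | 31  | 0         | Negative |
| 2 | ICU200012 | 8   | 183 | 64 | 0  | 0   | 23.3 | 0.672 | 32  | 1         | Positive |
| 3 | ICU200013 | 1   | 89  | 66 | 23 | 94  | 28.1 | 0.167 | 21  | 1         | Negative |
| 4 | ICU200014 | 0   | 137 | 40 | 35 | 168 | 43.1 | 2.288 | 33  | 1         | Positive |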


### Exploratory Data Analysis (EDA)

In [12]:
train_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 599 entries, 0 to 598
Data columns (total 11 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   ID         599 non-null    object 
 1   PRG        599 non-null    int64  
 2   PL         599 non-null    int64  
 3   PR         599 non-null    int64  
 4   SK         599 non-null    int64  
 5   TS         599 non-null    int64  
 6   M11        599 non-null    float64
 7   BD2        599 non-null    float64
 8   Age        599 non-null    int64  
 9   Insurance  599 non-null    int64  
 10  Sepssis    599 non-null    object 
dtypes: float64(2), int64(7), object(2)
memory usage: 51.6+ KB


In [7]:
train_data.describe().T

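|           | count | mean       | std        | min   | 25%   | 50%   | 75%   | max   |
|-----------|-------|------------|------------|-------|-------|-------|-------|-------|
| PRG       | 599.0 | 3.824708   | 3.362839   | 0.0   | 1.0   | 3.0   | 6.0   | 17.0  |
| PL        | 599.0 | 120.153589 | 32.682364  | 0.0   | 99.0  | 116.0 | 140.0 | 198.0 |
| PR        | 599.0 | 68.732888  | 19.335675  | 0.0   | 64.0  | 70.0  | 80.0  | 122.0 |
| SK        | 599.0 | 20.562604  | 16.017622  | 0.0   | 0.0   | 23.0  | 32.0  | 99.0  |
| TS        | 599.0 | 79.460768  | 116.576176 | 0.0   | 0.0   | 36.0  | 123.5 | 846.0 |
| M11       | 599.0 | 31.920033  | 8.008227   | 0.0   | 27.1  | 32.0  | 36.55 | 67.1  |
| BD2       | 599.0 | 0.481187   | 0.337552   | 0.078 | 0.248 | 0.383 | 0.647 | 2.42  |
| Age       | 599.0 | 33.290484  | 11.828446  | 21.0  | 24.0  | 29.0  | 40.0  | 81.0  |
| Insurance | 599.0 | 0.686144   | 0.464447   | 0.0   | 0.0   | 1.0   | 1.0   | 1.0   |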
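
The summary statistics above show a minimum of 0.0 for PL, PR, SK, TS, and M11. A blood pressure or body mass index of exactly zero is not physiologically plausible, so these zeros may be placeholders for missing values even though `info()` reports no nulls. The sketch below counts them; the helper name `count_zeros` and the choice of columns to inspect are assumptions made for illustration.

```python
import pandas as pd

# Assumption: a zero in these measurement columns may encode a missing value.
ZERO_SUSPECT_COLUMNS = ["PL", "PR", "SK", "TS", "M11"]


def count_zeros(df, columns=ZERO_SUSPECT_COLUMNS):
    """Count how many rows hold exactly 0 in each of the given columns."""
    return df[list(columns)].eq(0).sum()
```

Running `count_zeros(train_data)` returns one count per column, which can inform whether to treat those zeros as missing before modelling.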


In [11]:
eda_report = ProfileReport(train_data, title='Profile Report')
eda_report



