### **HYPOTHESIS TESTING**

**Hypothesis 1: Age and Sepsis**

Null Hypothesis (H0): The mean age of patients with sepsis is equal to the mean age of patients without sepsis.

Alternative Hypothesis (H1): The mean age of patients with sepsis differs from the mean age of patients without sepsis.

T-statistic: 5.254202967191448
P-value: 2.0718778891881853e-07

Interpretation:
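The p-value (about 2.1e-07) is far below the 0.05 significance level, which is strong evidence against the null hypothesis.
We reject the null hypothesis in favor of the alternative hypothesis.

This means there is a statistically significant difference in mean age between patients with and without sepsis. Because the test subtracts the sepsis-negative group from the sepsis-positive group, the positive t-statistic indicates that patients with sepsis were older on average. The size of the t-statistic depends on sample size as well as on the difference itself, so it does not by itself show that the difference is large.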

Implication:
Age appears to be a significant factor associated with sepsis. This could indicate that certain age groups are more susceptible to sepsis or that age influences the likelihood of developing sepsis.
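
To judge how large the age difference is, rather than only whether it is detectable, an effect size such as Cohen's d can be reported next to the t-test. The sketch below is one possible way to compute it; the function name `cohens_d` and the example ages are made up for illustration and are not part of the original analysis or dataset.

```python
import numpy as np

def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled standard deviation."""
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    n_a, n_b = len(a), len(b)
    # Pooled variance weights each group's sample variance by its degrees of freedom
    pooled_var = ((n_a - 1) * a.var(ddof=1) + (n_b - 1) * b.var(ddof=1)) / (n_a + n_b - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

# Made-up ages, for illustration only (not taken from the dataset)
example_positive = [45, 52, 38, 60, 41]
example_negative = [30, 35, 28, 44, 33]
d = cohens_d(example_positive, example_negative)
print(f"Cohen's d: {d:.2f}")
```

On the real data this would be called as `cohens_d(sepsis_age, non_sepsis_age)`. By a common rule of thumb, values near 0.2 are read as small, 0.5 as medium, and 0.8 as large.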


**Hypothesis 2: Blood Pressure and Sepsis**

Null Hypothesis (H0): The mean blood pressure (PR) of patients with sepsis is equal to that of patients without sepsis.

Alternative Hypothesis (H1): The mean blood pressure (PR) of patients with sepsis differs from that of patients without sepsis.

T-statistic: 1.495353813655633
P-value: 0.1353505282559576

Interpretation:
The p-value (0.135) is greater than the conventional significance level of 0.05. 

We fail to reject the null hypothesis. 

This means we don't have sufficient evidence to conclude that there's a significant difference in blood pressure between patients with and without sepsis.

Implication:
Based on this data, blood pressure (PR) does not appear to be significantly associated with sepsis. However, this doesn't necessarily mean there's no relationship; it just indicates that we couldn't detect a statistically significant difference with this particular dataset and test.
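
One way to check whether this conclusion depends on the test's assumptions is to repeat the comparison with Welch's t-test (which does not assume equal variances) and the Mann-Whitney U test (which does not assume normality). The sketch below is illustrative only: `compare_groups` is a hypothetical helper, and the readings are synthetic, not taken from the dataset.

```python
import numpy as np
from scipy import stats

def compare_groups(group_a, group_b):
    """Run Welch's t-test and the Mann-Whitney U test on two samples."""
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    # Drop missing values so that neither test returns NaN
    a, b = a[~np.isnan(a)], b[~np.isnan(b)]
    welch = stats.ttest_ind(a, b, equal_var=False)
    mwu = stats.mannwhitneyu(a, b, alternative="two-sided")
    return {"welch_p": welch.pvalue, "mannwhitney_p": mwu.pvalue}

# Synthetic readings, for illustration only
rng = np.random.default_rng(0)
example_a = rng.normal(70, 12, size=100)
example_b = rng.normal(68, 20, size=150)
result = compare_groups(example_a, example_b)
print(result)
```

On the real data this would be called as `compare_groups(sepsis_bp, non_sepsis_bp)`. If all three tests agree, the conclusion is less likely to be an artifact of the equal-variance assumption.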

**Hypothesis 3: Insurance and Sepsis**

Null Hypothesis (H0): There is no association between insurance status and the occurrence of sepsis.

Alternative Hypothesis (H1): There is an association between insurance status and the occurrence of sepsis.

Chi-square statistic: 2.0712782081677066
P-value: 0.1500956791860619

Interpretation:
The p-value (0.150) is greater than 0.05, meaning we fail to reject the null hypothesis. 
We don't have sufficient evidence to conclude that there's a significant association between insurance status and the occurrence of sepsis.

Implication:
Based on this analysis, insurance status does not appear to be significantly associated with sepsis occurrence in this dataset. Failing to reject the null hypothesis is not evidence that insurance has no effect; it only means this test did not detect an association.
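
When reading a chi-square result, it helps to know that `scipy.stats.chi2_contingency` applies Yates' continuity correction by default whenever the table has one degree of freedom (a 2x2 table). If `Insurance` is a binary column, which is an assumption here, the statistic reported above includes that correction. The sketch below uses hypothetical counts (not the real contingency table) to show the corrected and uncorrected statistics, the expected-count check, and Cramér's V as an effect size.

```python
import numpy as np
from scipy import stats

# Hypothetical 2x2 counts, for illustration only
# rows: Insurance 0 / 1, columns: Sepssis Negative / Positive
example_table = np.array([[120, 60],
                          [250, 170]])

# Default call: Yates' correction is applied because dof == 1
chi2_yates, p_yates, dof, expected = stats.chi2_contingency(example_table)
# Same test without the continuity correction
chi2_plain, p_plain, _, _ = stats.chi2_contingency(example_table, correction=False)

# Cramér's V, computed from the uncorrected statistic
n = example_table.sum()
cramers_v = np.sqrt(chi2_plain / (n * (min(example_table.shape) - 1)))

print(f"dof={dof}, smallest expected count={expected.min():.1f}")
print(f"with correction:    chi2={chi2_yates:.3f}, p={p_yates:.3f}")
print(f"without correction: chi2={chi2_plain:.3f}, p={p_plain:.3f}")
print(f"Cramér's V={cramers_v:.3f}")
```

A common rule of thumb is that the chi-square approximation is reasonable when every expected count is at least 5; with smaller counts, Fisher's exact test (`scipy.stats.fisher_exact`) is a usual alternative for 2x2 tables.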

```python
import pandas as pd
import scipy.stats as stats

# Load the dataset
data = pd.read_csv("F:\\school\\Azubi Africa\\P5-ML-API\\data\\Paitients_Files_Train.csv")

# Split the patients once by outcome.
# 'Sepssis' is the target column name as it is spelled in the file.
positive = data[data['Sepssis'] == 'Positive']
negative = data[data['Sepssis'] == 'Negative']

# Hypothesis 1: Age and Sepsis (independent two-sample t-test)
sepsis_age = positive['Age']
non_sepsis_age = negative['Age']
t_stat, p_value = stats.ttest_ind(sepsis_age, non_sepsis_age)
print("Hypothesis 1 - Age and Sepsis:")
print(f"T-statistic: {t_stat}, P-value: {p_value}")

# Hypothesis 2: Blood Pressure and Sepsis (independent two-sample t-test)
sepsis_bp = positive['PR']
non_sepsis_bp = negative['PR']
t_stat, p_value = stats.ttest_ind(sepsis_bp, non_sepsis_bp)
print("\nHypothesis 2 - Blood Pressure and Sepsis:")
print(f"T-statistic: {t_stat}, P-value: {p_value}")

# Hypothesis 3: Insurance and Sepsis (chi-square test of independence)
contingency_table = pd.crosstab(data['Insurance'], data['Sepssis'])
chi2, p_value, dof, expected = stats.chi2_contingency(contingency_table)
print("\nHypothesis 3 - Insurance and Sepsis:")
print(f"Chi-square statistic: {chi2}, P-value: {p_value}")
```


```text
Hypothesis 1 - Age and Sepsis:
T-statistic: 5.254202967191448, P-value: 2.0718778891881853e-07

Hypothesis 2 - Blood Pressure and Sepsis:
T-statistic: 1.495353813655633, P-value: 0.1353505282559576

Hypothesis 3 - Insurance and Sepsis:
Chi-square statistic: 2.0712782081677066, P-value: 0.1500956791860619
```
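
The test statistics are easier to interpret next to the group sizes, means, and standard deviations they are based on. A minimal sketch of such a summary follows; the small frame reuses the dataset's column names, but its values are invented for illustration.

```python
import pandas as pd

# Tiny made-up frame with the dataset's column names (values are not real)
example = pd.DataFrame({
    "Sepssis": ["Positive", "Negative", "Positive", "Negative", "Negative"],
    "Age": [50, 30, 42, 28, 35],
    "PR": [72, 70, 80, 64, 68],
})

# Count, mean, and standard deviation of each variable per outcome group
summary = example.groupby("Sepssis")[["Age", "PR"]].agg(["count", "mean", "std"])
print(summary)
```

Applied to the real data, the same `groupby` call on `data` would show the direction and rough size of each difference before any test is run.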


## **Overall Insights**

- **Age is a significant factor:** The strongest relationship found is between age and sepsis. This could have important implications for patient care and risk assessment.
- **Blood pressure and insurance are not significant:** Neither blood pressure nor insurance status showed a statistically significant relationship with sepsis in this analysis. However, this doesn't mean these factors are irrelevant; they might still play a role that wasn't captured by these specific tests or this particular dataset.
- **Further investigation needed:** While age shows a clear association with sepsis, the nature of this relationship (e.g., which age groups are more at risk) would require further analysis. For blood pressure and insurance, despite not showing significance here, it might be worth exploring these factors with different statistical methods or larger datasets.
- **Clinical vs. statistical significance:** Remember that statistical significance doesn't always equate to clinical significance. Even though blood pressure and insurance didn't show statistical significance, they might still be clinically important in sepsis management and prevention.
- **Limitations:** These results are based on the specific dataset and methods used. Other factors not considered here might also play important roles in sepsis occurrence and outcomes.

These findings provide a starting point for understanding factors associated with sepsis in this patient population, with age emerging as a key factor to consider in sepsis risk assessment and management.
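
As a possible first step toward the follow-up on age groups mentioned above, the sepsis rate can be tabulated per age band. The sketch below is a minimal illustration: the bin edges are arbitrary choices and the rows are invented, so neither reflects the real dataset.

```python
import pandas as pd

# Made-up rows using the dataset's column names (values are not real)
example = pd.DataFrame({
    "Age": [22, 27, 34, 39, 45, 51, 58, 63],
    "Sepssis": ["Negative", "Negative", "Negative", "Positive",
                "Positive", "Negative", "Positive", "Positive"],
})

# Hypothetical band edges; intervals are closed on the left, e.g. [20, 30)
bins = [20, 30, 40, 50, 70]
example["age_band"] = pd.cut(example["Age"], bins=bins, right=False)

# Number of patients and share of positive cases in each band
rate_by_band = (
    example.assign(positive=example["Sepssis"].eq("Positive"))
    .groupby("age_band", observed=True)["positive"]
    .agg(["count", "mean"])
    .rename(columns={"mean": "sepsis_rate"})
)
print(rate_by_band)
```

Bands with few patients give unstable rates, so the `count` column should be read alongside `sepsis_rate`.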