In [1]:
# Import the necessary libraries to complete the assignment
import pandas as pd               # for reading and working with data
import statsmodels.api as sm      # for statistical models

# Read the dataset
df = pd.read_csv("Dataset.csv")

# Define the independent variable (Age) and add a constant column for the intercept
X = sm.add_constant(df["Age"])    # X will now include a column of 1s (intercept) and the Age values

# Defining the dependent variable
y = df["BP_Reduction"]

# Fitting the linear regression model
model = sm.OLS(y, X).fit()

# Print the regression results:
print(model.params)       # shows the intercept and the slope
print(model.rsquared)     # shows the R-squared value (how well Age explains BP_Reduction)
print(model.pvalues)      # shows the p-values (test significance of the coefficients)


const    13.561623
Age      -0.087472
dtype: float64
0.10906568718052712
const    1.563851e-15
Age      7.917602e-04
dtype: float64


In [None]:
### Simple Linear Regression Results

Intercept (constant): 13.5616  
Regression coefficient for Age: -0.0875  
R-squared value: 0.1091  
p-value for Age: 0.00079


In [None]:
### Interpretation
Age is a statistically significant predictor of blood pressure reduction: the p-value for Age is 0.00079, 
well below 0.05.  
The coefficient for Age is -0.087, so each additional year of age is associated with about 0.087 units less 
blood pressure reduction on average.  
However, R-squared is only 0.11, which means Age explains only about 11% of the variation in blood pressure reduction.  
So although the relationship is statistically significant, Age on its own is a weak predictor in practice.
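
To make the size of the slope concrete, the fitted line can be evaluated at two ages. The sketch below plugs the rounded estimates from the output above into the regression equation. The ages 40 and 60 are arbitrary illustrative values, not taken from the dataset, and the helper name `predict_bp_reduction` is hypothetical.

```python
import math

# Rounded estimates from the simple regression output above
intercept = 13.5616
slope_age = -0.0875
r_squared = 0.1091

def predict_bp_reduction(age):
    """Predicted BP_Reduction from the fitted line: intercept + slope * age."""
    return intercept + slope_age * age

# Ages 40 and 60 are illustrative values (assumption, not from the data)
pred_40 = predict_bp_reduction(40)
pred_60 = predict_bp_reduction(60)
diff_20_years = pred_60 - pred_40

# In a one-predictor model, the correlation is sqrt(R^2) with the slope's sign
correlation = math.copysign(math.sqrt(r_squared), slope_age)

print(f"Predicted reduction at age 40: {pred_40:.2f}")
print(f"Predicted reduction at age 60: {pred_60:.2f}")
print(f"Difference over 20 years: {diff_20_years:.2f}")
print(f"Implied correlation between Age and BP_Reduction: {correlation:.2f}")
```

A 20-year age gap corresponds to a predicted difference of about 1.75 units, and the implied correlation of roughly -0.33 is consistent with a real but modest relationship.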


In [3]:
#3b 

import pandas as pd
import statsmodels.formula.api as smf

# Fit a multiple linear regression model:
# Dependent variable: BP_Reduction
# Independent variables: Age, BMI, and Treatment_Group (categorical)
model = smf.ols('BP_Reduction ~ Age + BMI + C(Treatment_Group)', data=df).fit()

# Print the summary of the regression model to see:
# - Regression coefficients
# - P-values (for significance testing)
# - R-squared value (how well the model explains the data)
print(model.summary())


                            OLS Regression Results                            
Dep. Variable:           BP_Reduction   R-squared:                       0.520
Model:                            OLS   Adj. R-squared:                  0.500
Method:                 Least Squares   F-statistic:                     25.73
Date:                Fri, 23 May 2025   Prob (F-statistic):           1.85e-14
Time:                        20:32:57   Log-Likelihood:                -234.50
No. Observations:                 100   AIC:                             479.0
Df Residuals:                      95   BIC:                             492.0
Df Model:                           4                                         
Covariance Type:            nonrobust                                         
                                    coef    std err          t      P>|t|      [0.025      0.975]
-------------------------------------------------------------------------------------------------
Intercept     

In [None]:
### Report: Coefficients, R-squared, and Significance Levels

We fitted a multiple linear regression model where BP_Reduction is predicted by Age, BMI, and Treatment_Group.

#### Coefficients:
- Intercept: 6.067  
- Drug B vs Drug A: +2.4437  
- Placebo vs Drug A: −2.7394  
- Age: −0.0646  
- BMI: +0.2339  
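
As a rough illustration of how these coefficients combine, the sketch below computes predicted values for one hypothetical patient in each treatment group. The patient (age 50, BMI 25) and the helper name `predict` are assumptions made for the example; the coefficients are the rounded values listed above.

```python
# Rounded coefficients from the multiple regression reported above
coef = {
    "Intercept": 6.067,
    "Drug B": 2.4437,      # shift relative to Drug A
    "Placebo": -2.7394,    # shift relative to Drug A
    "Age": -0.0646,
    "BMI": 0.2339,
}

def predict(age, bmi, group):
    """Predicted BP_Reduction; Drug A is the reference group (no shift)."""
    shift = 0.0 if group == "Drug A" else coef[group]
    return coef["Intercept"] + coef["Age"] * age + coef["BMI"] * bmi + shift

# Hypothetical patient: age 50, BMI 25 (illustrative values only)
predictions = {g: predict(50, 25, g) for g in ("Drug A", "Drug B", "Placebo")}
for group, value in predictions.items():
    print(f"{group}: {value:.2f}")
```

For the same age and BMI, the three predictions differ only by the treatment coefficients, which is what "Drug B vs Drug A" and "Placebo vs Drug A" mean in this model.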

#### R-squared:
- The R-squared value is 0.52, meaning the model explains 52% of the variation in blood pressure reduction.
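
The adjusted R-squared and F-statistic in the summary can be cross-checked from R-squared, the number of observations, and the number of model terms. This is only a consistency check using the rounded values printed in the summary, so agreement is expected up to rounding.

```python
# Values read from the regression summary above
r_squared = 0.520
n_obs = 100        # No. Observations
k_predictors = 4   # Df Model (Age, BMI, and two treatment indicators)

df_resid = n_obs - k_predictors - 1   # 95, matches Df Residuals

# Adjusted R-squared penalises R-squared for the number of predictors
adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / df_resid

# Overall F-statistic: explained variation per predictor vs residual variation
f_statistic = (r_squared / k_predictors) / ((1 - r_squared) / df_resid)

print(f"Adjusted R-squared: {adj_r_squared:.3f}")
print(f"F-statistic: {f_statistic:.2f}")
```

Both values line up with the summary (Adj. R-squared 0.500, F-statistic 25.73).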

#### Significance Levels (p-values):
- All predictors are statistically significant (p < 0.05):
  - Drug B: p < 0.001
  - Placebo: p < 0.001
  - Age: p = 0.001
  - BMI: p < 0.001

This suggests that each included variable is significantly associated with blood pressure reduction after adjusting for the others.
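
The summary table rounds p-values to three decimals, so a displayed 0.000 only means the value is below 0.0005. The unrounded values are available from `model.pvalues`. Below is a minimal sketch of a reporting helper; the name `format_p` and the 0.001 threshold are assumptions for illustration, not part of statsmodels.

```python
def format_p(p, threshold=0.001):
    """Format a p-value for a report; very small values become 'p < threshold'."""
    if p < threshold:
        return f"p < {threshold}"
    return f"p = {p:.3f}"

# p-values printed by the simple regression earlier in the notebook
print(format_p(1.563851e-15))   # intercept
print(format_p(7.917602e-04))   # Age
print(format_p(0.0312))         # hypothetical value for illustration
```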


In [None]:
All included variables are statistically significant predictors of blood pressure reduction because their p-values are below 0.05:
  - Drug B (vs. Drug A): p < 0.001
  - Placebo (vs. Drug A): p < 0.001
  - Age: p = 0.001
  - BMI: p < 0.001



In [None]:
### Comparison with ANOVA Results

The regression results support the findings from the ANOVA test.  
In both analyses, the treatment group has a significant effect on blood pressure reduction.

From the regression, holding Age and BMI constant:
- Drug A is the reference group.
- Drug B reduces blood pressure by about 2.44 units more than Drug A (coefficient = +2.44).
- Placebo reduces blood pressure by about 2.74 units less than Drug A (coefficient = −2.74).

This matches the ANOVA results, which also showed differences between the groups.  
So, the regression confirms that Drug B has the strongest effect and Placebo has the weakest, even after adjusting for Age and BMI.


In [None]:
### Comment on Confounding Variables

Adjusting for variables like Age, BMI, and Gender helps isolate the effect of the treatment itself (the model above includes Age and BMI but not Gender).

For example:
- If one treatment group happens to be older than another, a difference in blood pressure reduction between the groups could be due to age rather than the treatment.
- By including age in the model, we compare the treatments among people of the same age, so the treatment coefficients are no longer distorted by age differences between groups.

This approach reduces the risk that an outside factor, like age, is driving the results, and gives a clearer view of how effective the treatment really is.
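
A small simulation can illustrate why adjustment matters. The data below are entirely synthetic and deliberately constructed so that the drug group is older and older patients respond less; the names `sim` and `Drug` and the true effect of 3.0 are assumptions made for this sketch, not results from the dataset.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data (made up): the drug group is older on average, and older
# patients respond less, so age is mixed up with the treatment comparison.
rng = np.random.default_rng(1)
n_per_group = 200
drug = np.repeat([0, 1], n_per_group)          # 0 = placebo, 1 = drug
age = np.concatenate([
    rng.normal(45, 8, n_per_group),            # placebo group: younger
    rng.normal(60, 8, n_per_group),            # drug group: older
])
true_drug_effect = 3.0
bp_reduction = (
    10 - 0.1 * age + true_drug_effect * drug + rng.normal(0, 2, 2 * n_per_group)
)
sim = pd.DataFrame({"Drug": drug, "Age": age, "BP_Reduction": bp_reduction})

# Treatment effect without and with adjustment for age
unadjusted = smf.ols("BP_Reduction ~ Drug", data=sim).fit()
adjusted = smf.ols("BP_Reduction ~ Drug + Age", data=sim).fit()

unadjusted_effect = unadjusted.params["Drug"]
adjusted_effect = adjusted.params["Drug"]
print(f"Unadjusted drug effect:   {unadjusted_effect:.2f}")
print(f"Age-adjusted drug effect: {adjusted_effect:.2f}")
```

In this setup the unadjusted estimate understates the drug effect because part of it is cancelled by the age difference between groups, while the age-adjusted estimate lands close to the simulated true effect.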
