# CSE 163
## Final Project
## Statistical Analysis on Heart Disease Data

Importing basic libraries for analysis:

In [1]:
import pandas as pd
import numpy as np
import statsmodels.formula.api as smf
import matplotlib.pyplot as plt

Import dataset, and view data:

In [2]:
heart = pd.read_csv('heart_disease_uci.csv')
heart

,id,age,sex,dataset,cp,trestbps,chol,fbs,restecg,thalch,exang,oldpeak,slope,ca,thal,num
0,1,63,Male,Cleveland,typical angina,145.0,233.0,True,lv hypertrophy,150.0,False,2.3,downsloping,0.0,fixed defect,0
1,2,67,Male,Cleveland,asymptomatic,160.0,286.0,False,lv hypertrophy,108.0,True,1.5,flat,3.0,normal,2
2,3,67,Male,Cleveland,asymptomatic,120.0,229.0,False,lv hypertrophy,129.0,True,2.6,flat,2.0,reversable defect,1
3,4,37,Male,Cleveland,non-anginal,130.0,250.0,False,normal,187.0,False,3.5,downsloping,0.0,normal,0
4,5,41,Female,Cleveland,atypical angina,130.0,204.0,False,lv hypertrophy,172.0,False,1.4,upsloping,0.0,normal,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
915,916,54,Female,VA Long Beach,asymptomatic,127.0,333.0,True,st-t abnormality,154.0,False,0.0,,,,1
916,917,62,Male,VA Long Beach,typical angina,,139.0,False,st-t abnormality,,,,,,,0
917,918,55,Male,VA Long Beach,asymptomatic,122.0,223.0,True,st-t abnormality,100.0,False,0.0,,,fixed defect,2
918,919,58,Male,VA Long Beach,asymptomatic,,385.0,True,lv hypertrophy,,,,,,,0


In [3]:
heart.columns

Index(['id', 'age', 'sex', 'dataset', 'cp', 'trestbps', 'chol', 'fbs',
       'restecg', 'thalch', 'exang', 'oldpeak', 'slope', 'ca', 'thal', 'num'],
      dtype='object')

View NA values:

In [4]:
heart.isna().sum()

id            0
age           0
sex           0
dataset       0
cp            0
trestbps     59
chol         30
fbs          90
restecg       2
thalch       55
exang        55
oldpeak      62
slope       309
ca          611
thal        486
num           0
dtype: int64

Here we examine the values of thal, which we determined to be an important factor in our data:

In [5]:
heart.thal.value_counts()

normal               196
reversable defect    192
fixed defect          46
Name: thal, dtype: int64

In [6]:
heart.shape

(920, 16)

For this statistical analysis, we decided not to remove rows with NA values up front, because we want as much data as possible to go into the models below. Note that the statsmodels formula interface drops any row that has an NA in a variable used by the model, so each model is fit only on complete rows (for example, 299 of the 920 rows for the full model below). Results involving slope, ca, and thal, which have many NA values, should therefore be taken with a grain of salt.
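The effect of NA values on the number of usable rows can be checked before fitting. The following is a minimal sketch on a made-up `toy` frame (the values are hypothetical and only the column names come from the dataset); it counts how many rows are complete for a small and a large set of model variables.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; only the column names are taken from the heart dataset.
toy = pd.DataFrame({
    "num": [0, 2, 1, 0, 1],
    "oldpeak": [2.3, 1.5, np.nan, 3.5, 0.0],
    "ca": [0.0, 3.0, 2.0, np.nan, np.nan],
    "thal": ["fixed defect", None, None, "normal", None],
})

small_model_cols = ["num", "oldpeak"]
full_model_cols = ["num", "oldpeak", "ca", "thal"]

# A row is usable by a model only if every variable in that model is present.
rows_small = len(toy.dropna(subset=small_model_cols))
rows_full = len(toy.dropna(subset=full_model_cols))

print(f"all rows: {len(toy)}, small model: {rows_small}, full model: {rows_full}")
```

Running the same `dropna(subset=...)` count on `heart` with the variables of a given formula should match the `No. Observations` value in that model's summary.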

## Statistical Analysis using the Statsmodels library
### Fitting the entire dataset based on 'num'

To analyze our data, we decided to use the Ordinary Least Squares (OLS) model. OLS fits a linear regression, and its summary function lets us determine which variables are statistically significant. A variable is statistically significant when a coefficient estimate as large as the one we observe would be very rare if the null hypothesis (that the variable has no effect) were true. In other words, the relationship is unlikely to be due to chance alone, which suggests the variable is important in our analysis.

We decided to use linear regression even though our outcome is categorical, because we wanted to analyze which variables are most influential. We use the values reported in the OLS summary to determine this.

From analyzing the dataset, the response variable is 'num'. Num stands for the stage of heart disease for each patient, and with our analysis we want to see which variables are most influential in predicting it. In this analysis we leave out location because we analyze that variable later.

In [7]:
m = smf.ols("num ~ age + sex + cp + trestbps + chol + fbs + restecg + thalch + exang + oldpeak + slope + ca + thal", data = heart).fit()
m.summary()

Dep. Variable:,num,R-squared:,0.584
Model:,OLS,Adj. R-squared:,0.557
Method:,Least Squares,F-statistic:,21.82
Date:,"Mon, 13 Mar 2023",Prob (F-statistic):,3.38e-43
Time:,14:45:31,Log-Likelihood:,-354.72
No. Observations:,299,AIC:,747.4
Df Residuals:,280,BIC:,817.8
Df Model:,18,,
Covariance Type:,nonrobust,,

,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,1.0533,0.750,1.403,0.162,-0.424,2.531
sex[T.Male],0.2147,0.118,1.824,0.069,-0.017,0.446
cp[T.atypical angina],-0.2846,0.157,-1.812,0.071,-0.594,0.025
cp[T.non-anginal],-0.4625,0.130,-3.551,0.000,-0.719,-0.206
cp[T.typical angina],-0.6959,0.198,-3.512,0.001,-1.086,-0.306
fbs[T.True],0.0164,0.143,0.114,0.909,-0.266,0.299
restecg[T.normal],-0.1994,0.100,-1.985,0.048,-0.397,-0.002
restecg[T.st-t abnormality],0.5487,0.428,1.283,0.201,-0.293,1.391
exang[T.True],0.1649,0.121,1.360,0.175,-0.074,0.403

Omnibus:,21.441,Durbin-Watson:,1.91
Prob(Omnibus):,0.0,Jarque-Bera (JB):,27.359
Skew:,0.552,Prob(JB):,1.15e-06
Kurtosis:,3.988,Cond. No.,5220.0


When we run the summary, we find 4 significant terms: ca, oldpeak, cp(non-anginal), and cp(typical angina). For categorical variables such as cp, statsmodels automatically creates dummy variables, because linear regression needs numeric inputs.
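To illustrate what dummy coding looks like, here is a minimal sketch using `pd.get_dummies` on a made-up `cp` column. This is not how statsmodels builds its design matrix internally (the formula interface does that itself); it only mimics the result, where the alphabetically first category is dropped and becomes the reference level.

```python
import pandas as pd

# Hypothetical toy column using the four cp categories found in the dataset.
cp = pd.Series(
    ["typical angina", "asymptomatic", "non-anginal", "atypical angina", "asymptomatic"],
    name="cp",
)

# drop_first=True removes the first category ("asymptomatic"),
# so a row of all zeros represents that reference level.
dummies = pd.get_dummies(cp, drop_first=True)

print(dummies.columns.tolist())
print(dummies)
```

The three remaining columns correspond to the three `cp[T....]` rows in the summary table, and "asymptomatic" is the category that does not appear there.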

For our model, we determine statistical significance by looking at p-values and t-values. A p-value is a probability: in this summary, it is the probability of seeing a coefficient estimate at least this far from zero if the variable truly had no effect. We are looking for very small values. A p-value shown as 0.000 has been rounded and means the probability is below 0.0005, which tells us the term is significant in our model. We also look at the t-value, which is the coefficient divided by its standard error. The larger the t-value is in absolute terms, the more evidence there is against the null hypothesis.

Our 4 terms are determined to be significant because their p-values are all very small (0.000 to 0.001). Additionally, their t-values are all larger in absolute value than the t-critical value, which can be looked up in a t-table. For our analysis, we used a t-critical value of 2.101, found by matching the 95% confidence level and the Df Model value (18) in the table. Strictly speaking, the t-test for a coefficient uses the residual degrees of freedom (Df Residuals, 280 here), which gives a critical value of about 1.97; under that threshold restecg[T.normal] (t = -1.985, p = 0.048) is also borderline significant. Our 4 terms are the only ones whose t-values exceed 2.101 in absolute value, which is how we determined their significance.
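The table lookup described above can be cross-checked numerically. This is a minimal sketch, assuming SciPy is installed (it is a dependency of statsmodels). It computes the two-sided 5% critical value for 18 degrees of freedom (Df Model) and for 280 (Df Residuals), and recovers the p-value reported for restecg[T.normal] from its t-value.

```python
from scipy import stats

alpha = 0.05  # two-sided 5% significance level

# Critical values: t.ppf is the inverse CDF of the t distribution.
crit_df_model = stats.t.ppf(1 - alpha / 2, df=18)    # Df Model from the summary
crit_df_resid = stats.t.ppf(1 - alpha / 2, df=280)   # Df Residuals from the summary

# Two-sided p-value for the t-value reported for restecg[T.normal].
t_restecg_normal = -1.985
p_restecg_normal = 2 * stats.t.sf(abs(t_restecg_normal), df=280)

print(round(crit_df_model, 3), round(crit_df_resid, 3), round(p_restecg_normal, 3))
```

The p-values in the statsmodels summary are based on the residual degrees of freedom, which is why the second critical value is the one that lines up with the `P>|t|` column.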

Lastly, we can interpret these terms by looking at their coefficients (coef). First we look at the intercept, which tells us the predicted num for the baseline case in our model. The statsmodels library selects the baseline (reference) level of each categorical variable automatically, and changing them is cumbersome when working with so many variables. The reference levels are the categories not listed in the summary table, such as asymptomatic for cp. Each coefficient is the change in predicted num relative to the intercept, holding the other variables fixed. For example, as `oldpeak` increases by 1 unit, the predicted level of heart disease increases by 0.1893. This rise is smaller than that of the other variables. The most influential term appears to be cp(typical angina): a patient with typical angina has a predicted stage of heart disease about 0.70 lower than a patient with asymptomatic chest pain, which is almost a whole stage.
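For reference, the baseline level of a single categorical variable can be set explicitly in the formula. The sketch below assumes the `C(..., Treatment(reference=...))` syntax that statsmodels formulas accept; the `toy` frame and its values are made up for illustration and are not from the heart dataset.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical toy data: three patients per chest-pain category.
toy = pd.DataFrame({
    "num": [2, 1, 3, 0, 1, 0, 0, 1, 0, 0, 1, 0],
    "cp": ["asymptomatic"] * 3 + ["atypical angina"] * 3
          + ["non-anginal"] * 3 + ["typical angina"] * 3,
})

# Default coding: the alphabetically first level ("asymptomatic") is the baseline.
default_fit = smf.ols("num ~ cp", data=toy).fit()

# Explicit coding: make "typical angina" the baseline instead.
releveled_fit = smf.ols(
    "num ~ C(cp, Treatment(reference='typical angina'))", data=toy
).fit()

# With a single categorical predictor, the intercept is the baseline group's mean.
print(default_fit.params)
print(releveled_fit.params)
```

Changing the baseline does not change the fitted values, only which category the other coefficients are compared against.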

In [8]:
m = smf.ols("num ~ cp + oldpeak + ca", data = heart).fit()
m.summary()

Dep. Variable:,num,R-squared:,0.484
Model:,OLS,Adj. R-squared:,0.476
Method:,Least Squares,F-statistic:,56.94
Date:,"Mon, 13 Mar 2023",Prob (F-statistic):,1.25e-41
Time:,14:45:32,Log-Likelihood:,-399.3
No. Observations:,309,AIC:,810.6
Df Residuals:,303,BIC:,833.0
Df Model:,5,,
Covariance Type:,nonrobust,,

,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,0.6765,0.104,6.505,0.000,0.472,0.881
cp[T.atypical angina],-0.6569,0.155,-4.237,0.000,-0.962,-0.352
cp[T.non-anginal],-0.7917,0.125,-6.342,0.000,-1.037,-0.546
cp[T.typical angina],-0.8635,0.197,-4.389,0.000,-1.251,-0.476
oldpeak,0.3363,0.048,7.025,0.000,0.242,0.430
ca,0.4612,0.058,8.016,0.000,0.348,0.574

Omnibus:,27.524,Durbin-Watson:,1.915
Prob(Omnibus):,0.0,Jarque-Bera (JB):,33.397
Skew:,0.698,Prob(JB):,5.6e-08
Kurtosis:,3.805,Cond. No.,8.2


### Analyzing by Location
In this analysis we add the location data to the model with all the variables, and then we analyze a model with just the most significant values and location.

In [9]:
heart.dataset.value_counts()

Cleveland        304
Hungary          293
VA Long Beach    200
Switzerland      123
Name: dataset, dtype: int64

In [10]:
m = smf.ols("num ~ age + sex + dataset + cp + trestbps + chol + fbs + restecg + thalch + exang + oldpeak + slope + ca + thal", data = heart).fit()
m.summary()

Dep. Variable:,num,R-squared:,0.586
Model:,OLS,Adj. R-squared:,0.556
Method:,Least Squares,F-statistic:,19.69
Date:,"Mon, 13 Mar 2023",Prob (F-statistic):,3.43e-42
Time:,14:45:32,Log-Likelihood:,-353.85
No. Observations:,299,AIC:,749.7
Df Residuals:,278,BIC:,827.4
Df Model:,20,,
Covariance Type:,nonrobust,,

,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,1.1167,0.753,1.484,0.139,-0.365,2.598
sex[T.Male],0.2171,0.118,1.843,0.066,-0.015,0.449
dataset[T.Hungary],-0.9031,0.841,-1.074,0.284,-2.558,0.752
dataset[T.Switzerland],-1.703e-14,1.13e-14,-1.507,0.133,-3.93e-14,5.21e-15
dataset[T.VA Long Beach],-0.5961,0.841,-0.709,0.479,-2.252,1.060
cp[T.atypical angina],-0.2867,0.157,-1.823,0.069,-0.596,0.023
cp[T.non-anginal],-0.4666,0.130,-3.579,0.000,-0.723,-0.210
cp[T.typical angina],-0.7011,0.198,-3.535,0.000,-1.091,-0.311
fbs[T.True],0.0147,0.143,0.103,0.918,-0.268,0.297

Omnibus:,20.983,Durbin-Watson:,1.912
Prob(Omnibus):,0.0,Jarque-Bera (JB):,27.287
Skew:,0.534,Prob(JB):,1.19e-06
Kurtosis:,4.024,Cond. No.,4.43e+17


We first added location to the full model to see whether the locations are significant. After incorporating them, we learned that none of the location terms is statistically significant in influencing the stage of heart disease. This does not mean that location has no effect on heart disease level, only that in this model we cannot rule out that the estimated effects are due to chance. However, our original 4 most significant variables are still significant, which adds to our confidence in them.

In [11]:
m = smf.ols("num ~ cp + oldpeak + ca + dataset", data = heart).fit()
m.summary()

Dep. Variable:,num,R-squared:,0.488
Model:,OLS,Adj. R-squared:,0.474
Method:,Least Squares,F-statistic:,35.74
Date:,"Mon, 13 Mar 2023",Prob (F-statistic):,1.71e-39
Time:,14:45:32,Log-Likelihood:,-398.23
No. Observations:,309,AIC:,814.5
Df Residuals:,300,BIC:,848.1
Df Model:,8,,
Covariance Type:,nonrobust,,

,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,0.6911,0.106,6.533,0.000,0.483,0.899
cp[T.atypical angina],-0.6619,0.156,-4.248,0.000,-0.969,-0.355
cp[T.non-anginal],-0.8065,0.126,-6.411,0.000,-1.054,-0.559
cp[T.typical angina],-0.8885,0.198,-4.491,0.000,-1.278,-0.499
dataset[T.Hungary],-0.3074,0.520,-0.591,0.555,-1.332,0.717
dataset[T.Switzerland],0.3007,0.408,0.737,0.462,-0.502,1.104
dataset[T.VA Long Beach],-0.7018,0.637,-1.102,0.271,-1.955,0.551
oldpeak,0.3404,0.048,7.061,0.000,0.246,0.435
ca,0.4473,0.059,7.638,0.000,0.332,0.563

Omnibus:,26.909,Durbin-Watson:,1.937
Prob(Omnibus):,0.0,Jarque-Bera (JB):,32.62
Skew:,0.685,Prob(JB):,8.25e-08
Kurtosis:,3.812,Cond. No.,24.1


We will now analyze location by splitting our data into the US and Europe. The model needs a large number of observations to fit, which is why we were unable to separate the data by individual location.

In [12]:
us = heart[(heart.dataset == 'VA Long Beach') | (heart.dataset == 'Cleveland')]
europe = heart[(heart.dataset == 'Switzerland') | (heart.dataset == 'Hungary')]

In [13]:
us.shape, europe.shape

((504, 16), (416, 16))

In [14]:
m = smf.ols("num ~ age + sex + cp + trestbps + chol + fbs + restecg + thalch + exang + oldpeak + slope + ca + thal", data = us).fit()
m.summary()

Dep. Variable:,num,R-squared:,0.585
Model:,OLS,Adj. R-squared:,0.559
Method:,Least Squares,F-statistic:,21.89
Date:,"Mon, 13 Mar 2023",Prob (F-statistic):,2.98e-43
Time:,14:45:32,Log-Likelihood:,-353.44
No. Observations:,298,AIC:,744.9
Df Residuals:,279,BIC:,815.1
Df Model:,18,,
Covariance Type:,nonrobust,,

,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,1.0953,0.751,1.458,0.146,-0.384,2.574
sex[T.Male],0.2177,0.118,1.850,0.065,-0.014,0.449
cp[T.atypical angina],-0.2855,0.157,-1.817,0.070,-0.595,0.024
cp[T.non-anginal],-0.4641,0.130,-3.564,0.000,-0.720,-0.208
cp[T.typical angina],-0.6981,0.198,-3.524,0.000,-1.088,-0.308
fbs[T.True],0.0149,0.143,0.104,0.917,-0.267,0.297
restecg[T.normal],-0.1933,0.101,-1.922,0.056,-0.391,0.005
restecg[T.st-t abnormality],0.5461,0.428,1.277,0.203,-0.296,1.388
exang[T.True],0.1668,0.121,1.376,0.170,-0.072,0.405

Omnibus:,21.197,Durbin-Watson:,1.913
Prob(Omnibus):,0.0,Jarque-Bera (JB):,27.282
Skew:,0.544,Prob(JB):,1.19e-06
Kurtosis:,4.006,Cond. No.,5220.0


After conducting our analysis on the US, we started to work on the Europe analysis but were unable to complete it: the Europe dataframe did not have enough usable observations, and the model fit raised an unexpected error. This is consistent with the observation counts above, where the US model uses 298 of the 299 complete rows available to the full model, leaving almost none for Europe. This is a limitation of our analysis, but the model fit to the US data shows that our 4 variables remain significant when the data is filtered down to just the US.
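One way to diagnose this kind of failure is to count, per location, how many rows have every predictor present. The following is a minimal sketch on a made-up `toy` frame (the values are hypothetical and only the column and location names come from the dataset).

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; values are invented for illustration.
toy = pd.DataFrame({
    "dataset": ["Cleveland", "Cleveland", "Hungary", "Hungary",
                "Switzerland", "VA Long Beach"],
    "ca": [0.0, 3.0, np.nan, np.nan, np.nan, np.nan],
    "thal": ["normal", "fixed defect", None, "normal", None, "fixed defect"],
    "num": [0, 2, 0, 1, 1, 2],
})

predictors = ["ca", "thal"]

# True for rows where every predictor is present.
is_complete = toy[predictors].notna().all(axis=1)

# Number of complete rows per location.
complete_by_site = is_complete.groupby(toy["dataset"]).sum()

print(complete_by_site)
```

Applied to `heart` with the full list of model variables, a count like this would show how many rows each location actually contributes to a fit.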

### Analyzing by Gender
In this analysis we separate the data by gender, which has only two values in this dataset. We then examine which values are most significant for each gender. We want to know whether the statistically significant values change depending on gender, and whether one gender might be skewing the results, especially given the differing numbers of men and women.

In [15]:
female = heart[heart.sex == 'Female']
male = heart[heart.sex == 'Male']

In [16]:
female.shape, male.shape

((194, 16), (726, 16))

In [17]:
m = smf.ols("num ~ age + sex + dataset + cp + trestbps + chol + fbs + restecg + thalch + exang + oldpeak + slope + ca + thal", data = female).fit()
m.summary()

Dep. Variable:,num,R-squared:,0.789
Model:,OLS,Adj. R-squared:,0.743
Method:,Least Squares,F-statistic:,17.17
Date:,"Mon, 13 Mar 2023",Prob (F-statistic):,1.02e-19
Time:,14:45:33,Log-Likelihood:,-65.278
No. Observations:,96,AIC:,166.6
Df Residuals:,78,BIC:,212.7
Df Model:,17,,
Covariance Type:,nonrobust,,

,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,0.1511,1.151,0.131,0.896,-2.140,2.442
dataset[T.Hungary],-1.507e-14,1.02e-13,-0.147,0.883,-2.19e-13,1.89e-13
dataset[T.Switzerland],-3e-15,5.17e-15,-0.580,0.563,-1.33e-14,7.29e-15
dataset[T.VA Long Beach],2.216e-15,1.54e-15,1.439,0.154,-8.5e-16,5.28e-15
cp[T.atypical angina],-0.3333,0.173,-1.922,0.058,-0.679,0.012
cp[T.non-anginal],-0.4553,0.151,-3.013,0.003,-0.756,-0.154
cp[T.typical angina],-0.6486,0.312,-2.079,0.041,-1.270,-0.027
fbs[T.True],0.3791,0.192,1.970,0.052,-0.004,0.762
restecg[T.normal],-0.1163,0.122,-0.957,0.341,-0.358,0.126

Omnibus:,17.736,Durbin-Watson:,2.334
Prob(Omnibus):,0.0,Jarque-Bera (JB):,25.196
Skew:,0.851,Prob(JB):,3.38e-06
Kurtosis:,4.845,Cond. No.,1.1e+19


In [18]:
m = smf.ols("num ~ age + sex + dataset + cp + trestbps + chol + fbs + restecg + thalch + exang + oldpeak + slope + ca + thal", data = male).fit()
m.summary()

Dep. Variable:,num,R-squared:,0.519
Model:,OLS,Adj. R-squared:,0.469
Method:,Least Squares,F-statistic:,10.37
Date:,"Mon, 13 Mar 2023",Prob (F-statistic):,2.24e-20
Time:,14:45:33,Log-Likelihood:,-261.36
No. Observations:,203,AIC:,562.7
Df Residuals:,183,BIC:,629.0
Df Model:,19,,
Covariance Type:,nonrobust,,

,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,1.3544,1.013,1.337,0.183,-0.644,3.353
dataset[T.Hungary],-1.0090,0.958,-1.053,0.294,-2.899,0.881
dataset[T.Switzerland],8.277e-15,5.97e-15,1.386,0.168,-3.51e-15,2.01e-14
dataset[T.VA Long Beach],-0.7691,0.969,-0.794,0.428,-2.680,1.142
cp[T.atypical angina],-0.3249,0.224,-1.451,0.148,-0.767,0.117
cp[T.non-anginal],-0.5286,0.186,-2.845,0.005,-0.895,-0.162
cp[T.typical angina],-0.7574,0.253,-2.994,0.003,-1.257,-0.258
fbs[T.True],-0.0631,0.194,-0.325,0.746,-0.446,0.320
restecg[T.normal],-0.2074,0.140,-1.487,0.139,-0.483,0.068

Omnibus:,8.662,Durbin-Watson:,2.032
Prob(Omnibus):,0.013,Jarque-Bera (JB):,8.58
Skew:,0.45,Prob(JB):,0.0137
Kurtosis:,3.453,Cond. No.,8.07e+17


When analyzing by gender, we were able to see variation in our influential variables. For women, only 3 terms were significant in the summary table. `ca` and `cp non-anginal` remain significant, and `restecg[T.st-t abnormality]` is new: in our previous analysis this term was not significant, but for women it is the most influential. When a patient has an ST-T abnormality rather than the reference level, the predicted level of heart disease increases by 0.8197, which is almost another full stage of heart disease.

For men, we found the same 4 most significant values. This does not surprise us because men outnumber women in this data set by about 4:1 (726 to 194), so there is limited information on women.

### Analyzing by Age
In this analysis, we divide the data into patients under 55 and patients 55 and above. We made this distinction because of the size of the data: for the model to fit, each subset needs a good number of observations. We initially wanted to separate by decade of age, but that made our dataframes too small to analyze. We arrived at 55 by taking the median age in the data, which was 54, and rounding it to a nice number.
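The trade-off between the two ways of splitting can be seen from the group sizes alone. The sketch below rebuilds the age column from the `value_counts()` output shown in this notebook (so it does not need the CSV file) and compares a split by decade with the split at 55. These are counts of all rows, before any rows with NA values are dropped by a model.

```python
import numpy as np
import pandas as pd

# Age -> count pairs copied from the value_counts() output in this notebook.
age_counts = {
    28: 1, 29: 3, 30: 1, 31: 2, 32: 5, 33: 2, 34: 7, 35: 11, 36: 6, 37: 11,
    38: 16, 39: 15, 40: 13, 41: 24, 42: 18, 43: 24, 44: 19, 45: 18, 46: 24,
    47: 19, 48: 31, 49: 22, 50: 25, 51: 35, 52: 36, 53: 33, 54: 51, 55: 41,
    56: 38, 57: 38, 58: 43, 59: 35, 60: 32, 61: 31, 62: 35, 63: 30, 64: 22,
    65: 21, 66: 13, 67: 15, 68: 10, 69: 13, 70: 7, 71: 5, 72: 4, 73: 1,
    74: 7, 75: 3, 76: 2, 77: 2,
}

# Expand the counts back into one value per patient.
ages = pd.Series(
    np.repeat(list(age_counts.keys()), list(age_counts.values())), name="age"
)

# Split by decade: bins are [20, 30), [30, 40), ..., [70, 80).
labels = ["20s", "30s", "40s", "50s", "60s", "70s"]
decade = pd.cut(ages, bins=[20, 30, 40, 50, 60, 70, 80], right=False, labels=labels)
decade_sizes = decade.value_counts().sort_index()

# Split at 55, as used in this analysis.
n_under55 = int((ages < 55).sum())
n_55_and_above = int((ages >= 55).sum())

print(decade_sizes)
print(ages.median(), n_under55, n_55_and_above)
```

The youngest and oldest decades contain only a handful of patients, while the split at 55 gives two groups of similar size.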

In [19]:
heart.age.min(), heart.age.max()

(28, 77)

In [20]:
heart.age.median()

54.0

In [21]:
heart.age.value_counts()

54    51
58    43
55    41
56    38
57    38
52    36
62    35
51    35
59    35
53    33
60    32
61    31
48    31
63    30
50    25
41    24
46    24
43    24
64    22
49    22
65    21
44    19
47    19
45    18
42    18
38    16
67    15
39    15
69    13
40    13
66    13
35    11
37    11
68    10
34     7
70     7
74     7
36     6
32     5
71     5
72     4
29     3
75     3
31     2
33     2
76     2
77     2
30     1
28     1
73     1
Name: age, dtype: int64

In [22]:
under55 = heart[heart.age < 55]
above55 = heart[heart.age >= 55]

In [25]:
under55.shape, above55.shape

((472, 16), (448, 16))

In [23]:
m = smf.ols("num ~ age + sex + dataset + cp + trestbps + chol + fbs + restecg + thalch + exang + oldpeak + slope + ca + thal", data = under55).fit()
m.summary()

Dep. Variable:,num,R-squared:,0.648
Model:,OLS,Adj. R-squared:,0.595
Method:,Least Squares,F-statistic:,12.25
Date:,"Mon, 13 Mar 2023",Prob (F-statistic):,1.63e-19
Time:,14:45:33,Log-Likelihood:,-132.42
No. Observations:,139,AIC:,302.8
Df Residuals:,120,BIC:,358.6
Df Model:,18,,
Covariance Type:,nonrobust,,

,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,1.8721,1.064,1.760,0.081,-0.234,3.978
sex[T.Male],0.1644,0.148,1.112,0.268,-0.128,0.457
dataset[T.Hungary],-0.9490,0.726,-1.307,0.194,-2.387,0.489
dataset[T.Switzerland],-9.834e-15,5.39e-15,-1.825,0.071,-2.05e-14,8.36e-16
dataset[T.VA Long Beach],-1.44e-14,7.12e-15,-2.023,0.045,-2.85e-14,-3.09e-16
cp[T.atypical angina],-0.3627,0.185,-1.960,0.052,-0.729,0.004
cp[T.non-anginal],-0.4764,0.163,-2.923,0.004,-0.799,-0.154
cp[T.typical angina],-0.6543,0.272,-2.402,0.018,-1.194,-0.115
fbs[T.True],-0.3338,0.204,-1.636,0.105,-0.738,0.070

Omnibus:,17.521,Durbin-Watson:,1.892
Prob(Omnibus):,0.0,Jarque-Bera (JB):,28.345
Skew:,0.622,Prob(JB):,7e-07
Kurtosis:,4.829,Cond. No.,1.42e+22


In [24]:
m = smf.ols("num ~ age + sex + dataset + cp + trestbps + chol + fbs + restecg + thalch + exang + oldpeak + slope + ca + thal", data = above55).fit()
m.summary()

Dep. Variable:,num,R-squared:,0.556
Model:,OLS,Adj. R-squared:,0.496
Method:,Least Squares,F-statistic:,9.235
Date:,"Mon, 13 Mar 2023",Prob (F-statistic):,1.06e-16
Time:,14:45:34,Log-Likelihood:,-202.55
No. Observations:,160,AIC:,445.1
Df Residuals:,140,BIC:,506.6
Df Model:,19,,
Covariance Type:,nonrobust,,

,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,1.1936,1.530,0.780,0.437,-1.831,4.218
sex[T.Male],0.3514,0.186,1.888,0.061,-0.017,0.719
dataset[T.Hungary],4.2e-15,3.7e-15,1.135,0.258,-3.12e-15,1.15e-14
dataset[T.Switzerland],1.293e-14,1.41e-14,0.915,0.362,-1.5e-14,4.09e-14
dataset[T.VA Long Beach],-0.3261,0.958,-0.340,0.734,-2.220,1.568
cp[T.atypical angina],-0.2937,0.258,-1.138,0.257,-0.804,0.217
cp[T.non-anginal],-0.3180,0.216,-1.476,0.142,-0.744,0.108
cp[T.typical angina],-0.8758,0.294,-2.974,0.003,-1.458,-0.294
fbs[T.True],0.1707,0.209,0.816,0.416,-0.243,0.584

Omnibus:,4.196,Durbin-Watson:,1.814
Prob(Omnibus):,0.123,Jarque-Bera (JB):,3.797
Skew:,0.367,Prob(JB):,0.15
Kurtosis:,3.177,Cond. No.,4.09e+18


For our age analysis, we found that patients under 55 exhibit the same 4 most statistically significant values in the OLS model summary. However, patients 55 and above have only 2 significant values: `ca` and `cp[T.typical angina]`. For the other two values we have been seeing in other models, we cannot rule out chance in the 55 and above summary table.