# Import Packages

In [1]:
library(dplyr)
library(tidyr)
library(IDPmisc)
library(ggplot2)
library(repr) # options() to change size of plot image
library(gridExtra) # side-by-side plots


Attaching package: ‘dplyr’


The following objects are masked from ‘package:stats’:

    filter, lag


The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union



Attaching package: ‘gridExtra’


The following object is masked from ‘package:dplyr’:

    combine




# Import Data

* The data was retrieved from [BioLINCC](https://biolincc.nhlbi.nih.gov/teaching/)
* The data comes from an ongoing cardiovascular study of residents of the town of Framingham, Massachusetts. It was collected during three examination periods, approximately 6 years apart, from roughly 1956 to 1968
* Missing values in the dataset are indicated by a period (.)
* Disclaimer: This is teaching data that has been rendered anonymous through the application of certain statistical processes such as permutations and/or random visit selection. We cannot claim or imply that any inferences derived from the teaching datasets are valid estimates
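
Because missing values are coded as a period, they need to be mapped to `NA` when a raw file is read. The following is a minimal sketch, assuming a file that still contains the `.` markers; the inline CSV text and the name `demo` are made up for illustration and are not taken from the dataset.

```r
# Hypothetical three-row extract; "." marks a missing value
raw_text <- "RANDID,BMI,GLUCOSE
1,26.97,77
2,.,92
3,28.73,."

# na.strings tells read.csv which strings to treat as missing
demo <- read.csv(text = raw_text, na.strings = c(".", "NA", ""))

str(demo)             # BMI is read as numeric, GLUCOSE as integer
colSums(is.na(demo))  # number of missing values per column
```

Without `na.strings`, a column containing `.` would be read as character rather than numeric.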

## Data Dictionary

|Variable|Description|Levels (if applicable)/<br> Notes|
| --- | --- |---|
|RANDID | Unique identification number for each participant||
|SEX | Participant sex |1=Men <br> 2=Women|
|PERIOD|Examination Cycle|1=Period 1 <br> 2=Period 2 <br> 3=Period 3|
|TIME| Number of days since baseline exam||
|AGE|Age at exam (years)||
|SYSBP|Systolic Blood Pressure (mean of last two of three <br> measurements) (mmHg)||
|DIABP|Diastolic Blood Pressure (mean of last two of three <br> measurements) (mmHg)||
|BPMEDS|Use of Anti-hypertensive medication at exam|0=Not currently used <br> 1=Current Use|
|CURSMOKE|Current cigarette smoking at exam|0=Not current smoker <br> 1=Current smoker|
|CIGPDAY|Number of cigarettes smoked each day|0=Not current smoker <br> 1-90 cigarettes per day|
|TOTCHOL|Serum Total Cholesterol (mg/dL)||
|HDLC|High Density Lipoprotein Cholesterol (mg/dL)|available for period 3 only|
|LDLC|Low Density Lipoprotein Cholesterol (mg/dL)|available for period 3 only|
|BMI|Body Mass Index, weight in kilograms / <br> height in meters squared||
|GLUCOSE|Casual serum glucose (mg/dL)||
|DIABETES|Diabetic according to criteria of first exam <br> treated or first exam with casual glucose <br> of 200 mg/dL or more|0=Not a diabetic <br> 1=Diabetic|
|HEARTRTE|Heart rate (Ventricular rate) in beats/min||
|PREVHYP|Prevalent Hypertensive. Subject was defined as <br> hypertensive if treated or if second exam at <br>which mean systolic was >=140 mmHg or mean<br>  Diastolic >=90 mmHg|0=Free of disease <br> 1=Prevalent disease|
|ANYCHD|Angina Pectoris, <br> Myocardial infarction (Hospitalized and silent or unrecognized), <br>Coronary Insufficiency (Unstable Angina), <br> or Fatal Coronary Heart Disease|0=Event did not occur during follow-up <br> 1=Event occurred during follow-up|
|STROKE|Atherothrombotic infarction, <br>Cerebral Embolism, <br>Intracerebral Hemorrhage, <br> Subarachnoid Hemorrhage, <br> or Fatal Cerebrovascular Disease|0=Event did not occur during follow-up <br> 1=Event occurred during follow-up|
|DEATH|Death from any cause|0=Event did not occur during follow-up <br> 1=Event occurred during follow-up|

In [2]:
framingham = read.csv('/Users/silviacatalina/Google Drive/BethelTech/GitHub/wozU-DataSci/DS0110-FinalProject/Framingham/Data/csv/frmgham2.csv')
head(framingham)

Unnamed: 0_level_0,RANDID,SEX,TOTCHOL,AGE,SYSBP,DIABP,CURSMOKE,CIGPDAY,BMI,DIABETES,⋯,CVD,HYPERTEN,TIMEAP,TIMEMI,TIMEMIFC,TIMECHD,TIMESTRK,TIMECVD,TIMEDTH,TIMEHYP
Unnamed: 0_level_1,<int>,<int>,<int>,<int>,<dbl>,<dbl>,<int>,<int>,<dbl>,<int>,⋯,<int>,<int>,<int>,<int>,<int>,<int>,<int>,<int>,<int>,<int>
1,2448,1,195,39,106.0,70.0,0,0,26.97,0,⋯,1,0,8766,6438,6438,6438,8766,6438,8766,8766
2,2448,1,209,52,121.0,66.0,0,0,,0,⋯,1,0,8766,6438,6438,6438,8766,6438,8766,8766
3,6238,2,250,46,121.0,81.0,0,0,28.73,0,⋯,0,0,8766,8766,8766,8766,8766,8766,8766,8766
4,6238,2,260,52,105.0,69.5,0,0,29.43,0,⋯,0,0,8766,8766,8766,8766,8766,8766,8766,8766
5,6238,2,237,58,108.0,66.0,0,0,28.5,0,⋯,0,0,8766,8766,8766,8766,8766,8766,8766,8766
6,9428,1,245,48,127.5,80.0,1,20,25.34,0,⋯,0,0,8766,8766,8766,8766,8766,8766,8766,8766


In [3]:
str(framingham)

'data.frame':	11627 obs. of  39 variables:
 $ RANDID  : int  2448 2448 6238 6238 6238 9428 9428 10552 10552 11252 ...
 $ SEX     : int  1 1 2 2 2 1 1 2 2 2 ...
 $ TOTCHOL : int  195 209 250 260 237 245 283 225 232 285 ...
 $ AGE     : int  39 52 46 52 58 48 54 61 67 46 ...
 $ SYSBP   : num  106 121 121 105 108 ...
 $ DIABP   : num  70 66 81 69.5 66 80 89 95 109 84 ...
 $ CURSMOKE: int  0 0 0 0 0 1 1 1 1 1 ...
 $ CIGPDAY : int  0 0 0 0 0 20 30 30 20 23 ...
 $ BMI     : num  27 NA 28.7 29.4 28.5 ...
 $ DIABETES: int  0 0 0 0 0 0 0 0 0 0 ...
 $ BPMEDS  : int  0 0 0 0 0 0 0 0 0 0 ...
 $ HEARTRTE: int  80 69 95 80 80 75 75 65 60 85 ...
 $ GLUCOSE : int  77 92 76 86 71 70 87 103 89 85 ...
 $ educ    : int  4 4 2 2 2 1 1 3 3 3 ...
 $ PREVCHD : int  0 0 0 0 0 0 0 0 0 0 ...
 $ PREVAP  : int  0 0 0 0 0 0 0 0 0 0 ...
 $ PREVMI  : int  0 0 0 0 0 0 0 0 0 0 ...
 $ PREVSTRK: int  0 0 0 0 0 0 0 0 0 0 ...
 $ PREVHYP : int  0 0 0 0 0 0 0 1 1 0 ...
 $ TIME    : int  0 4628 0 2156 4344 0 2199 0 1977 0 ...

# Data Wrangling

## Select Columns of interest

In [4]:
keep=c('RANDID', 'SEX', 'PERIOD', 'AGE', 'SYSBP', 'DIABP', 'BPMEDS', 'CURSMOKE', 'CIGPDAY', 'TOTCHOL', 
       'BMI', 'GLUCOSE', 'DIABETES', 'HEARTRTE', 'PREVHYP', 'ANYCHD', 'STROKE', 'DEATH')
framingham1 = framingham[keep]
str(framingham1)

'data.frame':	11627 obs. of  18 variables:
 $ RANDID  : int  2448 2448 6238 6238 6238 9428 9428 10552 10552 11252 ...
 $ SEX     : int  1 1 2 2 2 1 1 2 2 2 ...
 $ PERIOD  : int  1 3 1 2 3 1 2 1 2 1 ...
 $ AGE     : int  39 52 46 52 58 48 54 61 67 46 ...
 $ SYSBP   : num  106 121 121 105 108 ...
 $ DIABP   : num  70 66 81 69.5 66 80 89 95 109 84 ...
 $ BPMEDS  : int  0 0 0 0 0 0 0 0 0 0 ...
 $ CURSMOKE: int  0 0 0 0 0 1 1 1 1 1 ...
 $ CIGPDAY : int  0 0 0 0 0 20 30 30 20 23 ...
 $ TOTCHOL : int  195 209 250 260 237 245 283 225 232 285 ...
 $ BMI     : num  27 NA 28.7 29.4 28.5 ...
 $ GLUCOSE : int  77 92 76 86 71 70 87 103 89 85 ...
 $ DIABETES: int  0 0 0 0 0 0 0 0 0 0 ...
 $ HEARTRTE: int  80 69 95 80 80 75 75 65 60 85 ...
 $ PREVHYP : int  0 0 0 0 0 0 0 1 1 0 ...
 $ ANYCHD  : int  1 1 0 0 0 0 0 0 0 0 ...
 $ STROKE  : int  0 0 0 0 0 0 0 1 1 0 ...
 $ DEATH   : int  0 0 0 0 0 0 0 1 1 0 ...


## Split Dataset into Periods 1, 2, 3; Remove NAs

In [5]:
framinghamP1 = framingham1 %>% filter(PERIOD == 1)
framinghamP1 = subset(framinghamP1, select  = - PERIOD)
framinghamP1 = NaRV.omit(framinghamP1)
head(framinghamP1)

Unnamed: 0_level_0,RANDID,SEX,AGE,SYSBP,DIABP,BPMEDS,CURSMOKE,CIGPDAY,TOTCHOL,BMI,GLUCOSE,DIABETES,HEARTRTE,PREVHYP,ANYCHD,STROKE,DEATH
Unnamed: 0_level_1,<int>,<int>,<int>,<dbl>,<dbl>,<int>,<int>,<int>,<int>,<dbl>,<int>,<int>,<int>,<int>,<int>,<int>,<int>
1,2448,1,39,106.0,70,0,0,0,195,26.97,77,0,80,0,1,0,0
2,6238,2,46,121.0,81,0,0,0,250,28.73,76,0,95,0,0,0,0
3,9428,1,48,127.5,80,0,1,20,245,25.34,70,0,75,0,0,0,0
4,10552,2,61,150.0,95,0,1,30,225,28.58,103,0,65,1,0,1,1
5,11252,2,46,130.0,84,0,1,23,285,23.1,85,0,85,0,0,0,0
6,11263,2,43,180.0,110,0,0,0,228,30.3,99,0,77,1,1,0,0


In [6]:
framinghamP2 = framingham1 %>% filter(PERIOD == 2)
framinghamP2 = subset(framinghamP2, select  = - PERIOD)
framinghamP2 = NaRV.omit(framinghamP2)
head(framinghamP2)

Unnamed: 0_level_0,RANDID,SEX,AGE,SYSBP,DIABP,BPMEDS,CURSMOKE,CIGPDAY,TOTCHOL,BMI,GLUCOSE,DIABETES,HEARTRTE,PREVHYP,ANYCHD,STROKE,DEATH
Unnamed: 0_level_1,<int>,<int>,<int>,<dbl>,<dbl>,<int>,<int>,<int>,<int>,<dbl>,<int>,<int>,<int>,<int>,<int>,<int>,<int>
1,6238,2,52,105,69.5,0,0,0,260,29.43,86,0,80,0,0,0,0
2,9428,1,54,141,89.0,0,1,30,283,25.34,87,0,75,0,0,0,0
3,10552,2,67,183,109.0,0,1,20,232,30.18,89,0,60,1,0,1,1
4,11252,2,51,109,77.0,0,1,30,343,23.48,72,0,90,0,0,0,0
5,11263,2,49,177,102.0,1,0,0,230,31.36,86,0,120,1,1,0,0
6,12629,2,70,149,81.0,0,0,0,220,36.76,98,0,80,1,1,0,0


In [7]:
framinghamP3 = framingham1 %>% filter(PERIOD == 3)
framinghamP3 = subset(framinghamP3, select  = - PERIOD)
framinghamP3 = NaRV.omit(framinghamP3)
head(framinghamP3)

Unnamed: 0_level_0,RANDID,SEX,AGE,SYSBP,DIABP,BPMEDS,CURSMOKE,CIGPDAY,TOTCHOL,BMI,GLUCOSE,DIABETES,HEARTRTE,PREVHYP,ANYCHD,STROKE,DEATH
Unnamed: 0_level_1,<int>,<int>,<int>,<dbl>,<dbl>,<int>,<int>,<int>,<int>,<dbl>,<int>,<int>,<int>,<int>,<int>,<int>,<int>
2,6238,2,58,108,66,0,0,0,237,28.5,71,0,80,0,0,0,0
4,11263,2,55,180,106,1,0,0,220,31.17,81,1,86,1,1,0,0
5,12806,2,57,110,46,0,1,30,320,22.02,87,0,75,0,0,0,0
6,14367,1,64,168,100,0,0,0,280,25.72,82,0,92,1,0,0,0
7,16365,1,55,173,123,1,0,0,211,29.11,85,0,75,1,0,0,0
9,23727,2,53,124,78,0,0,0,159,26.62,135,0,68,1,0,0,1
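
The three cells above repeat the same filter, drop, and clean steps. As a sketch of a more compact alternative (not the approach used in this notebook), `split()` can produce all three period subsets in one pass. The toy data frame and the name `by_period` are made up for illustration, and base R's `na.omit()` is used here as a stand-in for `NaRV.omit()`, which also drops rows containing infinite values.

```r
# Toy stand-in for framingham1 (values are illustrative only)
toy <- data.frame(
  RANDID = c(1, 1, 2, 2, 2, 3),
  PERIOD = c(1, 3, 1, 2, 3, 1),
  BMI    = c(26.9, NA, 28.7, 29.4, 28.5, 25.3)
)

# One data frame per examination period, keyed "1", "2", "3"
by_period <- split(toy, toy$PERIOD)

# Drop the now-constant PERIOD column and remove incomplete rows
by_period <- lapply(by_period, function(d) {
  d$PERIOD <- NULL
  na.omit(d)
})

sapply(by_period, nrow)  # rows remaining in each period
```

Keeping the subsets in a named list also makes it possible to run the same test on every period with `lapply()` instead of copying the call three times.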


# <font color = blue> Statistical Analyses, Categorical IVs: Independent Chi Squares </font>
***
## Question setup
### _1. How do demographic and behavioral factors influence the risk for heart disease?_
* DV: __ANYCHD__, categorical variable with 2 levels
* Categorical IVs: __*Independent Chi-Square*__ - Does the risk for heart disease vary by:
    * Gender (**SEX**), 2 levels?
    * __CURSMOKE__, 2 levels?
    
### _2. How do health metrics influence the risk for heart disease?_
* DV: __ANYCHD__, categorical variable with 2 levels
* Categorical IVs: __*Independent Chi-Square*__ - Does the risk for heart disease vary by:
    * __DIABETES__, 2 levels?
    * __PREVHYP__, 2 levels?
    * __STROKE__, 2 levels?

### _3. How do demographic and behavioral factors influence the risk of death?_
* DV: __DEATH__, categorical variable with 2 levels
* Categorical IVs: __*Independent Chi-Square*__ - Does the risk of death vary by:
    * Gender (**SEX**), 2 levels?
    * __CURSMOKE__, 2 levels?
    
### _4. How do health metrics influence risk of death?_
* DV: __DEATH__, categorical variable with 2 levels
* Categorical IVs: __*Independent Chi-Square*__ - Does the risk of death vary by:
    * __DIABETES__, 2 levels?
    * __ANYCHD__, 2 levels?
    * __PREVHYP__, 2 levels?
    * __STROKE__, 2 levels?

## Load libraries

In [8]:
library(gmodels)

## 1. Independent Chi-Square (ANYCHD): __Gender__

In [9]:
CrossTable(framinghamP1$SEX, framinghamP1$ANYCHD, fisher=TRUE, chisq = TRUE, expected = TRUE, sresid=TRUE, format="SPSS")


   Cell Contents
|-------------------------|
|                   Count |
|         Expected Values |
| Chi-square contribution |
|             Row Percent |
|          Column Percent |
|           Total Percent |
|            Std Residual |
|-------------------------|

Total Observations in Table:  3927 

                 | framinghamP1$ANYCHD 
framinghamP1$SEX |        0  |        1  | Row Total | 
-----------------|-----------|-----------|-----------|
               1 |     1133  |      649  |     1782  | 
                 | 1282.387  |  499.613  |           | 
                 |   17.402  |   44.667  |           | 
                 |   63.580% |   36.420% |   45.378% | 
                 |   40.092% |   58.946% |           | 
                 |   28.852% |   16.527% |           | 
                 |   -4.172  |    6.683  |           | 
-----------------|-----------|-----------|-----------|
               2 |     1693  |      452  |     2145  | 
                 | 1543.613  |  601.38

In [10]:
CrossTable(framinghamP2$SEX, framinghamP2$ANYCHD, fisher=TRUE, chisq = TRUE, expected = TRUE, sresid=TRUE, format="SPSS")


   Cell Contents
|-------------------------|
|                   Count |
|         Expected Values |
| Chi-square contribution |
|             Row Percent |
|          Column Percent |
|           Total Percent |
|            Std Residual |
|-------------------------|

Total Observations in Table:  3309 

                 | framinghamP2$ANYCHD 
framinghamP2$SEX |        0  |        1  | Row Total | 
-----------------|-----------|-----------|-----------|
               1 |      959  |      506  |     1465  | 
                 | 1067.869  |  397.131  |           | 
                 |   11.099  |   29.845  |           | 
                 |   65.461% |   34.539% |   44.273% | 
                 |   39.760% |   56.410% |           | 
                 |   28.982% |   15.292% |           | 
                 |   -3.332  |    5.463  |           | 
-----------------|-----------|-----------|-----------|
               2 |     1453  |      391  |     1844  | 
                 | 1344.131  |  499.86

In [11]:
CrossTable(framinghamP3$SEX, framinghamP3$ANYCHD, fisher=TRUE, chisq = TRUE, expected = TRUE, sresid=TRUE, format="SPSS")


   Cell Contents
|-------------------------|
|                   Count |
|         Expected Values |
| Chi-square contribution |
|             Row Percent |
|          Column Percent |
|           Total Percent |
|            Std Residual |
|-------------------------|

Total Observations in Table:  2324 

                 | framinghamP3$ANYCHD 
framinghamP3$SEX |        0  |        1  | Row Total | 
-----------------|-----------|-----------|-----------|
               1 |      664  |      342  |     1006  | 
                 |  748.007  |  257.993  |           | 
                 |    9.435  |   27.354  |           | 
                 |   66.004% |   33.996% |   43.287% | 
                 |   38.426% |   57.383% |           | 
                 |   28.571% |   14.716% |           | 
                 |   -3.072  |    5.230  |           | 
-----------------|-----------|-----------|-----------|
               2 |     1064  |      254  |     1318  | 
                 |  979.993  |  338.00

### 1. Checking Assumptions
* Data is independent, i.e. one observation per subject
* Expected values (frequencies) > 5, so the size assumption is met 
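
The size assumption can also be checked programmatically. The sketch below recomputes the expected counts for the Period 1 table from the observed counts printed above, using base R only; the object names `obs` and `expected` are illustrative.

```r
# Observed counts copied from the Period 1 SEX x ANYCHD table above
obs <- matrix(c(1133, 649,
                1693, 452),
              nrow = 2, byrow = TRUE,
              dimnames = list(SEX = c("1", "2"), ANYCHD = c("0", "1")))

# Expected count = row total * column total / grand total
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)

round(expected, 3)
all(expected > 5)  # TRUE when the size assumption holds
```

The first cell should agree with the `1282.387` shown in the "Expected Values" row of the `CrossTable` output.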

### 2. Interpretation of the results
* _p_ value  <.05 for all periods
* The tests are significant, meaning that there is a significant difference in the incidence of coronary heart disease between genders.

### 3. Post Hoc Analysis and Conclusions
* All of the standardized residuals exceed 2 in absolute value
* There were more men than expected with coronary heart disease
* There were fewer women than expected with coronary heart disease
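
As a cross-check on the post hoc step, the same residuals can be obtained from base R's `chisq.test()`. This is a sketch using the Period 1 counts printed above, not the notebook's own method. Note that the "Std Residual" rows printed by `CrossTable` correspond to the `residuals` component (Pearson residuals), not to `stdres`.

```r
# Observed counts copied from the Period 1 SEX x ANYCHD table above
obs <- matrix(c(1133, 649,
                1693, 452),
              nrow = 2, byrow = TRUE,
              dimnames = list(SEX = c("1", "2"), ANYCHD = c("0", "1")))

# correct = FALSE turns off the Yates continuity correction
test <- chisq.test(obs, correct = FALSE)

test$p.value              # compare against .05
round(test$residuals, 3)  # (observed - expected) / sqrt(expected)
abs(test$residuals) > 2   # cells that deviate notably from expectation
```

A positive residual marks a cell with more observations than expected under independence, and a negative residual marks a cell with fewer.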

### 4. Conclusion
* There is a statistically significant difference in the incidence of heart disease between genders
* According to this data, men are at higher risk for heart disease than women

## 2. Independent Chi-Square (ANYCHD): __Smoking__

In [12]:
CrossTable(framinghamP1$CURSMOKE, framinghamP1$ANYCHD, fisher=TRUE, chisq = TRUE, expected = TRUE, sresid=TRUE, format="SPSS")


   Cell Contents
|-------------------------|
|                   Count |
|         Expected Values |
| Chi-square contribution |
|             Row Percent |
|          Column Percent |
|           Total Percent |
|            Std Residual |
|-------------------------|

Total Observations in Table:  3927 

                      | framinghamP1$ANYCHD 
framinghamP1$CURSMOKE |        0  |        1  | Row Total | 
----------------------|-----------|-----------|-----------|
                    0 |     1457  |      561  |     2018  | 
                      | 1452.220  |  565.780  |           | 
                      |    0.016  |    0.040  |           | 
                      |   72.200% |   27.800% |   51.388% | 
                      |   51.557% |   50.954% |           | 
                      |   37.102% |   14.286% |           | 
                      |    0.125  |   -0.201  |           | 
----------------------|-----------|-----------|-----------|
                    1 |     1369  |    

In [13]:
CrossTable(framinghamP2$CURSMOKE, framinghamP2$ANYCHD, fisher=TRUE, chisq = TRUE, expected = TRUE, sresid=TRUE, format="SPSS")


   Cell Contents
|-------------------------|
|                   Count |
|         Expected Values |
| Chi-square contribution |
|             Row Percent |
|          Column Percent |
|           Total Percent |
|            Std Residual |
|-------------------------|

Total Observations in Table:  3309 

                      | framinghamP2$ANYCHD 
framinghamP2$CURSMOKE |        0  |        1  | Row Total | 
----------------------|-----------|-----------|-----------|
                    0 |     1356  |      503  |     1859  | 
                      | 1355.064  |  503.936  |           | 
                      |    0.001  |    0.002  |           | 
                      |   72.942% |   27.058% |   56.180% | 
                      |   56.219% |   56.076% |           | 
                      |   40.979% |   15.201% |           | 
                      |    0.025  |   -0.042  |           | 
----------------------|-----------|-----------|-----------|
                    1 |     1056  |    

In [14]:
CrossTable(framinghamP3$CURSMOKE, framinghamP3$ANYCHD, fisher=TRUE, chisq = TRUE, expected = TRUE, sresid=TRUE, format="SPSS")


   Cell Contents
|-------------------------|
|                   Count |
|         Expected Values |
| Chi-square contribution |
|             Row Percent |
|          Column Percent |
|           Total Percent |
|            Std Residual |
|-------------------------|

Total Observations in Table:  2324 

                      | framinghamP3$ANYCHD 
framinghamP3$CURSMOKE |        0  |        1  | Row Total | 
----------------------|-----------|-----------|-----------|
                    0 |     1111  |      406  |     1517  | 
                      | 1127.959  |  389.041  |           | 
                      |    0.255  |    0.739  |           | 
                      |   73.237% |   26.763% |   65.275% | 
                      |   64.294% |   68.121% |           | 
                      |   47.806% |   17.470% |           | 
                      |   -0.505  |    0.860  |           | 
----------------------|-----------|-----------|-----------|
                    1 |      617  |    

### 1. Checking Assumptions
* Data is independent, i.e. one observation per subject
* Expected values (frequencies) > 5, meaning that the size assumption is met 
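
The independence assumption holds only because each period is analyzed separately: in the full dataset the same participant appears once per exam. The sketch below illustrates the check using the first ten `RANDID` and `PERIOD` values printed by `str(framingham1)` above; the object names are illustrative.

```r
# First ten RANDID and PERIOD values as printed by str(framingham1)
ids <- data.frame(
  RANDID = c(2448, 2448, 6238, 6238, 6238, 9428, 9428, 10552, 10552, 11252),
  PERIOD = c(1, 3, 1, 2, 3, 1, 2, 1, 2, 1)
)

# Across all periods, participants appear more than once
anyDuplicated(ids$RANDID) > 0

# Within a single period, each participant should appear at most once
dup_within_period <- tapply(ids$RANDID, ids$PERIOD, anyDuplicated)
dup_within_period  # 0 means no repeated RANDID in that period
```

Running the same check on `framinghamP1`, `framinghamP2`, and `framinghamP3` would confirm the assumption for the full data.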

### 2. Interpretation of the results
* _p_ value > .05 for all periods, so the tests are not significant
* According to this data, there is no statistically significant relationship between being a current smoker and having heart disease
* Please note that smoking is a well-established risk factor for heart disease; the association is simply not apparent in this data [[source: CDC](https://www.cdc.gov/tobacco/campaign/tips/diseases/heart-disease-stroke.html)]

## 3. Independent Chi-Square (ANYCHD): __Diabetes__

In [15]:
CrossTable(framinghamP1$DIABETES, framinghamP1$ANYCHD, fisher=TRUE, chisq = TRUE, expected = TRUE, sresid=TRUE, format="SPSS")


   Cell Contents
|-------------------------|
|                   Count |
|         Expected Values |
| Chi-square contribution |
|             Row Percent |
|          Column Percent |
|           Total Percent |
|            Std Residual |
|-------------------------|

Total Observations in Table:  3927 

                      | framinghamP1$ANYCHD 
framinghamP1$DIABETES |        0  |        1  | Row Total | 
----------------------|-----------|-----------|-----------|
                    0 |     2776  |     1039  |     3815  | 
                      | 2745.401  | 1069.599  |           | 
                      |    0.341  |    0.875  |           | 
                      |   72.765% |   27.235% |   97.148% | 
                      |   98.231% |   94.369% |           | 
                      |   70.690% |   26.458% |           | 
                      |    0.584  |   -0.936  |           | 
----------------------|-----------|-----------|-----------|
                    1 |       50  |    

In [16]:
CrossTable(framinghamP2$DIABETES, framinghamP2$ANYCHD, fisher=TRUE, chisq = TRUE, expected = TRUE, sresid=TRUE, format="SPSS")


   Cell Contents
|-------------------------|
|                   Count |
|         Expected Values |
| Chi-square contribution |
|             Row Percent |
|          Column Percent |
|           Total Percent |
|            Std Residual |
|-------------------------|

Total Observations in Table:  3309 

                      | framinghamP2$ANYCHD 
framinghamP2$DIABETES |        0  |        1  | Row Total | 
----------------------|-----------|-----------|-----------|
                    0 |     2358  |      825  |     3183  | 
                      | 2320.156  |  862.844  |           | 
                      |    0.617  |    1.660  |           | 
                      |   74.081% |   25.919% |   96.192% | 
                      |   97.761% |   91.973% |           | 
                      |   71.260% |   24.932% |           | 
                      |    0.786  |   -1.288  |           | 
----------------------|-----------|-----------|-----------|
                    1 |       54  |    

In [17]:
CrossTable(framinghamP3$DIABETES, framinghamP3$ANYCHD, fisher=TRUE, chisq = TRUE, expected = TRUE, sresid=TRUE, format="SPSS")


   Cell Contents
|-------------------------|
|                   Count |
|         Expected Values |
| Chi-square contribution |
|             Row Percent |
|          Column Percent |
|           Total Percent |
|            Std Residual |
|-------------------------|

Total Observations in Table:  2324 

                      | framinghamP3$ANYCHD 
framinghamP3$DIABETES |        0  |        1  | Row Total | 
----------------------|-----------|-----------|-----------|
                    0 |     1642  |      517  |     2159  | 
                      | 1605.315  |  553.685  |           | 
                      |    0.838  |    2.431  |           | 
                      |   76.054% |   23.946% |   92.900% | 
                      |   95.023% |   86.745% |           | 
                      |   70.654% |   22.246% |           | 
                      |    0.916  |   -1.559  |           | 
----------------------|-----------|-----------|-----------|
                    1 |       86  |    

### 1. Checking Assumptions
* Data is independent, i.e. one observation per subject
* Expected values (frequencies) > 5, so the size assumption is met 

### 2. Interpretation of the results
* _p_ value  <.05 for all periods
* The tests are significant, meaning that there is a significant difference in the incidence of heart disease between people with diabetes and people without diabetes.

### 3. Post Hoc Analysis and Conclusions
* Two of the standardized residuals exceed 2 in absolute value in each period
* There were more people with diabetes than expected who also had coronary heart disease

### 4. Conclusion
* There is a statistically significant difference in the incidence of heart disease between people with diabetes and people without diabetes
* According to this data, having diabetes puts one at a higher risk for heart disease

## 4. Independent Chi-Square (ANYCHD): __Hypertension__

In [18]:
CrossTable(framinghamP1$PREVHYP, framinghamP1$ANYCHD, fisher=TRUE, chisq = TRUE, expected = TRUE, sresid=TRUE, format="SPSS")


   Cell Contents
|-------------------------|
|                   Count |
|         Expected Values |
| Chi-square contribution |
|             Row Percent |
|          Column Percent |
|           Total Percent |
|            Std Residual |
|-------------------------|

Total Observations in Table:  3927 

                     | framinghamP1$ANYCHD 
framinghamP1$PREVHYP |        0  |        1  | Row Total | 
---------------------|-----------|-----------|-----------|
                   0 |     2052  |      602  |     2654  | 
                     | 1909.907  |  744.093  |           | 
                     |   10.571  |   27.134  |           | 
                     |   77.317% |   22.683% |   67.583% | 
                     |   72.611% |   54.678% |           | 
                     |   52.254% |   15.330% |           | 
                     |    3.251  |   -5.209  |           | 
---------------------|-----------|-----------|-----------|
                   1 |      774  |      499  |    

In [19]:
CrossTable(framinghamP2$PREVHYP, framinghamP2$ANYCHD, fisher=TRUE, chisq = TRUE, expected = TRUE, sresid=TRUE, format="SPSS")


   Cell Contents
|-------------------------|
|                   Count |
|         Expected Values |
| Chi-square contribution |
|             Row Percent |
|          Column Percent |
|           Total Percent |
|            Std Residual |
|-------------------------|

Total Observations in Table:  3309 

                     | framinghamP2$ANYCHD 
framinghamP2$PREVHYP |        0  |        1  | Row Total | 
---------------------|-----------|-----------|-----------|
                   0 |     1346  |      318  |     1664  | 
                     | 1212.925  |  451.075  |           | 
                     |   14.600  |   39.260  |           | 
                     |   80.889% |   19.111% |   50.287% | 
                     |   55.804% |   35.452% |           | 
                     |   40.677% |    9.610% |           | 
                     |    3.821  |   -6.266  |           | 
---------------------|-----------|-----------|-----------|
                   1 |     1066  |      579  |    

In [20]:
CrossTable(framinghamP3$PREVHYP, framinghamP3$ANYCHD, fisher=TRUE, chisq = TRUE, expected = TRUE, sresid=TRUE, format="SPSS")


   Cell Contents
|-------------------------|
|                   Count |
|         Expected Values |
| Chi-square contribution |
|             Row Percent |
|          Column Percent |
|           Total Percent |
|            Std Residual |
|-------------------------|

Total Observations in Table:  2324 

                     | framinghamP3$ANYCHD 
framinghamP3$PREVHYP |        0  |        1  | Row Total | 
---------------------|-----------|-----------|-----------|
                   0 |      808  |      155  |      963  | 
                     |  716.034  |  246.966  |           | 
                     |   11.812  |   34.246  |           | 
                     |   83.904% |   16.096% |   41.437% | 
                     |   46.759% |   26.007% |           | 
                     |   34.768% |    6.670% |           | 
                     |    3.437  |   -5.852  |           | 
---------------------|-----------|-----------|-----------|
                   1 |      920  |      441  |    

### 1. Checking Assumptions
* Data is independent, i.e. one observation per subject
* Expected values (frequencies) > 5, so the size assumption is met 

### 2. Interpretation of the results
* _p_ value  <.05 for all periods
* The tests are significant, meaning that there is a significant difference in the incidence of heart disease between people with hypertension and people with normal blood pressure.

### 3. Post Hoc Analysis and Conclusions
* All of the standardized residuals exceed 2 in absolute value, for all periods
* There were more people with hypertension than expected who also had heart disease
* There were fewer people with normal blood pressure than expected who also had heart disease

### 4. Conclusion
* There is a statistically significant difference in the incidence of heart disease between people with hypertension and people without hypertension
* According to this data, having high blood pressure puts one at a higher risk for heart disease
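
To put a number on "higher risk", the proportion with heart disease can be compared between the two groups. The sketch below does this for the Period 1 counts printed above; the names `risk` and `risk_ratio` are illustrative, and this descriptive ratio is an addition for illustration rather than part of the chi-square output.

```r
# Observed counts copied from the Period 1 PREVHYP x ANYCHD table above
obs <- matrix(c(2052, 602,
                 774, 499),
              nrow = 2, byrow = TRUE,
              dimnames = list(PREVHYP = c("0", "1"), ANYCHD = c("0", "1")))

# Proportion with ANYCHD = 1 within each PREVHYP group (row proportions)
risk <- prop.table(obs, margin = 1)[, "1"]
round(100 * risk, 3)

# Ratio of the proportion in the hypertensive group to the other group
risk_ratio <- risk[["1"]] / risk[["0"]]
risk_ratio
```

The value for `PREVHYP = 0` should match the `22.683%` row percent in the `CrossTable` output.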

## 5. Independent Chi-Square (ANYCHD): __Stroke__

In [21]:
CrossTable(framinghamP1$STROKE, framinghamP1$ANYCHD, fisher=TRUE, chisq = TRUE, expected = TRUE, sresid=TRUE, format="SPSS")


   Cell Contents
|-------------------------|
|                   Count |
|         Expected Values |
| Chi-square contribution |
|             Row Percent |
|          Column Percent |
|           Total Percent |
|            Std Residual |
|-------------------------|

Total Observations in Table:  3927 

                    | framinghamP1$ANYCHD 
framinghamP1$STROKE |        0  |        1  | Row Total | 
--------------------|-----------|-----------|-----------|
                  0 |     2610  |      947  |     3557  | 
                    | 2559.736  |  997.264  |           | 
                    |    0.987  |    2.533  |           | 
                    |   73.376% |   26.624% |   90.578% | 
                    |   92.357% |   86.013% |           | 
                    |   66.463% |   24.115% |           | 
                    |    0.993  |   -1.592  |           | 
--------------------|-----------|-----------|-----------|
                  1 |      216  |      154  |      370  | 
  

In [22]:
CrossTable(framinghamP2$STROKE, framinghamP2$ANYCHD, fisher=TRUE, chisq = TRUE, expected = TRUE, sresid=TRUE, format="SPSS")


   Cell Contents
|-------------------------|
|                   Count |
|         Expected Values |
| Chi-square contribution |
|             Row Percent |
|          Column Percent |
|           Total Percent |
|            Std Residual |
|-------------------------|

Total Observations in Table:  3309 

                    | framinghamP2$ANYCHD 
framinghamP2$STROKE |        0  |        1  | Row Total | 
--------------------|-----------|-----------|-----------|
                  0 |     2230  |      766  |     2996  | 
                    | 2183.848  |  812.152  |           | 
                    |    0.975  |    2.623  |           | 
                    |   74.433% |   25.567% |   90.541% | 
                    |   92.454% |   85.396% |           | 
                    |   67.392% |   23.149% |           | 
                    |    0.988  |   -1.619  |           | 
--------------------|-----------|-----------|-----------|
                  1 |      182  |      131  |      313  | 
  

In [23]:
CrossTable(framinghamP3$STROKE, framinghamP3$ANYCHD, fisher=TRUE, chisq = TRUE, expected = TRUE, sresid=TRUE, format="SPSS")


   Cell Contents
|-------------------------|
|                   Count |
|         Expected Values |
| Chi-square contribution |
|             Row Percent |
|          Column Percent |
|           Total Percent |
|            Std Residual |
|-------------------------|

Total Observations in Table:  2324 

                    | framinghamP3$ANYCHD 
framinghamP3$STROKE |        0  |        1  | Row Total | 
--------------------|-----------|-----------|-----------|
                  0 |     1613  |      518  |     2131  | 
                    | 1584.496  |  546.504  |           | 
                    |    0.513  |    1.487  |           | 
                    |   75.692% |   24.308% |   91.695% | 
                    |   93.345% |   86.913% |           | 
                    |   69.406% |   22.289% |           | 
                    |    0.716  |   -1.219  |           | 
--------------------|-----------|-----------|-----------|
                  1 |      115  |       78  |      193  | 
  

### 1. Checking Assumptions
* Data is independent, i.e. one observation per subject
* Expected values (frequencies) > 5, so the size assumption is met 

### 2. Interpretation of the results
* _p_ value  <.05 for all periods
* The tests are significant, meaning that there is a significant difference in the incidence of heart disease between people who have had a stroke and people who haven't had a stroke.

### 3. Post Hoc Analysis and Conclusions
* Two of the standardized residuals exceed 2 in absolute value in each period
* More people than expected had both a stroke and heart disease

### 4. Conclusion
* There is a statistically significant difference in the incidence of heart disease between people who have had a stroke and people who haven't
* According to this data, having had a stroke puts one at a higher risk for heart disease
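
The `CrossTable` calls also request Fisher's exact test through `fisher = TRUE`. As a sketch of what that part of the output reports, the same test can be run with base R's `fisher.test()` on the Period 1 counts printed above; the object names are illustrative.

```r
# Observed counts copied from the Period 1 STROKE x ANYCHD table above
obs <- matrix(c(2610, 947,
                 216, 154),
              nrow = 2, byrow = TRUE,
              dimnames = list(STROKE = c("0", "1"), ANYCHD = c("0", "1")))

ft <- fisher.test(obs)

ft$p.value   # two-sided p value
ft$estimate  # conditional maximum-likelihood estimate of the odds ratio
ft$conf.int  # 95% confidence interval for the odds ratio
```

An odds ratio above 1 indicates that heart disease was more common among participants who had a stroke than among those who did not.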

## 6. Independent Chi-Square (DEATH): __Gender__

In [24]:
CrossTable(framinghamP1$SEX, framinghamP1$DEATH, fisher=TRUE, chisq = TRUE, expected = TRUE, sresid=TRUE, format="SPSS")


   Cell Contents
|-------------------------|
|                   Count |
|         Expected Values |
| Chi-square contribution |
|             Row Percent |
|          Column Percent |
|           Total Percent |
|            Std Residual |
|-------------------------|

Total Observations in Table:  3927 

                 | framinghamP1$DEATH 
framinghamP1$SEX |        0  |        1  | Row Total | 
-----------------|-----------|-----------|-----------|
               1 |     1009  |      773  |     1782  | 
                 | 1158.050  |  623.950  |           | 
                 |   19.184  |   35.605  |           | 
                 |   56.622% |   43.378% |   45.378% | 
                 |   39.538% |   56.218% |           | 
                 |   25.694% |   19.684% |           | 
                 |   -4.380  |    5.967  |           | 
-----------------|-----------|-----------|-----------|
               2 |     1543  |      602  |     2145  | 
                 | 1393.950  |  751.050

In [25]:
CrossTable(framinghamP2$SEX, framinghamP2$DEATH, fisher=TRUE, chisq = TRUE, expected = TRUE, sresid=TRUE, format="SPSS")


   Cell Contents
|-------------------------|
|                   Count |
|         Expected Values |
| Chi-square contribution |
|             Row Percent |
|          Column Percent |
|           Total Percent |
|            Std Residual |
|-------------------------|

Total Observations in Table:  3309 

                 | framinghamP2$DEATH 
framinghamP2$SEX |        0  |        1  | Row Total | 
-----------------|-----------|-----------|-----------|
               1 |      929  |      536  |     1465  | 
                 | 1025.367  |  439.633  |           | 
                 |    9.057  |   21.124  |           | 
                 |   63.413% |   36.587% |   44.273% | 
                 |   40.112% |   53.978% |           | 
                 |   28.075% |   16.198% |           | 
                 |   -3.009  |    4.596  |           | 
-----------------|-----------|-----------|-----------|
               2 |     1387  |      457  |     1844  | 
                 | 1290.633  |  553.367

In [26]:
CrossTable(framinghamP3$SEX, framinghamP3$DEATH, fisher=TRUE, chisq = TRUE, expected = TRUE, sresid=TRUE, format="SPSS")


   Cell Contents
|-------------------------|
|                   Count |
|         Expected Values |
| Chi-square contribution |
|             Row Percent |
|          Column Percent |
|           Total Percent |
|            Std Residual |
|-------------------------|

Total Observations in Table:  2324 

                 | framinghamP3$DEATH 
framinghamP3$SEX |        0  |        1  | Row Total | 
-----------------|-----------|-----------|-----------|
               1 |      716  |      290  |     1006  | 
                 |  788.697  |  217.303  |           | 
                 |    6.701  |   24.320  |           | 
                 |   71.173% |   28.827% |   43.287% | 
                 |   39.297% |   57.769% |           | 
                 |   30.809% |   12.478% |           | 
                 |   -2.589  |    4.932  |           | 
-----------------|-----------|-----------|-----------|
               2 |     1106  |      212  |     1318  | 
                 | 1033.303  |  284.697

### 1. Checking Assumptions
* Data is independent, i.e. one observation per subject
* Expected values (frequencies) > 5, so the size assumption is met 

### 2. Interpretation of the results
* _p_ value  <.05 for all periods
* The tests are significant, meaning that there is a significant difference in the incidence of death between genders.

### 3. Post Hoc Analysis and Conclusions
* All of the standardized residuals exceed 2 in absolute value, for all periods
* There were more men than expected who died
* There were fewer women than expected who died

### 4. Conclusion
* There is a statistically significant difference in the incidence of death between genders
* According to this data, being male is associated with a higher risk of death
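
To put a size on this difference, one option is a simple risk ratio. The sketch below uses the Period 1 counts printed above (SEX 1 = men, 2 = women); the vector names are illustrative, and the ratio is descriptive only, with no adjustment for age or other variables.

```r
# Deaths and group sizes copied from the Period 1 SEX x DEATH output above
deaths <- c(men = 773, women = 602)
totals <- c(men = 1782, women = 2145)

# Proportion who died in each group (matches the "Row Percent" for DEATH = 1)
risk <- deaths / totals

# Risk ratio: risk of death for men relative to women
risk_ratio <- risk[["men"]] / risk[["women"]]

round(risk, 3)
round(risk_ratio, 2)
```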

## 7. Independent Chi-Square (DEATH): __Smoking__

In [27]:
CrossTable(framinghamP1$CURSMOKE, framinghamP1$DEATH, fisher=TRUE, chisq = TRUE, expected = TRUE, sresid=TRUE, format="SPSS")


   Cell Contents
|-------------------------|
|                   Count |
|         Expected Values |
| Chi-square contribution |
|             Row Percent |
|          Column Percent |
|           Total Percent |
|            Std Residual |
|-------------------------|

Total Observations in Table:  3927 

                      | framinghamP1$DEATH 
framinghamP1$CURSMOKE |        0  |        1  | Row Total | 
----------------------|-----------|-----------|-----------|
                    0 |     1330  |      688  |     2018  | 
                      | 1311.417  |  706.583  |           | 
                      |    0.263  |    0.489  |           | 
                      |   65.907% |   34.093% |   51.388% | 
                      |   52.116% |   50.036% |           | 
                      |   33.868% |   17.520% |           | 
                      |    0.513  |   -0.699  |           | 
----------------------|-----------|-----------|-----------|
                    1 |     1222  |     

In [28]:
CrossTable(framinghamP2$CURSMOKE, framinghamP2$DEATH, fisher=TRUE, chisq = TRUE, expected = TRUE, sresid=TRUE, format="SPSS")


   Cell Contents
|-------------------------|
|                   Count |
|         Expected Values |
| Chi-square contribution |
|             Row Percent |
|          Column Percent |
|           Total Percent |
|            Std Residual |
|-------------------------|

Total Observations in Table:  3309 

                      | framinghamP2$DEATH 
framinghamP2$CURSMOKE |        0  |        1  | Row Total | 
----------------------|-----------|-----------|-----------|
                    0 |     1305  |      554  |     1859  | 
                      | 1301.131  |  557.869  |           | 
                      |    0.012  |    0.027  |           | 
                      |   70.199% |   29.801% |   56.180% | 
                      |   56.347% |   55.791% |           | 
                      |   39.438% |   16.742% |           | 
                      |    0.107  |   -0.164  |           | 
----------------------|-----------|-----------|-----------|
                    1 |     1011  |     

In [29]:
CrossTable(framinghamP3$CURSMOKE, framinghamP3$DEATH, fisher=TRUE, chisq = TRUE, expected = TRUE, sresid=TRUE, format="SPSS")


   Cell Contents
|-------------------------|
|                   Count |
|         Expected Values |
| Chi-square contribution |
|             Row Percent |
|          Column Percent |
|           Total Percent |
|            Std Residual |
|-------------------------|

Total Observations in Table:  2324 

                      | framinghamP3$DEATH 
framinghamP3$CURSMOKE |        0  |        1  | Row Total | 
----------------------|-----------|-----------|-----------|
                    0 |     1196  |      321  |     1517  | 
                      | 1189.318  |  327.682  |           | 
                      |    0.038  |    0.136  |           | 
                      |   78.840% |   21.160% |   65.275% | 
                      |   65.642% |   63.944% |           | 
                      |   51.463% |   13.812% |           | 
                      |    0.194  |   -0.369  |           | 
----------------------|-----------|-----------|-----------|
                    1 |      626  |     

### 1. Checking Assumptions
* Data is independent, i.e. one observation per subject
* Expected values (frequencies) > 5, meaning that the size assumption is met 

### 2. Interpretation of the results
* _p_ value > .05 for all periods, so the tests are not significant
* According to this data, there seems to be no statistically significant relationship between being a smoker and dying
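
The non-significant result for Period 1 can be reproduced approximately with base R's `chisq.test()`. This is a sketch under one assumption: the count of smokers who died is cut off in the output above, so it is derived here from the printed grand total rather than read from the table. `correct = FALSE` requests the Pearson statistic without the Yates continuity correction.

```r
n_total       <- 3927          # "Total Observations in Table" for Period 1
nonsmokers    <- c(1330, 688)  # CURSMOKE = 0: survived, died
smoker_alive  <- 1222          # CURSMOKE = 1, DEATH = 0

# Assumption: the truncated cell is whatever remains of the grand total
smoker_died <- n_total - sum(nonsmokers) - smoker_alive

obs <- matrix(c(nonsmokers, smoker_alive, smoker_died),
              nrow = 2, byrow = TRUE,
              dimnames = list(CURSMOKE = c("0", "1"), DEATH = c("0", "1")))

res <- chisq.test(obs, correct = FALSE)
res$statistic
res$p.value
```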

## 8. Independent Chi-Square (DEATH): __Diabetes__

In [30]:
CrossTable(framinghamP1$DIABETES, framinghamP1$DEATH, fisher=TRUE, chisq = TRUE, expected = TRUE, sresid=TRUE, format="SPSS")


   Cell Contents
|-------------------------|
|                   Count |
|         Expected Values |
| Chi-square contribution |
|             Row Percent |
|          Column Percent |
|           Total Percent |
|            Std Residual |
|-------------------------|

Total Observations in Table:  3927 

                      | framinghamP1$DEATH 
framinghamP1$DIABETES |        0  |        1  | Row Total | 
----------------------|-----------|-----------|-----------|
                    0 |     2528  |     1287  |     3815  | 
                      | 2479.216  | 1335.784  |           | 
                      |    0.960  |    1.782  |           | 
                      |   66.265% |   33.735% |   97.148% | 
                      |   99.060% |   93.600% |           | 
                      |   64.375% |   32.773% |           | 
                      |    0.980  |   -1.335  |           | 
----------------------|-----------|-----------|-----------|
                    1 |       24  |     

In [135]:
CrossTable(framinghamP2$DIABETES, framinghamP2$DEATH, fisher=TRUE, chisq = TRUE, expected = TRUE, sresid=TRUE, format="SPSS")


   Cell Contents
|-------------------------|
|                   Count |
|         Expected Values |
| Chi-square contribution |
|             Row Percent |
|          Column Percent |
|           Total Percent |
|            Std Residual |
|-------------------------|

Total Observations in Table:  3309 

                      | framinghamP2$DEATH 
framinghamP2$DIABETES |        0  |        1  | Row Total | 
----------------------|-----------|-----------|-----------|
                    0 |     2270  |      913  |     3183  | 
                      | 2227.811  |  955.189  |           | 
                      |    0.799  |    1.863  |           | 
                      |   71.316% |   28.684% |   96.192% | 
                      |   98.014% |   91.944% |           | 
                      |   68.601% |   27.591% |           | 
                      |    0.894  |   -1.365  |           | 
----------------------|-----------|-----------|-----------|
                    1 |       46  |     

In [136]:
CrossTable(framinghamP3$DIABETES, framinghamP3$DEATH, fisher=TRUE, chisq = TRUE, expected = TRUE, sresid=TRUE, format="SPSS")


   Cell Contents
|-------------------------|
|                   Count |
|         Expected Values |
| Chi-square contribution |
|             Row Percent |
|          Column Percent |
|           Total Percent |
|            Std Residual |
|-------------------------|

Total Observations in Table:  2324 

                      | framinghamP3$DEATH 
framinghamP3$DIABETES |        0  |        1  | Row Total | 
----------------------|-----------|-----------|-----------|
                    0 |     1736  |      423  |     2159  | 
                      | 1692.641  |  466.359  |           | 
                      |    1.111  |    4.031  |           | 
                      |   80.408% |   19.592% |   92.900% | 
                      |   95.280% |   84.263% |           | 
                      |   74.699% |   18.201% |           | 
                      |    1.054  |   -2.008  |           | 
----------------------|-----------|-----------|-----------|
                    1 |       86  |     

### 1. Checking Assumptions
* Data is independent, i.e. one observation per subject
* Expected values (frequencies) > 5, so the size assumption is met 

### 2. Interpretation of the results
* _p_ value  <.05 for all periods
* The tests are significant, meaning that there is a significant difference in the incidence of death between people with and without diabetes.

### 3. Post Hoc Analysis and Conclusions
* At least two of the standardized residuals exceed 2 in absolute value in each period
* There were more people than expected who had diabetes and died
* There were fewer people than expected who had diabetes and didn't die

### 4. Conclusion
* There is a statistically significant difference in the incidence of death between people with diabetes and people without diabetes
* According to this data, having diabetes puts one at a higher risk for death

## 9. Independent Chi-Square (DEATH): __Hypertension__

In [31]:
CrossTable(framinghamP1$PREVHYP, framinghamP1$DEATH, fisher=TRUE, chisq = TRUE, expected = TRUE, sresid=TRUE, format="SPSS")


   Cell Contents
|-------------------------|
|                   Count |
|         Expected Values |
| Chi-square contribution |
|             Row Percent |
|          Column Percent |
|           Total Percent |
|            Std Residual |
|-------------------------|

Total Observations in Table:  3927 

                     | framinghamP1$DEATH 
framinghamP1$PREVHYP |        0  |        1  | Row Total | 
---------------------|-----------|-----------|-----------|
                   0 |     1961  |      693  |     2654  | 
                     | 1724.728  |  929.272  |           | 
                     |   32.367  |   60.073  |           | 
                     |   73.888% |   26.112% |   67.583% | 
                     |   76.842% |   50.400% |           | 
                     |   49.936% |   17.647% |           | 
                     |    5.689  |   -7.751  |           | 
---------------------|-----------|-----------|-----------|
                   1 |      591  |      682  |     

In [32]:
CrossTable(framinghamP2$PREVHYP, framinghamP2$DEATH, fisher=TRUE, chisq = TRUE, expected = TRUE, sresid=TRUE, format="SPSS")


   Cell Contents
|-------------------------|
|                   Count |
|         Expected Values |
| Chi-square contribution |
|             Row Percent |
|          Column Percent |
|           Total Percent |
|            Std Residual |
|-------------------------|

Total Observations in Table:  3309 

                     | framinghamP2$DEATH 
framinghamP2$PREVHYP |        0  |        1  | Row Total | 
---------------------|-----------|-----------|-----------|
                   0 |     1326  |      338  |     1664  | 
                     | 1164.649  |  499.351  |           | 
                     |   22.354  |   52.136  |           | 
                     |   79.688% |   20.312% |   50.287% | 
                     |   57.254% |   34.038% |           | 
                     |   40.073% |   10.215% |           | 
                     |    4.728  |   -7.221  |           | 
---------------------|-----------|-----------|-----------|
                   1 |      990  |      655  |     

In [33]:
CrossTable(framinghamP3$PREVHYP, framinghamP3$DEATH, fisher=TRUE, chisq = TRUE, expected = TRUE, sresid=TRUE, format="SPSS")


   Cell Contents
|-------------------------|
|                   Count |
|         Expected Values |
| Chi-square contribution |
|             Row Percent |
|          Column Percent |
|           Total Percent |
|            Std Residual |
|-------------------------|

Total Observations in Table:  2324 

                     | framinghamP3$DEATH 
framinghamP3$PREVHYP |        0  |        1  | Row Total | 
---------------------|-----------|-----------|-----------|
                   0 |      845  |      118  |      963  | 
                     |  754.985  |  208.015  |           | 
                     |   10.732  |   38.952  |           | 
                     |   87.747% |   12.253% |   41.437% | 
                     |   46.378% |   23.506% |           | 
                     |   36.360% |    5.077% |           | 
                     |    3.276  |   -6.241  |           | 
---------------------|-----------|-----------|-----------|
                   1 |      977  |      384  |     

### 1. Checking Assumptions
* Data is independent, i.e. one observation per subject
* Expected values (frequencies) > 5, so the size assumption is met 

### 2. Interpretation of the results
* _p_ value  <.05 for all periods
* The tests are significant, meaning that there is a significant difference in the incidence of death between people with hypertension and people without hypertension.

### 3. Post Hoc Analysis and Conclusions
* All of the standardized residuals exceed 2 in absolute value, for all periods
* There were more people than expected who had hypertension and died
* There were fewer people than expected with normal blood pressure who died

### 4. Conclusion
* There is a statistically significant difference in the incidence of death between people with hypertension and people without hypertension
* According to this data, having high blood pressure is associated with a higher risk of death, and having normal blood pressure with a lower risk
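
One way to express the strength of this association is an odds ratio. The sketch below uses the four Period 1 counts printed above; the object names are illustrative, and the result is an unadjusted sample odds ratio, not a causal estimate.

```r
# Counts copied from the Period 1 PREVHYP x DEATH output above
obs <- matrix(c(1961, 693,
                 591, 682),
              nrow = 2, byrow = TRUE,
              dimnames = list(PREVHYP = c("0", "1"), DEATH = c("0", "1")))

# Odds of death within each PREVHYP group: died / survived
odds <- obs[, "1"] / obs[, "0"]

# Odds ratio: hypertensive group relative to non-hypertensive group
odds_ratio <- odds[["1"]] / odds[["0"]]

round(odds, 3)
round(odds_ratio, 2)
```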

## 10. Independent Chi-Square (DEATH): __Stroke__

In [34]:
CrossTable(framinghamP1$STROKE, framinghamP1$DEATH, fisher=TRUE, chisq = TRUE, expected = TRUE, sresid=TRUE, format="SPSS")


   Cell Contents
|-------------------------|
|                   Count |
|         Expected Values |
| Chi-square contribution |
|             Row Percent |
|          Column Percent |
|           Total Percent |
|            Std Residual |
|-------------------------|

Total Observations in Table:  3927 

                    | framinghamP1$DEATH 
framinghamP1$STROKE |        0  |        1  | Row Total | 
--------------------|-----------|-----------|-----------|
                  0 |     2422  |     1135  |     3557  | 
                    | 2311.552  | 1245.448  |           | 
                    |    5.277  |    9.795  |           | 
                    |   68.091% |   31.909% |   90.578% | 
                    |   94.906% |   82.545% |           | 
                    |   61.676% |   28.902% |           | 
                    |    2.297  |   -3.130  |           | 
--------------------|-----------|-----------|-----------|
                  1 |      130  |      240  |      370  | 
   

In [35]:
CrossTable(framinghamP2$STROKE, framinghamP2$DEATH, fisher=TRUE, chisq = TRUE, expected = TRUE, sresid=TRUE, format="SPSS")


   Cell Contents
|-------------------------|
|                   Count |
|         Expected Values |
| Chi-square contribution |
|             Row Percent |
|          Column Percent |
|           Total Percent |
|            Std Residual |
|-------------------------|

Total Observations in Table:  3309 

                    | framinghamP2$DEATH 
framinghamP2$STROKE |        0  |        1  | Row Total | 
--------------------|-----------|-----------|-----------|
                  0 |     2197  |      799  |     2996  | 
                    | 2096.928  |  899.072  |           | 
                    |    4.776  |   11.139  |           | 
                    |   73.331% |   26.669% |   90.541% | 
                    |   94.862% |   80.463% |           | 
                    |   66.395% |   24.146% |           | 
                    |    2.185  |   -3.337  |           | 
--------------------|-----------|-----------|-----------|
                  1 |      119  |      194  |      313  | 
   

In [36]:
CrossTable(framinghamP3$STROKE, framinghamP3$DEATH, fisher=TRUE, chisq = TRUE, expected = TRUE, sresid=TRUE, format="SPSS")


   Cell Contents
|-------------------------|
|                   Count |
|         Expected Values |
| Chi-square contribution |
|             Row Percent |
|          Column Percent |
|           Total Percent |
|            Std Residual |
|-------------------------|

Total Observations in Table:  2324 

                    | framinghamP3$DEATH 
framinghamP3$STROKE |        0  |        1  | Row Total | 
--------------------|-----------|-----------|-----------|
                  0 |     1728  |      403  |     2131  | 
                    | 1670.689  |  460.311  |           | 
                    |    1.966  |    7.135  |           | 
                    |   81.089% |   18.911% |   91.695% | 
                    |   94.841% |   80.279% |           | 
                    |   74.355% |   17.341% |           | 
                    |    1.402  |   -2.671  |           | 
--------------------|-----------|-----------|-----------|
                  1 |       94  |       99  |      193  | 
   

### 1. Checking Assumptions
* Data is independent, i.e. one observation per subject
* Expected values (frequencies) > 5, so the size assumption is met 

### 2. Interpretation of the results
* _p_ value  <.05 for all periods
* The tests are significant, meaning that there is a significant difference in the incidence of death between people who had a stroke and people who haven't had a stroke

### 3. Post Hoc Analysis and Conclusions
* All of the standardized residuals exceed 2 in absolute value in Periods 1 and 2; in Period 3, all but one do (1.402 for people with no stroke who survived)
* There were more people than expected who had a stroke and died
* There were fewer people than expected who didn't have a stroke and died
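
The residual counts can be checked across all three periods at once. This sketch re-enters the STROKE by DEATH counts printed above and uses the `residuals` component of `chisq.test()`, which holds the Pearson residuals; the list name `tables` and the vector `n_large` are illustrative.

```r
# Counts copied from the three STROKE x DEATH outputs above
# (rows: STROKE = 0, 1; columns: DEATH = 0, 1)
tables <- list(
  P1 = matrix(c(2422, 1135, 130, 240), nrow = 2, byrow = TRUE),
  P2 = matrix(c(2197,  799, 119, 194), nrow = 2, byrow = TRUE),
  P3 = matrix(c(1728,  403,  94,  99), nrow = 2, byrow = TRUE)
)

# For each period, count the cells with |Pearson residual| > 2
n_large <- sapply(tables, function(m) {
  pearson <- chisq.test(m)$residuals
  sum(abs(pearson) > 2)
})

n_large
```

With these counts, the only cell that stays below 2 is the Period 3 cell for people with no stroke who survived, whose printed residual is 1.402.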

### 4. Conclusion
* There is a statistically significant difference in the incidence of death between people who had a stroke, when compared with people who haven't had a stroke
* According to this data, having had a stroke puts one at a higher risk for death

## 11. Independent Chi-Square (DEATH): __Heart Disease__

In [37]:
CrossTable(framinghamP1$ANYCHD, framinghamP1$DEATH, fisher=TRUE, chisq = TRUE, expected = TRUE, sresid=TRUE, format="SPSS")


   Cell Contents
|-------------------------|
|                   Count |
|         Expected Values |
| Chi-square contribution |
|             Row Percent |
|          Column Percent |
|           Total Percent |
|            Std Residual |
|-------------------------|

Total Observations in Table:  3927 

                    | framinghamP1$DEATH 
framinghamP1$ANYCHD |        0  |        1  | Row Total | 
--------------------|-----------|-----------|-----------|
                  0 |     2099  |      727  |     2826  | 
                    | 1836.504  |  989.496  |           | 
                    |   37.519  |   69.636  |           | 
                    |   74.275% |   25.725% |   71.963% | 
                    |   82.249% |   52.873% |           | 
                    |   53.450% |   18.513% |           | 
                    |    6.125  |   -8.345  |           | 
--------------------|-----------|-----------|-----------|
                  1 |      453  |      648  |     1101  | 
   

In [38]:
CrossTable(framinghamP2$ANYCHD, framinghamP2$DEATH, fisher=TRUE, chisq = TRUE, expected = TRUE, sresid=TRUE, format="SPSS")


   Cell Contents
|-------------------------|
|                   Count |
|         Expected Values |
| Chi-square contribution |
|             Row Percent |
|          Column Percent |
|           Total Percent |
|            Std Residual |
|-------------------------|

Total Observations in Table:  3309 

                    | framinghamP2$DEATH 
framinghamP2$ANYCHD |        0  |        1  | Row Total | 
--------------------|-----------|-----------|-----------|
                  0 |     1898  |      514  |     2412  | 
                    | 1688.181  |  723.819  |           | 
                    |   26.078  |   60.822  |           | 
                    |   78.690% |   21.310% |   72.892% | 
                    |   81.952% |   51.762% |           | 
                    |   57.359% |   15.533% |           | 
                    |    5.107  |   -7.799  |           | 
--------------------|-----------|-----------|-----------|
                  1 |      418  |      479  |      897  | 
   

In [39]:
CrossTable(framinghamP3$ANYCHD, framinghamP3$DEATH, fisher=TRUE, chisq = TRUE, expected = TRUE, sresid=TRUE, format="SPSS")


   Cell Contents
|-------------------------|
|                   Count |
|         Expected Values |
| Chi-square contribution |
|             Row Percent |
|          Column Percent |
|           Total Percent |
|            Std Residual |
|-------------------------|

Total Observations in Table:  2324 

                    | framinghamP3$DEATH 
framinghamP3$ANYCHD |        0  |        1  | Row Total | 
--------------------|-----------|-----------|-----------|
                  0 |     1483  |      245  |     1728  | 
                    | 1354.740  |  373.260  |           | 
                    |   12.143  |   44.073  |           | 
                    |   85.822% |   14.178% |   74.355% | 
                    |   81.394% |   48.805% |           | 
                    |   63.812% |   10.542% |           | 
                    |    3.485  |   -6.639  |           | 
--------------------|-----------|-----------|-----------|
                  1 |      339  |      257  |      596  | 
   

### 1. Checking Assumptions
* Data is independent, i.e. one observation per subject
* Expected values (frequencies) > 5, so the size assumption is met 
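
The size assumption can also be checked programmatically instead of by reading the printed tables. The sketch below re-enters the ANYCHD by DEATH counts from the three outputs above and pulls the smallest expected count per period from `chisq.test()`; the names `tables` and `min_expected` are illustrative.

```r
# Counts copied from the three ANYCHD x DEATH outputs above
# (rows: ANYCHD = 0, 1; columns: DEATH = 0, 1)
tables <- list(
  P1 = matrix(c(2099, 727, 453, 648), nrow = 2, byrow = TRUE),
  P2 = matrix(c(1898, 514, 418, 479), nrow = 2, byrow = TRUE),
  P3 = matrix(c(1483, 245, 339, 257), nrow = 2, byrow = TRUE)
)

# Smallest expected cell count in each period
min_expected <- sapply(tables, function(m) min(chisq.test(m)$expected))

round(min_expected, 1)

# TRUE if every expected count in every period is above 5
all(min_expected > 5)
```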

### 2. Interpretation of the results
* _p_ value  <.05 for all periods
* The tests are significant, meaning that there is a significant difference in the incidence of death between people with heart disease and without heart disease.

### 3. Post Hoc Analysis and Conclusions
* All of the standardized residuals exceed 2 in absolute value, for each period
* There were more people than expected who had heart disease and died
* There were fewer people than expected who didn't have heart disease and died

### 4. Conclusion
* There is a statistically significant difference in the incidence of death between people who have heart disease and people who don't have heart disease
* According to this data, having heart disease puts one at a higher risk for death
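
To complement the significance test with an estimate of how large the difference is, the sketch below applies base R's `prop.test()` to the Period 1 counts printed above. It reports the proportion who died in each group and a 95% confidence interval for the difference; the vector names are illustrative, and the estimate is unadjusted for other risk factors.

```r
# Deaths and group sizes copied from the Period 1 ANYCHD x DEATH output above
died  <- c(chd = 648, no_chd = 727)
total <- c(chd = 1101, no_chd = 2826)

# Two-sample test of equal proportions, without continuity correction
res <- prop.test(x = died, n = total, correct = FALSE)

res$estimate   # proportion who died: with CHD, without CHD
res$conf.int   # 95% CI for the difference (with CHD minus without CHD)
res$p.value
```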