# 08 - Poisson regression

## Data

Source of data: R package "AER", dataset Affairs.

The data were filtered down to 20 rows.

Dataset affairs_subsetxx.csv

In [2]:
library(readr)
affairs_subset <- read_csv("data/affairs_subset.csv",
                 show_col_types = FALSE)
head(affairs_subset)


affairs,gender,age,yearsmarried,children,religiousness,education,rating
<dbl>,<chr>,<dbl>,<dbl>,<chr>,<dbl>,<dbl>,<dbl>
12,female,42,15,yes,5,9,1
0,female,32,15,yes,2,14,4
0,male,32,10,yes,3,20,5
0,female,32,15,yes,4,18,4
12,male,37,15,yes,5,17,2
12,female,42,15,yes,4,12,1


## SAS program snippet

The following SAS code will be executed.

## Results

The output is divided into blocks so that each block can be explained and then reproduced in the different languages.

### Block 1
![Block 1](img_screenshots/block_1.png)

This block lists the dataset, the distribution used by PROC GENMOD, the link function, and the dependent variable.

### R chunk for reproduction

In [5]:
library(broom)
my_glm <- glm(affairs ~ age + education + gender + rating + religiousness + yearsmarried,
                     family = 'poisson', data = affairs_subset)
glance(my_glm)

null.deviance,df.null,logLik,AIC,BIC,deviance,df.residual,nobs
<dbl>,<int>,<dbl>,<dbl>,<dbl>,<dbl>,<int>,<int>
150.3413,19,-36.16375,86.3275,93.29762,41.23102,13,20


The number of observations used can be obtained from the nobs column returned by glance().

The distribution is specified in the function call (family = 'poisson').

The number of observations in the dataset can be retrieved with nrow().

In [7]:
nrow(affairs_subset)

my_glm


Call:  glm(formula = affairs ~ age + education + gender + rating + religiousness + 
    yearsmarried, family = "poisson", data = affairs_subset)

Coefficients:
  (Intercept)            age      education     gendermale         rating  
     14.13138       -0.17494       -0.51178        2.97300       -0.81165  
religiousness   yearsmarried  
     -0.07533        0.11765  

Degrees of Freedom: 19 Total (i.e. Null);  13 Residual
Null Deviance:	    150.3 
Residual Deviance: 41.23 	AIC: 86.33
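
The items listed in this block can also be pulled out of a fitted model programmatically. The following is a minimal sketch: it fits a stand-in model `fit` on the built-in `warpbreaks` data (an assumption made only so the example runs without the CSV file), and the same calls should work on `my_glm`.

```r
# Illustrative Poisson model on the built-in warpbreaks data
# (a stand-in for my_glm so the example runs on its own).
fit <- glm(breaks ~ wool + tension, family = poisson, data = warpbreaks)

model_info <- list(
  distribution = family(fit)$family,         # name of the error distribution
  link         = family(fit)$link,           # name of the link function
  response     = all.vars(formula(fit))[1],  # dependent variable
  n_used       = nobs(fit)                   # observations used in the fit
)
str(model_info)
```

Because no link is given in the glm() call, the canonical log link of the Poisson family is used.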

### Block 2
![Block 2](img_screenshots/block_2.png)

This block gives the number of observations in the dataset and the number of observations used to fit the model.

### R chunk for reproduction

In [8]:
glance(my_glm)
nrow(affairs_subset)


null.deviance,df.null,logLik,AIC,BIC,deviance,df.residual,nobs
<dbl>,<int>,<dbl>,<dbl>,<dbl>,<dbl>,<int>,<int>
150.3413,19,-36.16375,86.3275,93.29762,41.23102,13,20


The number of observations used can be obtained from the nobs column returned by glance().

The number of observations in the dataset can be retrieved with nrow().
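
In the subset used here both numbers are 20, so the difference between "read" and "used" is not visible. The sketch below illustrates it on the built-in `airquality` data, where the response `Ozone` has missing values; the dataset and the model `fit_na` are stand-ins chosen for illustration and are not part of the original analysis.

```r
# airquality ships with base R; Ozone contains missing values,
# so glm() drops those rows (default na.action = na.omit).
fit_na <- glm(Ozone ~ Temp + Wind, family = poisson, data = airquality)

n_read <- nrow(airquality)  # observations in the dataset
n_used <- nobs(fit_na)      # observations actually used to fit the model

c(read = n_read, used = n_used, dropped = n_read - n_used)
```

nobs() returns the same count that glance() reports in its nobs column.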

### Block 3
![Block 3](img_screenshots/block_3.png)

This block gives the levels of each class variable.

### R chunk for reproduction

In [9]:
table(affairs_subset$children)

table(affairs_subset$gender)


 no yes 
  2  18 


female   male 
    11      9 
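
As an alternative to calling table() per column, the levels of the categorical predictors that actually enter the model are stored in the fitted object. A minimal sketch, again using a stand-in model on the built-in `warpbreaks` data (an assumption for the sake of a runnable example):

```r
# Stand-in Poisson model with two categorical predictors
fit <- glm(breaks ~ wool + tension, family = poisson, data = warpbreaks)

# Levels of every categorical predictor in the model formula
fit$xlevels

# Frequency of each level among the rows used in the fit
lapply(model.frame(fit)[names(fit$xlevels)], table)
```

Only variables that appear in the model formula show up in xlevels. Applied to `my_glm`, this would list gender but not children, since children is not part of the formula.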



### Block 4
![Block 4](img_screenshots/block_4.png)

This block gives several criteria for assessing goodness of fit.


### R chunk for reproduction

In [11]:
glance(my_glm)

null.deviance,df.null,logLik,AIC,BIC,deviance,df.residual,nobs
<dbl>,<int>,<dbl>,<dbl>,<dbl>,<dbl>,<int>,<int>
150.3413,19,-36.16375,86.3275,93.29762,41.23102,13,20


Several of the criteria (deviance, AIC, BIC) can be read directly from the glance() output.

The log likelihood is calculated differently in SAS or includes a different term (TODO).

Missing criteria:

- Scaled deviance
- Pearson Chi-Square
- Scaled Pearson Chi-Square
- Full log likelihood or log likelihood
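
The missing criteria can be computed by hand from a fitted model. The sketch below is one possible way to do it, under two assumptions that should be checked against the SAS output: the scale parameter is fixed at 1 (so the scaled and unscaled statistics coincide), and the SAS "Log Likelihood" omits the log(y!) term while the "Full Log Likelihood" includes it. The helper name `gof_criteria` is hypothetical, and the model is a stand-in fitted on the built-in `warpbreaks` data.

```r
# Hypothetical helper: goodness-of-fit criteria for a Poisson glm
gof_criteria <- function(fit) {
  y  <- fit$y        # observed counts
  mu <- fitted(fit)  # fitted means
  pearson <- sum(residuals(fit, type = "pearson")^2)
  c(
    deviance        = deviance(fit),
    scaled_deviance = deviance(fit),  # assumes scale = 1
    pearson_chisq   = pearson,
    scaled_pearson  = pearson,        # assumes scale = 1
    loglik_kernel   = sum(y * log(mu) - mu),            # without log(y!)
    loglik_full     = sum(dpois(y, mu, log = TRUE)),    # with log(y!)
    aic             = AIC(fit),
    bic             = BIC(fit)
  )
}

# Stand-in model so the example runs without the CSV file
fit <- glm(breaks ~ wool + tension, family = poisson, data = warpbreaks)
gof <- gof_criteria(fit)
round(gof, 4)
```

The value logLik() returns for a Poisson glm corresponds to `loglik_full` here, which would explain a difference from a log likelihood that leaves out the factorial term. Calling `gof_criteria(my_glm)` should give the values for the model in this document.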

### Block 5
![Block 5](img_screenshots/block_5.png)

This block provides the parameter estimates with their standard errors, confidence intervals, test statistics, and p-values.

### R chunk for reproduction

In [10]:
my_model <- glm(affairs ~ age + education + gender + rating + religiousness + yearsmarried,
                     family = 'poisson', data = affairs_subset)
tidy(my_model)

term,estimate,std.error,statistic,p.value
<chr>,<dbl>,<dbl>,<dbl>,<dbl>
(Intercept),14.13137756,2.28380578,6.1876442,6.106998e-10
age,-0.17493751,0.05743164,-3.0460131,0.002318977
education,-0.51178177,0.12333162,-4.1496398,3.329989e-05
gendermale,2.97300483,0.80465782,3.6947442,0.0002201083
rating,-0.8116548,0.17507142,-4.6361354,3.549835e-06
religiousness,-0.07532957,0.17921813,-0.4203234,0.6742493
yearsmarried,0.11764964,0.08208607,1.4332473,0.1517872
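
The tidy() output above contains no confidence limits, and it reports a z statistic rather than a chi-square. Assuming the SAS block shows Wald 95% confidence limits and a Wald chi-square (the square of the z statistic), they could be reproduced with base R as sketched below. The model is a stand-in on the built-in `warpbreaks` data so that the example runs on its own.

```r
# Stand-in Poisson model
fit <- glm(breaks ~ wool + tension, family = poisson, data = warpbreaks)

est <- coef(summary(fit))                  # Estimate, Std. Error, z value, Pr(>|z|)
ci  <- confint.default(fit, level = 0.95)  # Wald intervals: estimate +/- z * se

wald <- cbind(
  estimate   = est[, "Estimate"],
  std_error  = est[, "Std. Error"],
  lower      = ci[, 1],
  upper      = ci[, 2],
  wald_chisq = est[, "z value"]^2,         # 1 degree of freedom per parameter
  p_value    = est[, "Pr(>|z|)"]
)
round(wald, 4)
```

Note that confint.default() gives Wald intervals, which are generally not identical to profile likelihood intervals, so the method should be matched to the one used in the SAS output.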


### Block 6
![Block 6](img_screenshots/block_6.png)

This block provides the likelihood ratio statistic for testing the significance of each effect.

### R chunk for reproduction

In [6]:
my_model <- glm(affairs ~ age + education + gender + rating + religiousness + yearsmarried,
                     family = 'poisson', data = affairs_subset)
summary(my_model)



Call:
glm(formula = affairs ~ age + education + gender + rating + religiousness + 
    yearsmarried, family = "poisson", data = affairs_subset)

Coefficients:
              Estimate Std. Error z value Pr(>|z|)    
(Intercept)   14.13138    2.28381   6.188 6.11e-10 ***
age           -0.17494    0.05743  -3.046  0.00232 ** 
education     -0.51178    0.12333  -4.150 3.33e-05 ***
gendermale     2.97300    0.80466   3.695  0.00022 ***
rating        -0.81165    0.17507  -4.636 3.55e-06 ***
religiousness -0.07533    0.17922  -0.420  0.67425    
yearsmarried   0.11765    0.08209   1.433  0.15179    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 150.341  on 19  degrees of freedom
Residual deviance:  41.231  on 13  degrees of freedom
AIC: 86.327

Number of Fisher Scoring iterations: 6
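
The summary() output reports Wald z tests per coefficient, not likelihood ratio statistics. One way to obtain a likelihood ratio test per effect is drop1() with test = "Chisq", which removes each term in turn and compares the deviances. For a model without interactions this should correspond to a Type 3 analysis, but that correspondence is an assumption to verify against the SAS output. The sketch uses a stand-in model on the built-in `warpbreaks` data.

```r
# Stand-in Poisson model
fit <- glm(breaks ~ wool + tension, family = poisson, data = warpbreaks)

# Likelihood ratio test for each term: drop it, refit, compare deviances
lr_table <- drop1(fit, test = "Chisq")
lr_table

# Manual check for one term
reduced    <- update(fit, . ~ . - tension)
lr_tension <- deviance(reduced) - deviance(fit)
df_tension <- df.residual(reduced) - df.residual(fit)
p_tension  <- pchisq(lr_tension, df = df_tension, lower.tail = FALSE)
c(LR = lr_tension, df = df_tension, p = p_tension)
```

For the model in this document the call would be `drop1(my_model, test = "Chisq")`.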
