# 1. Descriptive tables

### 1.1. Statistical table

- Convenience function `print_html`

```R
print_html <- function(input) {
    
    capture.output(input) %>% 
    paste(collapse="") %>%
    IRdisplay::display_html()
    
}
```

- Table (no grouping)

```R
library(gtsummary)

data %>% select(_____,_____,_____) %>%
         tbl_summary(statistic = all_continuous() ~ "{mean} ({sd})") %>%
         print_html
```

- Table (split by group)

```R
library(gtsummary)

data %>% select(_____,_____,_____) %>%
         tbl_summary(by = _______, # grouping variable 
                     statistic = all_continuous() ~ "{mean} ({sd})") %>%
         print_html
```
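
To make the `"{mean} ({sd})"` template concrete, here is a minimal base-R sketch of the same summary. It assumes the built-in `ToothGrowth` data set stands in for `data`, with `len` as the continuous variable and `supp` as the grouping variable; `mean_sd` is a hypothetical helper, not part of `gtsummary`.

```R
# Assumption: built-in ToothGrowth stands in for `data`
# len = continuous variable, supp = grouping variable

# Hypothetical helper: format a numeric vector as "mean (sd)"
mean_sd <- function(x, digits = 1) {
    m <- formatC(mean(x, na.rm = TRUE), format = "f", digits = digits)
    s <- formatC(sd(x, na.rm = TRUE), format = "f", digits = digits)
    paste0(m, " (", s, ")")
}

# One "mean (sd)" string per group
summary_by_group <- tapply(ToothGrowth$len, ToothGrowth$supp, mean_sd)
summary_by_group
```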

---
# 2. Statistical tests and plotting 

![summary_table.png](images/summary_table.png)

---
### 2.1. Comparing 2 measures (unrelated subjects)

#### 2.1.1. Parametric (unpaired t-test)

```R
library(ggstatsplot)

# adjust size of the image output e.g. 10, 10
options(repr.plot.width=__, repr.plot.height=__)

data_grouped %>% ggbetweenstats(x = _______,     # group
                                y = _______) +   # measurement
                 theme_classic(base_size = __)   # size of plot e.g. 16
```

**Effect size (Cohen's d/Hedges' g)**
- d<0.2 - Very small
- 0.2<=d<0.5 - Small
- 0.5<=d<0.8 - Medium
- d>=0.8 - Large
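
As a rough illustration of where this effect size comes from, the sketch below computes Cohen's d with a pooled standard deviation in base R and maps it onto the labels above. It assumes the built-in `ToothGrowth` data (`len` by `supp`), and the figure printed by `ggbetweenstats` may differ slightly because the package applies its own test and correction choices.

```R
# Assumption: built-in ToothGrowth; len = measurement, supp = two-level group
x <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
y <- ToothGrowth$len[ToothGrowth$supp == "VC"]

n_x <- length(x)
n_y <- length(y)

# Pooled standard deviation across both groups
sd_pooled <- sqrt(((n_x - 1) * var(x) + (n_y - 1) * var(y)) / (n_x + n_y - 2))

# Cohen's d: standardised mean difference
cohens_d <- (mean(x) - mean(y)) / sd_pooled

# Hedges' g: d with an approximate small-sample correction
hedges_g <- cohens_d * (1 - 3 / (4 * (n_x + n_y) - 9))

# Map |d| onto the labels listed above
label <- cut(abs(cohens_d),
             breaks = c(0, 0.2, 0.5, 0.8, Inf),
             labels = c("Very small", "Small", "Medium", "Large"),
             right = FALSE)

c(d = cohens_d, g = hedges_g)
label
```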

#### 2.1.2. Non-parametric (Mann-Whitney U)

```R
library(ggstatsplot)

# set plot dimensions e.g. 10, 10
options(repr.plot.width=__, repr.plot.height=__)

data_grouped %>% ggbetweenstats(x = _______,               # group
                                y = _______,               # measurement
                                type = "nonparametric") +  # non-parametric
                 theme_classic(base_size = __)             # size of plot e.g. 16
```

**Effect size (rank biserial)**
- r<0.05 - Tiny
- 0.05<=r<0.1 - Very small 
- 0.1<=r<0.2 - Small
- 0.2<=r<0.3 - Medium
- 0.3<=r<0.4 - Large
- r>=0.4 - Very large
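
The rank-biserial correlation can be recovered from the Mann-Whitney U statistic. The sketch below is a base-R illustration under the assumption that the built-in `ToothGrowth` data stands in for the real data set; it also cross-checks the result against the definition as "favourable minus unfavourable pairs".

```R
# Assumption: built-in ToothGrowth; len = measurement, supp = two-level group
x <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
y <- ToothGrowth$len[ToothGrowth$supp == "VC"]

# W reported by wilcox.test() is the Mann-Whitney U statistic of the first sample
# (ties only trigger a warning about exact p-values, so it is suppressed here)
u <- suppressWarnings(wilcox.test(x, y)$statistic)

# Rank-biserial correlation from U
rank_biserial <- unname(2 * u / (length(x) * length(y)) - 1)

# Cross-check: proportion of pairs with x > y minus proportion with x < y
rank_biserial_direct <- mean(sign(outer(x, y, "-")))

c(from_u = rank_biserial, direct = rank_biserial_direct)
```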

---
### 2.2. Comparing 2 measures (related subjects)

#### 2.2.1. Parametric (paired t-test)

```R
library(ggstatsplot)

# set plot dimensions e.g. 10, 10
options(repr.plot.width=__, repr.plot.height=__)

data_grouped %>% ggwithinstats(x = _______,     # group
                               y = _______) +   # measurement
                 theme_classic(base_size = __)   # size of plot e.g. 16
```

**Effect size (Cohen's d/Hedges' g)**
- d<0.2 - Very small
- 0.2<=d<0.5 - Small
- 0.5<=d<0.8 - Medium
- d>=0.8 - Large
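
For paired data, one common definition of Cohen's d divides the mean of the within-subject differences by their standard deviation. The sketch below shows that calculation in base R, assuming the built-in `sleep` data (10 subjects, each measured under two drugs) as a stand-in; the value reported by `ggwithinstats` may include a further small-sample correction.

```R
# Assumption: built-in sleep data; rows are ordered by subject ID within each group
drug_1 <- sleep$extra[sleep$group == 1]
drug_2 <- sleep$extra[sleep$group == 2]

# Within-subject differences
differences <- drug_2 - drug_1

# Paired Cohen's d: mean difference in units of the SD of the differences
d_paired <- mean(differences) / sd(differences)
d_paired
```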

#### 2.2.2. Non-parametric (Wilcoxon signed rank test)

```R
library(ggstatsplot)

# set plot dimensions e.g. 10, 10
options(repr.plot.width=__, repr.plot.height=__)

data_grouped %>% ggwithinstats(x = _______,               # group
                               y = _______,               # measurement
                               type = "nonparametric") +  # non-parametric
                 theme_classic(base_size = __)             # size of plot e.g. 16
```

**Effect size (rank biserial)**
- r<0.05 - Tiny
- 0.05<=r<0.1 - Very small 
- 0.1<=r<0.2 - Small
- 0.2<=r<0.3 - Medium
- 0.3<=r<0.4 - Large
- r>=0.4 - Very large

---
### 3.1. Comparing > 2 measures (unrelated)

#### 3.1.1. Parametric (1-way ANOVA)

```R
library(ggstatsplot)

# set plot dimensions e.g. 10, 10
options(repr.plot.width=__, repr.plot.height=__)

data %>% ggbetweenstats(
            x = ________,                   # group
            y = ________,                   # measurement
            pairwise.comparisons = TRUE) +  # show pairwise
         theme_classic(base_size = __)     # size of plot e.g. 16
```

**Effect size (omega squared)**
- ES<0.01 - Very small
- 0.01<=ES<0.06 - Small
- 0.06<=ES<0.14 - Medium
- ES >= 0.14 - Large
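
To show how omega squared relates to the ANOVA table, the sketch below computes it by hand in base R, alongside eta squared for comparison. It assumes the built-in `PlantGrowth` data and a classic (equal-variance) one-way ANOVA, so it will not necessarily match the value printed by `ggbetweenstats`.

```R
# Assumption: built-in PlantGrowth; weight = measurement, group = 3-level factor
tab <- anova(lm(weight ~ group, data = PlantGrowth))

ss_between <- tab[["Sum Sq"]][1]
ss_within  <- tab[["Sum Sq"]][2]
df_between <- tab[["Df"]][1]
ms_within  <- tab[["Mean Sq"]][2]

# Eta squared: share of total variance explained by the grouping
eta_sq <- ss_between / (ss_between + ss_within)

# Omega squared: less biased estimate of the same quantity
omega_sq <- (ss_between - df_between * ms_within) /
            (ss_between + ss_within + ms_within)

c(eta_sq = eta_sq, omega_sq = omega_sq)
```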

#### 3.1.2. Non-parametric (Kruskal-Wallis)

```R
library(ggstatsplot)

# set plot dimensions e.g. 10, 10
options(repr.plot.width=__, repr.plot.height=__)

data %>% ggbetweenstats(
            x = ________,                   # group
            y = ________,                   # measurement
            type = "nonparametric",         # non-parametric
            pairwise.comparisons = TRUE) +  # show pairwise
         theme_classic(base_size = __)     # size of plot e.g. 16
```

**Effect size (rank epsilon squared)**
- ES<0.01 - Very small
- 0.01<=ES<0.06 - Small
- 0.06<=ES<0.14 - Medium
- ES >= 0.14 - Large

---
### 3.2. Comparing > 2 measures (related)

#### 3.2.1. Parametric (Repeated measures ANOVA)

```R
library(ggstatsplot)

# set plot dimensions e.g. 10, 10
options(repr.plot.width=__, repr.plot.height=__)

data %>% ggwithinstats(
            x = ________,                   # group
            y = ________,                   # measurement
            pairwise.comparisons = TRUE) +  # show pairwise
          theme_classic(base_size = __)     # size of plot e.g. 16
```

**Effect size (omega squared)**
- ES<0.01 - Very small
- 0.01<=ES<0.06 - Small
- 0.06<=ES<0.14 - Medium
- ES >= 0.14 - Large

#### 3.2.2. Non-parametric (Friedman test)

```R
library(ggstatsplot)

# set plot dimensions e.g. 10, 10
options(repr.plot.width=__, repr.plot.height=__)

data %>% ggwithinstats(
            x = ________,                   # group
            y = ________,                   # measurement
            type = "nonparametric",         # non-parametric
            pairwise.comparisons = TRUE) +  # show pairwise
          theme_classic(base_size = __)     # size of plot e.g. 16
```

**Effect size (Kendall's W)**
- 0.00 <= w < 0.20 - Slight agreement
- 0.20 <= w < 0.40 - Fair agreement
- 0.40 <= w < 0.60 - Moderate agreement 
- 0.60 <= w < 0.80 - Substantial agreement 
- w >= 0.80 - Almost perfect agreement
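
Kendall's W is a rescaling of the Friedman chi-squared statistic, W = chi-squared / (n(k - 1)), for n subjects and k conditions. The sketch below checks this in base R; the `scores` matrix is made-up illustrative data, not taken from any real study.

```R
# Hypothetical scores: 6 subjects (rows) measured under 3 conditions (columns)
scores <- matrix(c(5.1, 6.3, 7.2,
                   4.8, 5.9, 6.1,
                   6.0, 5.5, 7.8,
                   5.2, 6.8, 6.9,
                   4.9, 6.1, 7.5,
                   5.7, 5.4, 6.6),
                 ncol = 3, byrow = TRUE,
                 dimnames = list(NULL, c("t1", "t2", "t3")))

n <- nrow(scores)  # subjects
k <- ncol(scores)  # conditions

# Kendall's W from the Friedman chi-squared statistic
chi_sq <- unname(friedman.test(scores)$statistic)
kendall_w <- chi_sq / (n * (k - 1))

# Cross-check from the rank sums (no ties within subjects in this example)
rank_sums <- colSums(t(apply(scores, 1, rank)))
s <- sum((rank_sums - mean(rank_sums))^2)
kendall_w_direct <- 12 * s / (n^2 * (k^3 - k))

c(from_friedman = kendall_w, direct = kendall_w_direct)
```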

---
### 4.1. Comparing counts 2x2

#### 4.1.1. Fisher's Exact Test
- small numbers
- 2x2 table

#### Preparation

- Convenience `print_table` function

```R
print_table <- function(input, margin=F) {
    
    # add margins first: txtRound() returns text, which addmargins() cannot sum
    if (margin == T) { input <- addmargins(input)}
    input <- htmlTable::txtRound(input,1)

    input %>% 
    htmlTable::htmlTable(css.rgroup = "font-weight: 900; text-align: left;") %>%
    IRdisplay::display_html() 
    
}
```

- Format table

![test_count_fisher_table_small_label.png](images/test_count_fisher_table_small_label.png)

```R
library(tidyverse)

table_name <- data %>% 
              select(__exposure__, __outcome__) %>%
              mutate(__exposure__ = fct_relevel(__exposure__, "+","-"),
                     __outcome__ = fct_relevel(__outcome__, "+","-")) %>%
              table

print_table(table_name)
```

#### Plotting

```R
library(tidyverse)

# set plot dimensions e.g. 10, 10
options(repr.plot.width=__, repr.plot.height=__)

data %>% ggplot(aes(x=__exposure__,fill=__outcome__)) +
           geom_bar(position="fill") +
           theme_grey(base_size=_____) # size of plot e.g. 16
```

#### Statistical test

- Fisher's exact test

```R
table_name %>% rstatix::fisher_test(detailed=T)
```

- Effect size (odds ratio)

```R
epitools::oddsratio(table_name, rev="both")$measure
```
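
For orientation, the odds ratio of a 2x2 table is the cross-product ratio (a x d) / (b x c). The sketch below computes it by hand with a Wald confidence interval, using made-up counts laid out like the table above (exposure in rows, outcome in columns, "+" first). Note that `epitools::oddsratio()` and `fisher.test()` use other estimators, so their figures can differ somewhat from the simple cross-product.

```R
# Hypothetical counts: rows = exposure (+, -), columns = outcome (+, -)
counts <- matrix(c(12,  5,
                    3, 10),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(exposure = c("+", "-"),
                                 outcome  = c("+", "-")))

# Cross-product odds ratio: (a * d) / (b * c)
odds_ratio <- (counts[1, 1] * counts[2, 2]) / (counts[1, 2] * counts[2, 1])

# Wald 95% confidence interval, built on the log scale
se_log_or <- sqrt(sum(1 / counts))
ci <- exp(log(odds_ratio) + c(-1, 1) * qnorm(0.975) * se_log_or)

# Fisher's exact test on the same table (base R equivalent of the test above)
fisher_result <- fisher.test(counts)

odds_ratio
ci
fisher_result$p.value
```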

---
### 4.2. Comparing counts RxC

#### 4.2.1. Chi-squared test
- R X C
- large counts

#### Preparation

- Convenience function `print_table`

```R
print_table <- function(input, margin=F) {
    
    # add margins first: txtRound() returns text, which addmargins() cannot sum
    if (margin == T) { input <- addmargins(input)}
    input <- htmlTable::txtRound(input,1)

    input %>% 
    htmlTable::htmlTable(css.rgroup = "font-weight: 900; text-align: left;") %>%
    IRdisplay::display_html() 
    
}
```

- Format table

```R
table_name <- data %>%
              select(__exposure__, __outcome__) %>%
              table

print_table(table_name)
```

#### Statistical plotting

```R
library(ggstatsplot)

# set plot dimensions e.g. 10, 10
options(repr.plot.width=__, repr.plot.height=__)

data %>% ggbarstats(x = __exposure__,   # ggbarstats() expects a data frame, not a table
                    y = __outcome__) +
         theme_grey(base_size=16)
```

**Effect size (Cramer's V)**
- ES <= 0.2 - weak
- 0.2 < ES <=0.6 - moderate
- ES > 0.6 - strong
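
Cramer's V rescales the chi-squared statistic to the range 0 to 1: V = sqrt(chi-squared / (n x (min(R, C) - 1))). The sketch below computes the uncorrected version in base R, assuming the built-in `mtcars` data (`cyl` by `gear`) as a stand-in; the value shown by `ggbarstats` may differ if a bias correction is applied.

```R
# Assumption: built-in mtcars; a 3x3 table of cylinders by gears
tab <- table(cyl = mtcars$cyl, gear = mtcars$gear)

# Chi-squared statistic without continuity correction
# (small expected counts trigger a warning, suppressed for this illustration)
chi_sq <- suppressWarnings(chisq.test(tab, correct = FALSE)$statistic)

n <- sum(tab)

# Cramer's V (uncorrected)
cramers_v <- unname(sqrt(chi_sq / (n * (min(dim(tab)) - 1))))
cramers_v
```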


---
### 5.1. Correlating 2 measures

#### 5.1.1. Parametric (Pearson's correlation)

```R

library(ggstatsplot)

# set plot dimensions e.g. 10, 10
options(repr.plot.width=__, repr.plot.height=__)

data %>% ggscatterstats(x = ______,          # measure 1
                        y = ______,          # measure 2
                        marginal = FALSE) +  # suppress marginal plot 
         theme_grey(base_size=__) # size of plot e.g. 16
```

**Effect size (r)**
- r<0.05 - Tiny
- 0.05<=r<0.1 - Very small 
- 0.1<=r<0.2 - Small
- 0.2<=r<0.3 - Medium
- 0.3<=r<0.4 - Large
- r>=0.4 - Very large

#### 5.1.2. Non-parametric (Spearman's correlation)

```R

library(ggstatsplot)

# set plot dimensions e.g. 10, 10
options(repr.plot.width=__, repr.plot.height=__)

data %>% ggscatterstats(x = ______,              # measure 1
                        y = ______,              # measure 2
                        type = "nonparametric",  # non-parametric spearman's
                        marginal = FALSE) +      # suppress marginal plot 
         theme_grey(base_size=__) # size of plot e.g. 16
```

**Effect size (rho)**
- r<0.05 - Tiny
- 0.05<=r<0.1 - Very small 
- 0.1<=r<0.2 - Small
- 0.2<=r<0.3 - Medium
- 0.3<=r<0.4 - Large
- r>=0.4 - Very large
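
The two correlation coefficients can also be obtained without plotting. The sketch below uses base R's `cor.test()` on the built-in `mtcars` data (an assumption, standing in for `data`) and shows that Spearman's rho is simply Pearson's r computed on the ranks.

```R
# Assumption: built-in mtcars; wt and mpg stand in for measure 1 and measure 2
pearson  <- cor.test(mtcars$wt, mtcars$mpg, method = "pearson")
spearman <- cor.test(mtcars$wt, mtcars$mpg, method = "spearman",
                     exact = FALSE)  # ties present, so use the approximate p-value

r   <- unname(pearson$estimate)
rho <- unname(spearman$estimate)

# Spearman's rho equals Pearson's r on the ranked data
rho_from_ranks <- cor(rank(mtcars$wt), rank(mtcars$mpg))

c(r = r, rho = rho, rho_from_ranks = rho_from_ranks)
```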

---
# 3. Statistical modeling

### 3.1. Build model
- Continuous outcome (e.g. BP, heart rate, glucose)

```R
model <- lm(__response__ ~ __covariate1__ + __covariate2__, # selected covariates
            data) 
 
model <- lm(__response__ ~ . , # all covariates
            data) 
```

- Binary outcome (e.g. disease/no disease, alive/dead)

```R
model <- glm(__response__ ~ __covariate1__ + __covariate2__, # selected covariates 
             data, 
             family="binomial") 

model <- glm(__response__ ~ ., # all covariates
             data, 
             family="binomial") 
```

### 3.2. Check collinearity

```R
library(tidyverse)

model %>% car::vif() %>% 
          bind_rows %>%
          pivot_longer(cols = everything(), 
                       names_to="covariate", 
                       values_to = "VIF") %>%
          ggplot(aes(x=covariate, y=VIF)) + 
            geom_bar(stat="identity") +
            geom_hline(yintercept = 5, 
                       linetype="dashed", 
                       color="red", 
                       size=1) +
            theme_grey(base_size=__) # size of plot e.g. 16
```
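
The dashed line at 5 is easier to interpret once the definition of VIF is in view: for each covariate, VIF = 1 / (1 - R²), where R² comes from regressing that covariate on all the others. The sketch below reproduces this by hand in base R, assuming three numeric covariates from the built-in `mtcars` data. For models with factor covariates `car::vif()` reports a generalised VIF instead, which this sketch does not cover.

```R
# Assumption: built-in mtcars; three numeric covariates of a model for mpg
covariates <- c("wt", "hp", "disp")

# VIF of each covariate: regress it on the remaining covariates
vif_manual <- sapply(covariates, function(v) {
    others <- setdiff(covariates, v)
    r_sq <- summary(lm(reformulate(others, response = v), data = mtcars))$r.squared
    1 / (1 - r_sq)
})

vif_manual
```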

### 3.3. Statistical table

- Convenience function `print_html`

```R
print_html <- function(input) {
    
    capture.output(input) %>% 
    paste(collapse="") %>%
    IRdisplay::display_html()
    
}
```

- `stargazer` table

```R
library(stargazer)

stargazer(model, ci=TRUE, type="html") %>% print_html # can take multiple models
```

### 3.4. Statistical plotting

```R
library(ggstatsplot)

# set plot dimensions e.g. 10,10
options(repr.plot.width=__, repr.plot.height=__)

model %>% ggcoefstats(exclude.intercept = T,  # omit intercept
                      stats.label.args=list(nudge_y=0.1, 
                                            size=5, 
                                            label.size=NA)) +
          theme_grey(base_size=__) # size of plot e.g. 16
```

### 3.5. Diagnostic plots

```R
library(ggfortify)

# set plot dimensions e.g. 10, 10
options(repr.plot.width=__, repr.plot.height=__)

model %>% autoplot(which = 1:2) +  # show only first 2 plots
          theme_grey(base_size=__) # set plot size e.g. 16
```

### 3.6. Modeling for prediction
- Best subset selection for **continuous outcome** (e.g. BP)

```R
library(glmulti)

models <- glmulti(__response__ ~ .,         # . = consider all covariates
                  data = data, 
                  level = 1,                # main effects only (no interactions)
                  method = "h",             # exhaustive search
                  report = FALSE,           # suppress messages
                  plotty = FALSE)           # suppress plots

weightable(models) %>% head(10) # show top 10 models by AIC

top_model <- models@objects[[1]] # top model with lowest AIC
```

- Best subset selection for **binary categorical outcome** (e.g. death)

```R
library(glmulti)

models <- glmulti(__response__ ~ .,         # . = consider all covariates
                  data = data, 
                  level = 1,                # main effects only (no interactions)
                  method = "h",             # exhaustive search
                  fitfunction = "glm",      # glm function
                  family = binomial,        # logistic regression
                  report = FALSE,           # suppress messages
                  plotty = FALSE)           # suppress plots

weightable(models) %>% head(10) # show top 10 models by AIC

top_model <- models@objects[[1]] # top model with lowest AIC
```
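
To show what an exhaustive main-effects search does, the sketch below enumerates every subset of a few covariates in base R, fits each model, and ranks them by AIC. It is only an illustration of the idea, not a replacement for `glmulti`; the built-in `mtcars` data and the four chosen covariates are assumptions.

```R
# Assumption: built-in mtcars; mpg = response, four candidate covariates
covariates <- c("wt", "hp", "qsec", "drat")

# Every subset of the covariates, starting with the empty (intercept-only) set
subsets <- c(list(character(0)),
             unlist(lapply(seq_along(covariates),
                           function(m) combn(covariates, m, simplify = FALSE)),
                    recursive = FALSE))

# One formula string per subset
formulas <- sapply(subsets, function(s) {
    rhs <- if (length(s) == 0) "1" else paste(s, collapse = " + ")
    paste("mpg ~", rhs)
})

# Fit each model and record its AIC
aic <- sapply(formulas, function(f) AIC(lm(as.formula(f), data = mtcars)))

ranking <- data.frame(model = formulas, aic = unname(aic),
                      stringsAsFactors = FALSE)
ranking <- ranking[order(ranking$aic), ]

head(ranking, 10)  # top 10 models by AIC

top_model <- lm(as.formula(ranking$model[1]), data = mtcars)  # lowest AIC
```

With p covariates this fits 2^p models, so the approach only scales to a modest number of candidates.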

### 3.7. Modeling for explanation
Build a causal graph using http://www.dagitty.net/dags.html
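- adjust for a fork (a confounder: a common cause of exposure and outcome)
- do not adjust for a collider (a common effect of exposure and outcome) or, when estimating the total effect, a pipe (a mediator on the causal path)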
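
A small simulation can make these rules tangible. The sketch below is a made-up linear example in base R: `z` is a fork, `k` is a collider, and the true effect of `x` on `y` is set to 1. All variable names and coefficients are assumptions chosen for illustration.

```R
set.seed(1)
n <- 10000

z <- rnorm(n)                  # fork: common cause of x and y
x <- z + rnorm(n)              # exposure
y <- 1 * x + 2 * z + rnorm(n)  # outcome; true effect of x on y is 1
k <- x + y + rnorm(n)          # collider: common effect of x and y

# Ignoring the fork leaves confounding bias in the estimate
b_unadjusted <- unname(coef(lm(y ~ x))["x"])

# Adjusting for the fork recovers the true effect
b_fork <- unname(coef(lm(y ~ x + z))["x"])

# Additionally adjusting for the collider distorts the estimate again
b_collider <- unname(coef(lm(y ~ x + z + k))["x"])

c(unadjusted = b_unadjusted, fork = b_fork, collider = b_collider)
```

Under this particular setup the three estimates settle near 2, 1 and 0 respectively, so only the fork-adjusted model lands close to the true effect.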