## Observational Studies

We need to understand the difference between observational studies and randomised trials, and how to bridge the difference with matching.

Consider the following DAG:

<img src="./img/dags_confounding.png" >

In this case, X is sufficient to control for confounding.
- Ignorability assumption holds:

$Y^0, Y^1 \perp A |X$

In a randomized trial, treatment assignment A would be determined by a coin toss.
- This effectively erases the arrow from X to A.

<img src="./img/dags_rct_vs_observational.png" >

In a randomized trial, the distribution of X will be the same in both treatment groups.

<img src="./img/rct_population.png" >

In summary:
- The distribution of pretreatment variables X that affect Y is the same in both treatment groups.
 - __Covariate balance is ensured__
- Thus, if the outcome distribution ends up differing, it will not be because of differences in X.
- X is dealt with at the design phase.

<u> Issues with Randomization:</u>
- Randomized trials are expensive
- Sometimes randomizing treatment/exposure is unethical
- Some (many) people will refuse to participate in trials
- Randomized trials take time (you have to wait for outcome data).
 - In some cases, by the time you have outcome data, the question might no longer be relevant.

<u>Observational Studies</u>

Planned, prospective, observational studies with active data collection:
- __Like trials:__ data collected on a common set of variables at planned times; outcomes are carefully measured; study protocols.
- __Unlike trials:__ regulations much weaker, since not intervening; broader population eligible for the study.

Databases, retrospective, passive data collection:
- large sample sizes; inexpensive; potential for rapid analysis
- Data quality typically lower; no uniform standard of collection

In observational studies, the distribution of X will differ between treatment groups (since there is no control of the covariate to ensure balance).
- For example, if older people are more likely to get A = 1, we might see distributions like this:

<img src="./img/obs_vs_rct_distribution.png" >
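
A small simulation can make the contrast concrete. This is only a sketch on made-up data: the variable names (`age`, `a_rct`, `a_obs`) and the logistic assignment rule for the observational case are assumptions for illustration, not taken from any real study.

```r
set.seed(1)
n   <- 10000
age <- rnorm(n, mean = 50, sd = 10)      # hypothetical pretreatment covariate X

# Randomized trial: treatment decided by a coin toss, independent of age
a_rct <- rbinom(n, 1, 0.5)

# Observational study: older people are more likely to be treated (assumed rule)
a_obs <- rbinom(n, 1, plogis((age - 50) / 5))

# Difference in mean age between treated and control groups
diff_rct <- mean(age[a_rct == 1]) - mean(age[a_rct == 0])
diff_obs <- mean(age[a_obs == 1]) - mean(age[a_obs == 0])

c(rct = diff_rct, observational = diff_obs)
```

Under these assumptions the mean age is nearly identical across arms in the randomized version, while the treated group is clearly older in the observational version.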

### Matching

Matching is a method that attempts to make an observational study more like a randomized trial.

Main idea:
- Match individuals in the treated group (A = 1) to individuals in the control group (A=0) on the covariates X.

In the example where older people are more likely to get A = 1:
- At younger ages, there are more people with A = 0
- At older ages, there are more people with A = 1

In an RCT, for any particular age, there should be about the same number of treated and untreated people.

__By matching treated people to control people of the same age, there will be about the same number of treated and controls at any age.__

<u> Advantages of matching </u>

Controlling for confounders is achieved at the design phase (without looking at the outcome)
- the difficult statistical work can be done completely blinded to the outcomes

Matching will __reveal lack of overlap__ in covariate distribution
- Positivity assumption will hold in the population that can be matched

Once the data are matched, they can essentially be treated as if they came from a randomized trial with __ensured covariate balance__.

### Single Covariate Matching


Consider the following covariate distribution of a single covariate between the treatment groups.

<img src="./img/matching_single_covariate_1.png" >

We can match each treated subject to a control subject

<img src="./img/matching_single_covariate_2.png" >

and then we eliminate the excess "blue" subjects in the Control group.

<img src="./img/matching_single_covariate_3.png" >

This ensures a balance in the covariate X.

### Many covariates

We will not be able to exactly match on the full set of covariates.

In a randomized trial, treated and control subjects are not perfect matches either.
- The distribution of covariates is balanced between groups (stochastic balance)

With observational data, matching closely on covariates can achieve stochastic balance.

Example with two covariates (sex, age)

<img src="./img/matching_double_covariates_1.png" >

It is easy to match on discrete type covariates (sex), but not so easy to match on continuous type covariates (age).

<img src="./img/matching_double_covariates_2.png" >

Note that we are making the __distribution of covariates in the control population look like that in the treated population__:
- Doing so means we are estimating the causal effect of treatment on the treated.

This is represented by the following population breakdown

<img src="./img/hypo_worlds_treated.png" >

There are matching methods that can be used to target a different population, but this requires more advanced techniques.

### Fine Balance

Sometimes it is difficult to find great matches. We might be willing to accept some non-ideal matches if the treated and control groups still have the same distribution of covariates.
- This is known as __"fine balance"__.

For example:
- Match 1: 
 - Treated: Male, Age 40
 - Control: Female, Age 45
- Match 2:
 - Treated: Female, Age 45
 - Control: Male, Age 40
 
Average age and percent female are the same in both groups, __even though neither match is great__.
- Percentage of Male is 50% which is the same in both treatment groups
- Average age in both treatment groups is 42.5

__We achieve fine balance even though the matches are not great by tolerating non-ideal matches.__
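
The arithmetic in this example can be checked directly. The sketch below simply encodes the two matches above in a data frame (the column names are chosen here for illustration) and compares the group-level summaries.

```r
# The two matches from the example, one row per subject
pairs <- data.frame(
  match = c(1, 1, 2, 2),
  group = c("treated", "control", "treated", "control"),
  sex   = c("Male", "Female", "Female", "Male"),
  age   = c(40, 45, 45, 40)
)

# Marginal summaries by group: these agree even though no pair is an exact match
mean_age   <- tapply(pairs$age, pairs$group, mean)
pct_female <- tapply(pairs$sex == "Female", pairs$group, mean) * 100

mean_age     # 42.5 in both groups
pct_female   # 50 in both groups
```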

<u> Number of matches </u>
- __One to one (pair matching)__
 - Match exactly one control to every treated subject
 - Discard controls that are left without a match, so you might lose some efficiency
- __Many to one__
 - Match some fixed number K controls to every treated subject (e.g., 5 to 1 matching)
- __Variable__
 - Sometimes match 1, sometimes more than 1, control to treated subjects
  - If multiple good matches available, use them. 
  - If not, do not.

### How to match?

Because we typically cannot match exactly, we first need to choose some metric of closeness.

We will consider two options (for now):
- Mahalanobis distance
- Robust Mahalanobis distance

### Mahalanobis distance

Denote by $X_j$ the vector of covariates for subject j.

The Mahalanobis distance between covariates for subject i and subject j is:

$D(X_i, X_j) = \sqrt{(X_i - X_j)^TS^{-1}(X_i-X_j)}$

Here $S$ is the covariance matrix of the covariates. The metric is the square root of the sum of squared differences between the covariates, scaled by the covariance matrix.
- We need to scale because some covariates may be measured on a much larger scale than others, so "big" should be a relative notion.

<img src="./img/mahalanobis_dist_1.png" >
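
As a quick check of the formula, the sketch below computes the distance by hand and compares it with base R's `mahalanobis()`. The covariates (`age`, `income`) are simulated and purely hypothetical; note that `mahalanobis()` returns the squared distance, so a square root is needed.

```r
# Hypothetical covariates on very different scales
set.seed(42)
X <- cbind(age = rnorm(200, 50, 10), income = rnorm(200, 60000, 15000))
S <- cov(X)                      # sample covariance matrix

x_i <- X[1, ]
x_j <- X[2, ]

# Direct translation of the formula
delta    <- x_i - x_j
d_manual <- sqrt(drop(t(delta) %*% solve(S) %*% delta))

# Base R's mahalanobis() returns the SQUARED distance, so take the square root
d_base <- sqrt(mahalanobis(x_i, center = x_j, cov = S))

c(manual = d_manual, base = d_base)
```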

### Robust Mahalanobis distance

Motivation is to deal with outlier data.
- Outliers (in a specific dimension/covariate) can create large distances between subjects, even if the covariates are otherwise similar
- __Ranks__ might be more relevant
 - e.g. the highest and second-highest ranked values of a covariate perhaps should be treated as similar, even if the values are far apart.
 

Robust Mahalanobis distance:
- Replace each covariate value with its rank
- Constant diagonal on covariance matrix (since ranks should be on the same scale)
- Calculate the usual Mahalanobis distance on the ranks
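
The three steps can be sketched in base R as follows. This is one plausible reading of the recipe, on simulated data with a deliberately planted outlier; the rescaling step forces every diagonal entry of the rank covariance matrix to the variance of untied ranks, which only changes anything when there are ties.

```r
set.seed(7)
n <- 100
X <- cbind(age = rnorm(n, 50, 10), income = rexp(n, 1 / 60000))
X[1, "income"] <- 5e6            # one extreme outlier in a single covariate

# 1. Replace each covariate value with its rank
R <- apply(X, 2, rank)

# 2. Rescale the covariance of the ranks so every diagonal entry equals
#    the variance of untied ranks 1..n (a constant diagonal)
S_rank <- cov(R)
scale  <- sqrt(var(seq_len(n)) / diag(S_rank))
S_rob  <- diag(scale) %*% S_rank %*% diag(scale)

# 3. Usual Mahalanobis distance, computed on the ranks
robust_dist <- function(i, j) {
  sqrt(mahalanobis(R[i, ], center = R[j, ], cov = S_rob))
}

robust_dist(1, 2)
```

Because only ranks enter the calculation, making the outlier even more extreme would leave every distance unchanged.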

### Other distance measures
- If you want an exact match on a few important covariates, you can essentially make the distance infinite if they are not equal.
 - In other words, put a strong penalty/weight on specific covariate dimensions
- Distance on propensity score 

Once you have a distance score, how should you select matches?
- __Greedy (nearest neighbor) matching__
 - Not as good but computationally fast
- __Optimal matching__
 - Better but computationally demanding.

### Greedy (nearest neighbor) matching

Experiment Setup:
- You have selected a set of pre-treatment covariates X that (hopefully) satisfy the ignorability assumption
- You have calculated a distance $d_{ij}$ between each treated subject and every control subject
- You have many more control subjects than treated subjects
 - This is often the case in observational studies
- Focus is on pair (one-to-one) matching 

Steps:
1. Randomly order the lists of treated subjects and control subjects.
2. Start with the first treated subject. Match to the control with the smallest distance (this is greedy).
3. Remove the matched control from the list of available matches.
4. Move on to the next treated subject. Match to the control with the smallest distance.
5. Repeat steps 3 and 4 until you have matched all treated subjects.
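
The steps above can be sketched in a few lines of base R. The function name `greedy_match` and the single simulated covariate are assumptions made for illustration; this is not how any particular package implements it.

```r
greedy_match <- function(d) {
  # d: matrix of distances, rows = treated subjects, columns = controls
  stopifnot(nrow(d) <= ncol(d))
  available <- rep(TRUE, ncol(d))
  match_idx <- integer(nrow(d))
  for (i in sample(nrow(d))) {             # step 1: random order of treated subjects
    candidates <- which(available)
    # steps 2 and 4: closest control among those still available
    best <- candidates[which.min(d[i, candidates])]
    match_idx[i] <- best
    available[best] <- FALSE               # step 3: remove the matched control
  }
  match_idx                                # match_idx[i] = control matched to treated i
}

set.seed(3)
x_treated <- rnorm(5, mean = 1)            # hypothetical single covariate
x_control <- rnorm(20, mean = 0)
d <- abs(outer(x_treated, x_control, "-")) # simple absolute-difference distance
m <- greedy_match(d)
data.frame(treated = seq_along(m), control = m,
           distance = d[cbind(seq_along(m), m)])
```

Only the treated list is shuffled here; the order of the controls would matter only for breaking exact ties in distance.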

<u> Greedy Matching</u>
- Intuitive 
- Computationally fast
    - Involves a series of simple algorithms (identifying min distance)
    - Fast even for large data sets
    - R package: MatchIt
- Not invariant to the initial order of the list
- Not optimal
    - Always taking the smallest distance match does not minimize total distance
    - Can lead to some bad matches

<u> Many-to-one Matching</u>
- For k:1 matching:
    - After everyone has 1 match, go through the list again and find 2nd matches from the remaining pool

<u> Tradeoffs</u>
- Pair matching
    - Closer matches
    - Faster computing time
- Many-to-one
    - Larger sample size
- Largely a bias-variance tradeoff issue
    - Pair matching has less bias because the matching is closer, but it should be less efficient because you are discarding data.
    - Many to one matching has more bias, but smaller variance.
- Note that the efficiency gain from adding extra controls per treated subject ("many-to-one") is smaller than the gain from adding another treated subject for whom matches can be found among the controls.

<u> Caliper </u>
- We might prefer to exclude treated subjects for whom there does not exist a good match.
- A bad match can be defined using a caliper (max acceptable distance)
    - Only match a treated subject if the best control match has distance less than the caliper
    - Otherwise, get rid of that treated subject
    - Recall positivity assumption (prob of each treatment given X should be non-zero): 
        - If no matches within caliper, it is a sign that positivity assumption would be violated.
        - Excluding these subjects makes assumption more realistic
        - Drawback: population might be hard to define

### Optimal Matching
- Greedy matching is not typically optimal
- Optimal matching
    - Minimizes global distance measure
    - Computationally demanding
    - R packages:
        - `optmatch`
        - `rcbalance`
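
A tiny hand-built example shows why greedy matching need not minimize the total distance. The covariate values are hypothetical, and the brute-force enumeration below is only practical for toy problems; it is not the algorithm used by dedicated matching packages.

```r
# Treated subjects (rows) at x = 5 and 3; controls (columns) at x = 4 and 8
d <- abs(outer(c(5, 3), c(4, 8), "-"))

# Greedy, processing the treated subjects in a fixed order (subject 1 first)
available <- rep(TRUE, ncol(d))
greedy_total <- 0
for (i in seq_len(nrow(d))) {
  candidates <- which(available)
  best <- candidates[which.min(d[i, candidates])]
  greedy_total <- greedy_total + d[i, best]
  available[best] <- FALSE
}

# Optimal: enumerate every one-to-one assignment and keep the smallest total
all_perms <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (k in seq_along(v)) {
    for (p in all_perms(v[-k])) out[[length(out) + 1]] <- c(v[k], p)
  }
  out
}
totals <- sapply(all_perms(seq_len(ncol(d))),
                 function(p) sum(d[cbind(seq_len(nrow(d)), p)]))
optimal_total <- min(totals)

c(greedy = greedy_total, optimal = optimal_total)   # 6 versus 4
```

Greedy gives the first treated subject its closest control and leaves the second with a poor match; the optimal assignment accepts a slightly worse first match to lower the total.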

Feasibility:
- Whether or not it is feasible to perform optimal matching depends on the size of the problem.
- Constraints can be imposed to make optimal matching computationally feasible for larger data sets.
    - For example:
        - Match within hospitals in a multi-site clinical study
        - Match within primary disease category
        - These are "blocks"
    - This is known as sparse matching
        - Mismatches can be tolerated if fine balance can still be achieved.

### Assessing Balance

<u> Did matching work? </u>
- After you have matched, you should assess whether matching worked.
    - __Covariate balance__
        - Standardized differences
            - Similar means?
    - __This can/should be done without looking at the outcome__
- Commonly, a "Table 1" is created, where pre-matching and post-matching balance is compared.

<u> Hypothesis Tests and p-values </u>
- Balance can be assessed with hypothesis tests
    - i.e., test for a difference in means between treated and controls for each covariate
        - Two sample t-tests (for continuous covariates) or chi-square test (for discrete covariates) and report p-value for each test
    - Drawback:
        - p-values are dependent on sample size
        - Small differences in means will have a small p-value if the sample size is large (which is often the case).
            - We probably do not care much if mean differences are small.


<u> Standardised differences </u>

A standardised difference is the difference in means between groups, divided by the (pooled) standard deviation.

<img src="./img/standardised_diff_formula.png" >

Standardised differences:
- Do not depend on sample size
- Often, the absolute value of the SMD is reported (the sign is ignored)
- Calculate for each variable that you match on

Rules of thumb:
- Values < 0.1 indicate adequate balance
- Values 0.1 to 0.2 are not too alarming
- Values > 0.2 indicate serious imbalance
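
A minimal sketch of the calculation for one continuous covariate is given below. It assumes the common form of the pooled standard deviation, the square root of the average of the two group variances; the helper name `smd` and the simulated data are illustrative only.

```r
smd <- function(x, a) {
  # x: numeric covariate; a: treatment indicator (1 = treated, 0 = control)
  m1 <- mean(x[a == 1])
  m0 <- mean(x[a == 0])
  s_pooled <- sqrt((var(x[a == 1]) + var(x[a == 0])) / 2)
  (m1 - m0) / s_pooled
}

set.seed(11)
a   <- rbinom(2000, 1, 0.4)
age <- rnorm(2000, mean = 50 + 5 * a, sd = 10)   # hypothetical imbalance of about half an SD

smd_age <- abs(smd(age, a))
smd_age                               # compare against the 0.1 and 0.2 thresholds

# Replicating the data ten times barely changes the SMD, unlike a p-value
smd_big <- abs(smd(rep(age, 10), rep(a, 10)))
smd_big
```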

### Table 1

<img src="./img/table1_1.png" >

### SMD Plot with Threshold = 0.1
<img src="./img/smd_plot.png" >


### Analysing Data After Matching

After successfully matching and achieving adequate balance (e.g., SMD below 0.1 for each covariate), we can proceed with outcome analysis.
- Test for a treatment effect
- Estimate a treatment effect and confidence interval
- Methods should take matching into account

<u> __Randomisation tests (Binomial data)__ </u>

Randomisation tests are tests that you can use if you already have data from a randomised trial. These tests are also known as:
- Permutation tests
- Exact tests

Main idea:
- Compute test statistic from __observed data__
- Assume null hypothesis of __no treatment effect__ is true
- Randomly __permute treatment assignment__ within pairs and re-compute test statistic.
- Repeat many times and see how unusual the observed statistic is under the assumption of the null hypothesis.

Toy Example:
- Suppose we have a binary outcome and 13 matched pairs
- We will use as the test statistic the number of events in the treated group. 

__Discordant pairs__
- The only pairs between control and treated groups that can change under permutation.
- Under the null hypothesis, the treatment and control groups are the same so it should be ok to swap their results.

<img src="./img/discordant_pairs.png" >

Methodology:
- Through randomisation, one can flip/swap the results of the data for the discordant pairs. For example, if Treated = 1 and Control = 0, the flipped case will be Treated = 0 and Control = 1. 
- To be specific, for each discordant pair, you flip a coin: If heads, you swap. If tails, you keep it the same.
- The test statistic is then recalculated once the coin-flip permutation has been applied to all the discordant pairs.
- Repeat the permutation many times and record the test statistic each time to obtain its distribution under the null hypothesis.
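
The procedure can be sketched as below. The 13 pairs of outcomes are made up for illustration (they are not the data in the figure), and the one-sided p-value is one reasonable choice of summary.

```r
# Hypothetical 13 matched pairs with a binary outcome (1 = event)
y_trt <- c(1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1)
y_con <- c(0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0)

observed   <- sum(y_trt)                 # test statistic: events in the treated group
discordant <- which(y_trt != y_con)      # only these pairs can change under permutation
concordant_events <- sum(y_trt[-discordant])

set.seed(2024)
perm_stats <- replicate(10000, {
  swap   <- rbinom(length(discordant), 1, 0.5) == 1   # coin flip per discordant pair
  y_perm <- ifelse(swap, y_con[discordant], y_trt[discordant])
  concordant_events + sum(y_perm)
})

# One-sided p-value: how often the permuted statistic is at least as large as observed
p_perm <- mean(perm_stats >= observed)

# Exact version: treated events among discordant pairs follow Binomial(n_discordant, 0.5)
n_d     <- length(discordant)
k_obs   <- sum(y_trt[discordant])
p_exact <- pbinom(k_obs - 1, n_d, 0.5, lower.tail = FALSE)

c(permutation = p_perm, exact = p_exact)
```

The simulated and exact p-values should agree up to Monte Carlo error, since each discordant pair contributes an independent fair coin flip under the null.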

__McNemar test__
- This test is equivalent to the __McNemar test for paired binomial data__
- In R, use the `mcnemar.test(contrast_table)` function, where `contrast_table` is a matrix containing the 2 by 2 table of paired Control vs Treatment group outcomes.

<img src="./img/mcnemar_test.png" >
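
The sketch below shows that the McNemar statistic depends only on the two discordant cells. The counts are the ones used in the workflow example later in these notes; `correct = FALSE` switches off the continuity correction that `mcnemar.test` applies by default, so the result can be compared with the textbook formula.

```r
# 2x2 table of paired outcomes; the off-diagonal cells are the discordant pairs
contrast_table <- matrix(c(994, 493, 394, 305), nrow = 2, ncol = 2)

n12 <- contrast_table[1, 2]    # 394
n21 <- contrast_table[2, 1]    # 493

res <- mcnemar.test(contrast_table, correct = FALSE)

# Same statistic by hand: squared difference of discordant counts over their sum
stat_manual <- (n12 - n21)^2 / (n12 + n21)
p_manual    <- pchisq(stat_manual, df = 1, lower.tail = FALSE)

c(statistic = unname(res$statistic), manual = stat_manual)
c(p_value = res$p.value, manual = p_manual)
```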

<u> __Randomisation tests (Continuous data)__ </u>
- Basic approach also works for continuous data
- Test statistic is the difference in sample means between treatment and control group.

Methodology:
- For continuous data, there's no notion of discordant pairs.
- Likewise, under the null hypothesis assumption, there is no difference between the treatment and control group. Thus, we can __swap the data within each pair__.
- For each observation pair of control and treatment subject, we can randomly permute the labels 
- The test statistic is then re-calculated. 
- We can perform many permutations and obtain the distribution of the statistic under the null hypothesis.

<img src="./img/randomised_tests_continuous.png" >

__Paired T-test__: 
- We can use a paired t-test in R with `t.test(treatment_vct, control_vct, paired = TRUE)`
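
For continuous outcomes, swapping the labels within a pair simply flips the sign of that pair's difference. The sketch below uses simulated outcomes (the effect size and noise level are assumptions) and compares the permutation p-value with the paired t-test.

```r
set.seed(5)
n_pairs <- 30
control_vct   <- rnorm(n_pairs, mean = 10, sd = 2)               # hypothetical outcomes
treatment_vct <- control_vct + rnorm(n_pairs, mean = 1, sd = 2)  # assumed true effect of 1

diffs    <- treatment_vct - control_vct
observed <- mean(diffs)            # difference in sample means

# Swapping labels within a pair flips the sign of that pair's difference
perm_stats <- replicate(
  10000,
  mean(diffs * sample(c(-1, 1), n_pairs, replace = TRUE))
)

p_perm  <- mean(abs(perm_stats) >= abs(observed))   # two-sided
p_ttest <- t.test(treatment_vct, control_vct, paired = TRUE)$p.value

c(permutation = p_perm, paired_t = p_ttest)
```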

<u> Other outcome models </u>
- Conditional logistic regression
    - Matched binary outcome data
- Stratified Cox model
    - Time-to-event (survival) outcome data
    - Baseline hazard stratified on matched sets
- Generalized estimating equations (GEE)
    - Match ID variable used to specify clusters
    - For binary outcomes, can estimate a causal risk difference, causal risk ratio, or causal odds ratio (depending on link function)

### Sensitivity Analysis

<u> Motivations: Hidden bias </u>
- Matching aims to achieve balance on observed covariates
    - __Overt bias__ could occur if there was imbalance on observed covariates (if we did not fully control for these variables)
- There is no guarantee matching will result in balance on variables that we did not match on (including unobserved variables).
    - If these unobserved variables are confounders, then we have __hidden bias__
        - Implications are that the ignorability assumption is violated.
    
__Sensitivity Analysis:__
- If there is hidden bias, determine how severe it would have to be to __change conclusions__:
    - Change from statistically significant to not
    - Change in direction of effect
    
Example with Terminology:
- Let $\pi_j$ be the probability that person j receives treatment
- Let $\pi_k$ be the probability that person k receives treatment
- Suppose that person j and k are perfectly matched, so that their observed covariates, $X_j$ and $X_k$, are the same
- If $\pi_j = \pi_k$  then there is no hidden bias.
- Consider the following inequality:

$\frac{1}{\Gamma} \le \frac{\frac{\pi_j}{1 - \pi_j}}{\frac{\pi_k}{1-\pi_k}} \le \Gamma$  
$\frac{1}{\Gamma} \le \frac{Odds\ of\ treatment\ for\ person\ j}{Odds\ of\ treatment\ for\ person\ k} \le \Gamma$

- $\Gamma$ bounds the odds ratio of treatment between two perfectly matched subjects
    - If $\Gamma = 1$, then there is no hidden bias
    - If $\Gamma > 1$, then it implies hidden bias

- Suppose we have evidence of a treatment effect.
    - This is under the assumption that $\Gamma = 1$ (assume no hidden bias)
- We can then increase $\Gamma$ until evidence of treatment effect goes away (i.e. no longer statistically significant).
    - If this happens when $\Gamma = 1.1$, then it is __very sensitive__ to unmeasured confounding (hidden bias).
    - If it does not happen until $\Gamma = 5$, then it is __not very sensitive__ to hidden bias.
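
For matched pairs with a binary outcome, one standard way to carry this out is to bound the chance that the treated subject is the one with the event in a discordant pair by $\Gamma / (1 + \Gamma)$ and recompute a worst-case p-value. The sketch below assumes that construction; the counts are hypothetical and the function name `upper_p` is made up for illustration.

```r
# Hypothetical discordant-pair counts from a matched study with a binary outcome
n_discordant   <- 100   # pairs where exactly one subject had the event
treated_events <- 65    # discordant pairs in which the treated subject had the event

# Worst-case one-sided p-value when hidden bias is bounded by Gamma:
# the chance that the treated subject is the one with the event is at most
# Gamma / (1 + Gamma)
upper_p <- function(gamma) {
  pbinom(treated_events - 1, n_discordant, gamma / (1 + gamma), lower.tail = FALSE)
}

gammas  <- c(1, 1.1, 1.25, 1.5, 2)
bounds  <- sapply(gammas, upper_p)
data.frame(Gamma = gammas, worst_case_p = bounds)
```

Reading down the table, the smallest $\Gamma$ at which the worst-case p-value crosses 0.05 indicates how much hidden bias would be needed to overturn the conclusion.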
    
R packages for sensitivity analysis:
- `sensitivity2x2xk` 
- `sensitivityfull`

### General Workflow in R (Matching on Covariates)
- Matching
    - With data, create TableOne comparison of SMD based on "treatment" strata
        - `CreateTableOne(vars = xvars, strata = "treatment", data = mydata, test = FALSE)`
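    - Perform matching on the data using Mahalanobis distance (greedy matching shown here, using `Match` from the `Matching` package; `Weight = 2` selects the Mahalanobis metric and `replace = FALSE` matches without replacement)
        - `greedymatch <- Match(Tr = mydata$treatment, M = 1, X = mydata[xvars], Weight = 2, replace = FALSE)`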
        - `matched <- mydata[unlist(greedymatch[c("index.treated", "index.control")]),]`
    - With matched data, create TableOne comparison of SMD
        - `CreateTableOne(vars = xvars, strata = "treatment", data = matched, test = FALSE)`
        - Note that the `n` reported in the matched TableOne will be smaller than in the original data's TableOne
- Outcome analysis
    - If we want a causal risk difference, carry out a paired t-test using vectors from matched data
        - `y_trt <- matched$died[matched$treatment == 1]`
        - `y_con <- matched$died[matched$treatment == 0]`
        - `diffy <- y_trt - y_con`
        - `t.test(diffy)`
    - McNemar test
        - `table(y_trt, y_con)`
        - `mcnemar.test(matrix(c(994,493,394,305),2,2))`, where the 2nd and 3rd elements (493 and 394) are the counts of discordant pairs.
    
    

### Propensity Scores

The propensity score is the probability of receiving treatment, rather than control, given covariates X.
- Define A = 1 for treatment and A = 0 for control
- Denote the propensity score for subject i by $\pi_i$.
- __Propensity is a function of covariates.__

$\pi_i = P(A=1|X_i)$

Example:
- Suppose age was the only X variable, and older people are more likely to get treatment.
- Then propensity score would be larger for older ages
    - P(A=1|age = 60) > P(A=1|age = 30)
    - $\pi_i > \pi_j$ if $age_i > age_j$

<u> Balancing Score </u>
- Suppose 2 subjects have the same value of the propensity score, but they possibly have __different__ covariate values X.
    - Despite the different covariate values, they were both equally likely to have been treated.
        - This means that both subjects' X is __just as likely__ to be found in the treatment group. 
        - If you restrict to a subpopulation of subjects who have the same value of the propensity score, there should be balance in the two treatment groups.
        - The propensity score is a __balancing score__.

A balancing score is something where if you condition on it, you will have balance. 
- Propensity score is an example of a balancing score.
    - If we __restrict our analysis to people who have only the same value for the propensity score__, then if we __stratify on the actual treatment received__, we should see the __same distribution of covariates (Xs) in those two treatment groups__.

More formally,

$P(X = x | \pi(X) = p, A = 1) = P(X = x | \pi(X) = p, A = 0)$

Implication: if we __match on the propensity score__, we should achieve balance.
- This makes sense because, under the ignorability assumption, treatment is effectively randomized given X.
    - Conditioning on the propensity score is conditioning on an __allocation probability__.
        - If allocation probability = 0.3, prob of treatment = 0.3, prob of control = 0.7

__Estimated Propensity Score__
- In a randomised trial, the propensity score is generally known.
    - e.g. $P(A=1|X) = P(A=1) = 0.5$
- In an observational study, it will be unknown.
    - Notice however that the propensity score involves observed data: A and X.
        - We therefore can estimate it.
        - Typically when people talk about a propensity score they are referring to the estimated propensity score.

__Methodology for estimating $P(A=1|X)$__:
- Outcome here is treatment variable A, which is binary.
- Use logistic regression to estimate the propensity score. 
    - Other ML methods can also be used to estimate
- Fit a logistic regression model (outcome A, covariates X)
- From that model, obtain the predicted probability (fitted value) for each subject
    - That is the estimated propensity score for each subject.
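
A minimal sketch of this estimation step on simulated data is shown below. The covariates `X1`, `X2` and the coefficients of the assignment model are assumptions made so the example runs on its own.

```r
set.seed(10)
n  <- 2000
X1 <- rnorm(n)                       # hypothetical covariates
X2 <- rbinom(n, 1, 0.5)
true_ps   <- plogis(-0.5 + 0.8 * X1 + 0.5 * X2)   # assumed true assignment model
treatment <- rbinom(n, 1, true_ps)
mydata <- data.frame(treatment, X1, X2)

# Logistic regression with treatment as the outcome
psmodel <- glm(treatment ~ X1 + X2, family = binomial(), data = mydata)

# Fitted values are the estimated propensity scores, one per subject
pscore <- psmodel$fitted.values
summary(pscore)
```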
    

<u> __Propensity Score Matching__ </u>
- Propensity score is a __balancing score__
    - Matching on the propensity score should achieve balance
- Propensity score is a scalar where each subject will have exactly one value of the propensity score
    - The matching problem is simplified in that we are only matching on one variable
    
__Overlap__
- Once the __propensity score is estimated, but before matching__, it is useful to look for overlap.
    - Compare the distribution of the propensity score for treated and control subjects
    

__Good Overlap of Propensity Scores__
<img src="./img/propensitymatching_good.png" >

__Poor Overlap of Propensity Scores__
<img src="./img/propensitymatching_poor.png" >

__Trimming Tails:__
- If there is a lack of overlap, trimming the tails is an option.
    - This means removing subjects who have extreme values of the propensity score
    - For example:
        - Removing control subjects whose propensity score is less than the min in the treatment group
        - Removing treated subjects whose propensity score is greater than the max in the control group
- Trimming tails makes the positivity assumption more reasonable
    - Prevents extrapolation
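
The trimming rule in the example can be sketched as follows. The propensity scores here are drawn from Beta distributions purely for illustration, rather than estimated from a model.

```r
set.seed(21)
# Hypothetical estimated propensity scores: 300 controls, 200 treated
pscore    <- c(rbeta(300, 2, 5), rbeta(200, 5, 2))
treatment <- rep(c(0, 1), times = c(300, 200))

lower <- min(pscore[treatment == 1])   # smallest score among the treated
upper <- max(pscore[treatment == 0])   # largest score among the controls

# Keep controls at or above the treated minimum
# and treated subjects at or below the control maximum
keep <- (treatment == 0 & pscore >= lower) |
        (treatment == 1 & pscore <= upper)

table(treatment, kept = keep)
```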


__Matching (on Propensity Scores):__
-  We can proceed by __computing a distance between the propensity score for each treated subject with every control subject__.
    - Then use nearest neighbor or optimal matching as before.
    - Same steps as before, except the distance is now based on the propensity score only, as opposed to a distance based on a collection of covariates.
- In practice, the logit (log odds) of the propensity score is often used rather than the propensity score itself
    - The propensity score is bounded between 0 and 1, making many values seem similar
    - Logit of propensity score is unbounded (this transformation essentially stretches the distribution while preserving ranks)
    - Match on logit($\pi$) rather than $\pi$.

__Caliper:__
- To ensure that we do not accept any bad matches, a caliper can be used.
    - Recall that a caliper is the max distance that we are willing to tolerate.
- In practice, a common choice for a caliper is 0.2 times the standard deviation of the logit of the propensity score.
    1. Estimate the propensity score (e.g. using logistic regression)
    2. Logit-transform the propensity score
    3. Take the standard deviation of this transformed variable
    4. Set the caliper to 0.2 times the value from step 3.
- This is commonly done in practice because it seems to work well, but it is somewhat arbitrary.
    - Smaller caliper: less bias, more variance.
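
The four steps can be sketched in base R, using `qlogis()` as the logit function. The data are simulated and the single covariate `X1` is an assumption; the last lines simply check which treated subjects have at least one control inside the caliper.

```r
set.seed(8)
n  <- 1000
X1 <- rnorm(n)
treatment <- rbinom(n, 1, plogis(-1 + X1))            # hypothetical data

# Step 1: estimate the propensity score
pscore <- glm(treatment ~ X1, family = binomial())$fitted.values

# Step 2: logit transform (base R's qlogis() is the logit function)
logit_ps <- qlogis(pscore)

# Steps 3 and 4: caliper = 0.2 times the standard deviation of the logit
caliper <- 0.2 * sd(logit_ps)

# A treated/control pair is acceptable only if the logit scores differ by less than the caliper
d <- abs(outer(logit_ps[treatment == 1], logit_ps[treatment == 0], "-"))
has_match <- apply(d, 1, min) < caliper
mean(has_match)   # share of treated subjects with at least one control inside the caliper
```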

__After matching__
- The outcome analysis methods can be the same as would be used if matching directly on covariates.
    - Randomization tests
    - Conditional logistic regression, GEE, stratified Cox model

### General Workflow in R (Propensity Score Matching)
- __Fit propensity score model using original data__
    - Use a logistic regression model
        - `psmodel <- glm(treatment ~ X1+ X2 + X3 , family = binomial(), data= mydata)`
        - `summary(psmodel)`
    - Obtain predicted propensity score for each subject
        - `pscore <- psmodel$fitted.values`
- __Check on Overlap of propensity scores (PRE-MATCHING)__
    - Plot of propensity score with custom R code
        - <img src="./img/propensitymatching_R_1.png" >
- __Matching on Propensity Scores__
    - Use R package `MatchIt` to perform matching using nearest neighbor
        - `m.out <- matchit(treatment ~ X1 + X2 + X3, data = mydata, method = "nearest")`
        - `summary(m.out)`
        - `plot(m.out, type = "jitter")`
            - <img src="./img/propensitymatching_R_2.png" >
        - `plot(m.out, type = "hist")`
            - <img src="./img/propensitymatching_R_3.png" >
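    - Use the `Match` function from the R package `Matching` to perform greedy matching
        - Match on "logit(propensity_score)" WITHOUT a caliper (`logit()` is not in base R; it is available from packages such as `gtools`, or use base R's `qlogis()`)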
            - `psmatch <- Match(Tr =mydata$treatment, M = 1, X = logit(pscore), replace = FALSE)`
            - `matched <- mydata[unlist(psmatch[c("index.treated", "index.control")]),]`
            - `xvars <- c("X1","X2","X3")`
            - `matchedtab1 <- CreateTableOne(vars = xvars, strata = "treatment", data = matched, test = FALSE)`
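        - Match on "logit(propensity_score)" WITH a caliper (in `Match`, `caliper` is expressed in standard deviation units of X, so `caliper = 0.2` means 0.2 standard deviations of the logit)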
            - `psmatch <- Match(Tr =mydata$treatment, M = 1, X = logit(pscore), replace = FALSE, caliper = 0.2)`
            - `matched <- mydata[unlist(psmatch[c("index.treated", "index.control")]),]`
            - `xvars <- c("X1","X2","X3")`
            - `matchedtab1 <- CreateTableOne(vars = xvars, strata = "treatment", data = matched, test = FALSE)`
    - With matched data, create TableOne comparison of SMD
        - `CreateTableOne(vars = xvars, strata = "treatment", data = matched, test = FALSE)`
        - Note that the `n` reported in the matched TableOne will be smaller than in the original data's TableOne
- __Outcome analysis__
    - If we want a causal risk difference, carry out a paired t-test using vectors from matched data
        - `y_trt <- matched$died[matched$treatment == 1]`
        - `y_con <- matched$died[matched$treatment == 0]`
        - `diffy <- y_trt - y_con`
        - `t.test(diffy)`
    - McNemar test
        - `table(y_trt, y_con)`
        - `mcnemar.test(matrix(c(994,493,394,305),2,2))`, where the 2nd and 3rd elements (493 and 394) are the counts of discordant pairs.
    
    