# Building contrasts

This section explains how to build contrasts with a model that implements the `formulaic-contrasts` API. We start 
with the simple example of a pairwise comparison, then move on to more complex contrasts featuring 
interaction terms. 

All examples shown here are relatively simple, and their contrast vectors could reasonably be built by hand. 
However, the same approach works for design matrices with dozens or even hundreds of columns arising from many variables
and/or categories. 

:::{note}

`formulaic-contrasts` doesn't implement any statistical tests. It only provides tools for conveniently 
building contrast vectors that can be used with various statistical models. 

:::

Let's consider the following toy dataset that mimics a two-arm clinical trial (`drugA` vs. `drugB`)
with Responders (`responder`) and Non-responders (`non_responder`): 

In [1]:
import formulaic_contrasts

df = formulaic_contrasts.datasets.treatment_response()
df

,treatment,response,biomarker
0,drugA,non_responder,6.595490
1,drugA,non_responder,7.071509
2,drugA,non_responder,8.537421
3,drugA,non_responder,6.787991
4,drugA,non_responder,10.109717
...,...,...,...
75,drugB,responder,11.167627
76,drugB,responder,9.493773
77,drugB,responder,5.027817
78,drugB,responder,9.800762


The mean of the measured biomarker differs between responders and non-responders within each treatment, and the direction 
of the difference depends on the treatment. This suggests the biomarker could have predictive value.

In [2]:
df.groupby(["treatment", "response"]).agg("mean")

treatment,response,biomarker
drugA,non_responder,6.791881
drugA,responder,5.142661
drugB,non_responder,4.720502
drugB,responder,10.282279


## Simple pairwise comparison

Arguably the most commonly used contrast compares two levels of a categorical variable. 
For instance, we could investigate differences between responders and non-responders, independent of treatment, 
by fitting the model `~ response + treatment` and then comparing the category `"responder"` in the column `response` 
with the category `"non_responder"`.
This can be achieved using the {func}`~formulaic_contrasts.FormulaicContrasts.contrast` method. 

In [3]:
from formulaic_contrasts import FormulaicContrasts

mod = FormulaicContrasts(df, "~ response + treatment")

contrast = mod.contrast(
    column="response",
    baseline="non_responder",
    group_to_compare="responder",
)
contrast

Intercept                0.0
response[T.responder]    1.0
treatment[T.drugB]       0.0
Name: 0, dtype: float64

This is equivalent to the following {func}`~formulaic_contrasts.FormulaicContrasts.cond` call:

In [4]:
mod.cond(response="responder") - mod.cond(response="non_responder")

Intercept                0.0
response[T.responder]    1.0
treatment[T.drugB]       0.0
Name: 0, dtype: float64
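
To see why subtracting two condition vectors yields this contrast, it can help to write out the design-matrix rows by hand. The following is a minimal sketch in plain Python, not the library's implementation. It assumes treatment (dummy) coding with `non_responder` and `drugA` as reference levels, as the column names above suggest, and it assumes that unspecified variables are held at their reference level. `design_row` is a hypothetical helper introduced only for illustration.

```python
# Column order of the design matrix for "~ response + treatment"
COLUMNS = ["Intercept", "response[T.responder]", "treatment[T.drugB]"]


def design_row(response="non_responder", treatment="drugA"):
    """Hypothetical helper: design-matrix row of one condition under
    treatment coding (reference levels: non_responder, drugA)."""
    return [
        1.0,  # intercept is present in every row
        1.0 if response == "responder" else 0.0,
        1.0 if treatment == "drugB" else 0.0,
    ]


# Responder minus non-responder, with treatment held at the same level
responder = design_row(response="responder")
non_responder = design_row(response="non_responder")
contrast = [a - b for a, b in zip(responder, non_responder)]

print(dict(zip(COLUMNS, contrast)))
```

The intercept and the treatment column cancel because both conditions share them, which leaves a single `1.0` in the `response[T.responder]` column.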

## Comparison within a subset

We might also be interested in differences between responders and non-responders *in `drugB` only*. 
While we could subset the data before fitting the model, we can instead fit the full model with an interaction term, `~ response * treatment`, and define the contrast so that it compares within `drugB` only. 

In [5]:
mod = FormulaicContrasts(df, "~ response * treatment")

contrast = mod.cond(treatment="drugB", response="responder") - mod.cond(
    treatment="drugB", response="non_responder"
)
contrast

Intercept                                   0.0
response[T.responder]                       1.0
treatment[T.drugB]                          0.0
response[T.responder]:treatment[T.drugB]    1.0
Name: 0, dtype: float64
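
As a sanity check, the contrast can be applied to model coefficients by taking a dot product. The sketch below assumes, purely for illustration, that the biomarker is the outcome of an ordinary least-squares fit of `biomarker ~ response * treatment`; the text above does not prescribe a particular model. For such a saturated model the coefficients follow directly from the group means shown earlier, so no fitting library is needed here.

```python
# Group means of the biomarker, copied from the groupby output above
means = {
    ("drugA", "non_responder"): 6.791881,
    ("drugA", "responder"): 5.142661,
    ("drugB", "non_responder"): 4.720502,
    ("drugB", "responder"): 10.282279,
}

# Assumption: OLS on a saturated two-factor model with treatment coding,
# where the fitted values equal the group means.
intercept = means[("drugA", "non_responder")]
b_response = means[("drugA", "responder")] - intercept
b_treatment = means[("drugB", "non_responder")] - intercept
b_interaction = means[("drugB", "responder")] - intercept - b_response - b_treatment
coefs = [intercept, b_response, b_treatment, b_interaction]

# Contrast vector from the cell above (same column order as coefs)
contrast = [0.0, 1.0, 0.0, 1.0]

# Dot product of contrast and coefficients
estimate = sum(c * b for c, b in zip(contrast, coefs))
print(round(estimate, 6))  # 5.561777
```

The result equals the difference between the `drugB` responder and non-responder means (10.282279 - 4.720502), which is what the contrast is meant to capture.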

## Drug/response interaction

Now we are interested in the drug/response interaction, i.e. whether the difference between responders and non-responders 
differs between `drugB` and `drugA`. This difference of differences is captured by the coefficient of the 
`response[T.responder]:treatment[T.drugB]` column of the design matrix. This is how we derive this contrast vector using `.cond`: 

In [6]:
mod = FormulaicContrasts(df, "~ response * treatment")

contrast = (
    mod.cond(treatment="drugB", response="responder")
    - mod.cond(treatment="drugB", response="non_responder")
) - (
    mod.cond(treatment="drugA", response="responder")
    - mod.cond(treatment="drugA", response="non_responder")
)
contrast

Intercept                                   0.0
response[T.responder]                       0.0
treatment[T.drugB]                          0.0
response[T.responder]:treatment[T.drugB]    1.0
Name: 0, dtype: float64
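
The same arithmetic can be reproduced by hand to confirm that everything except the interaction column cancels. This is a minimal plain-Python sketch, again assuming treatment coding with `non_responder` and `drugA` as reference levels; `design_row` and `subtract` are hypothetical helpers and not part of `formulaic-contrasts`.

```python
def design_row(response, treatment):
    """Hypothetical helper: design-matrix row for "~ response * treatment"
    with columns [Intercept, response, treatment, response:treatment]."""
    r = 1.0 if response == "responder" else 0.0
    t = 1.0 if treatment == "drugB" else 0.0
    return [1.0, r, t, r * t]


def subtract(a, b):
    """Element-wise difference of two equally long vectors."""
    return [x - y for x, y in zip(a, b)]


# Responder vs. non-responder difference within each drug
diff_drug_b = subtract(design_row("responder", "drugB"), design_row("non_responder", "drugB"))
diff_drug_a = subtract(design_row("responder", "drugA"), design_row("non_responder", "drugA"))

# Difference of the differences
contrast = subtract(diff_drug_b, diff_drug_a)
print(contrast)  # [0.0, 0.0, 0.0, 1.0]

# The same quantity computed directly from the group means shown earlier
diff_of_diffs = (10.282279 - 4.720502) - (5.142661 - 6.791881)
print(round(diff_of_diffs, 6))  # 7.210997
```

If the biomarker were modelled by ordinary least squares with this formula (an assumption made here for illustration), the interaction coefficient would equal this difference of differences of the group means.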