# Usage in custom models

## Via Inheritance 

The most straightforward way to use `formulaic-contrasts` with a custom model is to use {class}`~formulaic_contrasts.FormulaicContrasts`
as a base class or mixin class.

As an example, let's wrap an Ordinary Least Squares ({class}`~statsmodels.regression.linear_model.OLS`) linear model into a custom class
for use with `formulaic-contrasts`. The aim is to build a model that takes a pandas DataFrame and a formulaic formula as input,
fits the model to a continuous variable from the DataFrame, and performs a statistical test for a given contrast.

This can be achieved with the following class definition. The constructor and the {func}`~formulaic_contrasts.FormulaicContrasts.contrast` and {func}`~formulaic_contrasts.FormulaicContrasts.cond` methods are inherited from the {class}`~formulaic_contrasts.FormulaicContrasts`
base class, so only `fit` and `t_test` need to be implemented:

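```python
import formulaic_contrasts
import numpy as np
import statsmodels.api as sm


class StatsmodelsOLS(formulaic_contrasts.FormulaicContrasts):
    def fit(self, variable: str):
        # self.data and the model matrix self.design are set by the inherited constructor
        self.mod = sm.OLS(self.data[variable], self.design)
        self.mod = self.mod.fit()

    def t_test(self, contrast: np.ndarray):
        # Test whether the linear combination of coefficients given by `contrast` is zero
        return self.mod.t_test(contrast)
```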
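To make the `fit`/`t_test` pair less of a black box, here is a numpy-only sketch of the core arithmetic behind `sm.OLS(...).fit().t_test(contrast)` for a single contrast vector `c`: the estimate `c @ beta`, its standard error `sqrt(c @ cov(beta) @ c)` and the t statistic. It assumes a full-rank design matrix and omits the p-value and confidence interval that statsmodels also reports; the function name `ols_contrast_t_test` is hypothetical and not part of either library.

```python
import numpy as np


def ols_contrast_t_test(X: np.ndarray, y: np.ndarray, c: np.ndarray) -> tuple:
    """Hypothetical helper: estimate, standard error and t statistic of c @ beta for an OLS fit."""
    n, p = X.shape
    # Least-squares coefficients (assumes X has full column rank)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    # Unbiased residual variance with n - p degrees of freedom
    sigma2 = residuals @ residuals / (n - p)
    # Covariance of the coefficient estimates: sigma^2 * (X'X)^-1
    cov_beta = sigma2 * np.linalg.inv(X.T @ X)
    estimate = float(c @ beta)
    std_err = float(np.sqrt(c @ cov_beta @ c))
    return estimate, std_err, estimate / std_err


# Usage: intercept plus a 0/1 group indicator; the contrast picks the group effect
X = np.array([[1.0, 0.0], [1.0, 0.0], [1.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, 3.0, 4.0, 6.0])
# The estimate equals the difference in group means (5 - 2 = 3)
estimate, std_err, t_stat = ols_contrast_t_test(X, y, np.array([0.0, 1.0]))
```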

Let's apply our model to an example dataset. The toy data mimics a 2-arm clinical trial (`drugA` vs. `drugB`)
with responders (`responder`) and non-responders (`non_responder`), plus a continuous biomarker that could potentially
predict the response to each treatment.

```python
df = formulaic_contrasts.datasets.treatment_response()
df
```

|     | treatment | response      | biomarker |
|-----|-----------|---------------|-----------|
| 0   | drugA     | non_responder | 6.595490  |
| 1   | drugA     | non_responder | 7.071509  |
| 2   | drugA     | non_responder | 8.537421  |
| 3   | drugA     | non_responder | 6.787991  |
| 4   | drugA     | non_responder | 10.109717 |
| ... | ...       | ...           | ...       |
| 75  | drugB     | responder     | 11.167627 |
| 76  | drugB     | responder     | 9.493773  |
| 77  | drugB     | responder     | 5.027817  |
| 78  | drugB     | responder     | 9.800762  |


Let's fit the model and perform the statistical test:

```python
model = StatsmodelsOLS(df, "~ treatment * response")
model.fit("biomarker")
model.t_test(model.contrast("response", baseline="non_responder", group_to_compare="responder"))
```

```text
<class 'statsmodels.stats.contrast.ContrastResults'>
                             Test for Constraints                             
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
c0            -1.6492      0.935     -1.764      0.082      -3.512       0.213
```
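A contrast vector is the difference between two rows of the design matrix, each describing one condition. The hand-built sketch below is not the library's implementation: it assumes default treatment coding with `drugA` and `non_responder` as reference levels and the design columns listed in `COLUMNS`, and the helper `design_row` is hypothetical. It shows that, because `~ treatment * response` contains an interaction, the responder vs. non-responder comparison yields a different vector in each treatment arm.

```python
import numpy as np

# Assumed design columns for "~ treatment * response" under treatment coding
COLUMNS = [
    "Intercept",
    "treatment[T.drugB]",
    "response[T.responder]",
    "treatment[T.drugB]:response[T.responder]",
]


def design_row(treatment: str, response: str) -> np.ndarray:
    """Hypothetical helper: encode one condition as a treatment-coded design row."""
    is_drug_b = 1.0 if treatment == "drugB" else 0.0  # reference level: drugA
    is_responder = 1.0 if response == "responder" else 0.0  # reference level: non_responder
    return np.array([1.0, is_drug_b, is_responder, is_drug_b * is_responder])


# responder vs. non_responder within drugA: only the main-effect column remains
contrast_drug_a = design_row("drugA", "responder") - design_row("drugA", "non_responder")
# the same comparison within drugB also picks up the interaction column
contrast_drug_b = design_row("drugB", "responder") - design_row("drugB", "non_responder")
print(dict(zip(COLUMNS, contrast_drug_b)))
```

With an interaction term, the responder effect is therefore specific to a treatment arm, so it is worth checking which arm a given contrast vector refers to before interpreting the coefficient.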

## Via Composition

Alternatively, if you prefer to avoid inheritance, you can store a `FormulaicContrasts` instance as an attribute.
In this case, you need to define {func}`~formulaic_contrasts.FormulaicContrasts.cond`/{func}`~formulaic_contrasts.FormulaicContrasts.contrast` yourself, or provide a custom way to define contrasts that calls {func}`~formulaic_contrasts.FormulaicContrasts.cond` internally. A minimal implementation could look like:

```python
import formulaic_contrasts
import numpy as np
import pandas as pd
import statsmodels.api as sm


class StatsmodelsOLS:
    def __init__(self, data: pd.DataFrame, design: str) -> None:
        self.data = data
        # Hold a FormulaicContrasts instance instead of inheriting from it
        self.contrast_builder = formulaic_contrasts.FormulaicContrasts(data, design)

    def fit(self, variable: str):
        self.mod = sm.OLS(self.data[variable], self.contrast_builder.design)
        self.mod = self.mod.fit()

    # Delegate contrast construction to the wrapped FormulaicContrasts instance
    def cond(self, **kwargs):
        return self.contrast_builder.cond(**kwargs)

    def contrast(self, *args, **kwargs):
        return self.contrast_builder.contrast(*args, **kwargs)

    def t_test(self, contrast: np.ndarray):
        return self.mod.t_test(contrast)
```

The result is the same:

```python
model = StatsmodelsOLS(df, "~ treatment * response")
model.fit("biomarker")
model.t_test(model.contrast("response", baseline="non_responder", group_to_compare="responder"))
```

```text
<class 'statsmodels.stats.contrast.ContrastResults'>
                             Test for Constraints                             
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
c0            -1.6492      0.935     -1.764      0.082      -3.512       0.213
```
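The delegating `contrast` method compares two levels of a single column. As one illustration of a custom way to define contrasts, the sketch below adds a hypothetical `contrast_between` method that builds a contrast from two fully specified conditions by calling `cond` twice. `ToyContrastBuilder` is a hypothetical stand-in so that the snippet runs without data: its `cond` hard-codes an assumed treatment-coded design for `~ treatment * response` and fills unspecified variables with their reference level, which is a choice of the stand-in, not documented behaviour of `FormulaicContrasts`. In practice you would pass a real `FormulaicContrasts` instance instead.

```python
import numpy as np


class ToyContrastBuilder:
    """Hypothetical stand-in for FormulaicContrasts with a fixed design for ~ treatment * response."""

    def cond(self, treatment: str = "drugA", response: str = "non_responder") -> np.ndarray:
        # Unspecified variables default to their reference level (a choice of this stand-in)
        is_drug_b = float(treatment == "drugB")
        is_responder = float(response == "responder")
        # Assumed columns: intercept, drugB, responder, drugB x responder interaction
        return np.array([1.0, is_drug_b, is_responder, is_drug_b * is_responder])


class CustomContrastModel:
    def __init__(self, contrast_builder) -> None:
        # Composition: hold the contrast builder instead of inheriting from it
        self.contrast_builder = contrast_builder

    def contrast_between(self, baseline: dict, compare: dict) -> np.ndarray:
        """Hypothetical API: contrast of condition `compare` against condition `baseline`."""
        return self.contrast_builder.cond(**compare) - self.contrast_builder.cond(**baseline)


# Usage: drugB vs. drugA among responders, a comparison that spans two design columns
custom_model = CustomContrastModel(ToyContrastBuilder())
c = custom_model.contrast_between(
    baseline={"treatment": "drugA", "response": "responder"},
    compare={"treatment": "drugB", "response": "responder"},
)
```

Keeping the builder as an attribute lets the model expose whatever contrast vocabulary suits it while still reusing `cond` for the encoding.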

## Manual usage

If the `FormulaicContrasts` class doesn't fit your needs, you can use the lower-level function {func}`~formulaic_contrasts.get_factor_storage_and_materializer`
to introspect how formulaic materializes a formula into a design matrix.

```python
from formulaic_contrasts import get_factor_storage_and_materializer

factor_storage, variables_to_factors, materializer_class = get_factor_storage_and_materializer()
```

Whenever a formula is materialized into a design matrix using `materializer_class`, `factor_storage` records metadata about each *factor* used in the formula,
while `variables_to_factors` maps each *variable* used in the formula to the set of factors it appears in.

```python
design_mat = materializer_class(df, record_factor_metadata=True).get_model_matrix("~ treatment * response")
```

```python
from pprint import pprint

pprint(factor_storage)
```

```text
defaultdict(<class 'list'>,
            {'response': [FactorMetadata(name='response',
                                         reduced_rank=True,
                                         custom_encoder=False,
                                         categories=('non_responder',
                                                     'responder'),
                                         kind=<Kind.CATEGORICAL: 'categorical'>,
                                         drop_field='non_responder',
                                         column_names=('non_responder',
                                                       'responder'),
                                         colname_format='{name}[T.{field}]')],
             'treatment': [FactorMetadata(name='treatment',
                                          reduced_rank=True,
                                          custom_encoder=False,
                                          categories=('drugA', 'drugB'),
```

```python
variables_to_factors
```

```text
defaultdict(set, {'treatment': {'treatment'}, 'response': {'response'}})
```
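The recorded metadata is enough to reconstruct the names of the design-matrix columns that a factor produces. The sketch below copies four fields of the `FactorMetadata` printed for `response` into a hypothetical `FactorInfo` dataclass (not a class of either library) and assumes that, for a reduced-rank factor, `drop_field` names the reference level that is left out of the design matrix.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class FactorInfo:
    """Hypothetical stand-in mirroring a few fields of the FactorMetadata printed above."""

    name: str
    drop_field: Optional[str]
    column_names: Tuple[str, ...]
    colname_format: str


def encoded_columns(meta: FactorInfo) -> list:
    # Skip the dropped reference level, then format the remaining levels into column names
    return [
        meta.colname_format.format(name=meta.name, field=field)
        for field in meta.column_names
        if field != meta.drop_field
    ]


# Values copied from the factor_storage output for "response"
response_meta = FactorInfo(
    name="response",
    drop_field="non_responder",
    column_names=("non_responder", "responder"),
    colname_format="{name}[T.{field}]",
)
print(encoded_columns(response_meta))  # ['response[T.responder]']
```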

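The distinction between factors and variables becomes visible once the formula applies transformations. Here, `biomarker` enters the model both directly and through `np.log`, and `treatment` is encoded with `drugB` as the reference level:

```python
# Request new storage objects for the new formula
factor_storage, variables_to_factors, materializer_class = get_factor_storage_and_materializer()
design_mat = materializer_class(df, record_factor_metadata=True).get_model_matrix(
    "~ biomarker + np.log(biomarker) + C(treatment, contr.treatment(base='drugB'))"
)
```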



```python
pprint(factor_storage.keys())
```

```text
dict_keys(['biomarker', 'np.log(biomarker)', "C(treatment, contr.treatment(base='drugB'))"])
```


```python
pprint(variables_to_factors)
```

```text
defaultdict(<class 'set'>,
            {'C': {"C(treatment, contr.treatment(base='drugB'))"},
             'biomarker': {'biomarker', 'np.log(biomarker)'},
             'contr.treatment': {"C(treatment, contr.treatment(base='drugB'))"},
             'np.log': {'np.log(biomarker)'},
             'treatment': {"C(treatment, contr.treatment(base='drugB'))"}})
```
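As the output shows, `variables_to_factors` also lists callables used in the formula (`C`, `np.log`, `contr.treatment`) as variables. If you only care about actual data columns, a small sketch can invert the mapping into factor-to-columns form while keeping only variables that are columns of the DataFrame. The mapping is copied from the output above as a plain dict so the snippet runs on its own; with the real objects you would pass `variables_to_factors` and `set(df.columns)` instead. The function name `factors_to_data_columns` is hypothetical.

```python
from collections import defaultdict

# Mapping copied from the pprint output above
example_variables_to_factors = {
    "C": {"C(treatment, contr.treatment(base='drugB'))"},
    "biomarker": {"biomarker", "np.log(biomarker)"},
    "contr.treatment": {"C(treatment, contr.treatment(base='drugB'))"},
    "np.log": {"np.log(biomarker)"},
    "treatment": {"C(treatment, contr.treatment(base='drugB'))"},
}
# Column names of the example DataFrame
data_columns = {"treatment", "response", "biomarker"}


def factors_to_data_columns(var_map, columns):
    """Invert variable -> factors into factor -> data columns, skipping non-column variables."""
    inverted = defaultdict(set)
    for variable, factors in var_map.items():
        if variable not in columns:
            continue  # e.g. "C", "np.log", "contr.treatment" are callables, not data
        for factor in factors:
            inverted[factor].add(variable)
    return dict(inverted)


for factor, cols in sorted(factors_to_data_columns(example_variables_to_factors, data_columns).items()):
    print(factor, "->", sorted(cols))
```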
