ENH: automatically building contrast/restriction matrices #4361

Open
josef-pkt opened this issue Mar 19, 2018 · 4 comments

josef-pkt commented Mar 19, 2018

We have several open issues for this, e.g. #852, #1668, #2362.
The attempt that is currently most integrated with patsy is an extension of #2687 (comment); that extension was merged in #4365 as t_test_pairwise.

There is also preliminary code all over the place, but I never managed to figure out a general structure and API.

There are two possibilities: using patsy formula factor information, or specifying "base contrasts or encoding" explicitly.

patsy

patsy attaches the contrast matrix (cm below) used to build the encoding, see #2687. We don't currently use that information.
AFAICS:

  • the dot product of cm and params provides the base parameterization and the corresponding cov_params
  • that cov_params will be singular, and I guess we need to check the rank (?) for estimable contrasts/restrictions
  • based on the restriction value (diff) and the reparameterized cov_params (where we only need a submatrix of cov_params), we can work on selected effects/constraints (sketch after this list)
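
A minimal sketch of the first two steps, assuming res is a fitted results instance, cm is the contrast matrix that patsy attached for one factor, and cols holds the parameter indices of that factor's encoded columns (all names illustrative):

import numpy as np

# hypothetical setup: res is a fitted results instance, cm the patsy
# contrast matrix (n_levels x n_encoded_columns) for one factor, and
# cols the parameter indices of that factor's encoded columns
params_base = cm.dot(res.params[cols])                    # base parameterization
cov_base = cm.dot(res.cov_params(column=cols)).dot(cm.T)

# cov_base is singular (rank at most len(cols)), so restrictions need
# an estimability check against this rank
rank = np.linalg.matrix_rank(cov_base)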

arrays and explicit formulas

What do we need?

  • a description of the current encoding, e.g. is there a reference level, and what is the label of the reference level
  • labels, column/params indices
  • the target contrast/restriction
  • mnemonics like "all pairs", ....

current: ???

  • sandbox.regression.penalized contains three functions like coef_restriction_meandiff for structured penalization or stochastic constraints.
  • sandbox.stats.multicomp contains three functions like contrast_all_one (which is one of my early attempts).
  • sandbox.stats.contrast_tools is a more general attempt based on multicomp.
  • stats.contrast has the pre-statsmodels contrastfromcols for checking full rank, which I never figured out, AFAIR.
  • tools.tools has the pre-statsmodels isestimable, which I also never figured out.
  • special case: multivariate with MANOVA tests (2-D params, column and row hypotheses); no multiple testing, AFAIR.

related: contrasts/constructed exog for prediction, LSMEANS, margins, ... (#2167), and hypothesis tests, i.e. inference on linear restrictions of predicted means (like treatment effects)

TODO/Usage:

partially based on what I started in #2687

  • get term and factor info from design_info
  • "normalize", i.e. get a base parameterization for "means", i.e. coefficients
  • this can be a subset of cov_params, so we are independent of the other params in the model (?)
  • construct the target restriction/contrast matrix
  • t_test
  • multiple testing p-value correction, corrected confidence intervals (not available yet, e.g. SUMM/FAQ-D: simultaneous confidence intervals #2955)
  • reporting, summary and plots

update: in #4365 I combined step 2 ("normalize") and step 4 (target restriction matrix) into a single combined restriction matrix that applies to the actual parameterization of the model and can be used directly with t_test and similar (sketch below).
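
A sketch of that combined matrix, under the same illustrative names as above, with R_pairs a target all-pairs contrast in the base parameterization:

import numpy as np

# hypothetical names: R_pairs (target contrast in base parameterization),
# cm (patsy contrast matrix), cols (parameter indices of the factor)
R_full = np.zeros((R_pairs.shape[0], len(res.params)))
R_full[:, cols] = R_pairs.dot(cm)   # "normalize" and target step in one matrix
tt = res.t_test(R_full)             # applies directly to the model's params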

implementation detail: Where do we put these things?

Aside:
issue ??? suggests using prediction instead of coefficients. This would work for the linear prediction (e.g. in GLM), but nonlinear prediction of the response would have a different effect. Predicted values are likely less close to normally distributed than coefficients, and predicted values depend on the values of the other exog, as in margins or predict_at.
Prediction and LSMEANS run into the same problem as here, in that we need to construct the auxiliary exog (i.e. an artificial sample reflecting the factor levels).
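
One way to construct that auxiliary exog is to rebuild the design matrix from the stored design_info on a grid of factor levels; a sketch with hypothetical variable names, assuming the model was created from a formula so that design_info is available:

import pandas as pd
from patsy import build_design_matrices

# hypothetical: 'DETERGENT' is the factor, 'x' another exog held fixed
grid = pd.DataFrame({'DETERGENT': ['A', 'B', 'C', 'D'],
                     'x': [1.5] * 4})
(exog_aux,) = build_design_matrices([res.model.data.design_info], grid)
pred = res.model.predict(res.params, exog_aux)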

@josef-pkt

annoying implementation detail:

We need a sign convention.

When we have an ordered set of levels, e.g. A, B, C, do we compute A - B or B - A in the pair comparison?
This is currently inconsistent across older versions in sandbox.
The labels assigned to the pairs should be consistent with a literal interpretation:
if the label is A-B but the values are B - A, then the dash cannot be interpreted as a minus sign (see the sketch below).
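
For illustration, a minimal sketch that keeps labels and signs consistent, so that a label "A-B" literally means A - B (level names here are made up):

# build all-pairs contrast rows with labels matching the signs
labels = ['A', 'B', 'C']
k = len(labels)
rows, names = [], []
for i in range(k):
    for j in range(i + 1, k):
        row = [0] * k
        row[i], row[j] = 1, -1            # value is level_i - level_j
        rows.append(row)
        names.append(labels[i] + '-' + labels[j])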

res0 = get_pairwise(model, 'C(DETERGENT)', method=['hommel'], ignore=False, factor_labels='A B C D'.split())
print(res0.result_frame)
 
         coef  std err         t     P>|t|  Conf. Int. Low  Conf. Int. Upp.  pvalue-hommel reject-hommel
A-B  2.000000  1.44658  1.382572  0.216055       -1.539653         5.539653       0.216055         False
A-C  4.666667  1.44658  3.226001  0.018001        1.127014         8.206319       0.072003         False
A-D -3.666667  1.44658 -2.534715  0.044396       -7.206319        -0.127014       0.133189         False
B-C -2.666667  1.44658 -1.843429  0.114831       -6.206319         0.872986       0.216055         False
B-D  5.666667  1.44658  3.917286  0.007826        2.127014         9.206319       0.039132          True
C-D  8.333333  1.44658  5.760715  0.001193        4.793681        11.872986       0.007157          True

TukeyHSD displays two columns, group1 and group2, and AFAICS (in what I'm doing now) does not create combined labels.
aside: AFAICS TukeyHSD does not create a results DataFrame and has no summary_frame; summary() returns a SimpleTable.

josef-pkt commented Mar 19, 2018

storing the computation for using the design_info contrast matrix to get base coefficients.

this works manually, but we don't have a standalone t_test function to do this automatically (maybe #3570 or others that I cannot find).
I added a FakeModel in #3570 (comment) that works for this example.

# try to work with the patsy design_info contrast_matrix
import numpy as np
from scipy import stats
import statsmodels.sandbox.stats.multicomp as mc

res = model   # a fitted results instance

# params and cov_params restricted to the columns of the factor of interest
cols = np.arange(3, 6)
cov1 = res._results.cov_params(column=cols)
p1 = res._results.params[cols]

# reparameterize to the base parameterization using the contrast matrix
# cmm that patsy attached to design_info (see #2687)
p1f = cmm.dot(p1)
cov1f = cmm.dot(cov1).dot(cmm.T)

# all pairwise comparisons of the 4 levels; the sign flip matches the
# sign convention discussed above
c4 = -mc.contrast_allpairs(4)

eff = c4.dot(p1f)
cov4f = c4.dot(cov1f).dot(c4.T)
bse = np.sqrt(np.diag(cov4f))

# two-sided p-values; these agree with tt, a reference t_test result
pvalues = stats.t.sf(np.abs(eff) / bse, res.df_resid) * 2

eff, tt.effect             # effects match
pvalues, tt.pvalue         # p-values match
res.df_resid, tt.df_denom  # degrees of freedom match

@josef-pkt

parking two list comprehensions; I don't know if they are useful somewhere

analogue of triu: all pairwise comparisons

>>> cat = ['a', 'b', 'c', 'ddd', 'e']   # works for arbitrary labels
>>> cat = list(range(5))                # integers for a compact example
>>> for x in [(i, j) for idx, i in enumerate(cat) for j in cat[idx + 1:]]: print(x)
... 
(0, 1)
(0, 2)
(0, 3)
(0, 4)
(1, 2)
(1, 3)
(1, 4)
(2, 3)
(2, 4)
(3, 4)

simple product of two lists

>>> [[i, j] for i in [0, 1] for j in ['a', 'b', 'c']]
[[0, 'a'], [0, 'b'], [0, 'c'], [1, 'a'], [1, 'b'], [1, 'c']]
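
The standard library already covers both patterns, which may be preferable to hand-rolled comprehensions:

>>> from itertools import combinations, product
>>> list(combinations(range(5), 2))
[(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
>>> list(product([0, 1], ['a', 'b', 'c']))
[(0, 'a'), (0, 'b'), (0, 'c'), (1, 'a'), (1, 'b'), (1, 'c')]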

@josef-pkt

another case

testing: all levels are equal to the mean
Even if we have k independent samples, the variance of each contrast is a linear combination of all the variances. However, the model Wald tests like t_test automatically compute the cov of an array of linear combinations. We can reuse this for some contrasts even if the intention is standalone k-sample tests.

The contrast/restriction should be something like np.eye(k) - np.ones((k, k)) / k
(patsy has names for various encodings that correspond to different contrasts; maybe it is Helmert)

If we want to reuse the models for standalone hypothesis tests, then we control the structure of exog and we don't have to reverse engineer it from patsy's design_info.

However, we can also just use basic linalg to get the cov directly, assuming we have cov_params = diag(var_means). Then cov_restriction is just C @ cov_params @ C' (minimal sketch below).
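
A minimal sketch of that linalg route, with hypothetical means and var_means for k independent samples:

import numpy as np

# hypothetical inputs: estimated level means and their variances
means = np.array([2.0, 4.0, 3.0, 5.0])
var_means = np.array([0.5, 0.4, 0.5, 0.6])
k = len(means)

# contrast: each level mean minus the grand mean
C = np.eye(k) - np.ones((k, k)) / k
cov_params = np.diag(var_means)          # independent samples
effect = C @ means
cov_restriction = C @ cov_params @ C.T   # singular, rank k - 1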

Related: can we handle a trend test for ordered categorical exog, or an equality null with a trend alternative, using the models/results? AFAIR, we are still missing standard (standalone) trend tests.
For independent samples, this would have a specific structure of one-sided hypotheses (see the sketch below).
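
As a sketch of what such a trend contrast could look like, with equally spaced (hypothetical) scores for k ordered levels:

import numpy as np

# single-row linear trend contrast for k ordered levels; the scores are
# illustrative and would come from the ordering of the categories
k = 4
scores = np.arange(k, dtype=float)
c_trend = (scores - scores.mean())[None, :]   # 1 x k restriction matrix
# a one-sided test would use
# t = (c_trend @ means) / np.sqrt(c_trend @ cov_params @ c_trend.T)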
