ENH: add multilink models and distribution.dfamilies #7793

josef-pkt · 2021-10-12T21:25:33Z

closes #7778

This will be a continuation of #7778

now starting with distribution dfamilies
similar to genmod families but focused on MLE with scipy optimizers

I'm planning to add cleaner parts of #7778 based on MultiLinkModel to this PR

pep8speaks · 2021-10-12T21:25:37Z

Hello @josef-pkt! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-10-14 17:42:06 UTC

lgtm-com · 2021-10-12T22:09:39Z

This pull request introduces 1 alert when merging 20f8bc0 into a5ec3cb - view on LGTM.com

new alerts:

1 for Unreachable code

lgtm-com · 2021-10-13T14:55:21Z

This pull request introduces 1 alert when merging 438f79c into a5ec3cb - view on LGTM.com

new alerts:

1 for Unreachable code

josef-pkt · 2021-10-13T15:05:03Z

Older scipy versions don't have stats.betabinom, e.g. scipy==1.3.3.
Based on the GitHub PR in scipy, betabinom was added in scipy 1.4.0.
The import of statsmodels/distributions/dfamilies/_discrete.py fails with AttributeError because the distribution is a class attribute.

Not worth fixing with compat; we can increase the minimum scipy version soon.

Solution for now: add the attribute to scipy, stats.betabinom = None.

Otherwise only a pep-8 style check failure.
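A minimal sketch of one way to guard against the missing distribution at import time. This is not the code in this PR (which sets the attribute on scipy.stats); the module-level name `betabinom` and the helper `betabinom_logpmf` are hypothetical.

```python
from scipy import stats

# None when the installed scipy predates betabinom (added in scipy 1.4.0).
betabinom = getattr(stats, "betabinom", None)


def betabinom_logpmf(k, n, a, b):
    """Hypothetical helper: fail at call time instead of at import time."""
    if betabinom is None:
        raise NotImplementedError("stats.betabinom requires scipy >= 1.4.0")
    return betabinom.logpmf(k, n, a, b)
```

With this pattern the module still imports on old scipy, and only code that actually uses the beta-binomial family raises.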

josef-pkt · 2021-10-13T16:01:21Z

all green here
pre fails in statespace test_dynamic_factor_mq

lgtm-com · 2021-10-13T16:05:23Z

This pull request introduces 1 alert when merging a523aad into a5ec3cb - view on LGTM.com

new alerts:

1 for Unreachable code

josef-pkt · 2021-10-13T16:45:18Z

Funny idea: a constant-like ilink/transformation function that returns a constant.

That would be one way of encoding args with fixed value. But there are no params in that case, i.e. linpred would be empty.
I'm not sure that will get less messy than transform_params handling as in current TModel.
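A rough sketch of the constant-ilink idea, under the assumption that a link only needs `inverse` and `inverse_deriv` methods. The class name `ConstantInverseLink` is hypothetical and not part of this PR.

```python
import numpy as np


class ConstantInverseLink:
    """Hypothetical link-like object whose inverse ignores linpred."""

    def __init__(self, value):
        self.value = float(value)

    def inverse(self, linpred):
        # Return the fixed value with the shape of linpred.
        return np.full(np.shape(linpred), self.value)

    def inverse_deriv(self, linpred):
        # The fixed arg does not depend on linpred, so the derivative is zero.
        return np.zeros(np.shape(linpred))


df_link = ConstantInverseLink(5.0)
fixed_df = df_link.inverse(np.zeros(3))
fixed_deriv = df_link.inverse_deriv(np.zeros(3))
```

The zero derivative shows the catch mentioned above: there is no parameter behind this link, so the corresponding linpred carries no information.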

josef-pkt · 2021-10-13T17:06:49Z

We need a PredictMixin with fittedvalues, resid and related, that we can use for models where the first parameter is the mean.
Symmetric distributions on R, GLM type distribution with mean-dispersion parameterization, and other models that have a mean parameterization like Beta and BetaBinomial.

For other models we have to decide whether to provide a predicted mean. Not too difficult for scipy distributions.
Very difficult for flexible distributions on R+ that don't have simple expressions for mean and are focused on quantiles.
Asymmetric distributions like johnsonsu might also be difficult (I don't remember details for that.). That will often apply to flexible distributions generated by transformations.
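A minimal sketch of what such a mixin could look like for the case where the first parameter is the mean. `PredictMixin` here, `ToyResults` and `predict_mean` are hypothetical names used only for illustration.

```python
import numpy as np


class PredictMixin:
    """Hypothetical mixin; assumes the host class provides endog and predict_mean()."""

    @property
    def fittedvalues(self):
        return self.predict_mean()

    @property
    def resid(self):
        return self.endog - self.fittedvalues


class ToyResults(PredictMixin):
    """Toy results class with a linear mean, identity link."""

    def __init__(self, endog, exog, params_mean):
        self.endog = np.asarray(endog, dtype=float)
        self.exog = np.asarray(exog, dtype=float)
        self.params_mean = np.asarray(params_mean, dtype=float)

    def predict_mean(self):
        return self.exog @ self.params_mean


toy = ToyResults([1.0, 2.0, 4.0], [[1, 0], [1, 1], [1, 2]], [1.0, 1.0])
```

Models without a simple mean expression would not inherit the mixin, which keeps the decision about a predicted mean separate for those cases.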

lgtm-com · 2021-10-13T18:06:45Z

This pull request introduces 1 alert when merging 2c30dea into 123cca1 - view on LGTM.com

new alerts:

1 for Unreachable code

josef-pkt · 2021-10-13T18:07:04Z

I copied most parts over from #7778, not included is the Het base model. This is mostly obsolete with the more flexible MultiLinkModel. We might create a subclass for MultiLinkModel for loc/mean - scale HET models, mainly for name recognition and start_params.
(depends on whether MultiLinkModel will be refactored to have only one explicit link/exog. That's still open)

This still includes the full THet model because it's the only one with a fixed arg

adjustments are for changed paths and subclassing MultiLinkModel instead of Het Model

josef-pkt · 2021-10-13T18:09:42Z

a unit test on one machine fails statsmodels\regression\tests\test_quantile_regression.py:319
No seed for random numbers in test_nontrivial_singular_matrix
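For illustration, a sketch of how test data can be made reproducible with a local seeded generator. The function `make_data` and the seed value are hypothetical and not taken from test_quantile_regression.py.

```python
import numpy as np


def make_data(seed=987654):
    """Generate the same random arrays on every call and every machine."""
    rng = np.random.RandomState(seed)
    x = rng.standard_normal((20, 3))
    y = rng.standard_normal(20)
    return y, x


y1, x1 = make_data()
y2, x2 = make_data()
```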

josef-pkt · 2021-10-13T19:07:03Z

Stata has hetregress, MLE and 2 step FGLS, only since version 15. I only have version 14
My results agree with their first doc example at print precision.
But I cannot check the other things that are not in the summary.
hetregress has vce(robust) option
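A minimal sketch of full MLE for a Gaussian model with heteroscedasticity, assuming a linear mean and a log-linear variance. It uses simulated data (not the Stata doc example) and plain scipy.optimize instead of the model classes in this PR; all names are illustrative.

```python
import numpy as np
from scipy import optimize


def negloglike(params, endog, exog, exog_scale):
    """Negative Gaussian log-likelihood with log-linear variance."""
    k_mean = exog.shape[1]
    mean = exog @ params[:k_mean]
    log_var = exog_scale @ params[k_mean:]  # log link keeps the variance positive
    resid = endog - mean
    return 0.5 * np.sum(np.log(2 * np.pi) + log_var + resid**2 / np.exp(log_var))


# Simulated data with variance increasing in x.
rng = np.random.default_rng(12345)
nobs = 2000
x = rng.uniform(0, 1, size=nobs)
exog = np.column_stack([np.ones(nobs), x])
beta_true = np.array([1.0, 2.0])
gamma_true = np.array([-1.0, 1.5])
scale = np.exp(0.5 * (exog @ gamma_true))
endog = exog @ beta_true + scale * rng.standard_normal(nobs)

# Start at OLS for the mean and a constant variance.
beta_ols = np.linalg.lstsq(exog, endog, rcond=None)[0]
resid_ols = endog - exog @ beta_ols
start = np.concatenate([beta_ols, [np.log(resid_ols.var()), 0.0]])

res = optimize.minimize(negloglike, start, args=(endog, exog, exog),
                        method="BFGS")
beta_hat, gamma_hat = res.x[:2], res.x[2:]
```

Here the same exog is used for mean and variance; passing a different `exog_scale` gives the two-exog case.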


josef-pkt · 2021-10-13T20:38:42Z

In R: https://search.r-project.org/CRAN/refmans/crch/html/crch.html looks useful
censored regression (Tobit) with heteroscedasticity
uses two part formula y ~ x1 | x2

estimate by MLE or another method that I never heard about.

josef-pkt · 2021-10-17T19:34:24Z

large sample results compared to GLSHet are in #647 (comment)
agreement is good.

josef-pkt · 2021-10-18T21:08:46Z

parking example dataset for two-way anova (while shutting down windows)

count  total  proportion
10  39  0.26
 5   6  0.83
 8  16  0.50
 3  12  0.25
23  62  0.37
53  74  0.72
10  30  0.33
22  41  0.54
23  81  0.28
55  72  0.76
 8  28  0.29
15  30  0.50
26  51  0.51
32  51  0.63
23  45  0.51
32  51  0.63
17  39  0.44
46  79  0.58
 0   4  0.00
 3   7  0.43
10  13  0.77

from Martin J. Crowder, 1978, Beta-Binomial Anova for Proportions
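The parked numbers as arrays, as a convenience sketch. Reading each triple as (count, total, proportion) is an assumption, supported by the check that count / total reproduces the listed proportions. The factor levels of the two-way layout are not included in the parked data, so they are left out here.

```python
import numpy as np

count = np.array([10, 5, 8, 3, 23, 53, 10, 22, 23, 55, 8,
                  15, 26, 32, 23, 32, 17, 46, 0, 3, 10])
total = np.array([39, 6, 16, 12, 62, 74, 30, 41, 81, 72, 28,
                  30, 51, 51, 45, 51, 39, 79, 4, 7, 13])
prop_listed = np.array([0.26, 0.83, 0.50, 0.25, 0.37, 0.72, 0.33,
                        0.54, 0.28, 0.76, 0.29, 0.50, 0.51, 0.63,
                        0.51, 0.63, 0.44, 0.58, 0.00, 0.43, 0.77])

# Largest deviation between computed and listed proportions (rounding only).
max_diff = np.abs(count / total - prop_listed).max()
```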

josef-pkt · 2021-10-21T14:55:54Z

statsmodels/othermod/base_model.py

+        # TODO: here or in __init__
+        self.k_vars = self.exog.shape[1]
+        self.k_params = (self.exog.shape[1] + self.exog_scale.shape[1] +
+                         self.k_extra)


This assumes we don't have exog for the extra parameters.
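A sketch of a more generic count, assuming every parameter that has regressors gets its own exog and k_extra only counts parameters without one. The helper `count_params` is hypothetical and not part of base_model.py.

```python
import numpy as np


def count_params(exogs, k_extra=0):
    """Total number of parameters: columns of all exogs plus exog-free extras."""
    return sum(np.asarray(ex).shape[1] for ex in exogs) + k_extra


exog = np.ones((5, 2))
exog_scale = np.ones((5, 3))
k_params = count_params([exog, exog_scale], k_extra=1)
```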

lgtm-com · 2021-10-21T16:21:48Z

This pull request introduces 2 alerts when merging 76e9b75 into 63a32e9 - view on LGTM.com

new alerts:

2 for Unreachable code

josef-pkt · 2021-10-25T15:54:19Z

If we want to generically support a Binomial count MLE model, then the MultiLinkModel would also need to support the case where we have only a single darg, link and exog.

Aside: the Zipf/Zipfian distribution is also a one parameter count distribution, with possibly finite support 0, ... n

josef-pkt · 2021-12-02T16:45:25Z

decision: (i)links go into the dfamilies
two reasons:

- reduce signature of model.__init__: no *args or list of links needed as a separate argument (exogs and offsets already make the signature messy enough); _init_keys only needs dfamily
- easier to define default links that are specific to a family or groups of families (e.g. NBP needs exp (log-link) for the dispersion parameter, GPP can use the identity link)

I might still add a model.links attribute as shortcut and consistency with models that don't have (d)families.

I'm still not sure how to handle a variable number of exogs and offsets. If I put them in a list or tuple, then super in base datahandling will not handle missing values across data arrays.
If I put them all in **kwargs, then the subclass needs to define a list or tuple of exog.
I used this already for score_test in multi parameter (dargs) models.

As reference implementation:
Stata's parametric survival models (like weibull) allow exog_xxx for the additional parameters, i.e. they are multi-exog models.
Has predefined links, no link option.
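A toy sketch of the design decision that links live in the family, with extra exogs passed by keyword. All class names (`ToyFamily`, `ToyMultiLinkModel`, `LogLink`, `IdentityLink`) are hypothetical and much simpler than the classes in this PR; missing-value handling across the data arrays is not addressed.

```python
import numpy as np


class IdentityLink:
    def inverse(self, linpred):
        return np.asarray(linpred, dtype=float)


class LogLink:
    def inverse(self, linpred):
        return np.exp(linpred)


class ToyFamily:
    """Family with family-specific default links: identity for mean, log for scale."""

    default_links = (IdentityLink, LogLink)

    def __init__(self, links=None):
        if links is None:
            links = [link() for link in self.default_links]
        self.links = list(links)


class ToyMultiLinkModel:
    def __init__(self, endog, exog, family, **kwds):
        self.endog = np.asarray(endog, dtype=float)
        self.family = family
        # Additional exogs arrive as keywords such as exog_scale.
        extra = [np.asarray(kwds[k], dtype=float)
                 for k in sorted(kwds) if k.startswith("exog_")]
        self.exogs = [np.asarray(exog, dtype=float)] + extra

    @property
    def links(self):
        # Shortcut, consistent with models that have no (d)family.
        return self.family.links

    def predict_args(self, params_list):
        """Map one parameter vector per exog to distribution args."""
        return [link.inverse(ex @ np.asarray(p))
                for link, ex, p in zip(self.links, self.exogs, params_list)]


ones = np.ones((3, 1))
model = ToyMultiLinkModel([1.0, 2.0, 3.0], ones, ToyFamily(), exog_scale=ones)
mean, scale = model.predict_args([[2.0], [0.0]])
```

The model signature only carries the family, and the family decides which link applies to which distribution argument.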

josef-pkt · 2021-12-02T18:32:45Z

I think the way forward here is to keep this PR, and make new PRs with refactored versions.
And then decide which version to merge.

josef-pkt · 2022-05-09T16:48:50Z

postponed for now, likely for 0.15

(I had applied for a small grant, but my proposal wasn't good enough to get it)

josef-pkt added comp-distributions type-enh comp-othermod labels Oct 12, 2021

josef-pkt force-pushed the multilink branch from 20f8bc0 to 438f79c Compare October 13, 2021 14:11

josef-pkt added 2 commits October 13, 2021 11:09

ENH: add distribution.dfamilies

0197fec

Compat/TST: handle betabinom not available in scipy < 1.4.0

a523aad

josef-pkt force-pushed the multilink branch from 438f79c to a523aad Compare October 13, 2021 15:21

ENH: add Gamma family, start unittest for derivatives

5f438e1

josef-pkt mentioned this pull request Oct 13, 2021

ENH: add dispersion models, heteroscedasticity with full MLE #7778

Open

ENH: add MultiLinkModel and dispersion models, copied from statsmodel…

f330c28

…s#7778

josef-pkt mentioned this pull request Oct 13, 2021

ENH: censored and truncated models with heteroscedasticity with loc - scale #7797

Open

ENH: add experimental _MultiLinkFamilyModel

1fb1998

josef-pkt force-pushed the multilink branch from 2c30dea to 1fb1998 Compare October 14, 2021 17:42

This was referenced Oct 15, 2021

DOC: example oneway or multiway anova, GaussianHet, HC cov_type #7803

Open

GLSHet status ? #647

Open

josef-pkt mentioned this pull request Oct 18, 2021

ENH: Het Models with heteroscedasticity depending on mean #7807

Open

BUG/REF: add df_null, fix df_model

76e9b75

josef-pkt mentioned this pull request Oct 25, 2021

ENH: Diagnostic class rebased #7597

Merged

This was referenced Oct 27, 2021

ENH: score_test for multi-link and multi-parameter (multi distribution args) models #7821

Open

SUMM roadmap 0.14 josef #7720

Open

This was referenced Nov 11, 2021

ENH: add predict and get_prediction for which="cdf" #7873

Open

ENH: overdispersed Binomial, Beta-Binomial #2632

Open

josef-pkt mentioned this pull request Nov 27, 2021

ENH/REF: Scoretest betareg #7907

Merged

josef-pkt mentioned this pull request Dec 2, 2021

ENH/Design: Mixin, Subclasses for censoring, truncation, .... #7922

Open

josef-pkt mentioned this pull request Feb 21, 2022

ENH: discrete models with identity link #8144

Open

josef-pkt added this to the 0.15 milestone Jun 22, 2022

josef-pkt mentioned this pull request Mar 26, 2023

ENH: Are there any options to allow something like a GLS or WLS in the Ordered Models? #8759

Open
