ENH: added intervening_variable.py and tests. Updated docs accordingly #8733

codydance · 2023-03-15T17:40:48Z

Added intervening variable analysis: the classical Sobel method and bootstrapping.

Notes:

It is essential that you add a test when making code changes. Tests are not
needed for doc changes.
When adding a new function, test values should usually be verified in another package (e.g., R/SAS/Stata).
When fixing a bug, you must add a test that would produce the bug in main and
then show that it is fixed with the new code.
New code additions must be well formatted. Changes should pass flake8. If on Linux or OSX, you can
verify your changes are well formatted by running
```
git diff upstream/main -u -- "*.py" | flake8 --diff --isolated
```
assuming flake8 is installed. This command is also available on Windows
using the Windows Subsystem for Linux once flake8 is installed in the
local Linux environment. While passing this test is not required, it is good practice and it helps
improve code quality in statsmodels.
Docstring additions must render correctly, including escapes and LaTeX.

josef-pkt · 2023-03-15T18:19:56Z

docs/source/release/version0.13.0.rst

+:class:`~statsmodels.stats.intervening_variable.InterveningVariables` creates 
+confidence intervals for the indirect effect using Sobel's classical method 
+as well as bootstrapping techniques.
+


this file is for 0.13 which was already released

This PR is a candidate for 0.15

codydance · 2023-03-16

I don't see any release notes after 0.13. Should I add a document?

josef-pkt · 2023-03-16

The release notes are mostly autogenerated, so we don't add them until a release.

You could park this section at the bottom of the top comment of the PR. Then we can copy it over when we have the release note file.

(We should find some better way to add to release notes, but don't have one yet)

josef-pkt · 2023-03-15T18:28:17Z

Hi, thanks for the pull request

I'm not really familiar with details in this area.

How much does this differ from the current stats.mediation analysis?
Is there a possibility of common interface, code sharing, ... with it?

The mediation module has around 400 lines, so it would be possible to add it to that module, or at least make that the public access to this.

@kshedden Do you have time to look at this or comment on the background for this?

codydance · 2023-03-15T20:22:16Z

Hi @josef-pkt, thanks for checking in.

The intent of these techniques is the same as that of the methods in stats.Mediation, though the implementation and background assumptions are very different. Confusingly, both are often referred to as 'Mediation Analysis'; however, I followed the convention of MacKinnon et al. (2002) and reserved the name 'Mediation Analysis' for the causal inference methods implemented in stats.Mediation and the name 'Intervening Variable Analysis' for the product-of-coefficients methods I implemented. I prefer this differentiation because they really are different techniques with different applications, and there is minimal code sharing between the two implementations. In addition, stats.Mediation can be significantly expanded to include non-binary treatment and mediation variables, as well as sensitivity parameter analysis. My proposed module could also be significantly expanded to include ~14 other intervening variable techniques. In my view, it makes sense to keep them as separate modules.

Also, do I need to do anything about the statsmodels.statsmodels failure below? I'm not sure what that is indicating.

Cody

josef-pkt · 2023-03-15T20:41:05Z

you have 3 failures like

        cur_dir = os.path.dirname(os.path.abspath(__file__))
>       data = pd.read_csv(os.path.join(cur_dir, 'results', "mackinnon2008.csv",
                                        index_col='id'))
E       TypeError: join() got an unexpected keyword argument 'index_col'

code should be robust to several versions of pandas. I'm not familiar with the error here.
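
For reference, the traceback points at parenthesis placement rather than at a pandas version issue: `os.path.join(...)` is closed after `index_col='id'`, so the keyword is passed to `join()` instead of `read_csv()`. A minimal sketch of the corrected call follows; the temporary directory and the column names are stand-ins invented for illustration, not the contents of the real `mackinnon2008.csv`.

```python
import os
import tempfile

import pandas as pd

# Stand-in for the test module's directory; in the PR this is
# os.path.dirname(os.path.abspath(__file__)).
cur_dir = tempfile.mkdtemp()
os.makedirs(os.path.join(cur_dir, "results"))
csv_path = os.path.join(cur_dir, "results", "mackinnon2008.csv")
with open(csv_path, "w") as fh:
    # Hypothetical columns, only here to make the sketch runnable.
    fh.write("id,x,m,y\n1,0.5,1.2,2.0\n2,1.5,0.7,1.1\n")

# Close os.path.join(...) first, then pass index_col to read_csv.
data = pd.read_csv(
    os.path.join(cur_dir, "results", "mackinnon2008.csv"),
    index_col="id",
)
```

With the parenthesis moved, `index_col` reaches `read_csv`, which has accepted that keyword across pandas versions.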

Also you have PEP 8 style violations, mainly trailing whitespace and overly long lines
https://dev.azure.com/statsmodels/statsmodels-testing/_build/results?buildId=5061&view=logs&j=cee8c96f-7e65-5602-f593-266823630fd5&t=0bf77771-d04a-5c5f-6285-ba31ad6bef7d

To the content
I will have to read up to understand more of the background, and the distinction between the two approaches.
I recently downloaded some mediation articles to get an overview, but did not get around to reading them yet.

josef-pkt · 2023-03-15T20:55:35Z

ok, based on a very quick look

mediation is related to average treatment effect literature (statsmodels.treatment)
while the intervening variable analysis comes more from the system of equation, linear structural equations, multivariate modelling, path analysis literature.

Rubin versus Pearl?

It's better to keep them separate, but I'm not sure yet how we want to frame this.

codydance · 2023-03-15T22:41:55Z

Yes, that's right.

Pandas error and pep-8 are easy fixes. I'll update the code in the next several days.

Thanks...

josef-pkt · 2023-03-16T19:59:06Z

more general thought, independently of merging this PR

I don't know yet where in the statsmodels structure this should go. I always have problems finding names and categories for new folders.

Mediation analysis, traditional or treatment effect, seems to have become much more popular in recent years.
Eventually we need to get this out of stats and into a more dedicated folder.

I just did a quick google search for "mediation with instrumental variables" and there are a pretty large number of recent articles.
There is also a literature for mediation analysis for Poisson, Logit or similar.

Roughly related areas that we are missing:

- SEM, system of equations (not a high priority)
- IV parametric: currently minimal IV in sandbox for continuous treatment, nothing yet for binary treatment
- IV parametric equivalent for Poisson and other nonlinear models (control function): nothing yet
- treatment effect, non-parametric with ignorability: basic linear outcome, binary treatment in treatment
- IV-treatment effect, non-parametric: nothing yet (Stata has it)
- mediation: what's here and in stats.mediation
- IV-mediation: ???

I don't know how to group and where to put those. #7691 for cases with IV or both treatment and outcome model.

(Aside: I'm against using the word "causal" only in the Rubin tradition. OLS and 2SLS are also "causal" with appropriate assumptions. "causal" != "non-parametrically identified average effect under ignorable unobserved heterogeneity" :)

codydance · 2023-03-16T20:40:10Z

I agree about 'causal'; Rubin followers owning the word feels a bit elitist. There does seem to be a lot of literature coming out on the subject, and it is quite technical.

I'll leave naming the folders to you :), though 'causal inference' seems plausible.

Anything else I need to do to merge this PR? statsmodels.statsmodels error doesn't seem to be from me.

josef-pkt · 2023-03-16T20:51:13Z

I have not looked much at your code yet. Overall looks good, but I have not checked the details yet.
(I have a big problem reading "black"ened parentheses and indentation. Skimming code takes much longer.)

codydance · 2023-03-16T20:54:00Z

Yeah, I don't like the black parentheses either, but the other code linter I have doesn't remove trailing whitespace and I was feeling lazy.

Please reach out if you have any questions!

josef-pkt · 2023-03-16T21:17:56Z

I always remove trailing whitespace with my code editor when I see it in my git GUI. (I usually just set a key shortcut to F12.)

josef-pkt · 2023-03-16T21:20:49Z

your non-black first commit looks much easier to read.

codydance · 2023-03-16T23:25:51Z

Sorry. I can go back and redo it without the black, if you like.

josef-pkt · 2023-03-17T14:05:13Z

I'm trying to think whether we want statsmodels.causal as an umbrella folder.
I wanted to avoid the word "causal", but it would be a useful umbrella for anything with ignorable or non-ignorable endogeneity, both traditional (parametric) and ATE/Rubin style, from Heckman to treatment, mostly 2 equation or 2 stage models,
including IVPoisson or PoissonGMM.
(Not sure whether this should also include endogenous missing, i.e. missing not completely at random. I guess yes.)
Throwing everything into causal would also work against having the name completely taken over by the Rubin tradition.
(I would have liked statsmodels.endogenity but we already got too many complaints about endog and exog)

(pure SEM, system of equations would still be outside of this, for now system of equation is only in linearmodels package)

I still don't like the word "causal" because it depends on identifying assumptions, and we don't (or the model doesn't) know whether they hold, i.e. causality is an interpretation of the model or estimated effects. It does not really describe an algorithm or estimator. (The proper name for methods will be more explicit about the underlying assumptions on the data generating process and how it is handled, e.g. IVPoisson (parametric with endogenous regressors) or "non-parametrically identified ATE under endogeneity with conditional independence or ignorability")
The only reason to use the name causal is name recognition.

josef-pkt · 2023-03-17T16:32:51Z

I'm still browsing literature (on various "causal" issues)

I think fit here should get a cov_type option, so we don't need to compute bootstrap standard errors unless requested.
Most likely there will be other options for how to compute standard errors, e.g. is it possible to compute heteroskedasticity or correlation robust standard errors?

codydance · 2023-03-17T17:40:44Z

Regarding names: I would also be hesitant to have this module and other pathway analysis stuff in a folder labelled 'causal', for the same reasons you mentioned. They are not causal analyses without further assumptions.

Regarding standard errors: yes, there are many, many ways to compute the standard error in this context. See the MacKinnon paper referenced above for a good overview. However, in my view, the main point of this module is the confidence intervals, not the standard error. The bootstrap is central in this regard because it makes no assumptions about the distribution of the product of coefficients, while many of the other methods (including Sobel) assume that the product is normally distributed (which it generally is not).

I think that if this model were totally built out to include 14+ ways for computing the confidence interval for the indirect effect, then I would agree that fit() should take an argument specifying the method. For the time being, however, I think it's useful to be able to see all the information together. For what it's worth, the 3 R packages I have seen display all the information in a single table.
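
To make the comparison concrete, here is a rough sketch of the two intervals discussed above, written with plain NumPy. It is not the code from this PR: the function names `indirect_effect` and `bootstrap_ci`, the simulated data, and the 1.96 normal quantile are assumptions chosen for illustration. It fits the mediator model `m ~ x` and the outcome model `y ~ x + m`, forms the product `a*b`, and compares Sobel's first-order standard error with a percentile bootstrap.

```python
import numpy as np


def _ols(X, y):
    """Return OLS coefficients and their classical standard errors."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = X.shape[0] - X.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta, np.sqrt(np.diag(cov))


def indirect_effect(x, m, y):
    """Product-of-coefficients estimate a*b with Sobel's standard error."""
    ones = np.ones_like(x)
    # Mediator model: m = i1 + a*x + e1
    beta_m, se_m = _ols(np.column_stack([ones, x]), m)
    # Outcome model: y = i2 + c'*x + b*m + e2
    beta_y, se_y = _ols(np.column_stack([ones, x, m]), y)
    a, sa = beta_m[1], se_m[1]
    b, sb = beta_y[2], se_y[2]
    # Sobel (first-order) standard error of the product a*b.
    sobel_se = np.sqrt(a**2 * sb**2 + b**2 * sa**2)
    return a * b, sobel_se


def bootstrap_ci(x, m, y, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for a*b, resampling rows."""
    rng = np.random.default_rng(seed)
    n = len(x)
    draws = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)
        draws[i], _ = indirect_effect(x[idx], m[idx], y[idx])
    return np.quantile(draws, [alpha / 2, 1 - alpha / 2])


# Simulated data (not the MacKinnon data set): true a = 0.5, b = 0.4.
rng = np.random.default_rng(12345)
x = rng.normal(size=200)
m = 0.5 * x + rng.normal(size=200)
y = 0.3 * x + 0.4 * m + rng.normal(size=200)

ab, se = indirect_effect(x, m, y)
# Sobel interval relies on a normal approximation for a*b.
sobel_ci = (ab - 1.96 * se, ab + 1.96 * se)
# Percentile interval makes no distributional assumption about a*b.
boot_ci = bootstrap_ci(x, m, y)
```

The Sobel interval is symmetric around `a*b` by construction, while the percentile interval can be asymmetric, which is the practical reason for reporting both side by side.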

josef-pkt · 2023-03-17T21:49:00Z

I'm getting more in favor of statsmodels.causal as umbrella
I was just reading parts of Guido Imbens' Nobel Prize lecture and Jim Heckman on Haavelmo

We need "causal analysis", however that's defined. :)

codydance · 2023-03-17T22:32:37Z

I’m not familiar with that lecture! I’ll have to check it out.

codydance · 2023-04-05T17:16:20Z

Hi Josef, just checking in to see if there's anything I need to do here. Thanks!

codydance · 2023-05-12T18:28:41Z

Hi Josef, Are you planning to approve this? Is there anything that needs to be done? Cody

ENH: added intervening_variable.py and tests. Updated docs accordingly

beffcce

josef-pkt reviewed Mar 15, 2023

improved pep-8 linting, improved testing, fixed testing bug, updated …

58674a1

…way indirect proportion value is calculated to more align with R packages

josef-pkt mentioned this pull request Mar 17, 2023

ENH/Design module structure for IV, GMM, endogeneity #7691

Open

josef-pkt mentioned this pull request Mar 17, 2023

ENH: path analysis and directed acyclic graphs DAG #8739

Open

josef-pkt mentioned this pull request Aug 10, 2023

Structural Equation Modelling (CB-SEM & PLS-SEM) #8966

Open
