-
Notifications
You must be signed in to change notification settings - Fork 2.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: added intervening_variable.py and tests. Updated docs accordingly #8733
base: main
Are you sure you want to change the base?
Conversation
:class:`~statsmodels.stats.intervening_variable.InterveningVariables` creates | ||
confidence intervals for the indirect effect using Sobel's classical method | ||
as well as bootstrapping techniques. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this file is for 0.13 which was already released
This PR is a candidate for 0.15
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see any release notes after 0.13. Should I add a document?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The release notes are mostly autogenerated. So we don't add them until a release.
You could park this section at the bottom of the top comment of the PR. Then we can copy it over when we have the release note file.
(We should find some better way to add to release notes, but don't have one yet)
Hi, thanks for the pull request I'm not really familiar with details in this area. How much does this differ from the current stats.mediation analysis? The mediation module has around 400 lines, so it would be possible to add it to that module, or at least make that the public access to this. @kshedden Do you have time to look at this or comment on the background for this? |
Hi @josef-pkt , thanks for checking-in. The intent of these techniques are the same as those in stats.Mediation, though the implementation and background assumptions are very different. Confusingly, both are often referred to as 'Mediation Analysis'; however, I followed the convention of Mackinnon, et al. (2002) and reserved the name 'Mediation Analysis' to refer to the causal inference methods implemented in stats.Mediation and the name 'Intervening Variable Analysis' to refer to the product of coefficients methods I implemented. I prefer this differentiation because they really are different techniques with different applications and there is minimal code-sharing between the two implementations. In addition, stats.Mediation can be significantly expanded to include non-binary treatment and mediation variables, as well as sensitivity parameter analysis. My proposed module could also be significantly expanded to include ~14 other intervening variable techniques. In my view, it makes sense to keep them as separate modules. Also, do I need to do anything about the statsmodels.statsmodels failure below? I'm not sure what that is indicating. Cody |
you have 3 failures like
code should be robust to several versions of pandas. I'm not familiar with the error here. Also you have style pep-8 violations, mainly trailing whitespaces and lines too long To the content |
ok, based on a very quick look
Rubin versus Pearl? It's better to keep them separate, but I'm not sure yet how we want to frame this. |
Yes, that's right. Pandas error and pep-8 are easy fixes. I'll update the code in the next several days. Thanks... |
…way indirect proportion value is calculated to more align with R packages
more general thought, independently of merging this PR I don't know yet where in which statsmodels structure this should be going. I always have problems finding names and categories for new folders. Mediation analysis, traditional or treatment effect, seems to become much more popular in recent years. I just did a quick google search for "mediation with instrumental variables" and there are a pretty large number of recent articles. roughly related areas that we have missing
I don't know how to group and where to put those. #7691 for cases with IV or both treatment and outcome model. (Aside: I'm against using the word "causal" only in the Rubin tradition. OLS and 2SLS are also "causal" with appropriate assumptions. "causal" != "non-parametrically identified average effect under ignorable unobserved heterogeneity" :) |
I agree about 'causal', Rubin followers owning the word feels a bit elitist. There does seem to be a lot of literature coming out on the subject and it is quite technical. I'll leave naming the folders to you :), though 'causal inference' seems plausible. Anything else I need to do to merge this PR? statsmodels.statsmodels error doesn't seem to be from me. |
I have not looked much at your code yet. Overall looks good, but I have not checked the details yet. |
Yeah, I don't like the black parentheses either, but the other code linter I have doesn't remove trailing whitespace and I was feeling lazy. Please reach out if you have any questions! |
I always remove trailing whitespace with my code editor, when I see those in my git gui. (I usually just set a key shortcut to F12 ) |
your non-black first commit looks much easier to read. |
Sorry. I can go back and redo it without the black, if you like. |
I'm trying to think whether we want (pure SEM, system of equations would still be outside of this, for now system of equation is only in linearmodels package) I still don't like the word "causal" because it depends on identifying assumptions that we don't (or the model doesn't) know whether they hold, i.e. causality is an interpretation of the model or estimated effects. It does not really describe an algorithm or estimator. (The proper name for methods will be more explicit about the underlying assumptions on the data generating process and how it is handled, e.g. IVPoisson (parametric with endogenous regressors) or "non-parametrically identified ate under endogeneity with conditional independence or ignorability") |
I'm still browsing literature (on various "causal" issues) I think |
Regarding names- I would also be hesitant to have this module and other pathway analysis stuff in a folder labelled 'causal' for the same reasons you mentioned. They are not causal analyses without further assumptions. Regarding standard error- yes, there are many, many ways to compute the standard error in this context. See the Mackinnon paper referenced above for a good overview. However, in my view, the main point of this module is the confidence intervals, not the standard error. The bootstrap is central in this regard because it makes no assumptions about the distribution of the product of coefficients, while many of the other methods (including Sobel) assume normality (which isn't true). I think that if this model were totally built out to include 14+ ways for computing the confidence interval for the indirect effect, then I would agree that |
I'm getting more in favor of statsmodels.causal as umbrella We need "causal analysis", however that's defined. :) |
I’m not familiar with that lecture! I’ll have to check it out.
…On Friday, March 17, 2023, Josef Perktold ***@***.***> wrote:
I'm getting more in favor of statsmodels.causal as umbrella
I was just reading parts of Guido Imbens' nobel prize lecture and Jim
Heckman on Haavelmo
We need "causal analysis", however that's defined. :)
—
Reply to this email directly, view it on GitHub
<#8733 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AMZZLQKEKNEUAMXPZLRJ5D3W4TL5PANCNFSM6AAAAAAV4FJLZE>
.
You are receiving this because you authored the thread.Message ID:
***@***.***>
|
Hi Josef,
Just checking-in if there's anything I need to do here.
Thanks!
…On Fri, Mar 17, 2023 at 3:32 PM Cody Dance ***@***.***> wrote:
I’m not familiar with that lecture! I’ll have to check it out.
On Friday, March 17, 2023, Josef Perktold ***@***.***>
wrote:
> I'm getting more in favor of statsmodels.causal as umbrella
> I was just reading parts of Guido Imbens' nobel prize lecture and Jim
> Heckman on Haavelmo
>
> We need "causal analysis", however that's defined. :)
>
> —
> Reply to this email directly, view it on GitHub
> <#8733 (comment)>,
> or unsubscribe
> <https://github.com/notifications/unsubscribe-auth/AMZZLQKEKNEUAMXPZLRJ5D3W4TL5PANCNFSM6AAAAAAV4FJLZE>
> .
> You are receiving this because you authored the thread.Message ID:
> ***@***.***>
>
|
Hi Josef,
Are you planning to approve this? Is there anything that needs to be done?
Cody
…On Wed, Apr 5, 2023 at 10:16 AM Cody Dance ***@***.***> wrote:
Hi Josef,
Just checking-in if there's anything I need to do here.
Thanks!
On Fri, Mar 17, 2023 at 3:32 PM Cody Dance ***@***.***> wrote:
> I’m not familiar with that lecture! I’ll have to check it out.
>
> On Friday, March 17, 2023, Josef Perktold ***@***.***>
> wrote:
>
>> I'm getting more in favor of statsmodels.causal as umbrella
>> I was just reading parts of Guido Imbens' nobel prize lecture and Jim
>> Heckman on Haavelmo
>>
>> We need "causal analysis", however that's defined. :)
>>
>> —
>> Reply to this email directly, view it on GitHub
>> <#8733 (comment)>,
>> or unsubscribe
>> <https://github.com/notifications/unsubscribe-auth/AMZZLQKEKNEUAMXPZLRJ5D3W4TL5PANCNFSM6AAAAAAV4FJLZE>
>> .
>> You are receiving this because you authored the thread.Message ID:
>> ***@***.***>
>>
>
|
added intervening variable analysis- classical sobel method and bootstrapping
Notes:
needed for doc changes.
then show that it is fixed with the new code.
verify you changes are well formatted by running
flake8
is installed. This command is also available on Windowsusing the Windows System for Linux once
flake8
is installed in thelocal Linux environment. While passing this test is not required, it is good practice and it help
improve code quality in
statsmodels
.