Question: How/where does this compare to EconML #88

Closed
tszumowski opened this issue Dec 3, 2019 · 3 comments
Labels
discussion Discussion about causal inference and DoWhy's roadmap.

Comments

@tszumowski

I've been experimenting with DoWhy recently and really enjoy the structure. I noticed some other libraries out there, such as EconML (mentioned here) and Uber's CausalML. I'll focus on EconML for this discussion, particularly because I see some recent PRs that brought in EconML's CATE estimator.

I'd just like to confirm the differences and overlap between DoWhy and EconML. Please let me know if I understand this correctly.

My Attempt at Comparisons

Let's start with DoWhy's structure:

  1. model (make assumptions),
  2. identify (find what to estimate given the assumptions),
  3. estimate,
  4. refute (sensitivity and robustness checks).

1. Model

  • DoWhy: Provides the ability to explicitly define complex causal graphs or, alternatively (though not preferred?), to specify common confounders directly.
  • EconML: I didn't see a means to define the causal graph other than through the variable definitions Y, T, X, W, Z.

2. Identify

  • DoWhy: Hunts down causal effects using graph analysis and do-calculus
  • EconML: Not sure I saw this explicitly in the library?

3. Estimate

  • DoWhy: Backdoor, instrumental variables, and most recently do-sampling (Which is aweeesoome!)
  • EconML: Seems to be where EconML currently shines. There's a whole slew of approaches, many implementing methods from very recent ML research papers.

4. Refute

  • DoWhy: Heavy focus on model validation with several methods
  • EconML: I didn't notice anything explicit.
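To make the "refute" idea concrete, here is a minimal sketch of one kind of robustness check, a placebo treatment test, written in plain Python. It is only an illustration under assumed data (the simulated effect of 2.0 and the helper names `simulate`, `diff_in_means`, and `placebo_effects` are hypothetical) and is not DoWhy's implementation. The idea: if the real treatment is replaced by a randomly shuffled one, the estimated effect should drop to roughly zero.

```python
import random

random.seed(2)

def simulate(n):
    # Hypothetical data: randomized binary treatment with a true effect of 2.0.
    rows = []
    for _ in range(n):
        t = random.randint(0, 1)
        y = 2.0 * t + random.gauss(0.0, 1.0)
        rows.append((t, y))
    return rows

def diff_in_means(rows):
    treated = [y for t, y in rows if t == 1]
    control = [y for t, y in rows if t == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)

def placebo_effects(rows, repeats):
    # Swap the real treatment for a shuffled copy and re-estimate each time.
    treatments = [t for t, _ in rows]
    outcomes = [y for _, y in rows]
    effects = []
    for _ in range(repeats):
        random.shuffle(treatments)
        effects.append(diff_in_means(list(zip(treatments, outcomes))))
    return effects

rows = simulate(5000)
original = diff_in_means(rows)
placebo = placebo_effects(rows, repeats=20)
mean_placebo = sum(placebo) / len(placebo)
print(f"original estimate: {original:.2f}, mean placebo estimate: {mean_placebo:.2f}")
```

If the placebo estimate stayed far from zero, that would suggest the estimator is picking up something other than the treatment.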

Where they overlap

It seems to me they overlap most heavily in the Estimation section. That's where I saw some references here on the roadmap to bringing in EconML calls. Is that correct?

Question on estimators and terminology

I see EconML has the following called out for estimators:

  • Potential Outcomes
  • Structural Equations
  • CATE

What is the difference between the estimators listed above and the ones built into DoWhy? I was just struggling to connect the dots there.

Thank you.

@amit-sharma
Member

Thanks @tszumowski for starting this discussion. Yes, you are right: DoWhy and EconML overlap only in the estimate step. DoWhy implements the full process of causal reasoning, including model, identify, estimate and refute. In comparison, EconML implements only the estimate step.

In designing DoWhy, we kept a focus on the "ideal" process of doing a causal analysis, which includes identification and, more importantly, refutation so that modeling assumptions can be tested. The estimators in DoWhy currently are the standard estimators for causal inference. As you rightly point out, EconML has much more advanced estimators for estimating the conditional average treatment effect (CATE). This is why we are implementing an interface in DoWhy so that you can call EconML methods directly from DoWhy's estimate function. Here's an experimental Jupyter notebook to see it in action.

For your second question, there are actually two considerations when designing a causal analysis. One is about the modeling framework, and the other is about the target estimand.

Potential outcomes and structural equations are ways to construct a causal model. Another way is to use a structural causal model, which is based on a graphical model. The differences between these frameworks are often a matter of detail (and of big academic debate). But in practice, both EconML and DoWhy are compatible with these different ways of expressing a causal model. In DoWhy specifically, we use the structural causal model framework in the identify step, and rely heavily on methods derived from the potential outcomes framework in the estimate step.
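As a rough sketch of how the two steps fit together, consider a hypothetical graph W -> T, W -> Y, T -> Y. Reasoning on the graph (identify) says that adjusting for W blocks the only backdoor path, which gives the estimand sum_w P(W=w) * (E[Y|T=1,W=w] - E[Y|T=0,W=w]). A potential-outcomes style estimator (estimate) then computes that quantity from data, here by simple stratification. The data-generating numbers and function names below are assumptions for illustration, and this is plain Python rather than DoWhy code.

```python
import random

random.seed(0)

TRUE_EFFECT = 2.0  # assumed ground truth for the simulation

def simulate(n):
    rows = []
    for _ in range(n):
        w = 1 if random.random() < 0.5 else 0
        # W raises the chance of treatment ...
        t = 1 if random.random() < (0.8 if w else 0.2) else 0
        # ... and also raises the outcome, which confounds T and Y.
        y = TRUE_EFFECT * t + 3.0 * w + random.gauss(0.0, 1.0)
        rows.append((w, t, y))
    return rows

def mean(values):
    return sum(values) / len(values)

def naive_difference(rows):
    # Ignores the graph: compares treated and control units directly.
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    return mean(treated) - mean(control)

def backdoor_adjusted(rows):
    # Uses the identified estimand: stratify on W, weight each stratum by P(W=w).
    total = 0.0
    for w in (0, 1):
        stratum = [(t, y) for w_i, t, y in rows if w_i == w]
        treated = [y for t, y in stratum if t == 1]
        control = [y for t, y in stratum if t == 0]
        total += (len(stratum) / len(rows)) * (mean(treated) - mean(control))
    return total

rows = simulate(20000)
naive = naive_difference(rows)
adjusted = backdoor_adjusted(rows)
print(f"naive: {naive:.2f}, adjusted: {adjusted:.2f}, true: {TRUE_EFFECT}")
```

The naive comparison is biased upward because W drives both treatment and outcome, while the adjusted estimate should land close to the simulated effect. The graph decides what to adjust for; the estimator only carries it out.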

The other consideration is the target estimand: do you want an effect for the full population (average treatment effect, ATE) or for a specific population, e.g., conditioned on "Gender=Female", (conditional average treatment effect, CATE). EconML methods are designed to estimate CATE, which is a subject of active research. Most of DoWhy's methods focus on estimating the ATE so far, although we are extending some of the methods to also estimate CATE.
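To illustrate the difference between the two estimands, here is a small sketch on simulated data with a randomized treatment, so a plain difference in means is a valid estimator. The effect sizes (3.0 for "Female", 1.0 for "Male") and the helper names are assumptions made up for this example; real CATE estimators such as those in EconML handle many covariates and confounding, which this sketch does not.

```python
import random

random.seed(1)

# Assumed ground truth: the treatment effect differs by gender.
EFFECT = {"Female": 3.0, "Male": 1.0}

def simulate(n):
    rows = []
    for _ in range(n):
        gender = random.choice(["Female", "Male"])
        t = random.randint(0, 1)  # randomized treatment, so no confounding
        y = EFFECT[gender] * t + random.gauss(0.0, 1.0)
        rows.append((gender, t, y))
    return rows

def diff_in_means(rows):
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)

rows = simulate(20000)

# ATE: one number for the full population.
ate = diff_in_means(rows)

# CATE: one number per subgroup, e.g. conditioned on Gender=Female.
cate = {g: diff_in_means([r for r in rows if r[0] == g]) for g in EFFECT}

print(f"ATE: {ate:.2f}")
for g, value in cate.items():
    print(f"CATE (Gender={g}): {value:.2f}")
```

With equal-sized groups, the ATE sits between the two conditional effects, which is why a single population-level number can hide a lot of variation.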

Hope this helps.

@amit-sharma amit-sharma added the discussion Discussion about causal inference and DoWhy's roadmap. label Dec 4, 2019
@tszumowski
Author

Wow. @amit-sharma thank you for the fantastic summary. I've only recently started ramping up on causal analysis, coming from a stats/ML/Bayesian background. I see what you mean by "big academic debate", as I was getting a bit lost in the various methods. This clarified a lot for me.

Regarding the EconML integration, I ran into an issue when trying to run that notebook. I'll create a separate issue for that. Otherwise, this ticket can be closed. Thank you.

@tonyabracadabra

we use the structural causal model framework in the identify step

What do you mean by this? Is the output of the identify step used as input for the estimate step? How are we integrating the prior into the estimation?
