Causal Inference in Data Science

A computational introduction to causality and counterfactual reasoning with Python

This repository contains the exercises and data for the Causal Inference in Data Science Live Training, a hands-on guide to applying causal inference to real-world data science tasks. Using an end-to-end example, we will walk through posing a causal hypothesis, modeling our beliefs with causal graphs, estimating causal effects with the DoWhy library in Python, and evaluating the soundness of our results. Rather than taking an abstract, mathematical approach to these steps, the training focuses on accessible computational methods for answering causal questions within a data science workflow.

If you find any errors in the code or materials, please open a GitHub issue in this repository, or reach out to me directly via email at jondinu@gmail.com or on Twitter @jonathandinu.

What you'll learn and how you can apply it

  • Understand how to reason causally and why it is necessary for modern data science.
  • Use the DoWhy library to build, estimate, and evaluate causal models.
  • Learn how to practically apply causal inference to real-world data science problems.

This training course is for you because...

  • You have taken an introductory data science course or statistics course but want to take the next step to understand the foundations of causal inference and how to effectively apply the theory to real-world problems.
  • You have heard about the power of causal reasoning, but do not know how to get started learning its basics or applying it to your own problems.
  • You are an aspiring data scientist looking to break into the field and need to learn the practical skills necessary for what you will encounter on the job.
  • You are a quantitative researcher interested in applying theory to real projects by taking a computational approach to causal inference.
  • You are a software engineer interested in leveraging analytics to augment your application development process.

Prerequisites

  • Experience with an object-oriented programming language, e.g., Python (all code demos during the training will be in Python)
  • Familiarity with basic probability and statistics (e.g., distributions and hypothesis testing)
  • A working knowledge of the scientific Python libraries (NumPy, pandas, and scikit-learn) is helpful but not required

Course Set-up

Download the appropriate Python 3.7 Anaconda Distribution for your operating system: https://www.anaconda.com/distribution/

Recommended Preparation

Data

Schedule/Outline

The time frames are only estimates and may vary according to how the class is progressing.

Identifying Causal Effects (50min)

  • Randomized Controlled Trials
  • Counterfactuals and Potential Outcomes
  • Causal Graphical Models
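
As a rough illustration of why randomization matters for identification, the sketch below simulates potential outcomes with a single binary confounder and compares a naive difference in means on observational versus randomized data. It is not taken from the course notebooks. The data-generating process, the true effect of 2.0, and the helper names `simulate` and `diff_in_means` are all assumptions made for this example.

```python
import random


def simulate(n, randomized, seed=0):
    """Return (treated, outcome) pairs; the true treatment effect is 2.0 (assumed)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.random() < 0.5                # binary confounder
        if randomized:
            p_treat = 0.5                     # coin flip, independent of z
        else:
            p_treat = 0.8 if z else 0.2       # z drives who gets treated
        t = rng.random() < p_treat
        y0 = 3.0 * z + rng.gauss(0, 1)        # potential outcome without treatment
        y1 = y0 + 2.0                         # potential outcome with treatment
        rows.append((t, y1 if t else y0))     # only one potential outcome is observed
    return rows


def diff_in_means(rows):
    """Naive estimate: mean outcome of treated minus mean outcome of controls."""
    treated = [y for t, y in rows if t]
    control = [y for t, y in rows if not t]
    return sum(treated) / len(treated) - sum(control) / len(control)


observational = diff_in_means(simulate(20000, randomized=False))
experimental = diff_in_means(simulate(20000, randomized=True))
print(f"observational: {observational:.2f}, randomized: {experimental:.2f}")
```

Under these assumptions the observational estimate absorbs the confounder's influence and overshoots the true effect, while the randomized estimate should land close to 2.0.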

Estimating Causal Effects (50min)

  • Propensity Score Matching
  • Instrumental Variables
  • Causal Effect Inference with Machine Learning
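
To make one of these estimators concrete, here is a minimal sketch of the instrumental variable (Wald) estimator on simulated data with an unobserved confounder. It uses only the standard library rather than DoWhy. The simulated variables, the coefficients, and the `covariance` helper are hypothetical choices for this example, not material from the training.

```python
import random


def covariance(xs, ys):
    """Population covariance of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / len(xs)


rng = random.Random(1)
n = 50000

# z: instrument, affects the outcome only through the treatment
z = [1.0 if rng.random() < 0.5 else 0.0 for _ in range(n)]
# u: unobserved confounder, affects both treatment and outcome
u = [rng.gauss(0, 1) for _ in range(n)]
# t: treatment, driven by the instrument and the confounder
t = [zi + ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
# y: outcome, with an assumed true treatment effect of 2.0
y = [2.0 * ti + 3.0 * ui + rng.gauss(0, 1) for ti, ui in zip(t, u)]

# Naive regression slope of y on t is biased because u is omitted
naive = covariance(t, y) / covariance(t, t)
# Wald / IV estimate: ratio of the instrument's covariances
iv = covariance(z, y) / covariance(z, t)
print(f"naive slope: {naive:.2f}, IV estimate: {iv:.2f}")
```

The IV ratio only recovers the effect here because the simulated instrument is independent of `u` and has no direct path to `y`. With real data those conditions have to be argued from domain knowledge.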

Break (10min)

Evaluating Causal Models (30min)

  • Random Confounders and Placebos
  • Cross Validation
  • Sensitivity Analysis
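
The placebo idea can be sketched in a few lines: estimate the effect with an adjustment for the confounder, then swap the real treatment for a shuffled one and check that the estimate collapses toward zero. This is a simplified, standard-library version written for illustration. The simulated data and the `adjusted_effect` helper are assumptions, and this is not how any particular library implements its refuters.

```python
import random

rng = random.Random(2)
n = 20000

# Simulated rows of (confounder z, treatment t, outcome y); true effect assumed to be 2.0
data = []
for _ in range(n):
    z = rng.random() < 0.5
    t = rng.random() < (0.8 if z else 0.2)
    y = 2.0 * t + 3.0 * z + rng.gauss(0, 1)
    data.append((z, t, y))


def adjusted_effect(rows):
    """Backdoor adjustment: within-stratum mean differences, weighted by stratum size."""
    total = 0.0
    for level in (False, True):
        stratum = [(t, y) for z, t, y in rows if z == level]
        treated = [y for t, y in stratum if t]
        control = [y for t, y in stratum if not t]
        diff = sum(treated) / len(treated) - sum(control) / len(control)
        total += diff * len(stratum) / len(rows)
    return total


estimate = adjusted_effect(data)

# Placebo check: shuffle the treatment column so it cannot cause the outcome
placebo_t = [t for _, t, _ in data]
rng.shuffle(placebo_t)
placebo_data = [(z, pt, y) for (z, _, y), pt in zip(data, placebo_t)]
placebo_estimate = adjusted_effect(placebo_data)

print(f"estimate: {estimate:.2f}, placebo estimate: {placebo_estimate:.2f}")
```

A placebo estimate far from zero would suggest the estimator is picking up something other than the treatment, such as a modeling error or leftover confounding.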

Discovering Causal Structure (25min)

  • Guess and Test
  • Automated Graph Discovery
  • Q&A

Books

Courses

References