Taught by Wenhao Jiang · Department of Sociology · Duke University · Fall 2025
This week sets the stage for the course and introduces how and why Machine Learning (ML) can be integrated into causal inference.
- Motivate the integration of statistical prediction with causal inference in response to the emergence of high-dimensional data and the need for flexible, non-linear modeling of covariates.
- Review the statistical properties of the Conditional Expectation Function (CEF) and linear regression in a low-dimensional setting.
- Revisit the basic matrix formulation of linear regression.
- Introduce the Frisch–Waugh–Lovell (FWL) Theorem as a partialling-out technique in linear regression (a short R illustration appears at the end of this week's overview).
- Review asymptotic OLS inference and discuss issues with standard error estimation in high-dimensional settings.
- Summarize the concept of Neyman Orthogonality as an extension of the FWL Theorem to motivate Double Machine Learning (DML) in high-dimensional settings.
Optional Reading: For students who wish to explore the asymptotic properties of OLS in greater depth, see the Week 1 Supplements on asymptotic inference. Models that satisfy Neyman Orthogonality retain the classic asymptotic properties required for valid statistical inference.
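As a concrete preview of the partialling-out idea behind FWL, here is a minimal R sketch on hypothetical simulated data (the variable names are illustrative, not from the course materials). It verifies that the coefficient on a regressor in the full OLS fit equals the coefficient from regressing the residualized outcome on the residualized regressor.

```r
# FWL on simulated data: full-regression coefficient on x1 equals the
# residual-on-residual coefficient after partialling out x2.
set.seed(1)
n  <- 500
x2 <- rnorm(n)
x1 <- 0.5 * x2 + rnorm(n)        # x1 is correlated with x2
y  <- 1 + 2 * x1 - x2 + rnorm(n)

# Full regression: y on x1 and x2
beta_full <- coef(lm(y ~ x1 + x2))["x1"]

# Partialling out: residualize y and x1 on x2, then regress residual on residual
ry  <- resid(lm(y ~ x2))
rx1 <- resid(lm(x1 ~ x2))
beta_fwl <- coef(lm(ry ~ rx1))["rx1"]

all.equal(unname(beta_full), unname(beta_fwl))  # TRUE, up to numerical error
```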
Building on Week 1, where we introduced both the benefits and the challenges of high-dimensional data, this week focuses on regularization regression methods. These approaches address high dimensionality in order to improve out-of-sample prediction and strengthen statistical inference.
- Review the motivation for using high-dimensional data in analysis, and examine the limitations of ordinary linear regression in high-dimensional settings.
- Introduce regularization methods for handling high-dimensional data. We focus in particular on LASSO regression as a feature selection method under approximate sparsity, and Ridge regression for dense coefficient distributions. We also cover variants that combine LASSO and Ridge penalties.
- Introduce cross-validation and plug-in methods for fine-tuning the penalty level in regularization.
- Revisit the Frisch–Waugh–Lovell (FWL) Theorem and introduce Double LASSO for statistical inference in high-dimensional settings.
- Present other LASSO-like methods that satisfy Neyman Orthogonality for valid inference.
- Demonstrate R implementations of regularization methods and Double LASSO, applying them to test the Convergence Hypothesis in Macroeconomics with high-dimensional data (illustrative sketches of both appear below).
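As a first sketch of the regularization and tuning steps above, the following minimal R example uses the glmnet package (our assumed package choice) on hypothetical simulated data. It fits LASSO and Ridge and selects the penalty level by cross-validation.

```r
# LASSO and Ridge with cross-validated penalty tuning (simulated data)
library(glmnet)

set.seed(1)
n <- 200; p <- 100                      # high-dimensional: p is large relative to n
X <- matrix(rnorm(n * p), n, p)
beta <- c(2, -1.5, 1, rep(0, p - 3))    # approximately sparse coefficients
y <- drop(X %*% beta + rnorm(n))

cv_lasso <- cv.glmnet(X, y, alpha = 1)  # alpha = 1: LASSO
cv_ridge <- cv.glmnet(X, y, alpha = 0)  # alpha = 0: Ridge; 0 < alpha < 1: Elastic Net

cv_lasso$lambda.min   # penalty minimizing cross-validated error
cv_lasso$lambda.1se   # sparser "one-standard-error" choice

coef(cv_lasso, s = "lambda.1se")  # nonzero entries are the selected features
```

The 1-SE rule trades a small amount of cross-validated fit for a sparser, more interpretable model.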
- Slides: Week 2 Machine Learning Basics
- R Code: Regularization Methods
- R Code: Double LASSO and the Convergence Hypothesis
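For orientation before reading the course code above, here is a minimal sketch of the post-double-selection logic behind Double LASSO, again using glmnet on hypothetical simulated data rather than the growth data used in class.

```r
# Post-double-selection (Double LASSO) on simulated data
library(glmnet)

set.seed(1)
n <- 200; p <- 150
X <- matrix(rnorm(n * p), n, p)
d <- X[, 1] - X[, 2] + rnorm(n)                # treatment driven by a few controls
y <- 1.0 * d + 2 * X[, 1] + X[, 3] + rnorm(n)  # true treatment effect = 1

# Step 1: LASSO of y on X; Step 2: LASSO of d on X
b_y <- as.numeric(coef(cv.glmnet(X, y), s = "lambda.1se"))[-1]
b_d <- as.numeric(coef(cv.glmnet(X, d), s = "lambda.1se"))[-1]

# Step 3: OLS of y on d plus the union of controls selected in either step
controls <- union(which(b_y != 0), which(b_d != 0))
fit <- lm(y ~ d + X[, controls])
coef(summary(fit))["d", ]  # estimate and standard error for the treatment effect
```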
Building on Week 2, where we introduced linear regularization methods to address high-dimensional data, this week we turn to non-linear models in Machine Learning. These approaches are designed to capture flexible and complex relationships among covariates. Our focus will be on two broad classes: Tree-based Methods and Neural Networks, along with their key variants.
- Formally introduce the concept of the bias-variance tradeoff and explain its role in tuning Machine Learning models.
- Present classic Tree-based Methods, including Regression Trees, Bagging, Random Forests, and Boosted Trees, showing how each builds on the bias-variance tradeoff (see the sketch after this list).
- Introduce the foundational Neural Network framework and discuss the theoretical background of training a Neural Network model.
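To make the bias-variance tradeoff concrete, the sketch below compares a single regression tree with a random forest on held-out data. The data are hypothetical and simulated, and the rpart and randomForest packages are our assumed choices, not necessarily those used in class.

```r
# A single regression tree vs. a random forest on held-out data
library(rpart)          # single regression trees
library(randomForest)   # ensembles of trees

set.seed(1)
n <- 1000
df <- data.frame(x = runif(n, -2, 2))
df$y <- sin(2 * df$x) + rnorm(n, sd = 0.3)
train <- sample(n, n / 2)

tree   <- rpart(y ~ x, data = df[train, ])
forest <- randomForest(y ~ x, data = df[train, ], ntree = 500)

mse <- function(fit) mean((df$y[-train] - predict(fit, newdata = df[-train, ]))^2)
c(tree = mse(tree), forest = mse(forest))
# Averaging many decorrelated trees reduces variance relative to a single tree.
```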
Building on the Machine Learning methods introduced in the last two weeks, this week we focus on the Double Machine Learning (DML) approach in partial linear regression, where covariates may be high-dimensional. We formally justify DML using the concept of Neyman Orthogonality, a framework that ensures consistent estimation of the treatment effect even when nuisance functions are estimated with ML. We then connect DML to the potential outcomes framework in causal inference, introducing the key assumption of conditional ignorability, which links regression-based estimation to causal interpretation.
- Formally introduce Neyman Orthogonality and explain why orthogonality is key to making ML-based nuisance estimates usable for valid inference in DML.
- Connect DML to the partial linear regression model with high-dimensional covariates. We explain the importance of hyperparameter tuning and cross-fitting in DML and demonstrate the technique on the high-dimensional data we used to test the Convergence Hypothesis (a minimal cross-fitting sketch follows this list).
- Link DML to the potential outcomes framework and conditional ignorability. We highlight how the regression-based approach ties to causal interpretation under ignorability.
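As a minimal sketch of DML with two-fold cross-fitting in the partial linear model y = θd + g(X) + ε, the code below estimates the nuisance functions E[Y|X] and E[D|X] with random forests (our assumed nuisance learners) on hypothetical simulated data, then runs the residual-on-residual regression implied by the Neyman-orthogonal moment.

```r
# DML with two-fold cross-fitting in the partial linear model (simulated data)
library(randomForest)

set.seed(1)
n <- 1000; p <- 10
X <- matrix(runif(n * p), n, p)
d <- sin(2 * pi * X[, 1]) + rnorm(n, sd = 0.5)            # treatment depends on X
y <- 0.5 * d + cos(2 * pi * X[, 1]) + rnorm(n, sd = 0.5)  # true theta = 0.5

folds <- sample(rep(1:2, length.out = n))  # two folds for cross-fitting
res_y <- res_d <- numeric(n)

for (k in 1:2) {
  tr <- folds != k
  te <- folds == k
  # Fit E[Y|X] and E[D|X] on one fold; residualize on the held-out fold
  res_y[te] <- y[te] - predict(randomForest(X[tr, ], y[tr]), X[te, ])
  res_d[te] <- d[te] - predict(randomForest(X[tr, ], d[tr]), X[te, ])
}

# Residual-on-residual OLS: the Neyman-orthogonal estimate of theta
summary(lm(res_y ~ 0 + res_d))$coefficients
```

Because the confounder enters both equations nonlinearly, a naive regression of y on d would be biased; cross-fitting keeps the nuisance estimation errors orthogonal to the final estimating equation.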