Welcome to Economics 524 (424): Prediction and machine-learning in econometrics, taught by Ed Rubin and Stephen Reed.
Lecture Tuesday and Thursday, 2:15pm–3:45pm, Zoom and/or MCK 204A
Lab Friday, 12:30pm–1:30pm, Zoom
Office hours
Books:
- R for Data Science (free online)
- Introduction to Data Science (requires purchase)
- The Elements of Statistical Learning (free online)
Lectures:
000 - Overview (Why predict?)
- Why do we have a class on prediction?
- How is prediction (and how are its tools) different from causal inference?
- Motivating examples
001 - Statistical learning foundations
- Why do we have a class on prediction?
- How is prediction (and how are its tools) different from causal inference?
- Motivating examples
002 - Model accuracy
- Model accuracy
- Loss for regression and classification
- The variance-bias tradeoff
- The Bayes classifier
- KNN (see the sketch after this list)
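A minimal KNN sketch in R with `tidymodels` (not course code; `mtcars`, the two predictors, and K = 5 are placeholder choices, and the `kknn` engine is assumed installed):

```r
library(tidymodels)

# A binary outcome to classify; mtcars stands in for course data
cars2 <- mtcars %>% mutate(am = factor(am))

# K (neighbors) governs flexibility, and hence the variance-bias tradeoff
knn_fit <- nearest_neighbor(neighbors = 5) %>%
  set_engine("kknn") %>%
  set_mode("classification") %>%
  fit(am ~ mpg + wt, data = cars2)

predict(knn_fit, new_data = cars2)
```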
003 - Resampling methods
- Review
- The validation-set approach
- Leave-one-out cross validation
- k-fold cross validation
- The bootstrap (a resampling sketch follows this list)
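A short `rsample` sketch of these resampling schemes, using `mtcars` as a stand-in dataset:

```r
library(rsample)

set.seed(101)
# Validation-set approach: a single 80/20 split
split <- initial_split(mtcars, prop = 0.8)
train <- training(split)

# k-fold cross validation (k = 5) and the bootstrap (100 resamples)
folds <- vfold_cv(train, v = 5)
boots <- bootstraps(train, times = 100)

# LOOCV is the special case where each observation is its own fold
loo <- loo_cv(train)
```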
004 - Linear regression strikes back
- Returning to linear regression
- Model performance and overfit
- Model selection—best subset and stepwise
- Selection criteria (a selection sketch follows this list)
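A hedged sketch of best-subset and stepwise selection, assuming the `leaps` package for the former; `mtcars` is a stand-in dataset:

```r
library(leaps)

# Best subset: exhaustively fit models with up to 5 predictors
best_sub <- regsubsets(mpg ~ ., data = mtcars, nvmax = 5)
summary(best_sub)$bic  # one selection criterion among several (AIC, Cp, adj. R2)

# Stepwise selection with AIC via base R's step()
null_mod <- lm(mpg ~ 1, data = mtcars)
full_mod <- lm(mpg ~ ., data = mtcars)
step(null_mod, scope = formula(full_mod), direction = "both")
```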
In between: `tidymodels`-ing
- An introduction to preprocessing with `tidymodels`. (Kaggle notebook)
- An introduction to modeling with `tidymodels`. (Kaggle notebook)
- An introduction to resampling, model tuning, and workflows with `tidymodels`. (Kaggle notebook)
- Introduction to `tidymodels`: Follow up for Kaggle (a workflow sketch follows this list)
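As a taste of what those notebooks cover, here is a minimal `tidymodels` sketch (recipe + model + workflow), using `mtcars` as a stand-in dataset:

```r
library(tidymodels)

set.seed(101)
split <- initial_split(mtcars, prop = 0.8)

# Preprocessing recipe: declare the formula, then normalize numeric predictors
rec <- recipe(mpg ~ ., data = training(split)) %>%
  step_normalize(all_numeric_predictors())

# Bundle the recipe and a model into a workflow, then fit and predict
wf_fit <- workflow() %>%
  add_recipe(rec) %>%
  add_model(linear_reg() %>% set_engine("lm")) %>%
  fit(data = training(split))

predict(wf_fit, new_data = testing(split))
```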
005 - Shrinkage methods (AKA: Penalized or regularized regression)
- Ridge regression
- Lasso
- Elasticnet (see the tuning sketch after this list)
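A sketch of tuning an elasticnet with `tidymodels` and the `glmnet` engine; `penalty` is glmnet's lambda and `mixture` its alpha (0 = ridge, 1 = lasso, in between = elasticnet). Dataset and grid size are placeholders:

```r
library(tidymodels)

set.seed(101)
folds <- vfold_cv(mtcars, v = 5)

enet_spec <- linear_reg(penalty = tune(), mixture = tune()) %>%
  set_engine("glmnet")

enet_wf <- workflow() %>%
  add_model(enet_spec) %>%
  add_formula(mpg ~ .)

# Cross-validate over a 10-point grid, then pick the best pair by RMSE
enet_cv <- tune_grid(enet_wf, resamples = folds, grid = 10)
select_best(enet_cv, metric = "rmse")
```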
006 - Classification
- Introduction to classification
- Why not regression?
- But also: Logistic regression
- Assessment: Confusion matrix, assessment criteria, ROC, and AUC (an assessment sketch follows this list)
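A minimal assessment sketch: logistic regression, then a confusion matrix and AUC with `yardstick` (the binary `am` variable in `mtcars` stands in for real data):

```r
library(tidymodels)

cars2 <- mtcars %>% mutate(am = factor(am))

# Logistic regression via parsnip's glm engine
lr_fit <- logistic_reg() %>%
  set_engine("glm") %>%
  fit(am ~ mpg + wt, data = cars2)

# augment() appends .pred_class and class-probability columns
preds <- augment(lr_fit, new_data = cars2)

conf_mat(preds, truth = am, estimate = .pred_class)  # confusion matrix
roc_auc(preds, truth = am, .pred_0)  # AUC; "0" is the first factor level
```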
007 - Decision trees
- Introduction to trees
- Regression trees
- Classification trees—including the Gini index, entropy, and error rate (a tree sketch follows this list)
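A small classification-tree sketch with the `rpart` engine; all settings are illustrative:

```r
library(tidymodels)

cars2 <- mtcars %>% mutate(am = factor(am))

# cost_complexity controls pruning; smaller values grow deeper trees
tree_fit <- decision_tree(cost_complexity = 0.01, tree_depth = 5) %>%
  set_engine("rpart") %>%
  set_mode("classification") %>%
  fit(am ~ mpg + wt + hp, data = cars2)

predict(tree_fit, new_data = cars2)
```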
008 - Ensemble methods
- Introduction
- Bagging
- Random forests
- Boosting (a random-forest sketch follows this list)
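A minimal random-forest sketch with the `ranger` engine (assumed installed); bagging corresponds to setting `mtry` equal to the number of predictors:

```r
library(tidymodels)

# 500 trees; each split considers mtry = 3 randomly chosen predictors
rf_fit <- rand_forest(trees = 500, mtry = 3) %>%
  set_engine("ranger") %>%
  set_mode("regression") %>%
  fit(mpg ~ ., data = mtcars)

predict(rf_fit, new_data = mtcars)
```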
009 - Support vector machines
- Hyperplanes and classification
- The maximal margin hyperplane/classifier
- The support vector classifier
- Support vector machines (an SVM sketch follows this list)
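A hedged SVM sketch with the `kernlab` engine; `cost` is the usual budget parameter trading margin width against violations:

```r
library(tidymodels)

cars2 <- mtcars %>% mutate(am = factor(am))

# Radial-kernel SVM on a stand-in binary outcome
svm_fit <- svm_rbf(cost = 1) %>%
  set_engine("kernlab") %>%
  set_mode("classification") %>%
  fit(am ~ mpg + wt, data = cars2)

predict(svm_fit, new_data = cars2)
```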
Projects:
000 Predicting sales price in housing data (Kaggle)
Help:
- A simple example/walkthrough
- Kaggle notebooks (from Connor Lennon)
001 Validation and out-of-sample performance
002 Cross validation, penalized regression, and tidymodels
Paper: Prediction Policy Problems (Kleinberg, Ludwig, Mullainathan, and Obermeyer, 2015)
003 In class: MNIST image classification (with multiple classes!)
Topic and group due by 25 February 2021.
Final project submission due by midnight on 10 March 2021.
Labs:
000 - Workflow and cleaning
- General "best practices" for coding
- Working with RStudio
- The pipe (`%>%`) (a pipe example follows this list)
- Cleaning and Kaggle follow up
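A tiny illustration of the pipe, using `dplyr` verbs on the stand-in `mtcars` data:

```r
library(dplyr)

# Each %>% passes the left-hand result as the first argument of the
# next function, so code reads top to bottom instead of inside out
mtcars %>%
  filter(cyl == 4) %>%
  group_by(gear) %>%
  summarize(mean_mpg = mean(mpg))
```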
001 - Data cleaning: Multiple mutations
002 - Validation
- Creating a training and validation data set from your observations dataframe in R
- Writing a function to iterate over multiple models to test and compare MSEs (a sketch follows this list)
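A base-R sketch of both steps; the dataset and candidate formulas are placeholders:

```r
set.seed(101)

# 80/20 train/validation split by row index
idx   <- sample(seq_len(nrow(mtcars)), size = round(0.8 * nrow(mtcars)))
train <- mtcars[idx, ]
valid <- mtcars[-idx, ]

# Fit each candidate formula on train; compute MSE on the validation set
candidates <- list(mpg ~ wt, mpg ~ wt + hp, mpg ~ wt + hp + cyl)
sapply(candidates, function(f) {
  fit <- lm(f, data = train)
  mean((valid$mpg - predict(fit, newdata = valid))^2)
})
```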
003 - Practice using tidymodels
- Cleaning data quickly and efficiently with `tidymodels`
- R-script used in the lab
004 - Ridge, lasso, and elasticnet regressions in `tidymodels`
- Ridge, lasso, and elasticnet regressions in `tidymodels` from start to finish with a new dataset.
- Using the best model to predict onto a test dataset. (a finalize-and-predict sketch follows this list)
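One way the start-to-finish flow can look, sketched with a lasso (`mixture = 1`); the dataset and grid size are placeholders:

```r
library(tidymodels)

set.seed(101)
split <- initial_split(mtcars, prop = 0.8)
folds <- vfold_cv(training(split), v = 5)

lasso_wf <- workflow() %>%
  add_model(linear_reg(penalty = tune(), mixture = 1) %>% set_engine("glmnet")) %>%
  add_formula(mpg ~ .)

# Tune the penalty by cross validation
cv_res <- tune_grid(lasso_wf, resamples = folds, grid = 10)

# Lock in the best penalty, refit on the full training set, predict on test
final_fit <- lasso_wf %>%
  finalize_workflow(select_best(cv_res, metric = "rmse")) %>%
  fit(data = training(split))

predict(final_fit, new_data = testing(split))
```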
005 - Forcing splits in `tidymodels` and penalized regression
- Combining pre-split data and then defining a custom split
- Running a ridge, lasso, or elasticnet logistic regression in `tidymodels` using a fresh dataset.
- Predicting onto test data and then viewing the confusion matrix. (a custom-split sketch follows this list)
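A sketch of forcing a custom split with `rsample::make_splits()`, which records exactly which rows are analysis (training) and which are assessment (testing); the row ranges are placeholders:

```r
library(tidymodels)

# Pretend the data arrived pre-split
train_raw <- mtcars[1:24, ]
test_raw  <- mtcars[25:32, ]

# Stack the pieces, then tell rsample which rows belong to which side
full  <- bind_rows(train_raw, test_raw)
split <- make_splits(
  list(analysis = 1:24, assessment = 25:32),
  data = full
)

training(split)  # recovers the original training rows
testing(split)   # recovers the original test rows
```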
On R:
- RStudio's recommendations for learning R, plus cheatsheets, books, and tutorials
- YaRrr! The Pirate’s Guide to R (free online)
- UO library resources/workshops
- Eugene R Users
On data science:
- Python Data Science Handbook by Jake VanderPlas
- Elements of AI
- Caltech professor Yaser Abu-Mostafa: Lectures about machine learning on YouTube
- From Google:
On spatial data:
- Geocomputation with R (free online)
- Spatial Data Science (free online)
- Applied Spatial Data Analysis with R