Welcome to Economics 524 (424): Prediction and machine-learning in econometrics, taught by Ed Rubin and Stephen Reed.
Lecture Tuesday and Thursday, 2:15pm–3:45pm, Zoom and/or MCK 204A
Lab Friday, 12:30pm–1:30pm, Zoom
Office hours
Books:
- R for Data Science (free online)
- Introduction to Data Science (requires purchase)
- The Elements of Statistical Learning (free online)
Lectures:
000 - Overview (Why predict?)
- Why do we have a class on prediction?
- How is prediction (and how are its tools) different from causal inference?
- Motivating examples
001 - Statistical learning foundations
- Why do we have a class on prediction?
- How is prediction (and how are its tools) different from causal inference?
- Motivating examples
002 - Model accuracy
- Model accuracy
- Loss for regression and classification
- The variance-bias tradeoff
- The Bayes classifier
- KNN (see the sketch after this list)
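A minimal KNN sketch in R with `tidymodels` (not course code; `mtcars`, the two predictors, and K = 5 are placeholder choices, and the `kknn` engine is assumed installed):

```r
library(tidymodels)

# A binary outcome to classify; mtcars stands in for course data
cars2 <- mtcars %>% mutate(am = factor(am))

# K (neighbors) governs flexibility, and hence the variance-bias tradeoff
knn_fit <- nearest_neighbor(neighbors = 5) %>%
  set_engine("kknn") %>%
  set_mode("classification") %>%
  fit(am ~ mpg + wt, data = cars2)

predict(knn_fit, new_data = cars2)
```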
003 - Resampling methods
- Review
- The validation-set approach
- Leave-one-out cross validation
- k-fold cross validation
- The bootstrap (a resampling sketch follows this list)
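A short `rsample` sketch of these resampling schemes, using `mtcars` as a stand-in dataset:

```r
library(rsample)

set.seed(101)
# Validation-set approach: a single 80/20 split
split <- initial_split(mtcars, prop = 0.8)
train <- training(split)

# k-fold cross validation (k = 5) and the bootstrap (100 resamples)
folds <- vfold_cv(train, v = 5)
boots <- bootstraps(train, times = 100)

# LOOCV is the special case where each observation is its own fold
loo <- loo_cv(train)
```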
004 - Linear regression strikes back
- Returning to linear regression
- Model performance and overfit
- Model selection—best subset and stepwise
- Selection criteria (a selection sketch follows this list)
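A hedged sketch of best-subset and stepwise selection, assuming the `leaps` package for the former; `mtcars` is a stand-in dataset:

```r
library(leaps)

# Best subset: exhaustively fit models with up to 5 predictors
best_sub <- regsubsets(mpg ~ ., data = mtcars, nvmax = 5)
summary(best_sub)$bic  # one selection criterion among several (AIC, Cp, adj. R2)

# Stepwise selection with AIC via base R's step()
null_mod <- lm(mpg ~ 1, data = mtcars)
full_mod <- lm(mpg ~ ., data = mtcars)
step(null_mod, scope = formula(full_mod), direction = "both")
```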
In between: `tidymodels`-ing
- An introduction to preprocessing with `tidymodels`. (Kaggle notebook)
- An introduction to modeling with `tidymodels`. (Kaggle notebook)
- An introduction to resampling, model tuning, and workflows with `tidymodels`. (Kaggle notebook)
- Introduction to `tidymodels`: Follow up for Kaggle (a workflow sketch follows this list)
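As a taste of what those notebooks cover, here is a minimal `tidymodels` sketch (recipe + model + workflow), using `mtcars` as a stand-in dataset:

```r
library(tidymodels)

set.seed(101)
split <- initial_split(mtcars, prop = 0.8)

# Preprocessing recipe: declare the formula, then normalize numeric predictors
rec <- recipe(mpg ~ ., data = training(split)) %>%
  step_normalize(all_numeric_predictors())

# Bundle the recipe and a model into a workflow, then fit and predict
wf_fit <- workflow() %>%
  add_recipe(rec) %>%
  add_model(linear_reg() %>% set_engine("lm")) %>%
  fit(data = training(split))

predict(wf_fit, new_data = testing(split))
```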
005 - Shrinkage methods (AKA: Penalized or regularized regression)
- Ridge regression
- Lasso
- Elasticnet (see the tuning sketch after this list)
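A sketch of tuning an elasticnet with `tidymodels` and the `glmnet` engine; `penalty` is glmnet's lambda and `mixture` its alpha (0 = ridge, 1 = lasso, in between = elasticnet). Dataset and grid size are placeholders:

```r
library(tidymodels)

set.seed(101)
folds <- vfold_cv(mtcars, v = 5)

enet_spec <- linear_reg(penalty = tune(), mixture = tune()) %>%
  set_engine("glmnet")

enet_wf <- workflow() %>%
  add_model(enet_spec) %>%
  add_formula(mpg ~ .)

# Cross-validate over a 10-point grid, then pick the best pair by RMSE
enet_cv <- tune_grid(enet_wf, resamples = folds, grid = 10)
select_best(enet_cv, metric = "rmse")
```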
006 - Classification
- Introduction to classification
- Why not regression?
- But also: Logistic regression
- Assessment: Confusion matrix, assessment criteria, ROC, and AUC (an assessment sketch follows this list)
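A minimal assessment sketch: logistic regression, then a confusion matrix and AUC with `yardstick` (the binary `am` variable in `mtcars` stands in for real data):

```r
library(tidymodels)

cars2 <- mtcars %>% mutate(am = factor(am))

# Logistic regression via parsnip's glm engine
lr_fit <- logistic_reg() %>%
  set_engine("glm") %>%
  fit(am ~ mpg + wt, data = cars2)

# augment() appends .pred_class and class-probability columns
preds <- augment(lr_fit, new_data = cars2)

conf_mat(preds, truth = am, estimate = .pred_class)  # confusion matrix
roc_auc(preds, truth = am, .pred_0)  # AUC; "0" is the first factor level
```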
007 - Decision trees
- Introduction to trees
- Regression trees
- Classification trees—including the Gini index, entropy, and error rate (a tree sketch follows this list)
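A small classification-tree sketch with the `rpart` engine; all settings are illustrative:

```r
library(tidymodels)

cars2 <- mtcars %>% mutate(am = factor(am))

# cost_complexity controls pruning; smaller values grow deeper trees
tree_fit <- decision_tree(cost_complexity = 0.01, tree_depth = 5) %>%
  set_engine("rpart") %>%
  set_mode("classification") %>%
  fit(am ~ mpg + wt + hp, data = cars2)

predict(tree_fit, new_data = cars2)
```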
008 - Ensemble methods
- Introduction
- Bagging
- Random forests
- Boosting (a random-forest sketch follows this list)
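A minimal random-forest sketch with the `ranger` engine (assumed installed); bagging corresponds to setting `mtry` equal to the number of predictors:

```r
library(tidymodels)

# 500 trees; each split considers mtry = 3 randomly chosen predictors
rf_fit <- rand_forest(trees = 500, mtry = 3) %>%
  set_engine("ranger") %>%
  set_mode("regression") %>%
  fit(mpg ~ ., data = mtcars)

predict(rf_fit, new_data = mtcars)
```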
009 - Support vector machines
- Hyperplanes and classification
- The maximal margin hyperplane/classifier
- The support vector classifier
- Support vector machines (an SVM sketch follows this list)
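A hedged SVM sketch with the `kernlab` engine; `cost` is the usual budget parameter trading margin width against violations:

```r
library(tidymodels)

cars2 <- mtcars %>% mutate(am = factor(am))

# Radial-kernel SVM on a stand-in binary outcome
svm_fit <- svm_rbf(cost = 1) %>%
  set_engine("kernlab") %>%
  set_mode("classification") %>%
  fit(am ~ mpg + wt, data = cars2)

predict(svm_fit, new_data = cars2)
```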
Projects:
000 Predicting sales price in housing data (Kaggle)
Help:
- A simple example/walkthrough
- Kaggle notebooks (from Connor Lennon)
001 Validation and out-of-sample performance
002 Cross validation, penalized regression, and tidymodels
Paper: Prediction Policy Problems (Kleinberg, Ludwig, Mullainathan, and Obermeyer, 2015)
003 In class: MNIST image classification (with multiple classes!)
Topic and group due by 25 February 2021.
Final project submission due by midnight on 10 March 2021.
Labs:
000 - Workflow and cleaning
- General "best practices" for coding
- Working with RStudio
- The pipe (`%>%`) (a pipe example follows this list)
- Cleaning and Kaggle follow up
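A tiny illustration of the pipe, using `dplyr` verbs on the stand-in `mtcars` data:

```r
library(dplyr)

# Each %>% passes the left-hand result as the first argument of the
# next function, so code reads top to bottom instead of inside out
mtcars %>%
  filter(cyl == 4) %>%
  group_by(gear) %>%
  summarize(mean_mpg = mean(mpg))
```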
001 - Data cleaning: Multiple mutations
002 - Validation
- Creating a training and validation data set from your observations dataframe in R
- Writing a function to iterate over multiple models to test and compare MSEs (a sketch follows this list)
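A base-R sketch of both steps; the dataset and candidate formulas are placeholders:

```r
set.seed(101)

# 80/20 train/validation split by row index
idx   <- sample(seq_len(nrow(mtcars)), size = round(0.8 * nrow(mtcars)))
train <- mtcars[idx, ]
valid <- mtcars[-idx, ]

# Fit each candidate formula on train; compute MSE on the validation set
candidates <- list(mpg ~ wt, mpg ~ wt + hp, mpg ~ wt + hp + cyl)
sapply(candidates, function(f) {
  fit <- lm(f, data = train)
  mean((valid$mpg - predict(fit, newdata = valid))^2)
})
```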
003 - Practice using tidymodels
- Cleaning data quickly and efficiently with `tidymodels`
- R-script used in the lab
004 - Ridge, lasso, and elasticnet regressions in `tidymodels`
- Ridge, lasso, and elasticnet regressions in `tidymodels` from start to finish with a new dataset.
- Using the best model to predict onto a test dataset. (a finalize-and-predict sketch follows this list)
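One way the start-to-finish flow can look, sketched with a lasso (`mixture = 1`); the dataset and grid size are placeholders:

```r
library(tidymodels)

set.seed(101)
split <- initial_split(mtcars, prop = 0.8)
folds <- vfold_cv(training(split), v = 5)

lasso_wf <- workflow() %>%
  add_model(linear_reg(penalty = tune(), mixture = 1) %>% set_engine("glmnet")) %>%
  add_formula(mpg ~ .)

# Tune the penalty by cross validation
cv_res <- tune_grid(lasso_wf, resamples = folds, grid = 10)

# Lock in the best penalty, refit on the full training set, predict on test
final_fit <- lasso_wf %>%
  finalize_workflow(select_best(cv_res, metric = "rmse")) %>%
  fit(data = training(split))

predict(final_fit, new_data = testing(split))
```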
005 - Forcing splits in `tidymodels` and penalized regression
- Combining pre-split data and then defining a custom split
- Running a ridge, lasso, or elasticnet logistic regression in `tidymodels` using a fresh dataset.
- Predicting onto test data and then viewing the confusion matrix. (a custom-split sketch follows this list)
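A sketch of forcing a custom split with `rsample::make_splits()`, which records exactly which rows are analysis (training) and which are assessment (testing); the row ranges are placeholders:

```r
library(tidymodels)

# Pretend the data arrived pre-split
train_raw <- mtcars[1:24, ]
test_raw  <- mtcars[25:32, ]

# Stack the pieces, then tell rsample which rows belong to which side
full  <- bind_rows(train_raw, test_raw)
split <- make_splits(
  list(analysis = 1:24, assessment = 25:32),
  data = full
)

training(split)  # recovers the original training rows
testing(split)   # recovers the original test rows
```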
On R:
- RStudio's recommendations for learning R, plus cheatsheets, books, and tutorials
- YaRrr! The Pirate’s Guide to R (free online)
- UO library resources/workshops
- Eugene R Users
On data science:
- Python Data Science Handbook by Jake VanderPlas
- Elements of AI
- Caltech professor Yaser Abu-Mostafa: Lectures about machine learning on YouTube
- From Google:
On spatial data:
- Geocomputation with R (free online)
- Spatial Data Science (free online)
- Applied Spatial Data Analysis with R