Welcome to Economics 524 (424): Prediction and machine-learning in econometrics, taught by Ed Rubin and Ngan Tran.
Lecture Mondays and Wednesdays, 12:00p–1:20p (Pacific), 195 Ansett
Lab Friday, 12:00p–12:50p (Pacific), 360 Condon
Office hours
- Ed Rubin Mo., 3p–4p, PLC 530
- Ngan Tran We., 2p–3p, PLC 428
- R for Data Science
- Introduction to Data Science (not available without purchase)
- The Elements of Statistical Learning
- Data Science for Public Policy (ebook available through UO library)
- Why do we have a class on prediction?
- How is prediction (and how are its tools) different from causal inference?
- Motivating examples
Readings Introduction in ISL
001 - Statistical learning foundations
- Why do we have a class on prediction?
- How is prediction (and how are its tools) different from causal inference?
- Motivating examples
Readings
- Prediction Policy Problems by Kleinberg et al. (2015)
- ISL Ch1
- ISL Start Ch2
Supplements Unsupervised character recognition
- Model accuracy
- Loss for regression and classification
- The variance-bias tradeoff
- The Bayes classifier
- KNN
Readings
- ISL Ch2–Ch3
- Optional: 100ML Preface and Ch1–Ch4
- Review
- The validation-set approach
- Leave-one-out cross validation
- k-fold cross validation
- The bootstrap
Readings
- ISL Ch5
- Optional: 100ML Ch5
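The course itself works in R (tidymodels). As a language-neutral illustration of the validation ideas listed above, here is a minimal plain-Python sketch of k-fold cross validation. It assumes a deliberately trivial model (predict the mean of the training folds) and squared-error loss so the resampling logic stays visible. `k_fold_indices` and `cv_mse` are hypothetical helper names, not part of any library.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle row indices 0..n-1 and deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cv_mse(y, k=5, seed=0):
    """k-fold CV estimate of test MSE for a toy model that predicts the training mean.

    Assumes k <= len(y), so that no fold is empty.
    """
    folds = k_fold_indices(len(y), k, seed)
    fold_mses = []
    for held_out in folds:
        held = set(held_out)
        # "Fit" the toy model on the other k - 1 folds.
        train = [y[i] for i in range(len(y)) if i not in held]
        y_hat = sum(train) / len(train)
        # Evaluate squared-error loss on the held-out fold only.
        errs = [(y[i] - y_hat) ** 2 for i in held_out]
        fold_mses.append(sum(errs) / len(errs))
    # The CV estimate is the average of the k held-out MSEs.
    return sum(fold_mses) / k
```

Setting `k` equal to the number of observations gives leave-one-out cross validation, since each fold then holds out a single observation.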
004 - Linear regression strikes back
- Returning to linear regression
- Model performance and overfit
- Model selection—best subset and stepwise
- Selection criteria
Readings
- ISL Ch3
- ISL Ch6.1
In between: tidymodels-ing
- An introduction to preprocessing with tidymodels (Kaggle notebook)
- An introduction to modeling with tidymodels (Kaggle notebook)
- An introduction to resampling, model tuning, and workflows with tidymodels (Kaggle notebook)
- Introduction to tidymodels: Follow up for Kaggle
(AKA: Penalized or regularized regression)
- Ridge regression
- Lasso
- Elasticnet
Readings
- ISL Ch4
- ISL Ch6
- Introduction to classification
- Why not regression?
- But also: Logistic regression
- Assessment: Confusion matrix, assessment criteria, ROC, and AUC
Readings
- ISL Ch4
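As a rough illustration of the assessment criteria listed above, here is a minimal plain-Python sketch (the course uses R) that builds a binary confusion matrix, derives accuracy, sensitivity, specificity, and precision from it, and computes AUC as the share of (positive, negative) pairs that the predicted scores rank correctly. It assumes labels coded 1 (positive) and 0 (negative), with both classes present among the labels and the predictions. The function names are hypothetical.

```python
def confusion_matrix(y_true, y_pred):
    """Count true/false positives/negatives for labels coded 1 and 0."""
    pairs = list(zip(y_true, y_pred))
    return {
        "TP": sum(1 for t, p in pairs if t == 1 and p == 1),
        "FP": sum(1 for t, p in pairs if t == 0 and p == 1),
        "FN": sum(1 for t, p in pairs if t == 1 and p == 0),
        "TN": sum(1 for t, p in pairs if t == 0 and p == 0),
    }


def assessment(cm):
    """Common assessment criteria derived from the confusion matrix."""
    total = sum(cm.values())
    return {
        "accuracy": (cm["TP"] + cm["TN"]) / total,
        "sensitivity": cm["TP"] / (cm["TP"] + cm["FN"]),  # true-positive rate
        "specificity": cm["TN"] / (cm["TN"] + cm["FP"]),  # true-negative rate
        "precision": cm["TP"] / (cm["TP"] + cm["FP"]),
    }


def auc(y_true, scores):
    """AUC: share of (positive, negative) pairs where the positive scores higher; ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise definition of AUC equals the area under the ROC curve traced out by sweeping the classification threshold across the scores. It takes time quadratic in the sample size, which is fine for small examples.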
- Introduction to trees
- Regression trees
- Classification trees—including the Gini index, entropy, and error rate
Readings
- ISL Ch8.1–Ch8.2
- Introduction
- Bagging
- Random forests
- Boosting
Readings
- ISL Ch8.2
- Hyperplanes and classification
- The maximal margin hyperplane/classifier
- The support vector classifier
- Support vector machines
Readings
- ISL Ch9
010 - Dimensionality reduction and unsupervised learning
- MNIST dataset (machines with vision)
- K-means clustering
- Principal component analysis (PCA)
- UMAP
Planned projects
000 Predicting sales price in housing data (Kaggle)
Due: 20 April 2024 (submit on Canvas)
Help:
- A simple example/walkthrough
- Kaggle notebooks (from Connor Lennon)
001 Validation and out-of-sample performance
Due: 03 May 2024 (submit on Canvas)
002 Penalized regression, logistic regression, and classification
Due: 18 May 2024 (submit on Canvas)
003 Trees, ensembles, and imputation
Due: 28 May 2024 (submit on Canvas)
Topic due by midnight on 03 May 2024.
Final project submission due by midnight on 03 June 2024.
Prep materials
Previous take-home exam
Previous in-class exam
Note: I am not providing keys.
Take-home portion
Submit on Canvas by 10:15a Pacific on Friday, 14 June 2024
Weight: Approximately 25%
In-class portion
Friday (14 June 2024) at 10:15a–12:15p
Weight: Approximately 75%
Approximate/planned topics...
- General "best practices" for coding
- Working with RStudio
- The pipe (%>%)
- Cleaning and Kaggle follow up
001 - Workflow and cleaning (continued)
- Finish previous lab on dplyr
- Working with projects
- Using dplyr and ggplot2 to make insightful visuals
- How to fix a coding error
Housing data download
- Creating a training and validation data set from your observations dataframe in R
- Writing a function to iterate over multiple models to test and compare MSEs
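The two lab tasks above can be sketched in plain Python, though the lab itself works with an R data frame. This minimal version assumes the observations are a list of `(x, y)` pairs, uses a 70/30 random split, and compares two hypothetical candidate models by validation MSE: a training-mean baseline and a simple least-squares line. All function names here are illustrative, not from a library.

```python
import random


def train_validation_split(rows, share_train=0.7, seed=0):
    """Shuffle the observations and split them into training and validation sets."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = round(share_train * len(rows))
    return rows[:cut], rows[cut:]


def fit_mean(train):
    """Baseline model: always predict the training mean of y."""
    y_bar = sum(y for _, y in train) / len(train)
    return lambda x: y_bar


def fit_ols(train):
    """Simple linear regression of y on x by least squares."""
    n = len(train)
    x_bar = sum(x for x, _ in train) / n
    y_bar = sum(y for _, y in train) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in train)
    sxx = sum((x - x_bar) ** 2 for x, _ in train)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return lambda x: b0 + b1 * x


def compare_models(train, valid, fitters):
    """Fit each candidate on the training set; return its MSE on the validation set."""
    results = {}
    for name, fit in fitters.items():
        predict = fit(train)
        errs = [(y - predict(x)) ** 2 for x, y in valid]
        results[name] = sum(errs) / len(errs)
    return results
```

For example, with `data = [(x, 2.0 * x + 1.0) for x in range(20)]`, calling `compare_models(*train_validation_split(data), {"mean": fit_mean, "ols": fit_ols})` should give the least-squares line a validation MSE near zero, while the mean baseline does much worse.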
003 - Practice using tidymodels
- Cleaning data quickly and efficiently with tidymodels
004 - Practice using tidymodels (continued)
- An introduction to preprocessing with tidymodels (refresher from last week)
- An introduction to modeling with tidymodels
- An introduction to resampling, model tuning, and workflows with tidymodels (will finish up next week)
005 - Summarizing tidymodels
- Summarizing tidymodels
- Combining pre-split data together and then defining a custom split
006 - Penalized regression in tidymodels + functions + loops
- Running a Ridge, Lasso, or Elasticnet logistic regression in tidymodels
- A short lesson in writing functions and loops in R
007 - Finalizing a workflow in tidymodels: Example using a random forest
- Finalizing a workflow in tidymodels: Example using a random forest
- A short lesson in writing functions and loops in R (continued)
- NPR: Google's new AI chatbot made a $100 billion mistake in a demo ad
- NYT: Disinformation Researchers Raise Alarms About A.I. Chatbots
- NPR: She was denied entry to a Rockettes show — then the facial recognition debate ignited
- LA Times: Nobody knows how widespread illegal cannabis grows are in California. So we mapped them
- NYT: Can A.I. Write Recipes Better Than Humans? We Put It to the Ultimate Test
- ChatGPT
- Business Insider: List of exams ChatGPT has passed
- NPR: 'Everybody is cheating': Why this teacher has adopted an open ChatGPT policy
- How Should Schools Respond to ChatGPT?
- Energy Institute: Can ChatGPT Save the Planet?
- MIT Tech Review: Here’s how Microsoft could use ChatGPT
- NPR: This 22-year-old is trying to save us from ChatGPT before it changes writing forever
- NYT: How ChatGPT Hijacks Democracy
- NYT: Don’t Ban ChatGPT in Schools. Teach With It.
- NYT: How to Use ChatGPT and Still Be a Good Person
- NPR: A new AI chatbot might do your homework for you. But it's still not an A+ student
- NYT: The Brilliance and Weirdness of ChatGPT
- Military applications
A funny conversation with ChatGPT about what is real.
Parts: 1 2 3 4
- UO library resources/workshops
- RStudio's recommendations for learning R, plus cheatsheets, books, and tutorials
- YaRrr! The Pirate’s Guide to R (free online)
- Eugene R Users
- Happy Git and GitHub for the useR by Jenny Bryan, the "STAT 545 TAs", and Jim Hester
- Python Data Science Handbook by Jake VanderPlas
- Elements of AI
- Caltech professor Yaser Abu-Mostafa: Lectures about machine learning on YouTube
- From Google:
- Geocomputation with R (free online)
- Spatial Data Science (free online)
- Applied Spatial Data Analysis with R