NOTE: Check each week to see whether it is marked IN-DEV or READY.
These are the course materials for IS507 (previously IS542) at the iSchool, University of Illinois, Urbana-Champaign.
If you see any bugs or errors, please open a PR -- I'm always looking to make things better!
Each week consists of lecture slides and prep notebooks in R, along with suggested resources and readings. The readings include what was "required" for the course as well as optional extras, so feel free to take or leave what you'd like.
Below is the outline of the course, with links to the individual folders for each week. Each folder contains:
- A "README.md" file which lists the suggested readings, datasets and any extra resources for the week
- The lecture slides (as a pdf)
- Prep coding notebooks (.ipynb files) used for the "live coding" portions of class. Live coding generally happened after the lecture portion, BUT sometimes we switch between the lecture slides & R. This is noted in the slides as well as in R.
- Any datasets used in the coding portion
Example syllabus here.
Week Link | Topic |
---|---|
Week 01 | Course Intro & Motivation, Intro to R |
Week 02 | Intro to Numerical Data, Intro to R |
Week 03 | Intro to Categorical Data, Table Proportions, and Probability Theory |
Week 04 | Random Variables, Continuous Probability Distributions |
Week 05 | The Normal Distribution |
Week 06 | The Normal & Binomial Distributions |
Week 07 | Foundations for Inference; Hypothesis testing: Normal, T-distribution, and single proportions; differences of 2 means/proportions, paired data |
Week 08 | Hypothesis testing: ANOVA and models |
Week 09 | Fake Break! We'll do some fun stats stuff in Python. |
Week 10 | Linear Regression & Multiple Linear Regression |
Week 11 | Intro to classification & Logistic Regression |
Week 12 | Classification with KNN & Beginning Model Selection with CV & Bootstrapping |
Week 13 | Model Selection & Shrinkage Methods for Linear Regression |
Week 14 | Lasso Regression & CV; Intro to PCA; Course wrap-up |
This course is based on the following textbooks and resources:
- [OIS] OpenIntro Statistics, 4th Edition, which can be found on Amazon or in free pdf form -- make sure you select "The Book" for the free version.
- [ITR] An Introduction to R, available as a pdf.
- [ISL] An Introduction to Statistical Learning; click on the "Download the PDF" link here.
- [MIS] Intermediate Statistics with R (pdf link)
- [STOR390] STOR 390: Introduction to Data Science course page.
- Download R from the R-project webpage
- Courses were taught using RStudio, which you can download right here
Totally optional: To run the Jupyter notebooks with R locally, install Anaconda and then follow the instructions for installing R using the Anaconda Navigator. The easiest way to install packages is through the Anaconda GUI, or you can use `conda install` -- either way, you need to prepend `r-` to all package names!
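As a rough sketch of the naming rule above: the helper below (`conda_r_name`, a made-up name for illustration) turns an R package name into the name conda is assumed to use, namely `r-` plus the lowercase package name. `corrplot` and `MASS` are just example packages, and which channel provides a given package depends on your conda setup.

```shell
# Sketch: map an R package name to the name conda is assumed to use for it.
# Assumption: conda names R packages "r-" + the lowercase package name.
conda_r_name() {
  printf 'r-%s\n' "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
}

conda_r_name corrplot   # prints r-corrplot
conda_r_name MASS       # prints r-mass

# To actually install (needs conda on your PATH and network access):
# conda install "$(conda_r_name corrplot)"
```

The install line is left commented out so the snippet can be run safely without touching your environment.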
NOTE: There are paid versions of this software, but we assume you are using the free versions.
- Add in a "who this course is for" section -- describe your typical iSchool student
- Add in photos of RStudio & label panels
- Link to Data ag install list?
- Add in pedagogy links and general references
- Add in "working on" as far as online teaching strategies -- what is currently working and what is not
- add in course pre-reqs
- add in slides with lecture notes
- collapse answers to practice problems
- Add in instructor notes for all weeks pages
Weeks
- Week 02
- A-void notebook
- Week 03
- Add in extra GLM stuff, corrplot, links to datacamp
- Week 10
- ERROR: the `tabplot` library won't load in Jupyter notebooks
- ERROR: the
- Week 15
- add in-person "real life" example using the motion data and KNN
- Week 16
- prep notebook for PCA isn't included as of now
Stretch Goals
- Include Bayesian stuff as a bonus class
My background is in astrophysics (hydro simulations) so there will be an abundance of astronomy examples and space jokes. Also my spelling is atrocious. You have been warned.