Statistical Learning and Data Mining (QBUS6810) at the University of Sydney Business School.

Statistical Learning and Data Mining (QBUS6810)

Marcel Scharth, The University of Sydney

This repository contains the Jupyter notebooks and code used in Statistical Learning and Data Mining (QBUS6810), a postgraduate unit at the University of Sydney Business School. I also provide the lectures for future reference.

This version: Semester 2, 2017.

Tutorials in Python

Tutorial 1: Working with Data in Python
Tutorial 2: K-Nearest Neighbours Regression
Tutorial 3: Regression Modelling
Tutorial 4: Cross Validation
Tutorial 5: The Bootstrap
Tutorial 6: Linear Model Selection and Regularisation
Tutorial 7: Naive Bayes and Sentiment Analysis
Tutorial 8: Logistic Regression and Gaussian Discriminant Analysis
Tutorial 9: Regression Splines
Tutorial 10: Regression Trees
Tutorial 11: Model Stacking
Tutorial 12: Credit Risk Modelling

Lectures

Module 1: Introduction to Statistical Learning
Module 2: Linear Regression and Statistical Thinking
Module 3: K-Nearest Neighbours Regression
Module 4: Regression Modelling
Module 5: Model Selection
Module 6: The Bootstrap
Module 7: Estimation Methods (reference module)
Module 8: Linear Model Selection and Regularisation I
Module 9: Linear Model Selection and Regularisation II
Module 10: Classification I
Module 11: Classification II
Module 12: Nonlinear Modelling
Module 13: Tree-based Methods
Module 14: Model Stacking
Module 15: Boosting

Acknowledgement: these lectures use figures from An Introduction to Statistical Learning and The Elements of Statistical Learning (see below).



References

An Introduction to Statistical Learning by Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani.

The lectures and tutorials also draw on material from:

The Elements of Statistical Learning by Trevor Hastie, Robert Tibshirani, and Jerome Friedman.

Statistical Methods in Customer Relationship Management by V. Kumar and J. Andrew Petersen.

Machine Learning: A Probabilistic Perspective by Kevin P. Murphy.

Mathematical Statistics with Resampling and R by Laura M. Chihara and Tim C. Hesterberg.

Other resources

Students are strongly encouraged to consider the following additional resources.

A Mind for Numbers: How to Excel at Math and Science by Barbara Oakley.

Dataquest (online Python courses).

DataCamp (online Python courses).

Learning Data Science (Kaggle Wiki)

Kaggle Kernels