DSCI 571: Supervised Learning I

Welcome to DSCI 571, an introductory supervised machine learning course! In this course we will focus on basic machine learning concepts such as data splitting, cross-validation, generalization error, overfitting, the fundamental trade-off, the golden rule, and data preprocessing. You will also be exposed to common machine learning algorithms such as decision trees, K-nearest neighbours, SVMs, naive Bayes, and logistic regression using the scikit-learn framework.

2020-21 instructor: Varada Kolhatkar

Course Learning Outcomes

By the end of the course, students are expected to be able to:

describe supervised learning and identify what kind of tasks it is suitable for;
explain common machine learning concepts such as classification and regression, data splitting, overfitting, parameters and hyperparameters, and the golden rule;
identify when and why to apply data pre-processing techniques such as imputation, scaling, and one-hot encoding;
describe at a high level how common machine learning algorithms work, including decision trees, K-nearest neighbours, and naive Bayes;
use Python and the scikit-learn package to responsibly develop end-to-end supervised machine learning pipelines on real-world datasets and to interpret the results carefully.
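
To give a rough idea of what the last outcome looks like in practice, here is a minimal sketch of an end-to-end scikit-learn pipeline. It is not taken from the course materials: the toy dataset, the column names (`age`, `hours`, `city`, `target`), and the helper `make_model` are all made up for illustration. It combines data splitting, imputation, scaling, one-hot encoding, a baseline, and cross-validation, and it touches the test set only once at the end, in the spirit of the golden rule.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.dummy import DummyClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data (not a course dataset): two numeric features,
# one categorical feature, and a binary target derived from "hours".
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame(
    {
        "age": rng.integers(18, 70, size=n).astype(float),
        "hours": rng.integers(10, 60, size=n).astype(float),
        "city": rng.choice(["Vancouver", "Seattle", "Toronto"], size=n),
    }
)
df["target"] = (df["hours"] > 35).astype(int)
# Knock out a few values so that imputation has something to do.
df.loc[rng.choice(n, size=10, replace=False), "age"] = np.nan

X = df.drop(columns=["target"])
y = df["target"]

# Data splitting: the test portion is set aside and not used for any decisions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)


def make_model(estimator):
    """Hypothetical helper: preprocessing followed by the given estimator."""
    numeric = make_pipeline(SimpleImputer(strategy="median"), StandardScaler())
    categorical = OneHotEncoder(handle_unknown="ignore")
    preprocessor = ColumnTransformer(
        [
            ("num", numeric, ["age", "hours"]),
            ("cat", categorical, ["city"]),
        ]
    )
    return make_pipeline(preprocessor, estimator)


baseline = make_model(DummyClassifier(strategy="most_frequent"))
tree = make_model(DecisionTreeClassifier(max_depth=3, random_state=0))

# Cross-validation on the training portion only; preprocessing is refit
# inside each fold because it is part of the pipeline.
baseline_scores = cross_val_score(baseline, X_train, y_train, cv=5)
cv_scores = cross_val_score(tree, X_train, y_train, cv=5)
print("baseline CV accuracy:", baseline_scores.mean())
print("decision tree CV accuracy:", cv_scores.mean())

# Final step: fit on all training data and score once on the test set.
tree.fit(X_train, y_train)
test_score = tree.score(X_test, y_test)
print("test accuracy:", test_score)
```

Putting the preprocessing inside the pipeline, rather than transforming the whole dataset up front, is one way to keep information from the validation folds and the test set out of training.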

Deliverables

The following deliverables will determine your course grade:

Assessment	Weight
Lab Assignment 1	15%
Lab Assignment 2	15%
Lab Assignment 3	15%
Lab Assignment 4	15%
Quiz 1	20%
Quiz 2	20%

Class Meetings

We will be meeting three times every week: twice for lectures and once for the lab.

Lecture format

Lectures in this course combine pre-recorded videos with in-class discussions and activities. You are expected to watch the videos before each lecture. We'll spend the lecture time on group activities and Q&A sessions.

Lecture Schedule

Lecture	Topic	Datasets	Resources and optional readings
	Motivation and course information	Indian Liver Patient Records; House Sales in King County; IMDB movie reviews
1	Terminology, baselines, decision trees	House Sales in King County; Canada US cities toy dataset
2	ML fundamentals	Canada US cities toy dataset
3	kNNs, SVM RBF	Canada US cities toy dataset; Spotify Song Attributes
4	Preprocessing and pipelines	Spotify Song Attributes; California Housing
5	Categorical features and text features	The adult census dataset
6	Hyperparameter optimization, optimization bias	The adult census dataset
7	Naive Bayes	SMS Spam Collection Dataset	Conditional probability visualization
8	Logistic Regression, multi-class classification	SMS Spam Collection Dataset

Installation

We provide a conda environment file, which is available here. Download this file, then create and activate a conda environment for the course as follows.

conda env create -f env-dsci-571.yaml
conda activate 571

In order to use this environment in Jupyter, you will have to install nb_conda_kernels in the environment where you have installed Jupyter (typically the base environment). You will then be able to select this new environment in Jupyter.

Note that the environment file is not a complete list of the packages we'll be using in the course; you may need to install a few more with conda install later on. It is enough to get you started, though.
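
If you want to confirm that the environment is active and usable, a small check like the one below can help. This is only a sketch: the function name `missing_packages` is made up, and the package list is an assumption (scikit-learn is named in the course description; the other entries are guesses, so adjust them to match env-dsci-571.yaml).

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the import names that cannot be found in the active environment."""
    return [name for name in names if find_spec(name) is None]


# Assumed list of import names; edit it to match the environment file.
expected = ["sklearn", "pandas", "numpy"]

missing = missing_packages(expected)
if missing:
    print("Not found (consider installing with conda install):", ", ".join(missing))
else:
    print("All expected packages are importable.")
```

Run it from a terminal or notebook that uses the course environment; a package reported as not found usually means either the wrong environment or kernel is selected, or the package still needs to be installed.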

Reference Material

Books

A Course in Machine Learning (CIML) by Hal Daumé III (also relevant for DSCI 572, 573, 575, 563)
Introduction to Machine Learning with Python: A Guide for Data Scientists by Andreas C. Mueller and Sarah Guido.
The Elements of Statistical Learning (ESL)
Data Mining: Practical Machine Learning Tools and Techniques (PMLTT)
Artificial Intelligence: A Modern Approach by Stuart Russell and Peter Norvig.
Artificial Intelligence 2E: Foundations of Computational Agents (2017) by David Poole and Alan Mackworth (of UBC!).

Online courses

Mike's CPSC 330
Mike is currently teaching an undergrad course on applied machine learning. Unlike DSCI 571, CPSC 330 is a semester-long course but there is a lot of overlap and sharing of notes between these courses. You might find the course useful.
Mike's CPSC 340
Machine Learning (Andrew Ng's famous Coursera course)
Foundations of Machine Learning online course from Bloomberg.
Machine Learning Exercises In Python, Part 1 (translation of Andrew Ng's course to Python, also relevant for DSCI 561, 572, 563)

Misc

A Visual Introduction to Machine Learning (Part 1)
A Few Useful Things to Know About Machine Learning (an article by Pedro Domingos)
Metacademy (sort of like a concept map for machine learning, with suggested resources)
Machine Learning 101 (slides by Jason Mayes, engineer at Google)

Policies

Please see the general MDS policies.
