panktiHT/DSCI_571_sup-learn-1

DSCI 571: Supervised Learning I

Welcome to DSCI 571, an introductory supervised machine learning course! In this course we will focus on basic machine learning concepts such as data splitting, cross-validation, generalization error, overfitting, the fundamental trade-off, the golden rule, and data preprocessing. You will also be exposed to common machine learning algorithms such as decision trees, K-nearest neighbours, SVMs, naive Bayes, and logistic regression using the scikit-learn framework.

2020-21 instructor: Varada Kolhatkar

Course Learning Outcomes

By the end of the course, students are expected to be able to:

  • describe supervised learning and identify what kind of tasks it is suitable for;
  • explain common machine learning concepts such as classification and regression, data splitting, overfitting, parameters and hyperparameters, and the golden rule;
  • identify when and why to apply data pre-processing techniques such as imputation, scaling, and one-hot encoding;
  • describe at a high level how common machine learning algorithms work, including decision trees, K-nearest neighbours, and naive Bayes;
  • use Python and the scikit-learn package to responsibly develop end-to-end supervised machine learning pipelines on real-world datasets and to interpret the results carefully.
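As a rough illustration of what an end-to-end pipeline of the kind named in these outcomes might look like, here is a minimal sketch in scikit-learn. It is not taken from the course materials. The toy data, the column names (`longitude`, `latitude`, `region`), and the choice of a K-nearest neighbours classifier are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.dummy import DummyClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data (not a course dataset): two numeric features and one
# categorical feature, with a label derived from latitude.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame(
    {
        "longitude": rng.normal(-100, 15, n),
        "latitude": rng.normal(45, 5, n),
        "region": rng.choice(["east", "west", "central"], n),
    }
)
target = np.where(df["latitude"] > 45, "Canada", "USA")

# Blank out a few values so that imputation has something to do.
df.loc[rng.choice(n, 10, replace=False), "latitude"] = np.nan

# Data splitting: the test set is put aside and not touched until the end.
X_train, X_test, y_train, y_test = train_test_split(
    df, target, test_size=0.2, random_state=123
)

numeric_features = ["longitude", "latitude"]
categorical_features = ["region"]

# Preprocessing: imputation + scaling for numeric columns,
# one-hot encoding for the categorical column.
preprocessor = ColumnTransformer(
    [
        (
            "num",
            make_pipeline(SimpleImputer(strategy="median"), StandardScaler()),
            numeric_features,
        ),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_features),
    ]
)

# Putting the preprocessing inside the pipeline means it is re-fit on each
# cross-validation training fold, so validation folds do not leak into it.
pipe = make_pipeline(preprocessor, KNeighborsClassifier(n_neighbors=5))

# Cross-validation on the training set only, compared against a baseline.
cv_scores = cross_val_score(pipe, X_train, y_train, cv=5)
baseline_scores = cross_val_score(
    DummyClassifier(strategy="most_frequent"), X_train, y_train, cv=5
)

# Final fit on the full training set, then a single look at the test set.
pipe.fit(X_train, y_train)
test_score = pipe.score(X_test, y_test)

print(f"baseline CV accuracy: {baseline_scores.mean():.3f}")
print(f"pipeline CV accuracy: {cv_scores.mean():.3f}")
print(f"pipeline test accuracy: {test_score:.3f}")
```

The design choice worth noticing is that the imputer, scaler, and encoder live inside the pipeline rather than being applied to the whole dataset up front. This is one common way to keep test and validation data from influencing training.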

Deliverables

The following deliverables will determine your course grade:

Assessment Weight
Lab Assignment 1 15%
Lab Assignment 2 15%
Lab Assignment 3 15%
Lab Assignment 4 15%
Quiz 1 20%
Quiz 2 20%
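
For illustration only, the weights above can be combined into a course grade as a weighted average. The sketch below assumes each assessment is scored out of 100. The example scores and the helper name `course_grade` are hypothetical, and actual grading rules may differ.

```python
# Assessment weights in percent, copied from the table above.
weights = {
    "Lab Assignment 1": 15,
    "Lab Assignment 2": 15,
    "Lab Assignment 3": 15,
    "Lab Assignment 4": 15,
    "Quiz 1": 20,
    "Quiz 2": 20,
}

# Sanity check: the weights should add up to 100%.
assert sum(weights.values()) == 100


def course_grade(scores):
    """Weighted average of per-assessment scores (each assumed out of 100)."""
    return sum(weights[name] * scores[name] for name in weights) / 100


# Hypothetical scores, made up for this example.
example_scores = {
    "Lab Assignment 1": 90,
    "Lab Assignment 2": 85,
    "Lab Assignment 3": 80,
    "Lab Assignment 4": 95,
    "Quiz 1": 70,
    "Quiz 2": 75,
}

print(course_grade(example_scores))
```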

Class Meetings

We will be meeting three times every week: twice for lectures and once for the lab.

Lecture format

Lectures in this course combine pre-recorded videos with in-class discussions and activities. You are expected to watch the videos before each lecture; lecture time is spent on group activities and Q&A sessions.

Lecture Schedule

Lecture Topic Datasets Resources and optional readings

Motivation and course information
  • Indian Liver Patient Records
  • House Sales in King County
  • IMDB movie reviews

1 Terminology, baselines, decision trees
  • House Sales in King County
  • Canada US cities toy dataset

2 ML fundamentals
  • Canada US cities toy dataset

3 kNNs, SVM RBF
  • Canada US cities toy dataset
  • Spotify Song Attributes

4 Preprocessing and pipelines
  • Spotify Song Attributes
  • California Housing

5 Categorical features and text features
  • The adult census dataset

6 Hyperparameter optimization, optimization bias
  • The adult census dataset

7 Naive Bayes
  • SMS Spam Collection Dataset
  • Conditional probability visualization (resource)

8 Logistic Regression, multi-class classification
  • SMS Spam Collection Dataset

Installation

    A conda environment file for the course (env-dsci-571.yaml) is available here. Download the file, then create and activate the course environment as follows:

    conda env create -f env-dsci-571.yaml
    conda activate 571
    

    In order to use this environment in Jupyter, you will have to install nb_conda_kernels in the environment where you have installed Jupyter (typically the base environment). You will then be able to select this new environment in Jupyter.

    Note that the environment file is not a complete list of the packages used in the course; you may need to install a few more with conda install later on. It is enough to get you started.
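
One optional way to check that the activated environment is usable is to confirm that a few packages can be found by Python. The sketch below relies only on the standard library. The package list is an assumption: scikit-learn (imported as `sklearn`) is named in the course description, while the other entries are guesses about what the environment file contains. The helper name `missing_packages` is hypothetical.

```python
import importlib.util
import sys


def missing_packages(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed package list: edit it to match the contents of the environment file.
expected = ["sklearn", "pandas", "numpy", "matplotlib"]

print("Python executable:", sys.executable)
missing = missing_packages(expected)
if missing:
    print("Not found:", ", ".join(missing))
else:
    print("All listed packages were found.")
```

Run this from a terminal where the 571 environment is active, or from a Jupyter notebook that uses the course kernel. The printed executable path shows which environment is actually in use.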

    Reference Material

    Books

    Online courses

    Misc

    Policies

    Please see the general MDS policies.
