Course materials for Data Mining at the University of Washington's Professional and Continuing Education.

BI TECH CP303 - Data Mining

  • Instructor: Erin Shellman, shellman@uw.edu
  • Teaching Assistant: Bryan Mayer, mayerbry@uw.edu
  • Course Location: Puget Sound Plaza, Room 406
  • Course Time: Mondays 6:00 - 9:00 PM
  • Dates: April 4, 2016 through June 13, 2016

Welcome to data mining! This is an applied course that teaches practical tools for data mining and knowledge discovery. The course is composed of three units: prediction of continuous outcomes, classification, and unsupervised learning. The goal is to give you experience across a breadth of applications and to prepare you for the job of an analyst, data scientist, or any role that calls for data mining. If you already have experience with R or data mining, the projects include additional readings and techniques to challenge you and elevate your skills.

Course Grade

Grading is based on classroom participation, completion of homework and projects, and attendance. Students must attend at least 80% of the lectures to receive a passing grade.

Assignments

Projects

There are three projects, one in each topic area. For each project, you will receive a business problem and a corresponding data set. You're free to use any methods you like, so long as you support your choices. You will write a brief report of your analyses and exchange feedback with your classmates. You will have time in class to ask questions and work on your projects.

Critiques

When you turn in your project reports, you will receive the reports of three of your classmates. During the following week, read their reports and provide thoughts and feedback. Please write at least a paragraph discussing the parts of each analysis you liked and disliked. While you're reading, put yourself in the mind of the business stakeholder and ask whether your requests were adequately met. Are you confident in the conclusions drawn? Were the figures and supporting evidence compelling? Remember to maintain a tone of mutual respect. See the section on policies and values for more information.

Due Dates

| Assignment | Date |
|---|---|
| Project 1 | May 9 |
| Project 1 Critiques | May 16 |
| Project 2 | May 23 |
| Project 2 Critiques | May 30 |
| No class! | May 30 |
| Project 3 | June 13 |
| Last class! | June 13 |

Textbook

There is no required textbook for this course. Everything you need to succeed is available in the course repository.

Policies and values

A large component of this course is in-class discussion and providing critical feedback on the analyses of your peers. It is imperative that all students be thoughtful when providing written feedback and participating in class. This means using respectful language in discussions and writing, and also respecting our limited class time by arriving prepared and engaged.

Everyone is required to do original work for all projects. You're free to openly discuss the projects and your approaches, just like you would in a professional setting, but reports should be your own.

Students with disabilities who require additional services can find resources at the UW Disability Resources for Students page.

Topics

The lecture notes are available here.

| Week | Date | Topic | Dataset |
|---|---|---|---|
| 1 | April 4 | Introduction to data mining and programming with R | Capital Bikeshare: bikeshare_2015.tsv |
| 2 | April 11 | Linear regression | Capital Bikeshare |
| 3 | April 18 | Linear regression extensions | Capital Bikeshare |
| 4 | April 25 | Flex-time, Logistic regression | Twitter user data: bot_or_not.tsv |
| 5 | May 2 | Classification trees | Twitter user data |
| 6 | May 9 | Classification | Twitter user data |
| 7 | May 16 | Association rules | colleges.tsv |
| 8 | May 23 | Clustering | |
| 9 | June 6 | Sharing your work | None! |
| 10 | June 13 | Guest panel | None! |

Software Installation

We'll be using the statistical programming language R for this course. In addition, I highly recommend that you use RStudio, a powerful integrated development environment (IDE) for R. If you plan to use your own laptop in class, please install R and RStudio on it before the first day of class. The computers in the classroom will have everything you need installed.

  1. Download and install R
  2. Download and install RStudio