fdbesanto2/t81_577_data_science


T81 577 Applied Data Science for Practitioners

Washington University in St. Louis

Instructor: Asim Banskota

Spring 2020, Wednesdays, 6:00 PM - 9:00 PM, Cupples II, Room L015

Course Description

Organizations are rapidly transforming the way they ingest, integrate, store, and serve data and perform analytics. In this course, students will learn the steps involved in designing and implementing data science projects. Topics include ingesting and parsing data from various sources, dealing with messy and missing data, transforming and engineering features, building and evaluating machine learning models, and visualizing results. Using Python-based tools such as NumPy, pandas, and scikit-learn, students will complete a practical data science project that covers the entire design and implementation process. Students will also become familiar with best practices and current trends in data science, including code documentation, version control, reproducible research, pipeline automation, and cloud computing. Upon completing the course, students will have data science knowledge and skills they can apply from day one on the job.

Syllabus

Week Content
Week 1
1/15/2020
Introductions
Assignment 1.1: Install Anaconda and test Jupyter Notebook
Assignment 1.2: AWS fundamentals
Week 2
1/22/2020
Python Fundamentals
Assignment 2: Programming practice
Week 3
1/29/2020
Coding Best Practices in Data Science
Assignment 3.1: Version control exercise with Git
Assignment 3.2: Exercise on code documentation and enforcing coding standards
Week 4
2/5/2020
Modeling Overview
  • 4.1. Types of models
    • 4.1.1. Descriptive/Prescriptive/Predictive
    • 4.1.2. Statistical vs. machine learning
    • 4.1.3. Black-box vs. explainable
  • 4.2. Model development steps
    • 4.2.1. Framing questions
    • 4.2.2. Data ingestion and wrangling
    • 4.2.3. Data preprocessing
    • 4.2.4. Model fitting and evaluation
    • 4.2.5. Model deployment
    • 4.2.6. Performance monitoring and redevelopment
Quiz 1: Modeling Overview
Week 5
2/12/2020
Accessing Data
  • 5.1. Introduction to RESTful APIs
  • 5.2. Accessing data from APIs using the requests module and Postman
  • 5.3. Overview of JSON-formatted data
  • 5.4. Parsing JSON data
  • 5.5. Importing data in commonly used file formats
  • 5.6. Reading data from PostgreSQL database
Assignment 4: Finalization of final project topic and data set (Not graded)
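Topics 5.3 and 5.4 (JSON-formatted data and parsing it) can be illustrated with a minimal sketch that uses only Python's standard `json` module. The payload below is hypothetical, shaped like a typical RESTful API response body; in practice it would come from an HTTP call (for example, `requests.get(url).json()` with the `requests` library), which is left out so the example runs offline.

```python
import json

# Hypothetical payload, shaped like a typical REST API response body.
# In practice this string would come from an HTTP request.
payload = """
{
  "results": [
    {"id": 1, "city": "St. Louis", "temp_f": 71.5},
    {"id": 2, "city": "Chicago", "temp_f": 64.0}
  ],
  "next": null
}
"""

# json.loads turns a JSON string into Python objects:
# objects -> dict, arrays -> list, null -> None.
data = json.loads(payload)
records = data["results"]

# Extract one field from every record.
cities = [record["city"] for record in records]

# Convert Fahrenheit to Celsius for each record, keyed by id.
temps_c = {record["id"]: round((record["temp_f"] - 32) * 5 / 9, 1) for record in records}

print(cities)        # ['St. Louis', 'Chicago']
print(data["next"])  # None
print(temps_c)       # {1: 21.9, 2: 17.8}
```

A `"next": null` field like the one above is a common (but API-specific) way for paginated endpoints to signal that there are no more pages.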
Week 6
2/19/2020
NumPy/pandas for Data Munging/Wrangling
  • 6.1. pandas and NumPy data structures
  • 6.2. Querying and reading data
  • 6.3. Reshaping, indexing, slicing, and filtering data
  • 6.4. Join, Merge, and Aggregation
  • 6.5. Vectorization
  • 6.6. Basic statistics and plotting
Assignment 5: Data wrangling with NumPy and pandas
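The join/merge, aggregation, and vectorization topics (6.4 and 6.5) can be sketched in a few lines of pandas. The two tables and the 8% tax rate are made-up illustration data, not course material.

```python
import pandas as pd

# Hypothetical example tables.
orders = pd.DataFrame({
    "order_id": [101, 102, 103, 104],
    "customer_id": [1, 1, 2, 3],
    "amount": [20.0, 35.0, 15.0, 50.0],
})
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "region": ["East", "West", "East"],
})

# Join: a left merge keeps every order and attaches its customer's region.
merged = orders.merge(customers, on="customer_id", how="left")

# Vectorization: one arithmetic expression over the whole column, no Python loop.
TAX_RATE = 0.08  # hypothetical rate
merged["amount_with_tax"] = merged["amount"] * (1 + TAX_RATE)

# Aggregation: total and mean order amount per region.
summary = merged.groupby("region")["amount"].agg(["sum", "mean"])
print(summary)
```

Here East collects orders 101, 102, and 104 (sum 105.0), while West has only order 103 (sum 15.0).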
Week 7
2/26/2020
Exploratory Data Analysis (EDA)
  • 7.1. Categorical vs numeric features
  • 7.2. Datatype conversion
  • 7.3. Sampling
  • 7.4. Data summary and distribution
  • 7.5. Patterns in data
  • 7.6. Data visualization using matplotlib, seaborn, and Bokeh
  • 7.7. Anomaly/outlier detection
Assignment 6: Patterns in data: Visualization and data summary
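For topics 7.4 and 7.7, a distribution summary plus a simple outlier rule is a common first pass. The sketch below uses Tukey's fences (flag values more than 1.5 * IQR outside the quartiles), which is only one of several anomaly-detection approaches and not necessarily the one taught in class; the data are made up.

```python
import pandas as pd

# Hypothetical measurements with one suspicious value.
s = pd.Series([10, 12, 11, 13, 12, 95, 11, 10], name="response_ms")

# Summary statistics: count, mean, std, min, quartiles, max.
print(s.describe())

# Tukey's fences: flag points more than 1.5 * IQR outside the quartiles.
q1, q3 = s.quantile(0.25), s.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = s[(s < lower) | (s > upper)]
print(outliers)
```

Because quartiles are robust to extreme values, the single value of 95 barely moves the fences and is flagged, whereas a mean-and-standard-deviation rule would be pulled toward it.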
Week 8
3/4/2020
Data Preprocessing
  • 8.1. Basics (select, filter, removal of duplicates)
  • 8.2. Data Transformation
  • 8.3. Standardization, binning, and missing-value treatment
  • 8.4. Balancing datasets
Assignment 7: Data preprocessing
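Several Week 8 steps (duplicate removal, missing-value treatment, standardization, and binning) can be chained on a small DataFrame. This is a sketch under stated assumptions: median imputation, z-score standardization, and the age cut points are illustrative choices, and the data are hypothetical. Dataset balancing (8.4) is not shown.

```python
import pandas as pd

# Hypothetical raw data: one duplicated row and two missing values.
df = pd.DataFrame({
    "age": [25, 32, None, 47, 32],
    "income": [40000, 52000, 61000, None, 52000],
})

# 8.1: remove exact duplicate rows (the last row repeats the second).
df = df.drop_duplicates().reset_index(drop=True)

# 8.3: missing-value treatment by median imputation (one simple option).
for col in ["age", "income"]:
    df[col] = df[col].fillna(df[col].median())

# 8.3: standardization to z-scores; pandas .std() is the sample std (ddof=1).
for col in ["age", "income"]:
    df[f"{col}_z"] = (df[col] - df[col].mean()) / df[col].std()

# 8.3: binning a continuous feature into labeled ranges (cut points are arbitrary).
df["age_group"] = pd.cut(df["age"], bins=[0, 30, 45, 120], labels=["young", "middle", "senior"])
print(df)
```

In a real project, the imputation medians and scaling statistics should be computed on the training split only and reused on validation and test data, which is what the pipelines in Week 11 automate.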
Week 9
3/18/2020
Feature Transformation and Engineering
  • 9.1. Categorical encodings
  • 9.2. Feature creation/engineering
  • 9.3. Feature extraction
Assignment: Transformation of categorical and continuous features
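Two of the most common categorical encodings (9.1) are one-hot encoding for nominal features and an explicit ordinal mapping for ordered features. A minimal pandas sketch with hypothetical columns:

```python
import pandas as pd

# Hypothetical categorical features.
df = pd.DataFrame({
    "color": ["red", "green", "blue", "green"],  # nominal: no natural order
    "size": ["S", "L", "M", "S"],                # ordinal: S < M < L
})

# One-hot encoding for the nominal feature: one 0/1 column per category.
one_hot = pd.get_dummies(df["color"], prefix="color", dtype=int)

# Ordinal encoding for the ordered feature via an explicit mapping.
size_order = {"S": 0, "M": 1, "L": 2}
size_encoded = df["size"].map(size_order).rename("size_encoded")

encoded = pd.concat([one_hot, size_encoded], axis=1)
print(encoded)
```

One-hot encoding avoids implying an order that does not exist, at the cost of one column per category; for high-cardinality features other encodings are usually preferred.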
Week 10
3/25/2020
Building and Evaluating Models
  • 10.1. Tour of machine learning algorithms using scikit-learn
  • 10.2. Introduction to the scikit-learn model development API
  • 10.3. Amazon SageMaker
  • 10.4. Training and fitting classification models
  • 10.5. Training and fitting regression models
  • 10.6. Performance evaluation metrics and curves
Assignment: Model building and evaluation using scikit-learn
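The scikit-learn development API (10.2) follows the same construct, `fit`, `predict` pattern for every estimator. A minimal classification sketch on the bundled iris dataset; the choice of logistic regression and a 25% hold-out split are illustrative assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Hold out 25% of rows for evaluation; stratify keeps class proportions equal.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Every scikit-learn estimator follows the same pattern: construct, fit, predict.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

acc = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)
print("accuracy:", acc)
print(cm)
```

Swapping in a regression model (10.5) keeps the same calls; only the estimator class and the evaluation metric change.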
Week 11
4/1/2020
Best practices in Machine Learning
  • 11.1. Bias-variance tradeoff
  • 11.2. Train/dev/test datasets
  • 11.3. Regularization
  • 11.4. Learning vs validation curves
  • 11.5. Hyperparameter tuning
  • 11.6. Ensemble learning
  • 11.7. Streamlining workflows with pipelines
Assignment: Regularization, cross-validation, and hyperparameter tuning
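Several Week 11 ideas (train/dev/test splitting, regularization, hyperparameter tuning, and pipelines) fit together in one short scikit-learn sketch. Assumptions: the bundled breast-cancer dataset, L2-regularized logistic regression, and the grid of `C` values are illustrative choices, with cross-validation on the training set standing in for a separate dev set.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Hold out a final test set; cross-validation inside the training set
# plays the role of the dev set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# The pipeline refits the scaler on each CV training fold, avoiding leakage.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=5000)),
])

# C is the inverse of the L2 regularization strength: smaller C = stronger penalty.
param_grid = {"clf__C": [0.01, 0.1, 1.0, 10.0]}
search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

test_acc = search.score(X_test, y_test)
print("best C:", search.best_params_["clf__C"])
print("mean CV accuracy:", round(search.best_score_, 3))
print("test accuracy:", round(test_acc, 3))
```

The `step__param` naming (`clf__C`) is how `GridSearchCV` addresses parameters of a step inside a `Pipeline`; the test set is touched only once, after tuning is finished.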
Week 12
4/8/2020
1. Guest Lecture: Data Science at Wells Fargo
2. Discussion on final project status
Quiz 2: Best practices in machine learning
Week 13
4/15/2020
Productionize a Machine Learning model
  • 13.1. Dev/Stage/Prod environment
  • 13.2. Docker, Dockerfiles, and Docker containers
  • 13.3. Deploy a machine learning model as a Flask app
  • 13.4. Introduction to Airflow
Assignment: Build and deploy a model using Docker and a Heroku app
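Topic 13.3 can be sketched as a tiny Flask app exposing a JSON prediction endpoint. This is a minimal illustration under assumptions: the `/predict` route, the `{"features": [...]}` request schema, and training an iris model at startup (instead of loading a serialized model file) are hypothetical choices made to keep the example self-contained.

```python
from flask import Flask, jsonify, request
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Stand-in for loading a previously trained, serialized model (e.g. with joblib).
X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)

app = Flask(__name__)


@app.route("/predict", methods=["POST"])
def predict():
    # Hypothetical request schema: {"features": [f1, f2, f3, f4]}
    payload = request.get_json()
    features = payload["features"]
    prediction = model.predict([features])[0]
    return jsonify({"prediction": int(prediction)})
```

Saved as `app.py`, recent Flask versions can serve it with `flask --app app run --host 0.0.0.0`; binding to `0.0.0.0` lets the server accept connections from outside a Docker container. Flask's built-in server is meant for development; a production deployment would typically sit behind a WSGI server.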
Week 14
4/22/2020
Final Project Demo
Short, 5-minute individual project demo
