🚀 Practical Data Science Specialization

This repository contains all materials, including the labs, for the Practical Data Science Specialization offered by DeepLearning.AI and Amazon Web Services on Coursera.

About Course 1

In the first course of the Practical Data Science Specialization, you will learn foundational concepts for exploratory data analysis (EDA), automated machine learning (AutoML), and text classification algorithms. With Amazon SageMaker Clarify and Amazon SageMaker Data Wrangler, you will analyze a dataset for statistical bias, transform the dataset into machine-readable features, and select the most important features to train a multi-class text classifier. You will then perform automated machine learning (AutoML) to automatically train, tune, and deploy the best text-classification algorithm for the given dataset using Amazon SageMaker Autopilot. Next, you will work with Amazon SageMaker BlazingText, a highly optimized and scalable implementation of the popular FastText algorithm, to train a text classifier with very little code.
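
For a taste of what the bias analysis with SageMaker Clarify looks like in code, here is a minimal sketch of a pre-training bias report; the bucket, column names, and facet are placeholders rather than the actual lab configuration.

```python
import sagemaker
from sagemaker import clarify

session = sagemaker.Session()
role = sagemaker.get_execution_role()

# Processor that runs the Clarify bias-detection job
clarify_processor = clarify.SageMakerClarifyProcessor(
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    sagemaker_session=session,
)

# Placeholder S3 locations and column names -- adjust to your dataset
bias_data_config = clarify.DataConfig(
    s3_data_input_path="s3://my-bucket/reviews/train.csv",
    s3_output_path="s3://my-bucket/clarify/bias-report",
    label="sentiment",
    headers=["sentiment", "product_category", "review_body"],
    dataset_type="text/csv",
)

# Check how the positive label is distributed across product categories
bias_config = clarify.BiasConfig(
    label_values_or_threshold=[1],
    facet_name="product_category",
)

# CI = class imbalance, DPL = difference in proportions of labels
clarify_processor.run_pre_training_bias(
    data_config=bias_data_config,
    data_bias_config=bias_config,
    methods=["CI", "DPL"],
    wait=True,
)
```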

Syllabus:

Week 1: Explore the Use Case and Analyze the Dataset

Ingest, explore, and visualize a product review dataset for multi-class text classification.

Week 2: Data Bias and Feature Importance

Determine the most important features in a dataset and detect statistical bias.

Week 3: Use Automated Machine Learning to Train a Text Classifier

Inspect and compare models generated with automated machine learning (AutoML).

Week 4: Built-in Algorithms

Train a text classifier with BlazingText and deploy the classifier as a real-time inference endpoint to serve predictions.
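
A minimal sketch of that Week 4 flow, training the built-in BlazingText algorithm and deploying it as a real-time endpoint; the S3 paths, instance types, and hyperparameter values are illustrative, not the lab's exact settings.

```python
import sagemaker
from sagemaker import image_uris
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput

session = sagemaker.Session()
role = sagemaker.get_execution_role()
region = session.boto_region_name

# Built-in BlazingText container image for this region
image_uri = image_uris.retrieve("blazingtext", region)

estimator = Estimator(
    image_uri=image_uri,
    role=role,
    instance_count=1,
    instance_type="ml.c5.2xlarge",
    output_path="s3://my-bucket/blazingtext/output",  # placeholder bucket
    sagemaker_session=session,
)

# Supervised mode trains a FastText-style text classifier
estimator.set_hyperparameters(mode="supervised", epochs=10, learning_rate=0.05)

estimator.fit({
    "train": TrainingInput("s3://my-bucket/blazingtext/train.txt", content_type="text/plain"),
    "validation": TrainingInput("s3://my-bucket/blazingtext/validation.txt", content_type="text/plain"),
})

# Deploy the trained model as a real-time inference endpoint
predictor = estimator.deploy(initial_instance_count=1, instance_type="ml.m5.large")
```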

About Course 2

In the second course of the Practical Data Science Specialization, you will learn to automate a natural language processing task by building an end-to-end machine learning pipeline using Hugging Face’s highly optimized implementation of the state-of-the-art BERT algorithm with Amazon SageMaker Pipelines. Your pipeline will first transform the dataset into BERT-readable features and store the features in the Amazon SageMaker Feature Store. It will then fine-tune a text classification model on the dataset using a pre-trained Hugging Face model, which has learned to understand human language from millions of Wikipedia documents. Finally, your pipeline will evaluate the model’s accuracy and deploy the model only if the accuracy exceeds a given threshold.
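
The sketch below outlines how such a pipeline can be wired together with SageMaker Pipelines: a Hugging Face training step, an evaluation step that writes an evaluation.json report, and a condition step that registers the model only if accuracy clears a threshold. The scripts train.py and evaluate.py, the JSON metric path, and all names, framework versions, and instance types are assumptions for illustration, not the course's exact code.

```python
import sagemaker
from sagemaker.huggingface import HuggingFace
from sagemaker.processing import ProcessingInput, ProcessingOutput
from sagemaker.sklearn.processing import SKLearnProcessor
from sagemaker.workflow.condition_step import ConditionStep
from sagemaker.workflow.conditions import ConditionGreaterThanOrEqualTo
from sagemaker.workflow.functions import JsonGet
from sagemaker.workflow.parameters import ParameterFloat
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.properties import PropertyFile
from sagemaker.workflow.step_collections import RegisterModel
from sagemaker.workflow.steps import ProcessingStep, TrainingStep

session = sagemaker.Session()
role = sagemaker.get_execution_role()

# Accuracy threshold the trained model must clear before registration
min_accuracy = ParameterFloat(name="MinAccuracy", default_value=0.80)

# Fine-tune a pre-trained Hugging Face model (train.py is a hypothetical script)
hf_estimator = HuggingFace(
    entry_point="train.py",
    source_dir="src",
    role=role,
    instance_count=1,
    instance_type="ml.p3.2xlarge",
    transformers_version="4.17",
    pytorch_version="1.10",
    py_version="py38",
)
step_train = TrainingStep(
    name="TrainBertClassifier",
    estimator=hf_estimator,
    inputs={"train": "s3://my-bucket/features/train"},  # placeholder prefix
)

# Evaluate the model and write metrics to evaluation.json (evaluate.py is hypothetical)
evaluation_report = PropertyFile(
    name="EvaluationReport", output_name="metrics", path="evaluation.json"
)
step_evaluate = ProcessingStep(
    name="EvaluateModel",
    processor=SKLearnProcessor(
        framework_version="1.0-1", role=role,
        instance_count=1, instance_type="ml.m5.xlarge",
    ),
    code="evaluate.py",
    inputs=[ProcessingInput(
        source=step_train.properties.ModelArtifacts.S3ModelArtifacts,
        destination="/opt/ml/processing/model",
    )],
    outputs=[ProcessingOutput(output_name="metrics", source="/opt/ml/processing/evaluation")],
    property_files=[evaluation_report],
)

# Register the model package only when accuracy meets the threshold
step_register = RegisterModel(
    name="RegisterBertClassifier",
    estimator=hf_estimator,
    model_data=step_train.properties.ModelArtifacts.S3ModelArtifacts,
    content_types=["application/jsonlines"],
    response_types=["application/jsonlines"],
    inference_instances=["ml.m5.large"],
    transform_instances=["ml.m5.large"],
    model_package_group_name="bert-reviews-classifier",
)
step_condition = ConditionStep(
    name="CheckAccuracy",
    conditions=[ConditionGreaterThanOrEqualTo(
        left=JsonGet(
            step_name=step_evaluate.name,
            property_file=evaluation_report,
            json_path="metrics.accuracy.value",  # assumed evaluation.json layout
        ),
        right=min_accuracy,
    )],
    if_steps=[step_register],
    else_steps=[],
)

pipeline = Pipeline(
    name="bert-reviews-pipeline",
    parameters=[min_accuracy],
    steps=[step_train, step_evaluate, step_condition],
    sagemaker_session=session,
)
pipeline.upsert(role_arn=role)
execution = pipeline.start()
```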

Syllabus:

Week 1: Feature Engineering and Feature Store

Transform a raw text dataset into machine learning features and store the features in a feature store (see the Feature Store sketch after this syllabus).

Week 2: Train, Debug, and Profile a Machine Learning Model

Fine-tune, debug, and profile a pre-trained BERT model.

Week 3: Deploy End-to-End Machine Learning Pipelines

Orchestrate ML workflows and track model lineage and artifacts in an end-to-end machine learning pipeline.
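
As referenced under Week 1, here is a hedged sketch of creating a feature group and ingesting BERT-ready features with the SageMaker Feature Store SDK; the feature group name, columns, and S3 location are placeholders, not the lab's actual values.

```python
import time

import pandas as pd
import sagemaker
from sagemaker.feature_store.feature_group import FeatureGroup

session = sagemaker.Session()
role = sagemaker.get_execution_role()

# Illustrative BERT-ready rows; the real features come from the processing job
df = pd.DataFrame({
    "review_id": ["r1", "r2"],
    "input_ids": ["101 2023 2003 102", "101 2307 2009 102"],
    "label": [1, 0],
    "event_time": [time.time()] * 2,
})
# Feature Store infers String only from the pandas "string" dtype, not "object"
for col in ["review_id", "input_ids"]:
    df[col] = df[col].astype("string")

feature_group = FeatureGroup(name="reviews-bert-features", sagemaker_session=session)
feature_group.load_feature_definitions(data_frame=df)

feature_group.create(
    s3_uri="s3://my-bucket/feature-store",  # offline store location (placeholder)
    record_identifier_name="review_id",
    event_time_feature_name="event_time",
    role_arn=role,
    enable_online_store=True,
)

# Creation is asynchronous; wait until the feature group is ready
while feature_group.describe()["FeatureGroupStatus"] == "Creating":
    time.sleep(15)

# Write the rows to the online and offline stores
feature_group.ingest(data_frame=df, max_workers=2, wait=True)
```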

About Course 3

In the third course of the Practical Data Science Specialization, you will learn a series of performance-improvement and cost-reduction techniques to automatically tune model accuracy, compare prediction performance, and generate new training data with human intelligence. After tuning your text classifier using Amazon SageMaker Hyper-parameter Tuning (HPT), you will deploy two model candidates into an A/B test to compare their real-time prediction performance and automatically scale the winning model using Amazon SageMaker Hosting. Lastly, you will set up a human-in-the-loop pipeline to fix misclassified predictions and generate new training data using Amazon Augmented AI and Amazon SageMaker Ground Truth.
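
As an illustration of the hyper-parameter tuning step, the sketch below wraps a hypothetical PyTorch training script in a HyperparameterTuner; the metric regex, search ranges, framework versions, and instance types are assumptions, not the course's exact configuration.

```python
import sagemaker
from sagemaker.pytorch import PyTorch
from sagemaker.tuner import (
    CategoricalParameter,
    ContinuousParameter,
    HyperparameterTuner,
)

session = sagemaker.Session()
role = sagemaker.get_execution_role()

# Hypothetical training script for the text classifier
estimator = PyTorch(
    entry_point="train.py",
    source_dir="src",
    role=role,
    instance_count=1,
    instance_type="ml.p3.2xlarge",
    framework_version="1.13",
    py_version="py39",
)

# Search over learning rate and batch size; ranges are illustrative
hyperparameter_ranges = {
    "learning_rate": ContinuousParameter(1e-5, 1e-3),
    "batch_size": CategoricalParameter([64, 128, 256]),
}

tuner = HyperparameterTuner(
    estimator=estimator,
    objective_metric_name="validation:accuracy",
    hyperparameter_ranges=hyperparameter_ranges,
    metric_definitions=[
        # Regex must match whatever train.py prints for validation accuracy
        {"Name": "validation:accuracy", "Regex": "val_acc: ([0-9\\.]+)"}
    ],
    objective_type="Maximize",
    strategy="Random",
    max_jobs=4,
    max_parallel_jobs=2,
)

tuner.fit({"train": "s3://my-bucket/train", "validation": "s3://my-bucket/validation"})
best_training_job = tuner.best_training_job()
```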

Syllabus:

Week 1: Advanced Model Training, Tuning, and Evaluation

Train, tune, and evaluate models using data-parallel and model-parallel strategies and automatic model tuning.

Week 2: Advanced Model Deployment and Monitoring

Deploy models with A/B testing (see the endpoint sketch after this syllabus), monitor model performance, and detect drift from baseline metrics.

Week 3: Data Labeling and Human-in-the-Loop Pipelines

Label data at scale using private human workforces and build human-in-the-loop pipelines.
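
As referenced under Week 2, here is a minimal A/B-test sketch using the boto3 SageMaker client: two already-created models are placed behind a single endpoint with a 50/50 traffic split, and the variant weights are later shifted to the winner. The model and endpoint names are placeholders.

```python
import boto3

sm = boto3.client("sagemaker")

# Split live traffic 50/50 between two model candidates that were already
# registered with CreateModel (names below are placeholders).
variant_a = {
    "VariantName": "VariantA",
    "ModelName": "reviews-classifier-model-a",
    "InitialInstanceCount": 1,
    "InstanceType": "ml.m5.large",
    "InitialVariantWeight": 0.5,
}
variant_b = {
    "VariantName": "VariantB",
    "ModelName": "reviews-classifier-model-b",
    "InitialInstanceCount": 1,
    "InstanceType": "ml.m5.large",
    "InitialVariantWeight": 0.5,
}

sm.create_endpoint_config(
    EndpointConfigName="reviews-ab-test-config",
    ProductionVariants=[variant_a, variant_b],
)
sm.create_endpoint(
    EndpointName="reviews-ab-test",
    EndpointConfigName="reviews-ab-test-config",
)

# Later, once the endpoint is InService and a winner is clear, shift all
# traffic to the winning variant without recreating the endpoint
sm.update_endpoint_weights_and_capacities(
    EndpointName="reviews-ab-test",
    DesiredWeightsAndCapacities=[
        {"VariantName": "VariantA", "DesiredWeight": 1.0},
        {"VariantName": "VariantB", "DesiredWeight": 0.0},
    ],
)
```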
