This repository contains all materials, including the labs, for the Practical Data Science Specialization offered by DeepLearning.AI and Amazon Web Services on Coursera.
In the first course of the Practical Data Science Specialization, you will learn foundational concepts for exploratory data analysis (EDA), automated machine learning (AutoML), and text classification algorithms. With Amazon SageMaker Clarify and Amazon SageMaker Data Wrangler, you will analyze a dataset for statistical bias, transform the dataset into machine-readable features, and select the most important features to train a multi-class text classifier. You will then use Amazon SageMaker Autopilot to automatically train, tune, and deploy the best text-classification algorithm for the given dataset. Next, you will work with Amazon SageMaker BlazingText, a highly optimized and scalable implementation of the popular FastText algorithm, to train a text classifier with very little code.
Ingest, explore, and visualize a product review dataset for multi-class text classification.
Determine the most important features in a dataset and detect statistical biases.
Inspect and compare models generated with automated machine learning (AutoML).
Train a text classifier with BlazingText and deploy the classifier as a real-time inference endpoint to serve predictions.
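As a rough illustration of the data preparation behind the BlazingText objective, the sketch below formats labeled reviews as input lines for BlazingText's supervised mode, which expects each line to begin with a `__label__` prefix followed by space-separated tokens. The helper name `to_blazingtext_line`, the regex tokenizer, and the sample reviews and sentiment labels are assumptions made for illustration; they are not taken from the labs.

```python
import re


def to_blazingtext_line(label, text):
    """Format one labeled review as a BlazingText supervised-mode line."""
    # Lowercase, then split words from punctuation. This regex is a simple
    # stand-in tokenizer (an assumption), not the one used in the labs.
    tokens = re.findall(r"[a-z0-9']+|[^\sa-z0-9']", text.lower())
    return "__label__{} {}".format(label, " ".join(tokens))


# Hypothetical sample data: (sentiment label, review text).
reviews = [
    (1, "Love this dress!"),
    (-1, "Poor quality, returned it."),
]

lines = [to_blazingtext_line(label, text) for label, text in reviews]
for line in lines:
    print(line)
```

Each resulting line can be written to a plain-text training file, one example per line.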
In the second course of the Practical Data Science Specialization, you will learn to automate a natural language processing task by building an end-to-end machine learning pipeline using Hugging Face’s highly optimized implementation of the state-of-the-art BERT algorithm with Amazon SageMaker Pipelines. Your pipeline will first transform the dataset into BERT-readable features and store the features in the Amazon SageMaker Feature Store. It will then fine-tune a text classification model on the dataset using a Hugging Face pre-trained model, which has learned to understand human language from millions of Wikipedia documents. Finally, your pipeline will evaluate the model’s accuracy and deploy the model only if the accuracy exceeds a given threshold.
Transform a raw text dataset into machine learning features and store features in a feature store.
Fine-tune, debug, and profile a pre-trained BERT model.
Orchestrate ML workflows and track model lineage and artifacts in an end-to-end machine learning pipeline.
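The final pipeline step described for this course, deploying only when accuracy exceeds a threshold, can be sketched in plain Python. This is a minimal illustration of the gating logic only, not the SageMaker Pipelines API; the function names, the sample labels, and the threshold values are all assumptions.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def should_deploy(model_accuracy, min_accuracy):
    """Return True only when the accuracy strictly exceeds the threshold."""
    # Strict comparison, because the model must exceed the threshold.
    return model_accuracy > min_accuracy


# Hypothetical evaluation data for a three-class sentiment classifier.
y_true = [1, 0, -1, 1, 0]
y_pred = [1, 0, -1, 0, 0]

acc = accuracy(y_true, y_pred)
print("accuracy:", acc)
print("deploy:", should_deploy(acc, min_accuracy=0.75))
```

In a real pipeline the same comparison would typically be expressed as a condition step between evaluation and deployment.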
In the third course of the Practical Data Science Specialization, you will learn a series of performance-improvement and cost-reduction techniques to automatically tune model accuracy, compare prediction performance, and generate new training data with human intelligence. After tuning your text classifier using Amazon SageMaker Hyperparameter Tuning (HPT), you will deploy two model candidates into an A/B test to compare their real-time prediction performance and automatically scale the winning model using Amazon SageMaker Hosting. Lastly, you will set up a human-in-the-loop pipeline to fix misclassified predictions and generate new training data using Amazon Augmented AI and Amazon SageMaker Ground Truth.
Train, tune, and evaluate models using data-parallel and model-parallel strategies and automatic model tuning.
Deploy models with A/B testing, monitor model performance, and detect drift from baseline metrics.
Label data at scale using private human workforces and build human-in-the-loop pipelines.
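To make the A/B testing idea concrete, the sketch below simulates how requests might be split between two model candidates in proportion to per-variant weights. It is a local simulation under stated assumptions, not a call to the SageMaker Hosting API; the variant names, weights, and helper functions are hypothetical.

```python
import random


def traffic_shares(variant_weights):
    """Convert per-variant weights into fractions of total traffic."""
    total = sum(variant_weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return {name: weight / total for name, weight in variant_weights.items()}


def pick_variant(variant_weights, rng):
    """Choose one variant at random, in proportion to its weight."""
    names = list(variant_weights)
    weights = [variant_weights[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


# Hypothetical 50/50 split between two model candidates.
weights = {"VariantA": 50, "VariantB": 50}
shares = traffic_shares(weights)

rng = random.Random(0)  # seeded so the simulation is repeatable
counts = {name: 0 for name in weights}
for _ in range(1000):
    counts[pick_variant(weights, rng)] += 1

print(shares)
print(counts)
```

Shifting the weights, for example to 90 and 10, is one way to move traffic gradually toward the better-performing candidate.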