pdefusco/CML_MLOps_ACE_HOL

CML MLOps Hands-On-Lab Workshop

About the Hands On Lab Workshops

The Hands-On Lab (HOL) Workshops are an initiative by Cloudera Solutions Engineering aimed at familiarizing CDP users with each Data Service. The content consists of a series of guides and exercises for quickly implementing sample end-to-end use cases in Machine Learning, Data Warehousing, Data Engineering, Data Streaming and Operational Database.

The HOL is typically a three to four-hour event organized by Cloudera for CDP customers and prospects, where a small technical team from Cloudera Solutions Engineering provides cloud infrastructure for all participants and guides them through the completion of the labs with the help of presentations and open discussions.

The HOL contained in this GitHub repository is dedicated to Cloudera Machine Learning (CML). CML is the CDP Data Service for machine learning and AI in Private and Public Clouds.

The content is primarily designed for machine learning engineers, data scientists, and cloud architects. However, little to no code changes are typically required, and non-technical stakeholders such as project managers and analysts are encouraged to take an active part.

HOL events are open to all CDP users and customers. If you would like Cloudera to host an event for you and your colleagues, please contact your local Cloudera Representative or submit your information through this portal. Finally, if you have access to a CML Workspace, you are welcome to use this guide and go through the same concepts in your own time.

About the Cloudera Machine Learning (CML) Service

Cloudera Machine Learning (CML) is Cloudera’s platform for machine learning and AI. CML unifies self-service data science and data engineering in a single, portable service as part of an enterprise data cloud for multi-function analytics on data anywhere.

Large-scale organizations use CML to build and deploy machine learning and AI capabilities for business at scale, efficiently and securely. CML is built for the agility and power of cloud computing, but can also operate inside your private and secure data center.

About the Labs

This Hands On Lab is designed to walk you through the Service's main capabilities. Throughout the exercises you will complete a text classification use case and:

  1. Use MLflow Experiments to perform hyperparameter tuning at scale with Spark and PyTorch.
  2. Learn about Iceberg's most popular features.
  3. Use MLflow Registry and CML APIv2 to build an MLOps pipeline.
  4. Collaborate with other team members to maintain the promoted models and MLOps pipeline in a production-like environment.
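
As a rough illustration of the first item, the sketch below runs a small grid search and optionally records each trial as an MLflow run. It is not taken from the lab code: the `toy_validation_loss` function, the parameter grid and the `val_loss` metric name are hypothetical stand-ins for a real Spark or PyTorch training job, and the MLflow logging is opt-in, so the example also runs where MLflow is not installed.

```python
import itertools


def toy_validation_loss(learning_rate, batch_size):
    """Hypothetical stand-in for a real training run; returns a fake loss."""
    return (learning_rate - 0.01) ** 2 * 1000 + abs(batch_size - 32) / 1000


def run_grid_search(grid, log_to_mlflow=False):
    """Evaluate every combination in `grid` and return (best_params, all_results)."""
    if log_to_mlflow:
        import mlflow  # imported lazily so the sketch runs without MLflow

    keys = sorted(grid)
    results = []
    for values in itertools.product(*(grid[k] for k in keys)):
        params = dict(zip(keys, values))
        loss = toy_validation_loss(**params)
        if log_to_mlflow:
            # One MLflow run per trial: parameters plus the resulting metric.
            with mlflow.start_run():
                mlflow.log_params(params)
                mlflow.log_metric("val_loss", loss)
        results.append((params, loss))

    # The trial with the lowest loss wins.
    best_params, _ = min(results, key=lambda item: item[1])
    return best_params, results


grid = {"learning_rate": [0.001, 0.01, 0.1], "batch_size": [16, 32]}
best_params, results = run_grid_search(grid)
print(best_params)
```

Calling `run_grid_search(grid, log_to_mlflow=True)` in an environment with MLflow configured would record each trial so the runs can be compared side by side. The step-by-step guides remain the reference for how the labs actually do this.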

Step by Step Instructions

Detailed instructions are provided in the step_by_step_guides folder.

Other CDP Hands On Lab Workshops

CDP Data Services include Cloudera Machine Learning (CML), Cloudera Data Engineering (CDE), Cloudera Operational Database (COD), Cloudera Data Flow (CDF) and Cloudera Data Warehouse (CDW). HOL Workshops are available for each of these CDP Data Services.

  • CDE Workshop: Deploy an Ingestion, Transformation and Reporting pipeline with Spark 3.2. Learn about Iceberg's most popular features and orchestrate pipelines with Airflow. Use the CDE CLI and CDE Spark Submit Migration Tool to interact with CDE Virtual Clusters from your terminal. Finally, build a Python App leveraging the CDE API and monitor multiple CDE Virtual Clusters at the same time.
  • CDF Workshop: Build a full OT to IT workflow for an IoT Predictive Maintenance use case with: Edge Flow Management with MQTT and MiNiFi for data collection; Data Flow management with NiFi and Kafka; and Spark Streaming with a Cloudera Data Science Workbench (CDSW) model to process data. The lab also includes content focused on Kudu, Impala and Hue.
  • CDW Workshop: As a Big Data Engineer and Analyst for an Aeronautics corporation, build a Data Warehouse & Data Lakehouse to gain an advantage over your competition. Interactively explore data at scale. Create ongoing reports. Finally, move to real-time analysis to anticipate engine failures. All using Apache Impala and Iceberg.
