Pachyderm Examples

Pachyderm Examples is a curated list of examples that use Pachyderm to accomplish various tasks.

Getting Started

  • Intro to Pachyderm Tutorial - A notebook introduction to Pachyderm that uses the pachctl command-line utility to illustrate the basics of Pachyderm data repositories and pipelines.
  • Boston Housing Prices - A machine learning pipeline to train a regression model on the Boston Housing Dataset to predict the value of homes.
  • Boston Housing Prices (Intermediate) - Extends the original Boston Housing Prices example to show a multi-pipeline DAG and data rollbacks.
  • Market Sentiment - Train and deploy a fully automated financial market sentiment BERT model. As data is manually labeled, the model will automatically retrain and deploy.
  • Object Detection - Train an object detector on the COCO128 dataset with Lightning Flash, modify predictions with Label Studio, and version everything in Pachyderm.

Notebooks

Data Labeling

  • Label Studio Integration - Incorporate data versioning into any labeling project with Label Studio and Pachyderm.
  • Superb AI Integration - Version labeled image datasets created in Superb AI Suite using a cron pipeline.
  • Toloka Integration - Uses Pachyderm to create crowdsourced annotation jobs for news headlines in Toloka, aggregate the labeled data, and train a model.

Data Warehouse

  • BigQuery - Connector that ingests the result of a BigQuery query into Pachyderm as a Parquet file.
  • Churn Prediction with Snowflake - Create a churn analysis model for a music streaming service with Pachyderm and Snowflake using the Data Warehouse integration.

Machine Learning

  • Boston Housing Prices (Intermediate) - Extends the original Boston Housing Prices example to show a multi-pipeline DAG and data rollbacks.
  • Breast Cancer Detection - A breast cancer detection system based on radiology scans, scaled and visualized using Pachyderm.
  • AutoML - A Pachyderm pipeline that uses mljar-supervised to train a machine learning model on a CSV file.
  • Market Sentiment - Train and deploy a fully automated financial market sentiment BERT model. As data is manually labeled, the model will automatically retrain and deploy.
  • Apache Spark - MLflow Integration - End-to-end example demonstrating the full ML training process of a fraud detection model with Spark, MLlib, MLflow, and Pachyderm.

ML Experiment Tracking

  • Weights and Biases - Log pipelines running in Pachyderm to Weights and Biases.
  • ClearML Integration - Log Pachyderm experiments to ClearML's experiment monitoring platform, using Pachyderm Secrets.

Model Deployment