Skip to content

pachyderm/pachyderm

master
Switch branches/tags

Name already in use

A tag already exists with the provided branch name. Many Git commands accept both tag and branch names, so creating this branch may cause unexpected behavior. Are you sure you want to create this branch?
Code

Files

Permalink
Failed to load latest commit information.
Type
Name
Latest commit message
Commit time
December 14, 2022 18:33
August 2, 2021 19:21
December 8, 2022 17:04
September 20, 2021 15:38
August 12, 2019 13:46
April 23, 2020 13:53
August 4, 2022 15:41
April 20, 2020 10:31
November 28, 2022 09:36
December 16, 2022 15:35

GitHub release GitHub license GoDoc Go Report Card Slack Status CLA assistant

Pachyderm – Automate data transformations with data versioning and lineage

Pachyderm is cost-effective at scale, enabling data engineering teams to automate complex pipelines with sophisticated data transformations across any type of data. Our unique approach provides parallelized processing of multi-stage, language-agnostic pipelines with data versioning and data lineage tracking. Pachyderm delivers the ultimate CI/CD engine for data.

Features

  • Data-driven pipelines automatically trigger based on detecting data changes.
  • Immutable data lineage with data versioning of any data type.
  • Autoscaling and parallel processing built on Kubernetes for resource orchestration.
  • Uses standard object stores for data storage with automatic deduplication.
  • Runs across all major cloud providers and on-premises installations.

Getting Started

To start deploying your end-to-end version-controlled data pipelines, run Pachyderm locally or you can also deploy on AWS/GCE/Azure in about 5 minutes.

You can also refer to our complete documentation to see tutorials, check out example projects, and learn about advanced features of Pachyderm.

If you'd like to see some examples and learn about core use cases for Pachyderm:

Documentation

Official Documentation

Community

Keep up to date and get Pachyderm support via:

  • Twitter Follow us on Twitter.
  • Slack Status Join our community Slack Channel to get help from the Pachyderm team and other users.

Contributing

To get started, sign the Contributor License Agreement.

You should also check out our contributing guide.

Send us PRs, we would love to see what you do! You can also check our GH issues for things labeled "help-wanted" as a good place to start. We're sometimes bad about keeping that label up-to-date, so if you don't see any, just let us know.

Join Us

WE'RE HIRING! Love Docker, Go and distributed systems? Learn more about our open positions

Usage Metrics

Pachyderm automatically reports anonymized usage metrics. These metrics help us understand how people are using Pachyderm and make it better. They can be disabled by setting the env variable METRICS to false in the pachd container.

License Information

Pachyderm has moved some components of Pachyderm Platform to a source-available limited license.

We remain committed to the culture of open source, developing our product transparently and collaboratively with our community, and giving our community and customers source code access and the ability to study and change the software to suit their needs.

Under the Pachyderm Community License, you can access the source code and modify or redistribute it; there is only one thing you cannot do, and that is use it to make a competing offering.

Check out our License FAQ Page for more information.