Datamon is composed of:
- Datamon Core
  - Datamon Content Addressable Storage
  - Datamon Metadata
- Data access layer
  - CLI
  - FUSE
  - SDK based tools
- Data consumption integrations
  - CLI
  - Kubernetes integration
  - InPod Filesystem
  - GIT LFS
  - Jupyter notebook
  - JWT integration
- ML/AI pipeline run metadata: Captures the end-to-end metadata for ML/AI pipeline runs.
- Datamon Query: Allows introspection on pipeline runs and data repos.
Datamon includes:
- Blob storage: A deduplicated storage layer for raw data.
- Metadata storage: A metadata storage and query layer.
- External storage: Pluggable storage sources that are referenced in bundles.
For blob and metadata storage, datamon guarantees geo-redundant replication of data and can withstand region-level failures.
For external storage, redundancy and access guarantees vary with the external source.
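The deduplication in the blob layer can be illustrated with a minimal content-addressable store: blobs are keyed by the hash of their content, so identical data is stored exactly once. This is a toy sketch of the general technique, not datamon's actual implementation; the `BlobStore` class and its methods are invented for illustration.

```python
import hashlib

class BlobStore:
    """Toy content-addressable store: blobs are keyed by their
    SHA-256 hash, so identical content is stored only once."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        # Writing the same content twice is a no-op: the key already exists.
        self._blobs.setdefault(key, data)
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]

store = BlobStore()
k1 = store.put(b"sensor readings")
k2 = store.put(b"sensor readings")  # duplicate content
assert k1 == k2                     # same hash -> same key
assert len(store._blobs) == 1       # stored exactly once
```

Because keys are derived from content, a bundle can reference the same blob from many files or many versions without duplicating the underlying bytes.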
The data access layer is implemented in three form factors:
- CLI: Datamon can be used as a standalone CLI, provided the developer has access privileges to the backend storage. A developer can always set up datamon to host their own private instance for managing and tracking their own data.
- Filesystem: A bundle can be mounted as a filesystem on Linux or Mac, and new bundles can be generated as well.
- SDK based tools: Specialized tooling can be written for specific use cases. Example: parallel ingest into a bundle for high scaled-out throughput.
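The parallel-ingest use case above can be sketched as hashing file chunks concurrently before registering them in a bundle. This is a generic illustration of the pattern under assumed names (`chunk_hashes` is invented here), not datamon's SDK.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

def chunk_hashes(data: bytes, chunk_size: int = 4, workers: int = 4) -> list:
    """Split data into fixed-size chunks and hash them concurrently,
    mimicking a scaled-out ingest pipeline feeding a deduplicated store."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each worker hashes one chunk; order of results matches input order.
        return list(pool.map(lambda c: hashlib.sha256(c).hexdigest(), chunks))

hashes = chunk_hashes(b"abcdefgh", chunk_size=4)
assert len(hashes) == 2
assert hashes[0] == hashlib.sha256(b"abcd").hexdigest()
```

Because chunk hashing is CPU- and I/O-parallel, throughput scales with the number of workers while the content-addressed layer deduplicates any chunks it has already seen.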
Datamon will act as a backend for GIT LFS.
Datamon allows Jupyter notebooks to read bundles in a repo, process them, and create new bundles from the generated data.
The Datamon API and tooling can be used to write custom services that ingest large data sets into datamon. These services can be deployed in Kubernetes to manage long-running ingests.