Design

Components

Datamon is composed of:

1. Datamon Core
   1. Datamon Content Addressable Storage
   2. Datamon Metadata
2. Data access layer
   1. CLI
   2. FUSE
   3. SDK-based tools
3. Data consumption integrations
   1. CLI
   2. Kubernetes integration
   3. InPod Filesystem
   4. GIT LFS
   5. Jupyter notebook
   6. JWT integration
4. ML/AI pipeline run metadata: Captures the end-to-end metadata for ML/AI pipeline runs.
5. Datamon Query: Allows introspection on pipeline runs and data repos.

Data Storage

Datamon includes the following storage layers:

Blob storage: Deduplicated storage layer for raw data.
Metadata storage: A metadata storage and query layer.
External storage: Pluggable storage sources that are referenced in bundles.

For blob and metadata storage, Datamon guarantees geo-redundant replication of data and can withstand region-level failures.

For external storage, redundancy and availability depend on the external source.
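
The deduplication provided by the blob storage layer follows from content addressing: a blob's key is derived from its bytes, so identical data is stored only once. The following is a minimal in-memory sketch of that idea, not Datamon's implementation. The `BlobStore` class and its methods are hypothetical names, and SHA-256 is an assumed hash function, since this document does not say which one Datamon uses.

```python
import hashlib


class BlobStore:
    """Hypothetical in-memory content-addressable store (illustration only)."""

    def __init__(self):
        # Maps hex digest -> raw bytes.
        self._blobs = {}

    def put(self, data: bytes) -> str:
        # The key is derived from the content itself (SHA-256 is an assumption).
        key = hashlib.sha256(data).hexdigest()
        # Identical content hashes to the same key, so it is stored only once.
        self._blobs.setdefault(key, data)
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]

    def __len__(self) -> int:
        return len(self._blobs)


store = BlobStore()
k1 = store.put(b"sensor readings")
k2 = store.put(b"sensor readings")  # duplicate content
k3 = store.put(b"other data")
```

Here `k1` and `k2` are the same key and the store holds two blobs, not three. A real backend would persist blobs to replicated object storage instead of a dictionary.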

Data Access layer

The data access layer is implemented in three form factors:

CLI: Datamon can be used as a standalone CLI, provided the developer has access privileges to the backend storage. A developer can also set up Datamon to host a private instance for managing and tracking their own data.
Filesystem: A bundle can be mounted as a file system on Linux or macOS, and new bundles can be generated as well.
SDK-based tools: Specialized tooling can be written for specific use cases. Example: parallel ingest into a bundle for high, scaled-out throughput.
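
The parallel ingest use case mentioned above can be illustrated with a short sketch. It does not use the Datamon SDK: `ingest_one`, `parallel_ingest`, the manifest layout (path to content hash), and the choice of SHA-256 are all assumptions made for illustration. It only shows the general shape of hashing many files concurrently and then recording them against deduplicated content.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def ingest_one(item):
    """Hash a single (path, data) pair; SHA-256 is an assumed hash function."""
    path, data = item
    return path, hashlib.sha256(data).hexdigest()


def parallel_ingest(files, workers=4):
    """Hash files concurrently and return (manifest, blobs).

    manifest: path -> content hash (a hypothetical bundle listing)
    blobs:    content hash -> bytes (deduplicated content)
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        manifest = dict(pool.map(ingest_one, files.items()))
    blobs = {}
    for path, digest in manifest.items():
        # Files with identical content share a single stored blob.
        blobs.setdefault(digest, files[path])
    return manifest, blobs


# In-memory stand-ins for files on disk.
files = {"a.csv": b"1,2,3", "b.csv": b"4,5,6", "copy_of_a.csv": b"1,2,3"}
manifest, blobs = parallel_ingest(files)
```

In this example the manifest lists all three paths, while only two blobs are stored because `a.csv` and `copy_of_a.csv` have the same content. A real tool would stream file contents from disk and upload blobs to the backend storage.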

Data consumption integration

GIT LFS

Datamon will act as a backend for GIT LFS.

Jupyter notebook

Datamon allows a Jupyter notebook to read bundles from a repo, process them, and create new bundles from the generated data.

Data access layer

The Datamon API and tooling can be used to write custom services that ingest large data sets into Datamon. These services can be deployed in Kubernetes to manage long-running ingests.
