
Metatree is a DBMS that uses the filesystem itself as a tree-structured database.


oboki/metatreedb


metatreedb

Metatree is a DBMS that uses the filesystem itself to organize and manage data in a tree-structured format.

Each tree node keeps a metadata.json file that records information about its child nodes, and this metadata doubles as an index for searching.
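Here is a minimal sketch that makes this idea concrete. It assumes that each node's metadata.json stores its child names under a "children" key, as in the example output later in this README, and that a node without children is a leaf. The helpers `read_metadata` and `iter_leaves` are hypothetical and not part of Metatree's API. They show that the children lists alone are enough to enumerate every leaf without listing directories.

```python
import json
from pathlib import Path
from typing import Iterator


def read_metadata(node: Path) -> dict:
    """Load a node's metadata.json, treating a missing file as empty metadata."""
    meta_file = node / "metadata.json"
    if not meta_file.exists():
        return {}
    return json.loads(meta_file.read_text())


def iter_leaves(root: Path, prefix: str = "") -> Iterator[str]:
    """Yield the relative path of every leaf node by following the "children"
    lists in metadata.json instead of listing directories."""
    children = read_metadata(root).get("children", [])
    if not children and prefix:
        # No children recorded: this node is a leaf
        yield prefix
        return
    for name in children:
        child_prefix = f"{prefix}/{name}" if prefix else name
        yield from iter_leaves(root / name, child_prefix)
```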

Features

  • metadata-based index
  • db-level concurrency control
  • abstract filesystem support (fsspec)
    • local
    • HDFS
    • S3
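The README does not describe how concurrency control is implemented. As one generic illustration of database-level (coarse-grained) locking, the sketch below guards the whole tree with a single lock file that is created atomically via `os.O_CREAT | os.O_EXCL`. The lock file name `.lock` and the `db_lock` helper are hypothetical, and this is not Metatree's implementation. The atomic-create trick applies to local filesystems; remote backends such as S3 or HDFS would need their own mechanism.

```python
import os
import time
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


@contextmanager
def db_lock(root: Path, timeout: float = 5.0, poll: float = 0.05) -> Iterator[None]:
    """Hold one exclusive lock for the whole database while the body runs.

    The lock file is created with O_CREAT | O_EXCL, which fails if the file
    already exists, so only one holder at a time succeeds on a local filesystem.
    """
    lock_path = root / ".lock"  # hypothetical lock file name
    deadline = time.monotonic() + timeout
    while True:
        try:
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            # Someone else holds the lock: retry until the deadline passes
            if time.monotonic() >= deadline:
                raise TimeoutError(f"could not acquire {lock_path}")
            time.sleep(poll)
    try:
        yield
    finally:
        # Release the lock even if the body raised
        os.close(fd)
        os.remove(lock_path)
```

A single lock for the whole tree is simple and safe but serializes all writers. Finer-grained locking per node would allow more parallelism at the cost of more complex bookkeeping.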

Installation

pip install metatreedb

Quick Start

Here's an example of using Metatree as a model repository, with a database whose identifiers are (model, version):

from metatreedb import Metatree

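import pickle
import uuid
from pathlib import Path

# Set up a database rooted at /tmp/my-model-repository,
# with nodes identified by (model, version)
metatree = Metatree(
    "/tmp/my-model-repository",
    (
        "model",
        "version",
    ),
)

for i in range(1, 4):
    # Write a dummy "model" (a random UUID) to a local file
    awful_uuid = uuid.uuid4()
    trained = Path("/tmp/my-model-repository/trained.pkl")
    with open(trained, "wb") as f:
        pickle.dump(awful_uuid, f)
    # Store the file under the key <model>/<version>
    metatree.put(f"my-awful-model/v{i}", trained)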

This will create files and directories in your filesystem as shown:

❯ tree /tmp/my-model-repository
/tmp/my-model-repository
├── metadata.json
└── my-awful-model
    ├── metadata.json
    ├── v1
    │   ├── metadata.json
    │   └── trained.pkl
    ├── v2
    │   ├── metadata.json
    │   └── trained.pkl
    └── v3
        ├── metadata.json
        └── trained.pkl
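The layout above can be reproduced with a few lines of standard-library code. This is a minimal sketch under stated assumptions, not Metatree's `put`. The helpers `put_sketch` and `_add_child` are hypothetical, the file is copied rather than moved, and a new leaf gets an empty `{}` metadata.json. Each path segment is appended to its parent's "children" list, which keeps the metadata index in step with the directory tree.

```python
import json
import shutil
from pathlib import Path


def _add_child(node: Path, child: str) -> None:
    """Append `child` to the "children" list in node/metadata.json if missing."""
    meta_file = node / "metadata.json"
    meta = json.loads(meta_file.read_text()) if meta_file.exists() else {}
    children = meta.setdefault("children", [])
    if child not in children:
        children.append(child)
        meta_file.write_text(json.dumps(meta))


def put_sketch(root: Path, key: str, src: Path) -> Path:
    """Copy `src` into root/<key>/ and register each path segment as a child
    of its parent, so every directory ends up with its own metadata.json."""
    node = root
    node.mkdir(parents=True, exist_ok=True)
    for segment in key.split("/"):
        _add_child(node, segment)
        node = node / segment
        node.mkdir(exist_ok=True)
    # Assumption: a fresh leaf starts with empty metadata
    leaf_meta = node / "metadata.json"
    if not leaf_meta.exists():
        leaf_meta.write_text("{}")
    return Path(shutil.copy(src, node / src.name))
```

Unlike the README example, this sketch keeps the source file outside the repository root. It also does not check that the key has one segment per identifier.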

To add metadata, select a node with find and write key-value pairs to its metadata.json with update:

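# Select the model node and record which version is active
metatree.find("my-awful-model")
metatree.update(active="v2")

# find() returns an object with update(), so the calls can also be chained
for i in range(1, 4):
    metatree.find(f"my-awful-model/v{i}").update(model_file="trained.pkl")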

This will update the metadata.json files as follows:

❯ cat /tmp/my-model-repository/my-awful-model/metadata.json
{"children": ["v1", "v2", "v3"], "active": "v2"}

❯ cat /tmp/my-model-repository/my-awful-model/v*/metadata.json
{"model_file": "trained.pkl"}
{"model_file": "trained.pkl"}
{"model_file": "trained.pkl"}
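The output suggests that `update` merges new keys into the existing metadata.json rather than overwriting it, since "children" survives alongside "active". Here is a minimal sketch of that merge. It assumes a plain read-modify-write of the JSON file, and `update_sketch` is a hypothetical helper.

```python
import json
from pathlib import Path


def update_sketch(node: Path, **fields: object) -> dict:
    """Merge keyword arguments into node/metadata.json, keeping existing keys
    such as "children", and return the merged metadata."""
    meta_file = node / "metadata.json"
    meta = json.loads(meta_file.read_text()) if meta_file.exists() else {}
    meta.update(fields)  # new keys are added, existing keys are overwritten
    meta_file.write_text(json.dumps(meta))
    return meta
```

With the default `json.dumps` separators, calling `update_sketch` with `active="v2"` on a node whose metadata is `{"children": ["v1", "v2", "v3"]}` writes exactly the line shown above for my-awful-model/metadata.json. This read-modify-write is not safe against concurrent writers on its own.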

The metadata also serves as a search index. Wrap a metadata.json key in angle brackets (<key>) inside a location, and Metatree substitutes the value stored under that key in the preceding node's metadata.json:

metatree.find("my-awful-model/<active>")
print(metatree.location)
# This prints `/tmp/my-model-repository/my-awful-model/v2`

file = metatree.get("my-awful-model/<active>/<model_file>")
print(file)
# The given path translates to `my-awful-model/v2/trained.pkl`,
# and get() returns a generator object
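Placeholder substitution can be pictured as a walk down the path. Whenever a segment has the form `<key>`, it is replaced by the value of `key` in the metadata.json of the node reached so far. Here is a minimal sketch, assuming placeholders always occupy a whole path segment. `resolve` is a hypothetical helper that only returns the resolved relative path, whereas Metatree's `find` and `get` then act on the resolved location.

```python
import json
import re
from pathlib import Path

# A placeholder is a whole path segment such as "<active>"
PLACEHOLDER = re.compile(r"^<([^<>]+)>$")


def resolve(root: Path, path: str) -> str:
    """Replace each "<key>" segment of `path` with the value stored under `key`
    in the metadata.json of the node reached so far."""
    node = root
    resolved = []
    for segment in path.split("/"):
        match = PLACEHOLDER.match(segment)
        if match:
            meta = json.loads((node / "metadata.json").read_text())
            segment = meta[match.group(1)]  # raises KeyError if the key is missing
        resolved.append(segment)
        node = node / segment
    return "/".join(resolved)
```

For the repository built above, `resolve(root, "my-awful-model/<active>/<model_file>")` returns `"my-awful-model/v2/trained.pkl"`.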

with WebHDFS

To use WebHDFS, set the root path to a webhdfs:// URL and pass the WebHDFS connection arguments (host, port and user in this example):

from metatreedb import Metatree

metatree = Metatree(
    "webhdfs:///tmp/my-model-repository",
    ("model", "version"),
    host="localhost",
    port=9870,
    user="hadoop",
)

with S3

To use S3, set the root path to an s3:// URL that includes the bucket, and pass the s3fs connection arguments (endpoint_url, key and secret in this example):

from metatreedb import Metatree

metatree = Metatree(
    "s3://your-awesome-bucket/tmp/my-model-repository",
    ("model", "version"),
    endpoint_url="http://localhost:5000",
    key="testing",
    secret="testing",
)
