Tensorstore spec configuration #872

@minotru


Hi Orbax community,

Under the hood, Orbax uses TensorStore for tensor I/O; the TensorStore integration is part of type_handlers.py.

TensorStore comes with KVStore implementations for File, GCS, S3, gRPC, etc.

Unfortunately, the TensorStore integration is rigid and does not support any form of extension from client code.
Namely, it supports only the 'gcs' and 'file' kvstores, and their specs are hard-coded and cannot be configured.

I think Orbax should provide a way to configure a custom kvstore spec based on the directory, i.e. make type_handlers.py flexible and extensible instead of relying on hard-coded if/else branches. That way, as a user, I could plug in an alternative "directory -> kvstore spec" mapping when needed and patch other parts of the TensorStore spec (tspec) as well.

For instance, there could be classes like:

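```python
from abc import ABC, abstractmethod


class TsSpecStrategyBase(ABC):
    """Maps a checkpoint directory to a TensorStore spec."""

    @abstractmethod
    def supported_for(self, directory: str) -> bool:
        """Returns True if this strategy can handle `directory`."""
        ...

    @abstractmethod
    def get_spec(self, directory: str) -> dict:
        """Returns the spec dict to use for `directory`."""
        ...


class FileTsSpecStrategy(TsSpecStrategyBase):
    ...


class GrpcTsSpecStrategy(TsSpecStrategyBase):
    ...


class TsSpecStrategyResolver:
    def register_strategy(self, strategy: TsSpecStrategyBase):
        ...

    def resolve(self, directory: str) -> TsSpecStrategyBase:
        ...
```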
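To make the proposal more concrete, here is a minimal sketch of how these classes might be filled in. None of this is existing Orbax API: the gs:// parsing, the catch-all file fallback, the mapping of yt:// paths to a gRPC server address, and the tsgrpc_kvstore field names ("address", "path") are all assumptions for illustration. The specs returned are plain dicts in TensorStore's kvstore JSON format (an object with a "driver" key).

```python
from abc import ABC, abstractmethod
from typing import List


class TsSpecStrategyBase(ABC):
    @abstractmethod
    def supported_for(self, directory: str) -> bool: ...

    @abstractmethod
    def get_spec(self, directory: str) -> dict: ...


class GcsTsSpecStrategy(TsSpecStrategyBase):
    """Handles gs://bucket/path directories with the 'gcs' kvstore driver."""

    def supported_for(self, directory: str) -> bool:
        return directory.startswith("gs://")

    def get_spec(self, directory: str) -> dict:
        # "gs://bucket/a/b" -> bucket="bucket", path="a/b"
        bucket, _, path = directory[len("gs://"):].partition("/")
        return {"driver": "gcs", "bucket": bucket, "path": path}


class FileTsSpecStrategy(TsSpecStrategyBase):
    """Fallback: treats any directory as a local path for the 'file' driver."""

    def supported_for(self, directory: str) -> bool:
        return True

    def get_spec(self, directory: str) -> dict:
        return {"driver": "file", "path": directory}


class GrpcTsSpecStrategy(TsSpecStrategyBase):
    """Hypothetical: routes yt://... paths to a tsgrpc-compatible server."""

    def __init__(self, address: str) -> None:
        self.address = address

    def supported_for(self, directory: str) -> bool:
        return directory.startswith("yt://")

    def get_spec(self, directory: str) -> dict:
        # Assumed field names for TensorStore's tsgrpc_kvstore driver;
        # verify against the TensorStore documentation.
        return {
            "driver": "tsgrpc_kvstore",
            "address": self.address,
            "path": directory[len("yt://"):],
        }


class TsSpecStrategyResolver:
    def __init__(self) -> None:
        self._strategies: List[TsSpecStrategyBase] = []

    def register_strategy(self, strategy: TsSpecStrategyBase) -> None:
        # Later registrations take precedence over earlier ones.
        self._strategies.insert(0, strategy)

    def resolve(self, directory: str) -> TsSpecStrategyBase:
        for strategy in self._strategies:
            if strategy.supported_for(directory):
                return strategy
        raise ValueError(f"No TsSpecStrategy supports {directory!r}")


resolver = TsSpecStrategyResolver()
resolver.register_strategy(FileTsSpecStrategy())  # default fallback
resolver.register_strategy(GcsTsSpecStrategy())
resolver.register_strategy(GrpcTsSpecStrategy(address="localhost:9000"))
```

Giving later registrations precedence means client code can register its own strategy (e.g. for yt://) on top of the built-in defaults without touching the if/else chain in type_handlers.py.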

Motivation. My colleagues have implemented a tsgrpc-compatible storage backend that we want to use for checkpoints. Unfortunately, we can't use it without custom patches to the Orbax code, namely:

  • we added another if/else branch to _get_tensorstore_spec that activates the gRPC driver for paths like yt://*;
  • we added support for patching tspec['metadata'], tspec['kvstore']['config'], and spec['kvstore'] (OCDBT) via environment variables, in order to configure experimental_read_coalescing_interval and disable compression.
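The environment-variable patches above could instead be expressed as a user-supplied overrides dict that is merged into the generated spec. Below is a minimal sketch, assuming a hypothetical deep_merge helper that Orbax would apply before opening the array. The base spec and override values are illustrative only, not what Orbax actually emits; in particular, setting the zarr "compressor" to None as the way to disable compression is an assumption to check against the TensorStore docs.

```python
import copy
from typing import Any, Dict


def deep_merge(base: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Returns a copy of `base` with `overrides` merged in recursively.

    Nested dicts are merged key by key; any other value in `overrides`
    (including None) replaces the corresponding value in `base`.
    The input dicts are not modified.
    """
    result = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


# Illustrative spec resembling a zarr array stored in an OCDBT kvstore.
base_tspec = {
    "driver": "zarr",
    "kvstore": {"driver": "ocdbt", "base": "file:///tmp/ckpt/"},
    "metadata": {"compressor": {"id": "zstd"}, "dtype": "<f4"},
}

# Hypothetical user overrides replacing the env-variable patches.
overrides = {
    "metadata": {"compressor": None},  # assumed: disables zarr compression
    "kvstore": {"experimental_read_coalescing_interval": "1ms"},  # placeholder value
}

patched = deep_merge(base_tspec, overrides)
```

Merging recursively (rather than replacing whole sub-dicts) lets a user touch a single field such as the coalescing interval while keeping everything else Orbax generated.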

I'd be happy to help and submit a PR.

Regards,
Simon
