Hi Orbax community,
Under the hood, Orbax uses TensorStore for tensor IO; the TensorStore integration lives in type_handlers.py.
TensorStore comes with KVStore implementations for File, GCS, S3, GRPC, etc.
Unfortunately, the TensorStore integration is quite rigid and cannot be extended from client code.
Namely, it supports only the 'gcs' and 'file' kvstores, and their specs are hard-coded and cannot be configured.
I think Orbax should provide a way to configure a custom kvstore spec based on the directory, i.e. make type_handlers.py flexible and extensible and avoid the hard-coded if/else. That way, as a user, I could supply an alternative "directory -> kvstore spec" mapping when needed and patch other parts of the tsspec too.
For instance, there could be classes like:
    from abc import ABC, abstractmethod

    class TsSpecStrategyBase(ABC):
        @abstractmethod
        def supported_for(self, directory: str) -> bool:
            ...
        @abstractmethod
        def get_spec(self, directory: str) -> dict:
            ...

    class FileTsSpecStrategy(TsSpecStrategyBase):
        ...

    class GrpcTsSpecStrategy(TsSpecStrategyBase):
        ...

    class TsSpecStrategyResolver:
        def register_strategy(self, strategy: TsSpecStrategyBase):
            ...
        def resolve(self, directory: str) -> TsSpecStrategyBase:
            ...

Motivation. My colleagues have implemented a tsgrpc-compatible storage that we want to use as checkpoint storage. Unfortunately, we can't use it without custom patches to the Orbax code. Namely:
- we've added another `if/else` to `_get_tensorstore_spec` to activate the `grpc` driver for paths like `yt://*`.
- supported patches via env variables to `tspec['metadata']`, `tspec['kvstore']['config']`, `spec['kvstore']` (ocdbt) in order to configure `experimental_read_coalescing_interval` and disable compression.
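
To make the proposal more concrete, here is a minimal sketch of how the strategy and resolver classes could behave. It is only an illustration, not existing Orbax code. The `address` and `scheme` constructor arguments, the "last registered wins" ordering, and the example address are all hypothetical. The `tsgrpc_kvstore` driver name and its `address`/`path` keys are my assumption about the TensorStore gRPC kvstore spec and should be checked against the TensorStore docs.

```python
from abc import ABC, abstractmethod


class TsSpecStrategyBase(ABC):
    """Maps a checkpoint directory to a TensorStore kvstore spec."""

    @abstractmethod
    def supported_for(self, directory: str) -> bool:
        ...

    @abstractmethod
    def get_spec(self, directory: str) -> dict:
        ...


class FileTsSpecStrategy(TsSpecStrategyBase):
    """Handles plain local paths (anything without a URL scheme)."""

    def supported_for(self, directory: str) -> bool:
        return "://" not in directory

    def get_spec(self, directory: str) -> dict:
        return {"driver": "file", "path": directory}


class GrpcTsSpecStrategy(TsSpecStrategyBase):
    """Handles paths with a custom scheme, e.g. yt://..., via a gRPC kvstore."""

    def __init__(self, address: str, scheme: str = "yt://"):
        self._address = address  # hypothetical: server address supplied by the user
        self._scheme = scheme

    def supported_for(self, directory: str) -> bool:
        return directory.startswith(self._scheme)

    def get_spec(self, directory: str) -> dict:
        # Assumed spec layout for the TensorStore gRPC kvstore driver.
        return {
            "driver": "tsgrpc_kvstore",
            "address": self._address,
            "path": directory[len(self._scheme):],
        }


class TsSpecStrategyResolver:
    def __init__(self):
        self._strategies: list[TsSpecStrategyBase] = []

    def register_strategy(self, strategy: TsSpecStrategyBase) -> None:
        # Later registrations take precedence over earlier ones.
        self._strategies.insert(0, strategy)

    def resolve(self, directory: str) -> TsSpecStrategyBase:
        for strategy in self._strategies:
            if strategy.supported_for(directory):
                return strategy
        raise ValueError(f"No kvstore strategy registered for {directory!r}")


# Usage: client code registers its own strategy instead of patching Orbax.
resolver = TsSpecStrategyResolver()
resolver.register_strategy(FileTsSpecStrategy())
resolver.register_strategy(GrpcTsSpecStrategy(address="localhost:9000"))

directory = "yt://checkpoints/step_100"
kvstore_spec = resolver.resolve(directory).get_spec(directory)
```

With something like this, `_get_tensorstore_spec` would only need to call the resolver, and the `yt://*` branch we currently patch in would live entirely in user code.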
I will be happy to assist and submit a PR.
Regards,
Simon