Similar to tape archive (tar) files, content addressable archive (car) files make it possible to group objects into larger units. Besides being uploaded to an object store, car files can also be used to store collections of objects on a traditional filesystem. These collections can be accessed without extracting the individual objects by using a reference file system.
car_referencer can create the needed reference file from a single car file or from multiple car files that are part of the same Merkle DAG.
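To illustrate the general idea of a reference file system, here is a minimal sketch. It does not use the car format or the preffs schema: the container is a plain concatenation of bytes, and the names pack, read_reference and toy_archive.bin are hypothetical. It only shows how a mapping from key to (path, offset, length) lets a reader fetch one object without unpacking the archive.

```python
import os
import tempfile


def pack(objects, path):
    """Concatenate objects into one file and return {key: (path, offset, length)}."""
    references = {}
    with open(path, "wb") as archive:
        for key, data in objects.items():
            # Record where this object starts and how long it is.
            references[key] = (path, archive.tell(), len(data))
            archive.write(data)
    return references


def read_reference(references, key):
    """Read a single object by seeking to its byte range in the archive."""
    path, offset, length = references[key]
    with open(path, "rb") as archive:
        archive.seek(offset)
        return archive.read(length)


# Toy data standing in for the objects of a zarr store (hypothetical keys).
group_metadata = b'{"zarr_format": 2}'
chunk_bytes = b"\x00\x01\x02\x03"

workdir = tempfile.mkdtemp()
archive_path = os.path.join(workdir, "toy_archive.bin")
refs = pack(
    {"example.zarr/.zgroup": group_metadata, "example.zarr/t/0": chunk_bytes},
    archive_path,
)

# Only the requested byte range is read; nothing is extracted to disk.
chunk = read_reference(refs, "example.zarr/t/0")
```

A real reference file plays the role of refs here, with the car files taking the place of the toy archive.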
car_referencer creates the reference file internally in two steps. The first step identifies all available references within the provided car files (here carfiles.*.car) and saves them as an index file (e.g. index.parquet), which is reused if it already exists. The second step creates the reference file (e.g. preffs.parquet) based on the ROOT-HASH, which is NOT the root CID of the car file but the root CID of the root file object. For a zarr file such as example.zarr, the ROOT-HASH refers to example.zarr itself.
car_referencer -c "carfiles.*.car" -p preffs.parquet -r ROOT-HASH -i index.parquet
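What an index of block locations might contain can be sketched as follows. This is not car_referencer's own code: it assumes the CARv1 layout (a varint-length-prefixed header followed by sections of the form varint length, CID, block data), and all function names are hypothetical. The demo file is built in memory with a placeholder header instead of a real DAG-CBOR header.

```python
import hashlib
import io


def read_varint(stream):
    """Read one unsigned LEB128 varint; return None at a clean end of stream."""
    result, shift = 0, 0
    while True:
        byte = stream.read(1)
        if not byte:
            if shift == 0:
                return None
            raise EOFError("stream ended inside a varint")
        result |= (byte[0] & 0x7F) << shift
        if not byte[0] & 0x80:
            return result
        shift += 7


def read_cid(stream):
    """Read one binary CID (v0 or v1) and return its raw bytes."""
    start = stream.tell()
    if stream.read(2) == b"\x12\x20":  # CIDv0: a bare 34-byte sha2-256 multihash
        stream.seek(start + 34)
    else:
        stream.seek(start)
        read_varint(stream)  # CID version
        read_varint(stream)  # content codec
        read_varint(stream)  # hash function code
        stream.seek(read_varint(stream), io.SEEK_CUR)  # skip the digest
    end = stream.tell()
    stream.seek(start)
    return stream.read(end - start)


def index_car(stream):
    """Yield (cid_hex, data_offset, data_length) for every block in the stream."""
    stream.seek(read_varint(stream), io.SEEK_CUR)  # skip the header
    while True:
        section_length = read_varint(stream)
        if section_length is None:
            break
        section_start = stream.tell()
        cid = read_cid(stream)
        data_offset = stream.tell()
        data_length = section_length - (data_offset - section_start)
        stream.seek(data_length, io.SEEK_CUR)  # jump to the next section
        yield cid.hex(), data_offset, data_length


def encode_varint(value):
    """Encode an unsigned integer as LEB128 (only needed for the demo file)."""
    out = bytearray()
    while True:
        byte, value = value & 0x7F, value >> 7
        out.append(byte | 0x80 if value else byte)
        if not value:
            return bytes(out)


def make_section(data):
    """Build one section with a CIDv1 (raw codec, sha2-256) for the demo file."""
    cid = b"\x01\x55\x12\x20" + hashlib.sha256(data).digest()
    return encode_varint(len(cid) + len(data)) + cid + data


header = b"placeholder-header"  # assumption: stands in for the DAG-CBOR header
car_bytes = (
    encode_varint(len(header)) + header
    + make_section(b"hello") + make_section(b"zarr chunk")
)
entries = list(index_car(io.BytesIO(car_bytes)))
```

Each entry records where a block's payload sits inside the file, which is the kind of information a later step needs to resolve objects starting from a root CID.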
The created file preffs.parquet can then be opened by
import xarray as xr
ds = xr.open_zarr("preffs::preffs.parquet")
thanks to https://github.com/d70-t/preffs.
git clone https://github.com/observingClouds/car_referencer.git
cd car_referencer
pip install .
For testing purposes, additional dependencies need to be installed, including some packages written in Go. The required environment can be installed with
git clone https://github.com/observingClouds/car_referencer.git
cd car_referencer
mamba env create
source activate test-env
This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.