```python
import numpy as np

class TensorStoreIterator:
    def __init__(self, store, num_workers=32):
        self.store = store
        self.shape = store.shape
        self.chunk_size = int(np.ceil(self.shape[0] / num_workers))
        self.n_chunks = num_workers

    def __iter__(self):
        self.chunk_index = 0
        return self

    def __next__(self):
        if self.chunk_index >= self.n_chunks:
            raise StopIteration
        # Start and end of chunk slice
        start = self.chunk_index * self.chunk_size
        end = min((self.chunk_index + 1) * self.chunk_size, self.__len__())
        # Get next chunk
        chunk = self.store[start:end]
        return chunk

    def __len__(self):
        return self.shape[0]
```
Then I want to apply joblib parallelism to it using

```python
from joblib import Parallel, delayed

results = Parallel(n_jobs=num_workers)(
    delayed(np.var)(chunk)
    for chunk in ts_data_iter
)
```
I am getting:
```
Traceback (most recent call last):
  File "/home/user/base/lib/python3.10/site-packages/joblib/externals/loky/process_executor.py", line 391, in _process_worker
    call_item = call_queue.get(block=True, timeout=timeout)
  File "/usr/local/lib/python3.10/multiprocessing/queues.py", line 122, in get
    return _ForkingPickler.loads(res)
ValueError: Error opening "zarr" driver: Error reading local file "/path/to/file.zarr/": Invalid key: "/path/to/file.zarr/"
```
I thought it might be related to `chunk_layout`, but I was not able to confirm that. Does anyone have another idea?
I did confirm that I can run `np.var(ts_data[0:3])`, so the path and the zarr loading themselves work fine.
The main issue is that you are not incrementing your chunk index in your iterator.
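A minimal sketch of that fix, using a NumPy array as a stand-in for the TensorStore dataset: advance `self.chunk_index` before returning each chunk, otherwise `__next__` keeps returning the first chunk forever.

```python
import numpy as np

class TensorStoreIterator:
    def __init__(self, store, num_workers=32):
        self.store = store
        self.shape = store.shape
        self.chunk_size = int(np.ceil(self.shape[0] / num_workers))
        self.n_chunks = num_workers

    def __iter__(self):
        self.chunk_index = 0
        return self

    def __next__(self):
        if self.chunk_index >= self.n_chunks:
            raise StopIteration
        start = self.chunk_index * self.chunk_size
        end = min((self.chunk_index + 1) * self.chunk_size, len(self))
        self.chunk_index += 1  # the missing increment
        return self.store[start:end]

    def __len__(self):
        return self.shape[0]
```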
Also, you don't actually need to read from the tensorstore as part of the iterator; you could instead generate `tensorstore.DimExpression` objects, as in the following:
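As a hedged sketch of that idea (the function names here are illustrative, not from the original post): have the generator yield index expressions instead of data, and let each worker apply them to the store itself. Plain `slice` objects are shown for simplicity; with tensorstore the same generator could yield dimension expressions such as `ts.d[0][start:end]` instead.

```python
import numpy as np

def chunk_slices(length, num_chunks):
    """Yield index expressions covering [0, length) in num_chunks pieces."""
    chunk_size = int(np.ceil(length / num_chunks))
    for i in range(num_chunks):
        start = i * chunk_size
        end = min((i + 1) * chunk_size, length)
        if start < end:  # skip empty trailing chunks
            yield slice(start, end)

def var_of_chunk(store, s):
    # Each worker applies the slice, materializes only its own chunk,
    # and reduces it locally.
    return np.var(np.asarray(store[s]))
```

Because only a small `slice` (or `DimExpression`) crosses the process boundary, the workers never need to unpickle a half-open TensorStore object; e.g. `Parallel(n_jobs=num_workers)(delayed(var_of_chunk)(store, s) for s in chunk_slices(store.shape[0], num_workers))`.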
But you might not want to chunk solely on the "x" dimension. You might also look at the google-research connectomics repository, which uses tensorstore for chunk-based processing:
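For chunking over several dimensions at once, one possible sketch (illustrative, not taken from the connectomics repository) enumerates chunk start coordinates with `itertools.product`, using a chunk shape aligned with the zarr chunks, e.g. `(1, 100, 100, 4)` for the dataset above:

```python
import itertools

def grid_slices(shape, chunk_shape):
    """Yield tuples of slices tiling an array of `shape` by `chunk_shape`."""
    ranges = [range(0, dim, c) for dim, c in zip(shape, chunk_shape)]
    for starts in itertools.product(*ranges):
        yield tuple(
            slice(s, min(s + c, dim))
            for s, c, dim in zip(starts, chunk_shape, shape)
        )
```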
I am trying to wrap a TensorStore dataset in an iterator to do parallel computation on it. The data uses the zarr driver and is read from a local file:
```python
TensorStore({
  'context': {
    'cache_pool': {},
    'data_copy_concurrency': {},
    'file_io_concurrency': {},
  },
  'driver': 'zarr',
  'dtype': 'float32',
  'kvstore': {
    'driver': 'file',
    'path': '/path/to/file.zarr/',
  },
  'metadata': {
    'chunks': [1, 100, 100, 4],
    'compressor': {
      'blocksize': 0,
      'clevel': 5,
      'cname': 'lz4',
      'id': 'blosc',
      'shuffle': 1,
    },
    'dimension_separator': '.',
    'dtype': '<f4',
    'fill_value': 0.0,
    'filters': None,
    'order': 'C',
    'shape': [1134592, 100, 100, 4],
    'zarr_format': 2,
  },
  'transform': {
    'input_exclusive_max': [[1134592], [100], [100], [4]],
    'input_inclusive_min': [0, 0, 0, 0],
  },
})
```
I wrote the iterator shown above to chunk this dataset for parallel processing.