Closed
Description
An upload can fail when two instances of this file system, running in two different processes, write to the same blob in parallel. One of the uploads fails with:
Azure error
File "/code/.venv/lib/python3.10/site-packages/our_package/connector/storage/blob.py", line 117, in _save
with self._fs.open(
File "/code/.venv/lib/python3.10/site-packages/fsspec/spec.py", line 1963, in __exit__
self.close()
File "/code/.venv/lib/python3.10/site-packages/adlfs/spec.py", line 1908, in close
super().close()
File "/code/.venv/lib/python3.10/site-packages/fsspec/spec.py", line 1930, in close
self.flush(force=True)
File "/code/.venv/lib/python3.10/site-packages/fsspec/spec.py", line 1801, in flush
if self._upload_chunk(final=force) is not False:
File "/code/.venv/lib/python3.10/site-packages/fsspec/asyn.py", line 118, in wrapper
return sync(self.loop, func, *args, **kwargs)
File "/code/.venv/lib/python3.10/site-packages/fsspec/asyn.py", line 103, in sync
raise return_result
File "/code/.venv/lib/python3.10/site-packages/fsspec/asyn.py", line 56, in _runner
result[0] = await coro
File "/code/.venv/lib/python3.10/site-packages/adlfs/spec.py", line 2068, in _async_upload_chunk
await bc.commit_block_list(
File "/code/.venv/lib/python3.10/site-packages/azure/core/tracing/decorator_async.py", line 77, in wrapper_use_tracer
return await func(*args, **kwargs)
File "/code/.venv/lib/python3.10/site-packages/azure/storage/blob/aio/_blob_client_async.py", line 1861, in commit_block_list
process_storage_error(error)
File "/code/.venv/lib/python3.10/site-packages/azure/storage/blob/_shared/response_handlers.py", line 184, in process_storage_error
exec("raise error from None") # pylint: disable=exec-used # nosec
File "<string>", line 1, in <module>
azure.core.exceptions.HttpResponseError: The specified block list is invalid.
RequestId:<request_id>
Time:2024-02-13T12:15:05.1957595Z
ErrorCode:InvalidBlockList
Content: <?xml version="1.0" encoding="utf-8"?><Error><Code>InvalidBlockList</Code><Message>The specified block list is invalid.
From our limited investigation, this is likely caused by the way AzureBlobFile calculates the IDs of the uploaded blocks:
Lines 2102 to 2103 in 576fb7a:
    block_id = len(self._block_list)
    block_id = f"{block_id:07d}"
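To illustrate the suspected failure mode, here is a minimal toy model in pure Python. It is not the Azure SDK or the adlfs code: `ToyBlob` and `ToyWriter` are hypothetical stand-ins, and the model assumes that committing a block list discards any staged blocks that were not listed, which is how we read the Put Block List documentation. Only the two `block_id` lines are taken from the snippet above.

```python
class ToyBlob:
    """Simplified stand-in for one block blob (hypothetical, not the Azure SDK)."""

    def __init__(self):
        self.uncommitted = {}  # staged blocks, keyed by block ID
        self.committed = {}    # blocks in the last committed block list

    def stage_block(self, block_id, data):
        # Staging with an existing ID replaces the earlier staged block.
        self.uncommitted[block_id] = data

    def commit_block_list(self, block_ids):
        available = {**self.committed, **self.uncommitted}
        missing = [b for b in block_ids if b not in available]
        if missing:
            raise ValueError(f"InvalidBlockList: missing {missing}")
        self.committed = {b: available[b] for b in block_ids}
        # Assumption: staged blocks not named in the list are discarded.
        self.uncommitted = {}


class ToyWriter:
    """Mimics the sequential block ID scheme quoted above."""

    def __init__(self, blob):
        self.blob = blob
        self._block_list = []

    def upload_chunk(self, data):
        block_id = len(self._block_list)
        block_id = f"{block_id:07d}"
        self.blob.stage_block(block_id, data)
        self._block_list.append(block_id)

    def close(self):
        self.blob.commit_block_list(self._block_list)


blob = ToyBlob()
writer_a, writer_b = ToyWriter(blob), ToyWriter(blob)

writer_a.upload_chunk(b"a0")  # stages "0000000"
writer_b.upload_chunk(b"b0")  # also "0000000": replaces writer A's staged block
writer_b.upload_chunk(b"b1")  # stages "0000001"
writer_a.close()              # commits ["0000000"], drops staged "0000001"

try:
    writer_b.close()
    outcome = "ok"
except ValueError as exc:
    outcome = str(exc)

print(outcome)
```

In this model the second writer's commit fails because one of its blocks no longer exists, and the first writer's committed blob silently contains the second writer's data for block `0000000`. Whether the real service fails for exactly this reason has not been verified here.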
Could this be changed to a hash of the content or something similar, which would correspond to the actual contents of the uploaded block?
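A rough sketch of what a content-derived ID could look like, using only the standard library. The function name `content_block_id` and the choice to mix the block index into the hash are assumptions made for illustration, not part of adlfs. The output is kept at a fixed length because Azure requires all block IDs within a blob to have the same length, with at most 64 bytes before Base64 encoding.

```python
import hashlib


def content_block_id(index: int, data: bytes) -> str:
    """Return a fixed-length block ID derived from the block's position and content.

    Hypothetical helper: the index is mixed in so that two identical chunks
    at different positions still get distinct IDs.
    """
    digest = hashlib.sha256()
    digest.update(index.to_bytes(8, "big"))
    digest.update(data)
    # 32 hex characters: constant length and well under the 64-byte limit.
    return digest.hexdigest()[:32]


id_a = content_block_id(0, b"payload from writer A")
id_b = content_block_id(0, b"payload from writer B")
id_a_again = content_block_id(0, b"payload from writer A")
```

With this scheme two writers that stage different data at the same position no longer overwrite each other's staged blocks, while a writer that retries the same chunk gets the same ID. It does not by itself address staged blocks being discarded when another writer commits first, so it may only be part of a fix.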
cmp0xff