ERROR - ... HTTP status code=404, Exception=The specified blob does not exist. ErrorCode: BlobNotFound. #20
I've only seen this when I'm referencing a file or container that isn't present. Can you try running:
This instantiates the filesystem and should return a list of all files Dask will expect to find. The most likely explanation is that one of the items being returned has a size of 0. You can also try
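The zero-size explanation above can be checked mechanically. The following is a minimal sketch, not the snippet originally posted in this thread. It assumes the listing comes from an fsspec-style `ls(detail=True)` call, which returns dicts with `name`, `size`, and `type` keys. The helper name `find_zero_size_entries` and the sample paths are hypothetical.

```python
def find_zero_size_entries(entries):
    """Return the names of file entries whose reported size is 0 or missing.

    `entries` is assumed to be a list of dicts shaped like the output of an
    fsspec filesystem's ls(path, detail=True): keys "name", "size", "type".
    """
    return [
        entry["name"]
        for entry in entries
        # Directories legitimately report size 0, so only inspect files.
        if entry.get("type", "file") == "file" and not entry.get("size")
    ]


# Hypothetical listing, for illustration only.
sample = [
    {"name": "container/data/part.0.csv", "size": 1024, "type": "file"},
    {"name": "container/data/empty.csv", "size": 0, "type": "file"},
    {"name": "container/data", "size": 0, "type": "directory"},
]

print(find_zero_size_entries(sample))
```

With a real filesystem object, the same helper could be fed the result of `fs.ls(path, detail=True)`, where `fs` and `path` stand in for whatever was instantiated in the snippet above.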
the same error occurs when I just run your code above:
so the glob is somewhat unhappy. If I look at the files found by
^ there appears to be a difference in the initial "/"
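One way to confirm that a leading "/" is the only difference between the two listings is to compare them before and after stripping it. This is a rough sketch under that assumption; the helper names and the sample paths are hypothetical and do not come from the library.

```python
def normalize(path):
    """Strip any leading '/' so that '/c/a.csv' and 'c/a.csv' compare equal."""
    return path.lstrip("/")


def differs_only_by_leading_slash(globbed, listed):
    """True when the raw listings differ but match once leading '/' is removed."""
    raw_equal = set(globbed) == set(listed)
    normalized_equal = {normalize(p) for p in globbed} == {normalize(p) for p in listed}
    return (not raw_equal) and normalized_equal


# Hypothetical results of a glob call and an ls call on the same container.
from_glob = ["/container/a.csv", "/container/b.csv"]
from_ls = ["container/a.csv", "container/b.csv"]

print(differs_only_by_leading_slash(from_glob, from_ls))
```

If this returns True for the real listings, the mismatch is purely in path formatting rather than in which blobs exist.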
@danielsc -- I've written a few tests that (I think) replicate the source of the problem you're observing, and then pushed a branch (blob_not_exist_exception). Any chance you can test that branch and give some feedback?
I get the same kind of error when reading a partitioned parquet data set. ls returns
['bike.parq/_common_metadata',
 'bike.parq/_metadata',
 'bike.parq/part.0.parquet',
 'bike.parq/part.1.parquet',
 'bike.parq/part.10.parquet',
 'bike.parq/part.2.parquet',
 'bike.parq/part.3.parquet',
 'bike.parq/part.4.parquet',
 'bike.parq/part.5.parquet',
 'bike.parq/part.6.parquet',
 'bike.parq/part.7.parquet',
 'bike.parq/part.8.parquet',
 'bike.parq/part.9.parquet']
then when I attempt to read the directory it fails.
I will give your branch a run in a few days.
I've confirmed that combining the two PRs fixes the glob and dask issues.
Thanks for verifying @AlbertDeFusco.
I am running the code below and everything works just fine -- it can process the whole dataset and no parts are missing.
Still I am getting this error message as the code is run:
ERROR - Client-Request-ID=bcfef538-1079-11ea-9010-37c2dc712507 Retry policy did not allow for a retry: Server-Timestamp=Tue, 26 Nov 2019 18:22:42 GMT, Server-Request-ID=3807c197-101e-000b-5c86-a41b61000000, HTTP status code=404, Exception=The specified blob does not exist. ErrorCode: BlobNotFound.
It seems to be inconsequential, but I would like to know if it can be avoided.
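If the message really is harmless, one possible workaround is to filter it out of the log output with the standard `logging` module. This is only a sketch: it hides the message and does not address why the 404 request is made. The logger name below is an assumption about where the Azure SDK emits this record and should be checked against the installed version.

```python
import logging


class DropBlobNotFound(logging.Filter):
    """Suppress log records whose message mentions the BlobNotFound error code."""

    def filter(self, record):
        # Returning False drops the record; True lets it through.
        return "BlobNotFound" not in record.getMessage()


# Assumption: the record is emitted by this logger. Verify the name in your
# environment, since it depends on the Azure storage SDK version in use.
NOISY_LOGGER = "azure.storage.common.storageclient"

# A filter on a logger only applies to records logged directly on that
# logger, not to records from its child loggers.
logging.getLogger(NOISY_LOGGER).addFilter(DropBlobNotFound())
```

Other errors from the same logger still get through, so genuine failures remain visible.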