pull: The specified blob does not exist #9730

Closed
rlleshi opened this issue Jul 12, 2023 · 3 comments
Labels
  • A: cloud-versioning (Related to cloud-versioned remotes)
  • fs: azure (Related to the Azure filesystem)

rlleshi commented Jul 12, 2023

Bug Report

Description

I push files with dvc push to an Azure Blob Storage remote. When I then run dvc pull from another machine/repo, using the exact same dvc.lock and the same Azure authentication method, I get:

ERROR: unexpected error - : The specified blob does not exist.
RequestId:8fefd37f-601e-0047-12b6-b494cc000000
Time:2023-07-12T11:43:23.9683742Z
ErrorCode:BlobNotFound
Content: <?xml version="1.0" encoding="utf-8"?><Error><Code>BlobNotFound</Code><Message>The specified blob does not exist.
RequestId:8fefd37f-601e-0047-12b6-b494cc000000
Time:2023-07-12T11:43:23.9683742Z</Message></Error>

Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!

If I check out previous versions of dvc.lock on this other machine/repo, dvc pull works as expected. Only the latest version fails.

I think the files are being pushed correctly: if I revert to a previous version of dvc.lock on my local machine (the one I originally pushed the data from) and run dvc pull, I get the old version of the data, and if I then go back to the current version, the new version of the data is pulled and replaces the old one.

So this error message is very confusing, because the data does seem to have been pushed to the remote.

So to sum up:

  • The DVC remote config is the same
  • dvc.lock is the same
  • The Azure authentication method is the same
  • The DVC version is the same (I also tried updating to the latest version, 3.5.1, but it made no difference)

But dvc pull doesn't work as expected.
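One way to double-check the "dvc.lock is the same" point is to compare a checksum of the file on both machines. The following is a minimal sketch, not part of DVC: `file_digest` is a hypothetical helper name, and the digest covers the raw bytes, so a difference in line endings between checkouts would also show up as a mismatch.

```python
import hashlib


def file_digest(path):
    """Return the SHA-256 hex digest of a file's raw bytes.

    Hypothetical helper for illustration; run it on each machine
    against dvc.lock and compare the printed values by eye.
    """
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()
```

Calling `print(file_digest("dvc.lock"))` in each repo and comparing the two values should be enough to confirm (or rule out) a difference in the lock file itself.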

Environment information

❯ dvc doctor
DVC version: 3.4.0 (pip)
------------------------
Platform: Python 3.10.6 on Linux-5.10.102.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
Subprojects:
        dvc_data = 2.3.3
        dvc_objects = 0.23.0
        dvc_render = 0.5.3
        dvc_task = 0.3.0
        scmrepo = 1.0.4
Supports:
        azure (adlfs = 2023.4.0, knack = 0.10.1, azure-identity = 1.13.0),
        http (aiohttp = 3.8.4, aiohttp-retry = 2.8.3),
        https (aiohttp = 3.8.4, aiohttp-retry = 2.8.3)
Config:
        Global: /home/rlleshi/.config/dvc
        System: /etc/xdg/dvc
Cache types: hardlink, symlink
Cache directory: ext4 on /dev/sdb
Caches: local
Remotes: azure
Workspace directory: ext4 on /dev/sdb
Repo: dvc, git
Repo.site_cache_dir: /var/tmp/dvc/repo/251a55eb30f4d39de05d3ba1c5c045e9

efiop (Contributor) commented Jul 12, 2023

@rlleshi Could you run dvc pull -v and post the full verbose log, please?

efiop added the "awaiting response" label on Jul 12, 2023

rlleshi (Author) commented Jul 13, 2023

@efiop ty for the quick response.

❯ dvc pull -v
2023-07-13 08:22:59,492 DEBUG: v3.5.1 (pip), CPython 3.10.6 on Linux-5.10.102.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
2023-07-13 08:22:59,492 DEBUG: command: .local/share/virtualenvs/rule_learning-fHxF2ERL/bin/dvc pull -v
2023-07-13 08:23:00,186 DEBUG: Removing 'projects/tests/rule_learning/.dvc/cache/1a/.e2ifdx4MYt6fP3wd2JTDeo.tmp'
2023-07-13 08:23:00,187 ERROR: unexpected error - : The specified blob does not exist.
RequestId:bc76d6db-501e-0011-5752-b56523000000
Time:2023-07-13T06:23:00.1732920Z
ErrorCode:BlobNotFound
Content: <?xml version="1.0" encoding="utf-8"?><Error><Code>BlobNotFound</Code><Message>The specified blob does not exist.
RequestId:bc76d6db-501e-0011-5752-b56523000000
Time:2023-07-13T06:23:00.1732920Z</Message></Error>
Traceback (most recent call last):
  File ".local/share/virtualenvs/rule_learning-fHxF2ERL/lib/python3.10/site-packages/adlfs/spec.py", line 1666, in _get_file
    stream = await bc.download_blob(
  File ".local/share/virtualenvs/rule_learning-fHxF2ERL/lib/python3.10/site-packages/azure/core/tracing/decorator_async.py", line 77, in wrapper_use_tracer
    return await func(*args, **kwargs)
  File ".local/share/virtualenvs/rule_learning-fHxF2ERL/lib/python3.10/site-packages/azure/storage/blob/aio/_blob_client_async.py", line 539, in download_blob
    await downloader._setup()  # pylint: disable=protected-access
  File ".local/share/virtualenvs/rule_learning-fHxF2ERL/lib/python3.10/site-packages/azure/storage/blob/aio/_download_async.py", line 288, in _setup
    self._response = await self._initial_request()
  File ".local/share/virtualenvs/rule_learning-fHxF2ERL/lib/python3.10/site-packages/azure/storage/blob/aio/_download_async.py", line 383, in _initial_request
    process_storage_error(error)
  File ".local/share/virtualenvs/rule_learning-fHxF2ERL/lib/python3.10/site-packages/azure/storage/blob/_shared/response_handlers.py", line 189, in process_storage_error
    exec("raise error from None")   # pylint: disable=exec-used # nosec
  File "<string>", line 1, in <module>
  File ".local/share/virtualenvs/rule_learning-fHxF2ERL/lib/python3.10/site-packages/azure/storage/blob/aio/_download_async.py", line 337, in _initial_request
    location_mode, response = await self._clients.blob.download(
  File ".local/share/virtualenvs/rule_learning-fHxF2ERL/lib/python3.10/site-packages/azure/core/tracing/decorator_async.py", line 77, in wrapper_use_tracer
    return await func(*args, **kwargs)
  File ".local/share/virtualenvs/rule_learning-fHxF2ERL/lib/python3.10/site-packages/azure/storage/blob/_generated/aio/operations/_blob_operations.py", line 198, in download
    map_error(status_code=response.status_code, response=response, error_map=error_map)
  File ".local/share/virtualenvs/rule_learning-fHxF2ERL/lib/python3.10/site-packages/azure/core/exceptions.py", line 112, in map_error
    raise error
azure.core.exceptions.ResourceNotFoundError: The specified blob does not exist.
RequestId:bc76d6db-501e-0011-5752-b56523000000
Time:2023-07-13T06:23:00.1732920Z
ErrorCode:BlobNotFound
Content: <?xml version="1.0" encoding="utf-8"?><Error><Code>BlobNotFound</Code><Message>The specified blob does not exist.
RequestId:bc76d6db-501e-0011-5752-b56523000000
Time:2023-07-13T06:23:00.1732920Z</Message></Error>

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File ".local/share/virtualenvs/rule_learning-fHxF2ERL/lib/python3.10/site-packages/dvc/cli/__init__.py", line 209, in main
    ret = cmd.do_run()
  File ".local/share/virtualenvs/rule_learning-fHxF2ERL/lib/python3.10/site-packages/dvc/cli/command.py", line 26, in do_run
    return self.run()
  File ".local/share/virtualenvs/rule_learning-fHxF2ERL/lib/python3.10/site-packages/dvc/commands/data_sync.py", line 31, in run
    stats = self.repo.pull(
  File ".local/share/virtualenvs/rule_learning-fHxF2ERL/lib/python3.10/site-packages/dvc/repo/__init__.py", line 64, in wrapper
    return f(repo, *args, **kwargs)
  File ".local/share/virtualenvs/rule_learning-fHxF2ERL/lib/python3.10/site-packages/dvc/repo/pull.py", line 31, in pull
    processed_files_count = self.fetch(
  File ".local/share/virtualenvs/rule_learning-fHxF2ERL/lib/python3.10/site-packages/dvc/repo/__init__.py", line 64, in wrapper
    return f(repo, *args, **kwargs)
  File ".local/share/virtualenvs/rule_learning-fHxF2ERL/lib/python3.10/site-packages/dvc/repo/fetch.py", line 140, in fetch
    fetch_transferred, fetch_failed = ifetch(
  File ".local/share/virtualenvs/rule_learning-fHxF2ERL/lib/python3.10/site-packages/dvc_data/index/fetch.py", line 79, in fetch
    fetched += save(fs_index, jobs=jobs, callback=cb)
  File ".local/share/virtualenvs/rule_learning-fHxF2ERL/lib/python3.10/site-packages/dvc_data/index/save.py", line 166, in save
    transferred += cache.add(
  File ".local/share/virtualenvs/rule_learning-fHxF2ERL/lib/python3.10/site-packages/dvc_data/hashfile/db/__init__.py", line 113, in add
    transferred = super().add(
  File ".local/share/virtualenvs/rule_learning-fHxF2ERL/lib/python3.10/site-packages/dvc_objects/db.py", line 178, in add
    generic.transfer(
  File ".local/share/virtualenvs/rule_learning-fHxF2ERL/lib/python3.10/site-packages/dvc_objects/fs/generic.py", line 306, in transfer
    _try_links(
  File ".local/share/virtualenvs/rule_learning-fHxF2ERL/lib/python3.10/site-packages/dvc_objects/fs/generic.py", line 247, in _try_links
    return copy(from_fs, from_path, to_fs, to_path, callback=callback)
  File ".local/share/virtualenvs/rule_learning-fHxF2ERL/lib/python3.10/site-packages/dvc_objects/fs/generic.py", line 93, in copy
    return _get(
  File ".local/share/virtualenvs/rule_learning-fHxF2ERL/lib/python3.10/site-packages/dvc_objects/fs/generic.py", line 197, in _get
    return _get_one(from_paths[0], to_paths[0])
  File ".local/share/virtualenvs/rule_learning-fHxF2ERL/lib/python3.10/site-packages/dvc_objects/fs/generic.py", line 189, in _get_one
    return get_file(from_path, tmp_file, callback=callback)
  File ".local/share/virtualenvs/rule_learning-fHxF2ERL/lib/python3.10/site-packages/dvc_objects/fs/callbacks.py", line 69, in func
    return wrapped(path1, path2, **kw)
  File ".local/share/virtualenvs/rule_learning-fHxF2ERL/lib/python3.10/site-packages/dvc_objects/fs/callbacks.py", line 41, in wrapped
    res = fn(*args, **kwargs)
  File ".local/share/virtualenvs/rule_learning-fHxF2ERL/lib/python3.10/site-packages/dvc_objects/fs/base.py", line 550, in get_file
    self.fs.get_file(from_info, to_info, callback=callback, **kwargs)
  File ".local/share/virtualenvs/rule_learning-fHxF2ERL/lib/python3.10/site-packages/fsspec/asyn.py", line 121, in wrapper
    return sync(self.loop, func, *args, **kwargs)
  File ".local/share/virtualenvs/rule_learning-fHxF2ERL/lib/python3.10/site-packages/fsspec/asyn.py", line 106, in sync
    raise return_result
  File ".local/share/virtualenvs/rule_learning-fHxF2ERL/lib/python3.10/site-packages/fsspec/asyn.py", line 61, in _runner
    result[0] = await coro
  File ".local/share/virtualenvs/rule_learning-fHxF2ERL/lib/python3.10/site-packages/adlfs/spec.py", line 1675, in _get_file
    raise FileNotFoundError from exception
FileNotFoundError

2023-07-13 08:23:00,233 DEBUG: Version info for developers:
DVC version: 3.5.1 (pip)
------------------------
Platform: Python 3.10.6 on Linux-5.10.102.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
Subprojects:
        dvc_data = 2.5.0
        dvc_objects = 0.23.0
        dvc_render = 0.5.3
        dvc_task = 0.3.0
        scmrepo = 1.0.4
Supports:
        azure (adlfs = 2023.4.0, knack = 0.10.1, azure-identity = 1.13.0),
        http (aiohttp = 3.8.4, aiohttp-retry = 2.8.3),
        https (aiohttp = 3.8.4, aiohttp-retry = 2.8.3)
Config:
        Global: .config/dvc
        System: /etc/xdg/dvc
Cache types: <https://error.dvc.org/no-dvc-cache>
Caches: local
Remotes: azure
Workspace directory: ext4 on /dev/sdb
Repo: dvc, git
Repo.site_cache_dir: /var/tmp/dvc/repo/d0a62724635fce91ba4020c85395b694

Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
2023-07-13 08:23:00,234 DEBUG: Analytics is enabled.
2023-07-13 08:23:00,265 DEBUG: Trying to spawn '['daemon', '-q', 'analytics', '/tmp/tmphxe4srmw']'
2023-07-13 08:23:00,266 DEBUG: Spawned '['daemon', '-q', 'analytics', '/tmp/tmphxe4srmw']'
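
For readers following the traceback above: the final FileNotFoundError carries no message of its own because adlfs re-raises with `raise FileNotFoundError from exception`, so the Azure details live on the chained cause. The sketch below reproduces that pattern with stand-ins. `ResourceNotFoundError`, `download_blob`, `get_file`, and `cause_chain` here are local, hypothetical definitions for illustration, not the real azure or adlfs code.

```python
class ResourceNotFoundError(Exception):
    """Local stand-in for azure.core.exceptions.ResourceNotFoundError."""


def download_blob(path):
    # Stand-in for the SDK call that fails when the blob is missing.
    raise ResourceNotFoundError("The specified blob does not exist.")


def get_file(path):
    # Same re-raise pattern as shown in the traceback (adlfs/spec.py).
    try:
        download_blob(path)
    except ResourceNotFoundError as exception:
        raise FileNotFoundError from exception


def cause_chain(exc):
    """Return exception type names, following explicit __cause__ links."""
    names = []
    while exc is not None:
        names.append(type(exc).__name__)
        exc = exc.__cause__
    return names
```

Catching the FileNotFoundError from `get_file(...)` and passing it to `cause_chain` yields the outer exception first and the storage error second, which matches the "direct cause" ordering in the log.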

efiop added the "fs: azure" and "A: cloud-versioning" labels and removed the "awaiting response" label on Jul 17, 2023

pmrowla (Contributor) commented Aug 10, 2023

Closing this as duplicate of #9651

This should be resolved in the latest DVC release (3.14.0 or later). After updating, please remove the Repo.site_cache_dir folder (its path is shown in the dvc doctor output) and then retry your pull:

rm -r /var/tmp/dvc/repo/...
dvc pull

If you still see this issue after updating and clearing the site cache, feel free to re-open this ticket.
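
The cleanup step above could also be scripted. This is a rough sketch under stated assumptions: it takes the text printed by dvc doctor, looks for the `Repo.site_cache_dir:` line in the format shown earlier in this thread, and deletes that directory. `parse_site_cache_dir` and `clear_site_cache` are hypothetical helper names, not DVC APIs.

```python
import re
import shutil
from pathlib import Path


def parse_site_cache_dir(doctor_output):
    """Return the path from the 'Repo.site_cache_dir:' line, or None."""
    match = re.search(
        r"^Repo\.site_cache_dir:\s*(\S.*)$", doctor_output, re.MULTILINE
    )
    return Path(match.group(1).strip()) if match else None


def clear_site_cache(doctor_output):
    """Delete the site cache directory named in the dvc doctor output.

    Returns True if a directory was removed, False if the line was
    missing or the directory does not exist.
    """
    path = parse_site_cache_dir(doctor_output)
    if path is None or not path.is_dir():
        return False
    shutil.rmtree(path)
    return True
```

To use it, capture the output of dvc doctor (for example with `subprocess.run(["dvc", "doctor"], capture_output=True, text=True).stdout`), pass it to `clear_site_cache`, and then run dvc pull again.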

pmrowla closed this as completed on Aug 10, 2023