Support virtual-hosted–style only S3-compatible remote? #10256

Open
0ut0fcontrol opened this issue Jan 25, 2024 · 7 comments
Labels
feature request Requesting a new feature fs: s3 Related to the S3 filesystem

Comments

@0ut0fcontrol

Does DVC support virtual-hosted–style S3-compatible remotes?

From the docs, DVC seems to support only path-style S3 remotes.

When I try to use DVC with Tinder Object Storage (TOS), which supports only virtual-hosted–style S3 access, it reports an error.

  • BytePlus | Compatibility with Amazon S3 link
  • Virtual hosting of buckets - Amazon Simple Storage Service link
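
For readers unfamiliar with the two addressing styles, here is a minimal sketch of how the request URL differs. The endpoint, bucket, and key are made-up placeholders, the helper names are hypothetical, and key URL-encoding is omitted for brevity.

```python
from urllib.parse import urlsplit, urlunsplit


def path_style_url(endpoint_url: str, bucket: str, key: str) -> str:
    """Path style: the bucket is the first segment of the URL path."""
    parts = urlsplit(endpoint_url)
    return urlunsplit((parts.scheme, parts.netloc, f"/{bucket}/{key}", "", ""))


def virtual_hosted_style_url(endpoint_url: str, bucket: str, key: str) -> str:
    """Virtual-hosted style: the bucket becomes a subdomain of the endpoint host."""
    parts = urlsplit(endpoint_url)
    return urlunsplit((parts.scheme, f"{bucket}.{parts.netloc}", f"/{key}", "", ""))


if __name__ == "__main__":
    # Placeholder values, not a real service.
    endpoint = "https://s3.example.com"
    print(path_style_url(endpoint, "my-bucket", "data/file.bin"))
    print(virtual_hosted_style_url(endpoint, "my-bucket", "data/file.bin"))
```

A service that accepts only the second form will reject requests built in the first form, whatever the credentials.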

Error info:

$ cat .dvc/config
[core]
    remote = tos
['remote "tos"']
    url = s3://xxx/xxx (edited)
    endpointurl = https://tos-s3-xxx.xxxx.com (edited)
$ dvc push 
Collecting                                                                                                                                                         |0.00 [00:00,    ?entry/s]
Pushing
ERROR: unexpected error - Forbidden: An error occurred (403) when calling the HeadObject operation: Forbidden                                                                                

Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
$ dvc push --verbose
2024-01-25 17:21:30,113 DEBUG: v3.42.0 (brew), CPython 3.12.1 on macOS-14.2.1-arm64-arm-64bit
2024-01-25 17:21:30,113 DEBUG: command: /opt/homebrew/bin/dvc push --verbose
Collecting                                                                                                                                                         |0.00 [00:00,    ?entry/s]
2024-01-25 17:21:30,346 DEBUG: Preparing to transfer data from 'x'
2024-01-25 17:21:30,346 DEBUG: Preparing to collect status from 'x'
2024-01-25 17:21:30,346 DEBUG: Collecting status from 'x'
2024-01-25 17:21:30,347 DEBUG: Querying 1 oids via object_exists
Pushing                                                                                                                                                                                      
2024-01-25 17:21:30,689 ERROR: unexpected error - Forbidden: An error occurred (403) when calling the HeadObject operation: Forbidden                                                        
Traceback (most recent call last):
  File "/opt/homebrew/Cellar/dvc/3.42.0/libexec/lib/python3.12/site-packages/s3fs/core.py", line 113, in _error_wrapper
    return await func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/dvc/3.42.0/libexec/lib/python3.12/site-packages/aiobotocore/client.py", line 408, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (403) when calling the HeadObject operation: Forbidden

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/homebrew/Cellar/dvc/3.42.0/libexec/lib/python3.12/site-packages/dvc/cli/__init__.py", line 211, in main
    ret = cmd.do_run()
          ^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/dvc/3.42.0/libexec/lib/python3.12/site-packages/dvc/cli/command.py", line 27, in do_run
    return self.run()
           ^^^^^^^^^^
  File "/opt/homebrew/Cellar/dvc/3.42.0/libexec/lib/python3.12/site-packages/dvc/commands/data_sync.py", line 64, in run
    processed_files_count = self.repo.push(
                            ^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/dvc/3.42.0/libexec/lib/python3.12/site-packages/dvc/repo/__init__.py", line 65, in wrapper
    return f(repo, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/dvc/3.42.0/libexec/lib/python3.12/site-packages/dvc/repo/push.py", line 144, in push
    push_transferred, push_failed = ipush(
                                    ^^^^^^
  File "/opt/homebrew/Cellar/dvc/3.42.0/libexec/lib/python3.12/site-packages/dvc_data/index/push.py", line 75, in push
    result = transfer(
             ^^^^^^^^^
  File "/opt/homebrew/Cellar/dvc/3.42.0/libexec/lib/python3.12/site-packages/dvc_data/hashfile/transfer.py", line 203, in transfer
    status = compare_status(
             ^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/dvc/3.42.0/libexec/lib/python3.12/site-packages/dvc_data/hashfile/status.py", line 178, in compare_status
    dest_exists, dest_missing = status(
                                ^^^^^^^
  File "/opt/homebrew/Cellar/dvc/3.42.0/libexec/lib/python3.12/site-packages/dvc_data/hashfile/status.py", line 150, in status
    exists.update(odb.oids_exist(hashes, jobs=jobs, progress=pbar.callback))
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/dvc/3.42.0/libexec/lib/python3.12/site-packages/dvc_objects/db.py", line 422, in oids_exist
    return list(wrap_iter(remote_oids, callback))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/dvc/3.42.0/libexec/lib/python3.12/site-packages/dvc_objects/db.py", line 36, in wrap_iter
    for index, item in enumerate(iterable, start=1):
  File "/opt/homebrew/Cellar/dvc/3.42.0/libexec/lib/python3.12/site-packages/dvc_objects/db.py", line 370, in list_oids_exists
    in_remote = self.fs.exists(paths, batch_size=jobs)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/dvc/3.42.0/libexec/lib/python3.12/site-packages/dvc_objects/fs/base.py", line 472, in exists
    return fut.result()
           ^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.12/3.12.1_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/concurrent/futures/_base.py", line 456, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.12/3.12.1_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/opt/homebrew/Cellar/dvc/3.42.0/libexec/lib/python3.12/site-packages/dvc_objects/executors.py", line 135, in batch_coros
    result = fut.result()
             ^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/dvc/3.42.0/libexec/lib/python3.12/site-packages/s3fs/core.py", line 1035, in _exists
    await self._info(path, bucket, key, version_id=version_id)
  File "/opt/homebrew/Cellar/dvc/3.42.0/libexec/lib/python3.12/site-packages/s3fs/core.py", line 1302, in _info
    out = await self._call_s3(
          ^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/dvc/3.42.0/libexec/lib/python3.12/site-packages/s3fs/core.py", line 348, in _call_s3
    return await _error_wrapper(
           ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/dvc/3.42.0/libexec/lib/python3.12/site-packages/s3fs/core.py", line 140, in _error_wrapper
    raise err
PermissionError: Forbidden

2024-01-25 17:21:30,721 DEBUG: Version info for developers:
DVC version: 3.42.0 (brew)
--------------------------
Platform: Python 3.12.1 on macOS-14.2.1-arm64-arm-64bit
Subprojects:
	dvc_data = 3.8.0
	dvc_objects = 3.0.6
	dvc_render = 1.0.1
	dvc_task = 0.3.0
	scmrepo = 2.0.4
Supports:
	azure (adlfs = 2023.12.0, knack = 0.11.0, azure-identity = 1.15.0),
	gdrive (pydrive2 = 1.19.0),
	gs (gcsfs = 2023.12.2.post1),
	http (aiohttp = 3.9.1, aiohttp-retry = 2.8.3),
	https (aiohttp = 3.9.1, aiohttp-retry = 2.8.3),
	oss (ossfs = 2023.12.0),
	s3 (s3fs = 2023.12.2, boto3 = 1.34.22),
	ssh (sshfs = 2023.10.0),
	webdav (webdav4 = 0.9.8),
	webdavs (webdav4 = 0.9.8),
	webhdfs (fsspec = 2023.12.2)
Config:
	Global: /Users/bytedance/Library/Application Support/dvc
	System: /opt/homebrew/share/dvc
Cache types: reflink, hardlink, symlink
Cache directory: apfs on /dev/disk3s3s1
Caches: local
Remotes: s3
Workspace directory: apfs on /dev/disk3s3s1
Repo: dvc, git
Repo.site_cache_dir: /opt/homebrew/var/cache/dvc/repo/b8147adaf473f039b47ac961430c05d4

Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
2024-01-25 17:21:30,725 DEBUG: Analytics is enabled.
2024-01-25 17:21:30,755 DEBUG: Trying to spawn ['daemon', 'analytics', '/var/folders/x3/m9b0vhpn28d_557yb666lf4w0000gn/T/tmpay2t_qwd', '-v']
2024-01-25 17:21:30,762 DEBUG: Spawned ['daemon', 'analytics', '/var/folders/x3/m9b0vhpn28d_557yb666lf4w0000gn/T/tmpay2t_qwd', '-v'] with pid 98616
@pmrowla
Contributor

pmrowla commented Jan 25, 2024

DVC should work properly with the virtual-host style endpointurl addressing. The error indicates that you don't have the right permissions to access that bucket. Are you able to use the AWS CLI to access that byte-plus bucket?

@skshetry
Member

skshetry commented Jan 25, 2024

It looks like you have to pass addressing_style: 'virtual' to botocore to enable this?

boto/boto3#2477

@pmrowla
Contributor

pmrowla commented Jan 25, 2024

In botocore it defaults to auto: if you set an endpointurl, it uses virtual style and then falls back to path.
https://github.com/boto/botocore/blob/e7c5b6ab22174797db551f44053a0b2245430649/botocore/utils.py#L2604-L2614

@0ut0fcontrol
Author

Thank you, @skshetry and @pmrowla.
When I force boto3 to use {'addressing_style': 'virtual'}, I can access the BytePlus bucket:
image

Without it, boto3 raises an error:
image
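
Since the screenshots are not reproduced here, the following is a rough sketch of what forcing the addressing style in boto3 can look like. It assumes boto3 is installed and credentials come from the default credential chain; the helper names and the endpoint, bucket, and key in the usage comment are placeholders, not values from this thread.

```python
ALLOWED_STYLES = ("auto", "virtual", "path")


def s3_options(addressing_style: str) -> dict:
    """Build the `s3` options dict accepted by botocore's Config."""
    if addressing_style not in ALLOWED_STYLES:
        raise ValueError(f"unsupported addressing style: {addressing_style!r}")
    return {"addressing_style": addressing_style}


def make_client(endpoint_url: str, addressing_style: str = "virtual"):
    """Create an S3 client that uses the requested addressing style."""
    # Imported lazily so the helper above can be used without boto3 installed.
    import boto3
    from botocore.config import Config

    return boto3.client(
        "s3",
        endpoint_url=endpoint_url,
        config=Config(s3=s3_options(addressing_style)),
    )


# Usage (placeholder endpoint, bucket, and key):
# client = make_client("https://s3.example.com")
# client.head_object(Bucket="my-bucket", Key="data/file.bin")
```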

What should I do to make dvc work?

@pmrowla
Contributor

pmrowla commented Jan 25, 2024

The error given with s3fs/DVC is a permission error when accessing a specific file. It does not look like s3fs is failing to create the client session (InvalidPathAccess is not the exception raised when you use DVC). This would normally mean that the issue is specifically with the credentials you are setting for your DVC remote, and not that it is due to an incorrect addressing style.

@0ut0fcontrol can you verify that you are able to access your bucket via the AWS CLI?

@0ut0fcontrol
Author

@pmrowla Yes, I can access my bucket via the AWS CLI:
image

And if I insert config_kwargs['s3'] = {'addressing_style': 'virtual'} here, dvc works fine.
However, I don't know how to pass this kwarg from the dvc CLI into s3fs.

  • With {'addressing_style': 'virtual'} added, dvc works:
    image
    image

  • With {'addressing_style': 'virtual'} commented out, dvc stops working again:
    image
    image
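
The same setting can be tried against s3fs directly, outside DVC. This is only a sketch, assuming s3fs is installed: `config_kwargs` is the s3fs argument forwarded to botocore's Config, and the helper names and endpoint below are placeholders.

```python
def s3fs_kwargs(endpoint_url: str) -> dict:
    """Keyword arguments for s3fs.S3FileSystem that force virtual-hosted addressing."""
    return {
        "client_kwargs": {"endpoint_url": endpoint_url},
        # Same value as the manual patch described above.
        "config_kwargs": {"s3": {"addressing_style": "virtual"}},
    }


def open_filesystem(endpoint_url: str):
    """Open an S3 filesystem using the settings above."""
    # Imported lazily so s3fs_kwargs can be inspected without s3fs installed.
    import s3fs

    return s3fs.S3FileSystem(**s3fs_kwargs(endpoint_url))


# Usage (placeholder endpoint and path):
# fs = open_filesystem("https://s3.example.com")
# fs.exists("my-bucket/data/file.bin")
```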

@pmrowla
Contributor

pmrowla commented Jan 26, 2024

This looks to me like it may be an aiobotocore or botocore bug, since their behavior of defaulting to virtual-host addressing and falling back to path addressing may not be working correctly.

But we can add support for setting this explicitly in dvc-s3 (we set config timeouts in the same way). It will also need to be added to the DVC remote config schema.
https://github.com/iterative/dvc-s3/blob/43f70226160f1a5c9ffdad4092d41a8bab7ec19b/dvc_s3/__init__.py#L176-L179
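
As a rough illustration of that proposal, and not the actual dvc-s3 code, the sketch below shows how a remote option could be copied into the botocore config alongside the timeouts. The `addressing_style` remote option and the function name are hypothetical; the timeout names mirror botocore's Config parameters.

```python
ALLOWED_STYLES = ("auto", "virtual", "path")


def build_config_kwargs(remote_config: dict) -> dict:
    """Translate DVC-style remote options into botocore Config keyword arguments."""
    config_kwargs = {}

    # Timeouts are passed through unchanged when they are set.
    for name in ("read_timeout", "connect_timeout"):
        if remote_config.get(name) is not None:
            config_kwargs[name] = remote_config[name]

    # Hypothetical option: not part of the DVC config schema discussed in this thread.
    style = remote_config.get("addressing_style")
    if style is not None:
        if style not in ALLOWED_STYLES:
            raise ValueError(f"unsupported addressing style: {style!r}")
        config_kwargs.setdefault("s3", {})["addressing_style"] = style

    return config_kwargs
```

Leaving the option unset keeps botocore's default behavior, so existing remotes would be unaffected.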

@pmrowla pmrowla added fs: s3 Related to the S3 filesystem feature request Requesting a new feature labels Jan 26, 2024