ssl_verify key in remote config does not accept custom CA bundle path and aws config is ignored #6012
Comments
Looks like we have 2 (related) action points here
(and finally the docs for @rgvanwesep if you'd like to work on this just let us know in this ticket. I'd say the first change is higher priority since it will make pinging @isidentical to double check this |
@pmrowla Good point on the first change. It would be good to be able to let boto fall back to the config. I would like to work on this. I can make both changes at the same time.
related #5972
botocore allows a path to a custom CA bundle, either by passing the path to the CA bundle file into the verify argument of boto3.session.Session.client or by passing None (the default), which falls back to the AWS config. Previously, the DVC config only accepted a boolean for the ssl_verify option in the remote S3 config. This changes the DVC config to accept both string and None in addition to boolean, and defaults to None. I also changed the default for ssl_verify to None in BaseS3FileSystem. Thus, if ssl_verify is not provided, botocore will fall back to the AWS config.

Testing

Unit tests cover the changes to the config schema and the additional ssl_verify types that will be passed into S3FileSystem. Also, I ran dvc push -r object-store data/cifar-10-python.tar.gz in my work environment, which has a private S3 endpoint that requires a custom CA bundle, both with and without ssl_verify specified in the config. This was successful, showing that communication could be established. I also ran dvc remote modify object-store ssl_verify "$HOME/.aws/cabundle.pem" and confirmed that the custom CA bundle path was added to the config.

Fixes iterative#6012
Also see #5732 for discussion of how to implement this (whether as part of
@dberenbaum I read through the related issues you posted. Is the discussion you are referring to this thread? Also, it seems that I ran into the same issue as this commenter:
@dberenbaum, I am fine with this, but let me ping @efiop on this. What alternative name do we have?
I thought you had recommended
@dberenbaum Thanks for reminding me about that PR 🙏, I totally forgot about it. The
* config, remote: Made S3 CA bundle customizable

botocore allows a path to a custom CA bundle, either by passing the path to the CA bundle file into the verify argument of boto3.session.Session.client or by passing None (the default), which falls back to the AWS config. Previously, the DVC config only accepted a boolean for the ssl_verify option in the remote S3 config. This changes the DVC config to accept both string and None in addition to boolean, and defaults to None. I also changed the default for ssl_verify to None in BaseS3FileSystem. Thus, if ssl_verify is not provided, botocore will fall back to the AWS config.

Testing

Unit tests cover the changes to the config schema and the additional ssl_verify types that will be passed into S3FileSystem. Also, I ran dvc push -r object-store data/cifar-10-python.tar.gz in my work environment, which has a private S3 endpoint that requires a custom CA bundle, both with and without ssl_verify specified in the config. This was successful, showing that communication could be established. I also ran dvc remote modify object-store ssl_verify "$HOME/.aws/cabundle.pem" and confirmed that the custom CA bundle path was added to the config.

Fixes #6012

* Removed default None on ssl_verify

Responding to a PR comment, removed the Optional, default None on ssl_verify, since the config keys are optional by default. Rather than a missing ssl_verify producing a None that eventually gets filtered, it doesn't appear in the parsed config in the first place.

Co-authored-by: Robert Van Wesep <robert.g.vanwesep@gsk.com>
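The behaviour described in the two commits above can be sketched roughly as follows. This is only an illustration using the standard library, not the actual DVC schema code: the helper names parse_ssl_verify and client_kwargs_from_remote, and the exact set of accepted boolean spellings, are assumptions made for the example.

```python
from typing import Any, Dict, Mapping, Union

# Hypothetical boolean spellings; the real DVC schema may accept a different set.
_TRUE = {"true", "yes", "on", "1"}
_FALSE = {"false", "no", "off", "0"}


def parse_ssl_verify(value: Any) -> Union[bool, str]:
    """Accept a boolean, a boolean-like string, or a CA bundle path."""
    if isinstance(value, bool):
        return value
    if isinstance(value, str):
        lowered = value.strip().lower()
        if lowered in _TRUE:
            return True
        if lowered in _FALSE:
            return False
        if value.strip():
            # Any other non-empty string is treated as a path to a CA bundle.
            return value
    raise ValueError(f"invalid ssl_verify value: {value!r}")


def client_kwargs_from_remote(remote_conf: Mapping[str, Any]) -> Dict[str, Any]:
    """Build client kwargs, adding 'verify' only when ssl_verify is configured."""
    kwargs: Dict[str, Any] = {}
    if "ssl_verify" in remote_conf:
        kwargs["verify"] = parse_ssl_verify(remote_conf["ssl_verify"])
    # When the key is absent, no 'verify' is passed at all, so the
    # underlying client is free to fall back to the AWS config.
    return kwargs
```

With this shape, an absent ssl_verify never turns into an explicit None that has to be filtered out later; it simply does not appear in the keyword arguments.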
The ssl_verify key in the remote config gets passed through to the S3FileSystem client_kwargs:
dvc/dvc/fs/s3.py Line 105 in 89b40af
dvc/dvc/fs/fsspec_wrapper.py Line 17 in 89b40af
dvc/dvc/fs/s3.py Line 154 in 89b40af
These are in turn passed to the aiobotocore.AioSession: https://github.com/dask/s3fs/blob/a3d7a946f85b6dbef62ab75c61fe1319a482c8ba/s3fs/core.py#L366
In the AioSession it checks if the verify key is set, and if it isn't then it looks in the aws config: https://github.com/aio-libs/aiobotocore/blob/2a7c7f5a8c7a61daebe484bc5a6f2232607af82c/aiobotocore/session.py#L70-L71
verify can either be a boolean or a string, with the latter being a path to a custom CA bundle: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/core/session.html (see the verify argument of the client method).
However, the config schema for DVC only allows a boolean for ssl_verify and defaults to true:
dvc/dvc/config_schema.py Line 148 in 89b40af
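To make the chain above concrete, here is a simplified model of the fallback, not the actual aiobotocore code. The function name resolve_verify and the dictionary standing in for the parsed aws config are assumptions for illustration; ca_bundle is the AWS config setting that corresponds to AWS_CA_BUNDLE.

```python
from typing import Mapping, Optional, Union

Verify = Union[bool, str]


def resolve_verify(
    explicit: Optional[Verify], aws_config: Mapping[str, str]
) -> Optional[Verify]:
    """Return the explicit verify value if set, else the aws config's ca_bundle."""
    if explicit is not None:
        # An explicit value (True, False, or a path) always wins.
        return explicit
    # Only when nothing was passed is the aws config consulted.
    return aws_config.get("ca_bundle")


# Hypothetical aws config pointing at an internal CA bundle.
aws_config = {"ca_bundle": "/etc/ssl/internal-ca.pem"}

# A schema default of True is always passed explicitly, so the bundle is ignored.
with_schema_default = resolve_verify(True, aws_config)

# Passing nothing lets the aws config take effect.
with_fallback = resolve_verify(None, aws_config)
```

In this model, a boolean default of True short-circuits the lookup every time, which matches the behaviour reported in this issue.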
The result is that the aws config is never checked and a custom CA bundle cannot be used. If such a CA bundle is needed when trying to communicate to remote (e.g. using push or pull) the result is
I ran into this problem because my company uses a self-hosted S3 clone with a bundle of internally signed certificates. Setting the AWS_CA_BUNDLE environment variable did not resolve the issue. But modifying the config schema to accept a string:
and running
resolved the issue for me.
I'm happy to open a pull request to make the change to the config schema if that solution is acceptable, but it would be my first contribution (for any OSS project!), so it'll take extra time for me to set up my environment, etc.