Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ssl_verify key in remote config does not accept custom CA bundle path and aws config is ignored #6012

Closed
rgvanwesep opened this issue May 15, 2021 · 8 comments · Fixed by #6018
Labels
bug Did we break something? fs: s3 Related to the S3 filesystem p2-medium Medium priority, should be done, but less important

Comments

@rgvanwesep
Copy link
Contributor

The ssl_verify key in the remote config gets passed through to the S3FileSystem client_kwargs:

client["verify"] = config.get("ssl_verify", True)

self.fs_args.update(self._prepare_credentials(**kwargs))

return _S3FileSystem(**self.fs_args)

These are in turn passed to the aiobotocore.AioSession :
https://github.com/dask/s3fs/blob/a3d7a946f85b6dbef62ab75c61fe1319a482c8ba/s3fs/core.py#L366

In the AioSession it checks if the verify key is set and if it isn't then it looks in the aws config:
https://github.com/aio-libs/aiobotocore/blob/2a7c7f5a8c7a61daebe484bc5a6f2232607af82c/aiobotocore/session.py#L70-L71
verify can either be a boolean or a string, with the latter being a path to a custom CA bundle:
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/core/session.html (see the verify argument of the client method.)

However, the config schema for DVC only allows boolean for ssl_verify and defaults true:

Optional("ssl_verify", default=True): Bool,

The result is that the aws config is never checked and a custom CA bundle cannot be used. If such a CA bundle is needed when trying to communicate to remote (e.g. using push or pull) the result is

[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate

I ran into this problem because my company uses a self-hosted S3 clone with a bundle of internally signed certificates. Setting the AWS_CA_BUNDLE environment variable did not resolve the issue. But modifying the config schema to accept a string:

Optional("ssl_verify", default=True): Any(Bool, str),

and running

dvc remote modify object-store ssl_verify "$HOME/.aws/cabundle.pem"

resolved the issue for me.

I'm happy to open a pull request to make the change to the config schema if that solution is acceptable, but it would be my first contribution (for any OSS project!), so it'll take extra time for me to setup my environment, etc.

@pmrowla pmrowla added bug Did we break something? p2-medium Medium priority, should be done, but less important labels May 15, 2021
@pmrowla
Copy link
Contributor

pmrowla commented May 15, 2021

Looks like we have 2 (related) action points here

  • client["verify"] = config.get("ssl_verify", True) should default to passing None into boto instead of True if ssl_verify is not explicitly set for a DVC remote
  • DVC's ssl_verify should also accept a ca_bundle string

(and finally the docs for ssl_verify will need to be updated)

@rgvanwesep if you'd like to work on this just let us know in this ticket. I'd say the first change is higher priority since it will make aiobotocore fall back to reading ca_bundle from the user's AWS config by default, but both can be done together.

pinging @isidentical to double check this

@rgvanwesep
Copy link
Contributor Author

@pmrowla Good point on the first change. It would be good to be able to let boto fall back to the config.

I would like to work on this. I can make both changes at the same time.

@pmrowla
Copy link
Contributor

pmrowla commented May 15, 2021

related #5972

rgvanwesep pushed a commit to rgvanwesep/dvc that referenced this issue May 16, 2021
botocore allows a path to a custom CA bundle either by passing a path
to the CA bundle file into the verify argument of
boto3.session.Session.client or passing None (the default) which will
fall back to the AWS config. Previously, the DVC config only accepted a
boolean into the ssl_verify option in the remote S3 config. This changes
the DVC config to accept both string and None in addition to boolean and
defaults to None. I also changed the default for ssl_verfiy to None in
BaseS3FileSystem. Thus, if ssl_verify is not provided, botocore will
fall back to the AWS config.

Testing

Unit tests to cover the changes to the config schema and addition
ssl_verify types that will be passed into S3FileSystem. Also, ran

dvc push -r object-store data/cifar-10-python.tar.gz

in my work environment that has a private S3 endpoint that requires a
custom CA bundle, both with and without ssl_verify specified in the
config. This was successful, showing that communication could be
established. And I ran

dvc remote modify object-store ssl_verify "$HOME/.aws/cabundle.pem"

and confirmed that the custom CA bundle path was added to the config.

Fixes iterative#6012
@dberenbaum
Copy link
Contributor

Also see #5732 for discussion of how to implement this (whether as part of ssl_verify or a separate parameter) and related issues in #5388 and iterative/dvc.org#1079. cc @skshetry

@dberenbaum dberenbaum added this to To do in DVC May 18 - Jun 1 2021 via automation May 17, 2021
@dberenbaum dberenbaum moved this from To do to In progress in DVC May 18 - Jun 1 2021 May 17, 2021
@rgvanwesep
Copy link
Contributor Author

@dberenbaum I read through the related issues you posted. I think the discussion you are referring to this thread?
#5732 (comment)
The linked PR I posted implements a custom CA bundle path as part of ssl_verify (similar to how botocore and requests does it.)

Also, it seems that I ran into the same issue as this commenter:
#5972 (comment)
There, the AWS_CA_BUNDLE environment variable is ignored if anything is passed into verify either True or False. The linked PR should fix that issue as well.

@skshetry
Copy link
Member

@dberenbaum, I am fine with this, but let me ping @efiop on this. What alternative name do we have?

@efiop efiop moved this from In progress to Review in progress in DVC May 18 - Jun 1 2021 May 18, 2021
@dberenbaum
Copy link
Contributor

I thought you had recommended certs and I also think I saw or hear ca_bundle, but I don't have a strong opinion about the name or whether we even need a separate key.

@efiop
Copy link
Member

efiop commented May 19, 2021

@dberenbaum Thanks for reminding about that PR 🙏 , I totally forgot about it.

The ssl_verify is ok if user knows about boto, but probably not so good from a regular user's perspective. I guess we can keep it as is with ssl_verify for now, looks like there is not much pushback these days. Also, #6018 implementation is significantly better than the attempt we had before, so I personally won't feel bad for merging it.

@skshetry skshetry added the fs: s3 Related to the S3 filesystem label May 27, 2021
DVC May 18 - Jun 1 2021 automation moved this from Review in progress to Done Jun 1, 2021
efiop pushed a commit that referenced this issue Jun 1, 2021
* config, remote: Made S3 CA bundle customizable

botocore allows a path to a custom CA bundle either by passing a path
to the CA bundle file into the verify argument of
boto3.session.Session.client or passing None (the default) which will
fall back to the AWS config. Previously, the DVC config only accepted a
boolean into the ssl_verify option in the remote S3 config. This changes
the DVC config to accept both string and None in addition to boolean and
defaults to None. I also changed the default for ssl_verfiy to None in
BaseS3FileSystem. Thus, if ssl_verify is not provided, botocore will
fall back to the AWS config.

Testing

Unit tests to cover the changes to the config schema and addition
ssl_verify types that will be passed into S3FileSystem. Also, ran

dvc push -r object-store data/cifar-10-python.tar.gz

in my work environment that has a private S3 endpoint that requires a
custom CA bundle, both with and without ssl_verify specified in the
config. This was successful, showing that communication could be
established. And I ran

dvc remote modify object-store ssl_verify "$HOME/.aws/cabundle.pem"

and confirmed that the custom CA bundle path was added to the config.

Fixes #6012

* Removed default None on ssl_verify

Responding to PR comment, removed the Optional, default None on
ssl_verify since the config keys are optional by default. Rather
than a missing ssl_verify producing a None that eventually gets
filtered, it doesn't appear in the parsed config in the first place.

Co-authored-by: Robert Van Wesep <robert.g.vanwesep@gsk.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Did we break something? fs: s3 Related to the S3 filesystem p2-medium Medium priority, should be done, but less important
Projects
No open projects
Development

Successfully merging a pull request may close this issue.

5 participants