Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

allow credentials in client kwargs #308

Merged

Conversation

michaelchia
Copy link
Contributor

Since 0.4.1 I am unable to provide credentials via the client_kwargs argument.

s3 = S3FileSystem(client_kwargs={'aws_access_key_id': 'foo', 'aws_secret_access_key': 'bar'})
>  TypeError: create_client() got multiple values for keyword argument 'aws_access_key_id'

This was caused by the following change in s3fs.core (lines 292-293):

self.s3 = self.session.create_client('s3', aws_access_key_id=self.key, aws_secret_access_key=self.secret, aws_session_token=self.token, config=conf, use_ssl=ssl,
                                        **self.client_kwargs)

My PR would allow the use of either using the arguments (key, secret, token) or the client_kwargs.
It would also allow a combination of both. E.g. key + client_kwargs['aws_secret_access_key']
I do allow repeating the same argument in both (e.g. key + aws_access_key_id) as long as they have the same value, but would raise exception if they have conflicting values (will raise same TypeError: multiple values for kwarg). I also added tests to check this behaviour.

There were other options to choose from so please advise if you prefer one of the following:

  1. Fail if the same argument is repeated even if the same value.
  2. Prioritise init arguments (i.e. key, secret, token) over client_kwargs
  3. Prioritise client_kwargs over init args
    If choosing between 2 or 3, I would favour 2 since those are more explicit, as client_kwargs could be some defaults loaded from some file. However 3 is more consistent with S3FileSystem._prepare_config_kwargs.
    Ultimately I prefer my current solution to let it fail, so it would help the user debug if unintentional, while users can handle their own defaulting logic if needed.

Since use_ssl defaults to True, it is impossible to tell if the user explicitly sets it as True or it is defaulted to True. So I give priority to client_kwargs (as per S3FileSystem._prepare_config_kwargs). i.e. S3FileSystem(use_ssl=True, client_kwargs={'use_ssl': False}) would set use_ssl to False.

I also ensured that if anon==True, it would disregard the credentials in client_kwargs as is currently the case with key, secret & token.

I also cleared up some unused/unnecessary line of code (s3fs.core: 275)

anon, key, secret, kwargs, ckwargs, token, ssl = (
            self.anon, self.key, self.secret, self.kwargs,
            self.client_kwargs, self.token, self.use_ssl)

I hope you accept this PR because the change in 0.4.1 was breaking some other 3rd party library that was relying on passing credentials via client_kwargs.

@martindurant
Copy link
Member

Sorry if I am being dense, but what's the purpose of passing the credentials kwargs in client_kwargs rather than the current mechanism?

@michaelchia
Copy link
Contributor Author

Although it is always possible to use the native arguments, I think it would be good to allow this alternative usage pattern as well.
Personally, I am often required to change the endpoint_url so I prefered to place all my arg in the client_kwargs. Also, it allowed me to easily load all my arguments from disk as a single dictionary to provide as input. I also use boto3 direct as well, so it would allow me to use the same dictionary to populate the arguments to instantiate the boto client.
However, more importantly, it is not backwards compatible as previous versions (up to 0.4.0) did allow such usage and some packages may be relying on this pattern (namely I was using a package called kedro). There was nothing in the documentation that hinted that this behaviour was not supported and the docstring for client_kwargs ("dict of parameters for the botocore client") does naturally imply users could input any of the arguments in boto client.
I don't see a major issue in allowing this as it does not affect standard usage.
Thanks for your time and consideration.

@martindurant
Copy link
Member

OK, you make a good argument.
The py35 failure is weird, there's a sorted right there, but if you can think of something better to make it pass... else pytest.mark.skipif(py35, reason='odd sorting issue') and move on (py35 compat is not high priority, and this won't actually break usage).

@martindurant
Copy link
Member

It would be good to add wording to the docstring abut the parameter priority.

@michaelchia
Copy link
Contributor Author

Hmmm I didn't touch test_s3_glob at all though. I'll just add pytest.mark.skipif(py35, reason='odd sorting issue') anyway since it has your blessing.
Just to confirm, there is only parameter priority for use_ssl, so I'll add to the docstring to mention that. There is no priority preference for the rest of the arguments. It will raise an error if they conflict (same error as now). i.e.:

TypeError: create_client() got multiple values for keyword argument 'aws_access_key_id'

If you would like me to explicitly raise a more descriptive error, please advise on what error to raise and what the message should be.

@michaelchia
Copy link
Contributor Author

There was a bug in the version checking logic. So tests weren't skipping for py35. Fixed it.

@martindurant
Copy link
Member

I think that's good now, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants