Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for temporary security credential in AWS auth #5210

Merged
merged 2 commits into from
Jul 30, 2021

Conversation

laggardkernel
Copy link
Contributor

Support temporary security credentials by adding a new setting AWS_SESSION_TOKEN. S3 client or credentials creation is done by

  • Session.create_client() in pipelines, feed exporter
  • Credentials.__init__() in S3DownloadHandler

Both of the methods from botocore accept another token parameter.

AWS_SESSION_TOKEN is only required by temporary security credentials, which is short term.

References about the AWS_SESSION_TOKEN

@wRAR
Copy link
Member

wRAR commented Jul 20, 2021

Please fix the test failures (and ideally run them locally before submitting the changes).

Also, if the doc says using this is discouraged, do we need this added?

@laggardkernel
Copy link
Contributor Author

Please fix the test failures (and ideally run them locally before submitting the changes).

Sorry, have some problem running the tests locally. It takes too much time to install dependencies (stuck at flake8 installdeps:). I guess I need to wait for it patiently.

Also, if the doc says using this is discouraged, do we need this added?

We need it. It's discouraged because this kind of credential is short term. But what if a user can't get a normal (long term and never expiring) credential? I'll tweak the words in the doc later to make it clear.

@laggardkernel laggardkernel force-pushed the feature/aws-session-token branch 3 times, most recently from 4ab6ca7 to fa66c36 Compare July 21, 2021 08:37
@codecov
Copy link

codecov bot commented Jul 21, 2021

Codecov Report

Merging #5210 (8e7b96d) into master (abe0b37) will decrease coverage by 0.00%.
The diff coverage is 83.33%.

@@            Coverage Diff             @@
##           master    #5210      +/-   ##
==========================================
- Coverage   88.42%   88.42%   -0.01%     
==========================================
  Files         162      162              
  Lines       10522    10528       +6     
  Branches     1521     1522       +1     
==========================================
+ Hits         9304     9309       +5     
  Misses        944      944              
- Partials      274      275       +1     
Impacted Files Coverage Δ
scrapy/core/downloader/handlers/s3.py 93.47% <50.00%> (-1.98%) ⬇️
scrapy/extensions/feedexport.py 95.04% <100.00%> (+0.01%) ⬆️
scrapy/pipelines/files.py 71.87% <100.00%> (+0.19%) ⬆️
scrapy/pipelines/images.py 90.43% <100.00%> (+0.08%) ⬆️

@Gallaecio
Copy link
Member

the doc says using this is discouraged
It's discouraged

You seem to both agree that it’s discouraged. I failed to find such information in the 2 linked references. But even if it is discouraged, as long as it is not deprecated (i.e. as long as they do not ask people to stop using it, and instead ask them to consider alternatives first), +1 to add support.

@wRAR
Copy link
Member

wRAR commented Jul 26, 2021

I meant the doc in the change itself.

docs/topics/feed-exports.rst Outdated Show resolved Hide resolved
docs/topics/settings.rst Outdated Show resolved Hide resolved
scrapy/extensions/feedexport.py Outdated Show resolved Hide resolved
tests/test_feedexport.py Outdated Show resolved Hide resolved
@laggardkernel
Copy link
Contributor Author

You seem to both agree that it’s discouraged. I failed to find such information in the 2 linked references. But even if it is discouraged, as long as it is not deprecated (i.e. as long as they do not ask people to stop using it, and instead ask them to consider alternatives first), +1 to add support.

Temporary credential (e.g. setting AWS_SESSION_TOKEN) is officially supported by AWS. "discourage" is only suggested by me.

The problem of a temporary credential is that it will expire (you can set it when creating one). This introduce a risk that your spider may haven't got the scraping done when the credential expires.

@Gallaecio
Copy link
Member

This introduce a risk that your spider may haven't got the scraping done when the credential expires.

I kind of expect people to realize this based on upstream documentation, but I guess it would be OK to include it as a warning or note. In any case, I would avoid the word “discouraged”, and simply state the facts: if the credentials expire in the middle of a crawl, . I’m not sure what actually happens, if the stuff is simply not delivered of if such an error can stop the spider.

@Gallaecio
Copy link
Member

I meant the doc in the change itself.

This is what happens when you don’t review the changes before commenting. Sorry!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants