S3FilesStore support for other S3 providers (botocore options) #2609

Merged
merged 4 commits into from Jan 8, 2018

Conversation

@otobrglez
Contributor

@otobrglez otobrglez commented Mar 1, 2017

Hi guys!

Scrapy internally uses botocore to read and write data on Amazon S3. However, there are other S3-compatible storage services that can be used as alternatives to S3, such as Minio or s3.scality.

To get S3FilesStore working with these and other providers, botocore needs to know the endpoint_url. Since SSL and certificate verification then become optional, these options should be configurable by the user in settings.py.

This patch passes additional options to botocore's session.create_client(), each controlled by a Scrapy setting:

  • Support for botocore endpoint_url via new AWS_ENDPOINT_URL scrapy setting.
  • Support for botocore region_name via new AWS_REGION_NAME scrapy setting.
  • Support for botocore use_ssl via new AWS_USE_SSL scrapy setting.
  • Support for botocore verify via AWS_VERIFY scrapy setting.
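
As a rough sketch of the mapping listed above (not the actual diff in this PR), the settings could be turned into create_client() keyword arguments like this. The names build_s3_client_kwargs, make_s3_client and SETTING_TO_KWARG are hypothetical and exist only for illustration. Dropping unset values so that botocore falls back to its own defaults is a choice made for this sketch, not necessarily what the patch does.

```python
# Hypothetical helper names; only the AWS_* settings and the botocore
# keyword names come from the pull request description.
SETTING_TO_KWARG = {
    "AWS_ACCESS_KEY_ID": "aws_access_key_id",
    "AWS_SECRET_ACCESS_KEY": "aws_secret_access_key",
    "AWS_ENDPOINT_URL": "endpoint_url",
    "AWS_REGION_NAME": "region_name",
    "AWS_USE_SSL": "use_ssl",
    "AWS_VERIFY": "verify",
}


def build_s3_client_kwargs(settings):
    """Translate AWS_* settings into keyword arguments for create_client().

    `settings` is any mapping with a .get() method (a plain dict works).
    Settings that are missing or None are left out, so the client library
    applies its own defaults for them.
    """
    kwargs = {}
    for setting_name, kwarg_name in SETTING_TO_KWARG.items():
        value = settings.get(setting_name)
        if value is not None:
            kwargs[kwarg_name] = value
    return kwargs


def make_s3_client(settings):
    """Create a botocore S3 client from the settings (requires botocore)."""
    # Imported lazily so the mapping above can be used without botocore.
    import botocore.session

    session = botocore.session.get_session()
    return session.create_client("s3", **build_s3_client_kwargs(settings))
```

With only AWS_ENDPOINT_URL and AWS_USE_SSL set, build_s3_client_kwargs() returns just endpoint_url and use_ssl, and everything else is left to botocore.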

An example settings.py for Minio now looks like this:

AWS_ENDPOINT_URL = 'http://minio.example.com:9000'
AWS_USE_SSL = False # or True (None by default)
AWS_VERIFY = False # or True (None by default)
AWS_ACCESS_KEY_ID = "aws_key"
AWS_SECRET_ACCESS_KEY = "aws_pass"
FILES_STORE_S3_ACL = 'public-read' 

Please review and merge if you feel this is something that people might find useful. Thanks for your time and effort. Cheers!

@redapple redapple added the S3 label Mar 2, 2017
@kmike
Member

@kmike kmike commented Mar 2, 2017

Hey @otobrglez! Thanks for the pull request; I think it is a good feature to have.

Questions:

  1. Does S3 still work without defining the new options? Should we add them to https://github.com/scrapy/scrapy/blob/master/scrapy/settings/default_settings.py?
  2. Could you please add docs for new options?
@codecov-io

@codecov-io codecov-io commented Mar 19, 2017

Codecov Report

Merging #2609 into master will increase coverage by 0.04%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #2609      +/-   ##
==========================================
+ Coverage   84.48%   84.53%   +0.04%     
==========================================
  Files         164      164              
  Lines        9464     9291     -173     
  Branches     1429     1381      -48     
==========================================
- Hits         7996     7854     -142     
+ Misses       1205     1179      -26     
+ Partials      263      258       -5
Impacted Files                         Coverage Δ
scrapy/pipelines/files.py              68.01% <100%> (-1.75%) ⬇️
scrapy/commands/bench.py               94.73% <0%> (-5.27%) ⬇️
scrapy/utils/log.py                    88.5% <0%> (-3.3%) ⬇️
scrapy/extensions/throttle.py          44.89% <0%> (-0.56%) ⬇️
scrapy/settings/default_settings.py    98.6% <0%> (-0.22%) ⬇️
@otobrglez
Contributor Author

@otobrglez otobrglez commented Mar 19, 2017

Hey @kmike! 👋

  1. It works without defining them and without any changes to default_settings.py. The underlying botocore library has predefined default values,...
  2. I've added documentation to settings.rst and media-pipeline.rst. Hope this is enough?
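
To illustrate point 1, here is a small sketch of why leaving the new settings unset keeps the old behaviour. fake_create_client is a stand-in written for this example, not botocore code; its defaults (use_ssl=True, everything else None) are modeled on my understanding of botocore's create_client() signature and should be treated as an assumption. client_options is likewise a hypothetical helper.

```python
def fake_create_client(service_name, region_name=None, use_ssl=True,
                       verify=None, endpoint_url=None):
    """Stand-in for a client factory; it only echoes the options it received.

    The default values are an assumption modeled on botocore's
    create_client() signature, not taken from this pull request.
    """
    return {
        "service_name": service_name,
        "region_name": region_name,
        "use_ssl": use_ssl,
        "verify": verify,
        "endpoint_url": endpoint_url,
    }


def client_options(**aws_options):
    """Forward only the options the user actually set (hypothetical helper)."""
    # None means "not configured", so the factory's own default is used.
    configured = {k: v for k, v in aws_options.items() if v is not None}
    return fake_create_client("s3", **configured)


# No new settings defined: the factory defaults are used unchanged.
plain_s3 = client_options()

# Minio-style configuration: only the explicitly set options change.
minio = client_options(endpoint_url="http://minio.example.com:9000",
                       use_ssl=False, verify=False)
```

Here plain_s3 keeps the factory defaults, while minio overrides only the three options that were passed in.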

Cheers!

- Oto
@otobrglez
Contributor Author

@otobrglez otobrglez commented Apr 11, 2017

@kmike or @redapple - can someone please merge this? :) Cheers!

@otobrglez
Contributor Author

@otobrglez otobrglez commented Oct 16, 2017

@kmike @redapple Ping. Anyone merging this?

@dangra dangra added this to the v1.6 milestone Dec 31, 2017
@dangra dangra merged commit b170e9f into scrapy:master Jan 8, 2018
2 checks passed
@codecov
codecov/patch 100% of diff hit (target 84.48%)
continuous-integration/travis-ci/pr The Travis CI build passed
@dangra
Member

@dangra dangra commented Jan 8, 2018

Useful, and the implementation looks clean. Thanks!

kmike added a commit that referenced this pull request Dec 26, 2018
dangra added a commit that referenced this pull request Dec 26, 2018
fix docs for AWS_... settings. A follow-up to GH-2609.