S3FilesStore support for other S3 providers (botocore options) #2609

Merged 4 commits into scrapy:master on Jan 8, 2018.

Hi guys!

Scrapy uses botocore internally to manipulate data on Amazon S3. However, there are other S3-compatible storage services, such as Minio or s3.scality, that can also be used as alternatives to S3.

For S3FilesStore to work with these and other providers, botocore needs to know the endpoint_url. Because SSL and certificate verification then become optional, these settings can (and should) be set by the user in …

This patch adds the following options to botocore's session.create_client() call:

  • Support for botocore's endpoint_url via the new AWS_ENDPOINT_URL Scrapy setting.
  • Support for botocore's region_name via the new AWS_REGION_NAME Scrapy setting.
  • Support for botocore's use_ssl via the new AWS_USE_SSL Scrapy setting.
  • Support for botocore's verify via the new AWS_VERIFY Scrapy setting.
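The mapping in the list above can be sketched as a small helper that turns Scrapy-style settings into keyword arguments for botocore's create_client(). The helper name build_s3_client_kwargs is hypothetical and this is not Scrapy's actual code. It also assumes one design choice that this patch may not make: unset settings (None) are left out entirely, so botocore falls back to its own defaults.

```python
from typing import Any, Dict, Mapping

# Scrapy setting name -> keyword argument of botocore's create_client()
SETTING_TO_KWARG = {
    "AWS_ACCESS_KEY_ID": "aws_access_key_id",
    "AWS_SECRET_ACCESS_KEY": "aws_secret_access_key",
    "AWS_ENDPOINT_URL": "endpoint_url",
    "AWS_REGION_NAME": "region_name",
    "AWS_USE_SSL": "use_ssl",
    "AWS_VERIFY": "verify",
}


def build_s3_client_kwargs(settings: Mapping[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: translate settings into create_client() kwargs.

    Settings that are missing or None are skipped (an assumed design
    choice), so botocore keeps its own defaults for them.
    """
    kwargs: Dict[str, Any] = {}
    for setting, kwarg in SETTING_TO_KWARG.items():
        value = settings.get(setting)
        if value is not None:
            kwargs[kwarg] = value
    return kwargs


if __name__ == "__main__":
    # Placeholder values for a local Minio server
    kwargs = build_s3_client_kwargs(
        {"AWS_ENDPOINT_URL": "http://127.0.0.1:9000", "AWS_USE_SSL": False}
    )
    print(kwargs)  # {'endpoint_url': 'http://127.0.0.1:9000', 'use_ssl': False}
```

With botocore installed, the resulting dict would then be passed on as `botocore.session.get_session().create_client("s3", **kwargs)`. Note that False is kept because only None counts as "unset".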

Example for Minio now looks like this:

AWS_USE_SSL = False # or True (None by default)
AWS_VERIFY = False # or True (None by default)
AWS_ACCESS_KEY_ID = "aws_key"
FILES_STORE_S3_ACL = 'public-read' 
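The example above does not show the endpoint itself. A fuller settings.py sketch for a local Minio server might look like the following. The endpoint http://127.0.0.1:9000 (Minio's usual default port), the bucket name, and the credentials are placeholders assumed for illustration. Only the AWS_* names introduced in this PR and Scrapy's existing files pipeline settings are taken as given.

```python
# settings.py: a sketch for a local Minio server (all values are placeholders)

# Enable Scrapy's built-in files pipeline
ITEM_PIPELINES = {"scrapy.pipelines.files.FilesPipeline": 1}

# Hypothetical bucket and prefix; the s3:// scheme selects S3FilesStore
FILES_STORE = "s3://scrapy-files/downloads/"
FILES_STORE_S3_ACL = "public-read"

# Placeholder credentials
AWS_ACCESS_KEY_ID = "aws_key"
AWS_SECRET_ACCESS_KEY = "aws_secret"

# Assumed local Minio endpoint served over plain HTTP
AWS_ENDPOINT_URL = "http://127.0.0.1:9000"
AWS_USE_SSL = False
AWS_VERIFY = False
```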

Please review and merge if you feel this is something people might find useful. Thanks for your time and effort. Cheers!

@redapple added the S3 label on Mar 2, 2017

kmike commented Mar 2, 2017

Hey @otobrglez! Thanks for the pull request; I think it is a good feature to have.


  1. Does S3 still work without defining the new options? Should we add them to …
  2. Could you please add docs for new options?


codecov-io commented Mar 19, 2017

Codecov Report

Merging #2609 into master will increase coverage by 0.04%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #2609      +/-   ##
+ Coverage   84.48%   84.53%   +0.04%     
  Files         164      164              
  Lines        9464     9291     -173     
  Branches     1429     1381      -48     
- Hits         7996     7854     -142     
+ Misses       1205     1179      -26     
+ Partials      263      258       -5

Impacted Files        Coverage Δ
scrapy/pipelines/     68.01% <100%> (-1.75%) ⬇️
scrapy/commands/      94.73% <0%> (-5.27%) ⬇️
scrapy/utils/         88.5% <0%> (-3.3%) ⬇️
scrapy/extensions/    44.89% <0%> (-0.56%) ⬇️
scrapy/settings/      98.6% <0%> (-0.22%) ⬇️

Contributor Author

Hey @kmike! 👋

  1. It works without defining the new options or making changes to (…). The underlying botocore library has predefined default values, …
  2. I've added documentation to settings.rst and media-pipeline.rst. I hope this is enough.


Oto

Contributor Author

@kmike or @redapple - can someone please merge this? :) Cheers!

Contributor Author

@kmike @redapple Ping. Anyone merging this?

@dangra added this to the v1.6 milestone on Dec 31, 2017
@dangra merged commit b170e9f into scrapy:master on Jan 8, 2018

dangra commented Jan 8, 2018

Useful, and the implementation looks clean. Thanks!

5 participants