S3FilesStore has incomplete header to botocore mapping #3904

davkuyek opened this issue Jul 25, 2019 · 1 comment · Fixed by #3905


davkuyek commented Jul 25, 2019

If botocore is installed, Scrapy's S3FilesStore accepts only a limited set of headers when persisting files to S3. Each header is converted to the equivalent argument of botocore's S3 put_object method. However, the mapping is incomplete.

Put-object request headers
Botocore arguments

Missing headers and their equivalent options:

x-amz-storage-class -> StorageClass
x-amz-tagging -> Tagging
x-amz-website-redirect-location -> WebsiteRedirectLocation
x-amz-object-lock-mode -> ObjectLockMode
x-amz-object-lock-retain-until-date -> ObjectLockRetainUntilDate
x-amz-object-lock-legal-hold -> ObjectLockLegalHoldStatus
x-amz-server-side-encryption -> ServerSideEncryption
x-amz-server-side-encryption-aws-kms-key-id -> SSEKMSKeyId
x-amz-server-side-encryption-context -> SSEKMSEncryptionContext
x-amz-server-side-encryption-customer-algorithm -> SSECustomerAlgorithm
x-amz-server-side-encryption-customer-key -> SSECustomerKey

Without botocore installed, you can persist a file with the storage class STANDARD_IA by passing the header x-amz-storage-class. With botocore installed, the same call raises a TypeError: Header "x-amz-storage-class" is not supported by botocore.
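As a rough illustration of the translation described above, here is a minimal sketch of a header-to-argument lookup that covers the missing entries. The names MISSING_HEADER_TO_BOTOCORE and headers_to_botocore_kwargs are hypothetical and are not taken from Scrapy's source. The case-insensitive lookup is also an assumption made for this sketch.

```python
# Hypothetical sketch: these names are illustrative, not Scrapy's internals.
# Keys are S3 put-object request headers, values are botocore put_object arguments.
MISSING_HEADER_TO_BOTOCORE = {
    "x-amz-storage-class": "StorageClass",
    "x-amz-tagging": "Tagging",
    "x-amz-website-redirect-location": "WebsiteRedirectLocation",
    "x-amz-object-lock-mode": "ObjectLockMode",
    "x-amz-object-lock-retain-until-date": "ObjectLockRetainUntilDate",
    "x-amz-object-lock-legal-hold": "ObjectLockLegalHoldStatus",
    "x-amz-server-side-encryption": "ServerSideEncryption",
    "x-amz-server-side-encryption-aws-kms-key-id": "SSEKMSKeyId",
    "x-amz-server-side-encryption-context": "SSEKMSEncryptionContext",
    "x-amz-server-side-encryption-customer-algorithm": "SSECustomerAlgorithm",
    "x-amz-server-side-encryption-customer-key": "SSECustomerKey",
}


def headers_to_botocore_kwargs(headers, mapping=MISSING_HEADER_TO_BOTOCORE):
    """Translate request headers into put_object keyword arguments.

    Header names are matched case-insensitively (an assumption of this
    sketch). Unknown headers raise TypeError, mirroring the error message
    quoted in this issue.
    """
    kwargs = {}
    for name, value in headers.items():
        try:
            kwarg = mapping[name.lower()]
        except KeyError:
            raise TypeError('Header "%s" is not supported by botocore' % name)
        kwargs[kwarg] = value
    return kwargs
```

With a lookup like this, {'x-amz-storage-class': 'STANDARD_IA'} would translate to the StorageClass argument instead of raising an error.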

Test code:

from scrapy.pipelines.files import S3FilesStore
from io import BytesIO

s3_file_store = S3FilesStore('s3://your-bucket/')
s3_file_store.persist_file('s3://your-bucket/your-key', BytesIO(b'test'), None, None, {'x-amz-storage-class': 'STANDARD_IA'})

scrapy version -v
Scrapy : 1.4.0
lxml :
libxml2 : 2.9.9
cssselect : 1.0.3
parsel : 1.5.1
w3lib : 1.20.0
Twisted : 19.2.1
Python : 2.7.10 (default, Feb 22 2019, 21:55:15) - [GCC 4.2.1 Compatible Apple LLVM 10.0.1 (clang-1001.0.37.14)]
pyOpenSSL : 16.2.0 (OpenSSL 1.1.1c 28 May 2019)
Platform : Darwin-18.6.0-x86_64-i386-64bit

lucywang000 commented Jul 26, 2019

I just submitted a fix in #3905
