
s3 file store persist_file should accept all supported headers #3905

Merged
1 commit merged into scrapy:master on Jul 29, 2019

Conversation

@lucywang000 (Member) commented Jul 26, 2019

Fix #3904.

The new header mapping is read from the botocore service JSON file:

import requests

r = requests.get('https://raw.githubusercontent.com/boto/botocore/master/botocore/data/s3/2006-03-01/service-2.json')
s3_service_json = r.json()
members = s3_service_json['shapes']['PutObjectRequest']['members']

def to_camel_case(s):
    # Upper-case the acronyms S3 uses (ACP, MD5); capitalize every other segment
    return '-'.join(x.upper() if x.lower() in ('acp', 'md5') else x.capitalize()
                    for x in s.split('-'))

print({
    to_camel_case(v['locationName']): k
    for k, v in members.items()
    if v.get('location') == 'header' and k.lower() not in ('meta', 'acl')
})
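For reference, a minimal standalone sketch of how the name normalization behaves (the sample header names below are just illustrative inputs, not the full mapping):

```python
def to_camel_case(s):
    # Same helper as above: upper-case the ACP/MD5 acronyms, capitalize the rest
    return '-'.join(x.upper() if x.lower() in ('acp', 'md5') else x.capitalize()
                    for x in s.split('-'))

print(to_camel_case('x-amz-storage-class'))   # X-Amz-Storage-Class
print(to_camel_case('content-md5'))           # Content-MD5
print(to_camel_case('x-amz-grant-read-acp'))  # X-Amz-Grant-Read-ACP
```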

Tested manually with:

import os
from io import BytesIO

from twisted.internet import reactor

# Dummy credentials; set these before importing Scrapy so the store picks them up
os.environ['AWS_ACCESS_KEY_ID'] = 'xxx'
os.environ['AWS_SECRET_ACCESS_KEY'] = 'xxx'

from scrapy.pipelines.files import S3FilesStore

s3_file_store = S3FilesStore('s3://bucket/')
dfd = s3_file_store.persist_file('test.txt', BytesIO(b'test'), None, None,
                                 {'x-amz-storage-class': 'STANDARD_IA'})

# Give the upload a couple of seconds to complete, then stop the reactor
reactor.callLater(2, reactor.stop)
reactor.run()
print(dfd.result)

The result is:

{'ResponseMetadata': {'RequestId': '...',
  'HostId': '...',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amz-id-2': '...',
   'x-amz-request-id': '...',
   'date': 'Fri, 26 Jul 2019 01:24:08 GMT',
   'etag': '"..."',
   'x-amz-storage-class': 'STANDARD_IA',
   'content-length': '0',
   'server': 'AmazonS3'},
  'RetryAttempts': 0},
 'ETag': '"..."'}

@kmike (Member) approved these changes Jul 26, 2019 and left a comment:

Thanks @lucywang000!

@Gallaecio (Member) commented Jul 29, 2019

Nice!

@Gallaecio Gallaecio closed this Jul 29, 2019
@Gallaecio Gallaecio reopened this Jul 29, 2019
@codecov bot commented Jul 29, 2019

Codecov Report

Merging #3905 into master will decrease coverage by 0.6%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #3905      +/-   ##
==========================================
- Coverage   85.47%   84.87%   -0.61%     
==========================================
  Files         165      165              
  Lines        9624     9624              
  Branches     1446     1446              
==========================================
- Hits         8226     8168      -58     
- Misses       1145     1193      +48     
- Partials      253      263      +10
Impacted Files Coverage Δ
scrapy/pipelines/files.py 65.38% <ø> (-1.16%) ⬇️
scrapy/core/downloader/handlers/s3.py 62.9% <0%> (-32.26%) ⬇️
scrapy/utils/boto.py 46.66% <0%> (-26.67%) ⬇️
scrapy/core/downloader/tls.py 75.92% <0%> (-12.97%) ⬇️
scrapy/utils/ssl.py 51.42% <0%> (-5.72%) ⬇️
scrapy/extensions/feedexport.py 78.44% <0%> (-5.05%) ⬇️
scrapy/utils/trackref.py 83.78% <0%> (-2.71%) ⬇️
scrapy/core/downloader/handlers/http11.py 89.92% <0%> (-2.62%) ⬇️
scrapy/core/scraper.py 86.48% <0%> (-2.03%) ⬇️

@Gallaecio Gallaecio merged commit 06c093f into scrapy:master Jul 29, 2019
1 of 3 checks passed
@Gallaecio (Member) commented Jul 29, 2019

Wrong button 😳

@lucywang000 lucywang000 deleted the 0.001 branch Jul 30, 2019