s3 file store persist_file should accept all supported headers #3905

Merged
merged 1 commit into from
Jul 29, 2019

Conversation

lucywang000
Member

Fix #3904.

The new header mappings are read from the botocore service JSON file:

import requests, json
r = requests.get('https://raw.githubusercontent.com/boto/botocore/master/botocore/data/s3/2006-03-01/service-2.json')
s3_service_json = r.json()
members = s3_service_json['shapes']['PutObjectRequest']['members']

def to_camel_case(s):
    return '-'.join(x.upper() if x.lower() in ('acp', 'md5') else x.capitalize() for x in s.split('-'))

# Map each header-located member's HTTP header name to its botocore parameter name.
print({
    to_camel_case(v['locationName']): k
    for k, v in members.items()
    if v.get('location') == 'header' and k.lower() not in ('meta', 'acl')
})
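
For context, here is a minimal sketch of how a mapping like the one printed above could be used to turn HTTP-style headers into botocore `put_object` keyword arguments. The names `HEADER_TO_KWARG`, `to_header_case`, and `headers_to_kwargs` are hypothetical and chosen for illustration. The mapping is a small hand-picked subset, not the full generated table, and the actual code in `scrapy/pipelines/files.py` may differ.

```python
# Hypothetical subset of a header-to-botocore mapping; the values on the
# right are PutObject keyword argument names.
HEADER_TO_KWARG = {
    'Content-Type': 'ContentType',
    'Cache-Control': 'CacheControl',
    'Content-MD5': 'ContentMD5',
    'X-Amz-Storage-Class': 'StorageClass',
    'X-Amz-Grant-Read-ACP': 'GrantReadACP',
}


def to_header_case(name):
    """Normalize a header name so that lookups are case-insensitive."""
    return '-'.join(
        part.upper() if part.lower() in ('acp', 'md5') else part.capitalize()
        for part in name.split('-')
    )


def headers_to_kwargs(headers):
    """Translate HTTP-style headers into keyword arguments.

    Raises TypeError for headers that have no entry in the mapping.
    """
    kwargs = {}
    for name, value in headers.items():
        key = to_header_case(name)
        if key not in HEADER_TO_KWARG:
            raise TypeError('Header "%s" is not supported' % name)
        kwargs[HEADER_TO_KWARG[key]] = value
    return kwargs


print(headers_to_kwargs({
    'x-amz-storage-class': 'STANDARD_IA',
    'content-type': 'text/plain',
}))
# → {'StorageClass': 'STANDARD_IA', 'ContentType': 'text/plain'}
```

Normalizing the header name before the lookup means callers can pass `x-amz-storage-class` or `X-Amz-Storage-Class` interchangeably, and rejecting unknown headers surfaces typos early instead of silently dropping them.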

Tested manually with:

import os
from twisted.internet import reactor

os.environ['AWS_ACCESS_KEY_ID'] = 'xxx'
os.environ['AWS_SECRET_ACCESS_KEY'] = 'xxx'

from scrapy.pipelines.files import S3FilesStore
from io import BytesIO

s3_file_store = S3FilesStore('s3://bucket/')
dfd = s3_file_store.persist_file('test.txt', BytesIO(b'test'), None, None, {'x-amz-storage-class': 'STANDARD_IA'})

reactor.callLater(2, reactor.stop)
reactor.run()
print(dfd.result)

The result is:

{'ResponseMetadata': {'RequestId': '...',
  'HostId': '...',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amz-id-2': '...',
   'x-amz-request-id': '...',
   'date': 'Fri, 26 Jul 2019 01:24:08 GMT',
   'etag': '"..."',
   'x-amz-storage-class': 'STANDARD_IA',
   'content-length': '0',
   'server': 'AmazonS3'},
  'RetryAttempts': 0},
 'ETag': '"..."'}

Member

@kmike kmike left a comment

Thanks @lucywang000!

@Gallaecio
Member

Nice!

@Gallaecio Gallaecio closed this Jul 29, 2019
@Gallaecio Gallaecio reopened this Jul 29, 2019
@codecov
codecov bot commented Jul 29, 2019

Codecov Report

Merging #3905 into master will decrease coverage by 0.6%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #3905      +/-   ##
==========================================
- Coverage   85.47%   84.87%   -0.61%     
==========================================
  Files         165      165              
  Lines        9624     9624              
  Branches     1446     1446              
==========================================
- Hits         8226     8168      -58     
- Misses       1145     1193      +48     
- Partials      253      263      +10
Impacted Files Coverage Δ
scrapy/pipelines/files.py 65.38% <ø> (-1.16%) ⬇️
scrapy/core/downloader/handlers/s3.py 62.9% <0%> (-32.26%) ⬇️
scrapy/utils/boto.py 46.66% <0%> (-26.67%) ⬇️
scrapy/core/downloader/tls.py 75.92% <0%> (-12.97%) ⬇️
scrapy/utils/ssl.py 51.42% <0%> (-5.72%) ⬇️
scrapy/extensions/feedexport.py 78.44% <0%> (-5.05%) ⬇️
scrapy/utils/trackref.py 83.78% <0%> (-2.71%) ⬇️
scrapy/core/downloader/handlers/http11.py 89.92% <0%> (-2.62%) ⬇️
scrapy/core/scraper.py 86.48% <0%> (-2.03%) ⬇️

@Gallaecio Gallaecio merged commit 06c093f into scrapy:master Jul 29, 2019
@Gallaecio
Member

Wrong button 😳

@lucywang000 lucywang000 deleted the 0.001 branch July 30, 2019 03:00
Development

Successfully merging this pull request may close these issues.

S3FilesStore has incomplete header to botocore mapping
3 participants