Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG+1] Py3 S3 botocore #1761

Merged
merged 22 commits into from Feb 18, 2016
Merged

[MRG+1] Py3 S3 botocore #1761

merged 22 commits into from Feb 18, 2016

Conversation

@lopuhin
Copy link
Member

@lopuhin lopuhin commented Feb 5, 2016

The main idea is that we try to use botocore by default, falling back to boto on Py2. botocore is more recent and has better Py3 support. Should fix #1718 (see comments there about some changes in test). boto code path is still tested in precise environment.

@lopuhin
Copy link
Member Author

@lopuhin lopuhin commented Feb 5, 2016

Ah, scrapy.pipelines.files.S3FilesStore and scrapy.extensions.feedexport.S3FeedStorage also require boto. What is better - porting them to botocore/boto3, while leaving boto support, or requiring both boto and botocore on Python 3?

@eliasdorneles
Copy link
Member

@eliasdorneles eliasdorneles commented Feb 5, 2016

I think porting it to botocore/boto3 with a fallback to boto would be good.
Things would work normally for people in Python 2 / boto, and for people using Python 3, having botocore/boto3 would be sufficient.

raise NotConfigured("missing botocore or boto library")
try:
self.conn = _S3Connection(
aws_access_key_id, aws_secret_access_key, **kw)

This comment has been minimized.

@eliasdorneles

eliasdorneles Feb 5, 2016
Member

so, this is only done for boto library, yes?

there is one detail here, because the keyword arguments allow you to set is_secure=False, proxy and debug options -- this won't be possible with botocore/boto3, right?

This comment has been minimized.

@lopuhin

lopuhin Feb 5, 2016
Author Member

Hm, indeed, I missed them - they might also affect headers, at least proxy, and I think botocore does not support it. I don't see yet if debug and is_secure can currently affect anything.

This comment has been minimized.

@eliasdorneles

eliasdorneles Feb 5, 2016
Member

Though I don't know if this even matters, because looking here the download never gets any other argument apart from the Settings in production code, only in the test code.

Also, looks like the previous code only mutates the keyword arguments for the anon case, which from the comments seems to be inferred when no keys are provided.

So, I guess this would only break for users who are subclassing this and upgrading to Python 3, and perhaps that's acceptable (better than not working in PY3 anyway).

This comment has been minimized.

@lopuhin

lopuhin Feb 5, 2016
Author Member

Good, then I can also add a check that no other keyword args are passed in botocore codepath, to avoid silently discarding them.

This comment has been minimized.

@lopuhin lopuhin changed the title Py3 s3 botocore [WIP] Py3 s3 botocore Feb 5, 2016
@lopuhin lopuhin changed the title [WIP] Py3 s3 botocore [WIP] Py3 3 botocore Feb 5, 2016
@lopuhin lopuhin changed the title [WIP] Py3 3 botocore [WIP] Py3 S3 botocore Feb 5, 2016
@lopuhin
Copy link
Member Author

@lopuhin lopuhin commented Feb 15, 2016

Just checked that there is a test for S3FeedStorage that is obviously not run on travis (this requires FEEDTEST_S3_URI env variable). S3FilesStore is missing such tests - if the boto import is commented out, it's tests pass.
So probably the way forward is to write a test against real s3 for S3FilesStore (similar to S3FeedStorage) and test locally for now. If we want to run these tests on travis, then something like this https://github.com/jubos/fake-s3 could work, but this is a separate issue I believe.

@lopuhin
Copy link
Member Author

@lopuhin lopuhin commented Feb 15, 2016

Ah, no, it should be possible to do tests against real s3 on travis: https://docs.travis-ci.com/user/environment-variables/ - this way we could set up an s3 bucket for tests and use it without revealing aws credentials.

@codecov-io
Copy link

@codecov-io codecov-io commented Feb 15, 2016

Current coverage is 83.10%

Merging #1761 into master will decrease coverage by -0.22% as of bc9023b

Powered by Codecov. Updated on successful CI builds.

@lopuhin lopuhin force-pushed the lopuhin:py3-s3-botocore branch from be015c1 to d1ecb8c Feb 15, 2016
@lopuhin lopuhin changed the title [WIP] Py3 S3 botocore Py3 S3 botocore Feb 15, 2016
@lopuhin
Copy link
Member Author

@lopuhin lopuhin commented Feb 15, 2016

This is ready for review: now we try to use botocore, falling back to boto on python 2. Tests again real S3 are not run on travis, I run them locally like this:

AWS_ACCESS_KEY_ID=xxx AWS_SECRET_ACCESS_KEY=xxx \
    S3_TEST_FILE_URI=s3://kostia.lopuhin/scrapy-test \
    tox  -e py34 -- tests/test_pipeline_files.py::TestS3FilesStore \
                    tests/test_feedexport.py::S3FeedStorageTest
except ImportError:
if six.PY2:
try:
import boto

This comment has been minimized.

@kmike

kmike Feb 15, 2016
Member

I can't confidently tell if this imports boto from site-packages or from scrapy.utils.boto. What about adding absolute_imports future import to this file?

This comment has been minimized.

@lopuhin

lopuhin Feb 15, 2016
Author Member

Ouch, thanks for the catch! Fixed.

Body=buf,
Metadata={k: str(v) for k, v in six.iteritems(meta)},
ACL=self.POLICY)
else:

This comment has been minimized.

@kmike

kmike Feb 17, 2016
Member

headers and self.HEADERS are not used for botocore, is this intended?

This comment has been minimized.

@lopuhin

lopuhin Feb 17, 2016
Author Member

Yes - botocore does not accept them. But it allows quite a lot of customization: https://botocore.readthedocs.org/en/latest/reference/services/s3.html#S3.Client.put_object so maybe if we know what are headers commonly used for, we could fish something from them and pass to put_object. Perhaps I just missed how headers could be used.

This comment has been minimized.

@lopuhin

lopuhin Feb 17, 2016
Author Member

Ah, sorry, I meant headers, not self.HEADERS - they really should be used, I'll fix it.

This comment has been minimized.

@kmike

kmike Feb 17, 2016
Member

Headers we send here are used when response is retreived by a client from S3 HTTP link. So e.g. you can't view an uploaded image using a browser unless Content-Type: image/jpeg header is set. This feature is used in ImagesPipeline. Cache-Control header is also commonly used.

This comment has been minimized.

@lopuhin

lopuhin Feb 18, 2016
Author Member

I see, thanks! I added conversion of all headers supported by botocore, and check that headers are set in a test.

@@ -78,6 +79,12 @@ def values(self):
def to_string(self):
return headers_dict_to_raw(self)

def to_native_string_dict(self):

This comment has been minimized.

@kmike

kmike Feb 17, 2016
Member

Hm, method name is 'to_native_string_dict', but keys and values are converted to unicode - why is that?

This comment has been minimized.

@lopuhin

lopuhin Feb 17, 2016
Author Member

Yes, don't know what I was thinking. to_unicode_dict should be better.

This comment has been minimized.

@lopuhin

lopuhin Feb 18, 2016
Author Member

Fixed

@kmike kmike changed the title Py3 S3 botocore [MRG+1] Py3 S3 botocore Feb 18, 2016
@kmike
Copy link
Member

@kmike kmike commented Feb 18, 2016

LGTM.

redapple added a commit that referenced this pull request Feb 18, 2016
[MRG+1] Py3 S3 botocore
@redapple redapple merged commit da36b7d into scrapy:master Feb 18, 2016
1 of 2 checks passed
1 of 2 checks passed
codecov/patch 47.00% of diff hit (target 100.00%)
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@redapple
Copy link
Contributor

@redapple redapple commented Feb 18, 2016

thanks @lopuhin !

@lopuhin
Copy link
Member Author

@lopuhin lopuhin commented Feb 18, 2016

Thanks for careful review!

There are some tests against real S3 that are currently skipped on travis (S3FeedStorageTest, TestS3FilesStore). It would be nice to run, but there are some problems: tests might be unstable if S3 is down, and this depends on someone providing AWS keys in encrypted form, and paying the (tiny) bills :) So should we add them? If yes, I'll create an issue.

@redapple
Copy link
Contributor

@redapple redapple commented Feb 18, 2016

Right. can you create the issue?

@kmike
Copy link
Member

@kmike kmike commented Feb 18, 2016

I like https://github.com/jubos/fake-s3 approach more because this way users will be able to run S3 tests locally without setting up AWS and without Internet access.

@lopuhin
Copy link
Member Author

@lopuhin lopuhin commented Feb 18, 2016

Ok, I'll put this all into the issue, and then can check if fake-s3 will work for us.

redapple added a commit that referenced this pull request Feb 18, 2016
redapple added a commit that referenced this pull request Feb 18, 2016
PR #1761
Fixes #1718
redapple added a commit that referenced this pull request Feb 18, 2016
Originally from #1761
@lopuhin lopuhin deleted the lopuhin:py3-s3-botocore branch Feb 19, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

5 participants
You can’t perform that action at this time.