Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix #960: S3 Feed Export throws boto error #5833

Merged
merged 16 commits into from Jun 13, 2023

Conversation

jazzthief
Copy link
Contributor

@jazzthief jazzthief commented Feb 17, 2023

Fixes #960, closes #5735 by switching from botocore to boto3 upload method, which supports multipart upload.

  • Document the need to install boto3 for S3 support, rather than botocore.
  • Implement code that uploads using boto3’s method instead of botocore to upload files. It seems the interfaces are similar.
  • If boto3 is not installed, but botocore is, fall back to the current implementation, and log a deprecation warning.

@jazzthief
Copy link
Contributor Author

jazzthief commented Feb 17, 2023

This works fine with put_object(). Switching to upload_fileobj() fails a number of tests because of incompatibility - they probably need to be modified to support it.
Currently I'm thinking that existing tests should remain and cover only botocore's functionality, and I should add modified tests for boto3's functionality, possibly splitting them into single and multipart upload cases.

If I am on the right path, I would really appreciate it if someone could point me in the right direction with the tests here: will I be fine just trying to do the same thing as the existing tests?

@jazzthief
Copy link
Contributor Author

jazzthief commented Feb 22, 2023

Fixed my typo in a call to botocore get_session() - now botocore functionality works as it should

@jazzthief
Copy link
Contributor Author

jazzthief commented Feb 23, 2023

According to this issue, upload_file() cannot be stubbed directly with botocore's Stubber that is currently used in the tests. I am guessing I can trace which exact methods upload_file() calls under the hood and stub them - or try a library like placebo as botocore's maintainer suggested in the issue.
@Gallaecio, could you please have a look?

@Gallaecio
Copy link
Member

I will have a look, though it might take me some days 😓

@codecov
Copy link

codecov bot commented Feb 25, 2023

Codecov Report

Merging #5833 (7de7ebe) into master (d60b4ed) will decrease coverage by 0.03%.
The diff coverage is 55.55%.

❗ Current head 7de7ebe differs from pull request most recent head cb67bc1. Consider uploading reports for the commit cb67bc1 to get more accurate results

@@            Coverage Diff             @@
##           master    #5833      +/-   ##
==========================================
- Coverage   88.84%   88.82%   -0.03%     
==========================================
  Files         162      162              
  Lines       11055    11068      +13     
  Branches     1800     1802       +2     
==========================================
+ Hits         9822     9831       +9     
- Misses        954      957       +3     
- Partials      279      280       +1     
Impacted Files Coverage Δ
scrapy/extensions/feedexport.py 93.14% <55.55%> (-2.31%) ⬇️

... and 2 files with indirect coverage changes

Copy link
Member

@Gallaecio Gallaecio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change look good, I’ll try running the tests locally and see if I can figure out how to address the issues you mention.

docs/topics/feed-exports.rst Outdated Show resolved Hide resolved
docs/topics/feed-exports.rst Outdated Show resolved Hide resolved
scrapy/extensions/feedexport.py Outdated Show resolved Hide resolved
@jazzthief
Copy link
Contributor Author

The change look good, I’ll try running the tests locally and see if I can figure out how to address the issues you mention.

Got you, I will address your comments and wait for any thoughts you might have on tests

Gallaecio and others added 2 commits March 15, 2023 16:38
Adapt tests for a switch to boto3 with botocore support for backward compatibility
@Gallaecio
Copy link
Member

@jazzthief I see the extra-deps-pinned environment I added is causing issues.

If you can look into it while you work on my earlier feedback, that would be great. But I can take a look myself as well if you prefer.

@jazzthief
Copy link
Contributor Author

If you can look into it while you work on my earlier feedback, that would be great. But I can take a look myself as well if you prefer.

I'll have a look. One question, though: I see is_boto3_available() used in tests - should I still switch everything to an IS_BOTO3_AVAILABLE variable inside feedexport.py or do we leave it like that?

@Gallaecio
Copy link
Member

I think it may be worth it switching to a variable, I just did not want to make it as part of my changes, and instead focus on the test part.

@jazzthief
Copy link
Contributor Author

@Gallaecio I addressed your comments.

Regarding extra-deps-pinned: from what I understood from boto3 changelog, the API for upload was completely implemented in 1.1.4, however it seems something was changed later, so I've put 1.20.0 into extra-deps-pinned config as well as the docs. This passed the tests locally.

Copy link
Member

@Gallaecio Gallaecio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems to me like this is ready for review. Great job!

@jazzthief jazzthief marked this pull request as ready for review March 17, 2023 09:35
@jazzthief
Copy link
Contributor Author

Seems to me like this is ready for review. Great job!

Thanks for your help! Do you think we should create an issue on reworking boto3 tests?

@Gallaecio
Copy link
Member

reworking boto3 tests

Could you elaborate?

As far as I am concerned, test-wise we are covered here. While something in line with Stub would be great, I think mocking on our side just to make sure that we are mapping parameters correctly to what boto3 expects is a good approach, and not that different from stubbing.

@jazzthief
Copy link
Contributor Author

I think mocking on our side just to make sure that we are mapping parameters correctly to what boto3 expects is a good approach, and not that different from stubbing.

That's what I meant. Got you, great then.

@wRAR wRAR added this to the Scrapy 2.10 milestone May 4, 2023
@wRAR
Copy link
Member

wRAR commented Jun 13, 2023

Thanks!

@wRAR wRAR merged commit 85fe88f into scrapy:master Jun 13, 2023
23 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

S3 backend can't handle uploads larger than 5GB S3 Feed Export throws boto error: "Connection Reset By Peer"
4 participants