
S3 Feed Export throws boto error: "Connection Reset By Peer" #960

Closed
samzhang111 opened this issue Nov 25, 2014 · 9 comments · Fixed by #5833 · May be fixed by #4077

Comments

@samzhang111

Posted details on SO: http://stackoverflow.com/questions/27131693/scrapyd-s3-feed-export-connection-reset-by-peer

@kmike
Member

kmike commented Nov 25, 2014

Scrapy uses boto for feed exports, so this is likely a boto issue (the one you linked on SO, boto/boto#2207). Do you know of a workaround?

@jersub

jersub commented Mar 16, 2016

The fix for sending big feed files to S3 without hitting this bug is to use multipart upload. PR #1559 was a work-in-progress implementation of this for boto2.
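For illustration, here is a minimal sketch of what a multipart upload looks like with boto3's managed transfer API, which switches to multipart automatically above a configurable threshold. The function name, bucket, key, and threshold values are examples, not Scrapy's actual code:

```python
def upload_feed(path, bucket, key):
    """Sketch: upload a feed file via boto3's managed transfer, which
    splits files above the threshold into multipart uploads."""
    import boto3  # third-party: pip install boto3
    from boto3.s3.transfer import TransferConfig

    config = TransferConfig(
        multipart_threshold=8 * 1024 * 1024,  # switch to multipart above 8 MB
        multipart_chunksize=8 * 1024 * 1024,  # size of each uploaded part
    )
    s3 = boto3.client("s3")
    with open(path, "rb") as feed_file:
        # upload_fileobj streams the file and retries parts independently,
        # which avoids holding one long connection open for the whole feed
        s3.upload_fileobj(feed_file, bucket, key, Config=config)
```

Because each part is a separate request, a reset connection only fails one part rather than the whole transfer.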

@redapple
Contributor

It would be great if someone could resurrect the WIP in #1559.

@Gallaecio
Member

Looking at the upstream API, it seems like implementing this change is a matter of:

  • Documenting the need to install boto3, rather than botocore, for S3 support.
  • Implementing uploads with boto3's method instead of botocore's; the interfaces seem similar.
  • If boto3 is not installed but botocore is, falling back to the current implementation and logging a deprecation warning.
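The fallback step above could be sketched roughly like this. The function name and warning text are illustrative and not Scrapy's actual implementation:

```python
import warnings


def choose_s3_client():
    """Sketch: prefer boto3 for S3 feed exports, falling back to
    botocore with a deprecation warning if only botocore is installed."""
    try:
        import boto3  # third-party: pip install boto3
        return boto3.client("s3")
    except ImportError:
        pass
    try:
        import botocore.session  # third-party: pip install botocore
    except ImportError:
        raise ImportError("S3 feed storage requires boto3 (or botocore)")
    warnings.warn(
        "Using botocore for S3 feed exports is deprecated; install boto3",
        DeprecationWarning,
    )
    return botocore.session.get_session().create_client("s3")
```

Trying boto3 first keeps existing botocore-only deployments working while nudging users toward the library that fixes this bug.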

@Gallaecio
Member

#4077 is an interesting approach: it starts uploading to S3 right away rather than storing the whole output on disk first and then uploading it.

But I would rather have a simpler approach than none. We can always implement the #4077 approach afterwards.
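To make the difference concrete, the streaming idea behind #4077 can be sketched with S3's low-level multipart API: buffer items in memory and flush each 5 MB chunk as a part while the crawl is still running. This is an illustrative sketch, not the PR's actual code:

```python
import io


class S3StreamingWriter:
    """Sketch of streaming feed export: flush 5 MB+ chunks to S3 as
    multipart parts instead of writing the whole feed to disk first."""

    MIN_PART = 5 * 1024 * 1024  # S3 requires >= 5 MB for all but the last part

    def __init__(self, bucket, key):
        import boto3  # third-party: pip install boto3
        self.bucket, self.key = bucket, key
        self.client = boto3.client("s3")
        mpu = self.client.create_multipart_upload(Bucket=bucket, Key=key)
        self.upload_id = mpu["UploadId"]
        self.buffer = io.BytesIO()
        self.parts = []

    def write(self, data):
        self.buffer.write(data)
        if self.buffer.tell() >= self.MIN_PART:
            self._flush()

    def _flush(self):
        part_number = len(self.parts) + 1
        resp = self.client.upload_part(
            Bucket=self.bucket, Key=self.key, UploadId=self.upload_id,
            PartNumber=part_number, Body=self.buffer.getvalue(),
        )
        self.parts.append({"ETag": resp["ETag"], "PartNumber": part_number})
        self.buffer = io.BytesIO()  # start a fresh buffer for the next part

    def close(self):
        if self.buffer.tell():
            self._flush()  # the final part may be smaller than MIN_PART
        self.client.complete_multipart_upload(
            Bucket=self.bucket, Key=self.key, UploadId=self.upload_id,
            MultipartUpload={"Parts": self.parts},
        )
```

The simpler approach (uploading a finished file) needs none of this bookkeeping, which is why doing it first and layering streaming on later is attractive.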

@Gallaecio
Member

See also the workaround by @ogabrielsantos.

@jazzthief
Contributor

I'd like to have a go at the simpler approach, if that's alright.

@jazzthief
Contributor

@Gallaecio Hey, I did the "check library availability" part, and it passes the tests with the put_object method. Now I'm working on replacing it with boto3's upload_fileobj, which fails several store tests because boto3 internally tries to compare a MagicMock attribute to an int. I've tried to work around it, but it looks like the tests need to be altered to support the new method. Am I going in the right direction?

@Gallaecio
Member

> Am I going in the right direction?

It sounds like you are. But feel free to open a draft pull request, and I can have a look.

wRAR added a commit that referenced this issue Jun 13, 2023