Batch deliveries for long running crawlers #4250
Comments
This will need unique file names, right?
Indeed.
Is anyone working on this? I'd like to start contributing to this project by picking up this issue, if no one else is.
@alake16
Once you have something, you can open a draft PR on GitHub so we can follow along there.
@BroodingKangaroo how's it going for you? Blocked on anything at the moment?
@BroodingKangaroo, my idea is that it should produce partial deliveries.
Hi @ejulio! =) I'd also like to introduce myself. I have some questions regarding this issue.
@BroodingKangaroo, great!
Would it be possible for someone to write a tutorial on how to use this inside a very basic Scrapy crawler? I would love to get started with it, but to be honest I don't know where to start. Thanks!
@dipiana, it is not merged yet.
Oh okay, thanks. Any idea when I could use it? This is exactly what I was looking for! :)
@dipiana, I don't know exactly, but I will let you know in this thread when I finish.
@ejulio, thank you for your reviews on #4434; I will correct the comments, of course. =)
Hello, @ejulio!
@BroodingKangaroo
Summary
Add a new setting, `FEED_STORAGE_BATCH`, that delivers a file whenever `item_scraped_count` reaches a multiple of that number.

Motivation
For long-running jobs (say, consuming inputs from a work queue), we may want partial results instead of waiting for a long batch to finish.
Describe alternatives you've considered
Of course, we could stop and restart the spider every now and then.
However, a simpler approach is to keep it running as long as required while delivering partial results.
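A minimal sketch of the proposed batching rule, written in plain Python and independent of Scrapy. `BatchingFeedWriter`, its `item_scraped`/`close_spider` methods, the `deliver` callback and the `items-%(batch_id)05d.jsonl` name template are all hypothetical illustrations, not Scrapy's API; `batch_size` stands in for the proposed `FEED_STORAGE_BATCH` value. Each batch gets its own file name because, as raised in the comments, a fixed name would let later deliveries overwrite earlier ones.

```python
import json
from typing import Callable, Dict, List


class BatchingFeedWriter:
    """Hypothetical sketch: deliver a feed file every `batch_size` items.

    `batch_size` plays the role of the proposed FEED_STORAGE_BATCH setting.
    """

    def __init__(
        self,
        batch_size: int,
        deliver: Callable[[str, str], None],
        name_template: str = "items-%(batch_id)05d.jsonl",
    ) -> None:
        if batch_size <= 0:
            raise ValueError("batch_size must be a positive integer")
        self.batch_size = batch_size
        self.deliver = deliver  # hypothetical callback: (file_name, file_body)
        self.name_template = name_template
        self.item_scraped_count = 0
        self.batch_id = 0
        self._buffer: List[dict] = []

    def item_scraped(self, item: dict) -> None:
        """Buffer an item; deliver a file when the count hits a multiple of batch_size."""
        self._buffer.append(item)
        self.item_scraped_count += 1
        if self.item_scraped_count % self.batch_size == 0:
            self._flush()

    def close_spider(self) -> None:
        """Deliver whatever is left so the final partial batch is not lost."""
        if self._buffer:
            self._flush()

    def _flush(self) -> None:
        self.batch_id += 1
        # A unique name per batch; a fixed name would overwrite earlier deliveries.
        file_name = self.name_template % {"batch_id": self.batch_id}
        # Serialize the buffered items as JSON Lines.
        body = "".join(json.dumps(item) + "\n" for item in self._buffer)
        self.deliver(file_name, body)
        self._buffer = []


# Usage: collect "delivered" files in a dict instead of real storage.
delivered: Dict[str, str] = {}
writer = BatchingFeedWriter(batch_size=2, deliver=delivered.__setitem__)
for n in range(5):
    writer.item_scraped({"n": n})
writer.close_spider()
print(sorted(delivered))
# ['items-00001.jsonl', 'items-00002.jsonl', 'items-00003.jsonl']
```

Whether the last, incomplete batch should be delivered when the spider closes is a design choice; this sketch assumes it should, so no scraped items are dropped.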