Persist job data in Scrapyd #12

Open
pablohoffman opened this Issue Apr 26, 2013 · 6 comments

@pablohoffman
Member

Job data in Scrapyd is kept only in memory, so it is lost when Scrapyd is restarted. We should store job data (in SQLite or something similar) so that it persists across restarts. This is particularly important for completed jobs.

@pablohoffman pablohoffman referenced this issue in scrapy/scrapy Apr 26, 2013
Closed

Persist job data in Scrapyd #173

@kjagiello

+1, I'd love to see this implemented!

@pablohoffman
Member

Pull requests wanted!

@dtheodor

👍

@dtheodor

I need pointers to which parts of the code to look at: where the in-memory objects we want to persist live, and where they are retrieved and modified.

@Digenis
Collaborator
Digenis commented Aug 21, 2015

I would look at the poller and the runner.
This non-trivial update can flow more smoothly if it's optional at first.
E.g. in Scrapyd's configuration, one may specify a class that handles persistent storage.
(The runner is already set by the configuration; the poller is set indirectly from the app.)
When it's optional and configurable like this, anyone can contribute their own class,
e.g. for another database backend,
for others to appreciate, use, crash, debug, patch, and contribute back.
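
To make the idea concrete, here is a minimal sketch of what such an optional, configurable storage class could look like, using SQLite from the standard library. Everything named here is an assumption for illustration: `SqliteJobStorage`, `load_class`, the `finished_jobs` table and its columns are hypothetical and not part of Scrapyd's actual API or configuration.

```python
import sqlite3
from datetime import datetime
from importlib import import_module


def load_class(dotted_path):
    """Resolve a dotted path such as 'mypackage.storage.SqliteJobStorage'.

    Hypothetical helper: a config option could hold the dotted path,
    so other backends can be swapped in without touching the core.
    """
    module_name, _, class_name = dotted_path.rpartition(".")
    return getattr(import_module(module_name), class_name)


class SqliteJobStorage:
    """Hypothetical storage for finished jobs, backed by a SQLite file."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        # Assumed schema: one row per finished job.
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS finished_jobs ("
            "job_id TEXT PRIMARY KEY, project TEXT, spider TEXT, "
            "start_time TEXT, end_time TEXT)"
        )
        self.conn.commit()

    def add(self, job_id, project, spider, start_time, end_time):
        """Record a finished job; timestamps are stored as ISO 8601 text."""
        self.conn.execute(
            "INSERT OR REPLACE INTO finished_jobs VALUES (?, ?, ?, ?, ?)",
            (job_id, project, spider,
             start_time.isoformat(), end_time.isoformat()),
        )
        self.conn.commit()

    def finished(self, project):
        """Return all finished jobs of a project, oldest first."""
        cursor = self.conn.execute(
            "SELECT job_id, project, spider, start_time, end_time "
            "FROM finished_jobs WHERE project = ? ORDER BY end_time",
            (project,),
        )
        return cursor.fetchall()

    def close(self):
        self.conn.close()
```

With a file path instead of `:memory:`, a new `SqliteJobStorage` instance opened after a restart sees the jobs recorded before it. Another backend would only need to provide the same `add` and `finished` methods.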

@Digenis Digenis added the enhancement label Jan 21, 2016
@lepture
lepture commented Feb 3, 2016

Why not use redis?

@Digenis Digenis added this to the 1.3.0 milestone Jan 6, 2017