Job data in Scrapyd is only kept in memory, so it is lost when Scrapyd is restarted. We should store job data (in a SQLite database or similar) so that it persists across restarts. This is particularly important for completed jobs.
+1, I'd love to see this implemented!
Pull requests wanted!
Need pointers to which part of the code to look at for the in-memory objects we want to persist, where they are being retrieved and modified.
I would look at the poller and the runner.
This non-trivial update can land more smoothly if it's optional at first.
E.g., Scrapyd's configuration could let one specify a class that handles persistent storage.
(The runner is already set via the configuration, and the poller indirectly from the app.)
When it's optional and configurable like this, anyone can contribute their own class,
e.g. for another database backend,
for others to appreciate, use, crash, debug, patch, and contribute back.
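To make the idea concrete, here's a minimal sketch of what such a pluggable storage class might look like. The class name `SqliteJobStorage`, the table layout, and the method names are all hypothetical; an actual implementation would need to match whatever interface Scrapyd's launcher and website components expect for finished jobs.

```python
import sqlite3

class SqliteJobStorage:
    """Hypothetical persistent store for finished jobs (sketch, not Scrapyd's API)."""

    def __init__(self, database=":memory:"):
        # A real deployment would pass a file path from scrapyd.conf
        # so the data survives restarts; ":memory:" is just for demonstration.
        self.conn = sqlite3.connect(database)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS finished_jobs "
            "(id TEXT PRIMARY KEY, project TEXT, spider TEXT, "
            "start_time TEXT, end_time TEXT)"
        )

    def add(self, job_id, project, spider, start_time, end_time):
        # INSERT OR REPLACE keyed on job id, so re-adding after a
        # restart doesn't create duplicate rows.
        self.conn.execute(
            "INSERT OR REPLACE INTO finished_jobs VALUES (?, ?, ?, ?, ?)",
            (job_id, project, spider, start_time, end_time),
        )
        self.conn.commit()

    def list(self):
        return self.conn.execute(
            "SELECT id, project, spider, start_time, end_time "
            "FROM finished_jobs ORDER BY end_time"
        ).fetchall()
```

An alternative backend (Redis, Postgres, ...) would only need to implement the same small interface, which is exactly what makes the configurable-class approach easy to extend.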
Why not use redis?