Cannot save state when restarting scrapydweb #21
Actually, it's an issue with Scrapyd. I'll sort it out in the next release.
Yes, it would be awesome to support this feature!
I just implemented a snapshot mechanism for the Dashboard page, so you can still check out its last view in case the Scrapyd service is restarted.
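As a sketch of what such a snapshot can look like (the names and file location below are hypothetical, not scrapydweb's actual implementation):

```python
# Hypothetical sketch of a dashboard snapshot: periodically dump the
# job list to a JSON file so the last view survives a Scrapyd restart.
import json
import time

SNAPSHOT_PATH = 'dashboard_snapshot.json'  # hypothetical location

def save_snapshot(jobs):
    """Write the current dashboard job list to disk."""
    with open(SNAPSHOT_PATH, 'w') as f:
        json.dump({'saved_at': time.time(), 'jobs': jobs}, f)

def load_snapshot():
    """Return the last saved view, or None if no snapshot exists yet."""
    try:
        with open(SNAPSHOT_PATH) as f:
            return json.load(f)
    except FileNotFoundError:
        return None
```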
I mean the graphs that show the number of items stored per minute and the number of pages crawled per minute, plus the other graph showing the progression of the total number of crawled pages / stored items.
The stats and graphs of a job will be available as long as the JSON file generated by LogParser or the original logfile exists.
But why emphasize "after a ScrapydWeb restart"? Is there anything wrong with v1.1.0?
Well, indeed I launched a job. It finished, then I restarted both scrapydweb and scrapyd. So I guess scrapydweb no longer shows the finished job, and as a result I can no longer get its stats and graph. I imagine that if scrapydweb persists finished jobs (next release), I'll also be able to see the graph built in real time. Is that right?
Well, I've just noticed that I can see the graph of a job by going to the Files > Logs section; a nice column lets you see the graph for all log files, which is perfect for me! With a snapshot of the dashboard, it will be even better!
As I told you before: "You can check out the log of finished jobs in the Logs page for the time being" |
Also note that the JSON files generated by LogParser are removed by Scrapyd when it deletes the original logfiles.
v1.2.0: Persist job information in the database
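As a rough illustration of that approach, job records can be persisted with nothing more than the standard library's sqlite3; the schema below is hypothetical, not scrapydweb's actual one:

```python
# Illustrative only: persist finished-job info in SQLite so it survives
# restarts of both scrapydweb and Scrapyd. The schema is hypothetical.
import sqlite3

conn = sqlite3.connect('jobs.db')
conn.execute("""
    CREATE TABLE IF NOT EXISTS jobs (
        job_id  TEXT PRIMARY KEY,
        project TEXT,
        spider  TEXT,
        start   TEXT,
        finish  TEXT
    )
""")

def save_job(job_id, project, spider, start, finish):
    # INSERT OR REPLACE keeps the table idempotent across repeated polls.
    conn.execute(
        "INSERT OR REPLACE INTO jobs VALUES (?, ?, ?, ?, ?)",
        (job_id, project, spider, start, finish),
    )
    conn.commit()
```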
Hi, I think it would be best to make scrapyd more modular.
Are there any plans to add this feature in future releases? There are a lot of cases where it's nice to be able to resume a failed crawl right from where it stopped, so that already scheduled requests aren't lost, just like it is implemented in SpiderKeeper.
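For the crawl-level part of this (not losing already scheduled requests), Scrapy itself supports pausing and resuming a crawl through its JOBDIR setting, independent of scrapydweb; in a project's settings.py:

```python
# In settings.py: persist the scheduler queue and dupefilter to disk so
# a stopped crawl can resume where it left off. Use one directory per crawl.
JOBDIR = 'crawls/my_spider-run1'  # arbitrary path, one per crawl
```

The same can be set per run with `scrapy crawl <spider> -s JOBDIR=crawls/my_spider-run1`.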
@goshaQ |
@my8100 By the way, I just noticed that the Items section shows an error if scrapyd doesn't return any items, which is normal if the results are written to a database. It looks to me like the same reason the Jobs section keeps showing the red tip telling me to install LogParser to show the number of parsed items, even after I've installed logparser and launched it. Or am I doing something wrong? Sorry for the unrelated question.
Thanks, that's what I was looking for. But it appears that there is no […]
The reply is No Such Resource.
Restart logparser and post the full log. |
|
Check if SCRAPYD_LOGS_DIR/stats.json exists. |
There is a .json file, but it's named the same as the .log file.
> Check if SCRAPYD_LOGS_DIR/stats.json exists.

Did you see the comment below?
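For reference, a quick way to run that check (the logs path is an assumption; substitute wherever Scrapyd writes its logs):

```python
# Verify the LogParser output exists; the directory below is a placeholder
# for your actual SCRAPYD_LOGS_DIR.
from pathlib import Path

logs_dir = Path('/path/to/scrapyd/logs')  # substitute your SCRAPYD_LOGS_DIR
stats = logs_dir / 'stats.json'
print(stats, 'exists:', stats.exists())

# LogParser also writes a per-job .json named after each .log file:
for json_file in logs_dir.rglob('*.json'):
    print(json_file)
```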
I'm running scrapydweb in Docker.
I can start a job and then see some statistics; I also see the finished jobs. It's perfect!
However, when I restart my container, I lose this state. For example, I no longer see the finished jobs.
=> What data should I persist in my Docker container so that I can see everything when I restart it?
I've tried persisting /usr/local/lib/python3.6/site-packages/scrapydweb/data, but it doesn't seem to do the trick.
SpiderKeeper keeps all its state in a SpiderKeeper.db file, which is perfect for keeping state across container restarts.
Any idea how to achieve the same with scrapydweb?
Thanks again for your work!
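One way to find the directory to bind-mount, assuming the state lives under the package's data folder as the site-packages path above suggests:

```python
# Locate scrapydweb's data directory so it can be bind-mounted as a
# Docker volume; this assumes state is kept under the package's "data"
# folder, as the site-packages path above suggests.
import os
import scrapydweb

data_dir = os.path.join(os.path.dirname(scrapydweb.__file__), 'data')
print(data_dir)  # mount this path as a volume so it survives restarts
```

Note that mounting the directory only helps on versions that actually persist job information to disk (v1.2.0 and later, per the comment above); on v1.1.0 the jobs state was kept in memory, which would explain why mounting it didn't do the trick.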