
How to resume a crawl for later #506

Answered by ato
JenPho asked this question in Q&A

The primary way to stop a crawl and resume it later is to create a checkpoint (see the wiki page). It seems I overlooked including that wiki page in the operating guide; sorry about that, I'll update it. (Edit: done as of b328ded.)
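As a hedged sketch, the checkpoint-and-resume cycle might be driven from Python through Heritrix 3's REST interface. The sketch rests on several assumptions:

- The engine listens at `https://localhost:8443/engine`.
- The operator credentials are `admin`/`admin`.
- The server uses HTTP digest authentication with its default self-signed certificate.

The helper names `build_job_action` and `post_job_action` are hypothetical. Support for a `checkpoint` parameter on the `launch` action is also an assumption about your Heritrix version. On releases without it, you may need to select the checkpoint in the web UI instead.

```python
import ssl
import urllib.parse
import urllib.request

# Assumed (hypothetical) engine location; adjust for your installation.
ENGINE_URL = "https://localhost:8443/engine"


def build_job_action(job_name: str, action: str, **extra: str) -> tuple[str, bytes]:
    """Return the job URL and form-encoded POST body for a REST job action."""
    url = f"{ENGINE_URL}/job/{urllib.parse.quote(job_name)}"
    body = urllib.parse.urlencode({"action": action, **extra}).encode("ascii")
    return url, body


def post_job_action(job_name: str, action: str, user: str = "admin",
                    password: str = "admin", **extra: str) -> int:
    """POST an action (e.g. "checkpoint", "launch") and return the HTTP status."""
    url, body = build_job_action(job_name, action, **extra)

    # Assumption: the server uses a self-signed certificate, so verification
    # is disabled here. Do not do this against an untrusted host.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE

    # Assumption: the operator login uses HTTP digest authentication.
    passwords = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    passwords.add_password(None, ENGINE_URL, user, password)
    opener = urllib.request.build_opener(
        urllib.request.HTTPSHandler(context=ctx),
        urllib.request.HTTPDigestAuthHandler(passwords),
    )

    # The default redirect handler turns a 303 after POST into a GET,
    # similar to curl's --location.
    request = urllib.request.Request(url, data=body, method="POST")
    with opener.open(request) as response:
        return response.status
```

Under those assumptions, the cycle would go like this:

1. While the crawl runs, call `post_job_action("myjob", "checkpoint")`, where `myjob` is a placeholder job name.
2. Stop the job with the `terminate` and `teardown` actions.
3. Later, run `build`, then call `post_job_action("myjob", "launch", checkpoint="latest")` to resume from the newest checkpoint.

Keeping request construction in its own function lets the URL and body be checked without a running crawler.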

While the recovery log can be used in a pinch, I believe it's really intended as a fallback in case the crawler crashes or the crawl state becomes corrupted and no usable checkpoint is available. I'm not certain of the disadvantages of using the recovery log, but I'd guess that some crawl state might end up incorrect (perhaps the statistics?). Also, for a large crawl, replaying the recovery log could take much longer than loading a checkpoint would.

Answer selected by ato

This discussion was converted from issue #500 on September 30, 2022 00:43.