Continue where you left off feature is not working robustly #5

Open
AriaShishegaran opened this issue Nov 30, 2023 · 0 comments

I believe there's a problem when crawling massive documentation portals. When the crawler hits a condition it cannot handle and exits suddenly, the continuation task starts again from the beginning rather than resuming intelligently, so every previously crawled page is checked again before any new download proceeds. This is my constant experience with Stripe's documentation portal: every time, after around an hour, the crawler fails to move forward, and there is no option to resume exactly where you left off.
My suggestion is to switch the default behavior to continue where you left off without re-checking previous pages. I assume the current implementation also detects changes and applies them to the existing files, rather than sticking strictly to the current state of progress and trying to finish the crawl job, which is taking a massive amount of time.
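
To make the suggestion concrete, here is a minimal sketch of what "continue where you left off" could look like. It is not based on this project's actual code: `crawl`, `load_state`, `save_state`, the `fetch_links` callback, the `recheck` flag, and the JSON state file layout are all hypothetical names chosen for illustration. The idea is to persist both the set of finished pages and the queue of pending pages after every page, so that a restart picks up at the page that failed instead of re-checking everything.

```python
import json
import os
from collections import deque


def load_state(path):
    """Return (done, pending) from a previous run, or None if no state exists."""
    if not os.path.exists(path):
        return None
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    return set(data["done"]), deque(data["pending"])


def save_state(path, done, pending):
    """Write the state to a temp file, then swap it in, so a crash mid-write
    cannot leave a half-written state file behind."""
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        json.dump({"done": sorted(done), "pending": list(pending)}, f)
    os.replace(tmp, path)


def crawl(start_url, fetch_links, state_path, recheck=False):
    """Breadth-first crawl that resumes from state_path by default.

    fetch_links(url) is a hypothetical callback that downloads one page and
    returns the links found on it; it may raise on failure.
    recheck=True ignores saved state and starts over from start_url.
    """
    state = None if recheck else load_state(state_path)
    if state is None:
        done, pending = set(), deque([start_url])
    else:
        done, pending = state

    while pending:
        url = pending[0]          # peek: keep it queued until it succeeds
        links = fetch_links(url)  # if this raises, saved state still lists url
        done.add(url)
        pending.popleft()
        for link in links:
            # Linear scan of the queue is fine for a sketch; a real crawler
            # would track queued URLs in a set.
            if link not in done and link not in pending:
                pending.append(link)
        save_state(state_path, done, pending)
    return done
```

With something like this, the resume-without-checking behavior would be the default, and the existing "re-check everything for changes" behavior could stay available behind an explicit option (`recheck=True` in the sketch).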
