How would I crawl a single site with multiple pages in parallel? #24
Comments
Abot's PoliteWebCrawler alone can crawl multiple PAGES of a single site concurrently. AbotX's ParallelCrawlerEngine manages multiple instances of Abot's PoliteWebCrawler, effectively allowing you to crawl multiple SITES concurrently. The example shows how to get the content of a crawled page.
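To illustrate the single-site case, here is a minimal sketch of one PoliteWebCrawler fetching several pages of the same site concurrently and reading each page's content as it completes. It assumes the Abot 2.x package and namespaces (Abot2.Crawler, Abot2.Poco); older versions use different namespaces and a synchronous Crawl call. The URL, thread count, and page limit are placeholders, not values from this thread.

```csharp
using System;
using System.Threading.Tasks;
using Abot2.Crawler;
using Abot2.Poco;

public static class SingleSiteCrawl
{
    public static async Task Main()
    {
        var config = new CrawlConfiguration
        {
            // Number of pages of the same site fetched concurrently (placeholder value).
            MaxConcurrentThreads = 5,
            // Stop after this many pages (placeholder value).
            MaxPagesToCrawl = 50
        };

        var crawler = new PoliteWebCrawler(config);

        // Raised once per crawled page; handlers can run on several threads at once,
        // so any shared state touched here needs its own synchronization.
        crawler.PageCrawlCompleted += (sender, e) =>
        {
            var page = e.CrawledPage;
            var length = page.Content?.Text?.Length ?? 0;
            Console.WriteLine($"{page.Uri}: {length} characters");
        };

        // Placeholder start URL; the crawler follows links found from here.
        var result = await crawler.CrawlAsync(new Uri("https://example.com/"));
        Console.WriteLine(result.ErrorOccurred ? "Crawl ended with an error" : "Crawl completed");
    }
}
```

Under these assumptions, AbotX is only needed once several sites have to be crawled at the same time.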
Thank you for the info. Very helpful. One last clarification: if I have a website and only want to crawl specific pages, not the whole site, do I still have to make it crawl the entire site? How do I direct it to crawl, say, paged content? For example:
...and I do not want it to crawl
Thank you for your patience!
Hi,
Thanks for the product!
Apologies for the many questions.
How would I crawl a single site with multiple pages in parallel?
Do I need AbotX, or would Abot do?
Do I need to loop through the list of sites if I can only do 3 at a time for the free version?
Is it ideal to have this in a job that keeps track of runs?
Also, it doesn't say in which part of the code I get the crawled data. Is it in crawlEngine.SiteCrawlCompleted, after the lock(crawlCounts){...} statement? Example