Should it be possible to add "depth" in the data hash? #28
Comments
I'm assuming you mean minimum depth. One of the misconceptions about navigation is that there is only one way to reach a page. The depth of a page can differ depending on the route you take to get to it. Also, where is the homepage? Is it the page you started the crawl from, or the page with the shortest URL? If we took it to be the first page crawled and passed a depth number down with the crawl, the results would not be guaranteed to be accurate, because each page is only processed once: a page linked from the homepage (depth 1) that was actually crawled via a sub-page of the homepage would be given a depth of 2. It's something to think about. I suppose if you specified a page as the root and then, once the crawl had completed, processed all crawled pages for the shortest route (we have the data for that), that would give the most accurate results. But again, HTML navigation is not a tree structure; it's a node graph with multiple parents and interconnections.
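For illustration, the "process all crawled pages for the shortest route" idea could be sketched as a breadth-first search over the collected link data. This is a minimal sketch, not part of the gem: the `links` hash (each URL mapped to the URLs it links to) and the `minimum_depths` method are hypothetical names, and the shape of the data the crawler actually stores may differ.

```ruby
# Hypothetical post-crawl step: compute the minimum click depth of every
# page reachable from a chosen root, using breadth-first search.
# `links` is an assumed structure: { url => [urls it links to] }.
def minimum_depths(links, root)
  depths = { root => 0 }
  queue = [root]

  until queue.empty?
    url = queue.shift
    (links[url] || []).each do |target|
      # BFS reaches each page first via a shortest route, so the first
      # depth recorded for a page is its minimum depth.
      next if depths.key?(target)

      depths[target] = depths[url] + 1
      queue << target
    end
  end

  depths
end

# Made-up example graph: "/c" is reachable via "/a" -> "/a/deep" -> "/c"
# (3 clicks) and via "/b" -> "/c" (2 clicks).
links = {
  "/"       => ["/a", "/b"],
  "/a"      => ["/a/deep", "/b"],
  "/a/deep" => ["/c"],
  "/b"      => ["/c"],
  "/c"      => ["/"]
}

depths = minimum_depths(links, "/")
puts depths["/c"] # minimum depth via the shorter route
```

Because the search runs over the whole graph after the crawl, the result does not depend on the order in which pages happened to be processed.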
That's correct, and I agree that it would be inaccurate to report. Let's say we start from the seed URL and only want to go 2 pages deep within the navigation. Is that possible with CobWeb? This is certainly possible with the Anemone crawler, but it is an old gem now. I love the way CobWeb uses Sidekiq/Resque jobs, and would really like to be able to limit the crawl depth for the crawler. By the way, thanks again for the awesome gem. Really useful.
I agree on both points: this is a really cool gem 👍 and I would like to have a "max_depth" option. I totally understand that we're not dealing with tree data and that "depth" is relative, but it would still be useful. The nice thing it would give you is a chance to do a quick test of the "core" links from a page, following just a couple of levels without processing the entire site, so you can preview some results without waiting for the whole site to process.
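To make the request concrete, here is a rough sketch of what a depth-limited crawl could look like. It is not how the gem works and it uses none of its API: `crawl_with_max_depth`, the `fetch_links` callable, and the `site` hash standing in for real HTTP requests are all hypothetical. Depth here means "depth along the route the crawl discovered first", which, as noted above, is only an approximation when pages are not processed in strict breadth-first order.

```ruby
# Hypothetical depth-limited crawl. `fetch_links` is an assumed callable
# that returns the URLs linked from a page; a real crawler would fetch
# and parse the page here.
def crawl_with_max_depth(seed, max_depth, fetch_links)
  depths = { seed => 0 }
  queue = [seed]

  until queue.empty?
    url = queue.shift
    # Pages at the depth limit are recorded, but their links are not followed.
    next if depths[url] >= max_depth

    fetch_links.call(url).each do |target|
      next if depths.key?(target) # each page is only processed once

      depths[target] = depths[url] + 1
      queue << target
    end
  end

  depths
end

# Made-up site used in place of real requests.
site = {
  "/"      => ["/a"],
  "/a"     => ["/a/b"],
  "/a/b"   => ["/a/b/c"],
  "/a/b/c" => []
}
fetch_links = ->(url) { site.fetch(url, []) }

crawled = crawl_with_max_depth("/", 2, fetch_links)
puts crawled.keys.inspect # "/a/b/c" is never reached with max_depth 2
```

With a small `max_depth`, only the pages closest to the seed are visited, which is what makes the quick preview described above possible.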
Hello,
As far as I can see, the generated hash for each page doesn't include "depth" information, that is to say, how many clicks away from the homepage each page is.
Do you think it would be possible to add this option to the hash?
By the way, I really appreciate your gem. Good work, Stewart!
Thanks.