We found Abot a few days ago and have been trying its free version to see if it meets our needs.
Everything worked fine until we noticed that it crawls URLs that are disallowed in robots.txt.
After some debugging, we found that Abot binds the robots.txt rules to the URI scheme of the initial site URL. For example, for the site https://mysite.com the disallowed URLs are only respected for https; if there is a link to http://mysite.com/somepage, Abot ignores robots.txt and crawls it.
Assume we have the following robots.txt:
User-agent: *
Disallow: /somepage
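To illustrate the behavior we expect, here is a minimal sketch in Python that consults the robots.txt above and normalizes the URL scheme before checking, so rules fetched for https also apply to http links to the same host. The `is_allowed` helper and `mysite.com` are illustrative only, not part of Abot's API:

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser

# Parse the robots.txt shown above directly (no network access needed).
parser = RobotFileParser()
parser.parse([
    "User-agent: *",
    "Disallow: /somepage",
])

def is_allowed(url, canonical_scheme="https"):
    """Rewrite the URL to a canonical scheme before consulting
    robots.txt, so http://mysite.com/somepage and
    https://mysite.com/somepage are treated the same way."""
    parts = urlsplit(url)
    normalized = urlunsplit((canonical_scheme,) + tuple(parts[1:]))
    return parser.can_fetch("*", normalized)

print(is_allowed("https://mysite.com/somepage"))  # False
print(is_allowed("http://mysite.com/somepage"))   # False: same rules apply
print(is_allowed("http://mysite.com/other"))      # True: not disallowed
```

In other words, we would expect the crawler to key its cached robots.txt rules by host rather than by scheme plus host.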
Could you help us deal with this issue?
Thank you