
Robots.txt is not reloaded when URI scheme is changed (http/https) #17

Closed
askoutaris opened this issue Mar 21, 2018 · 1 comment

@askoutaris

Hello,

We found Abot a few days ago, and we are trying its free version to see whether it can meet our needs.

Everything worked fine until we noticed that it crawls URLs that are disallowed in robots.txt.

After some debugging, we found that it binds robots.txt to the URI scheme of the initial site URL. For example, for the site https://mysite.com, the disallow rules are applied only to https URLs; if there is a link to http://mysite.com/somepage, Abot will ignore robots.txt and crawl it.

Assume we have the following robots.txt:
User-agent: *
Disallow: /somepage

Could you help us deal with this issue?
Thank you
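The behaviour the report expects can be sketched as a robots.txt cache keyed by host only, so that http:// and https:// links to the same site share one set of Disallow rules. The class and method names below are hypothetical, not part of Abot's API (Abot is a C# library; this is a Python illustration using the standard-library `urllib.robotparser`):

```python
from urllib import robotparser
from urllib.parse import urlparse

class SchemeAgnosticRobots:
    """Hypothetical robots.txt store keyed by host, ignoring the URI scheme."""

    def __init__(self):
        self._parsers = {}  # host -> RobotFileParser

    def add_robots_txt(self, host, robots_txt):
        """Register robots.txt content for a host, whatever scheme it was fetched over."""
        rp = robotparser.RobotFileParser()
        rp.parse(robots_txt.splitlines())
        self._parsers[host.lower()] = rp

    def allowed(self, user_agent, url):
        """Check a URL against the rules for its host, regardless of scheme."""
        host = urlparse(url).netloc.lower()
        rp = self._parsers.get(host)
        if rp is None:
            return True  # no robots.txt known for this host
        return rp.can_fetch(user_agent, url)

robots = SchemeAgnosticRobots()
robots.add_robots_txt("mysite.com", "User-agent: *\nDisallow: /somepage")

# With a host-keyed cache, both scheme variants of the disallowed page are blocked:
print(robots.allowed("*", "https://mysite.com/somepage"))  # False
print(robots.allowed("*", "http://mysite.com/somepage"))   # False
print(robots.allowed("*", "http://mysite.com/other"))      # True
```

If the rules were instead cached under `scheme + host` (as the report describes), the http:// variant would miss the cache entry and be crawled as if no robots.txt existed.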

@sjdirect
Owner

Duplicate of the Abot thread sjdirect/abot#182
