Add robots.txt file for crawlers #42
LGTM
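The page contains an element like this; I'm not sure whether it has any effect.
Having this in the page does affect crawlers.
OK, then this needs to be adjusted as well. Or would changing just this meta tag be enough?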
… On May 16, 2019, at 13:59, yJunS ***@***.***> wrote:
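Having this in the page does affect crawlers.
NOINDEX directive: tells search engines not to index this page.
NOFOLLOW directive: tells search engines not to follow the links found on this page.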
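For illustration, here is a minimal Python sketch of how a crawler might read these two directives from a page. The sample HTML and the names `RobotsMetaParser` and `robots_directives` are hypothetical, not taken from this repository; only the standard-library `html.parser` module is used.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() != "robots":
            return
        # The content attribute is a comma-separated list of directives.
        for token in attr_map.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_directives(html):
    """Return the set of robots meta directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical page head, assumed to resemble the element discussed above.
sample = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
directives = robots_directives(sample)
may_index = "noindex" not in directives    # page may be indexed
may_follow = "nofollow" not in directives  # links on the page may be followed
```

With the sample above, both `may_index` and `may_follow` come out false, which is why the meta tag has to change if the page should appear in search results.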
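No. The robots.txt in the root directory applies to the whole site, while the element in the page applies only to that page. Crawlers generally check the root robots.txt first and only then fetch the page content.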
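The site-wide check described here can be sketched with Python's standard `urllib.robotparser`. The robots.txt rules, the `example.com` URLs, and the helper name `is_allowed` are assumptions made for the example, not the contents of the file added in this pull request.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; the real file in this PR may differ.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
"""


def build_parser(robots_txt):
    """Parse robots.txt text without making a network request."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser, url, user_agent="*"):
    """Site-wide check a crawler makes before fetching a page."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)
docs_ok = is_allowed(parser, "https://example.com/docs/")
admin_ok = is_allowed(parser, "https://example.com/admin/users")
```

Under these assumed rules `docs_ok` is true and `admin_ok` is false. A page that passes this check can still be kept out of the index by its own robots meta tag, since that tag is only seen after the page has been fetched.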
Understood. I've changed the meta tag as well.
LGTM
LGTM
No description provided.