nofollow doesn't work correctly when there are multiple values in rel attribute #1201

aldarund opened this issue May 1, 2015 · 0 comments


@aldarund aldarund commented May 1, 2015

According to the HTML spec, the rel attribute can hold multiple space-separated values.

But Scrapy (LxmlParserLinkExtractor and SgmlLinkExtractor, though the latter probably doesn't matter since it's deprecated) only checks whether rel is exactly equal to 'nofollow':

link = Link(url, _collect_string_content(el) or u'',
                nofollow=True if el.get('rel') == 'nofollow' else False)

So links that look like this are not handled correctly:

 <a href='' rel='external nofollow'>bla bla</a>
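
A possible fix, as a minimal sketch: split the rel value on whitespace and test whether nofollow is one of the tokens, instead of comparing the whole string. The helper name `has_nofollow` is hypothetical (not an existing Scrapy function), and lowercasing the value assumes rel keywords should be matched case-insensitively.

```python
def has_nofollow(rel):
    """Return True if 'nofollow' is one of the whitespace-separated rel tokens.

    `rel` is the raw attribute value, e.g. the result of el.get('rel'),
    which is None when the attribute is missing.
    Hypothetical helper for illustration only.
    """
    if rel is None:
        return False
    # str.split() with no arguments splits on any run of whitespace,
    # so 'external  nofollow' and 'nofollow\texternal' both work.
    return 'nofollow' in rel.lower().split()


if __name__ == '__main__':
    for value in (None, '', 'nofollow', 'external nofollow', 'external'):
        print(repr(value), has_nofollow(value))
```

The extractor could then pass `nofollow=has_nofollow(el.get('rel'))` when building the Link, rather than using the strict equality check.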

This isn't hypothetical: it comes from real-world sites where I found Scrapy following a nofollow link. For example, at this site:
