nofollow doesn't work correctly when there are multiple values in the rel attribute #1201
According to the spec, rel can hold multiple space-separated values: http://www.w3.org/TR/html401/struct/links.html#adef-rel
But Scrapy's LxmlParserLinkExtractor (and SgmlLinkExtractor, though that one probably doesn't matter since it's deprecated) only marks a link as nofollow when rel is exactly 'nofollow':
link = Link(url, _collect_string_content(el) or u'', nofollow=True if el.get('rel') == 'nofollow' else False)
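To show why this misses multi-valued rel attributes, here is a minimal sketch comparing the exact-match check with a token-based one. Both function names are hypothetical and not part of Scrapy; the token version assumes rel is split on whitespace and compared case-insensitively, as HTML 4.01 describes for link types.

```python
def nofollow_exact(rel):
    # Mirrors the current check: only a rel value that is exactly 'nofollow' matches
    return rel == 'nofollow'


def nofollow_tokens(rel):
    # Hypothetical fix: treat rel as a whitespace-separated list of link types
    # and look for a 'nofollow' token, ignoring case
    if not rel:
        return False
    return 'nofollow' in (token.lower() for token in rel.split())


for rel in ['nofollow', 'external nofollow', 'NoFollow', 'external', None]:
    print(repr(rel), nofollow_exact(rel), nofollow_tokens(rel))
```

With rel='external nofollow' the exact check returns False while the token check returns True.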
So links like the following are not handled correctly:
<a href='http://blablabla.com/' rel='external nofollow'>bla bla</a>
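As an illustration only, this stdlib-based sketch parses the snippet above and flags the link as nofollow by checking for a 'nofollow' token in rel. It uses html.parser rather than lxml, so it is not Scrapy's extractor; the class name NofollowLinkParser is hypothetical.

```python
from html.parser import HTMLParser


class NofollowLinkParser(HTMLParser):
    """Collect (href, nofollow) pairs from <a> tags. Hypothetical, for illustration."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            return
        attrs = dict(attrs)
        href = attrs.get('href')
        if href is None:
            return
        # rel may hold several space-separated link types, e.g. 'external nofollow';
        # a bare rel attribute arrives as None, hence the fallback to ''
        rel_tokens = (attrs.get('rel') or '').lower().split()
        self.links.append((href, 'nofollow' in rel_tokens))


html = "<a href='http://blablabla.com/' rel='external nofollow'>bla bla</a>"
parser = NofollowLinkParser()
parser.feed(html)
parser.close()
print(parser.links)  # [('http://blablabla.com/', True)]
```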
This isn't hypothetical: I've run into real-world sites where Scrapy follows nofollow links, for example www.bruceclay.com/blog/secondary-keywords/