RobotsTxtMiddleware doesn't support wildcards in Disallow rules #754

@mattfullerton

Description

As far as I understand, this is because Python's robotparser module, which the middleware relies on, doesn't support wildcards either. There is an alternative drop-in replacement module, Robotexclusionrulesparser:

http://nikitathespider.com/python/rerp/

This line would need to be changed:
https://github.com/scrapy/scrapy/blob/master/scrapy/contrib/downloadermiddleware/robotstxt.py#L7
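
To illustrate what wildcard support would mean here, below is a minimal sketch, not the middleware's or either parser's actual code. The helper `rule_matches`, the sample robots.txt and the example URL are all hypothetical. It assumes the common convention that `*` matches any run of characters and a trailing `$` anchors the end of the path. It also asks the stdlib parser (named `urllib.robotparser` on Python 3) about the same rule, so the two results can be compared when run locally.

```python
import re
from urllib import robotparser  # Python 3 name of the stdlib robotparser module

# Hypothetical robots.txt using a wildcard Disallow rule.
ROBOTS_TXT = """\
User-agent: *
Disallow: /*.pdf$
"""


def rule_matches(pattern: str, path: str) -> bool:
    """Hypothetical helper: match a robots.txt path pattern with wildcards.

    Assumes `*` matches any run of characters and a trailing `$`
    anchors the end of the path; otherwise the rule is a prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with `.*`.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match anchors at the start, which gives prefix semantics.
    return re.match(regex, path) is not None


# Ask the stdlib parser about the same rule (result printed, not assumed).
stdlib_parser = robotparser.RobotFileParser()
stdlib_parser.parse(ROBOTS_TXT.splitlines())
stdlib_allows = stdlib_parser.can_fetch("*", "http://example.com/docs/report.pdf")

wildcard_blocks = rule_matches("/*.pdf$", "/docs/report.pdf")
print("stdlib allows:", stdlib_allows, "| wildcard rule matches:", wildcard_blocks)
```

If the stdlib parser reports the URL as allowed while the wildcard rule matches, that is the behaviour described in this issue.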
