Feature: Ability to filter extracted links by tag's text value #3622
This is a very simple feature that I had to develop in my Scrapy-based project because I found no acceptable built-in way to do it. I am now offering to contribute it to the Scrapy codebase:
In FilteringLinkExtractor, you can filter links whose URL (the value of the href attribute) matches a given regex, which is really helpful. However, it's not always sufficient. For instance, I once wanted to crawl a website where all URLs look the same (a random UUID), but I only wanted to follow some of them: the ones with a particular keyword in the text of the tag. Like this:
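A minimal sketch of the idea, assuming a plain HTML page with UUID-like hrefs. It is not Scrapy code and not the example from my project: it uses only Python's standard library, and the names `AnchorCollector`, `extract_links`, `allow`, and `allow_text` are hypothetical, chosen for illustration only.

```python
import re
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect (href, text) pairs for every <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (href, text) pairs
        self._href = None    # href of the <a> currently open, if any
        self._chunks = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._chunks = []

    def handle_data(self, data):
        # Only keep text that appears inside an open <a href=...> tag.
        if self._href is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._chunks).strip()))
            self._href = None


def extract_links(html, allow=None, allow_text=None):
    """Return (href, text) pairs whose URL and text match the given regexes.

    `allow` filters on the URL; `allow_text` (hypothetical name) filters
    on the link text. A filter set to None accepts everything.
    """
    parser = AnchorCollector()
    parser.feed(html)
    parser.close()
    url_re = re.compile(allow) if allow else None
    text_re = re.compile(allow_text) if allow_text else None
    return [
        (href, text)
        for href, text in parser.links
        if (url_re is None or url_re.search(href))
        and (text_re is None or text_re.search(text))
    ]


# Made-up sample page: the URLs are indistinguishable, the texts are not.
html = """
<a href="/item/3f2b9c1e">Weekly report: sales</a>
<a href="/item/a81d07f4">Contact us</a>
<a href="/item/c5e6d2b0">Weekly report: traffic</a>
"""

print(extract_links(html, allow_text=r"Weekly report"))
```

Here a URL regex alone cannot separate the two report links from the contact link, while a regex on the link text can. In Scrapy itself, the equivalent check would presumably sit next to the existing URL filtering in the link extractor.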
So what do you think? Would it be a useful addition to the link extractor? Or did I miss an existing way to do what I wanted in the first place?