Currently, the offsite middleware reads the allowed domains from the spider's attribute when the spider is opened, and uses that to decide whether a request should be followed or not.
I have a use case where I make some initial requests and then need to decide which domains to crawl. So ideally I'd make the start requests and set allowed_domains after that.
Does it make sense to add some way to add allowed domains dynamically? E.g. I could call something like this in the spider:
self.add_allowed_domains('foo.com')
and after making this call the spider would follow foo.com.
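To make the request concrete, here is a minimal, stdlib-only sketch of what such a dynamic filter could look like. This is not Scrapy's actual `OffsiteMiddleware`; the class name, the `add_allowed_domains` method, and `should_follow` are all hypothetical, though the host regex mirrors the spirit of how the real middleware matches domains and their subdomains.

```python
import re
from urllib.parse import urlparse

class DynamicOffsiteFilter:
    """Hypothetical sketch: an offsite filter whose allowed-domain
    list can grow at runtime, rebuilding its host regex on each add."""

    def __init__(self, allowed_domains=()):
        self.allowed_domains = list(allowed_domains)
        self._compile()

    def _compile(self):
        # No allowed domains means everything is followed,
        # matching the default behaviour when allowed_domains is unset.
        if not self.allowed_domains:
            self.host_regex = re.compile('')
            return
        # Match the domain itself or any subdomain of it.
        domains = '|'.join(re.escape(d) for d in self.allowed_domains)
        self.host_regex = re.compile(rf'^(.*\.)?({domains})$')

    def add_allowed_domains(self, *domains):
        # The hypothetical API from the issue: extend and rebuild.
        self.allowed_domains.extend(domains)
        self._compile()

    def should_follow(self, url):
        host = urlparse(url).hostname or ''
        return bool(self.host_regex.search(host))

f = DynamicOffsiteFilter()
f.add_allowed_domains('foo.com')
print(f.should_follow('http://sub.foo.com/page'))  # True
print(f.should_follow('http://bar.com/'))          # False
```

The main cost of this design is that the compiled regex must be rebuilt on every add; since adds are expected to be rare (once, after the initial requests), that seems acceptable.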
I wonder if we could use Request.meta, e.g. allow_domain=True, and have the middleware pop that key and extend the allowed domains based on the domain of the request URL.
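The meta-based idea above could be sketched as follows. This is an illustration only, stdlib-only rather than a real Scrapy middleware, and `allow_domain` is the hypothetical meta key proposed in the comment: when a request carries it, the filter pops it and adds that request's host to the allowed set before deciding.

```python
from urllib.parse import urlparse

class MetaAwareOffsiteFilter:
    """Hypothetical sketch of the Request.meta suggestion: a request
    flagged with meta['allow_domain'] = True extends the allowed set
    with its own domain, then is filtered like any other request."""

    def __init__(self, allowed_domains=()):
        self.allowed = set(allowed_domains)

    def process_request(self, url, meta):
        host = urlparse(url).hostname or ''
        # Pop the key so it does not leak into downstream components.
        if meta.pop('allow_domain', False):
            self.allowed.add(host)
        # Follow if the host is an allowed domain or a subdomain of one.
        return any(host == d or host.endswith('.' + d) for d in self.allowed)

f = MetaAwareOffsiteFilter()
print(f.process_request('http://foo.com/start', {'allow_domain': True}))  # True
print(f.process_request('http://sub.foo.com/x', {}))                      # True
print(f.process_request('http://bar.com/', {}))                           # False
```

One nicety of this approach over a spider-side `add_allowed_domains` call is that the spider never has to reach into the middleware: the permission travels with the request that discovers the new domain.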
Referenced code: scrapy/scrapy/spidermiddlewares/offsite.py, line 58 at commit 129421c.