[offsite middleware] allow to dynamically add new entries to allowed domains #3257

pawelmhm · 2018-05-11T11:54:01Z

Currently, the offsite middleware reads the allowed domains from the spider attribute when the spider is opened and uses them to decide whether a request should be followed.

scrapy/scrapy/spidermiddlewares/offsite.py

Line 58 in 129421c

def spider_opened(self, spider):

I have a use case where I make some initial requests and only then can decide which domains to crawl. So ideally I'd make the start_requests requests and set allowed_domains after that.

Does it make sense to add some way to add allowed domains dynamically? E.g. I could call something like this in the spider:

 self.add_allowed_domains('http://foo.com')

and after making this call the spider would no longer filter out requests to foo.com.
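
To make the idea concrete, here is a minimal stdlib-only sketch of what "adding a domain at runtime" could mean: keep the allowed domains in a set and rebuild the host regex whenever one is added. The class `DynamicDomainFilter` and its method names are hypothetical and are not part of Scrapy; the regex shape and the "empty list allows everything" behaviour are assumptions modelled on how the offsite check appears to work.

```python
import re
from urllib.parse import urlparse


class DynamicDomainFilter:
    """Hypothetical stand-in for the offsite check; not Scrapy's actual class."""

    def __init__(self, allowed_domains=()):
        self.allowed_domains = {d.lower() for d in allowed_domains}
        self._rebuild()

    def _rebuild(self):
        # Assumption: an empty allow-list means nothing is filtered.
        if not self.allowed_domains:
            self.host_regex = re.compile("")
            return
        domains = "|".join(re.escape(d) for d in sorted(self.allowed_domains))
        # Match the domain itself or any of its subdomains.
        self.host_regex = re.compile(rf"^(.*\.)?({domains})$")

    def add_allowed_domain(self, domain_or_url):
        # Accept a full URL such as 'http://foo.com' or a bare domain.
        host = urlparse(domain_or_url).hostname or domain_or_url
        self.allowed_domains.add(host.lower())
        self._rebuild()

    def should_follow(self, url):
        host = urlparse(url).hostname or ""
        return bool(self.host_regex.search(host))


domain_filter = DynamicDomainFilter(["example.com"])
before = domain_filter.should_follow("http://foo.com/page")
domain_filter.add_allowed_domain("http://foo.com")
after = domain_filter.should_follow("http://foo.com/page")
print(before, after)
```

The point of the sketch is only that the compiled regex has to be refreshed after each addition; computing it once when the spider opens is what makes later changes to allowed_domains invisible.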

Gallaecio · 2024-02-15T07:40:00Z

Not sure about adding a method to Spider, but we could add it to the middleware, which is now easier to reach with

scrapy/scrapy/crawler.py

Line 215 in 1c9d308

def get_spider_middleware(self, cls):

Gallaecio · 2024-02-15T12:34:35Z

I wonder if we could use Request.meta, e.g. allow_domain=True, and have the middleware pop that key and extend the allowed domains based on the domain of the request URL.
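
A rough stdlib-only sketch of that meta-based idea, to show the intended flow: the filter pops the `allow_domain` key from the request's meta dict and, if it was set, adds the request URL's host to the allowed domains before checking. The class `MetaAwareOffsiteFilter` and its methods are hypothetical illustrations, not existing Scrapy code, and a plain dict stands in for Request.meta.

```python
from urllib.parse import urlparse


class MetaAwareOffsiteFilter:
    """Hypothetical model of the proposal; not Scrapy's actual middleware."""

    def __init__(self, allowed_domains=()):
        self.allowed_domains = {d.lower() for d in allowed_domains}

    def _is_allowed(self, host):
        # Assumption: an empty allow-list means nothing is filtered.
        if not self.allowed_domains:
            return True
        return any(
            host == domain or host.endswith("." + domain)
            for domain in self.allowed_domains
        )

    def should_follow(self, url, meta):
        host = urlparse(url).hostname or ""
        # Pop the flag so it does not propagate to later requests.
        if meta.pop("allow_domain", False) and host:
            self.allowed_domains.add(host)
        return self._is_allowed(host)


offsite = MetaAwareOffsiteFilter(["example.com"])
seed_meta = {"allow_domain": True}
print(offsite.should_follow("http://foo.com/start", seed_meta))  # flagged request
print(offsite.should_follow("http://foo.com/next", {}))          # same domain, no flag
print(offsite.should_follow("http://bar.com/", {}))              # never allowed
```

One design question this leaves open: if allowed_domains starts out empty, the first flagged request turns an unfiltered crawl into a filtered one, which may or may not be what the user wants.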

pawelmhm added enhancement discuss and removed enhancement labels May 11, 2018

Gallaecio added the enhancement label Aug 19, 2019

Gallaecio mentioned this issue Feb 23, 2024

Add a seed URL parameter zytedata/zyte-spider-templates#41

Merged
