[offsite middleware] allow to dynamically add new entries to allowed domains #3257

Open
pawelmhm opened this issue May 11, 2018 · 2 comments

@pawelmhm
Contributor

pawelmhm commented May 11, 2018

Currently, the offsite middleware reads the allowed domains from the spider's allowed_domains attribute when the spider is opened, and uses them to decide whether a request should be followed.

def spider_opened(self, spider):
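To illustrate the behaviour described above, here is a minimal stdlib-only sketch, not Scrapy's actual code. `OffsiteFilterSketch`, `DemoSpider` and `should_follow` are hypothetical names, and the regex shape is an assumption about how such a filter could be built. It shows why changing `allowed_domains` after the spider has opened has no effect: the domains are compiled into a pattern once.

```python
import re
from urllib.parse import urlparse


class OffsiteFilterSketch:
    """Hypothetical stand-in for the offsite middleware (not Scrapy's class)."""

    def spider_opened(self, spider):
        # The allowed domains are read once and baked into a compiled regex.
        domains = getattr(spider, "allowed_domains", None) or []
        if domains:
            pattern = r"^(.*\.)?(%s)$" % "|".join(re.escape(d) for d in domains)
        else:
            pattern = ""  # no restriction: every host matches
        self.host_regex = re.compile(pattern)

    def should_follow(self, url):
        host = urlparse(url).hostname or ""
        return bool(self.host_regex.search(host))


class DemoSpider:
    allowed_domains = ["example.com"]


spider = DemoSpider()
offsite = OffsiteFilterSketch()
offsite.spider_opened(spider)

# Changing the attribute after the spider has opened does not touch the regex.
spider.allowed_domains.append("foo.com")
followed_before = offsite.should_follow("http://foo.com/page")

# Only re-reading the attribute picks the new domain up.
offsite.spider_opened(spider)
followed_after = offsite.should_follow("http://foo.com/page")
```

In this sketch `followed_before` is false and `followed_after` is true, which is the gap the issue is about.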

I have a use case where I make some initial requests and only then decide which domains to crawl. Ideally, I'd issue those requests from start_requests and set allowed_domains afterwards.

Does it make sense to add a way to extend the allowed domains dynamically? For example, I could call something like this in the spider:

 self.add_allowed_domains('http://foo.com')

and after making this call the spider would be allowed to follow links to foo.com.

@Gallaecio
Member

Not sure about adding a method to Spider, but we could add it to the middleware, which is now easier to reach with

def get_spider_middleware(self, cls):
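A rough sketch of what a method on the middleware could look like, under stated assumptions: `DynamicOffsiteSketch`, `add_allowed_domains` and `should_follow` are hypothetical names, not existing Scrapy APIs. How the spider gets hold of the middleware instance is left out. The method accepts either a bare domain or a full URL, since the example in the issue passes a URL.

```python
import re
from urllib.parse import urlparse


class DynamicOffsiteSketch:
    """Hypothetical middleware-like object; not part of Scrapy."""

    def __init__(self, allowed_domains=()):
        self.allowed_domains = set()
        self.add_allowed_domains(*allowed_domains)

    def add_allowed_domains(self, *values):
        for value in values:
            # Accept a bare domain ("foo.com") or a full URL ("http://foo.com").
            host = urlparse(value).hostname if "://" in value else value
            if host:
                self.allowed_domains.add(host.lower())
        self._rebuild()

    def _rebuild(self):
        # Recompile the host pattern every time the set of domains changes.
        if self.allowed_domains:
            alternatives = "|".join(
                re.escape(d) for d in sorted(self.allowed_domains)
            )
            self.host_regex = re.compile(r"^(.*\.)?(%s)$" % alternatives)
        else:
            self.host_regex = re.compile("")  # no restriction

    def should_follow(self, url):
        host = urlparse(url).hostname or ""
        return bool(self.host_regex.search(host))


offsite = DynamicOffsiteSketch(["example.com"])
blocked_first = offsite.should_follow("http://foo.com/")

offsite.add_allowed_domains("http://foo.com")
allowed_later = offsite.should_follow("http://www.foo.com/page")
```

Rebuilding the pattern on every addition keeps the per-request check cheap, at the cost of a recompile whenever a domain is added.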

@Gallaecio
Member

I wonder if we could use Request.meta, e.g. allow_domain=True, and have the middleware pop that key and extend the allowed domains based on the domain of the request URL.
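The meta-based idea could be sketched roughly as follows. This is an illustration only: `RequestSketch` is a minimal stand-in for a Scrapy request, and `MetaOffsiteSketch` and `should_follow` are hypothetical names. Only the `allow_domain` key comes from the suggestion above.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class RequestSketch:
    """Minimal stand-in for a request: just a URL and a meta dict."""
    url: str
    meta: dict = field(default_factory=dict)


class MetaOffsiteSketch:
    """Hypothetical filter; an empty domain set blocks everything here."""

    def __init__(self, allowed_domains):
        self.allowed_domains = set(allowed_domains)

    def _is_allowed(self, host):
        # A host passes if it equals an allowed domain or is a subdomain of one.
        return any(
            host == d or host.endswith("." + d) for d in self.allowed_domains
        )

    def should_follow(self, request):
        host = urlparse(request.url).hostname or ""
        # Pop the flag so it is not carried along any further.
        if request.meta.pop("allow_domain", False) and host:
            self.allowed_domains.add(host)
        return self._is_allowed(host)


offsite = MetaOffsiteSketch(["example.com"])
plain = RequestSketch("http://foo.com/a")
flagged = RequestSketch("http://foo.com/b", meta={"allow_domain": True})

results = [
    offsite.should_follow(plain),    # foo.com is not allowed yet
    offsite.should_follow(flagged),  # the flag extends the allowed domains
    offsite.should_follow(plain),    # the same URL now passes
]
```

One consequence of this design is that the spider never needs a reference to the middleware; the request itself carries the instruction.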
