Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Easy to break installing the custom reactor #4401

Closed
wRAR opened this issue Mar 4, 2020 · 12 comments · Fixed by #4406
Closed

Easy to break installing the custom reactor #4401

wRAR opened this issue Mar 4, 2020 · 12 comments · Fixed by #4406
Milestone

Comments

@wRAR
Copy link
Contributor

wRAR commented Mar 4, 2020

I created a spider that imports scrapy.mail and got "wrong reactor installed" after setting TWISTED_REACTOR. This is because scrapy.mail imports twisted.internet.reactor at the top level and because scrapy crawl imports all spider modules.

So I think we should kill all top-level twisted.internet.reactor imports (it was partially done when adding the feature, so that importing CrawlerProcess doesn't install the reactor) and add documentation that the user code shouldn't do similar top-level imports.

@adityaa30
Copy link
Contributor

adityaa30 commented Mar 4, 2020

I am working on this issue.

@elacuesta
Copy link
Member

elacuesta commented Mar 4, 2020

True, this is what happened to me in scrapinghub/testspiders#10, I didn't realize there were still conflicting imports within Scrapy itself.

@adityaa30
Copy link
Contributor

adityaa30 commented Mar 5, 2020

@wRAR I have tried a lot to get "wrong reactor installed" error but cannot get it. Can you provide an example?

@wRAR
Copy link
Contributor Author

wRAR commented Mar 5, 2020

@adityaa30 set TWISTED_REACTOR = 'twisted.internet.asyncioreactor.AsyncioSelectorReactor' in settings.py and add import scrapy.mail in your spider.

@adityaa30
Copy link
Contributor

adityaa30 commented Mar 5, 2020

@wRAR I have found out that, even if don't do import scrapy.mail and execute scrapy crawl quotes with TWISTED_REACTOR = 'twisted.internet.asyncioreactor.AsyncioSelectorReactor' set in my settings.py. I still get the error:

Screenshot from 2020-03-06 04-02-38

I feel this is because twisted.internet.reactor does:

from __future__ import division, absolute_import
import sys
del sys.modules['twisted.internet.reactor']
from twisted.internet import default
default.install()

I want to know If I am going in correct direction?

@wRAR
Copy link
Contributor Author

wRAR commented Mar 5, 2020

Then it looks like some other code imports twisted.internet.reactor in your test.

@adityaa30
Copy link
Contributor

adityaa30 commented Mar 5, 2020

@wRAR Okay thanks! I think I have a good idea now what the exact issue is. I will soon send a pr fixing this issue. Also, shall I add in documentation that the user code shouldn't do similar top-level imports in the same pr? If yes, where in docs should this be added?

@wRAR
Copy link
Contributor Author

wRAR commented Mar 5, 2020

The docs should say what can happen, why, and how to prevent it. It's also not just about user code but also about 3rd part libraries that are imported in user code. I think a good place for this is in https://docs.scrapy.org/en/latest/topics/settings.html#std:setting-TWISTED_REACTOR

@adityaa30
Copy link
Contributor

adityaa30 commented Mar 5, 2020

Okay, working on it.

@adityaa30
Copy link
Contributor

adityaa30 commented Mar 6, 2020

@wRAR I have sent a pr #4406 fixing this issue.

@nyov
Copy link
Contributor

nyov commented Mar 7, 2020

Should/could we perhaps wrap and replace the twisted.internet.reactor function with a custom lazy-loading reactor-loading function instead? I'm not sure how feasible that is, but it seems there are quite a few twisted.internet.reactor imports still necessary, and maybe not all of them can always be moved around into a method call.

@adityaa30
Copy link
Contributor

adityaa30 commented Mar 10, 2020

@nyov I think this is possible. A custom class which takes a library/package name as string with customized attribute accessing. We can override __getattr__ method to check if the target library/package is loaded. If it is not loaded, we will use importlib.import_module to load the library/package and get the attribute we need.

For example:

import importlib

class LazyLoader:
    def __init__(self, lib_name):
        self.lib_name = lib_name
        self._mod = None

    def __getattr__(self, name):
        if self._mod is None:
            self._mod = importlib.import_module(self.lib_name)
        return getattr(self._mod, name)

Now, LazyLoader can be used to lazy-load twisted.internet.reactor module by declaring at top level:

reactor = LazyLoader('twisted.internet.reactor')

Upd: @wRAR this may be a good idea as we can declare reactor at top level.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants