Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] Add basic extension loader. #4665

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

toofar
Copy link
Member

@toofar toofar commented Mar 21, 2019

I wanted to see if I could move all my config.py snippets into extensions. I could, because I am not doing any configurating in config.py, just registering commands. This works (for me) but it could use some more attention, particularly around error handling. I don't know how to best handle probably non-fatal errors at init time and I would like to be able to proceed if we run across them. I opened this PR to get some discussion on what we want out of this.

This implementation is similar to _walk_normal() but since the extensions aren't in a properly importable package we have to actually use the finder returned by walk_packages() to load the source files. That means the module is actually imported at that point and when _load_component() uses importlib it finds the module already loaded and everything works.

I wanted to keep this PR small but there are other features that could be added to it I bring up below.

Error handling

I assume if we fail to load a third party extension we should try to ignore it and continue on. That currently isn't the case, even the _on_walk_error() method that I am re-using raises an error which is never handled. I am not sure how to best proceed though, just log an error?

  • when os.mkdir() fails, because we have seen that before, ✓
  • and when importing the extensions, because I assume we don't want third party extensions to crash the browser when their dependancies aren't met or something, ✓
  • and when running the module init hooks for the same reason. ✓
  • and when running the interceptors? There is no log set up there yet... and it is handled by utils.prevent_exceptions

Experimental feature warning thing ✓

Shall I just log an all caps thing? Or I could make it only be active if the --debug flag is passed. It'll be easily missed if you have --debug on though. I supposed I could also just not autocreate the extensions folder.

Future work

Support loading folders ✓

If I remove the if ispkg: continue check the finder line actually succeeds in loading the directory (if it has a __init__.py) but after we yield it drops back into the walk_packages implemention after its own yield where it calls __import__(name) which fails because it presumably tries to load the parent package which isn't real. It even fails with the current implementation if there is a package folder in the extensions folder ever though we aren't importing it.
It may be as simple as ignoring errors for packages, we could check in sys.modules if the package itself got imported right.

It was failing with "No module named 'extensions'" because I had set the import prefix to "extensions.". Once I changed it to something properly namespaced under a real module (qutebrowser.extensions.third_party.) it just works.

Having a non-existant module (like third_party) in the (fake) module path means you cannot import the extension modules (which I don't see a reason to support). You can still get them from sys.modules though (and importlib). Even making that a package in the source tree doesn't let you import them (it seems to look for child modules as attributes of their parents rather than just looking for the whole heirarchy in sys.modules). I think I'll just leave it out and not support importing third party modules. They are in _module_infos anyway. Actually you can still do from qutebrowser.extensions.third_party import whatever, cool.

Unloading/Re-loading modules

The config-source command has been useful for developing external things so far. Both when developing and when adding things developed in a separate basedir into my main browser instance when I have got something stable without having to restart the whole instance. Currently loaded modules are stored in the _module_infos global list. Since they are modules though they do have unique names so we could support reloading (we might have to delete them out of sys.modules too though). Reloading modules does have the obvious possible problem of the outgoing copy of those modules having done things like register listeners. Although the config change filter in the extension API does avoid this so if continue with that pattern it would be less of a problem. We could also add a hook.unload() decorator where the extension could clean up.
The alternative is to just load new extensions although I would prefer to be able to reload them if it doesn't turn out to be too fraught.

Put behind a config item (added 2024-02-06)

And surface that this is turned on in :version and :report.

Consider how we could make versioning a part of this (added 2024-02-06)

Ideally extensions would declare a version they were compatible with and if they were running against a too new qutebrowser version they would complain/fail/require extra work to enable.


This change is Reviewable

@jgkamat
Copy link
Member

jgkamat commented Mar 21, 2019 via email

if ispkg:
continue
if name not in sys.modules:
_module = finder.find_module(name).load_module(name)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this line do anything? If so, maybe document it, because it does not look like it does anything...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it imports the module and saves it into sys.modules as a side effect. I will add a comment and remove the left hand side.

@toofar
Copy link
Member Author

toofar commented Mar 21, 2019

I'm guessing right now, it only loads single files

Yep. I think we should get bigger ones working too but I would rather do that as a separate PR. For jmatrix for now I put the import time init stuff into an init() method and added an extension file that just imported the jmatrix qutebrowser integration and had a @hook.init() decorated method that called jmatrix's one (because hook.init doesn't work outside of the top level extension modules).

couldn't we use importlib.reload

Ah, I was looking for that. It looks like it throws an error if the module's parent isn't in sys.modules, I may have to actually make the extensions folder look like a real package. I feel like that will make it more fragile with respect to user changes but if it gets us nice things it might be worth it.

how does it handle alternate top level modules in packages? If two packages define a 'tests' module, what will happen?

It uses fully specified dotted names. Currently prefixed by the prefix we pass. So extensions.foo.tests and extensions.bar.tests. If it actually imported whole directory structures, which it doesn't due to extensions not being a real package.

@The-Compiler
Copy link
Member

FWIW reloading Python modules is a can of worms - that's definitely something I'd rather defer to some later stage of extension support.

@toofar
Copy link
Member Author

toofar commented Mar 22, 2019

Oh also, @The-Compiler, on #30 it says plugins should be able to

Register new commands (with capital letter, like vim)

Is the leading captial letter something you still want to mandate?

@The-Compiler
Copy link
Member

Is the leading captial letter something you still want to mandate?

I'm not sure. As long as extensions aren't official I don't have to decide yet, though 😉

@toofar toofar force-pushed the feat/basic_extension_loader branch 3 times, most recently from dcfd829 to c711025 Compare May 13, 2019 07:53
@toofar
Copy link
Member Author

toofar commented May 13, 2019

That's all the stuff I wanted to get done for this pass (mainly support loading directories). I added some exception handling to _load_component which will hide (but log) errors in the init hooks of internal components as well. The other hook in the API, for interceptors, is already covered by utils.prevent_exceptions(), I didn't want to use that all over the place to separate out errors in third party code and in our third party code loading code. Maybe we could make it a context manager too.

Copy link
Member

@The-Compiler The-Compiler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Looks good overall, just some small suggestions.

qutebrowser/extensions/loader.py Show resolved Hide resolved
pass
return

if any(os.listdir(ext_dir)):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any() takes a list of values and checks whether any of those is boolean-True. It happens to work here because any([]) == False and any(['foo', 'bar']) == any([True, True]) == True, but just

Suggested change
if any(os.listdir(ext_dir)):
if os.listdir(ext_dir):

(relying on the list to be either empty or not) is far less confusing 😉

if name not in sys.modules:
try:
# Import the module with the finder so that it is in
# sys.modules readif for _load_component.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# sys.modules readif for _load_component.
# sys.modules ready for _load_component.

I think?

finder.find_module(name).load_module(name)
except Exception:
log.extensions.exception(
"Failed importing extension: {}".format(name[len(prefix):])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"Failed importing extension: {}".format(name[len(prefix):])
"Exception while importing extension: {}".format(name[len(prefix):])

(or "Failed to import")

mod_info.init_hook(_get_init_context())
except Exception:
log.extensions.exception(
"Failed initializing extension: {}".format(mod.__file__)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"Failed initializing extension: {}".format(mod.__file__)
"Exception while initializing extension: {}".format(mod.__file__)

@@ -180,7 +227,15 @@ def _on_config_changed(changed_name: str) -> None:
cfilter = config.change_filter(option)
cfilter.validate()
if cfilter.check_match(changed_name):
hook()
try:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't catch exceptions in the if option is None: part above - I guess the whole block should be in the try: instead?

log.extensions.exception(
"Failed running config change hook for "
"item {} in extension: {}".format(
changed_name, hook.__code__.co_filename,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems a bit hacky for getting the name - why not add that information to ModuleInfo or so?

@toofar toofar force-pushed the feat/basic_extension_loader branch 5 times, most recently from df40f33 to 2c6b610 Compare June 5, 2019 07:08
@The-Compiler The-Compiler added this to Low hanging fruit? in Pull request backlog Apr 21, 2020
@toofar
Copy link
Member Author

toofar commented Apr 27, 2020

I made this extension to reload extensions to make hacking easier. Previously I was developing them in config.py because that supports reloading and then moving them to extensions. I also made one to reload autoconfig because it makes sharing config between basedirs easier. I tested what would happen if a naughty extension connects to a PyQt signal by connecting to config.instance.changed and printing the curried module ID and they just keep getting printed, even after a gc run, how suspicious, ah, that is just if the connectee isn't a bound method it seems. If they are on an object they stopped getting called.

from qutebrowser.api import cmdutils

@cmdutils.register()
def reload_extensions():
    """Attempt to reload any third party extensions."""
    mod_prefix = "qutebrowser.extensions.third_party"

    # registered commands, including their args
    from qutebrowser.misc import objects
    todel = [
        cmd.name for cmd in objects.commands.values()
        if cmd.handler.__module__.startswith(mod_prefix)
    ]
    for name in todel:
        del objects.commands[name]

    # interceptor functions, could probably del by value too
    from qutebrowser.extensions.interceptors import _interceptors
    todel = [
        idx for idx, func in enumerate(_interceptors)
        if func.__module__.startswith(mod_prefix)
    ]
    for idx in reversed(todel):
        del _interceptors[idx]

    # extension infos with hooks
    from qutebrowser.extensions.loader import _module_infos, load_extensions
    todel = [
        idx for idx, modinfo in enumerate(_module_infos)
        if modinfo.module.__name__.startswith(mod_prefix)
    ]
    for idx in reversed(todel):
        del _module_infos[idx]

    # remove from sys.modules to make sure walk_extensions reloads them
    # I didn't test importlib.reload(), it might not work because these aren't real modules.
    import sys
    todel = [name for name in sys.modules.keys() if name.startswith(mod_prefix)]
    for name in todel:
        del sys.modules[name]

    load_extensions()

@toofar toofar force-pushed the feat/basic_extension_loader branch from 2c6b610 to 3a5c4f4 Compare August 17, 2020 07:12
@toofar toofar force-pushed the feat/basic_extension_loader branch 3 times, most recently from de76937 to b9ab92f Compare April 2, 2021 01:24
@The-Compiler The-Compiler moved this from Low hanging fruit? to Inbox in Pull request backlog Apr 8, 2021
spec = finder.find_spec(name, None)
if not spec or not spec.loader:
raise ImportError(f"pkgutil couldn't find loader for {name}")
spec.loader.load_module(name)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO:

Deprecated since version 3.4: The recommended API for loading a module is exec_module() (and create_module()). Loaders should implement it instead of load_module(). The import machinery takes care of all the other responsibilities of load_module() when exec_module() is implemented.

Copy link
Member Author

@toofar toofar Sep 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've pushed a commit that should work with python 3.12. Although I'm not convinced it's the right way to do dynamic imports like this. Mainly because I have to add stuff to sys.modules myself which feels wrong. Although there are other examples on the internet doing the same thing. See the commit message for more context and ideas.

I forgot to put it in the commit message but this was the stack trace before I was adding it to sys.modules myself:

  File "/home/user/projects/qutebrowser/qutebrowser/app.py", line 141, in init
    loader.load_extensions()
  File "/home/user/projects/qutebrowser/qutebrowser/extensions/loader.py", line 88, in load_extensions
    _load_component(info, skip_hooks=skip_hooks)
  File "/home/user/projects/qutebrowser/qutebrowser/extensions/loader.py", line 198, in _load_component
    info.spec.loader.exec_module(mod)
  File "<frozen importlib._bootstrap_external>", line 940, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/tmp/qute-basdir-extensions/data/extensions/jmatrix.py", line 6, in <module>
    @hook.init()
     ^^^^^^^^^^^
  File "/home/user/projects/qutebrowser/qutebrowser/api/hook.py", line 37, in __call__
    info = _add_module_info(func)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/projects/qutebrowser/qutebrowser/api/hook.py", line 18, in _add_module_info
    module = importlib.import_module(func.__module__)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen importlib._bootstrap>", line 1204, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1176, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1126, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1204, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1176, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1140, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'qutebrowser.extensions.third_party'

This is similar to `_walk_normal()` but since the extensions aren't in a
properly impartable package we have to actually use the finder returned
by `walk_packages()` to load the source files. That means the module is
actually imported at that point and when `_load_component()` uses
importlib it finds the module alreay loaded and everything works.

TODO: error handling

* when `os.mkdir()` fails, because we have seen that before,
* and when importing the extensions, because I assume we don't want
  third party extensions to crash the browser when there dependancies
  aren't met or something,
* and when running the module init hooks for the same reason.

TODO: support loading folders

If I remove the `if ispkg: continue` check the finder line actually
succeeds in loading the directory (if it has a `__init__.py`) but after
we yield it drops back into the `walk_packages` implemention after its
yield where it calls `__import__(name)` which fails because it
presumably tries to load the parent package which isn't real.
It may be as simple as ignoring errors for packages.
toofar and others added 9 commits September 30, 2023 17:44
If you have a folder in your extensions folder with a `__init__.py` file
in it it will now be loaded as an extension. So will the modules under
it.

Extensions can't currently be imported from
`qutebrowser.extensions.third_party` but if we added a `third_party`
folder in qutebrowser/extentions (even an empty one) we would be able to
import them using the "from ... import <extension>" syntax.

The issue was that when pkgutil whent to import the package it was
failing because the top level module ("extensions") wasn't real. Now it
is under a real package.
Handle errors importing extensions and running their initialization and
config changed hooks by logging the error (via a stack trace) and
continuing.

I put the name/filename of the extensions in the log messages but now I
realise the filename is going to be in the stack trace. If lint
complains I will just take them out.
The case where they wanted to be called for any config item wasn't
covered by the broad except.

I don't want the config change filter validation to be in the broad
except but I am going to move that out in a future PR anyway. It is
curretly a memory leak (when change filters are instantiated they are
put into a module scope list, here we instantiate them whenever a config
item is set) and we can fail early on that.
Because the user might care.
I wanted to log the filename for some reason. Probably because the
extension name would have the `qutebrowser.third_party` prefix. Otherwise
I could just pass the extension name to `add_module_info()`. I could go
either way really.
The vulture and docs scripts both call load_components befer
standadir._init_data() is called, so they were getting a KeyError when
the extension loader went to find the extension directory. There
shouldn't be any extensions installed in those cases anyway, and if
there were they shouldn't be included in the docs.

Also the docstring on load_components was lying.
FileFinder.find_module(name) doesn't comply with whatever stubs mypy is
looking at. The source mentions it is deprecated this one.

Still had to add a couple of unnecessary things to make mypy happy, the
`None` argument to find_spec() is defined in the source as `path=None`
but mypy thinks it is neither optional nor usable as a keyword argument.

The (fullname) argument to load_module() appears to be unnecesary in
practice but mypyp insists.
Importlib says this about load_module:

> Deprecated since version 3.4: The recommended API for loading a module
> is exec_module() (and create_module()). Loaders should implement it
> instead of load_module(). The import machinery takes care of all the
> other responsibilities of load_module() when exec_module() is
> implemented.

I'm not 100% sure how to implement the new way of doing things with this
setup, I've got something working now but I'm not sure it is the best
way of doing it.

To recap, what I'm trying to do here is load extensions from the
extension folder (eg `$XDG_DATA_DIR/qutebrowser/extensions/`) under a
namespace we control (eg `qutebrowser.extensions.third_party`). I don't
remember why I set it up like that but there is probably a good reason.

What I was doing previously was creating a module and not loading it (I
think?) in walk_extensions, by extensions putting it in sys.modules,
then later load_component would call the normal import mechanism that
would find it already loaded.

Now I can't find a way to create a module without importing it. Eg
`spec.loader.create_module(spec)` just returns None, which the docs say
means "use the standard import mechanism". Not a very helpful
replacement for `load_module()`!

But the loader returned by the finder returned by walk_packages does now
how to import the module. So for now I'm saving the whole spec for
`walk_component()` so it can re-use that loader. And then after creating
the module I have to add it to sys.modules manually and then exec it.
Having to manipulate sys.modules manually is problematic and the biggest
sign that I'm doing the wrong thing here.

I suspect that the proper fix is to register a finder in `sys.meta_path`
that'll handler that fake "third_part" package so that you can just to
`import qute.third_party.my_extension` and it'll map the module name to
a path and create a module from a file somehow. The finder I see here is
`FileFinder('/real/path/to/extensions/')` so that might be a good
starting spot. Links for how to create Finders:

https://docs.python.org/3/library/importlib.html#setting-up-an-importer
https://stackoverflow.com/questions/57126085/how-to-use-sys-meta-path-with-python-3-x
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Development

Successfully merging this pull request may close these issues.

None yet

5 participants