Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hooks for wx.lib.pubsub do not work with modulepgraph #1367

Closed
htgoebel opened this issue Jul 31, 2015 · 24 comments
Closed

Hooks for wx.lib.pubsub do not work with modulepgraph #1367

htgoebel opened this issue Jul 31, 2015 · 24 comments

Comments

@htgoebel
Copy link
Member

The hook for wx.lib.pubsub.setupargs1 does not work with modulegraph, because the order, hooks are executed has changed and may even be undetermined.

The hook just sets a value in hookutils.hook_variables, which the hook for wx.lib.pubsub.core uses for extending it's __path__. But this requires hook-wx.lib.pubsub.core to be executed after hook-wx.lib.pubsub.setupargs1, which is not the case when using modulegraph.

wx.lib.pubsub.setuparg1 imports wx.lib.pubsub.core, so the cause may simply be that we are calling the hooks to early.

@leycec
Copy link
Contributor

leycec commented Aug 1, 2015

Well-fermented food for thought.

I see two simple means of fixing this (in order of decreasing simplicity):

  1. Solution A.
    1. Remove the hook-wx.lib.pubsub.setuparg1.py hook. That hook basically only exists to inform the hook-wx.lib.pubsub.core.py hook that the wx.lib.pubsub.setuparg1 module has been imported. But that is trivially testable by just refactoring the hook-wx.lib.pubsub.core.py implementation to read:
def hook(mod):
    pth = str(mod.__path__[0])
    if os.path.isdir(pth):
        protocol = 'kwargs' if mod.graph.findNode('wx.lib.pubsub.setuparg1') is None else 'arg1'
        logger.info('wx.lib.pubsub: Adding %s protocol path' % protocol)
        mod.__path__.append(os.path.normpath(os.path.join(pth, protocol)))

    return mod

Untested, of course. But that should hypothetically theoretically work. Actually, that's pretty sweet, so no Solution B.

Solution B was going to alternately suggest that hook-wx.lib.pubsub.setuparg1.py be converted into a graph hook, which would guarantee its execution before the hook-wx.lib.pubsub.core.py hook. But that's overkill (...and not the good kind), given the above.

@htgoebel
Copy link
Member Author

htgoebel commented Aug 1, 2015

@leycec The honor is your for this solution. Go a head ans issue a pull-reqest.

I'd only write

        protocol = 'kwargs'
        # Has wx.lib.pubsub.setuparg1 already been imported?
        if mod.graph.findNode('wx.lib.pubsub.setuparg1'):
            protocol = 'arg1'

instead of the one-liner as this is more readable.

So, wait, stop this shows the problem: wx.lib.pubsub.core must not be imported prior to importing wx.lib.pubsub.setuparg1. Otherwise the protocol will be "kwargs". But explicitly importing core first is "legal", it has the same effect as not importing it explicitly. (The only thing core does is setting the default for the protocol.)

Which means, we need another solution. Maybe set some variable in hook-wx.lib.pubsub.core.py an in hook-wx.lib.pubsub.setuparg1.py raise an error if it is set. No, wait, this will not work either.

@leycec
Copy link
Contributor

leycec commented Aug 2, 2015

Hmm.

I see that the definition of PyInstaller.build.Analysis.assemble() is now unrecognizable to me. If my reading of the new implementation is correct, aren't import hooks still run after modulegraph has already constructed the complete graph by analyzing all application imports?

Oh, wait. I sort-of-kind-of-maybe get what you're getting at: we need to explicitly detect whether wx.lib.pubsub.setuparg1 was imported (easy) and, if so, whether it was imported before or after wx.lib.pubsub.core (really friggin' hard). Is that right?

If so... yeah. I don't believe that modulegraph preserves the order in which modules were imported. It should be feasible to refactor modulegraph to do that, presumably by either:

  1. Adding a new order integer attribute to the low-level modulegraph.Node class. Under this approach, each node and hence module added to the graph would have its order attribute set to be one greater than that of the previous node added to the graph. Testing whether one module was imported earlier than another module would then reduce to something like graph.findNode(module_1).order < graph.findNode(module_2).order.
  2. Adding a new node_order list attribute to the high-level modulegraph.ModuleGraph class. Under this approach, each node and hence module added to the graph would also be added to the node_order list (in order). Testing whether one module was imported earlier than another module would then reduce to something like graph.node_order.index(module_1) < graph.node_order.index(module_2).

The second approach requires two linear searches through a stupendously huge list for each such comparison. The efficiency obsessive-compulsive in me promotes the first approach, instead.

Alternately, we could go with the graph hook approach. Unlike import hooks, graph hooks are guaranteed to be run immediately before modulegraph adds the corresponding module to the graph. In this case, we could (ab|mis)use that fact by replacing the existing PyInstaller/hooks/hook-wx.lib.pubsub.setuparg1.py import hook with a new PyInstaller/hooks/graph/hook-wx.lib.pubsub.setuparg1.py graph hook resembling:

from import PyInstaller.utils.hooks.hookutils import hook_variables
def hook(module_graph):
    hook_variables['is_wx.lib.pubsub.setuparg1_imported_first'] = not mod.graph.findNode('wx.lib.pubsub.core')

The existing PyInstaller/hooks/hook-wx.lib.pubsub.core.py import hook could then be refactored to test that dictionary entry. Since graph hooks already work, this would require no changes to existing functionality – which is a little nice.

It's also a little wonky. Sure, we could (ab|mis)use graph hooks in this way. And to get this working today, we probably should. In the long-term, however, we probably do want some means of determining import order. In that case, one of the prior two solutions may be a better fit.

@matysek
Copy link
Member

matysek commented Aug 2, 2015

I see that the definition of PyInstaller.build.Analysis.assemble() is now unrecognizable to me. If my reading of the new implementation is correct, aren't import hooks still run after modulegraph has already constructed the complete graph by analyzing all application imports?

@leycec It was recently changed to be more robust. See this comment in build.py:

        ### Iterate over import hooks and update ModuleGraph as needed.
        #
        # 1. Iterate in infinite 'while' loop.
        # 2. Apply all possible hooks in one 'while' iteration.
        # 3. Remove applied hooks from the cache.
        # 4. The infinite 'while' loop ends when:
        #    a. hooks cache is empty
        #    b. no new hook was applied in the 'while' iteration.
        #

Basically, it now tries to apply all import hooks while dependency graph gets updated. So the order when a hook gets applied is unpredictable.

So lets say that we have these files:

# hook-mymodule.py
hiddenimports = ['abc']
# hook-abc.py
hiddenimports = ['def']
  • Let's assume that no other module imports 'abc' or 'def' and our main script imports 'mymodule'.
  • PyInstaller first applies 'hook-mymodule' and the dependency graph then will contain 'abc'
  • Later PyInstaller tries to apply other hooks and now it finds out that graph contains 'abc' and there is hook 'hook-abc'
  • Then it applies hook-abc and the graph will contain module 'def'.

@htgoebel htgoebel added this to the PyInstaller 3.0 milestone Aug 3, 2015
@htgoebel htgoebel added the @high label Aug 3, 2015
@htgoebel
Copy link
Member Author

htgoebel commented Aug 3, 2015

I'm tagging this as high, since this may show some deeper problem in the way, hooks are handle currently.

@htgoebel htgoebel self-assigned this Aug 3, 2015
@leycec
Copy link
Contributor

leycec commented Aug 8, 2015

@matysek Thanks for the erudite break-down. I'm properly educated now – and all without $100,000 in predatory student loan debt. ✏️

Your analysis confirms deep-set suspicions: while I can't see any critical issues in the new assemble() algorithm (...in fact, it all rather seems like a good idea), I can see that the order in which import hooks are applied is fundamentally disconnected from reality. Of course, that was always the case. Since the set of import hooks is constructed after the initial graph's construction, this disconnection is unavoidable, fine, and possibly even essential to sane import hook operation.

This could suggest, however, that fixing this issue requires new modulegraph machinery preserving the order of module imports. Reiterating prior commentary, my penultimate recommendation is to:

  • Add a new order integer attribute to the low-level modulegraph.Node class.

That shouldn't be terribly arduous. (Lies. All lies!)

Let me know if you'd like me to forge ahead with that.

@matysek
Copy link
Member

matysek commented Aug 8, 2015

@leycec Regarding order attribute I would suggest raise an issue to the modulegraph and discuss it with the author. He might be more educated than me and tell you if that's feasible or not.

Regarding the erudite break-down expect my invoice the following days.

@tallforasmurf
Copy link
Contributor

back in ought-thirteen I asked M. Oussoren about adding extra info to graph
nodes. Here's the conversation.

https://bitbucket.org/ronaldoussoren/modulegraph/issues/16/extended-info-in-the-graph-and-best-way-to

It was inconclusive. I gave up, saying "Anyway I will store these outside
the graph, in dicts keyed by graphident." Which I did not do, but that
still sounds like a viable approach, given graphident is unique and
hashable.

On Sat, Aug 8, 2015 at 4:08 AM, Martin Zibricky notifications@github.com
wrote:

@leycec https://github.com/leycec Regarding order attribute I would
suggest raise an issue to the modulegraph and discuss it with the author.
He might be more educated than me and tell you if that's feasible or not.

Regarding the erudite break-down expect my invoice the following days.


Reply to this email directly or view it on GitHub
#1367 (comment)
.

@leycec
Copy link
Contributor

leycec commented Aug 9, 2015

I asked M. Oussoren about adding extra info to graph nodes.

Thanks for liaisoning with the respectable Brave Sir Ronald. Your efforts are not in vain.

I gave up, saying "Anyway I will store these outside the graph, in dicts keyed by graphident." Which I did not do, but that still sounds like a viable approach, given graphident is unique and hashable.

Conceivably. But probably not.

Although node identifiers are uniquely hashable (...they sort of have to be, don't they?), how exactly would you externally hook into modulegraph to initialize your hypothetical dict? And there's the rub: modulegraph provides no existing facilities for hooking into module imports. Which is why we had to define our own facilities to resolve six-style imports. (Thank you, graph hooks, for you are special and good.)

As currently defined, graph hooks are module-specific and hence not generalizable to hooking every possible module. We can, however, extend the same idea by either:

  1. Refactoring the modulegraph module by adding a new order integer attribute to the low-level modulegraph.Node class initialized by overriding the equally low-level modulegraph.ModuleGraph.createNode() method, already. (See below.)
  2. Refactoring the depend.analysis module by monkey-patching a new order integer attribute into the low-level modulegraph.Node class initialized by the high-level PyiModuleGraph._import_module() method, much as we currently do for graph hooks.

In either case, there's little gain to externalizing this ordering into a PyInstaller-specific dict or other data structure. modulegraph exposes caller-modifiable nodes for a reason. This is one.

I know modulegraph. (I wish I didn't.) In my brash opinion, the first of these two approaches is the simplest and most obvious. No, really. It's trivial. The altgraph.ObjectGraph class already defines a createNode() method, which the modulegraph.ModuleGraph class thankfully calls on every node creation. Given the existing machinery, all we do is override the createNode() method in the modulegraph.ModuleGraph class to increment the order attribute of the newly created node.

Done. Issue resolved. The glib credits roll, everyone goes to bed with a glazed lolipop and a loopy grin, and Santa returns to the North Pole. Honestly, why didn't Ronald just suggest that straightaway rather than obfuscating-ly hemming and hawing around? 😱

I'm getting a "too many chefs (and not enough actual cookin') in the kitchen" vibe. Since talk is discount cheap, I'm just going to go ahead and actually do this. Here's to you, fellow chefs!

Iron Chef

leycec added a commit to leycec/pyinstaller that referenced this issue Aug 9, 2015
The order in which modules are imported by the user application is now
inspectable via the new public `lib.modulegraph.modulegraph.Node.order`
attribute. This is low-level machinery for fixing pyinstaller#1367 and similar issues.
@leycec
Copy link
Contributor

leycec commented Aug 9, 2015

Done. I now eat my zesty yellow pepper in the peace and comfort of a job well-done. (Well, done, anyway.)

leycec added a commit to leycec/pyinstaller that referenced this issue Aug 16, 2015
The order in which modules are imported by the user application is now
inspectable via the new public `lib.modulegraph.modulegraph.Node.order`
attribute. This is low-level machinery for fixing pyinstaller#1367 and similar issues.
htgoebel referenced this issue Sep 10, 2015
set() changes the order, use OrderedDict instead.
@htgoebel
Copy link
Member Author

Some summary of the insights found in #1393 (comment) and following:

  • The "old" implementation of the wx.lib.pubsub* hooks is wrong, because the default pubsub is not kwargs but v1 (or maybe even no default at all, depending on the version of wx)

  • The order in which the old imptracker is processing imports is unpredictable (or in the best case: alphabetically, which is not better), due to 465a541 dating back to 2012. In contrast, modulegraph is processing imports in strict order they appear in the code.

    This is why in test_wx_pubsub_arg1.py the wx.lib.pubsub.pub is processed prior to wx.lib.pubsub.setuparg1, even if the test's source says different.

    Note: I just pushed a fix for this (f14e895) do ease debugging this ticket.

Order of imports and hooks
  • modulegraph is processing imports in strict order, even if the import is done in a function. In detail each import is completely resolved when it occurs in the code. E.g. import wx.lib.pubsub.setuparg1 leads to load wx and recursively process all contained imports,, then wx.lib, then wx.lib.pubsub and in the end wx.lib.pubsub.setuparg1.

    This cause of this very issue is (as leycec describes in detail in Module Import Ordering Preserved #1393 (comment)): wx.lib.pubsub - which is imported and processed prior to wx.lib.pubsub.setuparg1 - leads to wx.lib.pubsub.core being imported and processed. Thus the hook for wx.lib.pubsub.core is processed prior to the hook for wx.lib.pubsub.setuparg1.

  • In contrast, the old imptracker only loads the modules and processes the hooks in the requested order (wx, wx.lib, wx.lib.pubsub, wx.lib.pubsub.setuparg1 and wx.lib.pubsub.pub) but the imports are processed afterwards.

    As you can see here, the module is loaded and hooks are processed (in analyze_one()) in an inner loop. The imported names are entered into the list of modules to be processed by the outer loop. As you can see when debugging [n[0] for n in nms[:] at line 104 the order is correct on this outer loop. But the wx.lib.pubsub.setuparg1 has already been processed (loaded and hooks) in the inner loop. You may also add print-statements after line 270 and 116 (already available at htgoebel@1782243).

conditional imports and imports in functions
  • modulegraph takes conditions, functions and try/except into account only when scanning the source (AST), see process_import in contrast to _scan_bytecode. But anyway, this DependencyInfo is undocumented and it's use unclear.

    Nevertheless, this different behaviour can be ignored for now, as we have the source for the files and thus the AST is analysed.

    • The old imtracker takes conditions and functions into account, but uses this only for printing messages (see here and here).
Conclusion

The old imptracker's behaviour is plain wrong, module graph does a far better job.

Idea for a solution

As I need to go back to may paid occupation, I don't have time for testing this. So maybe I'm totally wrong again.This is based on wx 2.8.12, which I have installed.

  • wx.lib.pubsub defaults to setupv1, if and only if module wx.lib.pubsub.autosetuppubsubv1 can be imported.for setupv1. So we could implement a in-safe-import-hookh hook mimicking this behaviour. If autosetuppubsubv1 can not be imported, remove setupv1 from the import-tree.

  • For setuparg1 and setupkwargs implement a post-create-package hook retargeting the packages as required and fixing pubsub.core.__path__. May be we need to retarget all submodules of core which also exist in pubsub/core/arg1/' resp.pubsub/core/kwargs/` and have already been imported.

  • For setupv1, setupv2 we do the same except retargeting to pubsub/pubsub1/ resp. pubsub/pubsub2/ and some other path needs to be manipulated.

  • Test-cases:

    from wx.lib.pubsub import setupv1
    from wx.lib.pubsub import pub
    assert pub.PUBSUB_VERSION == 1
    

leycec added a commit to leycec/pyinstaller that referenced this issue Sep 13, 2015
The order in which modules are imported by the user application is now
inspectable via the new public `lib.modulegraph.modulegraph.Node.order`
attribute. This is low-level machinery for fixing pyinstaller#1367 and similar issues.
leycec added a commit to leycec/pyinstaller that referenced this issue Sep 24, 2015
The order in which modules are imported by the user application is now
inspectable via the new public `lib.modulegraph.modulegraph.Node.order`
attribute. This is low-level machinery for fixing pyinstaller#1367 and similar issues.
leycec added a commit to leycec/pyinstaller that referenced this issue Sep 25, 2015
The order in which modules are imported by the user application is now
inspectable via the new public `lib.modulegraph.modulegraph.Node.order`
attribute. This is low-level machinery for fixing pyinstaller#1367 and similar issues.
@codewarrior0
Copy link
Contributor

Here's one more detail to agonize over.

wxPython 2.8 decides whether to use the legacy v1 API (its default) by checking for the existence of the wx.lib.pubsub.autosetuppubsubv1 module. To do this, it calls imp.find_module. imp.find_module does not honor sys.meta_path and thus does not call our FrozenImporter to ask it for a loader. It always returns None, causing wxPython 2.8 to select the newer version of the API when frozen, which is not what it does when unfrozen.

@codewarrior0
Copy link
Contributor

I'm thinking we'll need a runtime hook that imports wx only to check its version. If it is <2.9, it will ask FrozenImporter (via pkgutil?) if wx.lib.pubsub.autosetuppubsubv1 exists, and if so, simply import wx.lib.pubsub.setupv1 to emulate the auto-setup and use the same API version as it does when unfrozen.

In any case, with the path-aware FrozenImporter in place, the wx_lib_pubsub family of tests now passes on Python 2.7 with versions 2.9.5 and 3.0.2 of wxPython.

@htgoebel
Copy link
Member Author

I don't see any need for a runtime-hook. IMHO all the detection can be done at freeze-time and using the new redirect-hooks resp. __path__-extension-hooks.

@codewarrior0
Copy link
Contributor

Let me explain again. wxPython 2.8 contains this code:

    try: 
        # if autosetupv1 is importable, then it means we should
        # automatically setup for version 1 API
        import imp
        imp.find_module('autosetuppubsubv1', __path__)

    except ImportError:
        pass

The call to imp.find_module does not go through our FrozenImporter. It only uses the normal module import system. Because of this, wxPython 2.8 can never find that module if it is frozen. As a result, when frozen, it does not use the version 1 API. When not frozen, it does use the version 1 API because it can find the module normally.

@codewarrior0
Copy link
Contributor

A few other solutions, off the top of my head:

  1. Wrap the imp module with a module that searches FrozenImporter (or generally, sys.meta_path) for modules. Going down this route will probably look a lot like porting Python 3's importlib to Python 2.
  2. Place the file wx/lib/pubsub/autosetuppubsubv1.py in the COLLECT folder (or onefile temp folder) as an ordinary file, instead of packing it into the PYZ. This way, imp.find_module will find it using the normal import machinery.
  3. Document it as a known issue with wxPython 2.8.x and suggest the workaround:
    • Import wx.lib.pubsub.setupv1 prior to importing wx.lib.pubsub.pub if you are not otherwise using the newer pubsub API.

@htgoebel
Copy link
Member Author

Yes, but autosetuppubsubv1.py does nothing, it's just a marker which makes wxPython 2.8 import setupv1. So in hook-wx.lib.pubsub, we simply need to detect autosetuppubsubv1 ourself and - if it's available - add a hidden import for setupv1.

The only think we have to take care of is: setupv1 and the other setup-modules need to redirect the correct modules.

@codewarrior0
Copy link
Contributor

Yes, it is just a marker, but that's not the issue here. The issue is that the marker is never detected whenever it is in a PYZ archive - and it is not detected because imp.find_module does not go through FrozenImporter. The marker is in fact inserted into the PYZ archive by the latest version of the wx.lib.pubsub hook - nothing special to do there.

The problem is that imp.find_module simply does not look in the PYZ archive for the marker.

@htgoebel
Copy link
Member Author

But my point is, that there is no need to detect it at run-time. We can detect it at freeze-time already.

And - to be frank - I'm strictly against implementing a complicated hack for imp when a simple workaround is available: The user could simply import setupv1 explicitly - like it is done in current wx version.

@codewarrior0
Copy link
Contributor

All right, the third option sounds good, then. I'll update the wiki with a note about wxPython 2.8.x.

The new hook works with 2.9.x and 3.0.x, according to the wx_lib_pubsub tests. It just brings in all of the submodules and relies on the __path__ support to sort everything out.

The reason the new wx.lib.pubsub hook works for wxPython 2.9.x but not 2.8.x is because 2.9.x doesn't use imp.find_module - all it does is import autosetuppubsubv1, which does go through FrozenImporter and finds the module correctly. (In 2.9.x, the module isn't just a marker, it raises ImportError to pretend it doesn't exist, and thus setupv1 is not automatically imported. Like you say, it has to be explicitly imported in the newer versions of wx.)

@codewarrior0
Copy link
Contributor

Added wiki page Recipe-wxPython-2.8.x (superseded by new hook)

@htgoebel
Copy link
Member Author

@codewarrior0 Please remove @xfail from the tests, though.

@codewarrior0
Copy link
Contributor

Sure.

@codewarrior0
Copy link
Contributor

While removing the @xfail marker, I tried out the workaround I wrote in the Recipe-wxPython-2.8.x wiki page, by adding it to the "default protocol" test for 2.8.x. It didn't work! The reason turned out to be that collect_submodules can't find the files in the pubsub1 or pubsub2 folders because they are not packages and don't have an __init__.py... so those files never get included.

To resolve this, I included them as data files instead, and included the notorious autosetuppubsubv1 as a data file too, implementing solution-off-the-top-of-my-head number 2. Now the hook works even with 2.8.x, so I removed the wiki page.

Relevant commit: 5676716

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

5 participants