Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Module Import Ordering Preserved #1393

Merged
merged 11 commits into from Sep 27, 2015
Merged

Module Import Ordering Preserved #1393

merged 11 commits into from Sep 27, 2015

Conversation

leycec
Copy link
Contributor

@leycec leycec commented Aug 9, 2015

This pull request preserves the order in which modules are imported by the user application. Because that is both needful and good.

Specifically, a new public lib.modulegraph.modulegraph.Node.order attribute – guaranteed to monotonically increase on every module importation and hence provide a reliable means of comparing importation order – is added. (Tested and working, you may be sure.)

This low-level machinery partially fixes #1367. Hooks may now test the order in which two hypothetical modules foo and bar were initially imported as follows:

from PyInstaller.utils.hooks.hookutils import logger

def hook(mod):
    if mod.graph.findNode('foo').order < mod.graph.findNode('bar').order:
        logger.debug('Module "foo" imported before module "bar". Why? I dunno.')
    else:
        logger.debug('Module "bar" imported before module "foo". Suck it, you.')

    return mod

What About #1367? Why You No Fix, Huh?

Would someone (...hey, no names: @htgoebel) mind posting a minimal-length example (MLE) demonstrating the exact issue of #1367?

While I do have the Python 2 version of wxPython locally installed, I'm unsure how to properly replicate this issue. I know; it's probably trivial. I'm feeble-brained, but hard-working. As soon as I get an MLE, I'll append a commit to this pull request fixing #1367 in entirety.

What Else? There's Always Something Else

wxPython's Python 2 version (referred to as "Classic") and Python 3 version (referred to as "Phoenix") appear to be very dissimilar – to the point of actually being different projects that confusingly share the same name. Our import hooks should probably account for these differences in ways I cannot possibly fathom.

Thus spaketh Walt.

@matysek
Copy link
Member

matysek commented Aug 10, 2015

  • Could you add some notes about your modification into file PyInstaller/lib/README.rst?

@htgoebel
Copy link
Member

Would someone (...hey, no names: @htgoebel) mind posting a minimal-length example (MLE) demonstrating the exact issue of #1367?

Are you okay with this one?

cd tests/old-suite
env PYINSTALLER_RUNTESTS_WORKDIR= ./runtests.py  libraries/test_wx_pubsub_*.py

@htgoebel
Copy link
Member

@leycec You may again put me on some pedestal or call me names, but have you ever though about making modulegraph call our import-hooks just when it imports the module? There already is some method you may overwrite: modulegraph.ModuleGraph.import_hook().

Actually this method seams to import the submodules, too, so you have to test this in detail. In the best case, you can call our import-hooks prior to calling this method. In the worst case you need to re-implement this method and call out hooks in the midst of the existing code.

If everything fails, and we need to ad some order attribute, PyiModuleGraph.import_hook() could be the place to implement this without any need to change modulegraph itself.

@leycec
Copy link
Contributor Author

leycec commented Aug 11, 2015

Are you okay with this one?

O.K. There's an existing unit test. You're awesome. I'll get on that rickety horse right away.

@leycec You may again put me on some pedestal or call me names...

Too late. See "You're awesome" above.

...but have you ever though about making modulegraph call our import-hooks just when it imports the module?

I've definitely thought about that. I assumed this was not originally done for reasons that escape me. I may have assumed incorrectly. Your wise and seemly suggestion may be the perfect long-term solution. For the lazy moment, however, I have two concerns:

  • matysek and grubby friends have been busy code bees. Import hooks and the build process in general have been heavily refactored. Which is great! It's also a bit intimidating. I'd rather not touch import hooks in a general way until the radioactive dust settles. 🐝
  • Even when the dust settles, the import_hook() method doesn't help us with this particular issue. The reason why is that there exist nodes that are added to the graph that are not imported via import_hook() but which we probably need to know the import order of. This includes both normal and namespace packages, excluded and missing modules, and module aliases (e.g., six-style fake modules). None of these are added to the graph via the import_hook() method. By definition, the createNode() method is the only feasible place to hook everything at once.

Could you add some notes about your modification into file PyInstaller/lib/README.rst?

Absolutely! Why must I forget the essentials? //ruefully shakes head//

...without any need to change modulegraph itself.

[DANGER: Big wall of text follows. Proceed with ninja-like caution.]

I'm beginning to come to an unpopular opinion. No one's going to like it. Rebel without a cause time.

Rebel Without a Cause

In my humble opinion, PyInstaller should internally fork modulegraph and altgraph. Specifically, the embedded lib.modulegraph and lib.altgraph packages should be moved out of lib and into the main codebase. All existing copyrights and attributions should, of course, be preserved. All pretense of PyInstaller merely embedding or branching these libraries should be dropped.

Yeah. I just dropped the fork bomb. Why? A few reasons (in no particular order):

  • Ron's not the easiest volunteer to work with. I'm sure he's a great guy. He must be, because modulegraph and altgraph are both unadulterated awesome-sauce. Nonetheless, I haven't particularly enjoyed my online interactions with Ron and wouldn't necessarily want to repeat the experience. (It's probably an irreconcilable personality conflict. Again, I'm sure he's awesome.)
  • Ron, by his own admission, no longer has much time to devote to open-source development. Hence, modulegraph's moribund commit state. For all intents and purposes, these libraries are on the cusp of immanent bit-rot... if not death.
  • Ron has an unfortunate history of committing changes that break public API backwards compatibility. A recent example is his refactoring of the public load_module() and scan_code() methods into private _load_module() and _scan_code() methods, which famously broke py2app a few years ago. Somewhat hilariously (...but not really), py2app still appears to be broken in this respect. Whereas PyInstaller embeds PyInstaller-specific versions of these libraries, py2app imports the system-installed versions of these libraries. The latter approach is usually the Right Thing to Do (tm), until it isn't.
  • These libraries are the beating heart of PyInstaller. No, really. You can subtract out any other embedded library and still get a somewhat-kind-of-maybe-working PyInstaller. Subtract out modulegraph and altgraph, however. What are you left with? A big fat smoking pile of rubble-strewn ashes. (Which is to say, nothing.) PyInstaller increasingly requires non-trivial changes to these libraries, which absolutely makes sense. These libraries form the algorithmic core of all PyInstaller logic. Naturally, PyInstaller needs to occasionally change this logic and hence these libraries. That's inescapably baked into the fabric of our Universe. 🌠
  • Downstream (i.e., PyInstaller) changes are unlikely to be submitted upstream (i.e., to the official altgraph and modulegraph repositories), but must nonetheless be occasionally made to support PyInstaller-driven requirements (e.g., SWIG, six). To say this complicates synchronization between the two projects is an understatement. "Why you no submit changes upstream?", you are probably pondering. Again, a few reasons:
    • Mercurial (hg) is surely the devil's spawn. I don't like it; I don't want to use it; when I am infrequently forced to use it, I reteach myself the entire subcommand set, how branches work, what the best equivalent of git rebase is, and so on. Every time. (Not fun.)
    • BitBucket isn't much better. I don't like it; I don't want to use it; when I am infrequently forced to use it, I grit my teeth, gnash my mouth, and weep bitter granular tears. PyInstaller being hosted on a different site prevents sane communication between the two projects, complicating such otherwise simple tasks as pinging a particular developer for attention.
    • See "Ron's not the easiest volunteer to work with."
    • Time. There's only so much of it in the universe, and I squander enough on my existing crop of open-source projects. Adding yet another to the mix is not my idea of a volunteer weekend well-spent.
    • Benefit. There's no tangible benefit, to me, to submitting altgraph and modulegraph changes upstream. If it works in PyInstaller, I'm largely satisfied. The only other prominent consumer of these libraries appears to be py2app. For obvious reasons, I'm not overly concerned about assisting that project. (I wish them luck, of course.)

@htgoebel
Copy link
Member

@leycec Forking modulgraph may be an option - I was thinking about this, too. We should discuss in a separate issue. Could you please move the text from above into a new issue? Thanks.

Regarding the topic :-) Thanks for you analysis here. So we need to handle the order in createNode(). But wouldn't it be sufficient to override this in PyiModulegraph like this:

class PyiModulegraph:
    def __init__(self, ...):
        self.order_count = 0
    def createNode(self, *args, **kwargs):
       node = super(ModuleGraph, self).createNode(args, **kwargs)
       self.order_count += 1
       node.order = self.order_count
       return node

The main differences to you implementation is:

  1. Don't change the interface of the node-classes. Simply add another attribute.
  2. Don't care about the number of nodes in the graph, simply increase the counter each time. This is okay since we do not need a consecutive range of numbers. It even be beneficial if for any case the number of nodes decreases.
  3. No need to change the Node-classes in modulegraph. As you can see, I still want to avoid changes to modulgraph :-)

Re. 3.: Anyway I'd like to keep your code for the time we fork modulegraph.

@leycec
Copy link
Contributor Author

leycec commented Aug 13, 2015

We should discuss in a separate issue. Could you please move the text from above into a new issue? Thanks.

I am delighted to comply with your always sensible advice. Issue #1412, "To Fork or Not to Fork modulegraph and altgraph", is now a thing.

Let the vitriolic discussion commence! Bring it.

But wouldn't it be sufficient to override this in PyiModulegraph like this:

Ah-ha! I was hoping no one would think of that. In the ideal world of my grandiose dreams (...I am a flying king there), that would work. Unfortunately for you and I both, the definition of the parent ObjectGraph.createNode() method is as follows:

    def createNode(self, cls, name, *args, **kw):
        """
        Add a node of type cls to the graph if it does not already exist
        by the given name
        """
        m = self.findNode(name)
        if m is None:
            m = cls(name, *args, **kw)
            self.addNode(m)
        return m

Under your proposed PyiModulegraph.createNode() implementation, a node's ordering would non-deterministically change on every repeat call to createNode() passed that node rather than merely the first call to createNode() passed that node.

At this point, you are complaining: "Yes, yes. But isn't that pure conjecture? Does modulegraph really ever pass the same node multiple times to createNode()?" Because this is Bizarro World, the answer is of course "Maybe."

Bizarro World

The codebase is quite convoluted. (Understatement.) While it's difficult to say, the _load_module() and _load_package() methods both unconditionally call createNode(), implying a node could conceivably be passed multiple times to createNode().

But this is all beside the point. The createNode() method is intentionally designed to be idempotent (i.e., to be passed the same node multiple times without adding that node to the graph multiple times or otherwise modifying that node's properties), which is a good thing.

We should definitely preserve idempotency here... if only for future sanity. And the only sane way to do that is to initialize the Node.order attribute on the first (hopefully, only) addition of that node to the graph. And the only sane way to do that is to pass the desired order as a parameter to the parent ObjectGraph.createNode() method, which then passes that to the Node.__init__() method.

The insane alternative is to refactor your proposed PyiModulegraph.createNode() implementation to:

  1. Define a new local boolean is_node defaulting to False.
  2. Add a new if self.findNode(module_name) is not None: test, whose body sets is_node to True.
  3. Call the superclass createNode() as before.
  4. If is_node is False, initialize the node.order attribute as before.

It's incredibly awkward. It's also a little inefficient, as the findNode() method would then be called twice for every node addition. (Probably negligible. But still. It's the principle, man.)

It even be beneficial if for any case the number of nodes decreases.

Shitballs of fire. You're absolutely right. I'll refactor the current request to increment a new self.order_count-like instance attribute instead of calling graph.number_of_nodes(). I wuz a dumb.

Superb catch. ⚾

@leycec
Copy link
Contributor Author

leycec commented Aug 16, 2015

Updated.

The entire pull request has been truncated to only three lines of code, ignoring the usual overly verbose documentation. This addresses htgoebel's cutting observation that the prior approach broke under node deletion. (That's bad.) The new approach guarantees node ordering to increase monotonically, regardless of the current number of graph nodes. (That's good.)

Next up: fix wxPython. Weeeeeee. 🏇

@htgoebel
Copy link
Member

The codebase is quite convoluted.

Indeed it is :-(

I'm fine with you change, I'm going to merge it. Thanks for your thorough investigation and description.

Only one thing: Now we have order numbers, where do we use them? ;-) It looks like you loast dome of the code when updating the pull-request. git reflog may be your friend.

@leycec
Copy link
Contributor Author

leycec commented Aug 25, 2015

I've been hacking on this for quite a bit. It's a complete cock-up.

Specifically, nothing in the codebase appears to work as expected. The FakeModule.add_imports() method is fundamentally broken (and probably never worked), the suite of old_suite/libraries/test_wx_pubsub*.py functional tests are fundamentally broken (and probably never worked), the wx.lib.pubsub.core hook made a number of egregiously bad assumptions (and probably never worked), and... I could go on, but Jebus Crust.

I'm fairly confident that none of these things ever worked. They couldn't have, for reasons I am currently too contumely to rant about.

Long Story Short

The PyInstaller codebase is mildly frustrating at the moment. It's no one's fault; everyone's hacking hard; all's right with the world. I will get this working, but it's going to take blood, sweat, and crocodile tears.

Expect a big fat load of commits sometime... soonish.

@leycec
Copy link
Contributor Author

leycec commented Aug 25, 2015

Oh, fuck me. Has someone been hacking on our embedded PyInstaller.lib.modulegraph library? Because it doesn't work as advertised anymore.

Specifically, the modulegraph.import_hook() method used to raise ImportError exceptions on the passed module not being found. It doesn't anymore. The reason why is that someone commented out the two lines in the _load_tail() method that used to do that. (That someone may be Ron. No blame.) Instead, _load_tail() now adds a MissingModule to the graph – which isn't at all what it should be doing:

    def _load_tail(self, mod, tail):
        self.msgin(4, "load_tail", mod, tail)
        result = mod
        while tail:
            i = tail.find('.')
            if i < 0: i = len(tail)
            head, tail = tail[:i], tail[i+1:]
            mname = "%s.%s" % (result.identifier, head)
            result = self._import_module(head, mname, result)
            if result is None:
                result = self.createNode(MissingModule, mname)
                #self.msgout(4, "raise ImportError: No module named", mname)
                #raise ImportError("No module named " + mname)
        self.msgout(4, "load_tail ->", result)
        return result

That's not good. Everything depends on that method raising an exception. Everything. Like, for example, the ImportHook._process_hiddenimports() method, which attempts to catch an exception that's no longer being raised and hence can no longer even detect whether a hidden import was successfully imported or not.

This is exactly why modulegraph has two separate methods for importing modules:

  • import_hook(), which is supposed to raise ImportError exceptions rather than adding MissingModule nodes to the graph instead. That's critical, because it allows us to trivially test for whether a module was successfully imported or not.
  • _safe_import_hook(), which wraps import_hook() by catching the ImportError exceptions raised by that method and adds a MissingModule to the graph instead.

I don't quite know what's going on here. But it's bad, and I'm definitely going to spelunk to the bottom of this.

@leycec
Copy link
Contributor Author

leycec commented Aug 25, 2015

I don't get it. Insanely, Ron himself appears to have made that breaking change. After locally reverting back to the older working behaviour, I now receive the expected warning on missing hidden imports: e.g.,

9357 WARNING: Hidden import 'wx.lib.pubsub.pubsub1' not found (probably old hook)

Which is good. But Ron broke that, which is bad.

In my (possibly not-so-humble opinion), we should immediately stop pulling upstream changes from either altgraph or modulegraph. I have no idea when this upstream change was made – but it's not a good one. I'm concerned about what else might already be broken.

@htgoebel
Copy link
Member

For the records: Related change, done by Ron:https://bitbucket.org/ronaldoussoren/modulegraph/commits/cd6865045087e94bf38bb0dd2f0aae9421ed6632:

The primary use-case of this new information is to make it possible to have better reporting of
modules that seem to be missing

@leycec
Copy link
Contributor Author

leycec commented Aug 26, 2015

Thanks for dredging that up, man! Your grep skills are a lasting tribute to the spirit of humanity. 💡

I must admit I don't understand his rationale.

The contract of a public method comprises its method name, return type, argument types and order, and (crucially) types of raised exceptions. Java is probably the most explicit language in this regard, requiring all exceptions a method might throw be explicitly listed in the throws clause for that method. While most dynamic languages including Python are more implicit, the types of exceptions raised by public methods cannot be changed without breaking that contract and hence backward compatibility.

In this case, a core public method was changed from raising exceptions on fatal error conditions to raising no exceptions at all – a complete rupture in the space-time continuum.

I'd probably be more agreeable to this change (...despite it breaking PyInstaller and wasting several hours of my fragile time) if it actually made sense. But it doesn't. The ModuleGraph._scan_bytecode() method (which parses all import statements and is hence the core of ModuleGraph) already calls the _safe_import_hook() method (which converts ImportError exceptions raised by import_hook() into MissingModule nodes). There's no value added here. Missing modules were already added to the graph.

Moreover, this change is irrationally non-orthogonal (...which is almost worse). To see why, note that import_hook() internally calls two private methods to do its dirty work: _find_head_package() and _load_tail(). Given an import statement resembling from wx.pub import pubsub, import_hook() would first call _find_head_package() to import the parent package wx.pub and then call _load_tail() to import the child module pubsub. So far so reasonable.

Then things get weird. _find_head_package() still raises ImportError exceptions on parent packages not being found. That hasn't changed. But _load_tail() now silently adds MissingModule nodes on child modules not being found rather than raising ImportError exceptions. Does that make sense? No. If one of these private methods raises exceptions, so should the other; likewise, if one of these private methods adds MissingModule nodes, so should the other. (Of course, neither of these private methods should add MissingModule nodes – because _safe_import_hook() already does so.)

So it's non-orthogonal. Does that matter? Yes, but only if you value sanity and a non-broken codebase. Here's why.

On import errors, import_hook() now sometimes raises ImportError exceptions and sometimes adds MissingModule nodes – and which behaviour you get is basically non-deterministic. To reliably detect import errors, all import_hook() callers (including both PyInstaller and modulegraph itself) must now both catch ImportError exceptions and test for MissingModule nodes. Of course, none of them do. They all assume that import_hook() only raises ImportError exceptions on import errors. Wut.

I note at least six calls in PyInstaller and modulegraph to import_hook(), including the über-critical modulegraph._safe_import_hook() method – all of which are now non-deterministic and hence broken. This reliably breaks the SWIG support we recently added, for example. If that sounds like utter balls to you, that's because it is. (Hint: that should sound like utter balls to you.)

If Ron wants to break backward compatibility as well as his own codebase, he needs a better reason than hand-waving.

<\rant>

@leycec
Copy link
Contributor Author

leycec commented Sep 6, 2015

There's good news, and there's bad news.

The good news is that I've fixed both #1367 (wx.lib.pubsub) and #1420 (venv-specific distutils). 😆

The bad news is that it cost me precious sanity... and I was already running on empty. 😱

Additionally:

  • Two new types of hooks were required:
  • Extensive modulegraph changes were required. modulegraph makes numerous simplistic assumptions broken by these hooks, as well as failing to provide sufficiently granular methods for hooking. For example, post-create package hooks require that there exist a ModuleGraph._create_package() method. (Makes sense.) Well, there does now.
  • Even more extensive modulegraph changes will be required. As @htgoebels helpfully noted elsewhere (...somewhere), modulegraph needs to be refactored to use iteration rather than recursion. modulegraph currently processes imports recursively. Since Guido explicitly designed Python to prohibit deep recursion (e.g., by allocating stack frames on the finite call stack and providing no tail recursion), that's unutterably horrible. PyInstaller currently can't freeze applications with sufficiently deep import chains. While this could be clumsily hacked around by manually changing sys.setrecursionlimit, that's well-known to be dangerous.
  • Upstream is unlikely to embrace these changes. Understatement, thy name is this list item.
  • Upstream changes are no longer safely mergeable. It's not just that upstream is unpleasantly willing to break backward compatibility (...they are); it's that our downstream modulegraph no longer resembles the upstream copy. In these changes:
    • I document numerous unused modulegraph attributes, functions, methods, and submodules that are unlikely to ever be used and should be permanently removed. (Don't worry. I haven't removed anything. Yet.)
    • I refactor numerous modulegraph functions and methods to use newly defined attributes, functions, and methods not present in the upstream modulegraph.
    • For safety, I break API compatibility with the upstream modulegraph in minor ways. I only do so where absolutely necessary. Nonetheless, this is necessary. A simple example:
      • The upstream modulegraph defaults the level parameter passed to most useful methods (e.g., import_hook()) to a value conditionally dependent on the active Python interpreter. This breaks PyInstaller's hiddenimports under Python 2.7 and, more generally, is a terribly idea. These changes ensure that sort of breakage never happens again by requiring that a level parameter always be explicitly passed. (No more default value.) Since upstream doesn't, our downstream API is now incompatible with the upstream API.

PyInstaller and its modulegraph are now incestuously tied at the hip. They always were, but these changes really cement that. I'm convinced there exists no other means of fixing #1367 and #1420 and that these changes will be positively awesome for the PyInstaller community at large. (That means us.)

So, here's the plan. Since #1367 also requires the module ordering change implemented by this pull request, I'm going to begin collecting all changes required by both #1367 and #1420 here – starting tomorrow. These changes span modulegraph, hooks, tests, and PyInstaller itself. Basically, everything.

The anticipation in the air is palpable.

Dust Bowl

@matysek
Copy link
Member

matysek commented Sep 6, 2015

The bad news is that it cost me precious sanity... and I was already running on empty.

Will you ever get sanity back and how long will it take? Have you run into debts when running on empty?

@tallforasmurf
Copy link
Contributor

Nattering from the peanut gallery...

  1. This means a complete fork of Modulegraph now becomes part of PyInstaller and no longer communicates either upstream or down?
  2. Altgraph too, or can "our" Modulegraph use the "public" Altgraph?
  3. How many kinds of hooks are there now, I count four?
    • post-create package ones
    • in safe import ones
    • regular old-fashioned ones
    • bootloader run-time ones
  4. What are the convention(s) for hook names and hook folders that will let PyInstaller know which hook to invoke and when -- and not incidentally help the user keep the types of hooks separate and understandable?
  5. Is there means for PyInstaller to catch the error of invoking some type of hook at the wrong phase (because of being named wrong or put in the wrong folder), and clearly letting the user know what's wrong?

@htgoebel
Copy link
Member

htgoebel commented Sep 6, 2015

For the sake of release 3.0 I'm willed to accept (nearly) any code solving the blockers #1367 (wx.lib.pubsub), #1420 (venv-specific distutils) and #1322 (extend __path__).

@leycec I'm eager to see the code :-)

@tallforasmurf

  1. While I still did not find time to work through this complex modulgraph issues, I have the feeling that we need to redesign parts of modulgraph to fit our needs. Let alone the long discussion, we had here and in the named issues, I doubt, staying with modulegraph will help us in the long run.
  2. We did not change altgraph until yet.
  3. I hope, after clean-up we'll end up with only three types of hooks: post-create package ones (aka pre-import), post-import (aka in safe, old-fashioned) and run-time ones.
  4. Hmm, maybe we should push out a cleaned-up release 3.0.1 soon after 3.0 :-)
  5. New features only go into release 3.1 :-)

@tallforasmurf
Copy link
Contributor

There will be doc changes needed. I can take those on. Please be thinking of notes that will help me explain the uses and conventions for the two kinds of hooks. Maybe there should be a separate issue for doc changes for PyI 3.0? Then notes can be dropped there, and I can ask questions there?

@htgoebel
Copy link
Member

htgoebel commented Sep 6, 2015

@leycec Rethinking your statement, which sounds like you are making the Earth turn around the other direction: Is there any chance, to implement only a hack for now? E.g. by overwriting _scan_bytecode and Visitor._process_import, which are the places, the imports "executed".

This method replaces the undocumented spaghetti code of the
modulegraph.find_module() function with well-documented and maintainable code.
Refactoring this function into a method permits this method to be subclassed by
the "PyiModuleGraph" class and hence hooked for retargeting modules.
Spaghetti code in this method has been marginally improved. Open file handles are
now guaranteeably closed on exceptions. Terse and ambiguous variable names
(e.g., "m", "fp", "name") now have human-readable names.

For orthogonality with the ModuleGraph._safe_import_hook() method, the
ModuleGraph._import_module() method has been renamed to _safe_import_module().
All methods squelching "ImportError" exceptions should be prefixed by "_safe".
For disambiguity, "graph hooks" are now named "pre-safe import module hooks" and
live at "PyInstaller.hooks.pre_safe_import_module". The current "hook-six.moves"
hook has been refactored to support the new "PreSafeImportModuleAPI" class.
PyiModuleGraph now overrides the superclass _find_module_path() method with
support for pre-find module path hooks. In keeping with superclass changes, the
_import_module() method has been renamed to _safe_import_module() and now
supports pre-safe import module hooks with a proper communication API.

PyiModuleGraph now accepts an optional list of third-party hook directories on
initialization, permitting users to hook these new hook types.
The "building.imphook.HooksCache" class has been refactored to map from each
module name to the list of the absolute paths of all hooks specific to that
module name. Previously, this class only mapped from each module name to a
single absolute path, ensuring that user hooks override official hooks.

This class now subclasses "dict" rather than "UserDict", which has been (mostly)
deprecated since Python 2.2. For convenience, tilde and variable expansion is
also now applied to all passed hook dirnames.
The PyInstaller.building.build_main.Analysis.assemble() method now runs all
available post-graph hooks for each hooked module (rather than merely the most
recently cached such hook for that module).
A convenience @xfail decorator for pytest tests has been added.
@leycec
Copy link
Contributor Author

leycec commented Sep 26, 2015

Could you add xfail to test_django? This way CI tests will pass and this could be cleanly merged and later test_django will be debugged.

Excellent suggestion. The Django test and newly added wxPython PyPubSub tests have now all been marked as failing.

The Django test is currently known to fail with "GraphError" exceptions and has
now been marked as such. For organization, this test has been moved to a
separate script in the new "tests/functional/test_hooks" subdirectory.
The wxPython PyPubSub tests are currently known to fail (due to underlying hooks
failing) and have now been marked as such.
@matysek
Copy link
Member

matysek commented Sep 26, 2015

@htgoebel If you are fine with this don't hesitate to merge it.

@codewarrior0
Copy link
Contributor

@leycec Great work! With these changes I can finally build MCEdit using PyInstaller 3.0.

@htgoebel htgoebel removed the state:verify This needs to be verified - would be great if someone could write a test-case for label Sep 27, 2015
htgoebel added a commit that referenced this pull request Sep 27, 2015
New hook mechanisms for pre-safe_import_module, pre-find_module_path and post-graph hooks.
@htgoebel htgoebel merged commit 0f0c14f into pyinstaller:python3 Sep 27, 2015
@htgoebel
Copy link
Member

@leycec Thanks a lot for the great work!

One thing came to my mind: Shouldn't we make the existing (old) "ImportHook" handler submit a API, too? And should we merge the old hook mechanisms into the new ones?

@leycec
Copy link
Contributor Author

leycec commented Sep 28, 2015

Great work! With these changes I can finally build MCEdit using PyInstaller 3.0.

Glad to be of service! I only regret it taking so long.

Also, MCEdit is a wonderful service to man. That is all.

@leycec
Copy link
Contributor Author

leycec commented Sep 28, 2015

@htgoebel Excellent questions! You always keep me on my gangrenous toes. 👣

One thing came to my mind: Shouldn't we make the existing (old) ImportHook handler submit a API, too?

Absolutely.

The new building.imphookapi.PostGraphAPI class has been designed to be a drop-in replacement for the old building.imphook.FakeModule class. Despite bundling PostGraphAPI with this commit, I decided at the last minute not to make the switch-over in this pull request. I was concerned about merge conflicts and test failures. There don't appear to be any, so I probably should have just done it. Annyway. (When laziness and risk aversion unite!)

Making the switch-over is trivial:

  1. Remove building.imphook.FakeModule.
  2. Refactor the building.imphook.ImportHook._process_hook_function() method to use building.imphookapi.PostGraphAPI instead. In theory, this just means renaming FakeModule to PostGraphAPI in that method.

I've made the switch-over locally. So far, so good. I'll probably submit a new pull request finalizing this change over the coming week.

And should we merge the old hook mechanisms into the new ones?

Absolutely.

tallforasmurf and codewarrior0 were completely right on this point; I was completely wrong. Thankfully, I've seen the light. (It burns!)

What's the Plan, Man?

Here's how I see this going down:

  1. The building.imphook.HooksCache should be completely refactored from the ground up to support lazy hook loading. Currently, HooksCache only caches the filenames of all hooks on initialization. That's it. Instead, HooksCache should:
    1. Probably still cache the filenames of all hooks on initialization.
    2. Lazily cache the in-memory module object returned by the importlib_load_source() utility function for each hook in a "just-in-time" manner. The contents of unused hooks will not be cached. The contents of each used hook will be cached on the first attempt to access those contents, which will then be efficiently accessible by different hook points (e.g., PyiModuleGraph._find_module_path(), PyiModuleGraph._safe_import_module()) throughout the PyInstaller lifecycle.
  2. The contents of each hook in the new hooks/pre_find_module_path and hooks/pre_safe_import_module subdirectories should be moved back into the corresponding hook in the top-level hooks directory. For example:
    1. The pre_find_module_path() function defined by the new hooks/pre_find_module_path/hook-distutils.py hook should be moved back into the old hooks/hook-distutils.py hook.
  3. The new hooks/pre_find_module_path and hooks/pre_safe_import_module subdirectories should go away.

What Could Possibly Go Wrong?

There are a few things to consider when performing this refactoring, however.

For each hooked module, HooksCache needs to cache the contents of n possible hook files – not just 1. These are:

  • The official PyInstaller hook file for that module (if any).
  • The n-1 unofficial user-defined hook files for that module (if any).

To ensure each hook function is called at most once, HooksCache will also need some means of tracking which hook functions for each cached hook file have already been called and which haven't. I'm fairly sure we can do this by just deleting each in-memory hook function from its in-memory hook object after that function has been called. (There may be hidden dragons here, however.)

It's not exactly trivial. It's not exactly hard, either. I estimate a week's worth of design, implementation, testing, and documentation.

What About Them Hook Functions?

Yeah. We'll still need separate hook functions, of course. These are:

  • hook() for post-graph hooks.
  • pre_find_module_path() for pre-find module path hooks.
  • pre_safe_import_module() for pre-safe import module hooks.

After implementing lazy hook loading, we won't need to separate those functions into separate hook files. They'll all be unified into top-level hook/ files. If nothing else, that should simplify both the documentation and maintenance of hooks.

A big win for everybody! (Mostly, tallforasmurf. Best of luck on that documentation. I don't envy you.)

@htgoebel
Copy link
Member

I moved the discussion o a follow-up-ticket.

@leycec
Copy link
Contributor Author

leycec commented Sep 29, 2015

I moved the discussion o a follow-up-ticket.

Excellent. Thanks for that.

@htgoebel
Copy link
Member

For the records: test_django fails with GraphError: Invalid nodes matplotlib.pylab -> matplotlib.cbook.dedent

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 17, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants