New compiler. #4513

Merged
merged 13 commits on Sep 13, 2019

stuartarchibald
Contributor

This PR is large, both in terms of churn and in terms of conceptual change.

The intent is as follows:

  1. Redesign the Numba pipeline to be a bit more like LLVM's. This is to
    comprise:

    1. Passes must extend from a base class and implement defined methods.
    2. Passes must be registered with a pass registry.
    3. Instances of a PassManager class are used to orchestrate a "pipeline" of
      passes to execute.
    4. A CompilerBase class instance holds the state; the passes orchestrated by
      the PassManager operate on this state.
    5. A DefaultPassBuilder class provides static methods defining default or
      commonly used pipelines.
  2. In addition, extension points for future work on safety and optimisation are added.

    1. AnalysisUsage, cf. LLVM's, is stubbed out to permit the eventual
      declaration of analysis dependencies between passes.
    2. Passes must return True/False depending on whether statement level changes
      have been made.
    3. At registration time, passes must declare whether they mutate the CFG and
      whether they are analysis-only passes.
    4. A simple timer is present to capture the execution time spent in each pass
      for a given pipeline execution.
    5. Basic support for print_after-like functionality, to print the IR
      following any transform, is added.

The approach taken in this PR:

  1. Each "stage" in the old compiler pipeline is now approximately a single new
    compiler pass.
  2. The new compiler passes are split largely into two categories, those which
    are untyped (i.e. do not need type information and run before type inference)
    and those which are typed (i.e. need type information and run after type
    inference). These can be found in numba.untyped_passes and
    numba.typed_passes respectively. Passes relating to object mode are in
    numba.object_mode_passes.
  3. The fundamental types and machinery to build the new compilation chain can be
    found in numba.compiler_machinery. This includes the PassManager, the
    base class for passes CompilerPass, and the pass_registry decorator (used
    for registering new passes).
  4. The original compiler.py has been heavily refactored to make use of the
    more modular design described above. Most notably the CompilerBase class is
    now responsible for determining how to execute pipelines defined by
    PassManager instances with each PassManager holding a single pipeline.
    The compilation state information is also attached to the CompilerBase
    class and is not part of a PassManager. This means users building their
    own compilers need not subscribe to the Numba JIT compilation semantics:
    the compiler state can be user-defined, as can the pipeline managed by the
    PassManager itself (along with all the passes and their execution order).
  5. A large amount of "fixing" was required throughout the code base to
    accommodate the above.
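The split in point 4, where the compiler owns the state, each PassManager holds exactly one pipeline, and the compiler decides how to execute the pipelines, can be sketched as follows. All names here (`State`, `CompilerBase.compile`, `define_pipelines`, the toy passes) are hypothetical stand-ins, and passes are plain callables for brevity. Trying pipelines in order and falling back to the next on failure is an assumption chosen to mirror the idea of a nopython pipeline with an object mode fallback, not a description of this PR's exact control flow.

```python
class State(object):
    """User-defined compilation state, owned by the compiler."""

    def __init__(self, source):
        self.source = source
        self.log = []
        self.pipeline_used = None


class PassManager(object):
    """Holds exactly one named pipeline; here passes are plain callables."""

    def __init__(self, pipeline_name):
        self.pipeline_name = pipeline_name
        self.passes = []

    def add_pass(self, fn):
        self.passes.append(fn)

    def run(self, state):
        for fn in self.passes:
            fn(state)


class CompilerBase(object):
    """Owns the state and decides how the pipelines are executed."""

    def __init__(self, source):
        self.state = State(source)

    def define_pipelines(self):
        raise NotImplementedError("subclasses return a list of PassManagers")

    def compile(self):
        failures = []
        # Try each pipeline in order; the first one to succeed wins.
        # Note: in this sketch, edits made by a failed pipeline stay on the state.
        for pm in self.define_pipelines():
            try:
                pm.run(self.state)
            except Exception as exc:
                failures.append((pm.pipeline_name, exc))
                continue
            self.state.pipeline_used = pm.pipeline_name
            return self.state
        raise RuntimeError("all pipelines failed: %r" % failures)


def untyped_pass(state):
    state.log.append("untyped")


def typed_pass(state):
    # Hypothetical rule standing in for a type inference failure.
    if "object" in state.source:
        raise TypeError("cannot infer types for %r" % state.source)
    state.log.append("typed")


def object_mode_pass(state):
    state.log.append("object mode")


class ToyCompiler(CompilerBase):
    def define_pipelines(self):
        nopython = PassManager("nopython")
        nopython.add_pass(untyped_pass)
        nopython.add_pass(typed_pass)
        objmode = PassManager("object")
        objmode.add_pass(untyped_pass)
        objmode.add_pass(object_mode_pass)
        return [nopython, objmode]


print(ToyCompiler("x + 1").compile().pipeline_used)     # nopython
print(ToyCompiler("object()").compile().pipeline_used)  # object
```

Because the state lives on the compiler, a user-defined compiler can swap in an entirely different `State` class and set of pipelines without touching the PassManager.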

Outstanding items/areas of concern:

  1. Does this meet the needs of Numba? Is it too complicated?
  2. Is the CompilerPass design and its registration mechanism reasonable?
    • Could be based on class instances instead of classes themselves to allow
      more configuration from the same code. But equally inheritance/composition
      could achieve the same.
    • Could/should pipelines be derived and manipulated more easily?
  3. Is this sufficiently recognisable in terms of compiler design to permit
    getting up to speed quickly (along with new docs/examples)?
  4. With respect to object mode deprecation, does this design permit/allow a
    "mode" based pipeline as default? It seems like it could with separate
    compiler instances extending from CompilerBase each defining a single
    pipeline via the define_pipelines() method with new named pipelines for
    each purpose added to the DefaultPassBuilder.

Work left on this PR:

  1. Fix any failing tests.
  2. Write new unit tests as needed.
  3. Documentation.
  4. Further code clean-up/refactoring; there's some duplication between the modules holding passes.

As title.
numba/compiler_machinery.py (outdated, resolved)
numba/compiler.py (outdated, resolved)
numba/compiler.py (outdated, resolved)
@ehsantn
Collaborator

ehsantn commented Sep 6, 2019

I just tested this PR and everything seems good. It actually exposed a problem in my pipeline since there was a name conflict in stage function names. Also, the "print after" feature is great!

```python
from abc import abstractmethod


class CompilerPass(object):

    @abstractmethod
    def __init__(self, *args, **kwargs):
        ...  # body not shown in this review excerpt
```
Member


Why does __init__ have to be an abstractmethod? A lot of the time it's just a pass-through to the parent's implementation.

Contributor Author


I don't think it needs to be. I tried a bunch of different ways of describing this class and the pipeline, and I think this is an artifact from when pass registration required an instance of a pass to be registered. I was thinking about how pass classes might be extended, weighed the merits of inheritance versus composition, and settled on the latter, as most of the time passes are pretty unique and should be state-free by design.
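To illustrate the point about `__init__`: if only the method that does the work is abstract, subclasses can simply inherit the base initialiser while incomplete passes are still rejected at instantiation. This is a minimal sketch using the standard `abc` module; the class names and the `_analysis` attribute are hypothetical, not the PR's code.

```python
from abc import ABC, abstractmethod


class CompilerPass(ABC):
    def __init__(self):
        # Concrete initialiser: shared bookkeeping is defined once, here.
        self._analysis = None

    @abstractmethod
    def run_pass(self, state):
        """Return True if the pass changed the IR, else False."""


class NopPass(CompilerPass):
    # No __init__ needed: the base class initialiser is inherited.
    def run_pass(self, state):
        return False


class IncompletePass(CompilerPass):
    pass  # forgot to implement run_pass


print(NopPass().run_pass(None))  # False
try:
    IncompletePass()
except TypeError:
    print("IncompletePass cannot be instantiated")
```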

@sklam
Member

sklam commented Sep 9, 2019

I just tried adapting the literal dispatch PR to use the new pass API. It's so easy.

@sklam
Member

sklam commented Sep 11, 2019

The latest commit is causing 28 (out of 47) errors in numba.tests.test_withlifting

@stuartarchibald
Contributor Author

Thanks @sklam, it turns out that removing .pipeline from the state (to drop the self-reference) breaks that, so instead I think it's best to just set it to None as the function exits.
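The fix described here can be sketched with a `try`/`finally`: the compiler exposes itself on the state while compiling, so passes that re-enter the pipeline (as with-lifting does) can reach it, and the reference is cleared however the call exits, so the compiler/state reference cycle does not outlive the call. The `Compiler`, `State` and `compile(work)` names are hypothetical stand-ins.

```python
class State(object):
    pipeline = None


class Compiler(object):
    def __init__(self):
        self.state = State()

    def compile(self, work):
        # Passes may need the pipeline during compilation (re-entrancy),
        # so expose it on the state for the duration of the call...
        self.state.pipeline = self
        try:
            return work(self.state)
        finally:
            # ...then break the compiler <-> state cycle, even on error.
            self.state.pipeline = None


c = Compiler()
seen = c.compile(lambda st: st.pipeline is c)
print(seen, c.state.pipeline)  # True None
```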

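```python
# (review excerpt began at `break`; the enclosing loop is reconstructed)
for idx, (x, _) in enumerate(self.passes):
    if x == location:
        break
else:
    raise ValueError("Could not find pass %s" % location)
self.passes.insert(idx + 1, (pass_cls, str(pass_cls)))
```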
Member


Shouldn't the str(pass_cls) come from a description parameter?

Contributor Author


It could be; I was maintaining the original interface style. Is a description useful in this new style, given that passes are uniquely defined and have a name? I'm happy to do either.

Member


The same pass could be added several times, e.g. a pass can require a DCE beforehand. A description can help explain why the pass was added. Not high priority though.
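One way to accommodate this, sketched under assumptions: `add_pass_after` takes an optional, hypothetical `description` argument that defaults to `str(pass_cls)`, so existing callers are unaffected while repeated insertions of the same pass can each record why they were added. The `for`/`else` search follows the snippet under review; the surrounding class and the pass names are stand-ins.

```python
class PassManager(object):
    def __init__(self):
        self.passes = []  # list of (pass_cls, description) tuples

    def add_pass(self, pass_cls, description=None):
        if description is None:
            description = str(pass_cls)
        self.passes.append((pass_cls, description))

    def add_pass_after(self, pass_cls, location, description=None):
        for idx, (x, _) in enumerate(self.passes):
            if x == location:
                break
        else:
            raise ValueError("Could not find pass %s" % location)
        if description is None:
            description = str(pass_cls)  # old behaviour when not given
        self.passes.insert(idx + 1, (pass_cls, description))


# Hypothetical pass classes, used only as pipeline entries here.
class InlineClosures: pass
class Rewrites: pass
class DeadCodeElimination: pass


pm = PassManager()
pm.add_pass(InlineClosures)
pm.add_pass(Rewrites)
pm.add_pass_after(DeadCodeElimination, InlineClosures, "clean up after inlining")
pm.add_pass_after(DeadCodeElimination, Rewrites, "clean up after rewrites")
for pass_cls, description in pm.passes:
    print(pass_cls.__name__, "|", description)
```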

numba/compiler_machinery.py (outdated, resolved)
numba/compiler_machinery.py (outdated, resolved)

```python
def finalize(self):
    """
    Finalize the PassManager, after which no more passes may be added and
```
Member


> no more passes may be added

^ Not enforced?

Contributor Author


I was in two minds... On the one hand, it's quite useful to be able to say "finalize this, compute analysis usage etc." and have that be fixed. On the other, it's quite useful to be able to mutate a "finalized" pipeline, which unsets its finalized state, and subsequently re-finalize it when done. I think one way of achieving the latter, with finalization equating to immutability, would be to have a copy constructor or similar to generate a new pipeline from an existing one? Suggestions welcomed.

Member


I like the idea of having a copy constructor.
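A minimal sketch of the copy-constructor idea, with hypothetical names: `finalize()` actually enforces "no more passes may be added", and `copy()` returns a new, unfinalized PassManager whose pipeline can be edited and re-finalized without touching the original. Strings stand in for pass classes to keep it short.

```python
class PassManager(object):
    def __init__(self, name, passes=()):
        self.name = name
        self._passes = list(passes)  # own copy, never shared
        self.finalized = False

    def add_pass(self, pass_cls):
        if self.finalized:
            raise RuntimeError("PassManager %r is finalized" % self.name)
        self._passes.append(pass_cls)

    def finalize(self):
        # A fuller version would compute analysis usage etc. here.
        self.finalized = True

    def copy(self, name=None):
        """Copy constructor: same passes, new unfinalized instance."""
        return PassManager(name or self.name, self._passes)

    @property
    def passes(self):
        return list(self._passes)


base = PassManager("nopython")
base.add_pass("untyped")
base.add_pass("typed")
base.finalize()

custom = base.copy("nopython+dce")
custom.add_pass("dce")
custom.finalize()

print(base.passes)    # ['untyped', 'typed']
print(custom.passes)  # ['untyped', 'typed', 'dce']
```

The design choice here is that finalization is one-way per instance; "mutating a finalized pipeline" becomes copying it, which keeps any analysis computed for the original valid.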

numba/compiler_machinery.py (outdated, resolved)
@sklam
Member

sklam commented Sep 12, 2019

Looks to me that the self-reference is only needed for pipeline re-entrant.

@stuartarchibald
Contributor Author

> Looks to me that the self-reference is only needed for pipeline re-entrant.

yeah, I think that's the case.

numba/compiler.py (outdated, resolved)
@sklam added the "5 - Ready to merge" label (Review and testing done, is ready to merge) on Sep 13, 2019

@sklam (Member) left a comment:

Thanks for the patch. This is an important refactor to cleanup the compiler pipeline. Other comments in the review can be resolved later. We'll learn more once we use the new code in practice.

Labels: "3 - Ready for Review", "5 - Ready to merge" (Review and testing done, is ready to merge)

4 participants