Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move Builtin Types and Methods to stdlib #3363

Merged
merged 80 commits into from
May 5, 2022

Conversation

hubertp
Copy link
Contributor

@hubertp hubertp commented Mar 25, 2022

Pull Request Description

This PR replaces hard-coded @Builtin_Method and @Builtin_Type nodes in Builtins with an automated solution
that a) collects metadata from such annotations b) generates BuiltinTypes c) registers builtin methods with corresponding
constructors.
The main differences are:

  1. The owner of the builtin method does not necessarily have to be a builtin type
  2. You can now mix regular methods and builtin ones in stdlib
  3. No need to keep track of builtin methods and types in various places and register them by hand (a source of many typos or omissions as it found during the process of this PR)

Related to #181497846
Benchmarks also execute within the margin of error.

Important Notes

The PR got a bit large over time as I was moving various builtin types and finding various corner cases.
Most of the changes however are rather simple c&p from Builtins.enso to the corresponding stdlib module.
Here is the list of the most crucial updates:

  • engine/runtime/src/main/java/org/enso/interpreter/runtime/builtin/Builtins.java - the core of the changes. We no longer register individual builtin constructors and their methods by hand. Instead, the information about those is read from 2 metadata files generated by annotation processors. When the builtin method is encountered in stdlib, we do not ignore the method. Instead we lookup it up in the list of registered functions (see getBuiltinFunction and IrToTruffle)
  • engine/runtime/src/main/java/org/enso/interpreter/runtime/callable/atom/AtomConstructor.java has now information whether it corresponds to the builtin type or not.
  • engine/runtime/src/main/scala/org/enso/compiler/codegen/RuntimeStubsGenerator.scala - when runtime stubs generator encounters a builtin type, based on the @Builtin_Type annotation, it looks up an existing constructor for it and registers it in the provided scope, rather than creating a new one. The scope of the constructor is also changed to the one coming from stdlib, while ensuring that synthetic methods (for fields) also get assigned correctly
  • engine/runtime/src/main/scala/org/enso/compiler/codegen/IrToTruffle.scala - when a builtin method is encountered in stdlib we don't generate a new function node for it, instead we look it up in the list of registered builtin methods. Note that Integer and Number present a bit of a challenge because they list a whole bunch of methods that don't have a corresponding method (instead delegating to small/big integer implementations).
    During the translation new atom constructors get initialized but we don't want to do it for builtins which have gone through the process earlier, hence the exception
  • lib/scala/interpreter-dsl/src/main/java/org/enso/interpreter/dsl/MethodProcessor.java - @Builtin_Method processor not only generates the actual code fpr nodes but also collects and writes the info about them (name, class, params) to a metadata file that is read during builtins initialization
  • lib/scala/interpreter-dsl/src/main/java/org/enso/interpreter/dsl/MethodProcessor.java - @Builtin_Method processor no longer generates only (root) nodes but also collects and writes the info about them (name, class, params) to a metadata file that is read during builtins initialization
  • lib/scala/interpreter-dsl/src/main/java/org/enso/interpreter/dsl/TypeProcessor.java - Similar to MethodProcessor but handles @Builtin_Type annotations. It doesn't, yet, generate any builtin objects. It also collects the names, as present in stdlib, if any, so that we can generate the names automatically (see generated types/ConstantsGen.java)
  • engine/runtime/src/main/java/org/enso/interpreter/node/expression/builtin - various classes annotated with @BuiltinType to ensure that the atom constructor is always properly registered for the builitn. Note that in order to support types fields in those, annotation takes optional params parameter (comma separated).
  • engine/runtime/src/bench/scala/org/enso/interpreter/bench/fixtures/semantic/AtomFixtures.scala - drop manual creation of test list which seemed to be a relict of the old design

Checklist

Please include the following checklist in your PR:

  • The documentation has been updated if necessary.
  • All code conforms to the Scala, Java, and Rust style guides.
  • All code has been tested:
    • Unit tests have been written where possible.
    • If GUI codebase was changed: Enso GUI was tested when built using BOTH ./run dist and ./run watch.

This is WIP. This change does not attempt to handle @Builtin_Type. Yet.

During the initialization of Builtins we collect info about root nodes
representing generated AST truffle nodes. Such nodes will dynamically
replace implementations of @Builtin_Method stubs during IR to Truffle
translation.
There are still rough edges to work out but checking this in to see if
this is the direction we want to take.
Sadly because of the move of the functions to a separate module and
@Builtin_Type Polyglot still being in Builtins it's tricky to get auto
imports that don't require manual adjustments.
The main issue is with `PolyglotBranchNode` in `IrToTruffle` that
requires a reference to the polyglot atom constructor.
Decided to make this incremental step to do some benchmarking of
@Builtin_Method solution and then move on @Builtin_Type.
The latter will require a separate pass, hence the additional step.

Additionally removed unused warnings for methods annotated with
@Builtin_Method.
The change moves another type, Array, out of Builtins.enso.
Also introduced a couple of changes to make it possible:
- Complex Types cannot erase annotations of its atoms
- method lookup algorithm has to take into account situation where
Complex Type, it's Atom *and* module use the same name, which makes name
resolution slightly ambiguous. The new version ensures that we don't
have to use qualified names *every single time*

This change introduces some lookup maps for checking if the given atom
constructor is a builtin type. Using maps (and strings as keys) is not
always necessary or unambiguous, and will work on a more permanent
solution.
Copy link
Member

@JaroslavTulach JaroslavTulach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moving to declarative - e.g. not imperative - registration is always good. In addition if, ideally, we could:

  • make all the necessary information available at compile time
  • read the information step-by-step when needed

It could make the system more "native image" ready. It could speed up initialization even in HotSpot mode.

A very simple move of Boolean values to std library. Initially wanted to
also fix the type definition of Boolean so that it looks like
```
@Builtin_Type
type Boolean
  type True
  type False

  ... methods
```
rather than current
type Boolean
  @Builtin_Type
  type Builtin_Type

  ... methods

@Builtin_Type
type True
@Builtin_Type
type False
```
but turns out this is a bigger task on its own. For example ComplexTypes
pass would have to keep the information Boolean data type, rather than
erasing it.
Per PR review: using `reflections` library adds an unnecessary cost at
startup.
Instead, while generating code from the BuiltinMethod
annotations, we write meta files that can be read at compile time,
therefore avoiding the performance penalty.
Copy link
Member

@JaroslavTulach JaroslavTulach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nicer. In addition it would be good to eliminate also FileSystem creation and walking through it.

Demonstrates support for @BuiltinMethod when we are not dealing with
special types known to the compiler.
In order to be able to locally extend a builtin type, we have to be able
to then find it during symbol resolution.
Since the symbol is being added to a module in stdlib but resolution
would be using Any special builtin type, the latter has to be aware of
the former via shadowdefinitions reference.
Mostly c&p, and occassional fix for imports/renaming.
Rather than writing multiple tiny metadata files per each builtin we now
generate a single file with all information.
This simplifies the logic of reading that data in builtins a lot.
Reading resource as a stream rather than as a file allows us to remove
the need to initialize FileSystem.
Move builtin types Cons & Nil to stdlib List module.
Mostly c&p, also preparing for getting rid of temp
`builtinConstructors` list for initialization of builtin types.
Use @BuiltinType to mark specific types as BuiltinTypes, and gather
necessary metadata for initializing them during Builtins setup.
This is WIP, lacks documentation and such, but illustrates steps towards
system of builtin objects without hardcoding everything.
Copy link
Contributor

@kustosz kustosz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this a lot, and it's going to be a great boon for usability! Reported some problems inline that make this broken though (should be easy enough to fix though).

In separate compilation annotation processor will only process files
that were touched. However, since we now create metadata files with bulk
information and it is not possible to append/modify exisiting resource
files, it will simply overwrite the original resource with a fraction
of the original information. This would in turn lead to runtime issues.

The only solution was to clean and compile from scratch which is not
really acceptable in regular dev.
Alternatively, in the past we were creating metadata file per source
file. The latter however meant that during Builtins initialization we
had to do a bit of File traversing.

Instead we will read the previous resource file before overriding it,
update entries that changed and append the past contents. It doesn't
deal with all cases (renaming/delete) but neither would the
alternatives.
The initial work introduced a workaround for a "feature" that existed
before builtins work.
Consider a module Foo defined `src/Data/Foo.enso`:

```
type Foo

    type Foo a

    plus_one  = 1 + a

new x = Foo x
```
And now you define extension methods on Foo

```
from Standard.Base.Data.Foo import all

Foo.two = 2

Foo.Foo.three = 3

main =
    x = Foo.new 0
    x.two // not OK
    x.there // fine
```
This is how it works even in the current master.
The logic in ModuleScope allowed to workaround the (hidden) name resolution ambiguity
and just define `Foo.two`. This became apparent when adding methods to
types such as Nothing and Any and some tests were failing.
This change reverts the workaround but adds a pending test to
potentially address this in the future.
// Ensure that synthetic methods, such as getters for fields are in the scope
// Some scopes won't have any methods at this point, e.g., Nil or Nothing, hence the null
// check.
Map<String, Function> methods = this.definitionScope.getMethods().get(this);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly guard byCompilerAsserts.neverPartOfCompilation().

Copy link
Member

@JaroslavTulach JaroslavTulach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

228 files! I am tired just scrolling thru them. Grandiose work. Please integrate it.

hubertp and others added 5 commits April 29, 2022 18:33
BuiltinType annotation now provides an additional parameter that stores
a link to the corresponding definition in stdlib.

TypesFromProxy provides a single static method `fromTypeSystem` which converts
from type-system type names to atoms.
It is a proxy because it could easily be placed inside {@link org.enso.interpreter.runtime.builtin.Builtins}.
Except that by the time Builtins is compiled it requires ConstantsGen to be present, which is being generated via
the TypeProcessor at a late stage. Similarly {@link org.enso.interpreter.runtime.type.Types} requires
ConstantsGen to be in the same package which creates a catch-22 situation.
Co-authored-by: Radosław Waśko <radoslaw.wasko@enso.org>
Avoiding the `CompilationFinal` logic all over the place by abstracting
it to a separate class.
Minor cleanups as well.
@hubertp hubertp requested review from kustosz and radeusgd May 4, 2022 11:12
Loading stdlib for `org.enso.interpreter.test.instrument.*` tests
appears to be the main slowdown with builtins move. As in 25+ seconds vs
1 second, slowdown.

Turning IR caching does not appear to affect the results and we are more
or less back to what we used to have.
Copy link
Contributor

@kustosz kustosz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy with this. Regarding test slowdowns – think about the micro-stdlib approach. Maybe in the next PR?

@mwu-tow mwu-tow merged commit 4bbabc0 into develop May 5, 2022
@mwu-tow mwu-tow deleted the wip/hubert/shadow-definition-builtins-181497846 branch May 5, 2022 18:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants