Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unboxed atoms #3862

Merged
merged 38 commits into from
Jan 24, 2023
Merged

Unboxed atoms #3862

merged 38 commits into from
Jan 24, 2023

Conversation

kustosz
Copy link
Contributor

@kustosz kustosz commented Nov 9, 2022

Pull Request Description

Introduces unboxed (and arity-specialized) storage schemes for Atoms. It results in improvements both in memory consumption and runtime.
Memory wise: instead of using an array, we now use object fields. We also enable unboxing. This cuts a good few pointers in an unboxed object. E.g. a quadruple of integers is now 64 bytes (4x8 bytes for long fields + 16 bytes for layout and constructor pointers + 16 bytes for a class header). It used to be 168 bytes (4x24 bytes for boxed Longs + 16 bytes for array header + 32 bytes for array contents + 8 bytes for constructor ptr + 16 bytes for class header), so we're saving 104 bytes a piece. In the least impressive scenarios (all-boxed fields) we're saving 8 bytes per object (saving 16 bytes for array header, using 8 bytes for the new layout field). In the most-benchmarked case (list of longs), we save 32 bytes per cons-cell.
Time wise:
All list-summing benchmarks observe a ~2x speedup. List generation benchmarks get ~25x speedups, probably both due to less GC activity and better allocation characteristics (only allocating one object per Cons, rather than Cons + Object[] for fields). The "map-reverse" family gets a neat 10x speedup (part of the work is reading, which is 2x faster, the other is allocating, which is now 25x faster, we end up with 10x when combined).

Important Notes

Checklist

Please include the following checklist in your PR:

  • The documentation has been updated if necessary.
  • All code conforms to the
    Scala,
    Java,
    and
    Rust
    style guides.
  • All code has been tested:
    • Unit tests have been written where possible.
    • If GUI codebase was changed: Enso GUI was tested when built using BOTH
      ./run ide build and ./run ide watch.

import org.enso.interpreter.runtime.callable.atom.Atom;
import org.enso.interpreter.runtime.callable.atom.AtomConstructor;
import org.enso.interpreter.runtime.data.struct.Struct;
import org.enso.interpreter.runtime.data.struct.AtomConstructor;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More than sixty changed files! I was wondering why the change is so huge. Consider doing the rename of Atom to Struct as a separate PR.

Copy link
Member

@JaroslavTulach JaroslavTulach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Introduction of AtomL0 seems like the right approach that we can expand on.

@Specialization(guards = "arity == 2")
Object do2(AtomConstructor constructor, Object[] arguments) {
if (arguments[0] instanceof Long) {
return new AtomLO(constructor, (long) arguments[0], arguments[1]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the "demo" shortcut.

/** A runtime representation of an Atom in Enso. */
/**
* A runtime representation of an Atom in Enso.
*/
@ExportLibrary(InteropLibrary.class)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it of any use to export these libraries from here?

import com.oracle.truffle.api.library.LibraryFactory;

@GenerateLibrary
public abstract class StructsLibrary extends Library {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've heard Library isn't the most efficient (in terms of AST size) concept for use inside of a language, but I think we are not that far to switch to other concepts yet.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't Graal Team also mention that they want to eventually get rid of it as well?
I hope there will be an alternative.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I asked Christian Humer and he said: (Interop) Library is still useful and necessary for communication between different language implementations.

@ExportMessage
@ExplodeLoop
public boolean isMemberReadable(String member) {
for (int i = 0; i < constructor.getArity(); i++) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not going to @ExplodeLoop I am afraid. constructor isn't compilation constant. You'd need some profile, I think.

# Conflicts:
#	engine/runtime/src/main/java/org/enso/interpreter/node/ExpressionNode.java
#	engine/runtime/src/main/java/org/enso/interpreter/node/expression/atom/InstantiateNode.java
#	engine/runtime/src/main/java/org/enso/interpreter/node/expression/builtin/error/InvalidArrayIndexError.java
#	engine/runtime/src/main/java/org/enso/interpreter/node/expression/builtin/error/PolyglotError.java
#	engine/runtime/src/main/java/org/enso/interpreter/node/expression/builtin/number/decimal/CompareToNode.java
#	engine/runtime/src/main/java/org/enso/interpreter/node/expression/builtin/text/util/TypeToDisplayTextNode.java
#	engine/runtime/src/main/java/org/enso/interpreter/runtime/callable/atom/Atom.java
#	engine/runtime/src/main/java/org/enso/interpreter/runtime/callable/atom/AtomConstructor.java
#	engine/runtime/src/main/java/org/enso/interpreter/runtime/type/TypesFromProxy.java
# Conflicts:
#	engine/runtime/src/main/java/org/enso/interpreter/node/expression/builtin/meta/EqualsAnyNode.java
#	engine/runtime/src/main/java/org/enso/interpreter/runtime/callable/atom/Atom.java
@kustosz kustosz marked this pull request as ready for review January 21, 2023 18:34
Copy link
Contributor

@hubertp hubertp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you update the description of the PR, please?

Copy link
Member

@JaroslavTulach JaroslavTulach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work. I am leaving few comments but all seems green and that's (almost) all that matters!

@@ -121,10 +121,10 @@ class AtomFixtures extends DefaultInterpreterRunner {
| List.Cons h t -> @Tail_Call t.mapReverse f (List.Cons (f h) acc)
| _ -> acc
|
|main = self -> list ->
|main = list ->
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the JMH part of the benchmark check the returned value to make sure we execute some computation in the benchmark?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It probably should. I would suggest when we run the benchmarks in CI, we assert some non-trivial runtime (e.g. more than 1ms). Not a part of this PR tho.

if (object instanceof Atom atom) {
Object[] fields = atom.getFields();
Object[] fields = structs.getFields(atom);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of atom.getFields() we'll have a StructsLibrary that will allow us to speculate on the shape of the atom. Would it make sense to have getter of long for an index another getter of double for an index and then generic getter of Object for an index?

Once the code is PE compiled, it makes no difference, but it would show some difference before compilation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is possibly a good future improvement. I don't want to add this now, because we have no benchmarks that prove this is a problem at all, and it is significant additional complexity.

@@ -37,7 +41,7 @@ public GetFieldNode(TruffleLanguage<?> language, int index, Type type, String na
public Object execute(VirtualFrame frame) {
// this is safe, as only Atoms will ever get here through method dispatch.
Atom atom = (Atom) Function.ArgumentsHelper.getPositionalArguments(frame.getArguments())[0];
return atom.getFields()[index];
return structs.getField(atom, index);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we had getLongField, getDoubleField and getField we could speculate here possibly avoiding boxing before the code gets PEed.


@NodeInfo(shortName = "get_field", description = "A base for auto-generated Atom getters.")
public class GetFieldNode extends RootNode {
private final int index;
private final String name;
private final Type type;

private @Child StructsLibrary structs = StructsLibrary.getFactory().createDispatched(10);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Speculating for 10 different shapes of atoms. Isn't that too much? Do we observe any change in performance when we go to three or even two?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't, but we don't have benchmarks that saturate this at all. Intuitively: I chose 10 because it is enough to handle all 2-field cases and it is much cheaper to go through the cache checks here 10 times than fire the uncached version even once.

*/
public Atom(AtomConstructor constructor, Object... fields) {
public Atom(AtomConstructor constructor) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make the constructor protected.

if (layouts.length != this.unboxedLayouts.length) {
// Layouts changed since we last tried; Update & try again
updateFromConstructor();
return execute(arguments);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

execute shall probably not be executed while holding the lock. Release the lock first, then call execute.


// Layouts didn't change; just create a new one and register it
var newLayout = Layout.create(arity, flags);
constructor.addLayout(newLayout);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not very happy when I see an internal constructor logic being performed in _layout- I'd rather have it all at a single place. Exposinglock` and letting anyone invoke some operations that would rather be atomic is fragile.

As a minimum use assert lock.isHeldByCurrentThread() (as suggested elsewhere in this review) at methods that shall only be accessed while holding the lock.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hah good point, I wrote it quickly and forgot about it... The whole thing needs a rewrite tbh, the responsibilities are horrid.

* {@link Double} fields before the {@link Long} fields, but this is not required or enforced
* by this class.
* <li>These design choices mean that to enable optimal storage of N-field atoms, we need N+1
* different subclasses.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK.

* The concrete subclass will also generate a method allowing to obtain a factory for these nodes
* based on dynamically-passed index, i.e. with the following signature: {@code public static
* NodeFactory<? extends FieldGetterNode> getFieldGetterNodeFactory(int storageIndex, boolean
* isDoubleIfUnboxed)}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK.

@@ -2,6 +2,7 @@

import org.enso.compiler.exception.CompilerError;
import org.enso.interpreter.runtime.builtin.Builtins;
import org.enso.interpreter.runtime.callable.atom.Atom;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Useless.

@JaroslavTulach
Copy link
Member

Pull Request Description

TBD. Nothing works here, PRing not to lose it

Please update PR's description to match reality and share some performance numbers.

@kustosz kustosz added the CI: Ready to merge This PR is eligible for automatic merge label Jan 24, 2023
@mergify mergify bot merged commit 242bd52 into develop Jan 24, 2023
@mergify mergify bot deleted the wip/mk/unbox-atoms branch January 24, 2023 13:03
hubertp added a commit that referenced this pull request Jan 24, 2023
Slipped through review of #3862
mergify bot pushed a commit that referenced this pull request Jan 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI: Ready to merge This PR is eligible for automatic merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants