Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Vector.from_polyglot_array to make Vectors backed by polyglot arrays #3628

Merged
merged 14 commits into from
Aug 23, 2022

Conversation

hubertp
Copy link
Contributor

@hubertp hubertp commented Jul 29, 2022

Pull Request Description

Use Proxy_Polyglot_Array as a proxy for polyglot arrays, thus unifying
the way the underlying array is accessed in Vector.

Used the opportunity to cleanup builtin lookup, which now actually
respects what is defined in the body of @Builtin_Method annotation.

Also discovered that polyglot null values (in JS, Python and R) were leaking to Enso.
Fixed that by doing explicit translation to Nothing.

https://www.pivotaltracker.com/story/show/181123986

Checklist

Please include the following checklist in your PR:

  • The documentation has been updated if necessary.
  • All code conforms to the
    Scala,
    Java,
    and
    Rust
    style guides.
  • All code has been tested:
    • Unit tests have been written where possible.
    • If GUI codebase was changed: Enso GUI was tested when built using BOTH
      ./run ide dist and ./run ide watch.

@hubertp hubertp changed the title Use Vector.from_polyglot_array, drop from_array Use Vector.from_polyglot_array to make Vectors backed by polyglot arrays Jul 29, 2022
@hubertp hubertp force-pushed the wip/hubert/polyglot-arrays-181123986 branch from 50b2521 to a5a3a2c Compare August 1, 2022 09:27
@hubertp hubertp marked this pull request as ready for review August 1, 2022 09:27
@hubertp hubertp force-pushed the wip/hubert/polyglot-arrays-181123986 branch from a5a3a2c to ec0c08f Compare August 1, 2022 09:29
@hubertp hubertp marked this pull request as draft August 1, 2022 10:26
@hubertp hubertp marked this pull request as ready for review August 1, 2022 13:08
Copy link
Member

@jdunkerley jdunkerley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is along the lines needed.

If we move the to_array to be a method the new storage type this allows us to make Slice a no-op :).

The bit missing is to review the current Vector.Vector calls and map them into the new polyglot functions.

Copy link
Member

@JaroslavTulach JaroslavTulach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to see Array in Enso to be done properly. If this "solution" somehow helps libraries, feel free to go on, but I find this wrong from Engine perspective. Especially knowing that we don't handle nested arrays [[1,2], [3,4,5]].

@hubertp hubertp force-pushed the wip/hubert/polyglot-arrays-181123986 branch 2 times, most recently from e593166 to b145485 Compare August 5, 2022 17:21
@hubertp hubertp requested a review from jdunkerley August 5, 2022 17:22
@hubertp
Copy link
Contributor Author

hubertp commented Aug 5, 2022

@jdunkerley I think I addressed all your concerns. This also revealed some bugs that I fixed on the way.
@JaroslavTulach As discussed, this only addresses concerns and use-cases mentioned in the ticket. Arrays, in general, will need some ❤️, as we discussed offline, and I will try to address that in a separate PR.

Also, while working on this ticket, I discovered that polyglot null values were leaking into Enso rather than being translated to Nothing. I added the necessary coercion for all polyglot languages and null check. I would appreciate your feedback if that is the right way to do it (see b145485) @JaroslavTulach @kustosz

Copy link
Member

@JaroslavTulach JaroslavTulach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am completely OK with the null & Nothing changes.

@@ -64,6 +64,11 @@ long doInteger(Object integer, @CachedLibrary(limit = "5") InteropLibrary number
}
}

@Specialization(guards = "interop.isNull(value)")
Object doNull(Object value, @CachedLibrary(limit = "5") InteropLibrary interop) {
return null;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we using Java null to represent Enso Nothing? Truffle languages in general try to avoid usage of null as it requires additional checks in the partial evaluation mode and the whole interpreter code doesn't optimize so nicely. Following the traditional approach, we should have Nothing.NULL truffle object. CCing @kustosz

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, null is not ideal but I had to use something that Epb and Enso know.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Decided to get rid of this "to null" and "from null" conversion and added a separate conversion node for the results of foreign calls.

Object elem = coercion.execute(library.readArrayElement(array, index));
if (elem == null) {
Builtins builtins = Context.get(this).getBuiltins();
return builtins.nothing().newInstance();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope newInstances returns a singleton!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, see cachedInstance field

@@ -49,7 +49,7 @@ spec = Test.group "Vectors" <|
built_from_js = Vector.from_polyglot_array generate_js_array
built_from_js . should_equal [1, 2, 3, 4, 5]
built_from_py = Vector.from_polyglot_array generate_py_array
built_from_py . should_equal [1, 2, 3, 4, 5]
built_from_py . should_equal [1, 2, 3, 4, Nothing]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, I believe this is desirable. Foreign nulls are equal to Enso Nothing. Btw. this is not what Graal.js does. If you:

var value = foreignCall();
if (value === null) print('===');
if (value == null) print("==");

then only the second condition may be true! In Graal.js foreign nulls are not === with JavaScript null, but are only == with it. I've been burned by this feature so many times - I suggest in Enso Nothing is equal with any foreign Nothing/null.

Comment on lines +31 to +36
foreign python generate_py_array = """
return [1, 2, 3, 4, None]

foreign python generate_nested_py_array = """
return [[1, 2, 3], [4, 5]]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'd also like to see a test where these two situations are joined - a nested array containing a null - so that we can correctly verify that not only surface level nulls are converted but also ones deep into the nesting.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's unlikely to work atm - see the pending test I added. It also demonstrates that it wouldn't work even before this PR.
In order to support it, we need a bigger rewrite of polyglot arrays which is not part of this PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I missed that it is pending. That is concerning - is the fix for that planned for the near future?

Well, I'd suggest adding this mixed test as pending too - to ensure that, once we do the rewrite, stuff is handled correctly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is concerning

Yes, I also find this concerning.

to_array self =
len = self.length
a = Array.new len
0.up_to len . each ix-> a.set_at ix (self.at ix)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems very inefficient.

Any reason why you cannot replace it with Array.copy self.arr 0 a 0 len?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also I'm a bit worried about the fact that we would always copy here. Cannot we somehow wrap the polyglot array while avoiding copying? At least for cases where we can guarantee immutability as we were discussing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is pretty much as inefficient as it was before, really. I would defer optimizations until we actually decide to rewrite Array, and we seem to be coming to that point very soon :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are you comparing to? Before we didn't have a way to convert a polyglot array to Enso, to_array would just return it - which was a zero-cost operation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to (now gone) from_array

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, ok, right.

I was thinking of the Vector.Vector polyglot_array approach which we used most of the time.

@radeusgd radeusgd mentioned this pull request Aug 11, 2022
4 tasks
Comment on lines 18 to 26
at : Number -> Any
at self index =
Polyglot.read_array_element self.arr index

to_array : Array Any
to_array self =
len = self.length
a = Array.new len
0.up_to len . each ix-> a.set_at ix (self.at ix)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also one observation regarding that - we do the copy only upon a to_array call - so this gives us the right type, but this does not solve the mutability issue - if the underlying polyglot array is mutable, the calls to wrapped_array.at will not be referentially transparent as the underlying array may change in-between them.

I'm not saying that is an issue, because I assume that this PR is not aiming to solve the mutability issue, but I just wanted to ensure that we are on the same page there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeap, we are on the same page.

@hubertp hubertp force-pushed the wip/hubert/polyglot-arrays-181123986 branch from 0711f55 to 6c956cf Compare August 12, 2022 10:43
@hubertp hubertp force-pushed the wip/hubert/polyglot-arrays-181123986 branch from 6c956cf to 0fd1217 Compare August 19, 2022 15:54
Copy link
Member

@JaroslavTulach JaroslavTulach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is an overall consensus this PR shall be integrated before work on Interop for Array & Vector starts, then let's do it.

My minority opinion continues to suggest to focus on Interop for Array & Vector which solves these array interop issues properly. But I don't want to stay in the way.

Engine approval is needed to get this PR off the table. Approving.

@hubertp
Copy link
Contributor Author

hubertp commented Aug 22, 2022

If there is an overall consensus this PR shall be integrated before work on Interop for Array & Vector starts, then let's do it.

My minority opinion continues to suggest to focus on Interop for Array & Vector which solves these array interop issues properly. But I don't want to stay in the way.

Yeap, improved Array/Vector is next in line

@hubertp hubertp added CI: Ready to merge This PR is eligible for automatic merge and removed CI: Ready to merge This PR is eligible for automatic merge labels Aug 22, 2022
@hubertp hubertp force-pushed the wip/hubert/polyglot-arrays-181123986 branch from 23d1194 to 7e97b19 Compare August 23, 2022 07:49
Use Proxy_Polyglot_Array as a proxy for polyglot arrays, thus unifying
the way the underlying array is accessed in Vector.

Used the opportunity to cleanup builtin lookup, which now actually
respects what is defined in the body of @Builtin_Method annotation.
Vector now has a `storage` field, which will delegate to
Proxy_Polyglot_Array.
Identified what appears to be all invalid `Vector.Vector` constructs
when the underlying storage was a polyglot array.
Fixed a bug in our handling of polyglot values by
1) doing coercion of primitive values on all languages
2) explicitly checking for null values
3) whenever foreign calls return null we translate that to Nothing
This way we avoid explicit null checking which is admittedly not the
nicest way of dealing with Nothing conversion.
One last request to avoid copying Arrays when the underlying storage is
already an Array.
@hubertp hubertp force-pushed the wip/hubert/polyglot-arrays-181123986 branch from 5e34d39 to b22edb1 Compare August 23, 2022 13:27
@hubertp hubertp added the CI: Ready to merge This PR is eligible for automatic merge label Aug 23, 2022
@mergify mergify bot merged commit 4b9c916 into develop Aug 23, 2022
@mergify mergify bot deleted the wip/hubert/polyglot-arrays-181123986 branch August 23, 2022 21:13
Comment on lines +117 to +125
arr = self.storage.to_array
case arr of
Array ->
arr
_ ->
len = self.storage.length
a = Array.new len
Array.copy arr 0 a 0 len
a
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you test this code?

My experience tells me that polyglot arrays also used to match the Array case in pattern matching, so I'm not really sure if it will do what you want - it will most likely always enter the Array branch. Any array-object that is compatible with Array.copy will also match the Array branch, so the second branch is essentially dead code, unless I'm missing something.

See the following code:

from Standard.Base import all

polyglot java import java.util.List

main =
    listarr = List.of 1 2 3
    IO.println listarr
    m = Meta.meta listarr
    IO.println m
    IO.println m.get_language
    IO.println (Meta.get_simple_type_name listarr)
    IO.println (Meta.get_qualified_type_name listarr)
    case listarr of
        Array -> IO.println "Matches Array!"
        _ -> IO.println "nope"

it actually prints

[1, 2, 3]
(Polyglot [1, 2, 3])
Java
java.util.ImmutableCollections$ListN
null
Matches Array!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I think the to_array method should have some doc comment, if at least ## PRIVATE.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Sorry for commenting after merge, but I did not notice these earlier)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was affecting my Array.set_at removal tests, so I fixed it in 29c4abc (part of #3634).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI: Ready to merge This PR is eligible for automatic merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants