all/any builtins implemented using Truffle #239

OctaveLarose · 2021-10-13T16:23:36Z

Hi all,

First of all, thank you for your work on this project! Second of all, while I've read the contributing guidelines and signed the OCA, I'm still waiting for it to be approved ; I thought I'd open this pull request to get some potential feedback in the meantime.

I've only been looking at this project for a couple of days and I'm not too familiar with it, just like I'm fairly new to Graal as well. Which is why I'm assuming this work should be revised.

While functional, it can be definitely improved in many ways I'm most likely not familiar with. As a series of notes:

Unsure about a while(true) loop in the generic implementations for both. While I'm assuming a StopIteration exception will always be called and caught no matter what, I can't say it with full confidence out of lack of knowledge regarding GraalPython. Also unsure whether this implementation can prevent KeyboardInterrupt.
No string specialization, as I assumed they weren't warranted as an all or any call on a str will always return True as far as I know anyway (unless any is fed an empty string). I could write a specialization method for strings that simply returns true, however. (and does a simple check for empty strings with any).
I wrote a dict specialization but don't know how warranted it is for the reasons mentioned above. I made the all one return True by default because of them ; let me know if you can think of counterarguments to this implementation. (or this potential PString one)
No bytes/bytearray specializations. I originally specialized using PSequence, which obviously caught them as they extend it, but I assumed specificity yielded better performance. Adding specifications for these types would be trivial.
Small note: I match on PSet and not PBaseSet, hence not accounting for PFrozenSet. Unsure how relevant that is and how it should be done.
All my code is in BuiltinFunctions.java, but I'd rather have the bulk of it be in one/two separate class(es), as seems to be the case for many builtin functions. However, I wasn't sure what to name them, where to put them in the file tree, etc. Which plays into my next question:
I've been told specializing on different storage strategies may be pertinent, i.e subclasses of SequenceStorage instead of simply relying on this parent class (as I do here). If relevant, how should one go about this? This sounds like a lot of added logic if done right.

Thanks for your time and hard work!

graalvmbot · 2021-10-14T08:11:56Z

Hello Octave Larose, thanks for contributing a PR to our project!

We use the Oracle Contributor Agreement to make the copyright of contributions clear. We don't have a record of you having signed this yet, based on your email address octave -(dot)- larose -(at)- hotmail -(dot)- fr. You can sign it at that link.

If you think you've already signed it, please comment below and we'll check.

msimacek

Thank you for the contribution. The code is mostly correct, but has few performance problems, see the comments inline. I'll answer your questions in a spearate comment, the review UI is not good for writing long replies

...n/com.oracle.graal.python/src/com/oracle/graal/python/builtins/modules/BuiltinFunctions.java

msimacek · 2021-10-14T10:52:36Z

Unsure about a while(true) loop in the generic implementations for both. While I'm assuming a StopIteration exception will always be called and caught no matter what, I can't say it with full confidence out of lack of knowledge regarding GraalPython. Also unsure whether this implementation can prevent KeyboardInterrupt.

while(true) is fine. You call e.expectStopIteration, which does basically this (pseudocode):

if (e.type != StopIteration)
   throw e;

So any other exception than StopIteration will terminate the loop implicitly.

No string specialization, as I assumed they weren't warranted as an all or any call on a str will always return True as far as I know anyway (unless any is fed an empty string). I could write a specialization method for strings that simply returns true, however. (and does a simple check for empty strings with any).

Agreed, the generic specialization will handle strings. We don't need to cover all builtin types, just the common ones where we expect some performance benefit from taking advantage of their internals.

I wrote a dict specialization but don't know how warranted it is for the reasons mentioned above. I made the all one return True by default because of them ; let me know if you can think of counterarguments to this implementation. (or this potential PString one)

It's not guaranteed, I added a counterexample in the review.

No bytes/bytearray specializations. I originally specialized using PSequence, which obviously caught them as they extend it, but I assumed specificity yielded better performance. Adding specifications for these types would be trivial.

I think it's ok as it is. Specializing on PSequence would also work, but then you would have to use GetSequenceStorageNode instead of calling the getSequenceStorage method (see the review comment about lenght() virtual call).

Small note: I match on PSet and not PBaseSet, hence not accounting for PFrozenSet. Unsure how relevant that is and how it should be done.

You can match on PBaseSet, there should be no downside in doing that.

All my code is in BuiltinFunctions.java, but I'd rather have the bulk of it be in one/two separate class(es), as seems to be the case for many builtin functions. However, I wasn't sure what to name them, where to put them in the file tree, etc.

I'd personally keep them in BuiltinFunctions to make them easy to find. They are not that big. Which "many builtin functions" do you see outside BuiltinFunctions?

I've been told specializing on different storage strategies may be pertinent, i.e subclasses of SequenceStorage instead of simply relying on this parent class (as I do here). If relevant, how should one go about this? This sounds like a lot of added logic if done right.

I mentioned it in the review too. You don't have to specialize on every subclass, just the ones that will be common and then have a generic fallback that handles the rest. I'd personally do that in a separate inner helper node to avoid combinatorial explosion of specializations.
You may also consider generalizing the all and any operations into a common superclass or helper node that would have a boolean field or boolean argument that determines if the operation is all or any.

OctaveLarose · 2021-10-14T15:37:29Z

Thanks for your feedback! I'll try to send an updated version next week or so.

To answer a specific few comments:

dict iteration yields the keys. There is no guarantee that all/any of them would be always true.

I did know it yielded the keys (which makes me realize calling entries was an obvious mistake I should have noticed), but actually had no idea they could be None... or anything other than strings, hence my default true. Frankly I don't know how I never came across that feature after years of using Python.

I'd personally keep them in BuiltinFunctions to make them easy to find. They are not that big. Which "many builtin functions" do you see outside BuiltinFunctions?

I was just thinking of calls to various nodes, like PyObjectSizeNode, my wording was off. But really, my intent was to move at least part of the logic to a specialized node in the same way, which you've also recommended in a code comment, so that answers it.

You may also consider generalizing the all and any operations into a common superclass or helper node that would have a boolean field or boolean argument that determines if the operation is all or any.

I did consider it, actually! I thought it might be better to keep them separate for code clarity reasons and maybe performance (it's hard for me to assess what can impact performance negatively and what can't, for now). But yes, if so I'll gladly make them share some logic. Not a fan of boolean arguments and might use an enum instead for clarity, but that's hardly important.

Many questions I had but didn't necessarily ask got answered in your comments, so thank you for that! Thanks for taking the time to write these remarks, in general.

OctaveLarose · 2021-10-22T11:32:13Z

Here's the updated version of the code, taking your feedback into account. The implementation should now be correct now that the behavior of all/any on dict have been fixed, but let me know if this isn't the case.

A few notes:

I did the specialization over SequenceStorage subtypes in a nested node named AllOrAnySequenceStorageNode, as mentioned. I only wrote specializations for bool and int, although obviously adding more is trivial.
My real query w.r.t that node is about child nodes I declared for it, PyObjectIsTrueNode and LenNode ; I wanted to cache them to avoid having to pass them to the original specialization methods (e.g doInt in AllNode or AnyNode), and thought it would be wise to rely on the @Child annotation (a bit of a shot in the dark) for this. I'm not sure if this is advisable or if the annotation could be removed entirely (I'm also unsure about the @GenerateNodeFactory I prepended to it, again, another shot in the dark hoping it'd be better for performance)
Since I added children nodes, I needed AllOrAnySequenceStorageNode to actually inherit from a parent node class. I chose PythonUnaryBuiltinNode since I didn't know what to pick, but that one hardly seems appropriate for this usage given its name.
Unsure about my AllOrAnySequenceStorageNode.checkSequenceStorage(...) method which checks the type using instanceof. It sounded like the natural solution to me, but I later considered that there may be more optimized/smart ways of doing this in this case that I'm simply not aware of.

That's pretty much it, let me know if I didn't implement code related to your previous feedback correctly. Also looking for advice on improving code readability/quality!

smarr

Just some brief comments.

...n/com.oracle.graal.python/src/com/oracle/graal/python/builtins/modules/BuiltinFunctions.java

msimacek

I added some comments, but I'm out of time now, I'll probably comment more next week.
I recommend you read some of the main papers about GraalVM/Truffle, if you haven't done that already. If you're writing a thesis, they will serve well for citations. In particular:

...n/com.oracle.graal.python/src/com/oracle/graal/python/builtins/modules/BuiltinFunctions.java

msimacek

Looks good to me overall, thank you for fixing the problems. There are two comments left unaddressed and I added one more. I'll submit it for review internally, so I might add some comments from others or results from the CI.

...n/com.oracle.graal.python/src/com/oracle/graal/python/builtins/modules/BuiltinFunctions.java

graalvmbot · 2021-10-29T08:10:48Z

Octave Larose has signed the Oracle Contributor Agreement (based on email address octave -(dot)- larose -(at)- hotmail -(dot)- fr) so can contribute to this repository.

msimacek · 2021-10-29T10:50:28Z

@OctaveLarose could you please rebase the PR on top of current master and make it build? It doesn't build anymore due to some unused import of a class we removed in the meantime

OctaveLarose · 2021-10-29T13:34:09Z

@OctaveLarose could you please rebase the PR on top of current master and make it build? It doesn't build anymore due to some unused import of a class we removed in the meantime

Sure thing, should be fixed now. I'd been basing it off an older version of master because the most recent one never seemed to build for me ; even now, an mx build yields a java.lang.NullPointerException when building cext on my machine.

As for the rest, I wasn't 100% done (hence the odd @Specialization(limit = "3") for doHashStorage(), for instance) and wanted to implement those last changes and proofread my work, but feel free to submit it for review internally in the meantime.

I was also trying to compare my changes with the current implementation via mx benchmark, but it can't find any Java VMs to default to and no configs are available, so I can't get it to work with graalpython. I'll probably try to fix that a bit more on my end then eventually give up and ask on your Slack!

msimacek · 2021-10-29T13:59:09Z

even now, an mx build yields a java.lang.NullPointerException when building cext on my machine.

If master cannot build, there are 2 common reasons why that could happen:

The graal repository checkout is out of sync with the revision we expect. A simple fix is mx sforceimports. It's a command that checks out the revisions of dependencies specified in the suite file. It's generally a good idea to run mx sforceimports everytime you pull master.
There could be some unexpected files from previous builds. Usually fixed by mx clean and building again.

I wasn't 100% done

Ah, sorry, it seemed quite complete, so I thought it was. No hurry, take your time.

it can't find any Java VMs to default to

You need to build and run with the graal compiler enabled. By default, most mx commands run without it. To enable it, you have to add --dy /compiler after mx. You first have to build it with mx --dy /compiler build, then you can run mx --dy /compiler benchmark ....

ask on your Slack

Feel free to ping me there directly.

OctaveLarose · 2021-11-01T11:23:59Z

That should be it, I thought I had more left to do - my bad for not getting it done last Friday. I wanted to try to improve the @Specialization(limit = "3") for doHashStorage() which I found a bit arbitrary, but I can't and I'm assuming there's a reason for the compiler to enforce this limit attribute being set for functions that rely on @CachedLibrary ; I could have set the limit to another value, but I preferred leaving it as is since I already don't understand it very well.

I also wanted to move AllOrAnyNode to another file/directory since as far as I know, BuiltinFunctions.java mostly only contains @Builtin annotated classes, but I suppose it wouldn't be the only exception to that rule.

In other news, Graal was indeed not up to date and mx sforceimports did the trick, I hadn't thought of that ; I had previously tried and failed with mx clean, though. Same for the benchmarks, I can now run them with graalpython ; I aim to get some working later this week for me to get an idea of what the speedup was.

msimacek · 2021-11-02T16:09:25Z

Regarding benchmarks, there are two major metrics we measure:

peak performance - the performance after the code has been running for some time and has been fully compiled
interpreter performance - the performance without the graal compiler, which approximates how it performs during the "warm up"

We care a lot about peak performance because that's how we differentiate from CPython, but interpreter performance is also important so that programs don't take minutes to start up. Ideally, you would benchmark both. The default configuration measures peak performance. For benchmarking interpreter performance, you can run a benchmark like this: mx --dy /compiler benchmark micro-small:name_of_your_benchmark -- --python-vm-config interpreter.
AFAIK we don't have any benchmarks that would use all or any, so if you want to measure the improvements you did, you need to write your own benchmark (to run via mx, you need to add an entry for it in mx.graalpython/mx_graalpython_bench_param.py).

msimacek · 2021-11-03T08:47:05Z

I ran the CI gates and there were some gate failures:

JUnit tests for all failed. You can execute the failed suite using mx punittest --no-leak-tests com.oracle.graal.python.test.builtin.BuiltinFunctionTests. Since the builtins are now more complex, it might be a good idea to also add more tests to cover more specializations.
The style gate failed due to formatting. Please run mx python-style --fix --no-spotbugs

OctaveLarose · 2021-11-03T11:07:24Z

Fixed the tests, there was a big oversight in all's behavior, sorry ; should be functional now (the tests do pass on my machine).

For the formatting, though, I'm not sure. mx python-style --fix --no-spotbugs doesn't seem to have turned up anything on my end and I can't see what I changed to the python code that could have broken it ; I did remove the all and any python functions, but I did make sure to keep the number of line breaks between the previous and next functions just the same as before. I also suspected the newline at the end of the file, but that's PEP 8 stuff and I don't believe I added one that wasn't already here before. Are CRLF/LF issues a possibility? I'm on Ubuntu LTS 20.04, if relevant.

so if you want to measure the improvements you did, you need to write your own benchmark (to run via mx, you need to add an entry for it in mx.graalpython/mx_graalpython_bench_param.py).

Thanks! This should save me some searching.

Signed-off-by: Octave Larose <octave.larose@hotmail.fr>

…r lists and no error messages/correct behavior for invalid inputs Signed-off-by: Octave Larose <octave.larose@hotmail.fr>

…ts/sets/tuples as well Signed-off-by: Octave Larose <octave.larose@hotmail.fr>

Signed-off-by: Octave Larose <octave.larose@hotmail.fr>

…ation via shouldStopIteration() Signed-off-by: Octave Larose <octave.larose@hotmail.fr>

Signed-off-by: Octave Larose <octave.larose@hotmail.fr>

…changed parent node class Signed-off-by: Octave Larose <octave.larose@hotmail.fr>

Signed-off-by: Octave Larose <octave.larose@hotmail.fr>

… class to PNodeWithContext Signed-off-by: Octave Larose <octave.larose@hotmail.fr>

…formatting Signed-off-by: Octave Larose <octave.larose@hotmail.fr>

Signed-off-by: Octave Larose <octave.larose@hotmail.fr>

OctaveLarose · 2021-11-03T11:09:46Z

I rebased it on this repo's main branch again to make absolutely sure it wasn't the formatting issue didn't only happen when completely up to date, but this doesn't seem to have been the case.

msimacek · 2021-11-03T12:54:48Z

Ah, sorry, I realized what the problem is with the formatter - it uses eclipse formatter and it needs to have eclipse installed, otherwise it only checks licenses. Please download eclipse (we use version 4.14 for the formatter, so it's probably a good idea to get the same), extract it somewhere and set environment variable ECLIPSE_EXE to the path of the eclipse binary. Then rerun the formatter.

Additionally, you can get the formatting in your IDE. If you're using Eclipse it should work already. If you're using Intellij IDEA you can install Eclipse Formatter Plugin and rerun mx intellijinit, it should configure itself.

Signed-off-by: Octave Larose <octave.larose@hotmail.fr>

OctaveLarose · 2021-11-03T13:41:52Z

Ah, because of the name of the command, I was assuming it was purely for .py files. Anyway, I've ran the formatter using Eclipse v. 4.14 and should have fixed the issue

msimacek

I forwarded some comments that I got from the internal review. There were some suggestions regarding loop profiling, which I marked as optional, because we're missing loop profiling in a lot of other places. Loop profiling is done to help the compiler have more accurate statistics about which code parts are "hotter" to guide other optimizations. I don't know relevant it is to your thesis, so I'm leaving it up to you if you want to deal with that.

...n/com.oracle.graal.python/src/com/oracle/graal/python/builtins/modules/BuiltinFunctions.java

…very all/any specialization method Signed-off-by: Octave Larose <octave.larose@hotmail.fr>

Signed-off-by: Octave Larose <octave.larose@hotmail.fr>

...n/com.oracle.graal.python/src/com/oracle/graal/python/builtins/modules/BuiltinFunctions.java

Signed-off-by: Octave Larose <octave.larose@hotmail.fr>

...n/com.oracle.graal.python/src/com/oracle/graal/python/builtins/modules/BuiltinFunctions.java

Signed-off-by: Octave Larose <octave.larose@hotmail.fr>

msimacek · 2021-11-11T15:58:44Z

You have a bug somewhere:

>>> any([False, True])
False

Signed-off-by: Octave Larose <octave.larose@hotmail.fr>

OctaveLarose · 2021-11-12T10:59:32Z

You have a bug somewhere:
>>> any([False, True])
False

Thanks for the heads up, corrected in fa07d30 - was an oversight on my part.

PullRequest: graalpython/2011

msimacek reviewed Oct 14, 2021

View reviewed changes

smarr reviewed Oct 22, 2021

View reviewed changes

msimacek reviewed Oct 22, 2021

View reviewed changes

msimacek reviewed Oct 29, 2021

View reviewed changes

...n/com.oracle.graal.python/src/com/oracle/graal/python/builtins/modules/BuiltinFunctions.java Outdated Show resolved Hide resolved

graalvmbot added the oca-signed label Oct 29, 2021

Ck1129 approved these changes Oct 29, 2021

View reviewed changes

This comment has been minimized.

Sign in to view

OctaveLarose force-pushed the master branch from 5cd5ae6 to 2ed928a Compare October 29, 2021 13:22

OctaveLarose added 7 commits November 3, 2021 11:08

Non functional Truffle implementation of "all" builtin - test

d559e88

Signed-off-by: Octave Larose <octave.larose@hotmail.fr>

all() and any() builtin functions implemented in GraalPython. Only fo…

334be4c

…r lists and no error messages/correct behavior for invalid inputs Signed-off-by: Octave Larose <octave.larose@hotmail.fr>

Temporarily removed any(), all() implementation now functional on dic…

2bdbfa6

…ts/sets/tuples as well Signed-off-by: Octave Larose <octave.larose@hotmail.fr>

all() now throws correctly if given a non-iterable object

d63e8b5

Signed-off-by: Octave Larose <octave.larose@hotmail.fr>

Functional all() implementation for every iterable (famous last words)

0b3c21c

Signed-off-by: Octave Larose <octave.larose@hotmail.fr>

Added any() builtin, using very similar logic to all()

a82ba4f

Signed-off-by: Octave Larose <octave.larose@hotmail.fr>

Specializations on lists/sets/tuples/dicts for all/any

73ac20d

Signed-off-by: Octave Larose <octave.larose@hotmail.fr>

OctaveLarose added 10 commits November 3, 2021 11:08

Attempts to cache PyObjectIsTrueNode and LenNode, reduces code duplic…

8b6104a

…ation via shouldStopIteration() Signed-off-by: Octave Larose <octave.larose@hotmail.fr>

Removed nodeType field

4a8c67f

Signed-off-by: Octave Larose <octave.larose@hotmail.fr>

Removed shouldStopIteration function by "inlining" it

3107467

Signed-off-by: Octave Larose <octave.larose@hotmail.fr>

AllOrAnySequenceStorageNode: Removed factory generation annotation + …

fb93d4c

…changed parent node class Signed-off-by: Octave Larose <octave.larose@hotmail.fr>

AllOrAnyNode: added child PyObjectIsTrueNode attribute

784b19e

Signed-off-by: Octave Larose <octave.larose@hotmail.fr>

Removed a broken import and an unused one in BuiltinFunctions

1828e55

Signed-off-by: Octave Larose <octave.larose@hotmail.fr>

Removed unnecessary public qualifiers + changed AllOrAnyNode's parent…

13a3b26

… class to PNodeWithContext Signed-off-by: Octave Larose <octave.larose@hotmail.fr>

mx checkstyle (had left a public keyword behind accidentally) + code …

80059f8

…formatting Signed-off-by: Octave Larose <octave.larose@hotmail.fr>

Shared annotation for AllOrAnyNode/GetClassNode instances

65bbacf

Signed-off-by: Octave Larose <octave.larose@hotmail.fr>

Fixed all builtin behavior

0fb7459

Signed-off-by: Octave Larose <octave.larose@hotmail.fr>

OctaveLarose force-pushed the master branch from c87e904 to 0fb7459 Compare November 3, 2021 11:08

Ran Eclipse formatter for BuiltinFunctions changes

16b54b6

Signed-off-by: Octave Larose <octave.larose@hotmail.fr>

msimacek approved these changes Nov 8, 2021

View reviewed changes

OctaveLarose added 3 commits November 8, 2021 10:00

Removed public identifier on AllOrAnyNode + added static keyword to e…

aa7cde4

…very all/any specialization method Signed-off-by: Octave Larose <octave.larose@hotmail.fr>

Merged doSet and doDict specialization methods

5dd75dc

Signed-off-by: Octave Larose <octave.larose@hotmail.fr>

Loop profiling for all/any related methods

de15e38

Signed-off-by: Octave Larose <octave.larose@hotmail.fr>

msimacek reviewed Nov 9, 2021

View reviewed changes

...n/com.oracle.graal.python/src/com/oracle/graal/python/builtins/modules/BuiltinFunctions.java Show resolved Hide resolved

msimacek reviewed Nov 9, 2021

View reviewed changes

...n/com.oracle.graal.python/src/com/oracle/graal/python/builtins/modules/BuiltinFunctions.java Outdated Show resolved Hide resolved

...n/com.oracle.graal.python/src/com/oracle/graal/python/builtins/modules/BuiltinFunctions.java Outdated Show resolved Hide resolved

OctaveLarose added 2 commits November 10, 2021 09:58

Fixed infinite loops + profiling for HashingStorage

522e6ed

Signed-off-by: Octave Larose <octave.larose@hotmail.fr>

Fixed loop profiling for doObject in all/any implementations

bceb24a

Signed-off-by: Octave Larose <octave.larose@hotmail.fr>

steve-s reviewed Nov 10, 2021

View reviewed changes

...n/com.oracle.graal.python/src/com/oracle/graal/python/builtins/modules/BuiltinFunctions.java Outdated Show resolved Hide resolved

loopConditionProfile.profileCounted() calls instead of just profile()

66661b4

Signed-off-by: Octave Larose <octave.larose@hotmail.fr>

Fixed doBoolSequence() skipping some iterable elements

fa07d30

Signed-off-by: Octave Larose <octave.larose@hotmail.fr>

graalvmbot pushed a commit that referenced this pull request Nov 16, 2021

[GR-34782] GitHub PR #239: intrinsify all and any builtins

a7a4bf8

PullRequest: graalpython/2011

graalvmbot merged commit fa07d30 into oracle:master Nov 16, 2021

all/any builtins implemented using Truffle #239

all/any builtins implemented using Truffle #239

Uh oh!

Conversation

OctaveLarose commented Oct 13, 2021

Uh oh!

graalvmbot commented Oct 14, 2021

Uh oh!

msimacek left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

msimacek commented Oct 14, 2021

Uh oh!

OctaveLarose commented Oct 14, 2021

Uh oh!

OctaveLarose commented Oct 22, 2021

Uh oh!

smarr left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

msimacek left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

msimacek left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

graalvmbot commented Oct 29, 2021

Uh oh!

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

msimacek commented Oct 29, 2021

Uh oh!

OctaveLarose commented Oct 29, 2021

Uh oh!

msimacek commented Oct 29, 2021

Uh oh!

OctaveLarose commented Nov 1, 2021

Uh oh!

msimacek commented Nov 2, 2021

Uh oh!

msimacek commented Nov 3, 2021

Uh oh!

OctaveLarose commented Nov 3, 2021

Uh oh!

OctaveLarose commented Nov 3, 2021

Uh oh!

msimacek commented Nov 3, 2021

Uh oh!

OctaveLarose commented Nov 3, 2021

Uh oh!

msimacek left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!