Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

If cascades 5x slower than case-of due to limit of recursive inlining #5709

Closed
1 task done
Akirathan opened this issue Feb 20, 2023 · 18 comments · Fixed by #6255
Closed
1 task done

If cascades 5x slower than case-of due to limit of recursive inlining #5709

Akirathan opened this issue Feb 20, 2023 · 18 comments · Fixed by #6255
Assignees
Labels
--low-performance -compiler p-highest Should be completed ASAP s-research-needed Status: the task will require heavy research to complete

Comments

@Akirathan
Copy link
Member

Akirathan commented Feb 20, 2023

During investigation of #5687, we found out that if-then-else expressions are roughly 5x slower than their case-of equivalents.

Tasks

@Akirathan Akirathan self-assigned this Feb 20, 2023
@Akirathan Akirathan added this to the Beta Release milestone Feb 20, 2023
@enso-bot
Copy link

enso-bot bot commented Feb 20, 2023

Pavel Marek reports a new STANDUP for today (2023-02-20):

Progress: Created Enso benchmarks. Looking into the compilation logs from those benchmarks - there is too much clutter, compilation does not seem to get stable. Creating corresponding benchmarks in JMH. Finished other PR. It should be finished by 2023-02-22.

@Akirathan
Copy link
Member Author

Results from IfVsCaseBenchmarks in 343cf43:

bench_name iteration duration (ms)
ifBench3 8.132
ifBench6 14.672
caseBench3 1.332
caseBench6 2.7611

We can see that the slowdown of nested if-expressions is increasing with the number of nested ifs.

@enso-bot
Copy link

enso-bot bot commented Feb 21, 2023

Pavel Marek reports a new STANDUP for today (2023-02-21):

Progress: Created draft PR. Created a proper JMH benchmark. The performance comparison of if-then-else to case-of is clearly visible. Also tried to decide whether there is some ongoing Graal compilation in measurement phase. It should be finished by 2023-02-22.

@enso-bot
Copy link

enso-bot bot commented Feb 24, 2023

Pavel Marek reports a new 🔴 DELAY for today (2023-02-24):

Summary: There is 5 days delay in implementation of the If-then-else is 5x slower than equivalent case expression (#5709) task.
It will cause 5 days delay for the delivery of this weekly plan.

Delay Cause: Returning back to this issue after some time. Did other important issues meanwhile.

@enso-bot
Copy link

enso-bot bot commented Feb 24, 2023

Pavel Marek reports a new STANDUP for today (2023-02-24):

Progress: Compared compiler logs of if_bench vs case_bench and nothing suspicious found - only that code size of compilations from if_bench is slightly larger, but nothing suggesting that if_bench is 3x slower than case_bench. Sampler in VisualVM revealed nothing. Will continue with a brief look at IGV and fiddling with some profiles. It should be finished by 2023-02-27.

@jdunkerley jdunkerley added the s-research-needed Status: the task will require heavy research to complete label Feb 28, 2023
@Akirathan
Copy link
Member Author

During my work on #5845, I had some counterintuitive observations to this issue.

1.

Adding Meta.is_same_object self that to the beginning of Any.== (200x slower than original):

diff --git a/distribution/lib/Standard/Base/0.0.0-dev/src/Any.enso b/distribution/lib/Standard/Base/0.0.0-dev/src/Any.enso
index 6770205a1..70a3fecc9 100644
--- a/distribution/lib/Standard/Base/0.0.0-dev/src/Any.enso
+++ b/distribution/lib/Standard/Base/0.0.0-dev/src/Any.enso
@@ -106,13 +106,13 @@ type Any
                  a == 147
     == : Any -> Boolean
     == self that =
-        # If there is No_Such_Conversion, then `self` and `that` are probably
-        # host or polyglot values, so we just compare them with the default comparator.
-        eq_self = Panic.catch No_Such_Conversion (Comparable.from self) _-> Default_Comparator
-        eq_that = Panic.catch No_Such_Conversion (Comparable.from that) _-> Default_Comparator
-        case Meta.is_same_object eq_self Incomparable of
-            True -> False
+        case Meta.is_same_object self that of
+            True -> True
             False ->
+                # If there is No_Such_Conversion, then `self` and `that` are probably
+                # host or polyglot values, so we just compare them with the default comparator.
+                eq_self = Panic.catch No_Such_Conversion (Comparable.from self) _-> Default_Comparator
+                eq_that = Panic.catch No_Such_Conversion (Comparable.from that) _-> Default_Comparator
                 similar_type = Meta.is_same_object eq_self eq_that
                 case similar_type of
                     False -> False

Even if I refactored Meta.is_same_object method to be a very simple comparison of references in Java. So Meta.is_same_object is definitely not the issue here.

2.

Extracting 3 nested case expressions to a different method (2x faster than 1):

diff --git a/distribution/lib/Standard/Base/0.0.0-dev/src/Any.enso b/distribution/lib/Standard/Base/0.0.0-dev/src/Any.enso
index 70a3fecc9..ce4939d7a 100644
--- a/distribution/lib/Standard/Base/0.0.0-dev/src/Any.enso
+++ b/distribution/lib/Standard/Base/0.0.0-dev/src/Any.enso
@@ -108,21 +108,24 @@ type Any
     == self that =
         case Meta.is_same_object self that of
             True -> True
-            False ->
-                # If there is No_Such_Conversion, then `self` and `that` are probably
-                # host or polyglot values, so we just compare them with the default comparator.
-                eq_self = Panic.catch No_Such_Conversion (Comparable.from self) _-> Default_Comparator
-                eq_that = Panic.catch No_Such_Conversion (Comparable.from that) _-> Default_Comparator
-                similar_type = Meta.is_same_object eq_self eq_that
-                case similar_type of
-                    False -> False
-                    True ->
-                        case Meta.is_same_object eq_self Default_Comparator of
-                            True -> Comparable.equals_builtin self that
-                            False ->
-                                case eq_self.compare self that of
-                                    Ordering.Equal -> True
-                                    _ -> False
+            False -> self.my_eq that
+
+    my_eq self that =
+        # If there is No_Such_Conversion, then `self` and `that` are probably
+        # host or polyglot values, so we just compare them with the default comparator.
+        eq_self = Panic.catch No_Such_Conversion (Comparable.from self) _-> Default_Comparator
+        eq_that = Panic.catch No_Such_Conversion (Comparable.from that) _-> Default_Comparator
+        similar_type = Meta.is_same_object eq_self eq_that
+        case similar_type of
+            False -> False
+            True ->
+                case Meta.is_same_object eq_self Default_Comparator of
+                    True -> Comparable.equals_builtin self that
+                    False ->
+                        case eq_self.compare self that of
+                            Ordering.Equal -> True
+                            _ -> False

3.

Refactor the first case expression to an if expression:

diff --git a/distribution/lib/Standard/Base/0.0.0-dev/src/Any.enso b/distribution/lib/Standard/Base/0.0.0-dev/src/Any.enso
index 70a3fecc9..8edff6691 100644
--- a/distribution/lib/Standard/Base/0.0.0-dev/src/Any.enso
+++ b/distribution/lib/Standard/Base/0.0.0-dev/src/Any.enso
@@ -106,23 +106,21 @@ type Any
                  a == 147
     == : Any -> Boolean
     == self that =
-        case Meta.is_same_object self that of
-            True -> True
-            False ->
-                # If there is No_Such_Conversion, then `self` and `that` are probably
-                # host or polyglot values, so we just compare them with the default comparator.
-                eq_self = Panic.catch No_Such_Conversion (Comparable.from self) _-> Default_Comparator
-                eq_that = Panic.catch No_Such_Conversion (Comparable.from that) _-> Default_Comparator
-                similar_type = Meta.is_same_object eq_self eq_that
-                case similar_type of
-                    False -> False
-                    True ->
-                        case Meta.is_same_object eq_self Default_Comparator of
-                            True -> Comparable.equals_builtin self that
-                            False ->
-                                case eq_self.compare self that of
-                                    Ordering.Equal -> True
-                                    _ -> False
+        if Meta.is_same_object self that then True else
+            # If there is No_Such_Conversion, then `self` and `that` are probably
+            # host or polyglot values, so we just compare them with the default comparator.
+            eq_self = Panic.catch No_Such_Conversion (Comparable.from self) _-> Default_Comparator
+            eq_that = Panic.catch No_Such_Conversion (Comparable.from that) _-> Default_Comparator
+            similar_type = Meta.is_same_object eq_self eq_that
+            case similar_type of
+                False -> False
+                True ->
+                    case Meta.is_same_object eq_self Default_Comparator of
+                        True -> Comparable.equals_builtin self that
+                        False ->
+                            case eq_self.compare self that of
+                                Ordering.Equal -> True
+                                _ -> False

2x faster than 1.

@JaroslavTulach
Copy link
Member

JaroslavTulach commented Apr 3, 2023

I've spent few more hours looking into the IGV graphs as generated by withDebug --dumpGraphs benchOnly -- IfVsCaseBenchmarks.ifBench3 and withDebug --dumpGraphs benchOnly -- IfVsCaseBenchmarks.caseBench3. When I get the graphs, I select TruffleIR::org_graalvm_polyglot_Value<Function>_execute(). Then I search for LoadField node loading fields - it is in both graphs.

obrazek

However when I inspect this hot piece of code for case and/or if statements, I don't see any difference! The generated nodes are the same for both graphs! But the execution times differ a lot! Now I found a difference, there is one OptimizedCallTarget.callBoundary at the deepest if/else line:

obrazek

@JaroslavTulach
Copy link
Member

JaroslavTulach commented Apr 4, 2023

I created if_bench_1 and case_bench_1 - they seem to work fine and at the same speed. E.g. the cascade is the problem. After discussing this with Štěpán, we believe the PE hits a limit on number of inlined DirectNodeCalls. According to Štěpán, we could detect a cascade of if statements and extract the real Node and inline it directly rather than doing that via CallTarget. The rumor is that they do it in Python this way.

Moreover there are various Truffle compiler options and it turns out the culprit is in InliningRecursionDepth=2. If we set it higher, the ifBench3 gets faster:

diff --git engine/runtime/src/bench/java/org/enso/interpreter/bench/BenchmarksRunner.java engine/runtime/src/bench/java/org/enso/interpreter/bench/BenchmarksRunner.java
index 16954b0282..025f8bcbe6 100644
--- engine/runtime/src/bench/java/org/enso/interpreter/bench/BenchmarksRunner.java
+++ engine/runtime/src/bench/java/org/enso/interpreter/bench/BenchmarksRunner.java
@@ -33,7 +33,7 @@ public class BenchmarksRunner {
    */
   public BenchmarkItem run(String label) throws RunnerException, JAXBException {
     ChainedOptionsBuilder builder = new OptionsBuilder()
-      .jvmArgsAppend("-Xss16M", "-Dpolyglot.engine.MultiTier=false")
+      .jvmArgsAppend("-Xss16M", "-Dpolyglot.engine.MultiTier=false", "-Dpolyglot.engine.InliningRecursionDepth=10")
       .include("^" + label + "$");
 
     if (Boolean.getBoolean("bench.compileOnly")) {

e.g. looks like @steve-s is right. Our if cascades are slow due to a limit on recursive inlining.

@JaroslavTulach JaroslavTulach changed the title If-then-else is 5x slower than equivalent case expression If cascades 5x slower than equivalent case expression due to limit of recursive inlining Apr 5, 2023
@JaroslavTulach JaroslavTulach changed the title If cascades 5x slower than equivalent case expression due to limit of recursive inlining If cascades 5x slower than case-of due to limit of recursive inlining Apr 5, 2023
@JaroslavTulach
Copy link
Member

@kustosz suggests:

when you call if_then_else, don't use the normal dispatcher, but rather have a node that specializes on the bool case

@enso-bot
Copy link

enso-bot bot commented Apr 6, 2023

Pavel Marek reports a new 🔴 DELAY for today (2023-04-06):

Summary: There is 45 days delay in implementation of the If cascades 5x slower than case-of due to limit of recursive inlining (#5709) task.
It will cause 0 days delay for the delivery of this weekly plan.

Delay Cause: I have not worked on this issue for a long time. I expect roughly two more days on this issue, if everything goes smoothly.

@enso-bot
Copy link

enso-bot bot commented Apr 6, 2023

Pavel Marek reports a new STANDUP for today (2023-04-06):

Progress: Fiddling around with cloning a root node once IfThenElseNode detects that there is another Boolean.if_then_else call on the stack. The goal is to not have recursive calls of Boolean.if_then_else. Simplest possible solution would be to refactor IfThenElseNode to an ExpressionNode, but that would require modification to IR processing and would remove Boolean.if_then_else method. It should be finished by 2023-04-13.

@enso-bot
Copy link

enso-bot bot commented Apr 13, 2023

Jaroslav Tulach reports a new 🔴 DELAY for today (2023-04-13):

Summary: There is 4 days delay in implementation of the If cascades 5x slower than case-of due to limit of recursive inlining (#5709) task.
It will cause 4 days delay for the delivery of this weekly plan.

Some progress has been made, PR exists and improves something, but we need more...

Delay Cause: Not sure how I managed to set the final day to yesterday, when I only started to work on the issue yesterday?

Possible solutions: I need to investigate & try slightly different approach and see if it helps.

@enso-bot
Copy link

enso-bot bot commented Apr 13, 2023

Jaroslav Tulach reports a new STANDUP for yesterday (2023-04-12):

Progress: - Discussions about if_then_else: https://discord.com/channels/@me/944471092771295253/1095732209698209812

Next Day: 5x slower if cascade

Discord
Discord is the easiest way to communicate over voice, video, and text. Chat, hang out, and stay close with your friends and communities.
GitHub
As we now have a built-in Map type we would like to add a literal format to create a map instance. There appears to have been old support within the old parser for a syntax: enso/engine/runtime/src...
GitHub
The context The PR #3764 introduced a calling convention that allows to write Foo.method (Mk_Foo 123) as a synonym of (Mk_Foo 123).method (just to note, the status of the static/instance calls befo...

@enso-bot
Copy link

enso-bot bot commented Apr 14, 2023

Jaroslav Tulach reports a new STANDUP for yesterday (2023-04-13):

Progress: - More than 10x speedup: #6255

Next Day: Check benchmarks and integrated #6255. Improve Any.== a bit

@mergify mergify bot closed this as completed in #6255 Apr 14, 2023
mergify bot pushed a commit that referenced this issue Apr 14, 2023
Fixes #5709. We have a test and a generic fix that improves inlining of every builtin. Everything seems to be faster.
@enso-bot
Copy link

enso-bot bot commented Apr 15, 2023

Jaroslav Tulach reports a new STANDUP for yesterday (2023-04-14):

Progress: - Got merged: if cascade: #6255 - still needs some post-mortem actions

Next Day: Polish EqualsNode for integration

Discord
Discord is the easiest way to communicate over voice, video, and text. Chat, hang out, and stay close with your friends and communities.
GitHub
This subject came up as part of the discussion on a PR. It's trivial to replace that to throw a Panic but we seem to rely on that under some circumstances. I saw at least that we check type of Unre...

@JaroslavTulach
Copy link
Member

JaroslavTulach commented Apr 15, 2023

Benchmark results of IfVsCaseBenchmarks in after this issue got integrated:

bench_name resulting duration (ms) original duration (ms)
ifBench3 0.523 8.132
ifBench6 1.274 14.672
caseBench3 0.556 1.332
caseBench6 0.801 2.7611

We can see that the original 5x slowdown is gone.

@enso-bot
Copy link

enso-bot bot commented Apr 16, 2023

Jaroslav Tulach reports a new STANDUP for yesterday (2023-04-15):

Progress: - Found problem with InteropLibrary: #6280 (comment)

Next Day: Polish EqualsNode for integration

@enso-bot
Copy link

enso-bot bot commented Apr 17, 2023

Jaroslav Tulach reports a new STANDUP for yesterday (2023-04-16):

Progress: - how to document code discussion: https://github.com/orgs/enso-org/discussions/6296#discussioncomment-5626754

Next Day: Review benchmarks for EqualsNode & meetings & discussions

GitHub
There are some various opinions about how we should approach documenting the knowledge related to our internal libraries, and there was a proposal to start a discussion, so I'm starting one. I'd li...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
--low-performance -compiler p-highest Should be completed ASAP s-research-needed Status: the task will require heavy research to complete
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

3 participants