Add safepoints to standard libraries Java polyglot helpers #7183
Conversation
```java
/** Checks if all cells in the current row are missing. */
public boolean areAllNull() {
  Context context = Context.getCurrent();
  for (Storage<?> storage : storages) {
    if (!storage.isNa(rowIndex)) {
      return false;
    }

    context.safepoint();
  }
  return true;
}
```
In e194b62 I'm also adding safepoints to some loops in MultiValueKeys that are bounded by the number of columns.
These are rather tight loops, so I'm not sure adding safepoints here is worth it. Technically, though, we could have tables with hundreds of columns where this could start being useful.
(There are other bottlenecks with tables of that many columns, so such large tables will not work very smoothly without other improvements anyway, but maybe it's worth including at least this.)
I'm a bit on the fence about whether this particular commit is worth it - @JaroslavTulach @jdunkerley, what do you think?
```diff
@@ -139,13 +151,15 @@ public SpecializedStorage<T> slice(int offset, int limit) {

   @Override
   public SpecializedStorage<T> slice(List<SliceRange> ranges) {
     Context context = Context.getCurrent();
     int newSize = SliceRange.totalLength(ranges);
     T[] newData = newUnderlyingArray(newSize);
     int offset = 0;
     for (SliceRange range : ranges) {
       int length = range.end() - range.start();
       System.arraycopy(data, range.start(), newData, offset, length);
```
Actually, the number of ranges will likely not be that big in comparison to their `totalLength`.
So to make this method interruptible, we would actually need to be able to interrupt `System.arraycopy` while it's operating.
This pattern repeats in many places throughout our Storage implementation, as it tries to use `arraycopy` etc. to be efficient.
To make it interruptible we would have to create our own `interruptibleCopy`, which would essentially be just a `for` loop copying from one array to another, hitting `safepoint()` on each iteration.
However, I'm slightly worried that this may hurt performance - I believe `arraycopy` can be faster than a regular `for` loop. But maybe I'm wrong and it is just as fast? We may have to do some benchmarking to find out - though I'm not sure that is in the scope of this PR.
@jdunkerley what do you think? How much time should we spend on this?
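The `interruptibleCopy` idea mentioned above could be sketched roughly like this. This is a hypothetical helper, not part of the codebase; the safepoint poll is abstracted as a `Runnable` so the sketch is self-contained - the real code would presumably pass `Context.getCurrent()::safepoint` from the GraalVM polyglot API.

```java
import java.util.Arrays;

public class InterruptibleCopy {
  /**
   * Hypothetical replacement for System.arraycopy that polls a safepoint on
   * each iteration, giving the runtime a chance to interrupt the thread.
   */
  static <T> void interruptibleCopy(
      T[] src, int srcPos, T[] dst, int dstPos, int length, Runnable safepoint) {
    for (int i = 0; i < length; i++) {
      dst[dstPos + i] = src[srcPos + i];
      safepoint.run(); // in the real code: Context.getCurrent().safepoint()
    }
  }

  public static void main(String[] args) {
    Integer[] src = {1, 2, 3, 4, 5};
    Integer[] dst = new Integer[5];
    int[] polls = {0};
    interruptibleCopy(src, 1, dst, 0, 3, () -> polls[0]++);
    // prints: [2, 3, 4, null, null] polls=3
    System.out.println(Arrays.toString(dst) + " polls=" + polls[0]);
  }
}
```

Whether the per-iteration poll costs anything after JIT compilation is exactly the open benchmarking question raised above.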
I think the gains from the existing safepoints will already be substantial, but indeed we may need to go a bit deeper to ensure that all operations that are bounded by row count are interruptible.
I think how big of a priority this is depends on:
- How much of a pain this still is after this PR,
- What the conclusion of the discussion on alternative, more general solutions will be (are safepoints 'temporary' and we plan to adopt some more general solution (if at all possible), or do we need to make do with just safepoints).
Force-pushed from f0a8784 to d39038a.
What's the impact of calling these functions on every single iteration?
I feel like it will have a significant impact on performance.
```diff
@@ -94,6 +96,8 @@ public static List<String> split_on_lines(String str, boolean keep_endings) {
     } else {
       currentPos += 1;
     }

     context.safepoint();
```
What's the impact of checking this on every single character?
It feels like this must have a significant detrimental effect.
For text processing like this, polling every 10,000 characters or so would suffice.
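That suggestion could be sketched as follows. This is hypothetical code, not from the PR; the poll is abstracted as a `Runnable`, and the interval is approximated with a power-of-two mask (8192, an assumed value) so that the modulo check compiles down to a single bitwise AND.

```java
public class PeriodicPoll {
  /**
   * Hypothetical text-processing loop that polls the safepoint only every
   * 8192 characters instead of on every single one.
   */
  static int countLines(String str, Runnable safepoint) {
    int lines = 1;
    for (int pos = 0; pos < str.length(); pos++) {
      if (str.charAt(pos) == '\n') {
        lines++;
      }
      // (pos & 8191) == 0 is equivalent to pos % 8192 == 0 but cheaper
      if ((pos & 8191) == 0) {
        safepoint.run(); // real helpers would call Context.getCurrent().safepoint()
      }
    }
    return lines;
  }
}
```

Note that the benchmark results discussed further down in this thread should be consulted before assuming this variant is actually faster.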
There is a paper, "Techniques and Applications for Guest-Language Safepoints", that claims the guest-language safepoint check has zero performance overhead.
That may be true, but this code is executed as hosted Java code, not as guest-language code.
How does this affect performance? My initial testing showed that the overhead is pretty negligible. Since you asked, I have done some follow-up measurements. Raw data for the report are here: [...]. Code used for running the benchmark: [...].

I've tried 3 approaches.

The findings show that there is actually some overhead, but I think it is reasonably small.
My interpretation of the data above:

### Cost of safepoints

#### Analysis

A more complex example, measured with more care, shows (quite expectedly - I was surprised by the initial measurements) that there is some overhead. How significant it is, in my opinion, is a bit hard to say. Computing the averages with and without safepoints, we can see that the raw average running time is 7% worse with safepoints on. That is a noticeable figure - not awful, but not great either.

Looking at the data distributions from another perspective, we can see that the safepoint code is indeed generally a bit slower than the 'raw' code. However, in some runs it achieves speeds just as fast as the code without safepoints. Looking at the outliers, both the safepoint and no-safepoint code have very similar min and max run-times. Still, the code with safepoints tends to lean a bit more towards the longer run-times.

#### Summary

So we can see that there is definitely some cost, but looking at the overall variability of the run-times, it is not very large. Whether it is a 'price' we are willing to pay for interruptibility is a tough question, and one that I don't feel fully qualified to answer. In fact, we would probably need to measure and compare many more operations, as this is just one example and it is hard to say how representative it is. But that is a process that can take some time - possibly worth considering as a further next step, not necessarily urgent.

In my personal opinion, based on the limited data we have, the benefit of interactivity is worth the drop in performance (since it is relatively small). Enso is an interactive product and we want to allow the user to change stuff and quickly see how it refreshes. Blocking the IDE for many seconds with a non-interruptible computation is going to be much more visible and painful to the user than things taking slightly longer to compute (which also depends on many other factors).

This also highlights that we may need to consider better solutions to interruptibility than safepoints, as I already suggested in my comment on the issue. Ideally we should achieve interruptibility of all operations, including ones that are still blocking now.
### Safepoints run every n-th iteration

I ran a third benchmark based on a hypothesis discussed with @jdunkerley: we could try limiting the cost of safepoints by only running the safepoint every few iterations - ~8k iterations will usually take a very short time to complete, so the 'latency' of waiting for the n-th iteration will still be small, making it still possible to cancel computations relatively easily, while hypothetically reducing the overhead. To do so I used the following diff:

```diff
diff --git a/std-bits/base/src/main/java/org/enso/base/Text_Utils.java b/std-bits/base/src/main/java/org/enso/base/Text_Utils.java
index fbba81e8c..d14655c7c 100644
--- a/std-bits/base/src/main/java/org/enso/base/Text_Utils.java
+++ b/std-bits/base/src/main/java/org/enso/base/Text_Utils.java
@@ -65,6 +65,10 @@ public class Text_Utils {
     return str.codePoints().toArray();
   }

+  public static boolean safepoints_enabled() {
+    return true;
+  }
+
   /**
    * Splits the string on each occurrence of UTF-8 vertical whitespace, returning the resulting
    * substrings in an array.
@@ -97,7 +101,9 @@ public class Text_Utils {
       currentPos += 1;
     }

-    context.safepoint();
+    if ((currentPos & 8191) == 0) {
+      context.safepoint();
+    }
   }

   if (currentStart < length) {
```

I assumed that [...]. Looking at the data above, we can see that this hypothesis did not stand the practical test. Actually, the "Safepoints & 8192" run was the slowest, with an average time as slow as the highest outlier of the other two runs. I think this shows that the best we can do is rely on the JVM handling the `safepoint()` calls.
As noted in my comment on this PR - there is a paper, "Techniques and Applications for Guest-Language Safepoints", that claims that safepoint polling has zero performance overhead and statistically insignificant compilation overhead. Therefore, it is highly probable that the 7% average performance regression means something else. Note that our Bench.measure infrastructure is not sophisticated enough to properly differentiate between warmup and measurement runs. As far as I understand, it is just a simple tool that should be able to reveal significant performance regressions. I would not attach much significance to a 7% average regression that also includes warmup runs.
Thanks for this clarification and the linked paper.
I guess you are right for peak performance, but we should probably still partially consider this - warmup time is actually pretty important in Enso too. In situations where we are interactively editing code on possibly smaller datasets, there may not be enough time to reach peak performance, and the apparent performance 'felt' by the user will be influenced by the not-yet-warmed-up performance. Of course, non-peak performance will likely be influenced by many things, so it is hard to judge the overall effect without some practical measurements. But the point is: peak performance is not the only metric we need to be aware of.
Force-pushed from 55e3c34 to 51a6901.
I tried using [...]. The results are... surprising. It seems that the average safepoint interval has increased, whereas I'd expect a decrease.

### Results

- Base After PR
- Table After PR
- Base Before PR
- Table Before PR

It may be because I was doing the measurements on my laptop while running a browser and some other software, which could influence the timing results. If we think it is necessary, we can try measuring again on a 'clean' machine. That is not the main metric for this PR, though, so I will leave it for now unless further steps are needed.
Force-pushed from 51a6901 to ed8efc5.
I have no problem with spreading `Context.getCurrent().safepoint()` throughout the code base, if it helps something.
Personally, I'd first invest in measuring infrastructure and only then change the codebase, so we have real proof that we improved something.
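A very rough starting point for such measurement could compare the same loop with and without a per-iteration poll. This is a hypothetical sketch only - the poll is a no-op `Runnable` standing in for `Context.getCurrent()::safepoint`, and a real comparison would need proper warmup/measurement separation, e.g. via JMH.

```java
import java.util.function.LongUnaryOperator;

// Illustrative only: NOT a rigorous benchmark (no warmup separation,
// no JIT control); it merely shows the shape of the measurement.
public class PollOverhead {
  static long timeNanos(LongUnaryOperator body, long n) {
    long start = System.nanoTime();
    long acc = 0;
    for (long i = 0; i < n; i++) {
      acc = body.applyAsLong(acc + i);
    }
    long elapsed = System.nanoTime() - start;
    if (acc == 42) System.out.println(acc); // keep the result alive
    return elapsed;
  }

  public static void main(String[] args) {
    long n = 10_000_000L;
    Runnable poll = () -> {}; // stand-in for Context.getCurrent().safepoint()
    long plain = timeNanos(x -> x ^ (x << 1), n);
    long polled = timeNanos(x -> { poll.run(); return x ^ (x << 1); }, n);
    System.out.printf("plain: %d ns, polled: %d ns%n", plain, polled);
  }
}
```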
This reverts commit dde8cd5.
…rser thread is stopped (pt. 1)
We may revisit it if oracle/graal#6931 is implemented.
This reverts commit 72f9f11.
This reverts commit bf4df2f.
This reverts commit b1d2fe9.
Force-pushed from ed8efc5 to bef4b7d.
I think it is a good idea and I'll be very happy to see such a measurement tool implemented. I think we can check the measurements retroactively, though - we can merge this PR, and once the tool is created we can temporarily revert it to do comparative testing.
Right. Upon request of @jdunkerley I did some further testing. I modified the Colorado COVID example slightly - I enlarged the table 5x (by appending it to itself) to make the computation a bit heavier, to better show the latency, and I added a dropdown A - B - C allowing to select a multiplier 1 - 3 - 5 that is applied to the case count, causing the interesting operations (...) to recompute.

I will compare the latency of selecting A/B/C and how fast the 1/3/5 shows up in the visualization - the time it takes to update this visualization is the time wasted in pending computations that should have been cancelled.

Here is the version without safepoints:

CC-no-safepoints.mp4

For example, at 1:29 I switch from 'A' to 'C'. The ...

Now, let's see with safepoints:

CC-with-safepoints.mp4

For example, at 0:30 I switch from 'C' to 'B'. The switch from ...
Pull Request Description

Closes #7129

Important Notes

Checklist

Please ensure that the following checklist has been satisfied before submitting the PR:

- The code conforms to the Scala, Java, and Rust style guides. In case you are using a language not listed above, follow the Rust style guide.
- If the GUI codebase was changed, the GUI was tested when built using `./run ide build`.