
Add safepoints to standard libraries Java polyglot helpers #7183

Merged
merged 29 commits into develop from wip/radeusgd/7129-java-helpers-interruptible on Jul 5, 2023

Conversation

radeusgd
Member

@radeusgd radeusgd commented Jul 3, 2023

Pull Request Description

Closes #7129

Important Notes

Checklist

Please ensure that the following checklist has been satisfied before submitting the PR:

  • The documentation has been updated, if necessary.
  • Screenshots/screencasts have been attached, if there are any visual changes. For interactive or animated visual changes, a screencast is preferred.
  • All code follows the
    Scala,
    Java,
    and
    Rust
    style guides. In case you are using a language not listed above, follow the Rust style guide.
  • All code has been tested:
    • Unit tests have been written where possible.
    • If the GUI codebase was changed, the GUI was tested when built using ./run ide build.

@radeusgd radeusgd added the CI: No changelog needed Do not require a changelog entry for this PR. label Jul 3, 2023
@radeusgd radeusgd self-assigned this Jul 3, 2023
Comment on lines 33 to 40
/** Checks if all cells in the current row are missing. */
public boolean areAllNull() {
  Context context = Context.getCurrent();
  for (Storage<?> storage : storages) {
    if (!storage.isNa(rowIndex)) {
      return false;
    }

    context.safepoint();
  }
  return true;
}
Member Author

In e194b62 I'm adding safepoints also to some loops in MultiValueKeys that are bounded by the number of columns.

These are rather tight loops, so I'm not sure adding the safepoints here is worth it. But technically we could have tables with hundreds of columns, where this could start being useful.

(There are other bottlenecks for tables with so many columns, so such big tables will not work very smoothly without other improvements anyway, but maybe it's worth including at least this.)

I'm a bit on the fence about whether this particular commit is worth it - @JaroslavTulach @jdunkerley what do you think?

@@ -139,13 +151,15 @@ public SpecializedStorage<T> slice(int offset, int limit) {

  @Override
  public SpecializedStorage<T> slice(List<SliceRange> ranges) {
    Context context = Context.getCurrent();
    int newSize = SliceRange.totalLength(ranges);
    T[] newData = newUnderlyingArray(newSize);
    int offset = 0;
    for (SliceRange range : ranges) {
      int length = range.end() - range.start();
      System.arraycopy(data, range.start(), newData, offset, length);
Member Author

Actually, the number of ranges will likely be small compared to their totalLength.

So to make this method interruptible, we would actually need to be able to interrupt System.arraycopy while it is running.

This pattern repeats in many places throughout our Storage implementation, as it tries to use arraycopy etc. to be efficient.

To make it interruptible we would have to create our own interruptibleCopy, which would essentially be just a for loop copying from one array to another, hitting the safepoint() on each iteration. See the sketch below.
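A minimal sketch of such a helper, assuming the same org.graalvm.polyglot.Context API already used in this PR; the interruptibleCopy name and enclosing class are hypothetical, for illustration only:

import org.graalvm.polyglot.Context;

public final class InterruptibleArrays {
  private InterruptibleArrays() {}

  /**
   * Copies {@code length} elements from {@code src} to {@code dest} like
   * System.arraycopy, but polls a safepoint on every iteration so the loop
   * can be interrupted.
   */
  public static <T> void interruptibleCopy(
      T[] src, int srcPos, T[] dest, int destPos, int length) {
    Context context = Context.getCurrent();
    for (int i = 0; i < length; i++) {
      dest[destPos + i] = src[srcPos + i];
      context.safepoint();
    }
  }
}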

However, I'm slightly worried that it may bring down the performance - I believe arraycopy can be faster than a regular for loop. But maybe I'm wrong and the loop is just as fast? We may have to do some benchmarking to find out, but I'm not sure that is in the scope of this PR.

@jdunkerley what do you think? How much time should we spend on this?

I think the gains from the existing safepoints will already be substantial, but indeed we may need to go a bit deeper to ensure that all operations that are bounded by row count are interruptible.

I think how big of a priority this is depends on:

  1. How much of a pain this still is after this PR,
  2. What the conclusion of the discussion on alternative, more general solutions will be (are safepoints 'temporary', with a plan to adopt some more general solution if at all possible, or do we need to make do with just safepoints).

@radeusgd radeusgd marked this pull request as ready for review July 3, 2023 11:37
Base automatically changed from wip/radeusgd/6921-database-add-row-number to develop July 3, 2023 11:51
@radeusgd radeusgd force-pushed the wip/radeusgd/7129-java-helpers-interruptible branch from f0a8784 to d39038a Compare July 3, 2023 12:10
Member

@jdunkerley jdunkerley left a comment


What's the impact of calling these functions on every single iteration?

I feel like it will have a significant impact on performance.

@@ -94,6 +96,8 @@ public static List<String> split_on_lines(String str, boolean keep_endings) {
    } else {
      currentPos += 1;
    }

    context.safepoint();
Member

What's the impact of checking this on every single character?

It feels like this must have a significant detrimental effect. For text processing like this, polling every 10,000 characters or so would suffice.

Member

There is a Techniques and Applications for Guest-Language Safepoints paper that claims that the guest-language safepoint check has zero performance overhead.

Member

That may be true, but this code is executed as hosted Java code, not as guest-language code.
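To illustrate the distinction, a sketch assuming the standard Truffle and polyglot APIs (not code from this PR): inside a Truffle language implementation the poll is TruffleSafepoint.poll, which is the check the paper measures, while hosted helper code like the one in this PR goes through the polyglot Context API:

import com.oracle.truffle.api.TruffleSafepoint;
import com.oracle.truffle.api.nodes.Node;
import org.graalvm.polyglot.Context;

class SafepointPolls {
  // Guest-language (Truffle node) code: the poll the paper claims is free.
  static void guestPoll(Node location) {
    TruffleSafepoint.poll(location);
  }

  // Hosted Java helper code, as added throughout this PR.
  static void hostPoll() {
    Context.getCurrent().safepoint();
  }
}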

@radeusgd
Member Author

radeusgd commented Jul 3, 2023

How does this affect performance?

My initial testing showed that the overhead is pretty negligible.

Since I was asked, I have done some follow-up measurements, especially focusing on split_on_lines.

Raw data for the report are here:
safepoint bench 2.txt

Code used for running the benchmark:
from Standard.Base import all

from Standard.Test import Bench

polyglot java import org.enso.base.Text_Utils

main =
    safepoints = Text_Utils.safepoints_enabled
    IO.println "Benchmarking: safepoints_enabled="+safepoints.to_text

    n = 10^8
    iter_size = 1
    num_iters = 100

    string = 'a\n' * (n . div 2)

    IO.println "Length Pre-Computation"
    Bench.measure (string.length) "length-first" 1 1

    IO.println "Length Cached"
    Bench.measure (string.length) "length-second" 1 1

    IO.println "Measuring split performance:"
    Bench.measure (string.lines) "split_on_lines" iter_size num_iters

I've tried 3 approaches:

  • safepoints removed,
  • safepoints present as in this PR,
  • running context.safepoint() only every 8k iterations (re-using an existing counter and checking its value with bitwise operators; I assumed (x & 8191) == 0 is faster than x % 8000 == 0 - the exact diff is shown in a later comment below).

The findings show that there is actually some overhead, but I think it is reasonably small.

[Charts: run-time distributions of the three benchmark variants]

@radeusgd
Member Author

radeusgd commented Jul 3, 2023

My interpretation of the data above:

Cost of safepoints

Analysis

A more complex example, measured with more care, shows (quite expectedly - the initial measurements suggesting no overhead had surprised me) that there is some overhead.

How significant it is, in my opinion, is a bit hard to say.

Computing the averages for safepoints / no safepoints, we can see that the raw average running time is 7% worse with safepoints on.

That is a significant figure - not awful, but not that great either.

Looking at the data distributions from another perspective, we can see that the safepoint code is indeed generally a bit slower than the 'raw' code. However, in some runs it achieves speeds just as fast as the code without safepoints. Looking at the outliers, both variants have very similar minimum and maximum run-times. Still, the code with safepoints tends to lean a bit more towards the longer run-times.

Summary

So we can see that there is definitely some cost, but relative to the overall variability of the run-times, it is not very large.

Whether it is a 'price' we are willing to pay for interruptibility is a tough question, and one that I don't feel fully qualified to answer.

In fact, we would probably need to measure and compare many more operations, as this is just one example and it is hard to say how representative it is. But that is a process that can take some time - possibly worth considering as a further next step, not necessarily urgent.

In my personal opinion, based on the limited data we have, the benefit of interactivity is worth the (relatively small) drop in performance. Enso is an interactive product and we want to allow the user to change things and quickly see how the results refresh. Blocking the IDE for many seconds with a non-interruptible computation will be much more visible and painful to the user than things taking slightly longer to compute (which also depends on many other factors).

This also highlights that we may need to consider better solutions for interruptibility than safepoints, as I already suggested in my comment on the issue. Ideally we should achieve interruptibility of all operations (including ones that are still blocking now, like arraycopy, OpenCV and JDBC calls, and many more) while preserving as much performance as possible - but I do not yet have enough information to know how much of that is achievable.

@radeusgd
Member Author

radeusgd commented Jul 3, 2023

Safepoints run every n-th iteration

I ran a third benchmark based on a hypothesis discussed with @jdunkerley: we could try limiting the cost of safepoints by only running the safepoint check every few thousand iterations. ~8k iterations will usually complete very quickly, so the 'latency' of waiting for the n-th iteration would still be small, keeping computations easy to cancel while hypothetically reducing the overhead.

To do so I used the following diff:

diff --git a/std-bits/base/src/main/java/org/enso/base/Text_Utils.java b/std-bits/base/src/main/java/org/enso/base/Text_Utils.java
index fbba81e8c..d14655c7c 100644
--- a/std-bits/base/src/main/java/org/enso/base/Text_Utils.java
+++ b/std-bits/base/src/main/java/org/enso/base/Text_Utils.java
@@ -65,6 +65,10 @@ public class Text_Utils {
     return str.codePoints().toArray();
   }

+  public static boolean safepoints_enabled() {
+    return true;
+  }
+
   /**
    * Splits the string on each occurrence of UTF-8 vertical whitespace, returning the resulting
    * substrings in an array.
@@ -97,7 +101,9 @@ public class Text_Utils {
         currentPos += 1;
       }

-      context.safepoint();
+      if ((currentPos & 8191) == 0) {
+        context.safepoint();
+      }
     }

     if (currentStart < length) {

I assumed that currentPos & 8191 would be faster to compute than currentPos % C.

Looking at the data above, we can see that this hypothesis did not hold up in practice. The "Safepoints & 8192" run was actually the slowest, with an average time as slow as the highest outlier of the other two runs.

I think this shows that the best we can do is rely on the JVM handling context.safepoint() very efficiently, and that we cannot improve on it with our own tricks. I assume the JVM may "know" about this special method and apply optimizations that are simply not applied to our manual check. The most costly part is likely the conditional jump taken when a safepoint is hit - but our manual check still performs such a conditional jump on every iteration, with less profiling information than the "safepoint" one, and it additionally computes a bitwise operation, which is relatively cheap but apparently still more costly than the simple 'flag' check that context.safepoint() seems to boil down to.

Member

@Akirathan Akirathan left a comment

As noted in my comment on this PR, there is a Techniques and Applications for Guest-Language Safepoints paper that claims that safepoint polling has zero performance overhead and statistically insignificant compilation overhead. Therefore, it is highly probable that the 7% average performance regression means something else. Note that our Bench.measure infrastructure is not sophisticated enough to properly differentiate between warmup and measurement runs. As far as I understand, it is just a simple tool that should be able to reveal significant performance regressions. I would not read much into a 7% average regression that also includes warmup runs.

@radeusgd
Member Author

radeusgd commented Jul 4, 2023

As noted in my comment on this PR, there is a Techniques and Applications for Guest-Language Safepoints paper that claims that safepoint polling has zero performance overhead and statistically insignificant compilation overhead. Therefore, it is highly probable that the 7% average performance regression means something else. Note that our Bench.measure infrastructure is not sophisticated enough to properly differentiate between warmup and measurement runs. As far as I understand, it is just a simple tool that should be able to reveal significant performance regressions.

Thanks for this clarification and the linked paper.

I would not read much into a 7% average regression that also includes warmup runs.

I guess for peak performance you are right, but we should probably still at least partially consider this - warmup time is actually pretty important in Enso: in situations where we are interactively editing code on possibly smaller datasets, there may not be enough time to reach peak performance, and the apparent performance 'felt' by the user will be influenced by the not-yet-warmed-up performance.

Of course, the non-peak performance will likely be influenced by many things, so it is hard to judge the overall effect without practical measurements. But the point is that peak performance is not the only metric we need to be aware of.

@radeusgd radeusgd force-pushed the wip/radeusgd/7129-java-helpers-interruptible branch from 55e3c34 to 51a6901 Compare July 4, 2023 12:29
@radeusgd
Member Author

radeusgd commented Jul 4, 2023

I tried using SafepointALot to have some measurements to compare.
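For reference, a minimal sketch of how such statistics can be gathered from host Java code, assuming Truffle's experimental engine.SafepointALot option (the SafepointStats class name is hypothetical, and Enso's actual runner may expose this option differently):

import org.graalvm.polyglot.Context;

public class SafepointStats {
  public static void main(String[] args) {
    try (Context ctx = Context.newBuilder()
        .allowExperimentalOptions(true)
        .option("engine.SafepointALot", "true")
        .build()) {
      // ... evaluate the benchmark workload in this context ...
    } // the Safepoint Statistics table is printed when the engine shuts down
  }
}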

The results are... surprising. It seems that the average safepoint interval has increased whereas I'd expect a decrease.

Results

Base - after this PR

[info] [2023-07-04T15:35:38.728Z] [engine] Safepoint Statistics
  --------------------------------------------------------------------------------------
   Thread Name         Safepoints | Interval     Avg              Min              Max
  --------------------------------------------------------------------------------------
   main                     66389 |         6150,452 us           1,4 us   153659890,7 us
  -------------------------------------------------------------------------------------
   All threads              66389 |         6150,452 us           1,4 us   153659890,7 us

[info] [2023-07-04T15:35:38.733Z] [engine] Safepoint Statistics
  --------------------------------------------------------------------------------------
   Thread Name         Safepoints | Interval     Avg              Min              Max
  --------------------------------------------------------------------------------------
   main                 140713246 |            2,888 us           0,4 us     3404336,0 us
  -------------------------------------------------------------------------------------
   All threads          140713246 |            2,888 us           0,4 us     3404336,0 us

Table - after this PR

[info] [2023-07-04T15:20:06.325Z] [engine] Safepoint Statistics
  --------------------------------------------------------------------------------------
   Thread Name         Safepoints | Interval     Avg              Min              Max
  --------------------------------------------------------------------------------------
   main                     39971 |        13322,735 us           1,3 us   387881897,5 us
  -------------------------------------------------------------------------------------
   All threads              39971 |        13322,735 us           1,3 us   387881897,5 us

[info] [2023-07-04T15:20:06.326Z] [engine] Safepoint Statistics
  --------------------------------------------------------------------------------------
   Thread Name         Safepoints | Interval     Avg              Min              Max
  --------------------------------------------------------------------------------------
   main                 107518123 |            4,974 us           0,5 us     7729662,0 us
  -------------------------------------------------------------------------------------
   All threads          107518123 |            4,974 us           0,5 us     7729662,0 us

Base - before this PR

[info] [2023-07-04T14:56:53.186Z] [engine] Safepoint Statistics
  --------------------------------------------------------------------------------------
   Thread Name         Safepoints | Interval     Avg              Min              Max
  --------------------------------------------------------------------------------------
   main                     67508 |         5875,439 us           1,1 us   136308064,6 us
  -------------------------------------------------------------------------------------
   All threads              67508 |         5875,439 us           1,1 us   136308064,6 us

[info] [2023-07-04T14:56:53.186Z] [engine] Safepoint Statistics
  --------------------------------------------------------------------------------------
   Thread Name         Safepoints | Interval     Avg              Min              Max
  --------------------------------------------------------------------------------------
   main                 141397961 |            2,798 us           0,4 us     4246604,0 us
  -------------------------------------------------------------------------------------
   All threads          141397961 |            2,798 us           0,4 us     4246604,0 us

Table - before this PR

[info] [2023-07-04T14:49:11.72Z] [engine] Safepoint Statistics
  --------------------------------------------------------------------------------------
   Thread Name         Safepoints | Interval     Avg              Min              Max
  --------------------------------------------------------------------------------------
   main                  85200632 |            5,659 us           0,5 us     4705853,6 us
  -------------------------------------------------------------------------------------
   All threads           85200632 |            5,659 us           0,5 us     4705853,6 us

[info] [2023-07-04T14:49:11.719Z] [engine] Safepoint Statistics
  --------------------------------------------------------------------------------------
   Thread Name         Safepoints | Interval     Avg              Min              Max
  --------------------------------------------------------------------------------------
   main                     41981 |        11473,074 us           1,2 us   349079578,2 us
  -------------------------------------------------------------------------------------
   All threads              41981 |        11473,074 us           1,2 us   349079578,2 us

It may be because I was doing the measurements on my laptop while running a browser and other software, which could influence the timing results. If we think it is necessary, we can measure again on a 'clean' machine. However, that is not the main metric for this PR, so I will leave it for now unless further steps are needed.

@radeusgd radeusgd force-pushed the wip/radeusgd/7129-java-helpers-interruptible branch from 51a6901 to ed8efc5 Compare July 4, 2023 15:51
Member

@JaroslavTulach JaroslavTulach left a comment

I have no problem with spreading Context.getCurrent().safepoint() throughout the code base, if it helps something.

Personally I'd first invest in measuring infrastructure and only then change the codebase, so we have real proof that we improved something.

@radeusgd radeusgd force-pushed the wip/radeusgd/7129-java-helpers-interruptible branch from ed8efc5 to bef4b7d Compare July 5, 2023 11:29
@radeusgd
Member Author

radeusgd commented Jul 5, 2023

Personally I'd first invest in measuring infrastructure and only then change the codebase, so we have real proof that we improved something.

I think it is a good idea and I'll be very happy to see such a measurement tool implemented.

I think we can check the measurements retroactively though - we can merge this PR and, once the tool is created, temporarily revert it to do comparative testing.

I have no problem with spreading Context.getCurrent().safepoint() throughout the code base, if it helps something.

Right. At @jdunkerley's request I did some further testing. I slightly modified the Colorado COVID example - I enlarged the table 5x (by appending it to itself) to make the computation heavier and better show the latency, and I added a dropdown A - B - C that selects a multiplier 1 - 3 - 5 applied to the case count, causing the interesting operations (*, cross_tab, aggregate, join, order_by) to be recomputed.

I will compare the latency between selecting A/B/C and the moment the 1/3/5 shows up in the visualization - the time it takes to update this visualization is the time wasted on pending computations that should have been cancelled.

Here is the version without safepoints:

CC-no-safepoints.mp4

For example, at 1:29 I switch from 'A' to 'C'. The 1 changes into 5 at 1:44, meaning that we just lost 15 seconds on a computation whose result was going to be thrown away anyway.

Now, let's see with safepoints:

CC-with-safepoints.mp4

For example, at 0:30 I switch 'C' to 'B'. The switch from 5 to 3 happens within the same second; the latency is now minimal. This is not because the heavy functions stopped being computed: we can see that the visualization at the bottom has yet to update - due to the frequent switching of the parameters, it has not had a chance to complete (this would be more visible if pending computations were visualized better in the GUI).

@radeusgd
Member Author

radeusgd commented Jul 5, 2023

I also measured the runtime of the base Colorado COVID example in the CLI; the results from 3 runs are as follows:

[Table: CLI run-times of the Colorado COVID example over 3 runs, with and without safepoints]

It seems that with safepoints, the average runtime is about 3% slower.

@radeusgd radeusgd added the CI: Ready to merge This PR is eligible for automatic merge label Jul 5, 2023
@mergify mergify bot merged commit 78545b4 into develop Jul 5, 2023
@mergify mergify bot deleted the wip/radeusgd/7129-java-helpers-interruptible branch July 5, 2023 14:12
Labels
CI: No changelog needed - Do not require a changelog entry for this PR. CI: Ready to merge - This PR is eligible for automatic merge

Development

Successfully merging this pull request may close these issues:

Make sure long running Table operations are interruptible

4 participants