Add safepoints to standard libraries Java polyglot helpers #7183
Conversation
```java
/** Checks if all cells in the current row are missing. */
public boolean areAllNull() {
  Context context = Context.getCurrent();
  for (Storage<?> storage : storages) {
    if (!storage.isNa(rowIndex)) {
      return false;
    }

    context.safepoint();
  }
  return true;
}
```
In e194b62 I'm also adding safepoints to some loops in MultiValueKeys that are bounded by the number of columns.
These are rather tight loops, so I'm not sure adding safepoints here is worth it. Technically, though, we could have tables with hundreds of columns where this could start being useful.
(There are other bottlenecks with tables of that many columns, so such large tables will not work very smoothly without other improvements anyway, but maybe it's worth including at least this.)
I'm a bit on the fence about whether this particular commit is worth it - @JaroslavTulach @jdunkerley, what do you think?
```diff
@@ -139,13 +151,15 @@ public SpecializedStorage<T> slice(int offset, int limit) {

   @Override
   public SpecializedStorage<T> slice(List<SliceRange> ranges) {
     Context context = Context.getCurrent();
     int newSize = SliceRange.totalLength(ranges);
     T[] newData = newUnderlyingArray(newSize);
     int offset = 0;
     for (SliceRange range : ranges) {
       int length = range.end() - range.start();
       System.arraycopy(data, range.start(), newData, offset, length);
```
Actually, the number of ranges will likely not be that big in comparison to their `totalLength`.
So to make this method interruptible, we would actually need to be able to interrupt `System.arraycopy` while it's operating.
This pattern repeats in many places throughout our Storage implementation, as it tries to use `arraycopy` etc. to be efficient.
To make it interruptible we would have to create our own `interruptibleCopy`, which would essentially be just a `for` loop copying from one array to another, hitting `safepoint()` on each iteration.
However, I'm slightly worried that this may hurt performance - I believe `arraycopy` can be faster than a regular `for` loop. But maybe I'm wrong and it is just as fast? We may have to do some benchmarking to find out - though I'm not sure that is in the scope of this PR.
@jdunkerley what do you think? How much time should we spend on this?
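The `interruptibleCopy` idea mentioned above could be sketched roughly like this. This is a hypothetical helper, not part of the codebase; the safepoint poll is abstracted as a `Runnable` so the sketch is self-contained - the real code would presumably pass `Context.getCurrent()::safepoint` from the GraalVM polyglot API.

```java
import java.util.Arrays;

public class InterruptibleCopy {
  /**
   * Hypothetical replacement for System.arraycopy that polls a safepoint on
   * each iteration, giving the runtime a chance to interrupt the thread.
   */
  static <T> void interruptibleCopy(
      T[] src, int srcPos, T[] dst, int dstPos, int length, Runnable safepoint) {
    for (int i = 0; i < length; i++) {
      dst[dstPos + i] = src[srcPos + i];
      safepoint.run(); // in the real code: Context.getCurrent().safepoint()
    }
  }

  public static void main(String[] args) {
    Integer[] src = {1, 2, 3, 4, 5};
    Integer[] dst = new Integer[5];
    int[] polls = {0};
    interruptibleCopy(src, 1, dst, 0, 3, () -> polls[0]++);
    // prints: [2, 3, 4, null, null] polls=3
    System.out.println(Arrays.toString(dst) + " polls=" + polls[0]);
  }
}
```

Whether the per-iteration poll costs anything after JIT compilation is exactly the open benchmarking question raised above.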
I think the gains from the existing safepoints will already be substantial, but indeed we may need to go a bit deeper to ensure that all operations that are bounded by row count are interruptible.
I think how big of a priority this is depends on:
- How much of a pain this still is after this PR,
- What the conclusion of the discussion on alternative, more general solutions will be (are safepoints 'temporary' and we plan to adopt some more general solution (if at all possible), or do we need to make do with just safepoints).
Force-pushed from f0a8784 to d39038a.
What's the impact of calling these functions on every single iteration?
I feel like it will have a significant impact on performance.
```diff
@@ -94,6 +96,8 @@ public static List<String> split_on_lines(String str, boolean keep_endings) {
     } else {
       currentPos += 1;
     }

     context.safepoint();
```
What's the impact of checking this on every single character?
It feels like this must have a significant detrimental effect.
For text processing like this, polling every 10,000 characters or so would suffice.
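That suggestion could be sketched as follows. This is hypothetical code, not from the PR; the poll is abstracted as a `Runnable`, and the interval is approximated with a power-of-two mask (8192, an assumed value) so that the modulo check compiles down to a single bitwise AND.

```java
public class PeriodicPoll {
  /**
   * Hypothetical text-processing loop that polls the safepoint only every
   * 8192 characters instead of on every single one.
   */
  static int countLines(String str, Runnable safepoint) {
    int lines = 1;
    for (int pos = 0; pos < str.length(); pos++) {
      if (str.charAt(pos) == '\n') {
        lines++;
      }
      // (pos & 8191) == 0 is equivalent to pos % 8192 == 0 but cheaper
      if ((pos & 8191) == 0) {
        safepoint.run(); // real helpers would call Context.getCurrent().safepoint()
      }
    }
    return lines;
  }
}
```

Note that the benchmark results discussed further down in this thread should be consulted before assuming this variant is actually faster.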
There is a paper, "Techniques and Applications for Guest-Language Safepoints", that claims the guest-language safepoint check has zero performance overhead.
That may be true, but this code is executed as hosted Java code, not as guest-language code.
How does this affect performance? My initial testing showed that the overhead is pretty negligible. Since you asked, I have done some follow-up measurements. Raw data for the report are here: [...]. Code used for running the benchmark: [...].

I've tried 3 approaches.

The findings show that there is actually some overhead, but I think it is reasonably small.
My interpretation of the data above:

### Cost of safepoints

#### Analysis

A more complex example, measured with more care, shows (quite expectedly - I was surprised by the initial measurements) that there is some overhead. How significant it is, in my opinion, is a bit hard to say. Computing the averages with and without safepoints, we can see that the raw average running time is 7% worse with safepoints on. That is a noticeable figure - not awful, but not great either.

Looking at the data distributions from another perspective, we can see that the safepoint code is indeed generally a bit slower than the 'raw' code. However, in some runs it achieves speeds just as fast as the code without safepoints. Looking at the outliers, both the safepoint and no-safepoint code have very similar min and max run-times. Still, the code with safepoints tends to lean a bit more towards the longer run-times.

#### Summary

So we can see that there is definitely some cost, but looking at the overall variability of the run-times, it is not very large. Whether it is a 'price' we are willing to pay for interruptibility is a tough question, and one that I don't feel fully qualified to answer. In fact, we would probably need to measure and compare many more operations, as this is just one example and it is hard to say how representative it is. But that is a process that can take some time - possibly worth considering as a further next step, not necessarily urgent.

In my personal opinion, based on the limited data we have, the benefit of interactivity is worth the drop in performance (since it is relatively small). Enso is an interactive product and we want to allow the user to change stuff and quickly see how it refreshes. Blocking the IDE for many seconds with a non-interruptible computation is going to be much more visible and painful to the user than things taking slightly longer to compute (which also depends on many other factors).

This also highlights that we may need to consider better solutions to interruptibility than safepoints, as I already suggested in my comment on the issue. Ideally we should achieve interruptibility of all operations, including ones that are still blocking now.
### Safepoints run every n-th iteration

I ran a third benchmark based on a hypothesis discussed with @jdunkerley: we could try limiting the cost of safepoints by only running the safepoint every few iterations - ~8k iterations will usually take a very short time to complete, so the 'latency' of waiting for the n-th iteration will still be small, making it still possible to cancel computations relatively easily, while hypothetically reducing the overhead. To do so I used the following diff:

```diff
diff --git a/std-bits/base/src/main/java/org/enso/base/Text_Utils.java b/std-bits/base/src/main/java/org/enso/base/Text_Utils.java
index fbba81e8c..d14655c7c 100644
--- a/std-bits/base/src/main/java/org/enso/base/Text_Utils.java
+++ b/std-bits/base/src/main/java/org/enso/base/Text_Utils.java
@@ -65,6 +65,10 @@ public class Text_Utils {
     return str.codePoints().toArray();
   }

+  public static boolean safepoints_enabled() {
+    return true;
+  }
+
   /**
    * Splits the string on each occurrence of UTF-8 vertical whitespace, returning the resulting
    * substrings in an array.
@@ -97,7 +101,9 @@ public class Text_Utils {
       currentPos += 1;
     }

-    context.safepoint();
+    if ((currentPos & 8191) == 0) {
+      context.safepoint();
+    }
   }

   if (currentStart < length) {
```

I assumed that [...]. Looking at the data above, we can see that this hypothesis did not stand the practical test. Actually, the "Safepoints & 8192" run was the slowest, with an average time as slow as the highest outlier of the other two runs. I think this shows that the best we can do is rely on the JVM handling the `safepoint()` calls.
As noted in my comment on this PR - there is a paper, "Techniques and Applications for Guest-Language Safepoints", that claims that safepoint polling has zero performance overhead and statistically insignificant compilation overhead. Therefore, it is highly probable that the 7% average performance regression means something else. Note that our Bench.measure infrastructure is not sophisticated enough to properly differentiate between warmup and measurement runs. As far as I understand, it is just a simple tool that should be able to reveal significant performance regressions. I would not attach much significance to a 7% average regression that also includes warmup runs.
Thanks for this clarification and the linked paper.
I guess you are right for peak performance, but we should probably still partially consider this - warmup time is actually pretty important in Enso too. In situations where we are interactively editing code on possibly smaller datasets, there may not be enough time to reach peak performance, and the apparent performance 'felt' by the user will be influenced by the not-yet-warmed-up performance. Of course, non-peak performance will likely be influenced by many things, so it is hard to judge the overall effect without some practical measurements. But the point is: peak performance is not the only metric we need to be aware of.
Force-pushed from 55e3c34 to 51a6901.
I tried using [...]. The results are... surprising. It seems that the average safepoint interval has increased, whereas I'd expect a decrease.

### Results

- Base After PR
- Table After PR
- Base Before PR
- Table Before PR

It may be because I was doing the measurements on my laptop while running a browser and some other software, which could influence the timing results. If we think it is necessary, we can try measuring again on a 'clean' machine. That is not the main metric for this PR, though, so I will leave it for now unless further steps are needed.
Force-pushed from 51a6901 to ed8efc5.
I have no problem with spreading `Context.getCurrent().safepoint()` throughout the code base, if it helps something.
Personally, I'd first invest in measuring infrastructure and only then change the codebase, so we have real proof that we improved something.
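A very rough starting point for such measurement could compare the same loop with and without a per-iteration poll. This is a hypothetical sketch only - the poll is a no-op `Runnable` standing in for `Context.getCurrent()::safepoint`, and a real comparison would need proper warmup/measurement separation, e.g. via JMH.

```java
import java.util.function.LongUnaryOperator;

// Illustrative only: NOT a rigorous benchmark (no warmup separation,
// no JIT control); it merely shows the shape of the measurement.
public class PollOverhead {
  static long timeNanos(LongUnaryOperator body, long n) {
    long start = System.nanoTime();
    long acc = 0;
    for (long i = 0; i < n; i++) {
      acc = body.applyAsLong(acc + i);
    }
    long elapsed = System.nanoTime() - start;
    if (acc == 42) System.out.println(acc); // keep the result alive
    return elapsed;
  }

  public static void main(String[] args) {
    long n = 10_000_000L;
    Runnable poll = () -> {}; // stand-in for Context.getCurrent().safepoint()
    long plain = timeNanos(x -> x ^ (x << 1), n);
    long polled = timeNanos(x -> { poll.run(); return x ^ (x << 1); }, n);
    System.out.printf("plain: %d ns, polled: %d ns%n", plain, polled);
  }
}
```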
This reverts commit dde8cd5.
…rser thread is stopped (pt. 1)
We may revisit it if oracle/graal#6931 is implemented.
This reverts commit 72f9f11.
This reverts commit bf4df2f.
This reverts commit b1d2fe9.
Force-pushed from ed8efc5 to bef4b7d.
I think it is a good idea and I'll be very happy to see such a measurement tool implemented. I think we can check the measurements retroactively, though - we can merge this PR, and once the tool is created we can temporarily revert it to do comparative testing.
Right. Upon request of @jdunkerley I did some further testing. I modified the Colorado COVID example slightly - I enlarged the table 5x (by appending it to itself) to make the computation a bit heavier, to better show the latency, and I added a dropdown A - B - C allowing to select a multiplier 1 - 3 - 5 that is applied to the case count, causing the interesting operations (...) to recompute.

I will compare the latency of selecting A/B/C and how fast the 1/3/5 shows up in the visualization - the time it takes to update this visualization is the time wasted in pending computations that should have been cancelled.

Here is the version without safepoints:

CC-no-safepoints.mp4

For example, at 1:29 I switch from 'A' to 'C'. The ...

Now, let's see with safepoints:

CC-with-safepoints.mp4

For example, at 0:30 I switch from 'C' to 'B'. The switch from ...
Pull Request Description

Closes #7129

Important Notes

Checklist

Please ensure that the following checklist has been satisfied before submitting the PR:

- The code conforms to the Scala, Java, and Rust style guides. In case you are using a language not listed above, follow the Rust style guide.
- If the GUI codebase was changed, the GUI was tested when built using `./run ide build`.