Report only unique warnings #6372

hubertp · 2023-04-20T15:35:00Z

Pull Request Description

This change makes sure that reported warnings are unique, based on the value of internal clock tick and ignoring differences in reassignments.

Before:

After:

On the positive side, no further changes, like in LS, have to be done.

Closes #6257.

Checklist

Please ensure that the following checklist has been satisfied before submitting the PR:

Screenshots/screencasts have been attached, if there are any visual changes. For interactive or animated visual changes, a screencast is preferred.
All code follows the
Scala,
Java,
and
Rust
style guides. In case you are using a language not listed above, follow the Rust style guide.
All code has been tested:
- Unit tests have been written where possible.
- If GUI codebase was changed, the GUI was tested when built using ./run ide build.

hubertp · 2023-04-20T15:35:59Z

test/Table_Tests/src/Database/SQLite_Spec.enso

@@ -172,4 +172,11 @@ spec =

        file.delete_if_exists

+        Test.specify 'should not duplicate warnings' <|


This feels more like integration test but at least it demonstrates that we don't generate rubbish anymore.

I think it's good to have such integration tests.

4e6

One comment about the test

test/Tests/src/Semantic/Warnings_Spec.enso

radeusgd · 2023-04-20T17:10:25Z

Oh, I feel bad like I'm constantly criticising.

But I'm really having doubts if this is the right solution. Can we discuss it?

I think we should deduplicate the warnings when adding/combining them.

Otherwise, what this does is it hides the duplication - so from UI perspective it's all good, but the problem still stays under the hood. If the warnings duplicate so easily, they can very easily use a lot of memory. Why keep the duplicates in memory when we can just easily run the deduplication on merge instead?

JaroslavTulach

I am glad this change helps the UI, but I am not sure I understand the logic that drives it.

engine/runtime/src/main/java/org/enso/interpreter/runtime/error/Warning.java

hubertp · 2023-04-21T07:13:56Z

But I'm really having doubts if this is the right solution. Can we discuss it?
I think we should deduplicate the warnings when adding/combining them.
Otherwise, what this does is it hides the duplication - so from UI perspective it's all good, but the problem still stays under the hood. If the warnings duplicate so easily, they can very easily use a lot of memory. Why keep the duplicates in memory when we can just easily run the deduplication on merge instead?

My first attempt at this PR had uniqueness introduced at all data structures that export getAll, since this unpacking and packing again of warnings is the main source of evil. So that followed more closely to what you are suggesting. But I wasn't convinced this is the right way to go, since this de-duplication would be run on essentially every call. I can explore that more.
Alternatively, I can explore rewriting WithWarning to get rid of ArrayRope.

One potential positive side of this PR, apart from being relatively simple, is that we can still make use of duplicate reassigned warnings, if we ever decide to resurrect them for the IDE purposes. It won't be possible if we de-duplicate at the insertion.

But first, I also want to add some benchmarks to see how (a large number of) warnings affect simple computations.

hubertp · 2023-04-21T07:16:17Z

I am glad this change helps the UI, but I am not sure I understand the logic that drives it.

Displaying of warnings is done, like for any other structure, by attaching an appropriate visualization. By doing the de-duplication at the point when we return all warnings of the value, we only display the unique ones.

hubertp · 2023-04-21T08:21:44Z

OK, it still SOs on a fold with a vector of 10000 warnings. This needs more work.

radeusgd · 2023-04-21T09:03:02Z

One potential positive side of this PR, apart from being relatively simple, is that we can still make use of duplicate reassigned warnings, if we ever decide to resurrect them for the IDE purposes. It won't be possible if we de-duplicate at the insertion.

Indeed, I also thought that it may be useful at some point. But I'm afraid the performance drain could be too much. And also, at least currently I'm not quite certain we need it that much from Libs perspective - the major goal is to be able to display a warning when there is a risk of data loss/corruption and be sure that the user can track its source. A single path should be enough for most cases I think.

But first, I also want to add some benchmarks to see how (a large number of) warnings affect simple computations.

That would be ideal.

...runtime/src/bench/java/org/enso/interpreter/bench/benchmarks/semantic/WarningBenchmarks.java

JaroslavTulach · 2023-04-25T05:44:33Z

test/Tests/src/Semantic/Warnings_Spec.enso

+        Warning.get_all result_2 . length . should_equal 1
+
+        result_3 = b + b + d
+        Warning.get_all result_3 . map (x-> x.value.to_text) . should_equal ["Foo!", "Foo!"]


This shows that the warning message isn't important - two warnings can have the same message and they will be treated as different. OK.

Just a note that the warnings don't really have a message but a generic payload which can be any object (in these examples they are just Text payloads).

test/Tests/src/Semantic/Warnings_Spec.enso

radeusgd · 2023-04-25T11:41:31Z

Btw. looking around the creationTime logic, I'm thinking if we should make the Warning constructors private, to disallow creating a Warning with a custom creationTime - it should always be initialized with ctx.clockTick() or with existing time when creating a reassigned copy of an existing warning. Having the constructors public is a risk that someone may use them wrongly.

hubertp · 2023-04-25T12:20:16Z

Btw. looking around the creationTime logic, I'm thinking if we should make the Warning constructors private, to disallow creating a Warning with a custom creationTime - it should always be initialized with ctx.clockTick() or with existing time when creating a reassigned copy of an existing warning. Having the constructors public is a risk that someone may use them wrongly.

sgtm

hubertp · 2023-04-25T14:10:47Z

Took me a while to bring it into an acceptable state and make native image happy. In the end, the change is rather minimal and perf is pretty good. Previously it would run out of heap space or SO pretty easily.

engine/runtime/src/main/java/org/enso/interpreter/runtime/error/WithWarnings.java

radeusgd · 2023-04-26T14:03:52Z

engine/runtime/src/main/java/org/enso/interpreter/runtime/error/Warning.java

+    Arrays.sort(arr, Comparator.comparing(Warning::getCreationTime).reversed());
+  }
+
+  /** Converts set to an array behing a truffle boundary. */


Suggested change

/** Converts set to an array behing a truffle boundary. */

/** Converts set to an array behind a truffle boundary. */

radeusgd

The Enso part (tests) looks perfect - that's exactly what we needed and I'm really glad we also got a benchmark.

One small suggestion/question in line.

radeusgd · 2023-04-26T14:10:11Z

engine/runtime/src/main/java/org/enso/interpreter/node/callable/InvokeConversionNode.java

@@ -172,10 +172,10 @@ Object doWarning(
      }
    }
    arguments[thatArgumentPosition] = that.getValue();
-    ArrayRope<Warning> warnings = that.getReassignedWarnings(this);
+    ArrayRope<Warning> warnings = that.getReassignedWarningsAsRope(this);


At this point, when storing the warnings as sets, what is the point of getting these warnings as a rope?

Won't a simple array suffice? I don't see any reason to keep using ArrayRopes in the warning system at this moment.

Not really. Gathering warnings means that we don't know the target size of the array/collection. So then a lot of methods that create/append collections/arrays would have to be behind truffle boundary as they are blacklisted. ArrayRope<Warning> is not an ideal data structure, yes. Eventually I would like to get rid of it completely but it would be a bigger change.
You also have to remember that this is added in generated code e.g. ReadArrayElementMethodGen.

Oh, so we are appending the warnings using the ArrayRope and only at the end merging them into a Set? Ok I think I missed that part. Makes sense now.

JaroslavTulach · 2023-04-26T14:19:07Z

...runtime/src/bench/java/org/enso/interpreter/bench/benchmarks/semantic/WarningBenchmarks.java

+    }
+
+    @Benchmark
+    public void sameWarningVecSum() {


What are the benchmark results?

JaroslavTulach

I am interested in seeing the benchmark results. I like the overall changes. I'd like us to move even further, especially by not using EconomicSet directly, but rather relying on some Enso specific implementation of hash map/set.

engine/runtime/src/main/java/org/enso/interpreter/runtime/data/Array.java

JaroslavTulach · 2023-04-26T14:26:05Z

engine/runtime/src/main/java/org/enso/interpreter/runtime/error/WithWarnings.java

+    @Override
+    public boolean equals(Object a, Object b) {
+      if (a instanceof Warning thisObj && b instanceof Warning thatObj) {
+        return thisObj.getCreationTime() == thatObj.getCreationTime();


I got confused by the "creation time" in the past and (almost) again right now. Can't we rename to something like sequenceId?

JaroslavTulach · 2023-04-26T14:27:37Z

engine/runtime/src/main/java/org/enso/interpreter/runtime/error/WithWarnings.java

+  }
+
+  @CompilerDirectives.TruffleBoundary
+  private EconomicSet<Warning> createSetFromArray(Warning[] entries) {


I believe we should share the Map implementation with "EnsoHashMap" and together move towards a hash map/set that works in PE mode well and doesn't have to hide behind @TruffleBoundary. CCing @Akirathan

Wasn't the EconomicSet supposed to be PE friendly?

Graal's collections are primarily memory-friendly. But they sometimes do call some blacklisted methods, requiring @TruffleBoundary annotations. It's not a bug, it's by design.

Wasn't the EconomicSet supposed to be PE friendly?

The primary usage of EconomicXyz is in Graal compiler. Originally the compiler was using plain HashMap or HashSet, but when the compiler is running in JVM mode, the usages of HashXyz by the compiler and by user program influenced each other. Particularly: if the user program uses HashXyz with objects with a "bad" hashing function with too many collisions the HashXyz switches to slower "defensive mode" - however that influences all the HashXyz usages across the whole JVM. To isolate the compiler from such performance degradation the EconomicXyz implementations were created.

When Thomas wrote them, I don't think he was thinking about PE much.

I believe we should share the Map implementation with "EnsoHashMap" and together move towards a hash map/set that works in PE mode well and doesn't have to hide behind @TruffleBoundary.

Tracked in #5233. I have just added a new task to that issue.

Tracking as a real issue

Migrate WithWarnings to use EnsoHashMap to speed them up significantly #8682

Akirathan · 2023-04-27T11:26:42Z

...runtime/src/bench/java/org/enso/interpreter/bench/benchmarks/semantic/WarningBenchmarks.java

+                        Paths.get("../../distribution/component").toFile().getAbsolutePath()
+                )
+                .option("engine.MultiTier", "true")
+                .option("engine.BackgroundCompilation", "true")


For an unknown reason, we set engine.BackgroundCompilation to false as the default, even in benchmarks. Here https://github.com/enso-org/enso/blob/develop/build.sbt#L1029. Good that you override it here, but we should maybe get rid of the default option in build.sbt?

I think @JaroslavTulach made this possible recently in #6335 so maybe that was just forgotten there?

Akirathan · 2023-04-27T11:32:06Z

...runtime/src/bench/java/org/enso/interpreter/bench/benchmarks/semantic/WarningBenchmarks.java

+        noWarningsVec = createVec.execute(INPUT_VEC_SIZE, elem);
+        sameWarningVec = createVec.execute(INPUT_VEC_SIZE, elemWithWarning);


It would be nice to randomize the contents of these vectors a bit, for example like in EqualsBenchmarks.generatePrimitiveVector - to be sure that there will be no magical constant fold. But I guess that even without the randomization, this benchmark should give us relevant numbers.

Yes, although later I'm checking the consistency of the end result which now is pretty easy to calculate :)

engine/runtime/src/main/java/org/enso/interpreter/runtime/data/Array.java

Akirathan · 2023-04-27T11:42:28Z

engine/runtime/src/main/java/org/enso/interpreter/runtime/error/WithWarnings.java

+  }
+
+  @CompilerDirectives.TruffleBoundary
+  private EconomicSet<Warning> createSetFromArray(Warning[] entries) {


I believe we should share the Map implementation with "EnsoHashMap" and together move towards a hash map/set that works in PE mode well and doesn't have to hide behind @TruffleBoundary.

Tracked in #5233. I have just added a new task to that issue.

engine/runtime/src/test/java/org/enso/interpreter/test/WarningsTest.java

This change makes sure that reported warnings are unique, based on the value of internal clock tick and ignoring differences in reassignments. The `Warning.get_all` is the main entry point to all requests for values' warnings. So rather than implementing de-duplication all over the place, we just do it in one place. There is a cost involved in deduplication (sorting, copying the array + going over the whole array) but that seems like an acceptable cost.

hubertp · 2023-04-27T13:15:55Z

Benchmarks:

# Benchmark mode: Average time, time/op
# Benchmark: org.enso.interpreter.bench.benchmarks.semantic.WarningBenchmarks.noWarningsVecSum

# Run progress: 0.00% complete, ETA 00:00:14
# Fork: 1 of 1
OpenJDK 64-Bit Server VM warning: -XX:ThreadPriorityPolicy=1 may require system level permission, e.g., being the root user. If the necessary permission is not possessed, changes to priority will be silently ignored.
# Warmup Iteration   1: 0.213 ms/op
# Warmup Iteration   2: 0.026 ms/op
# Warmup Iteration   3: 0.025 ms/op
# Warmup Iteration   4: 0.025 ms/op
# Warmup Iteration   5: 0.024 ms/op
Iteration   1: 0.024 ms/op
Iteration   2: 0.024 ms/op
Iteration   3: 0.024 ms/op


Result "org.enso.interpreter.bench.benchmarks.semantic.WarningBenchmarks.noWarningsVecSum":
  0.024 ±(99.9%) 0.004 ms/op [Average]
  (min, avg, max) = (0.024, 0.024, 0.024), stdev = 0.001
  CI (99.9%): [0.020, 0.028] (assumes normal distribution)

# Run complete. Total time: 00:00:20

REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
experiments, perform baseline and negative tests that provide experimental control, make sure
the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
Do not assume the numbers tell you what you want them to tell.
Benchmark                           Mode  Cnt  Score   Error  Units
WarningBenchmarks.noWarningsVecSum  avgt    3  0.024 ± 0.004  ms/op
Difference was: 0.019454539400594495.
[info] - should not be slower than before
[info] org.enso.interpreter.bench.benchmarks.semantic.WarningBenchmarks.sameWarningVecSum
# JMH version: 1.35
# VM version: JDK 11.0.18, OpenJDK 64-Bit Server VM, 11.0.18+10-jvmci-22.3-b13
# VM invoker: /home/hubert/opt/graalvm-ce-java11-22.3.1/bin/java
# VM options: -XX:ThreadPriorityPolicy=1 -XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCIProduct -XX:JVMCIThreadsPerNativeLibraryRuntime=1 -XX:-UnlockExperimentalVMOptions -ea -Dpolyglot.engine.IterativePartialEscape=true -Dpolyglot.engine.BackgroundCompilation=false -Dgraalvm.locatorDisabled=true --upgrade-module-path=/home/hubert/tmp/enso/engine/runtime/build-cache/truffle-api.jar -Xss16M -Dpolyglot.engine.MultiTier=false
# Blackhole mode: full + dont-inline hint (auto-detected, use -Djmh.blackhole.autoDetect=false to disable)
# Warmup: 5 iterations, 1 s each
# Measurement: 3 iterations, 3 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: org.enso.interpreter.bench.benchmarks.semantic.WarningBenchmarks.sameWarningVecSum
# Run progress: 0.00% complete, ETA 00:00:14
# Fork: 1 of 1
OpenJDK 64-Bit Server VM warning: -XX:ThreadPriorityPolicy=1 may require system level permission, e.g., being the root user. If the necessary permission is not possessed, changes to priority will be silently ignored.
# Warmup Iteration   1: 27.953 ms/op
# Warmup Iteration   2: 5.226 ms/op
# Warmup Iteration   3: 3.716 ms/op
# Warmup Iteration   4: 3.647 ms/op
# Warmup Iteration   5: 3.372 ms/op
Iteration   1: 3.346 ms/op
Iteration   2: 3.345 ms/op
Iteration   3: 3.360 ms/op


Result "org.enso.interpreter.bench.benchmarks.semantic.WarningBenchmarks.sameWarningVecSum":
  3.351 ±(99.9%) 0.153 ms/op [Average]
  (min, avg, max) = (3.345, 3.351, 3.360), stdev = 0.008
  CI (99.9%): [3.198, 3.504] (assumes normal distribution)

hubertp requested review from jdunkerley, radeusgd, 4e6 and JaroslavTulach as code owners April 20, 2023 15:35

hubertp commented Apr 20, 2023

View reviewed changes

4e6 approved these changes Apr 20, 2023

View reviewed changes

test/Tests/src/Semantic/Warnings_Spec.enso Outdated Show resolved Hide resolved

JaroslavTulach reviewed Apr 21, 2023

View reviewed changes

engine/runtime/src/main/java/org/enso/interpreter/runtime/error/Warning.java Outdated Show resolved Hide resolved

hubertp marked this pull request as draft April 21, 2023 08:22

JaroslavTulach reviewed Apr 25, 2023

View reviewed changes

This was referenced Apr 25, 2023

Vector.sort should attach limited number of warnings #6283

Closed

Get rid of TruffleBoundary in Map.insert and Map.get #5233

Closed

hubertp force-pushed the wip/hubert/6257-eliminate-duplicate-warnings branch from a1f9039 to bb80aa9 Compare April 25, 2023 14:08

hubertp marked this pull request as ready for review April 25, 2023 14:09

hubertp requested a review from JaroslavTulach April 25, 2023 14:10

hubertp commented Apr 25, 2023

View reviewed changes

engine/runtime/src/main/java/org/enso/interpreter/runtime/error/WithWarnings.java Outdated Show resolved Hide resolved

radeusgd reviewed Apr 26, 2023

View reviewed changes

radeusgd approved these changes Apr 26, 2023

View reviewed changes

JaroslavTulach reviewed Apr 26, 2023

View reviewed changes

JaroslavTulach approved these changes Apr 26, 2023

View reviewed changes

jdunkerley approved these changes Apr 27, 2023

View reviewed changes

Akirathan reviewed Apr 27, 2023

View reviewed changes

hubertp added 11 commits April 27, 2023 15:10

more tests

8caf629

update changelog

e3051f4

add a sample benchmark

c42a5a3

towards de-duplicated warnings

ef20899

nits

98fa634

getWarnings is always unique

3e6ed29

private Warning constructor

6b72d7b

more shedding

1b821dc

nits

daf4d2a

final nits

b1fe0fe

remove prepend

72d3cb9

cleanup tests, address review

74392dd

hubertp force-pushed the wip/hubert/6257-eliminate-duplicate-warnings branch from 2678d35 to 74392dd Compare April 27, 2023 14:57

hubertp requested a review from GregoryTravis as a code owner April 27, 2023 14:57

s/creation time/sequence id

5a7ad12

hubertp force-pushed the wip/hubert/6257-eliminate-duplicate-warnings branch from 28362e5 to 5a7ad12 Compare April 27, 2023 15:05

hubertp added 2 commits April 27, 2023 17:31

More cleanup in tests

ea52e63

more test cleanups

58cd84d

hubertp added the CI: Ready to merge This PR is eligible for automatic merge label Apr 28, 2023

mergify bot merged commit c6790f1 into develop Apr 28, 2023
28 checks passed

mergify bot deleted the wip/hubert/6257-eliminate-duplicate-warnings branch April 28, 2023 07:16

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Report only unique warnings #6372

Report only unique warnings #6372

hubertp commented Apr 20, 2023 •

edited

hubertp Apr 20, 2023

radeusgd Apr 25, 2023

4e6 left a comment

radeusgd commented Apr 20, 2023

JaroslavTulach left a comment

hubertp commented Apr 21, 2023

hubertp commented Apr 21, 2023

hubertp commented Apr 21, 2023

radeusgd commented Apr 21, 2023

JaroslavTulach Apr 25, 2023

radeusgd Apr 25, 2023

radeusgd commented Apr 25, 2023

hubertp commented Apr 25, 2023

hubertp commented Apr 25, 2023

radeusgd Apr 26, 2023

radeusgd left a comment

radeusgd Apr 26, 2023

hubertp Apr 26, 2023

radeusgd Apr 26, 2023

JaroslavTulach Apr 26, 2023

JaroslavTulach left a comment

JaroslavTulach Apr 26, 2023

JaroslavTulach Apr 26, 2023

radeusgd Apr 26, 2023

hubertp Apr 27, 2023

JaroslavTulach Apr 27, 2023

Akirathan Apr 27, 2023

JaroslavTulach Jan 5, 2024

Akirathan Apr 27, 2023

hubertp Apr 27, 2023

Akirathan Apr 27, 2023

hubertp Apr 27, 2023 •

edited

Akirathan Apr 27, 2023

hubertp commented Apr 27, 2023 •

edited

		@@ -172,4 +172,11 @@ spec =

		file.delete_if_exists

		Test.specify 'should not duplicate warnings' <\|

	/** Converts set to an array behing a truffle boundary. */
	/** Converts set to an array behind a truffle boundary. */

		noWarningsVec = createVec.execute(INPUT_VEC_SIZE, elem);
		sameWarningVec = createVec.execute(INPUT_VEC_SIZE, elemWithWarning);

Report only unique warnings #6372

Report only unique warnings #6372

Conversation

hubertp commented Apr 20, 2023 • edited

Pull Request Description

Checklist

Choose a reason for hiding this comment

Choose a reason for hiding this comment

4e6 left a comment

Choose a reason for hiding this comment

radeusgd commented Apr 20, 2023

JaroslavTulach left a comment

Choose a reason for hiding this comment

hubertp commented Apr 21, 2023

hubertp commented Apr 21, 2023

hubertp commented Apr 21, 2023

radeusgd commented Apr 21, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

radeusgd commented Apr 25, 2023

hubertp commented Apr 25, 2023

hubertp commented Apr 25, 2023

Choose a reason for hiding this comment

radeusgd left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

JaroslavTulach left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

hubertp Apr 27, 2023 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

hubertp commented Apr 27, 2023 • edited

hubertp commented Apr 20, 2023 •

edited

hubertp Apr 27, 2023 •

edited

hubertp commented Apr 27, 2023 •

edited