Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Report only unique warnings #6372

Merged
merged 16 commits into from Apr 28, 2023

Conversation

hubertp
Copy link
Contributor

@hubertp hubertp commented Apr 20, 2023

Pull Request Description

This change makes sure that reported warnings are unique, based on the value of internal clock tick and ignoring differences in reassignments.

Before:
Screenshot from 2023-04-20 15-42-55
After:
Screenshot from 2023-04-20 15-27-27

On the positive side, no further changes, like in LS, have to be done.

Closes #6257.

Checklist

Please ensure that the following checklist has been satisfied before submitting the PR:

  • Screenshots/screencasts have been attached, if there are any visual changes. For interactive or animated visual changes, a screencast is preferred.
  • All code follows the
    Scala,
    Java,
    and
    Rust
    style guides. In case you are using a language not listed above, follow the Rust style guide.
  • All code has been tested:
    • Unit tests have been written where possible.
    • If GUI codebase was changed, the GUI was tested when built using ./run ide build.

@@ -172,4 +172,11 @@ spec =

file.delete_if_exists

Test.specify 'should not duplicate warnings' <|
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels more like integration test but at least it demonstrates that we don't generate rubbish anymore.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's good to have such integration tests.

Copy link
Contributor

@4e6 4e6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One comment about the test

test/Tests/src/Semantic/Warnings_Spec.enso Outdated Show resolved Hide resolved
@radeusgd
Copy link
Member

Oh, I feel bad like I'm constantly criticising.

But I'm really having doubts if this is the right solution. Can we discuss it?

I think we should deduplicate the warnings when adding/combining them.

Otherwise, what this does is it hides the duplication - so from UI perspective it's all good, but the problem still stays under the hood. If the warnings duplicate so easily, they can very easily use a lot of memory. Why keep the duplicates in memory when we can just easily run the deduplication on merge instead?

Copy link
Member

@JaroslavTulach JaroslavTulach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am glad this change helps the UI, but I am not sure I understand the logic that drives it.

@hubertp
Copy link
Contributor Author

hubertp commented Apr 21, 2023

But I'm really having doubts if this is the right solution. Can we discuss it?
I think we should deduplicate the warnings when adding/combining them.
Otherwise, what this does is it hides the duplication - so from UI perspective it's all good, but the problem still stays under the hood. If the warnings duplicate so easily, they can very easily use a lot of memory. Why keep the duplicates in memory when we can just easily run the deduplication on merge instead?

My first attempt at this PR had uniqueness introduced at all data structures that export getAll, since this unpacking and packing again of warnings is the main source of evil. So that followed more closely to what you are suggesting. But I wasn't convinced this is the right way to go, since this de-duplication would be run on essentially every call. I can explore that more.
Alternatively, I can explore rewriting WithWarning to get rid of ArrayRope.

One potential positive side of this PR, apart from being relatively simple, is that we can still make use of duplicate reassigned warnings, if we ever decide to resurrect them for the IDE purposes. It won't be possible if we de-duplicate at the insertion.

But first, I also want to add some benchmarks to see how (a large number of) warnings affect simple computations.

@hubertp
Copy link
Contributor Author

hubertp commented Apr 21, 2023

I am glad this change helps the UI, but I am not sure I understand the logic that drives it.

Displaying of warnings is done, like for any other structure, by attaching an appropriate visualization. By doing the de-duplication at the point when we return all warnings of the value, we only display the unique ones.

@hubertp
Copy link
Contributor Author

hubertp commented Apr 21, 2023

OK, it still SOs on a fold with a vector of 10000 warnings. This needs more work.

@hubertp hubertp marked this pull request as draft April 21, 2023 08:22
@radeusgd
Copy link
Member

One potential positive side of this PR, apart from being relatively simple, is that we can still make use of duplicate reassigned warnings, if we ever decide to resurrect them for the IDE purposes. It won't be possible if we de-duplicate at the insertion.

Indeed, I also thought that it may be useful at some point. But I'm afraid the performance drain could be too much. And also, at least currently I'm not quite certain we need it that much from Libs perspective - the major goal is to be able to display a warning when there is a risk of data loss/corruption and be sure that the user can track its source. A single path should be enough for most cases I think.

But first, I also want to add some benchmarks to see how (a large number of) warnings affect simple computations.

That would be ideal.

Warning.get_all result_2 . length . should_equal 1

result_3 = b + b + d
Warning.get_all result_3 . map (x-> x.value.to_text) . should_equal ["Foo!", "Foo!"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shows that the warning message isn't important - two warnings can have the same message and they will be treated as different. OK.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a note that the warnings don't really have a message but a generic payload which can be any object (in these examples they are just Text payloads).

test/Tests/src/Semantic/Warnings_Spec.enso Show resolved Hide resolved
@radeusgd
Copy link
Member

Btw. looking around the creationTime logic, I'm thinking if we should make the Warning constructors private, to disallow creating a Warning with a custom creationTime - it should always be initialized with ctx.clockTick() or with existing time when creating a reassigned copy of an existing warning. Having the constructors public is a risk that someone may use them wrongly.

@hubertp
Copy link
Contributor Author

hubertp commented Apr 25, 2023

Btw. looking around the creationTime logic, I'm thinking if we should make the Warning constructors private, to disallow creating a Warning with a custom creationTime - it should always be initialized with ctx.clockTick() or with existing time when creating a reassigned copy of an existing warning. Having the constructors public is a risk that someone may use them wrongly.

sgtm

@hubertp hubertp force-pushed the wip/hubert/6257-eliminate-duplicate-warnings branch from a1f9039 to bb80aa9 Compare April 25, 2023 14:08
@hubertp hubertp marked this pull request as ready for review April 25, 2023 14:09
@hubertp
Copy link
Contributor Author

hubertp commented Apr 25, 2023

Took me a while to bring it into an acceptable state and make native image happy. In the end, the change is rather minimal and perf is pretty good. Previously it would run out of heap space or SO pretty easily.

Arrays.sort(arr, Comparator.comparing(Warning::getCreationTime).reversed());
}

/** Converts set to an array behing a truffle boundary. */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/** Converts set to an array behing a truffle boundary. */
/** Converts set to an array behind a truffle boundary. */

Copy link
Member

@radeusgd radeusgd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Enso part (tests) looks perfect - that's exactly what we needed and I'm really glad we also got a benchmark.

One small suggestion/question in line.

@@ -172,10 +172,10 @@ Object doWarning(
}
}
arguments[thatArgumentPosition] = that.getValue();
ArrayRope<Warning> warnings = that.getReassignedWarnings(this);
ArrayRope<Warning> warnings = that.getReassignedWarningsAsRope(this);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this point, when storing the warnings as sets, what is the point of getting these warnings as a rope?

Won't a simple array suffice? I don't see any reason to keep using ArrayRopes in the warning system at this moment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really. Gathering warnings means that we don't know the target size of the array/collection. So then a lot of methods that create/append collections/arrays would have to be behind truffle boundary as they are blacklisted. ArrayRope<Warning> is not an ideal data structure, yes. Eventually I would like to get rid of it completely but it would be a bigger change.
You also have to remember that this is added in generated code e.g. ReadArrayElementMethodGen.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, so we are appending the warnings using the ArrayRope and only at the end merging them into a Set? Ok I think I missed that part. Makes sense now.

}

@Benchmark
public void sameWarningVecSum() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the benchmark results?

Copy link
Member

@JaroslavTulach JaroslavTulach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am interested in seeing the benchmark results. I like the overall changes. I'd like us to move even further, especially by not using EconomicSet directly, but rather relying on some Enso specific implementation of hash map/set.

@Override
public boolean equals(Object a, Object b) {
if (a instanceof Warning thisObj && b instanceof Warning thatObj) {
return thisObj.getCreationTime() == thatObj.getCreationTime();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got confused by the "creation time" in the past and (almost) again right now. Can't we rename to something like sequenceId?

}

@CompilerDirectives.TruffleBoundary
private EconomicSet<Warning> createSetFromArray(Warning[] entries) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we should share the Map implementation with "EnsoHashMap" and together move towards a hash map/set that works in PE mode well and doesn't have to hide behind @TruffleBoundary. CCing @Akirathan

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wasn't the EconomicSet supposed to be PE friendly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Graal's collections are primarily memory-friendly. But they sometimes do call some blacklisted methods, requiring @TruffleBoundary annotations. It's not a bug, it's by design.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wasn't the EconomicSet supposed to be PE friendly?

The primary usage of EconomicXyz is in Graal compiler. Originally the compiler was using plain HashMap or HashSet, but when the compiler is running in JVM mode, the usages of HashXyz by the compiler and by user program influenced each other. Particularly: if the user program uses HashXyz with objects with a "bad" hashing function with too many collisions the HashXyz switches to slower "defensive mode" - however that influences all the HashXyz usages across the whole JVM. To isolate the compiler from such performance degradation the EconomicXyz implementations were created.

When Thomas wrote them, I don't think he was thinking about PE much.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we should share the Map implementation with "EnsoHashMap" and together move towards a hash map/set that works in PE mode well and doesn't have to hide behind @TruffleBoundary.

Tracked in #5233. I have just added a new task to that issue.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Paths.get("../../distribution/component").toFile().getAbsolutePath()
)
.option("engine.MultiTier", "true")
.option("engine.BackgroundCompilation", "true")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For an unknown reason, we set engine.BackgroundCompilation to false as the default, even in benchmarks. Here https://github.com/enso-org/enso/blob/develop/build.sbt#L1029. Good that you override it here, but we should maybe get rid of the default option in build.sbt?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think @JaroslavTulach made this possible recently in #6335 so maybe that was just forgotten there?

Comment on lines +84 to +85
noWarningsVec = createVec.execute(INPUT_VEC_SIZE, elem);
sameWarningVec = createVec.execute(INPUT_VEC_SIZE, elemWithWarning);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to randomize the contents of these vectors a bit, for example like in EqualsBenchmarks.generatePrimitiveVector - to be sure that there will be no magical constant fold. But I guess that even without the randomization, this benchmark should give us relevant numbers.

Copy link
Contributor Author

@hubertp hubertp Apr 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, although later I'm checking the consistency of the end result which now is pretty easy to calculate :)

}

@CompilerDirectives.TruffleBoundary
private EconomicSet<Warning> createSetFromArray(Warning[] entries) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we should share the Map implementation with "EnsoHashMap" and together move towards a hash map/set that works in PE mode well and doesn't have to hide behind @TruffleBoundary.

Tracked in #5233. I have just added a new task to that issue.

This change makes sure that reported warnings are unique, based on the
value of internal clock tick and ignoring differences in reassignments.

The `Warning.get_all` is the main entry point to all requests for
values' warnings. So rather than implementing de-duplication all over
the place, we just do it in one place. There is a cost involved in
deduplication (sorting, copying the array + going over the whole array)
but that seems like an acceptable cost.
@hubertp
Copy link
Contributor Author

hubertp commented Apr 27, 2023

Benchmarks:

# Benchmark mode: Average time, time/op
# Benchmark: org.enso.interpreter.bench.benchmarks.semantic.WarningBenchmarks.noWarningsVecSum

# Run progress: 0.00% complete, ETA 00:00:14
# Fork: 1 of 1
OpenJDK 64-Bit Server VM warning: -XX:ThreadPriorityPolicy=1 may require system level permission, e.g., being the root user. If the necessary permission is not possessed, changes to priority will be silently ignored.
# Warmup Iteration   1: 0.213 ms/op
# Warmup Iteration   2: 0.026 ms/op
# Warmup Iteration   3: 0.025 ms/op
# Warmup Iteration   4: 0.025 ms/op
# Warmup Iteration   5: 0.024 ms/op
Iteration   1: 0.024 ms/op
Iteration   2: 0.024 ms/op
Iteration   3: 0.024 ms/op


Result "org.enso.interpreter.bench.benchmarks.semantic.WarningBenchmarks.noWarningsVecSum":
  0.024 ±(99.9%) 0.004 ms/op [Average]
  (min, avg, max) = (0.024, 0.024, 0.024), stdev = 0.001
  CI (99.9%): [0.020, 0.028] (assumes normal distribution)

# Run complete. Total time: 00:00:20

REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
experiments, perform baseline and negative tests that provide experimental control, make sure
the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
Do not assume the numbers tell you what you want them to tell.
Benchmark                           Mode  Cnt  Score   Error  Units
WarningBenchmarks.noWarningsVecSum  avgt    3  0.024 ± 0.004  ms/op
Difference was: 0.019454539400594495.
[info] - should not be slower than before
[info] org.enso.interpreter.bench.benchmarks.semantic.WarningBenchmarks.sameWarningVecSum
# JMH version: 1.35
# VM version: JDK 11.0.18, OpenJDK 64-Bit Server VM, 11.0.18+10-jvmci-22.3-b13
# VM invoker: /home/hubert/opt/graalvm-ce-java11-22.3.1/bin/java
# VM options: -XX:ThreadPriorityPolicy=1 -XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCIProduct -XX:JVMCIThreadsPerNativeLibraryRuntime=1 -XX:-UnlockExperimentalVMOptions -ea -Dpolyglot.engine.IterativePartialEscape=true -Dpolyglot.engine.BackgroundCompilation=false -Dgraalvm.locatorDisabled=true --upgrade-module-path=/home/hubert/tmp/enso/engine/runtime/build-cache/truffle-api.jar -Xss16M -Dpolyglot.engine.MultiTier=false
# Blackhole mode: full + dont-inline hint (auto-detected, use -Djmh.blackhole.autoDetect=false to disable)
# Warmup: 5 iterations, 1 s each
# Measurement: 3 iterations, 3 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: org.enso.interpreter.bench.benchmarks.semantic.WarningBenchmarks.sameWarningVecSum
# Run progress: 0.00% complete, ETA 00:00:14
# Fork: 1 of 1
OpenJDK 64-Bit Server VM warning: -XX:ThreadPriorityPolicy=1 may require system level permission, e.g., being the root user. If the necessary permission is not possessed, changes to priority will be silently ignored.
# Warmup Iteration   1: 27.953 ms/op
# Warmup Iteration   2: 5.226 ms/op
# Warmup Iteration   3: 3.716 ms/op
# Warmup Iteration   4: 3.647 ms/op
# Warmup Iteration   5: 3.372 ms/op
Iteration   1: 3.346 ms/op
Iteration   2: 3.345 ms/op
Iteration   3: 3.360 ms/op


Result "org.enso.interpreter.bench.benchmarks.semantic.WarningBenchmarks.sameWarningVecSum":
  3.351 ±(99.9%) 0.153 ms/op [Average]
  (min, avg, max) = (3.345, 3.351, 3.360), stdev = 0.008
  CI (99.9%): [3.198, 3.504] (assumes normal distribution)

@hubertp hubertp force-pushed the wip/hubert/6257-eliminate-duplicate-warnings branch from 2678d35 to 74392dd Compare April 27, 2023 14:57
@hubertp hubertp force-pushed the wip/hubert/6257-eliminate-duplicate-warnings branch from 28362e5 to 5a7ad12 Compare April 27, 2023 15:05
@hubertp hubertp added the CI: Ready to merge This PR is eligible for automatic merge label Apr 28, 2023
@mergify mergify bot merged commit c6790f1 into develop Apr 28, 2023
28 checks passed
@mergify mergify bot deleted the wip/hubert/6257-eliminate-duplicate-warnings branch April 28, 2023 07:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI: Ready to merge This PR is eligible for automatic merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Ridiculously high warning duplication
6 participants