DataflowError.withoutTrace shall not store a trace #8608

JaroslavTulach · 2023-12-20T16:49:46Z

Pull Request Description

Fixes #8137 by storing only the Error.throw location in DataflowError. InteropLibrary has a getStackTrace message, so this PR uses it to take full control over the DataflowError stack trace.
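As a rough illustration of the idea (not the engine's actual code), the sketch below shows a plain Java exception that skips stack capture and carries only the location where it was raised. The names `LightweightErrorSketch`, `LightweightError` and `Location` are made up for this example; the only real API relied on is the protected `Throwable` constructor with `writableStackTrace=false`.

```java
import java.util.Objects;

public class LightweightErrorSketch {
  /** Hypothetical stand-in for a source location such as x.enso line 4. */
  record Location(String file, int line) {}

  /** An error value that remembers only where it was thrown. */
  static final class LightweightError extends RuntimeException {
    private final Location thrownAt;

    LightweightError(String message, Location thrownAt) {
      // enableSuppression=false, writableStackTrace=false:
      // the JVM does not fill in a stack trace, so no frames are stored.
      super(message, null, false, false);
      this.thrownAt = Objects.requireNonNull(thrownAt);
    }

    Location thrownAt() {
      return thrownAt;
    }
  }

  public static void main(String[] args) {
    var err = new LightweightError("Problem", new Location("x.enso", 4));
    System.out.println("frames stored: " + err.getStackTrace().length);
    System.out.println("thrown at: " + err.thrownAt());
  }
}
```

The error stays cheap to create and to pass around as a value, at the price of knowing only a single location.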

Checklist

Please ensure that the following checklist has been satisfied before submitting the PR:

The documentation has been updated in c678799
All code follows the Scala and Java style guides.
All code has been tested:
- Unit tests have been written where possible.
- Runtime*Test have been updated
- Benchmark run shows speedup
- Enable richer stack traces via an option - done
- Not done: reasons for different stacktraces

radeusgd · 2023-12-20T16:59:30Z

As mentioned (pt. 2) full stack traces of dataflow errors are really quite invaluable for debugging.

Can we at least add some option (e.g. environment variable, engine runner option?) that re-enables the full traces at the cost of performance? Something like a 'debug' mode?

JaroslavTulach · 2023-12-21T04:01:25Z

> As mentioned (pt. 2) full stack traces of dataflow errors are really quite invaluable for debugging.

While I hear you, I don't think there is a way to have exactly correct stack traces and excellent performance for DataflowError. As soon as the DataflowError is assigned to some variable, it needs to materialize to have a correct stack trace. The following example creates a Vector:

deep_arr n =
    err = Error.throw (Illegal_Argument.Error "Problem"+n.to_text)
    if n <= 0 then [err] else deep_arr n-1 + [err]

and such a Vector can travel through the program for a very long time before the stack traces of its DataflowErrors are printed or ignored.

> Can we at least add some option (e.g. environment variable, engine runner option?) that re-enables the full traces at the cost of performance? Something like a 'debug' mode?

Knowing at the Error.throw site whether a stack trace will be needed isn't possible in advance. There are two options:

a hint from the developer
optimize subsequent executions based on their prior behavior

In any case tracking exact stacktraces cannot be default & automatic, if the performance shall be on par with Panic.

JaroslavTulach · 2023-12-21T04:08:12Z

Right now there are:

one failure in RuntimeVisualizationTest
three failures in RuntimeErrorsTest

radeusgd · 2023-12-21T08:41:46Z

> In any case tracking exact stacktraces cannot be default & automatic, if the performance shall be on par with Panic.

I understand.

That's why I suggested relying on an environment variable or runner option allowing us to 'enable detailed dataflow stack traces'. This can be a compile-time constant, that can be checked at Error.throw. By default we can keep the short trace (I guess for users the error message is far more important than location anyway), but us developers can enable (globally) more detailed traces (at the cost of performance).

Akirathan · 2023-12-21T09:35:37Z

> > In any case tracking exact stacktraces cannot be default & automatic, if the performance shall be on par with Panic.
>
> I understand.
>
> That's why I suggested relying on an environment variable or runner option allowing us to 'enable detailed dataflow stack traces'. This can be a compile-time constant, that can be checked at Error.throw. By default we can keep the short trace (I guess for users the error message is far more important than location anyway), but us developers can enable (globally) more detailed traces (at the cost of performance).

What about enabling deep stack traces when JVM assertions (-ea) are enabled? They are enabled in unit tests and Enso tests, both on the CI and locally, but they are disabled in production.
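One common way to detect `-ea` from Java code is an `assert` with a side effect. The sketch below, which assumes nothing about how the engine itself does it, uses that idiom to choose between recording a single frame and recording all of them. `TraceModeSketch`, `DETAILED_TRACES`, `captureTrace` and `deep` are hypothetical names; `StackWalker` is the standard JDK API.

```java
import java.lang.StackWalker.StackFrame;
import java.util.List;
import java.util.stream.Collectors;

public class TraceModeSketch {
  /** Evaluated once at class initialization, so later checks are a constant read. */
  static final boolean DETAILED_TRACES = assertionsEnabled();

  static boolean assertionsEnabled() {
    boolean enabled = false;
    // The assignment is executed only when the JVM runs with -ea.
    assert enabled = true;
    return enabled;
  }

  /** Records every frame in debug mode, otherwise only the caller's frame. */
  static List<StackFrame> captureTrace() {
    long limit = DETAILED_TRACES ? Long.MAX_VALUE : 1;
    return StackWalker.getInstance()
        // skip(1) drops the frame of captureTrace itself
        .walk(frames -> frames.skip(1).limit(limit).collect(Collectors.toList()));
  }

  /** Recursion that mimics the `deep` example: the "error" is created at the bottom. */
  static List<StackFrame> deep(int n) {
    return n <= 0 ? captureTrace() : deep(n - 1);
  }

  public static void main(String[] args) {
    System.out.println("detailed traces: " + DETAILED_TRACES);
    System.out.println("frames kept: " + deep(10).size());
  }
}
```

Run with `java TraceModeSketch` to keep one frame, or with `java -ea TraceModeSketch` to keep the whole chain of `deep` calls.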

Akirathan

Consider enabling full stack traces when assertions are enabled (#8608 (comment)). But that might be too complicated to implement. At least in this PR. Looks fine otherwise. Also, don't forget to report the performance changes, please.

...me/src/main/java/org/enso/interpreter/node/expression/builtin/runtime/GetStackTraceNode.java

engine/runtime/src/main/java/org/enso/interpreter/runtime/error/DataflowError.java

radeusgd · 2023-12-21T15:32:55Z

> But that might be too complicated to implement. At least in this PR.

I don't think we can merge a PR that disables stack traces and does not give us a way to enable them in debug mode. That would hinder developer UX horribly.

I would very much appreciate being able to get stack traces in some 'debug' mode; I really need it for my day-to-day work. To summarize that work: I browse tens of traces a day on average (maybe more), some of which are from panics, but a significant part are from dataflow errors. Losing that capability will make it harder to implement new features and will slow us down.

JaroslavTulach · 2023-12-21T15:58:56Z

.../runtime/src/test/scala/org/enso/interpreter/test/instrument/RuntimeVisualizationsTest.scala

@@ -1877,8 +1877,9 @@ class RuntimeVisualizationsTest
              message =
                "Method `does_not_exist` of type Main could not be found.",
              stack = Vector(
-                Api.StackTraceElement("<eval>", None, None, None),
-                Api.StackTraceElement("Debug.eval", None, None, None)
+// empty stack for now


Unfortunate change. Result of trying to unify working with stacktraces behind InteropLibrary. Let's see if it can be somewhat mitigated tomorrow.

JaroslavTulach · 2023-12-22T04:32:52Z

We have four FAILED tests in test/Tests suite.

JaroslavTulach · 2023-12-22T05:39:20Z

Benchmark run indicates the Panics_And_Errors_10000_Dataflow_Error test (created by #8130 pull request) has been sped up significantly:

Previously it took 329 ms, now it runs in 0.045 ms.

JaroslavTulach · 2023-12-22T07:43:06Z

Since 10a8452 the following x.enso program:

from Standard.Base import all
import Standard.Base.Errors.Illegal_Argument.Illegal_Argument

deep n = if n <= 0 then Error.throw (Illegal_Argument.Error "Problem") else deep n-1

main =
    d = deep 10
    d.get_stack_trace_text

produces just:

$ enso --run x.enso
        at <enso> x.deep<arg-1>(x.enso:4:25-70)

and the full stack trace when executed with assertions enabled:

$ JAVA_OPTS=-ea enso --run x.enso
        at <enso> x.deep<arg-1>(x.enso:4:25-70)
        at <enso> x.deep(x.enso:4:10-84)
        at <enso> x.deep<arg-2>(x.enso:4:77-84)
        at <enso> x.deep(x.enso:4:10-84)
        at <enso> x.deep<arg-2>(x.enso:4:77-84)
        at <enso> x.deep(x.enso:4:10-84)
        at <enso> x.deep<arg-2>(x.enso:4:77-84)
        at <enso> x.deep(x.enso:4:10-84)
        at <enso> x.deep<arg-2>(x.enso:4:77-84)
        at <enso> x.deep(x.enso:4:10-84)
        at <enso> x.deep<arg-2>(x.enso:4:77-84)
        at <enso> x.deep(x.enso:4:10-84)
        at <enso> x.deep<arg-2>(x.enso:4:77-84)
        at <enso> x.deep(x.enso:4:10-84)
        at <enso> x.deep<arg-2>(x.enso:4:77-84)
        at <enso> x.deep(x.enso:4:10-84)
        at <enso> x.deep<arg-2>(x.enso:4:77-84)
        at <enso> x.deep(x.enso:4:10-84)
        at <enso> x.deep<arg-2>(x.enso:4:77-84)
        at <enso> x.deep(x.enso:4:10-84)
        at <enso> x.deep<arg-2>(x.enso:4:77-84)
        at <enso> x.deep(x.enso:4:10-84)
        at <enso> x.main(x.enso:7:9-15)

I believe this addresses Radek's complaint, or at least provides the basics to mitigate it for now.

engine/runner/src/main/scala/org/enso/runner/Main.scala

engine/runtime/src/test/java/org/enso/interpreter/test/FindExceptionMessageTest.java

radeusgd · 2023-12-22T11:16:47Z

test/Tests/src/Network/Enso_Cloud/Enso_Cloud_Spec.enso

@@ -76,7 +76,7 @@ spec =

        Test.group "Enso_User - local mock integration tests" pending=pending_has_url <|
            # These tests should be kept in sync with tools/http-test-helper/src/main/java/org/enso/shttp/cloud_mock/UsersHandler.java
-            Test.specify "current user can be fetched from mock API" <|
+            Test.specify "current user can be fetched from mock API" pending=pending_has_url <|


Sorry I missed that 🤦

I fixed that in the #8591 PR which has been merged recently.

radeusgd

Changes look good.

Thank you very much for addressing my concerns and ensuring that developer UX is not significantly worsened - the debug mode will really be invaluable for us!

The performance improvements are awesome (as expected 🙂).

radeusgd · 2023-12-22T11:22:50Z

I have a suggestion (for another followup PR perhaps).

Given that we have the withTrace and withoutTrace methods - why not expose this to users (i.e. lib devs)?

I think it could be good to give us control over whether a given error will hold the trace or not - this way we can decide when we want better performance vs better diagnostics. I think there are definitely places where both make sense:

in places like Map.get - Error.throw Not_Found should likely be fast and the stack trace may not be that needed. So we could keep using the fast Error.throw (as I assume lack of traces would be the default).
in places like handle_sql_errors - I think we may want to prefer an Error.throw_with_trace. As I said before - the error location of handle_sql_errors will tell the user nothing useful - so here including the full trace will be more helpful. And an SQL error is not a performance-intensive kind of error - it is unlikely to happen in a loop; it will have collected the trace on the original Host Java exception anyway - so the performance difference will not be noticeable, but we could regain better diagnostics.

I think this could be the best compromise we can get.

JaroslavTulach · 2023-12-24T10:07:23Z

> I have a suggestion (for another followup PR perhaps).

Certainly a follow up PR and issue. Feel free to record it/implement it.

> Given that we have the withTrace and withoutTrace methods - why not expose this to users (i.e. lib devs)?

Yes, I was also thinking about Error.throw withTrace=False. However, I am not convinced that it is exactly what users want...

> I think it could be good to give us control over whether a given error will hold the trace or not - this way we can decide when we want better performance vs better diagnostics. I think there are definitely places where both make sense:
>
> 1. in places like `Map.get` - `Error.throw Not_Found` should likely be fast and the stack trace may not be that needed. So we could keep using the fast `Error.throw` (as I assume lack of traces would be the default).
>
> 2. in places like `handle_sql_errors` - I think we may want to prefer an `Error.throw_with_trace`. As I said before - the error location of `handle_sql_errors` will tell the user nothing useful - so here including the full trace will be more helpful. And an SQL error is not a performance-intensive kind of error - it is unlikely to happen in a loop; it will have collected the trace on the original Host Java exception _anyway_ - so the performance difference will not be noticeable, but we could regain better diagnostics.

The issue is:

It is not the author of the Error.throw who should decide whether to capture the stack trace or not
It is the user of some library code that then accidentally calls Error.throw
E.g. there should be a way to turn on stack trace capture even for code which isn't doing that by default
Currently we have -ea for that, but yes, the meaning of -ea is becoming a bit too overloaded

I was rather thinking about something associated with State (in the end, the tracing behavior is associated with the state of the execution, not the code itself). Something like Runtime.Context.Tracing.with_enabled, maybe? Anyway, that's for another issue/discussion/PR.
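To make the idea of tying tracing to the execution state concrete, here is a minimal Java sketch using a `ThreadLocal` flag. It is only an analogy: `withEnabled` mirrors the tentative `Runtime.Context.Tracing.with_enabled` name above, `framesRecordedAtThrow` stands in for library code that raises an error, and nothing here is an existing Enso API.

```java
import java.util.function.Supplier;
import java.util.stream.Collectors;

public class TracingContextSketch {
  // Hypothetical per-execution switch; tracing is off unless a caller turns it on.
  private static final ThreadLocal<Boolean> TRACING = ThreadLocal.withInitial(() -> false);

  static boolean isEnabled() {
    return TRACING.get();
  }

  /** Runs the action with tracing on, then restores the previous setting. */
  static <T> T withEnabled(Supplier<T> action) {
    boolean previous = TRACING.get();
    TRACING.set(true);
    try {
      return action.get();
    } finally {
      TRACING.set(previous);
    }
  }

  /**
   * Stand-in for library code that raises an error. It does not decide about
   * traces itself; it consults the state set by whoever is running it.
   */
  static int framesRecordedAtThrow() {
    long limit = isEnabled() ? Long.MAX_VALUE : 1;
    return StackWalker.getInstance()
        .walk(frames -> frames.limit(limit).collect(Collectors.toList()))
        .size();
  }

  public static void main(String[] args) {
    System.out.println("default: " + framesRecordedAtThrow());
    System.out.println("with tracing: " + withEnabled(TracingContextSketch::framesRecordedAtThrow));
  }
}
```

The point of this shape is that the user of the library, not the author of the throwing code, chooses when the more expensive full trace is recorded.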

DataflowError.withoutTrace shall not store a trace

0226b4a

JaroslavTulach self-assigned this Dec 20, 2023

JaroslavTulach requested review from jdunkerley, radeusgd, GregoryTravis, 4e6, hubertp and Akirathan as code owners December 20, 2023 16:49

enso-bot bot mentioned this pull request Dec 21, 2023

Try to improve performance of Dataflow Errors #8137

Closed

JaroslavTulach added 5 commits December 21, 2023 06:09

Source location of Error.throw is recorded

2350c6b

Panic.recover records whole stacktrace

1cd87cc

Merge remote-tracking branch 'origin/develop' into wip/jtulach/Datafl…

bfc9893

…owError_8137

Skip HTTP and Cloud tests when pending_has_url

9a029fd

Concentrate stack trace handling into GetStackTraceNode

97c5b3b

DataflowError.stack_trace shows the most important line at index 0

8d57904

Akirathan approved these changes Dec 21, 2023

...me/src/main/java/org/enso/interpreter/node/expression/builtin/runtime/GetStackTraceNode.java

engine/runtime/src/main/java/org/enso/interpreter/runtime/error/DataflowError.java

JaroslavTulach added 4 commits December 21, 2023 14:52

Using GetStackTraceNode.stackTraceToArray to obtain standard Enso sta…

bcb8960

…cktrace

Ensure correctness of the tests

1355b7a

Keep stacktrace of index out of bounds DataflowError

7e68aba

Depp locations of DataflowError aren't tracked

eaa8be1

JaroslavTulach added 2 commits December 21, 2023 16:46

Adding changelog note

086c1d9

Always sent in InteropLibrary

4849ef9

JaroslavTulach force-pushed the wip/jtulach/DataflowError_8137 branch from fefb7e7 to 4849ef9 Compare December 21, 2023 15:52

Merging and resolving conficts

8271b26

Make sure RuntimeVisualizationsTest passes OK

891b08d

JaroslavTulach added 4 commits December 22, 2023 05:35

Merging with latest develop branch

10a54f1

Drop just a single frame in Runtime.get_stack_trace

2b2f253

Giving the field ownTrace name

c068407

Report unexpected exceptions when assertions are on

9d4c333

JaroslavTulach added 3 commits December 22, 2023 06:56

Centralized ownTrace check

13e1da6

Format multiline strings properly

1152e3d

Record full stack traces when running with assertions

10a8452

Adjusting to -ea and not -ea difference

a5616da

radeusgd reviewed Dec 22, 2023

engine/runner/src/main/scala/org/enso/runner/Main.scala

radeusgd reviewed Dec 22, 2023

engine/runtime/src/test/java/org/enso/interpreter/test/FindExceptionMessageTest.java

radeusgd reviewed Dec 22, 2023

radeusgd approved these changes Dec 22, 2023

JaroslavTulach added 6 commits December 23, 2023 07:12

Check for != -1

80dc1cb

Verify polyglot Enso/Java stacktrace is mixed

f943a2b

Get ready for error_message being Nothing

2e7199f

Location doesn't have to be a Text

e98ded3

Record stacktrace without using PanicException

903e7d9

Documenting how to get full stack trace

c678799

JaroslavTulach merged commit 07d58f2 into develop Dec 24, 2023
34 checks passed

JaroslavTulach deleted the wip/jtulach/DataflowError_8137 branch December 24, 2023 10:07

radeusgd mentioned this pull request Jan 3, 2024

Ability to programmatically control dataflow error traces #8665

Open
