Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DataflowError.withoutTrace shall not store a trace #8608

Merged
merged 29 commits into from
Dec 24, 2023

Conversation

JaroslavTulach
Copy link
Member

@JaroslavTulach JaroslavTulach commented Dec 20, 2023

Pull Request Description

Fixes #8137 by storing only Error.throw location in DataflowError. There is a getStackTrace message in InteropLibrary, so using it to take full control over the DataflowError stacktrace.

Checklist

Please ensure that the following checklist has been satisfied before submitting the PR:

@radeusgd
Copy link
Member

radeusgd commented Dec 20, 2023

As mentioned (pt. 2) full stack traces of dataflow errors are really quite invaluable for debugging.

Can we at least add some option (e.g. environment variable, engine runner option?) that re-enables the full traces at the cost of performance? Something like a 'debug' mode?

@JaroslavTulach
Copy link
Member Author

As mentioned (pt. 2) full stack traces of dataflow errors are really quite invaluable for debugging.

While I hear you I don't think there is a way to have exactly correct stacktraces and excellent performance for DataflowError. As soon as the DataflowError is assigned to some variable, it needs to materialize to have correct stacktrace. Following example creates a Vector:

deep_arr n =
    err = Error.throw (Illegal_Argument.Error "Problem"+n.to_text)
    if n <= 0 then [err] else deep_arr n-1 + [err]

and such a Vector can fly thru the program for a very long time before its DataflowError stacktraces are printed or ignored.

Can we at least add some option (e.g. environment variable, engine runner option?) that re-enables the full traces at the cost of performance? Something like a 'debug' mode?

Knowing (at Error.throw) site whether a stacktrace is needed or not isn't possible in advance. There are two options:

  • a hint from the developer
  • optimize subsequent executions based on their prior behavior

In any case tracking exact stacktraces cannot be default & automatic, if the performance shall be on par with Panic.

@JaroslavTulach
Copy link
Member Author

Right now there are:

@radeusgd
Copy link
Member

In any case tracking exact stacktraces cannot be default & automatic, if the performance shall be on par with Panic.

I understand.

That's why I suggested relying on an environment variable or runner option allowing us to 'enable detailed dataflow stack traces'. This can be a compile-time constant, that can be checked at Error.throw. By default we can keep the short trace (I guess for users the error message is far more important than location anyway), but us developers can enable (globally) more detailed traces (at the cost of performance).

@Akirathan
Copy link
Member

In any case tracking exact stacktraces cannot be default & automatic, if the performance shall be on par with Panic.

I understand.

That's why I suggested relying on an environment variable or runner option allowing us to 'enable detailed dataflow stack traces'. This can be a compile-time constant, that can be checked at Error.throw. By default we can keep the short trace (I guess for users the error message is far more important than location anyway), but us developers can enable (globally) more detailed traces (at the cost of performance).

What about enabling deep stack traces when JVM assertions (-ea) are enabled? They are enabled in unit tests and Enso tests, both on the CI and locally, but they are disabled in the production.

Copy link
Member

@Akirathan Akirathan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider enabling full stack traces when assertions enabled (#8608 (comment)). But that might be too complicated to implement. At least in this PR. Looks fine otherwise. Also, don't forget to report the performance changes pls.

@radeusgd
Copy link
Member

But that might be too complicated to implement. At least in this PR.

I don't think we can merge a PR that disables stack traces and does not give us a way to enable them in debug mode. That would hinder developer UX horribly.

I would very much appreciate being able to get stack traces in some 'debug' mode, I really need it much for my day-to-day work. Trying to summarize my day-to-day work, I think I browse tens of traces a day on average (maybe more), some of which are from panics but a significant part of these are from dataflow errors. Losing that capability will make it harder to implement new features and will slow us down.

@@ -1877,8 +1877,9 @@ class RuntimeVisualizationsTest
message =
"Method `does_not_exist` of type Main could not be found.",
stack = Vector(
Api.StackTraceElement("<eval>", None, None, None),
Api.StackTraceElement("Debug.eval", None, None, None)
// empty stack for now
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunate change. Result of trying to unify working with stacktraces behind InteropLibrary. Let's see if it can be somewhat mitigated tomorrow.

@JaroslavTulach
Copy link
Member Author

We have four FAILED tests in test/Tests suite.

@JaroslavTulach
Copy link
Member Author

Benchmark run indicates the Panics_And_Errors_10000_Dataflow_Error test (created by #8130 pull request) has been sped up significantly:

7000x speedup

Previously it took 329ms, now it runs in 0,045ms.

@JaroslavTulach
Copy link
Member Author

Since 10a8452 the following x.enso program:

from Standard.Base import all
import Standard.Base.Errors.Illegal_Argument.Illegal_Argument

deep n = if n <= 0 then Error.throw (Illegal_Argument.Error "Problem") else deep n-1

main =
    d = deep 10
    d.get_stack_trace_text

produces just:

$ enso --run x.enso
        at <enso> x.deep<arg-1>(x.enso:4:25-70)

and full thread dump when executed with assertions:

$ JAVA_OPTS=-ea enso --run x.enso
        at <enso> x.deep<arg-1>(x.enso:4:25-70)
        at <enso> x.deep(x.enso:4:10-84)
        at <enso> x.deep<arg-2>(x.enso:4:77-84)
        at <enso> x.deep(x.enso:4:10-84)
        at <enso> x.deep<arg-2>(x.enso:4:77-84)
        at <enso> x.deep(x.enso:4:10-84)
        at <enso> x.deep<arg-2>(x.enso:4:77-84)
        at <enso> x.deep(x.enso:4:10-84)
        at <enso> x.deep<arg-2>(x.enso:4:77-84)
        at <enso> x.deep(x.enso:4:10-84)
        at <enso> x.deep<arg-2>(x.enso:4:77-84)
        at <enso> x.deep(x.enso:4:10-84)
        at <enso> x.deep<arg-2>(x.enso:4:77-84)
        at <enso> x.deep(x.enso:4:10-84)
        at <enso> x.deep<arg-2>(x.enso:4:77-84)
        at <enso> x.deep(x.enso:4:10-84)
        at <enso> x.deep<arg-2>(x.enso:4:77-84)
        at <enso> x.deep(x.enso:4:10-84)
        at <enso> x.deep<arg-2>(x.enso:4:77-84)
        at <enso> x.deep(x.enso:4:10-84)
        at <enso> x.deep<arg-2>(x.enso:4:77-84)
        at <enso> x.deep(x.enso:4:10-84)
        at <enso> x.main(x.enso:7:9-15)

I believe this addresses Radek's complain - or at least provides the basics to mitigate it for now.

@@ -76,7 +76,7 @@ spec =

Test.group "Enso_User - local mock integration tests" pending=pending_has_url <|
# These tests should be kept in sync with tools/http-test-helper/src/main/java/org/enso/shttp/cloud_mock/UsersHandler.java
Test.specify "current user can be fetched from mock API" <|
Test.specify "current user can be fetched from mock API" pending=pending_has_url <|
Copy link
Member

@radeusgd radeusgd Dec 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I missed that 🤦

I fixed that in the #8591 PR which has been merged recently.

Copy link
Member

@radeusgd radeusgd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good.

Thank you very much for addressing my concerns and ensuring that developer UX is not significantly worsened - the debug mode will really be invaluable for us!

The performance improvements are awesome (expectably 🙂).

@radeusgd
Copy link
Member

I have a suggestion (for another followup PR perhaps).

Given that we have the withTrace and withoutTrace methods - why not expose this to users (i.e. lib devs)?

I think it could be good to give us control over whether a given error will hold the trace or not - this way we can decide when we want better performance vs better diagnostics. I think there are definitely places where both make sense:

  1. in places like Map.get - Error.throw Not_Found should likely be fast and the stack trace may not be that needed. So we could keep using the fast Error.throw (as I assume lack of traces would be the default).
  2. in places like handle_sql_errors - I think we may want to prefer a Error.throw_with_trace. As I told before - the error location of handle_sql_errors will tell the user nothing useful - so here including the full trace will be more helpful. And an SQL error is not a performance-intensive kind of error - it is unlikely to happen in a loop; it will have collected the trace on the original Host Java exception anyway - so the performance difference will not be noticeable, but we could regain better diagnostics.

I think this could be the best compromise we can get.

@JaroslavTulach
Copy link
Member Author

I have a suggestion (for another followup PR perhaps).

Certainly a follow up PR and issue. Feel free to record it/implement it.

Given that we have the withTrace and withoutTrace methods - why not expose this to users (i.e. lib devs)?

Yes, I was also thinking about Error.throw withTrace=False. However I am not conviced that it is exactly what users want...

I think it could be good to give us control over whether a given error will hold the trace or not - this way we can decide when we want better performance vs better diagnostics. I think there are definitely places where both make sense:

1. in places like `Map.get` - `Error.throw Not_Found` should likely be fast and the stack trace may not be that needed. So we could keep using the fast `Error.throw` (as I assume lack of traces would be the default).

2. in places like `handle_sql_errors` - I think we may want to prefer a `Error.throw_with_trace`. As I told before - the error location of `handle_sql_errors` will tell the user nothing useful - so here including the full trace will be more helpful. And an SQL error is not a performance-intensive kind of error - it is unlikely to happen in a loop; it will have collected the trace on the original Host Java exception _anyway_ - so the performance difference will not be noticeable, but we could regain better diagnostics.

The issue is:

  • it is not the author of the Error.throw who should decide whether to capture the stack trace or not
  • It is the user of some library code that than accidentally calls Error.throw
  • E.g. there should be a way to turn stack trace capture even for code which isn't doing that by default
  • Currently we have -ea for that, but yes, the meaning of -ea is becoming a bit too overloaded

I was rather thinking about something associated with State (at the end the tracing behavior is associated with state of the execution, not the code itself). Something like Runtime.Context.Tracing.with_enabled, maybe? Anyway that's for another issue/discussion/PR.

@JaroslavTulach JaroslavTulach merged commit 07d58f2 into develop Dec 24, 2023
34 checks passed
@JaroslavTulach JaroslavTulach deleted the wip/jtulach/DataflowError_8137 branch December 24, 2023 10:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Try to improve performance of Dataflow Errors
3 participants