Add filename and line number to backtrace #3343

tanishiking · 2023-06-21T08:21:24Z

Based on #2869

TL;DR

This PR adds filename and line number to scala-native backtrace using DWARF debug information.

java.lang.Error: test
        at java.lang.StackTrace$.$anonfun$currentStackTrace$1(Throwables.scala:56)
        at java.lang.StackTrace$$$Lambda$3.apply(Throwables.scala:56)
        at scala.scalanative.unsafe.Zone$.apply(Zone.scala:27)
        at java.lang.StackTrace$.currentStackTrace(Throwables.scala:50)
        at java.lang.Throwable.fillInStackTrace(Throwables.scala:126)
        at Test$.error(Test.scala:9)
        at Test$.g(Test.scala:7)
        at Test$.f(Test.scala:5)
        at Test$.main(Test.scala:1)
        at Test.main(Test.scala:1)
        at <none>.main(Unknown Source)

At runtime (when fillInStackTrace is called), the program reads and parses its own executable, extracts the debug information, and adds the filename and line number to the backtrace. The runtime also reads /proc/self/maps or calls ~~vmmap <pid>~~ _dyld_get_image_vmaddr_slide to calculate the ASLR offset.

Background

Currently, the scala-native backtrace doesn't include filename and line number information.

For example:

object Test {
  def main(args: Array[String]): Unit = {
    f()
  }

  def f() = g()

  def g() = error()

  def error() = throw new Error("test")
}

scala-native emits the following backtrace, which has Unknown Source instead of the filename and line number.

java.lang.Error: test
        at java.lang.StackTrace$.$anonfun$currentStackTrace$1(Unknown Source)
        at java.lang.StackTrace$$$Lambda$2.apply(Unknown Source)
        at scala.scalanative.unsafe.Zone$.apply(Unknown Source)
        at java.lang.StackTrace$.currentStackTrace(Unknown Source)
        at java.lang.Throwable.fillInStackTrace(Unknown Source)
        at Test$.error(Unknown Source)
        at Test$.g(Unknown Source)
        at Test$.f(Unknown Source)
        at Test$.main(Unknown Source)
        at Test.main(Unknown Source)
        at <none>.main(Unknown Source)

scala-native constructs the backtrace using (llvm) libunwind around here, which gives us the (mangled) symbol name and the call address, but nothing beyond that.

However, thanks to @keynmol's recent effort in #2869, scala-native will be able to generate DWARF debug information. The filename and line number are in the DWARF information of the binary, so given the call address we can extract the exact call location.

He also developed a DWARF parser (+ Mach-O parser) in Scala, which can be reused in the scala-native implementation: https://github.com/indoorvivants/macho-elf-coff-parser/ 🎉

Algorithm overview

When an exception is raised and fillInStackTrace is called:
Extract the DWARF debug information by parsing the running executable itself (currently only Mach-O is supported)
- It takes approximately 300ms (for the sandbox project, on my local machine). AFAIK that is as fast as something like llvm-dwarfdump, and it happens only once. However, it still hurts performance if the program creates a Throwable object 🤔
Calculate the ASLR slide by ~~running the vmmap command against our own PID~~ calling _dyld_get_image_vmaddr_slide (on OSX; on Linux we will need to check /proc/<pid>/maps).
Search for the Debug Information Entry (DIE) that corresponds to the address of the subprogram (retrieved from unwind.get_reg(cursor, unwind.UNW_REG_IP, ip) and adjusted for the ASLR offset)
We should be able to get the subprogram (function) name and line number from the DIEs.
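The address adjustment and lookup described above can be sketched roughly as follows. This is a simplified illustration, not the PR's implementation: `LineRow` and `FilelineLookup` are hypothetical names, the table is assumed to be pre-sorted by address, and addresses are assumed to fit in a signed Long.

```scala
// Hypothetical model of one decoded row of debug info; not the PR's actual types.
final case class LineRow(address: Long, file: String, line: Int)

object FilelineLookup {

  /** Finds the last row whose address is <= the slide-adjusted instruction pointer.
    * Assumes `rows` is sorted by ascending address.
    */
  def lookup(rows: Array[LineRow], ip: Long, slide: Long): Option[LineRow] = {
    // Undo ASLR: runtime address -> address as recorded in the binary.
    val target = ip - slide
    var lo = 0
    var hi = rows.length - 1
    var found = -1
    // Plain binary search for the greatest address that does not exceed target.
    while (lo <= hi) {
      val mid = (lo + hi) >>> 1
      if (rows(mid).address <= target) {
        found = mid
        lo = mid + 1
      } else hi = mid - 1
    }
    if (found >= 0) Some(rows(found)) else None
  }
}
```

Subtracting the slide first matters because the addresses stored in the debug information are those chosen at link time, while libunwind reports addresses of the randomized, loaded image.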

What this PR does

This PR:

Migrates https://github.com/indoorvivants/macho-elf-coff-parser/ into this repository
- Adds some improvements to speed up parsing of the DWARF information
Reads and parses the running executable (argv[0]) and its DWARF information
Reads ~~vmmap (on OSX) / /proc/self/maps~~ _dyld_get_image_vmaddr_slide to calculate the ASLR offset
- I have a small WIP blog post on why this is needed: https://gist.github.com/tanishiking/3fbacf36a12dfd0991127f3db6df9e53
Searches for a DIE that has the filename and line number
Adds that information to StackTraceElement
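For the Linux side, which this PR leaves for later, the load base could in principle be derived from /proc/self/maps. The sketch below is only an assumption about how that might look: `ProcMaps` and `loadBase` are hypothetical names, paths containing spaces are not handled, and the base is taken as the start address minus the file offset of the first mapping backed by the executable.

```scala
object ProcMaps {

  /** Returns the assumed load base of `exePath`, given the text of /proc/self/maps.
    * Each line looks like: start-end perms offset dev inode pathname
    */
  def loadBase(maps: String, exePath: String): Option[Long] =
    maps
      .split('\n')
      .iterator
      .map(_.trim.split("\\s+"))
      .collectFirst {
        // Anonymous mappings have no pathname field and are skipped by the length check.
        case fields if fields.length >= 6 && fields(5) == exePath =>
          val start = java.lang.Long.parseUnsignedLong(fields(0).takeWhile(_ != '-'), 16)
          val offset = java.lang.Long.parseUnsignedLong(fields(2), 16)
          start - offset
      }
}
```

On macOS the equivalent value comes directly from _dyld_get_image_vmaddr_slide, so no parsing is needed there.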

As a result, scala-native's backtrace (for the above Scala snippet) looks like this:

java.lang.Error: test
        at java.lang.StackTrace$.$anonfun$currentStackTrace$1(Throwables.scala:56)
        at java.lang.StackTrace$$$Lambda$3.apply(Throwables.scala:56)
        at scala.scalanative.unsafe.Zone$.apply(Zone.scala:27)
        at java.lang.StackTrace$.currentStackTrace(Throwables.scala:50)
        at java.lang.Throwable.fillInStackTrace(Throwables.scala:126)
        at Test$.error(Test.scala:9)
        at Test$.g(Test.scala:7)
        at Test$.f(Test.scala:5)
        at Test$.main(Test.scala:1)
        at Test.main(Test.scala:1)
        at <none>.main(Unknown Source)

References

libunwind basics
- Programmatic access to the call stack in C++ - Eli Bendersky's website
- libunwind documentation (though scala-native uses llvm one)
Confirming we're heading in the right direction
- ianlancetaylor/libbacktrace: A C library that may be linked into a C/C++ program to produce symbolic backtraces
- Backtrace line numbers by ysbaddaden · Pull Request #3303 · crystal-lang/crystal
Workaround for PIE/ASLR
- Not getting line numbers with addr2line · Issue #97 · boostorg/stacktrace
- (Chinese blog post about Mach-O and ASLR) Mach-O 文件之进程(虚拟)地址空间、ASLR - 简书
- How do you read the memory maps of a Mac process?
DWARF
- How debuggers work: Part 3 - Debugging information - Eli Bendersky's website
- (Japanese blog post about debug info) デバッグ情報の歩き方 - Qiita

tanishiking · 2023-08-04T08:13:14Z

One last thing to fix is run/build-library-static on Windows 🤔 but I don't have a Windows machine at hand right now, so I will check later.

WojciechMazur

Good job on this one! So far it looks good. In the long term it would be great to lower the amount of class allocations via tuples, Options, or Try, mostly because it's a very low-level part of the runtime and we should make sure not to impose too much overhead. However, these changes can wait until it works correctly.

WojciechMazur · 2023-08-04T11:13:09Z

javalib/src/main/scala/java/lang/Throwables.scala

+      val maybeFileline =
+        if (recur) None
+        else
+          Try(Backtrace.decodeFileline(ip.toLong)) match {
+            // Ignore the exception, should we expose the internal error somehow?
+            case Failure(exception) => None
+            case Success(value)     => value
+          }
+
+      val updated =
+        maybeFileline match {
+          case None => elem
+          case Some(v) =>
+            new StackTraceElement(
+              elem.getClassName,
+              elem.getMethodName,
+              v._1,
+              v._2
+            )
+        }


Because this is a really low-level part of the code, it might be better to use try-catch instead of util.Try and, if possible, to get rid of Option, which, when not optimized out, would give us an additional instantiation of Option for each element of the buffer.
Something like the following should probably be fine as well:

if (recur) elem
else
  try {
    val v = Backtrace.decodeFileline(ip.toLong)
    new StackTraceElement(.., .., v._1, v._2)
  } catch { case ex: ??? => elem }
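A runnable version of that suggestion might look like the sketch below. The `decodeFileline` here is a hypothetical stand-in for Backtrace.decodeFileline, and catching `Exception` is an assumption, since the review comment leaves the exception type open.

```scala
object EnrichElement {

  // Hypothetical stand-in for Backtrace.decodeFileline: returns (fileName, line) or throws.
  def decodeFileline(ip: Long): (String, Int) =
    if (ip == 0x1010L) ("Test.scala", 7)
    else throw new IllegalStateException(s"no debug info for address $ip")

  /** Returns `elem` with position info when decoding succeeds, otherwise `elem` unchanged.
    * No Option or Try is allocated on either path.
    */
  def enrich(elem: StackTraceElement, ip: Long, recur: Boolean): StackTraceElement =
    if (recur) elem
    else
      try {
        val v = decodeFileline(ip)
        new StackTraceElement(elem.getClassName, elem.getMethodName, v._1, v._2)
      } catch {
        // Assumption: any decoding failure falls back to the element without a position.
        case _: Exception => elem
      }
}
```

The tuple returned by the decoder is still an allocation; removing it would need a different decoder signature, which is outside what this sketch shows.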

WojciechMazur · 2023-08-04T11:16:31Z

javalib/src/main/scala/java/lang/Throwables.scala

+      val maybeFileline =
+        if (recur) None
+        else
+          Try(Backtrace.decodeFileline(ip.toLong)) match {


I wonder whether decoding the filename can be part of makeStackTraceElement, which is effectively cached in cachedStackTraceElement. This way we would need to decode the file name only once per IP. In that case we also would not need to allocate a Tuple of (StackTraceElement, CULong).

tanishiking · 2023-08-04

Originally, we called Backtrace.decodeFileline from within makeStackTraceElement. However, I intentionally moved it elsewhere, as can be seen in this commit: 0e2c67a

This change was made because we check for recursive calls to currentStackTrace by examining the stack trace and counting the occurrences of currentStackTrace entries.

I thought we could call it from within makeStackTraceElement in another way (which I found doesn't work).
The idea was that the first call to currentStackTrace would record the stack pointer's address at that moment. Subsequent calls to currentStackTrace would then compare their stack pointer addresses to the initial one. If the address was lower in a subsequent call, it would indicate a recursive invocation and could be treated as such.

However, I realized that this approach doesn't always work as expected, particularly if exception objects are created multiple times within the application.

To tell whether we have a recursive call to currentStackTrace, we have to know the whole stack trace.

WojciechMazur · 2023-08-04T11:17:35Z

javalib/src/main/scala/java/lang/Throwables.scala

+            )
+        }
+      // Update cache with the updated stacktrace element
+      cache.update(ip, updated)


We should probably check whether the value in the cache already has a filename set. Currently we set the filename even if the StackTraceElement already contains it.

tanishiking · 2023-08-14

Right. Now we check whether the stack trace element already has a filename before decoding its filename and line number, with

if (
  ...
  elem.getFileName != null // Skip decoding if we already have filename information
) elem

in currentStackTrace() in Throwables.scala.
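The combination of caching by IP and skipping elements that already carry a filename could be sketched like this. `ElementCache`, `withPosition`, and the counting `decode` are hypothetical and exist only to make the behaviour observable; the real cache lives in Throwables.scala and may be structured differently.

```scala
import scala.collection.mutable

object ElementCache {
  // Keyed by instruction pointer. Not thread-safe; this is only an illustration.
  private val cache = mutable.LongMap.empty[StackTraceElement]

  // Counts decoder invocations so the effect of the cache can be checked.
  var decodeCalls = 0

  // Hypothetical stand-in for the expensive DWARF lookup.
  private def decode(ip: Long): (String, Int) = {
    decodeCalls += 1
    ("Test.scala", 7)
  }

  def withPosition(ip: Long, elem: StackTraceElement): StackTraceElement = {
    val cached = cache.getOrElse(ip, elem)
    if (cached.getFileName != null) cached // skip decoding: position already known
    else {
      val v = decode(ip)
      val updated =
        new StackTraceElement(cached.getClassName, cached.getMethodName, v._1, v._2)
      cache.update(ip, updated)
      updated
    }
  }
}
```

With this shape, repeated exceptions thrown from the same call site pay the decoding cost only once.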

WojciechMazur · 2023-08-04T13:32:42Z

javalib/src/main/scala/java/lang/Throwables.scala

+    //
+    // Consequently, to mitigate the risk of cascading recursive exceptions,
+    // skip executing "BackTrace.decodeFileline" if "currentStackTrace" is already being invoked recursively.
+    val recur = buffer.count(e => e._1.getMethodName == "currentStackTrace") > 1


At some point it would be good to eliminate this check, because it runs each time we create a stack trace.
Based on where the exception in decodeFileline is thrown, we should be able to guard it with runtime checks and not allow the exception to be thrown in the first place.

An alternative would be to use a ThreadLocal boolean set at the beginning of currentStackTrace. This way, if we re-enter this method recursively, we would know that the current thread threw the exception.
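A minimal sketch of the ThreadLocal idea, assuming a generic helper rather than the actual currentStackTrace code (`ReentrancyGuard` and `guarded` are hypothetical names):

```scala
object ReentrancyGuard {
  // Per-thread flag: true while this thread is inside `guarded`.
  private val inProgress = new ThreadLocal[Boolean] {
    override def initialValue(): Boolean = false
  }

  /** Runs `body`, unless this thread is already inside `guarded`,
    * in which case `fallback` is returned instead.
    */
  def guarded[A](fallback: => A)(body: => A): A =
    if (inProgress.get()) fallback
    else {
      inProgress.set(true)
      try body
      finally inProgress.set(false) // always reset, even if body throws
    }
}
```

Compared with counting currentStackTrace frames in the buffer, the flag costs a constant-time check per call. The by-name parameters are for readability only; a runtime implementation would likely inline the flag handling to avoid closure allocations.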

tanishiking · 2023-08-14T05:25:21Z

The remaining thing to do is to address #3343 (comment) by using a ThreadLocal boolean instead. But maybe we can work on it in another PR? Otherwise, it's ready for review :)

WojciechMazur

Looks good. While testing this out I made some changes in my local branch which could potentially be integrated into this PR: WojciechMazur@1cdb0c7

One remaining thing would be the automatic invocation of dsymutil; I've added a bigger comment below.

WojciechMazur · 2023-08-22T10:27:58Z

javalib/src/main/scala/java/lang/Throwables.scala

        }
      }
    }

-    buffer.toArray
+    if (LinktimeInfo.isMac) {


We should add an additional LinktimeInfo.hasDebugMetadata, which would be set from the NativeConfig value, and check it here as well. This way we would not risk the penalty of trying to decode positions that would not exist in the output file anyway.

WojciechMazur · 2023-08-22T11:33:29Z

scripted-tests/run/backtrace/build.sbt

+        .getProperty("os.name")
+        .toLowerCase(Locale.ROOT)
+        .startsWith("mac")) {
+    Process(s"dsymutil ${path.getAbsolutePath()}") !


This step is necessary to make the positions available. Can we try to integrate it into the main workflow? We can try to run it; if it's not present, we should log a warning for users that dsymutil is missing and stack traces will not contain positions, and if its execution fails we should give another warning.
Streamlining this into the main workflow would make this feature most useful in tests, where frequent exceptions would guide us in resolving bugs and slower startup is not an issue.

dsymutil is part of the LLVM toolchain, so we can try to discover it similarly to llvm-ar here: https://github.com/WojciechMazur/scala-native/blob/1cdb0c71d9dcd1b527d70d0d36a0ce81855977af/tools/src/main/scala/scala/scalanative/build/LLVM.scala#L232

tanishiking · 2023-08-22

> This step is necessary to make the positions available. Can we try to integrate it with the main workflow? ...
> dsymutil is part of LLVM toolchain so we can try to discover it similarly to llvm-ar here:

Ah, that sounds better! 👍 Actually, clang also integrates dsymutil into its workflow.

ekrich · 2023-08-22

Just as a note, the Apple toolchain does supply dsymutil but not llvm-ar, though I understand ar will work. I think the archive won't work on the Apple toolchain with our current setup.

WojciechMazur · 2023-08-24T08:28:17Z

tools/src/main/scala/scala/scalanative/build/LLVM.scala

+    if (result != 0) {
+      throw new BuildException(s"Failed to run dsymutil ${path}")
+    }


If dsymutil fails, it should rather be a warning saying that we failed to postprocess the binary and that stack traces will not work correctly.

WojciechMazur · 2023-08-24T08:28:53Z

tools/src/main/scala/scala/scalanative/build/LLVM.scala

@@ -150,6 +150,16 @@ private[scalanative] object LLVM {
    copyOutput(config, buildPath)
  }

+  def dsymutil(config: Config, path: Path) = {


It's a public method, so an explicit result type is recommended.

WojciechMazur · 2023-08-24T08:33:19Z

tools/src/main/scala/scala/scalanative/build/LLVM.scala

@@ -150,6 +150,16 @@ private[scalanative] object LLVM {
    copyOutput(config, buildPath)
  }

+  def dsymutil(config: Config, path: Path) = {
+    val dsymutil = Discover.discover("dsymutil", "LLVM_BIN")


Currently discover always fails if it cannot find the given binary. Maybe let's introduce an additional variant of it: def tryDiscover(String, String): Option[Path]. If we don't find the binary, it should not be a fatal error but a warning instead. It would be great if, when we don't find dsymutil, we tell the user that we couldn't find this program, possibly show the paths we tried, and point out that they can use the LLVM_BIN env variable to try to fix the issue.
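A non-failing lookup along those lines might be sketched as below. This is not the existing Discover API: `DiscoverSketch` and `findIn` are hypothetical, and the search order (the directory named by the environment variable first, then PATH) is an assumption.

```scala
import java.io.File
import java.nio.file.{Files, Path, Paths}

object DiscoverSketch {

  /** Returns the first executable regular file named `binaryName` in `dirs`, if any. */
  def findIn(binaryName: String, dirs: Seq[Path]): Option[Path] =
    dirs.iterator
      .map(_.resolve(binaryName))
      .find(p => Files.isRegularFile(p) && Files.isExecutable(p))

  /** Looks in the directory named by `envVar` first, then in every PATH entry.
    * Returns None instead of throwing when nothing is found.
    */
  def tryDiscover(binaryName: String, envVar: String): Option[Path] = {
    val fromEnv = sys.env.get(envVar).toSeq
    val fromPath =
      sys.env.getOrElse("PATH", "").split(File.pathSeparator).toSeq.filter(_.nonEmpty)
    findIn(binaryName, (fromEnv ++ fromPath).map(d => Paths.get(d)))
  }
}
```

A caller could then log the searched directories and mention LLVM_BIN in the warning when the result is None, as the comment suggests.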

WojciechMazur · 2023-08-24T08:34:08Z

scripted-tests/run/backtrace/build.sbt

+    .withMode(Mode.debug) // otherwise, clang O2 inlines the call stack in Linux
+}
+
+lazy val debugBuild = taskKey[Unit]("Compile and run dsymutil if exists")


This should not be required anymore

WojciechMazur · 2023-08-24T08:37:55Z

tools/src/main/scala/scala/scalanative/build/Build.scala

+            .map { p =>
+              val linked = backend.link(p)
+              if (Platform.isMac && config.compilerConfig.debugMetadata) {
+                backend.dsymutil(linked)
+              }
+              linked
+            }


Maybe let's create an additional method, backend.postprocess, which would call dsymutil and possibly other non-crucial postprocessing operations.
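A postprocessing step that warns instead of failing the build could look roughly like this. `PostprocessSketch` and its signature are hypothetical; the warning callback stands in for the build logger, and the tool path is assumed to come from a non-failing discovery step.

```scala
import java.io.IOException
import java.nio.file.Path
import scala.sys.process.Process

object PostprocessSketch {

  /** Runs an optional tool (such as dsymutil) on the linked binary.
    * Problems are reported through `warn`; the build result is returned either way.
    */
  def postprocess(tool: Option[Path], linked: Path, warn: String => Unit): Path = {
    tool match {
      case None =>
        warn(
          "dsymutil not found (is LLVM_BIN set?); " +
            "stack traces will not contain file names and line numbers"
        )
      case Some(bin) =>
        val exitCode =
          try { Process(Seq(bin.toString, linked.toString)).! }
          catch { case _: IOException => -1 } // the tool could not be started at all
        if (exitCode != 0)
          warn(s"Failed to postprocess $linked with $bin; stack traces may lack positions")
    }
    linked
  }
}
```

Keeping the step non-fatal means a missing or broken dsymutil only degrades stack traces rather than blocking the link.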

WojciechMazur · 2023-08-24T08:40:14Z

tools/src/main/scala/scala/scalanative/build/LLVM.scala

+  private def prepareDSymUtilCommand() = {
+    val dsymutil = Discover.discover("dsymutil", "LLVM_BIN")
+  }
+


Unused method?

WojciechMazur · 2023-08-24T08:42:22Z

tools/src/main/scala/scala/scalanative/build/Build.scala

@@ -172,6 +178,12 @@ object Build {
    ) {
      LLVM.link(config, linkerResult, compiled)
    }
+
+    def dsymutil(linked: Path): Unit = time(
+      s"Running dsymutil on ${linked.getFileName}"


We should not require users to know what dsymutil is and what it does. Maybe let's just log the time spent in the whole postprocessing stage (based on the suggestion to move all postprocessing to a separate step). At that point we can remove this method and use LLVM.dsymutil directly in the postprocessing stage.

keynmol and others added 30 commits February 16, 2023 16:25

WIP somewhat functional DWARF renderer

7b19254

WIP: generate dwarf metadata

72b9575

Try anything at this point

270d0e8

Quick, commit while it works

33695fd

Formatting

648e4a0

Handle unit return types

147bfd1

Cache call position by scope as well

9fd664e

Add location metadata to invoke

654835e

Return sandbox to the way it was

8655222

Fix compilation errors

c843ba0

Erase GenIdx from public API, and cleanup

702eae9

WIP

d14c74d

Fix issues after rebase, rename to LLVMDebugInformation

8d67657

Merge branch 'main' into they-are-taking-DWARFs-to-isengard

483aeaa

Hacky resolution of filename for URLs

1e884a1

chore: run clangfmt

ee966f6

chore: scalafmt

f01161b

Be even more restrictive about positions

543ef98

Back to making file a required parameter

8aeb2c6

Try to satisfy JVM 8

deb5e03

just a gift that keeps on giving

9b2c0dc

Make LLVM metadata addition configurable and disable it by default

b4482ab

Merge branch 'main' into they-are-taking-DWARFs-to-isengard

d785864

Code review comments

f55c49b

Merge branch 'main' into they-are-taking-DWARFs-to-isengard

c4dc4b3

Add mising dbg metadata for invoke calls.

bd92200

Ensure dereferenceable_or_null is always using i64 type as an argum…

4cf6888

…ent (fixes linux-x86 builds)

Copy macho-elf-coff-parser to scala-native

43f535d

https://github.com/indoorvivants/macho-elf-coff-parser Co-authored-by: Anton Sviridov <keynmol@gmail.com>

Make dwarf parser compile

ac88a4c

WIP

4cedf1a

tanishiking added 6 commits August 3, 2023 19:11

Fix CI

ad42fba

Extend timeout in OptimizerSpec since it started timeout in CI

1b85b08

Fix scripted tests backtrace in Windows and Linux

8b623af

Run dsymutil only on Mac

04d7f6a

Use toString for stacktrace

38cb60e

Remove unused getpid

69ab6a7

WojciechMazur reviewed Aug 4, 2023

tanishiking added 4 commits August 8, 2023 12:47

Don't try to run decodeFileline on Linux and Windows

35eb304

Use try-catch block instead of util.Try

8223e68

Since it is really low level part of the code, and it might be better to use `try-catch`? Also, skip decoding if the elem already has an filename

Test: print raw stacktraces

a08556b

try noinline

ebdf020

armanbilge reviewed Aug 9, 2023

tanishiking added 3 commits August 13, 2023 19:45

Use LinktimeInfo + fix for Linux CI

11d3e46

Merge branch 'main' into fileline-stacktrace

82335e8

Fix scripted test in Linux by running clang with O0

7d2e3eb

Merge branch 'main' into fileline-stacktrace

dda0271

WojciechMazur reviewed Aug 22, 2023

tanishiking added 2 commits August 24, 2023 15:02

Integrate dsymutil into scala-native

51527c4

Guard not to enhance stacktrace if no debug metadata

5b32df9

WojciechMazur reviewed Aug 24, 2023

tanishiking added 4 commits August 25, 2023 13:21

Running dsymutil in postprocess

aa87156

Remove unused method

7c41c00

Don't fail if dsymutil didn't work

042abed

Remove unnecessary debugBuild task from test

c5ccc3e

WojciechMazur approved these changes Aug 25, 2023

WojciechMazur merged commit 723aef6 into scala-native:main Aug 25, 2023
79 checks passed

WojciechMazur mentioned this pull request Aug 26, 2023

N_OSO symbols for packages in IssuesTeset.scala are missing in the final binary #3458

Closed

tanishiking deleted the fileline-stacktrace branch August 30, 2023 00:40
