Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add filename and line number to backtrace #3343

Merged
merged 65 commits into from Aug 25, 2023

Conversation

tanishiking
Copy link
Member

@tanishiking tanishiking commented Jun 21, 2023

Based on #2869

TL;DR

This PR adds filename and line number to scala-native backtrace using DWARF debug information.

java.lang.Error: test
        at java.lang.StackTrace$.$anonfun$currentStackTrace$1(Throwables.scala:56)
        at java.lang.StackTrace$$$Lambda$3.apply(Throwables.scala:56)
        at scala.scalanative.unsafe.Zone$.apply(Zone.scala:27)
        at java.lang.StackTrace$.currentStackTrace(Throwables.scala:50)
        at java.lang.Throwable.fillInStackTrace(Throwables.scala:126)
        at Test$.error(Test.scala:9)
        at Test$.g(Test.scala:7)
        at Test$.f(Test.scala:5)
        at Test$.main(Test.scala:1)
        at Test.main(Test.scala:1)
        at <none>.main(Unknown Source)

Read/parse myself executable at runtime (when fillInStacktrace is called), and extract debug information and add the filename and line number information to backtrace. Also, the runtime will read /proc/self/maps or call vmmap <pid> _dyld_get_image_vmaddr_slide to calculate the ASLR offset.

  • Support various executable format
    • MachO
    • ELF
    • PE/COFF
  • Enable to work under PIE/ASLR
    • OSX/MACHO (_dyld_get_image_vmaddr_slide)
    • ELF/Linux (/proc/self/maps)
    • Windows (?)
  • Read separate debug file
    • dSYM
    • build id
    • gnu debug link
  • optional: binary search DIE

Background

Current scala-native backtrace doesn't have filename and line number information.

For example:

object Test {
  def main(args: Array[String]): Unit = {
    f()
  }

  def f() = g()

  def g() = error()

  def error() = throw new Error("test")
}

scala-native will emit the following backtrace, that have Unknown Source instead of filename and line number.

java.lang.Error: test
        at java.lang.StackTrace$.$anonfun$currentStackTrace$1(Unknown Source)
        at java.lang.StackTrace$$$Lambda$2.apply(Unknown Source)
        at scala.scalanative.unsafe.Zone$.apply(Unknown Source)
        at java.lang.StackTrace$.currentStackTrace(Unknown Source)
        at java.lang.Throwable.fillInStackTrace(Unknown Source)
        at Test$.error(Unknown Source)
        at Test$.g(Unknown Source)
        at Test$.f(Unknown Source)
        at Test$.main(Unknown Source)
        at Test.main(Unknown Source)
        at <none>.main(Unknown Source)

scala-native construct backtrace using (llvm) libunwind around here and it gives us the (mangled) symbol name and the call address, but nothing beyond.

However, thanks to @keynmol's recent effort #2869 scala-native will be able to generate DWARF debug information. And they (filename and line number) are in the DWARF information of the binary, and given the call address we can extract the exact call location.

Also, he developed DWARF parser (+ macho parser) in Scala, which can be re-used in scala-native implementation https://github.com/indoorvivants/macho-elf-coff-parser/ 🎉

Algorithm overview

  • When exception is raised and called fillInStackTrace
  • Extract the DWARF debug information by parsing myself executable (currently macho is supported)
    • It takes approximately 300ms (for sandbox project, on my local machine). It's fast as something like llvm-dwarfdump AFAIK, and it's only one time. However, still it hurts performance if the program contains a Throwable object 🤔
  • Calculate the ASLR slide by running vmmap command against for myself PID _dyld_get_image_vmaddr_slide (for OSX, for Linux we will need to check /proc/<pid>/maps).
  • Search for a Debug Information Entry that corresponds to the address of the subprogram (retrieved from unwind.get_reg(cursor, unwind.UNW_REG_IP, ip) and adjusted for ASLR offset)
  • We should be able to get the subprogram (function) name and line number from DIEs.

What this PR does

This PR

As a result, scala-native's backtrace (for the above Scala snippet) looks like this:

java.lang.Error: test
        at java.lang.StackTrace$.$anonfun$currentStackTrace$1(Throwables.scala:56)
        at java.lang.StackTrace$$$Lambda$3.apply(Throwables.scala:56)
        at scala.scalanative.unsafe.Zone$.apply(Zone.scala:27)
        at java.lang.StackTrace$.currentStackTrace(Throwables.scala:50)
        at java.lang.Throwable.fillInStackTrace(Throwables.scala:126)
        at Test$.error(Test.scala:9)
        at Test$.g(Test.scala:7)
        at Test$.f(Test.scala:5)
        at Test$.main(Test.scala:1)
        at Test.main(Test.scala:1)
        at <none>.main(Unknown Source)

References

@tanishiking
Copy link
Member Author

One last thing to fix is run/build-library-static on Windows 🤔 but I don't have windows machine at hand right now, will check later

Copy link
Contributor

@WojciechMazur WojciechMazur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good job on this one! So far looks good, in long term it would be great to lower ammount of class allocations via tuples, Options or Try, mostly becouse it's very low level part of runtime and we should make sure not the impose to much overhead. However, these changes could wait until it would works correctly.

Comment on lines 77 to 96
val maybeFileline =
if (recur) None
else
Try(Backtrace.decodeFileline(ip.toLong)) match {
// Ignore the exception, should we expose the internal error somehow?
case Failure(exception) => None
case Success(value) => value
}

val updated =
maybeFileline match {
case None => elem
case Some(v) =>
new StackTraceElement(
elem.getClassName,
elem.getMethodName,
v._1,
v._2
)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Becouse it is really low level part of the code, it might be better to use try-catch instead of util.Try, and if possible to get rid of Option which when not optimized out would give us additional instantisation of Option per each element of buffer.
Probably something like following should be fine as well:

if(recur) elem
else try { 
  val v = Backtrace.decodeFileline(ip.toLong)
  new StackTraceElement(..,.., v._1, v._2)
 } catch {case ex: ??? => elem }

val maybeFileline =
if (recur) None
else
Try(Backtrace.decodeFileline(ip.toLong)) match {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder can it decoding filename be a part of makeStackTraceElement. This which is effectively cached in cachedStackTraceElemen. This way we would need to decode file name only once per IP. In such case we also would not be required to allocate Tuple of (StackTraceElement, CULong)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Originally, we used to call Backtrace.decodeFileline from within the makeStackTraceElement function. However, I intentionally moved this functionality out to a different location, as can be seen in this commit: 0e2c67a

This change was made because we were checking for recursive calls to currentStackTrace by examining the stack trace and counting the occurrences of currentStackTrace entries.


I thought we can call it from within makeStackTraceElement in other way (which I found doesn't work)
The idea was that the first call to currentStackTrace would record the stack pointer's address at that moment. Subsequent calls to currentStackTrace would then compare their stack pointer addresses to the initial one. If the address was lower in the subsequent call, it would indicate a recursive invocation, and thus could be treated as such.

However, I realized that this approach doesn't always work as expected, particularly if the exception object is generated multiple times within the application.

In order to tell we have recursive call to currentStackTrace, we have to know the whole stacktrace.

)
}
// Update cache with the updated stacktrace element
cache.update(ip, updated)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably should check if value in cache already has a filename set. currently we set filename even if StackTraceElem already contains it

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, now we check the stacktrace already has filename before decoding their filename and line number by

        if (...
            elem.getFileName != null // Skip decoding if we already have filename information
        ) elem

in Throwables.scala, currentStackTrace()

//
// Consequently, to mitigate the risk of cascading recursive exceptions,
// skip executing "BackTrace.decodeFileline" if "currentStackTrace" is already being invoked recursively.
val recur = buffer.count(e => e._1.getMethodName == "currentStackTrace") > 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At some point to would be good to eliminate this check, becouse it would happen each time we create stack trace.
Based on the information where the exception in decodeFileline is thrown we should be able to guard it with runtime checks and not allow to throw the exception in the first place.

An alternative would be also to use ThreadLocal boolean set at the beginning of currentStacktrace This way if we renter this method recursively we would know that the current thread throwed the exception.

Since it is really low level part of the code, and it might
be better to use `try-catch`?

Also, skip decoding if the elem already has an filename
@tanishiking
Copy link
Member Author

The rest thing to do is to work on #3343 (comment) by using ThreadLocal boolean instead. But maybe we can work on it in another PR? Otherwise, it's ready for review :)

Copy link
Contributor

@WojciechMazur WojciechMazur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. When testing this out I've made some changes in my local branch which could be potentially integrated into this PR: WojciechMazur@1cdb0c7

1 remaining thing would be the automatic invoke fo dsymutils, I've added a bigger comment below

}
}
}

buffer.toArray
if (LinktimeInfo.isMac) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add additional LinktimeInfo.hasDebugMetadata which would be set by the NativeConfig value and check it here as well.. This way we would not risk the penalty of trying to decode positions which would not be existing in the output file anyway.

.getProperty("os.name")
.toLowerCase(Locale.ROOT)
.startsWith("mac")) {
Process(s"dsymutil ${path.getAbsolutePath()}") !
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This step is necessary to make the positions available. Can we try to integrate it with the main workflow? We can try to run it, and then if it's not present we should log a warning for the users that dsumutil and missing and stacktraces would not contain positions, and potentially if executiono fit fails we should give another warning.
Streaminline this into the main workflow would actually make this feature the most usefull in tests where frequent exceptions would guide us in resolving bugs and slower startup is not an issue.

dsymutil is part of LLVM toolchain so we can try to discover it similary to llvm-ar here: https://github.com/WojciechMazur/scala-native/blob/1cdb0c71d9dcd1b527d70d0d36a0ce81855977af/tools/src/main/scala/scala/scalanative/build/LLVM.scala#L232

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This step is necessary to make the positions available. Can we try to integrate it with the main workflow? ...
dsymutil is part of LLVM toolchain so we can try to discover it similary to llvm-ar here:

Ah, that sounds better! 👍 Actually clang also integrate dsymutil into its workflow.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just as a note, Apple toolchain does supply dsymutil but not llvm-ar but I understand ar will work. I am thinking the archive won't work on Apple toolchain with our current setup.

Comment on lines 158 to 160
if (result != 0) {
throw new BuildException(s"Failed to run dsymutil ${path}")
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the dsymutil fails it should rather be a warning saying that we failed to postprocess binary and the stack traces would not work correctly.

@@ -150,6 +150,16 @@ private[scalanative] object LLVM {
copyOutput(config, buildPath)
}

def dsymutil(config: Config, path: Path) = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a public method so explicit result type would be recommended

@@ -150,6 +150,16 @@ private[scalanative] object LLVM {
copyOutput(config, buildPath)
}

def dsymutil(config: Config, path: Path) = {
val dsymutil = Discover.discover("dsymutil", "LLVM_BIN")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently discover is always failing if it cannot find given binary. Maybe let's introduce an additional variant of it def tryDiscover(String, String): Option[Path]. If we won't find it should not be a fatal error, but warning instead. It would be great if in case when we don't find dsymutil we point to the user that we couldn't find this program, possibly show the paths we tried, and point him up that he can use the LLVM_BIN env to try fix this issue.

.withMode(Mode.debug) // otherwise, clang O2 inlines the call stack in Linux
}

lazy val debugBuild = taskKey[Unit]("Compile and run dsymutil if exists")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should not be required anymore

Comment on lines 121 to 127
.map { p =>
val linked = backend.link(p)
if (Platform.isMac && config.compilerConfig.debugMetadata) {
backend.dsymutil(linked)
}
linked
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe let's create an additional method backend.postprocess instead which would call the dsymutil and possibly other non-crucial post processing operations.

Comment on lines 267 to 270
private def prepareDSymUtilCommand() = {
val dsymutil = Discover.discover("dsymutil", "LLVM_BIN")
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused method?

@@ -172,6 +178,12 @@ object Build {
) {
LLVM.link(config, linkerResult, compiled)
}

def dsymutil(linked: Path): Unit = time(
s"Running dsymutil on ${linked.getFileName}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not require users to now what dsymutil is and what it does. Maybe let's just introduce a single logging of time spent in whole postprocessing stage (based on suggestion to move all postprocessing to seperate step). At that point can remove this method and use LLVM.dsymutil directly in postprocessing stage.

@WojciechMazur WojciechMazur merged commit 723aef6 into scala-native:main Aug 25, 2023
79 checks passed
@tanishiking tanishiking deleted the fileline-stacktrace branch August 30, 2023 00:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants