Skip to content

Conversation

@rjmansfield
Copy link
Contributor

The --source option was broken when using the --macho flag because DisassembleMachO() only initialized debug info when UseDbg was true, and would return early if no dSYM was found.

The --source option was broken when using the --macho flag because
DisassembleMachO() only initialized debug info when UseDbg was true,
and would return early if no dSYM was found.
@llvmbot
Copy link
Member

llvmbot commented Oct 16, 2025

@llvm/pr-subscribers-llvm-binary-utilities

Author: Ryan Mansfield (rjmansfield)

Changes

The --source option was broken when using the --macho flag because DisassembleMachO() only initialized debug info when UseDbg was true, and would return early if no dSYM was found.


Full diff: https://github.com/llvm/llvm-project/pull/163810.diff

2 Files Affected:

  • (modified) llvm/test/tools/llvm-objdump/MachO/disassemble-source-dsym.test (+12)
  • (modified) llvm/tools/llvm-objdump/MachODump.cpp (+25-7)
diff --git a/llvm/test/tools/llvm-objdump/MachO/disassemble-source-dsym.test b/llvm/test/tools/llvm-objdump/MachO/disassemble-source-dsym.test
index aaaf6bffe4a33..f7fb621244338 100644
--- a/llvm/test/tools/llvm-objdump/MachO/disassemble-source-dsym.test
+++ b/llvm/test/tools/llvm-objdump/MachO/disassemble-source-dsym.test
@@ -13,4 +13,16 @@
 # RUN: dsymutil -f  -oso-prepend-path=%p/../../dsymutil/ %t3 -o %t3.dSYM
 # RUN: llvm-objdump --source --prefix=%p/../../dsymutil  %t3 | FileCheck --check-prefix=SOURCE %s
 
+## Test that --source works with --macho flag
+
+## --macho w/ explicit .dSYM
+# RUN: llvm-objdump < %p/../../dsymutil/Inputs/basic.macho.x86_64 - --source --macho --dsym=%t1.dSYM --prefix=%p/../../dsymutil | \
+# RUN:   FileCheck --check-prefix=SOURCE %s
+
+## --macho w/ auto-detected .dSYM (dir)
+# RUN: llvm-objdump --source --macho --prefix=%p/../../dsymutil %t2 | FileCheck --check-prefix=SOURCE %s
+
+## --macho w/ auto-detected .dSYM (file)
+# RUN: llvm-objdump --source --macho --prefix=%p/../../dsymutil %t3 | FileCheck --check-prefix=SOURCE %s
+
 # SOURCE: ; int bar(int arg) {
diff --git a/llvm/tools/llvm-objdump/MachODump.cpp b/llvm/tools/llvm-objdump/MachODump.cpp
index 8e9c91fde544d..2d1259191c580 100644
--- a/llvm/tools/llvm-objdump/MachODump.cpp
+++ b/llvm/tools/llvm-objdump/MachODump.cpp
@@ -13,6 +13,7 @@
 #include "MachODump.h"
 
 #include "ObjdumpOptID.h"
+#include "SourcePrinter.h"
 #include "llvm-objdump.h"
 #include "llvm/ADT/STLExtras.h"
 #include "llvm/ADT/StringExtras.h"
@@ -7415,18 +7416,24 @@ static void DisassembleMachO(StringRef Filename, MachOObjectFile *MachOOF,
   std::unique_ptr<DIContext> diContext;
   std::unique_ptr<Binary> DSYMBinary;
   std::unique_ptr<MemoryBuffer> DSYMBuf;
-  if (UseDbg) {
-    // If separate DSym file path was specified, parse it as a macho file,
-    // get the sections and supply it to the section name parsing machinery.
-    if (const ObjectFile *DbgObj =
-            getMachODSymObject(MachOOF, Filename, DSYMBinary, DSYMBuf)) {
+  const ObjectFile *DbgObj = MachOOF;
+  if (UseDbg || PrintSource || PrintLines) {
+    // Look for debug info in external dSYM file or embedded in the object.
+    // getMachODSymObject returns MachOOF by default if no external dSYM found.
+    const ObjectFile *DSym =
+        getMachODSymObject(MachOOF, Filename, DSYMBinary, DSYMBuf);
+    if (!DSym)
+      return;
+    DbgObj = DSym;
+    if (UseDbg) {
       // Setup the DIContext
       diContext = DWARFContext::create(*DbgObj);
-    } else {
-      return;
     }
   }
 
+  SourcePrinter SP(DbgObj, TheTarget->getName());
+  LiveElementPrinter LEP(*MRI, *STI);
+
   if (FilterSections.empty())
     outs() << "(" << DisSegName << "," << DisSectName << ") section\n";
 
@@ -7605,6 +7612,12 @@ static void DisassembleMachO(StringRef Filename, MachOObjectFile *MachOOF,
           outs() << SymName << ":\n";
 
         uint64_t PC = SectAddress + Index;
+
+        if (PrintSource || PrintLines) {
+          formatted_raw_ostream FOS(outs());
+          SP.printSourceLine(FOS, {PC, SectIdx}, Filename, LEP);
+        }
+
         if (LeadingAddr) {
           if (FullLeadingAddr) {
             if (MachOOF->is64Bit())
@@ -7696,6 +7709,11 @@ static void DisassembleMachO(StringRef Filename, MachOObjectFile *MachOOF,
 
         uint64_t PC = SectAddress + Index;
 
+        if (PrintSource || PrintLines) {
+          formatted_raw_ostream FOS(outs());
+          SP.printSourceLine(FOS, {PC, SectIdx}, Filename, LEP);
+        }
+
         if (DumpAndSkipDataInCode(PC, Bytes.data() + Index, Dices, InstSize))
           continue;
 

Copy link
Collaborator

@jh7370 jh7370 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a couple of people who are more familiar with the details of Mach-O format. Hopefully one of them can review this for you.

Copy link
Contributor

@drodriguez drodriguez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if for llvm-objdump the options that are not explicitly Mach-O should work with Mach-O. Or should work the same than when --macho is not provided. It seems that --macho might have been introduced at a time that a lot of functionality was not existing.

I have checked, to the best of my ability, if this would change anything for llvm-otool, which output probably should be compatible with Apple's otool, and I don't think otool options give access to the same functionality that --source, --line-numbers, or -g, so the changes in the code should not be hit by otool.

if (!DSym)
return;
DbgObj = DSym;
if (UseDbg) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should PrintLines also be checked here?

The docs for --line-numbers say "When disassembling, display source line numbers.", while the docs for -g say "Print line information from debug info if available". It seems to be the same thing, except that -g is exclusive for the Mach-O parser. Probably they were both introduced without knowledge of each other.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, good catch.

Comment on lines 7434 to 7436
SourcePrinter SP(DbgObj, TheTarget->getName());
LiveElementPrinter LEP(*MRI, *STI);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not very important, and I haven't double checked the code compiles/works as I think it should.

This would happen even if PrintSource or PrintLines is not used. Also FOS below is created for each line to be printed, and later destroyed (probably cheap, but…)

With a lambda, it is possible to avoid some of these problems:

std::function<void(object::SectionedAddress Address, StringRef ObjectFilename)> printSourceLineIfNeeded;
if (PrintSource || PrintLines) {
  printSourceLineIfNeeded = [DbgObj, TheTarget, &MRI, &STI](object::SectionedAddress Address, StringRef ObjectFilename) {
    static SourcePrinter SP(DbgObj, TheTarget->getName());
    static LiveElementPrinter LEP(*MRI, *STI);
    static formatted_raw_ostream FOS(outs());
    SP.printSourceLine(FOS, Address, ObjectFilename, LEP);
  }
} else {
  printSourceLineIfNeeded = [](object::SectionedAddress Address, StringRef ObjectFilename) {};
}
...
// Below, use the lambda
printSourceLineIfNeeded({PC, SectIdx}, Filename);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DisassembleMachO can be called here multiple times e.g. different slices of a universal binary. The statics in the lambda might be an issue. I can use a std::optional here to avoid the unnecessary initialization.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right. I thought about static being "local" to each of the outside function invocation, but it might not be like that. A small functor might work as I thought, but this should be cheap enough. No need to over-optimize if it is not a problem.

@rjmansfield
Copy link
Contributor Author

Not sure if for llvm-objdump the options that are not explicitly Mach-O should work with Mach-O. Or should work the same than when --macho is not provided. It seems that --macho might have been introduced at a time that a lot of functionality was not existing.

Yeah, I couldn't think of why --source shouldn't work with --macho. Most of the other generic objdump options are supported. There's indication what's not supported with --macho by design, or what's just been overlooked. Additionally, even if a user specifies -S without --macho and the binary is universal, objdump will use macho object parser silently and then line/source information will not be shown, which may be confusing to the user as it will work for single arch binaries.

Fix --line-numbers with --macho and add test.
Use std::optional to defer initialization of SourcePrinter/LiveElementPrinter.
@rjmansfield
Copy link
Contributor Author

If this change is OK, could someone merge on my behalf? Thanks.

@drodriguez
Copy link
Contributor

Restarted that failed job. There was no error. It might have been infra failure. Let's wait to see if we get green.

@drodriguez drodriguez merged commit a746fe5 into llvm:main Oct 22, 2025
13 of 14 checks passed
@rjmansfield rjmansfield deleted the fix_objdump_src_macho branch October 22, 2025 19:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants