Improve automatic symbol pairing for nameless literals #247

LagoLunatic · 2025-08-24T19:35:30Z

This changes the algorithm that pairs up nameless literals like @1234.

The current algorithm first tries to pair them up by name, and if that fails it tries to pair them up by address. The issue with pairing literals up by name is that it can result in completely unrelated symbols (not necessarily even in the same section or with the same size) being paired up (#216). The issue with pairing them up by address is that if the TU has extra stripped weak objects from some header at the start of a section, all of the literal addresses will be offset by those, as objdiff does not link the object in order to strip out the unused ones.
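
To make the address problem concrete, here is a minimal sketch (not objdiff code; the `addresses` helper and the sizes are made up for illustration) of how a single extra object at the start of a section shifts every later literal, so that no two addresses line up any more:

```rust
/// Hypothetical helper: lay symbols out back to back and return the start
/// address of each one, given only their sizes.
fn addresses(sizes: &[u32]) -> Vec<u32> {
    let mut addr = 0u32;
    sizes
        .iter()
        .map(|size| {
            let start = addr;
            addr += *size;
            start
        })
        .collect()
}

/// Count how many literals on one side have a literal at the same address
/// on the other side.
fn count_address_matches(a: &[u32], b: &[u32]) -> usize {
    a.iter().filter(|addr| b.contains(*addr)).count()
}
```

With three literals of sizes 8, 8 and 4 on one side, and the same three literals preceded by a 4-byte weak object on the other, the literal addresses are 0, 8, 16 versus 4, 12, 20, so address-based pairing finds nothing.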

Here's an example of a 100% matching TU that has extra stripped weak objects at the start of the .data section on the current version of objdiff:

The new algorithm instead pairs them up only if they are in the same section, have the exact same bytes, and have the exact same data relocations within them. It completely ignores both name (fixes #216) and address. This allows out of order literals to be paired up as well.
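
The rule above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: the `Literal` and `Reloc` types and the `pair_exact` function are hypothetical, simplified stand-ins for objdiff's real data structures.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified data relocation inside a literal.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct Reloc {
    /// Offset of the relocation relative to the start of the literal.
    offset: u32,
    /// Name of the symbol the relocation points at.
    target: String,
}

/// Hypothetical, simplified nameless literal such as `@1234`.
#[derive(Clone, Debug)]
struct Literal {
    name: String,
    section: String,
    bytes: Vec<u8>,
    relocs: Vec<Reloc>,
}

/// Pair literals whose section, bytes and relocations are all identical.
/// Name and address are never consulted. Returns (left, right) index pairs.
fn pair_exact(left: &[Literal], right: &[Literal]) -> Vec<(usize, usize)> {
    // Group right-side indices by content, keeping their original order.
    let mut by_key: HashMap<(&str, &[u8], &[Reloc]), Vec<usize>> = HashMap::new();
    for (i, lit) in right.iter().enumerate() {
        let key = (lit.section.as_str(), lit.bytes.as_slice(), lit.relocs.as_slice());
        by_key.entry(key).or_default().push(i);
    }

    let mut pairs = Vec::new();
    for (l, lit) in left.iter().enumerate() {
        let key = (lit.section.as_str(), lit.bytes.as_slice(), lit.relocs.as_slice());
        if let Some(candidates) = by_key.get_mut(&key) {
            // Each right-side literal can be used at most once.
            if !candidates.is_empty() {
                pairs.push((l, candidates.remove(0)));
            }
        }
    }
    pairs
}
```

Because the key contains no name or address, literals that appear in a different order on each side still find each other, and a literal with the same name in a different section is never picked up.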

Here's the same TU with the new literal pairing:

Although the new algorithm works better in almost all cases I tested, the one situation I've found where it does worse is pairing up large literals like switch statement jump tables when they are less than 100% matched. The old algorithm could pair them up as long as they were at the same address and only failed when the address was wrong. The new algorithm fails to pair them up even if they're 98% matched and at the same address (old on the left):

I'm not sure if there's a good way to fix this. Originally I tried pairing literals up by whichever was closest to 100% matching, which works well when every literal on the left has an equivalent on the right. But when some of the left literals have no right literal to pair up with, they get accidentally paired with wrong literals they don't actually correspond to, which screws everything in the whole section up. So I changed it to only look at 100% matches instead to avoid this.

Maybe there's a way to pair them up by looking at the code that uses them and comparing the relocations? But I think this will start getting complicated, so for now I think it's fine for jump tables to not be paired up automatically. It's still possible for the user to manually map them by right clicking.

LagoLunatic · 2025-08-24T20:38:47Z

Update on the jump table pairing issue: I think I've found a better solution. It now does two separate passes over all symbols. The first pass pairs up symbols that match exactly, to be sure those are all paired if possible. The second pass then allows partially-matching literals to be paired up, but only with other unpaired literals that didn't have an exact match in the first pass. This allows jump tables to be paired up even when their relocations don't fully match yet, without interfering with the first pass, which is needed to make sure the literals don't all get mispaired when one is missing.
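
A rough sketch of the two-pass idea, for illustration only. The `Literal` type, the `similarity` metric (fraction of equal bytes, relocations ignored) and the "any score above zero" acceptance rule in the second pass are all assumptions made for this example, not what the PR actually implements.

```rust
/// Hypothetical, simplified literal: only section and bytes are modelled.
#[derive(Clone, Debug)]
struct Literal {
    section: String,
    bytes: Vec<u8>,
}

/// Assumed metric: fraction of equal bytes, or 0.0 if the section or size differs.
fn similarity(a: &Literal, b: &Literal) -> f32 {
    if a.section != b.section || a.bytes.len() != b.bytes.len() || a.bytes.is_empty() {
        return 0.0;
    }
    let same = a.bytes.iter().zip(&b.bytes).filter(|(x, y)| x == y).count();
    same as f32 / a.bytes.len() as f32
}

/// For each left literal, return the index of its right partner, if any.
fn pair_two_pass(left: &[Literal], right: &[Literal]) -> Vec<Option<usize>> {
    let mut result = vec![None; left.len()];
    let mut taken = vec![false; right.len()];

    // Pass 1: exact matches only, so they can never be stolen by a partial match.
    for (l, a) in left.iter().enumerate() {
        let found = right
            .iter()
            .enumerate()
            .find(|(r, b)| !taken[*r] && a.section == b.section && a.bytes == b.bytes);
        if let Some((r, _)) = found {
            taken[r] = true;
            result[l] = Some(r);
        }
    }

    // Pass 2: leftovers may pair with the best-scoring leftover on the right.
    for (l, a) in left.iter().enumerate() {
        if result[l].is_some() {
            continue;
        }
        let mut best: Option<(usize, f32)> = None;
        for (r, b) in right.iter().enumerate() {
            if taken[r] {
                continue;
            }
            let score = similarity(a, b);
            if score > 0.0 && best.map_or(true, |(_, s)| score > s) {
                best = Some((r, score));
            }
        }
        if let Some((r, _)) = best {
            taken[r] = true;
            result[l] = Some(r);
        }
    }
    result
}
```

Running the exact pass to completion before any partial matching is what keeps a missing literal from cascading: by the time the second pass runs, every literal that has a perfect partner is already off the table.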

YunataSavior · 2025-08-24T21:02:20Z

@LagoLunatic with your objdiff change, in Twilight Princess for d_a_npc_maro, does __sinit_d_a_npc_maro_cpp match? In other words, does your change also affect how asm comparisons are performed in functions without pooling?

LagoLunatic · 2025-08-24T21:06:46Z

> @LagoLunatic with your objdiff change, in Twilight Princess for d_a_npc_maro, does __sinit_d_a_npc_maro_cpp match? In other words, does your change also affect how asm comparisons are performed in functions without pooling?

It doesn't affect function diffing in any way, only data diffing. __sinit_d_a_npc_maro_cpp shows as 100% matched both before and after the changes.

LagoLunatic · 2025-08-24T21:10:23Z

Also FYI __sinit_d_a_npc_maro_cpp absolutely does use data pooling, so I'm not sure what you mean by the "functions without pooling" part of your comment.

YunataSavior · 2025-08-24T21:21:35Z

@LagoLunatic sorry, maybe I should have been more specific. I wasn't referring to .data (which is pooled). I was referring to .bss. Take a look at:

The function will still show up as nonmatching in objdiff. This is both for main and your PR branch (which I synced locally):

Here is the bss section, for reference:

Are you sure you've updated your local copy of the TP repo after caseif's PCH changes?

LagoLunatic · 2025-08-24T21:31:44Z

Oh I see, you have "Function relocation diffs" set to "Name or address". In that case it's only 99% matched for me too, both before and after my changes.

You need to change "Function relocation diffs" to "Data value" for it to show as 100% matching. That's basically the same change I'm making in this PR (ignore name and address, only check whether the value matches), but for the function view it has already existed as an option in objdiff since the start of this year. This PR implements the same thing for the symbol list view, which previously always diffed by "Name or address" with no option to use "Data value" until now.
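
The difference between the two settings can be sketched like this. The `RelocDiffMode` enum, the `RelocTarget` struct and `reloc_targets_match` are hypothetical names for this example and deliberately simplified; they are not objdiff's actual configuration types or comparison logic.

```rust
/// Hypothetical stand-in for the two relocation diff settings discussed above.
#[derive(Clone, Copy)]
enum RelocDiffMode {
    NameOrAddress,
    DataValue,
}

/// Hypothetical, simplified view of the symbol a relocation points at.
struct RelocTarget<'a> {
    name: &'a str,
    address: u64,
    value: &'a [u8],
}

/// Decide whether two relocation targets count as the same under a given mode.
fn reloc_targets_match(mode: RelocDiffMode, a: &RelocTarget, b: &RelocTarget) -> bool {
    match mode {
        // Simplified: same name or same address counts as a match.
        RelocDiffMode::NameOrAddress => a.name == b.name || a.address == b.address,
        // Name and address are ignored; only the referenced bytes matter.
        RelocDiffMode::DataValue => a.value == b.value,
    }
}
```

Two literals that hold the same bytes but were given different generated names and ended up at different addresses differ under the first mode and match under the second.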

YunataSavior · 2025-08-24T22:45:34Z

OK, in this case, we should put some intelligence into objdiff. The presence of ...bss.0, ...data.0, and/or ...rodata.0 should trigger objdiff to automatically select "data value" for whichever lacks said hidden symbol.

LagoLunatic · 2025-08-24T22:52:05Z

> OK, in this case, we should put some intelligence into objdiff. The presence of ...bss.0, ...data.0, and/or ...rodata.0 should trigger objdiff to automatically select "data value" for whichever lacks said hidden symbol.

I think the idea Altafen came up with yesterday would solve the issue more easily than that. Either way, though, this should go in a separate issue. Function diffing is outside the scope of this PR, as I mentioned earlier.

encounter · 2025-08-30T17:49:47Z

objdiff-core/src/diff/mod.rs

 }

+/// Check if a symbol is a compiler-generated literal like @1234.
+fn is_symbol_compiler_generated_literal(symbol: &Symbol) -> bool {


Curious if we'd want other logic for GCC or MSVC

LagoLunatic · 2025-08-30

Looking at the vs2022.o object in the tests, I see some literals like __real@3f800000, so I guess MSVC already puts the value of float literals in their symbol name. So ones like these shouldn't need any special logic, as comparing literals by value is what would be done anyway if name was ignored.
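
As an illustration of why such names already identify the value, here is a small sketch that decodes one. It assumes the suffix after `__real@` is the IEEE 754 bit pattern of a 32-bit float written as 8 hex digits, which is what the `3f800000` example suggests; the `msvc_real_f32` function is hypothetical and not part of objdiff.

```rust
/// Decode the value embedded in an MSVC-style float literal symbol name such
/// as `__real@3f800000`. Returns None for anything that is not `__real@`
/// followed by exactly 8 hex digits.
fn msvc_real_f32(name: &str) -> Option<f32> {
    let hex = name.strip_prefix("__real@")?;
    if hex.len() != 8 || !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    let bits = u32::from_str_radix(hex, 16).ok()?;
    Some(f32::from_bits(bits))
}
```

For `__real@3f800000` this yields 1.0, while a name like `@1234` carries no value information and returns `None`.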

Other than float literals I'm not really sure what I'm looking at, but I see some things like:

$unwind$?Dot@Vector@@QEAAMPEAU1@@Z

_RTC_InitBase.rtc$IMZ

$pdata$?DistSq@Vector@@QEAAMPEAU1@@Z

I imagine some of these might need their own logic to pair them up properly, but it's hard to tell with just one object; someone would need both the target and the base to guess what's going on here.

LagoLunatic added 5 commits August 24, 2025 14:55

Improve automatic symbol pairing for nameless literals

88cc76d

Fix data reloc diffing when the reloc points to an in-function static symbol

f8e7478

Only pair up literals that match perfectly

40f7791

Clippy

81163a6

Do two separate passes when pairing up literals

8e4615e

YunataSavior mentioned this pull request Aug 25, 2025

d_a_obj_mie OK zeldaret/tp#2602

Merged

Fix partially-matching literal pairups not working right

5b8009e

encounter reviewed Aug 30, 2025

LagoLunatic and others added 5 commits August 30, 2025 14:27

Remove duplicate $ splitting code

8807580

Implement $ splitting for section names too

3e74225

Merge branch 'main' into literal-matchup

f50569e

Merge branch 'main' into literal-matchup

b72355d

Minor cleanup

c6437b4

encounter merged commit f2a5913 into encounter:main Aug 31, 2025
20 checks passed
