-
Notifications
You must be signed in to change notification settings - Fork 41
Improve automatic symbol pairing for nameless literals #247
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Update on the issue with pairing up jumptables, I think I've found a better solution. It now does two separate passes over all symbols, first pairing up symbols that match exactly to be sure those are all paired if possible. Then the second pass allows partially-matching literals to be paired up, only with other unpaired literals that didn't have an exact match in the first pass. This allows for jump tables to be paired up even when their relocations don't fully match yet thanks to the second pass, but without interfering with the first pass that is needed to make sure the literals don't all get mispaired when one is missing. |
@LagoLunatic with your objdiff change, in Twilight Princess for d_a_npc_maro, does |
It doesn't affect function diffing in any way, only data diffing. |
Also FYI |
@LagoLunatic sorry, maybe I should have been more specific. I wasn't referring to ![]() The function will still show up as nonmatching in objdiff. This is both for main and your PR branch (which I synced locally): ![]() Here is the bss section, for reference: ![]() Are you sure you've updated your local copy of the TP repo after caseif's PCH changes? |
Oh I see, you have "Function relocation diffs" set to "Name or address". In that case it's only 99% matched for me too, both before and after my changes. You need to change "Function relocation diffs" to "data value" for it to show as 100% matching. That's basically the same change I'm making in this PR (ignore name and address, only look at if the value matches), but for the function view it already has existed as an option in objdiff since the start of this year. This PR is implementing the same thing, but for the symbol list view, which previously always diffed by "Name or address" with no option to do "Data value" until now. |
OK, in this case, we should put some intelligence into objdiff. The presence of |
I think the idea Altafen came up with yesterday would solve the issue easier than that. Either way though this should go in a separate issue. Function diffing is outside the scope of this current PR as I mentioned earlier. |
} | ||
|
||
/// Check if a symbol is a compiler-generated literal like @1234. | ||
fn is_symbol_compiler_generated_literal(symbol: &Symbol) -> bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious if we'd want other logic for GCC or MSVC
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the vs2022.o
object in the tests, I see some literals like __real@3f800000
, so I guess MSVC already puts the value of float literals in their symbol name. So ones like these shouldn't need any special logic, as comparing literals by value is what would be done anyway if name was ignored.
Other than float literals I'm not really sure what I'm looking at, but I see some things like:
$unwind$?Dot@Vector@@QEAAMPEAU1@@Z
_RTC_InitBase.rtc$IMZ
$pdata$?DistSq@Vector@@QEAAMPEAU1@@Z
I imagine some of these might need their own logic to pair them up properly, but it's hard to tell with just one object, someone would need both the target and the base to guess what's going on here.
This changes the algorithm that pairs up nameless literals like
@1234
.The current algorithm first tries to pair them up by name, and if that fails it tries to pair them up by address. The issue with pairing literals up by name is that it can result in completely unrelated symbols (not necessarily even in the same section or with the same size) being paired up (#216). The issue with pairing them up by address is that if the TU has extra stripped weak objects from some header at the start of a section, all of the literal addresses will be offset by those, as objdiff does not link the object in order to strip out the unused ones.
Here's an example of a 100% matching TU that has extra stripped weak objects at the start of the .data section on the current version of objdiff:

The new algorithm instead pairs them up only if they are in the same section, have the exact same bytes, and have the exact same data relocations within them. It completely ignores both name (fixes #216) and address. This allows out of order literals to be paired up as well.
Here's the same TU with the new literal pairing:

Although the new algorithm works better in almost all cases I tested, the one situation I've found that it's worse at is pairing up large literals like switch statement jump tables when they are less than 100% matched. The old algorithm could pair them up as long as they were at the same address and only failed when the address was wrong. The new algorithm fails to pair them up even if they're 98% matched and at the same address (old on the left):

I'm not sure if there's a good way to fix this. Originally I tried pairing literals up by whichever was closest to 100% matching, which works well when every literal on the left has an equivalent on the right. But when some of the left literals have no right literal to pair up with, they get accidentally paired with wrong literals they don't actually correspond to, which screws everything in the whole section up. So I changed it to only look at 100% matches instead to avoid this.
Maybe there's a way to pair them up by looking at the code that uses them and comparing the relocations? But I think this will start getting complicated, so for now I think it's fine for jump tables to not be paired up automatically. It's still possible for the user to manually map them by right clicking.