Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[lldb] Fix nullptr dereference on running x86 binary with x86-disabled llvm #82603

Merged
merged 5 commits into from
Apr 16, 2024

Conversation

kovdan01
Copy link
Contributor

If LLVM_TARGETS_TO_BUILD does not contain X86 and we try to run an x86 binary in lldb, we get a nullptr dereference in LLVMDisasmInstruction(...). We try to call getDisAsm() method on a LLVMDisasmContext *DC which is null. The pointer is passed from x86AssemblyInspectionEngine::instruction_length(...) and is originally m_disasm_context member of x86AssemblyInspectionEngine. This should be filled by LLVMCreateDisasm(...) in the class constructor, but not having X86 target enabled in llvm makes TargetRegistry::lookupTarget(...) call return nullptr, which results in m_disasm_context initialized with nullptr as well.

This patch adds if statements against m_disasm_context in x86AssemblyInspectionEngine::GetNonCallSiteUnwindPlanFromAssembly(...) and x86AssemblyInspectionEngine::FindFirstNonPrologueInstruction(...) so subsequent calls to x86AssemblyInspectionEngine::instruction_length(...) do not cause a null pointer dereference.

…d llvm

If `LLVM_TARGETS_TO_BUILD` does not contain `X86` and we try to run an
x86 binary in lldb, we get a `nullptr` dereference in
`LLVMDisasmInstruction(...)`. We try to call `getDisAsm()` method on a
`LLVMDisasmContext *DC` which is null. The pointer is passed from
`x86AssemblyInspectionEngine::instruction_length(...)` and is originally
`m_disasm_context` member of `x86AssemblyInspectionEngine`. This should
be filled by `LLVMCreateDisasm(...)` in the class constructor, but not having
X86 target enabled in llvm makes `TargetRegistry::lookupTarget(...)`
call return `nullptr`, which results in `m_disasm_context` initialized
with `nullptr` as well.

This patch adds if statements against `m_disasm_context` in
`x86AssemblyInspectionEngine::GetNonCallSiteUnwindPlanFromAssembly(...)` and
`x86AssemblyInspectionEngine::FindFirstNonPrologueInstruction(...)` so
subsequent calls to `x86AssemblyInspectionEngine::instruction_length(...)` do
not cause a null pointer dereference.
@kovdan01
Copy link
Contributor Author

Please suggest ideas how this could be tested. It looks like that UnwindAssembly unittests is the right place for it - but UnwindAssemblyx86Tests cmake target is not built without X86 in LLVM_TARGETS_TO_BUILD, which is the pre-condition of the issue this PR is fixing.

@kovdan01 kovdan01 marked this pull request as ready for review February 22, 2024 09:24
@llvmbot llvmbot added the lldb label Feb 22, 2024
@llvmbot
Copy link
Collaborator

llvmbot commented Feb 22, 2024

@llvm/pr-subscribers-lldb

Author: Daniil Kovalev (kovdan01)

Changes

If LLVM_TARGETS_TO_BUILD does not contain X86 and we try to run an x86 binary in lldb, we get a nullptr dereference in LLVMDisasmInstruction(...). We try to call getDisAsm() method on a LLVMDisasmContext *DC which is null. The pointer is passed from x86AssemblyInspectionEngine::instruction_length(...) and is originally m_disasm_context member of x86AssemblyInspectionEngine. This should be filled by LLVMCreateDisasm(...) in the class constructor, but not having X86 target enabled in llvm makes TargetRegistry::lookupTarget(...) call return nullptr, which results in m_disasm_context initialized with nullptr as well.

This patch adds if statements against m_disasm_context in x86AssemblyInspectionEngine::GetNonCallSiteUnwindPlanFromAssembly(...) and x86AssemblyInspectionEngine::FindFirstNonPrologueInstruction(...) so subsequent calls to x86AssemblyInspectionEngine::instruction_length(...) do not cause a null pointer dereference.


Full diff: https://github.com/llvm/llvm-project/pull/82603.diff

1 Files Affected:

  • (modified) lldb/source/Plugins/UnwindAssembly/x86/x86AssemblyInspectionEngine.cpp (+6)
diff --git a/lldb/source/Plugins/UnwindAssembly/x86/x86AssemblyInspectionEngine.cpp b/lldb/source/Plugins/UnwindAssembly/x86/x86AssemblyInspectionEngine.cpp
index 2032c5a68d054c..6bfaa54135a959 100644
--- a/lldb/source/Plugins/UnwindAssembly/x86/x86AssemblyInspectionEngine.cpp
+++ b/lldb/source/Plugins/UnwindAssembly/x86/x86AssemblyInspectionEngine.cpp
@@ -909,6 +909,9 @@ bool x86AssemblyInspectionEngine::GetNonCallSiteUnwindPlanFromAssembly(
   if (!m_register_map_initialized)
     return false;
 
+  if (m_disasm_context == nullptr)
+    return false;
+
   addr_t current_func_text_offset = 0;
   int current_sp_bytes_offset_from_fa = 0;
   bool is_aligned = false;
@@ -1570,6 +1573,9 @@ bool x86AssemblyInspectionEngine::FindFirstNonPrologueInstruction(
   if (!m_register_map_initialized)
     return false;
 
+  if (m_disasm_context == nullptr)
+    return false;
+
   while (offset < size) {
     int regno;
     int insn_len;

@DavidSpickett
Copy link
Collaborator

Can this code be hit when using an x86 core file? Then you could write a shell test thatis UNSUPPORTED: x86-registered-target (whatever the proper syntax is) and assert that it does not crash.

It won't get run on the build bots, they enable all targets, but it's nice to have anyway. Someone downstream might appreciate it.

@jasonmolenda
Copy link
Collaborator

I'm a little curious the use case that led to this failure. We have a build of llvm/lldb without the X86 target, and we're using that lldb to debug an i386/x86_64 target (gdb connection or corefile)? Because we can't use instruction emulation based unwinding, if we don't have eh_frame for every function, backtraces will go poorly. Once we're off of the currently-executing stack frame, we can follow the frame pointer / pc spilled to the stack (assuming the code wasn't compiled omit-frame-pointer), but we won't be able to find any other spilled registers for printing variable contents in non-volatile / caller spilled. (and getting off of frame 0 without eh_frame is going to fail if we're in the prologue/epilogue on a thread)

@jasonmolenda
Copy link
Collaborator

tbh most of lldb would work without the llvm target compiled in, but the disassembler and the Intel assembly scanning unwind plan creator both need it, and the latter is more important.

@jasonmolenda
Copy link
Collaborator

Well, it would change source-level next and step too if we didn't have the disassembler. When we're doing a source-level step/next, we look at the stream of instructions and when we have a block of non-branching instructions, we put a breakpoint after the block and continue, instead of instruction stepping. We use the disassembler to detect instructions that can branch. source level step and next would still work, but they would be a bit slower.

@JDevlieghere
Copy link
Member

Also this is plugin code, it's probably more appropriate to disable the whole plugin if the necessary backend isn't there?

@jasonmolenda
Copy link
Collaborator

tbh I have no problems with the patch, but I think it's fixing something that I think should be reconsidered altogether, I'm interested to hear more about what the use case looks like that led to this being a problem.

@kovdan01
Copy link
Contributor Author

tbh I have no problems with the patch, but I think it's fixing something that I think should be reconsidered altogether, I'm interested to hear more about what the use case looks like that led to this being a problem.

@jasonmolenda The use case is very simple, describing it "as is". I was working on AArch64-specific stuff in lldb in downstream and revealed an x86-related issue while reading the code (see #82364). When working on the latter issue, I tried to run a random x86-64 executable inside lldb and got this error. It occurs literally on the simplest use case:

  1. run lldb /usr/bin/ls
  2. inside lldb, hit r to run
  3. get the nullptr dereference described

Yes, it's that simple.

@kovdan01
Copy link
Contributor Author

Can this code be hit when using an x86 core file?

Thanks for suggestion! I'll try that and notify here if such approach succeeded.

@kovdan01
Copy link
Contributor Author

Can this code be hit when using an x86 core file? Then you could write a shell test thatis UNSUPPORTED: x86-registered-target (whatever the proper syntax is) and assert that it does not crash.

@DavidSpickett The issue is present when loading an x86 core files via lldb --core core (nothing else, just running this command). I've implemented shell test and put that into a new directory Core - see bd9bb0a. If there is a more applicable place for that - would be glad to see suggestions

@DavidSpickett
Copy link
Collaborator

The spirit of the test is fine but I think we can avoid adding another core file.

As this plugin seems related to backtrace perhaps the test should at least run bt and check for the first line of the output. Which means we can justify putting the test in lldb/test/Shell/Commands as command-backtrace-missing-x86.test.

Now we need a corefile, the closest ones are in ./Register/Core/Inputs/, so it's a bit unorthodox, but you could load %p/../Register/Core/Inputs/x86-64-linux.core and save having to copy it.

@kovdan01
Copy link
Contributor Author

As this plugin seems related to backtrace perhaps the test should at least run bt and check for the first line of the output.

I had similar thoughts, but I'm not sure if placing the test in lldb/test/Shell/Commands as command-backtrace-missing-x86.test is a nice idea. The issue occurs during loading the core dump (or, when debugging an executable, when executing "run" command). So, running bt might be misleading - a one might think that we test an absence of nullptr dereference during bt, but we want to test that the core dump is at least loaded properly since with the issue present we don't even get to the point where we can run bt.

I would be glad to here other people's thoughts on that - I'm personally not sure which testing approach is less evil here.

@kovdan01
Copy link
Contributor Author

@jasonmolenda Just in case you've missed - I've provided the use case description leading to the issue (as you requested) above #82603 (comment). Would be glad if you let me know if it gives you enough info & if this particular PR is OK or the issue should be fixed in another way.

@DavidSpickett
Copy link
Collaborator

I will be away for a while so @jasonmolenda please merge it if it looks fine to you.

@kovdan01
Copy link
Contributor Author

@jasonmolenda Would be glad to see your feedback - see answers to your previous comments above

@kovdan01
Copy link
Contributor Author

@jasonmolenda A kind reminder regarding the PR - see answers to your previous comments above

@jasonmolenda
Copy link
Collaborator

Hi sorry @kovdan01 I missed this one in the emails. You're using an lldb which was built without the LLVM_TARGETS_TO_BUILD including X86, and running that lldb on an x86 corefile, got it. I have low confidence how well lldb will work in this situation, e.g. inferior function calls are obviously going to fail completely, and possibly not in a graceful way, but that doesn't impact corefiles. I'm less thrilled about adding a 570kb corefile to the repository to check this combination doesn't crash the unwinder. In lldb/unittest/UnwindAssembly we build the x86 directory when

if ("X86" IN_LIST LLVM_TARGETS_TO_BUILD)
  add_subdirectory(x86)
endif()

In Testx86AssemblyInspectionEngine.cpp we initialize llvm state in Testx86AssemblyInspectionEngine::SetUpTestCase and then run individual tests in the TEST_F() entries, creating a byte stream of prologues like

  // 'int main() { }' compiled for x86_64-apple-macosx with clang
  uint8_t data[] = {
      0x55,             // offset 0 -- pushq %rbp
      0x48, 0x89, 0xe5, // offset 1 -- movq %rsp, %rbp
      0x31, 0xc0,       // offset 4 -- xorl %eax, %eax
      0x5d,             // offset 6 -- popq %rbp
      0xc3              // offset 7 -- retq
  };

and run the unwind engine on those bytes.

Could we add a x86-but-no-x86-target directory, write one test to see that the unwind engine can run against a byte buffer like this and not crash instead maybe?

@kovdan01
Copy link
Contributor Author

Thanks @jasonmolenda for your feedback and suggestion! See f96989d - I've deleted the test with corefile and added the test you've mentioned. Basically, I've just left the most simple test from "normal" Testx86AssemblyInspectionEngine and checked the GetNonCallSiteUnwindPlanFromAssembly call result against false. I've also ensured that w/o the patch applied, we just fail with nullptr dereference - so the test actually covers the case it's designed for.

Copy link
Collaborator

@jasonmolenda jasonmolenda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, please land it. Thanks for the pings and rewriting the test case!

@kovdan01 kovdan01 merged commit 7e49b0d into llvm:main Apr 16, 2024
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants