Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make Stack Better #2

Closed
Nalen98 opened this issue Aug 14, 2019 · 14 comments
Closed

Make Stack Better #2

Nalen98 opened this issue Aug 14, 2019 · 14 comments
Labels
help wanted Extra attention is needed

Comments

@Nalen98
Copy link
Owner

Nalen98 commented Aug 14, 2019

Greetings!
eBPF has a number of nasty moments with the stack:
https://www.kernel.org/doc/html/latest/bpf/bpf_design_QA.html#q-can-bpf-programs-access-instruction-pointer-or-return-address
And small citation:
«Q: Can BPF programs access instruction pointer or return address?
A: NO.
Q: Can BPF programs access stack pointer ?
A: NO. Only frame pointer (register R10) is accessible. From compiler point of view it's necessary to have stack pointer. For example, LLVM defines register R11 as stack pointer in its BPF backend, but it makes sure that generated code never uses it.»
Pretty sad, isn't it?

Register R10 is the last hope for us and Ghidra to implement access to stack of eBPF programm. I put «stackpointer» attribute on it in specification files for extension. R10 is a frameponter register, it’s unchangeable, and we use this register and offset to access the stack (such as index in array). Accordingly, the "original" SP (stack pointer register) is missing and no substitute in eBPF. This implies several problems:

=> Stack Depth field in Ghidra will be guaranteed zero, beacuse we have to appoint R10 as a stackponter in these architectural conditions.

=> Instructions which operate with R10 (Load/Store instructions) often doesn't decompile. Any ideas about this thing? Ghidra usually doesn't decompile dead code (snippets that won't execute), but it generally does not apply to the subject of Issue.

Thanks to support!

@mumbel
Copy link

mumbel commented Aug 15, 2019

No comment on SP, but I do have some suggestions which might improve your diassembler and decompiler output.

  • All your jumps are inverted logic, this adds an unnecessary goto to the instruction's PCODE.
  • off in the jumps could be a constructor to remove the temp calculation in every instruction and more info in disassembly: example in your screenshot JGT R1,R2,0x71 becomes JGT R1,R2,LBB0_11
:JEQ dst, imm, off   is imm & off & dst & op_alu_jmp_opcode=0x1 & op_alu_jmp_source=0 & op_insn_class=0x5 {	
	temp:8=inst_next + off * 8;  
	if (dst!=imm) goto <label>;
	goto [temp];	
	<label>	   
}
joff: reloc is off [ reloc = inst_next + off * 8; ] { export *:8 reloc; }

:JEQ dst, imm, joff   is imm & joff & dst & op_alu_jmp_opcode=0x1 & op_alu_jmp_source=0 & op_insn_class=0x5 {	
	if (dst==imm) goto joff;
}

@Nalen98
Copy link
Owner Author

Nalen98 commented Aug 16, 2019

Thank you, your comment is correct and competent. I will create new commit and fix all jumps in few days.
And it's interesting to solve the problem with the stack.

@emteere
Copy link

emteere commented Aug 18, 2019

Can you give me an example that doesn't decompile?

Are you using stock 9.0.4, or are you using Ghidra main dev branch? There have been some quite significant changes made to the decompiler.

Since your functions don't appear to manipulate the stack depth, changing register R11, the stack depth field would always display zero. I do see passing of pointers on the stack, where the pointer "allocates" memory on the stack by assigning a parameter variable from the value in R10, and subtracting some value, but the stack is never changed here (R10 or R11).

The same thing would occur in a program that never changed the RSP, but assigned other registers to the value of RSP and used offsets from it.

There are other languages where functions essentially have their own "stack" and can't access outside of their stack space. Java essentially has a local stack, and you can't access a global stack.

Be happy to take a look at the samples if you can point out locations that you expect different results, especially if the decompiler is not working on them.

@Nalen98
Copy link
Owner Author

Nalen98 commented Aug 19, 2019

@emteere, yes, R10 is unchangeable and the only register which can access the stack in eBPF program. But, as I said, calling it «stackpointer» is absolutely wrong but no choice.

I use 9.0.4 and also checked on the latest build of Ghidra. The same problem.

Take a look on this screenshot of eBPF program bpf_lxc.o. As you can see, STXDW (“store double word”) doesn’t decompile if using R10. Otherwise, all successful.

No decomp

Nalen98 added a commit that referenced this issue Aug 19, 2019
@emteere
Copy link

emteere commented Aug 19, 2019

I'll take a look at the example. Most likely a pcode intricacy that isn't readily apparent or something in the .cspec.

The processor is just a model, and for all intents and purposes that is a stack for the function.
There are ways to have pcode injected at the start of your functions, but I don't think it is worth the complexity here. That might change depending on inter function calls.

Are you getting all the parameters correct for your calls, or is there some sort of lookup scheme (as in the const-pool in Java bytecode)?

BTW, nice work on the processor. I haven't done a full review, but getting this far on it is great.

@emteere
Copy link

emteere commented Aug 19, 2019

The code is actually decompiling correctly, the initializations are seen as setting local variables, and since the value is never seen as changed, the decompiler gets rid of it. That is also why if statements are disappearing because they are seen as dead code. The value they are testing that was initialized on the stack is never seen to change.

This is a pointer aliasing problem. I believe can be fixed, but I'm still poking at it.

@Nalen98
Copy link
Owner Author

Nalen98 commented Aug 21, 2019

BTW, nice work on the processor. I haven't done a full review, but getting this far on it is great.

Thank you.

Are you getting all the parameters correct for your calls, or is there some sort of lookup scheme (as in the const-pool in Java bytecode)?

Yes, helper calls work correctly. I created a small eBPF program without any in-kernel calls (only arithmetic tasks and access the stack) and i found the same situation when "stack-accessing" instructions don't decompile.

@emteere
Copy link

emteere commented Aug 23, 2019

OK, I'll take a look. I believe I have a way you can implement this, and get datatype information into the in-kernel calls. I'll spec it out as soon as I get a chance and get it to you.

@Nalen98
Copy link
Owner Author

Nalen98 commented Aug 23, 2019

@emteere Thank you

@emteere
Copy link

emteere commented Sep 11, 2019

The main issue is the dataflow analysis isn't the same in the decompiler for CALLOTHER user pcodeops, and CALL pcodeop. By specifying all the calls with pcodeop's in the sleigh processor you will get the name in the decompilation, but not the good data flow. However you will still need to apply good function signatures with types, which isn't easily done of CALLOTHER pcodeops until 9.1 (current master) is released. 9.1 allows overriding the CALLOTHER to any other place by putting a CALL_OTHER_OVERRIDE reference to any other address. The decompiler will then pickup the function signature from that location.

The following should fix your issue in the current 9.1, without doing anything fancy with the new call-override references. There are other ways to accomplish this, but this is probably the easiest.

Create a eBPFHelper syscall address space, and then make calls directly to it. You can allocate the space in the .pspec file. Then at each location you can create a 1 byte function that has the correct function signature. The issue with this is you get bad bookmarks because of the disassembly, so I chose this as the lesser of evils. We should probably change the code so that you can have references to an uninitialized space that doesn't cause disassembly errors (next release...)

The calling convention you have already specified will allow the decompiler to figure out the parameters and return.

The reason you need the latest 9.1 or master release is to get the better pointer aliasing. The stack variable is initialized, but the decompiler won't see that it could be changed by one of the calls that takes a pointer to the initialized stack in 9.0.4. The decompiler has an improved value-set analysis for the stack in 9.1.

The other issue is in the .pspec, since you are loading an ELF, normally the .pspec symbols don't get loaded, so when you import an elf, you have to check the in the loader options to apply processor specific symbols. This is hard to remember to do, so we either need to allow the default to be overriden, or provide another mechanism to force the processor specific symbols, possibly a tag on the symbol entries themselves. The symbols could get applied in another manner, but I wanted to get you this mechanism since it has been a while since you posed the question.

I looked around for good header file definitions for the functions, but didn't find some easily parse-able ones. Either a pre-parsed .gdt file, or a header file that is parsed at runtime with the CParser could be added. A header file parsed is probably better in the long run because it is easy to change and human readable.

I also changed the calling convention slightly.
eBPF.cspec.txt
eBPF.pspec.txt
eBPF.sinc.txt

@Nalen98
Copy link
Owner Author

Nalen98 commented Sep 19, 2019

@emteere, Thanks for support!
Problem with ghidra's deadcode is resolved, stack-accessing instructions are successfully decompiled.

I decided to go a bit straight through and use power of Ghidra api, so custom eBPFAnalyzer was written.

Now eBPF helpers are recognized as functions, the separate memory segment (syscall as you defined) is allocated for them, you can take a look at what signatures they have, the disassembler and decompiler got a number of nice changes.

HelpersAnalyzer

A little effort, and the new release will come out soon. Thanks to you!

@emteere
Copy link

emteere commented Sep 20, 2019

Nice, and you are welcome. Will take a closer look.

Curious where you are storing and applying the data types and function signatures.

Sorry for the red bookmarks. That's a side-effect of the disassembler trying to be helpful with code that doesn't disassemble that is called. There are ways that I thought of to stop it, but it was more trouble than it was worth at the time and I wanted to get you a solution. I suppose your eBPFanalyzer could get rid of the disassembly errors, since it knows better that they are not issues.

@Nalen98
Copy link
Owner Author

Nalen98 commented Sep 20, 2019

Curious where you are storing and applying the data types and function signatures.

Analyzer parses Symbol Table (where specified "syscall" in location field). You prompted the body of pspec file, where the memory addresses for each function were defined. I rely on this.

Analyzer determines, which function is to this address, and creates cmds for applying, which contain type of the return value, the name of function and which arguments are used (their type, name, index, etc).

It would be nice to solve the problem with bad bookmarks and i think it can be solved.

@Nalen98
Copy link
Owner Author

Nalen98 commented Sep 23, 2019

Done!
image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

3 participants