-
-
Notifications
You must be signed in to change notification settings - Fork 339
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Lift MSP430 machine language to RzIL #4379
base: dev
Are you sure you want to change the base?
Conversation
a68c9ea
to
5438a1c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excited about this! Feel free to reach out if you need any help.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- You need to update Meson files to include them to be built
- It is necessary to add per-instruction RzIL tests in
test/db/asm/msp430
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice! Would be awesome if you finish it!
It is advisable to think about how to group several instructions. E.g. for ARM, PPC etc. we grouped them by arithmetic, shift, branch etc instructions. Mostly because the operands of those groups are almost always the same. So it is easier to initialize the operands at the start of the function and than use them to build the operations.
Check the ISA, if they group instructions into classes or something. Usually the grouping from the ISA makes the most sense.
Thanks a lot! I do intend to finish it, fingers crossed. The architecture is a bit obscure and the only available emulator (https://msp430.online/) requires a compilation toolchain because it expects an ELF or something like that, on the one hand this is good because adding support for obscure architectures is precisely why ESIL and RzIL were introduced in the first place, but on the other hand that means it's very easy to come up with bugs and - even worse - subtle deviations from the behavior of the CPU due to vague English descriptions. Add to that I'm not very familiar with RzIL API (e.g. the DUP bug above never crossed my mind), and that's why I'm taking it very slow and careful here.
It appears that MSP430 instructions are very orthogonal, there is almost no difference at all between instructions of the same arity, the only 2 differences is whether instructions update the flags and whether they're emulated (syntactic sugar for specific uses of other instructions) or not. I'm generally not familiar enough yet with the architecture to split by myself, so I'm just grouping by arity and whether the instruction is a jump instruction, this is the same approach taken by the disassembler, so I'm following it now but keeping my mind open if there are any patterns that hint if there is a better grouping. Unlike the disassembler, which naturally doesn't mention emulated instructions (since those instructions don't exist in the byte stream input to the disassembler), it might make sense in my case to add the emulated instruction because it's more efficient. For example it's inefficient to implement a SETC (set carry flag) using the architecture's emulation, which uses a more complex instruction BIS with 2 operands and more general logic. SETC is much simpler. But I'm implementing the bare basics and worrying about efficiency and auxiliaries later. |
Don't forget to rebase when you do your next change. |
5438a1c
to
0fa7e53
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Please add a test
- See the warning.
0fa7e53
to
975f225
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please also rebase on top of the latest dev
*dst = parse_4digit_hex(dst_str + 1); | ||
return DST_ABS; | ||
case '0': | ||
int end_offset; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tcc -Ilibrz/arch/librz_arch.so.0.8.0.p -I. -I.. -Ilibrz -I../librz -Ilibrz/include -I../librz/include -I../librz/arch/isa -I../librz/arch/isa_gnu -Ilibrz/arch -I../librz/arch -Ilibrz/util/sdb/src -I../librz/util/sdb/src -I../librz/bin/format -I../librz/type/parser -I../subprojects/capstone-next/include -I../subprojects/capstone-next/include/capstone -Isubprojects/rzspp -I../subprojects/rzspp -Wall -O2 -g -Wimplicit-fallthrough=3 -DRZ_PLUGIN_INCORE=1 -D_GNU_SOURCE -Werror=sizeof-pointer-memaccess -fvisibility=hidden -MD -MF librz/arch/librz_arch.so.0.8.0.p/isa_msp430_msp430_il.c.o.d -o librz/arch/librz_arch.so.0.8.0.p/isa_msp430_msp430_il.c.o -c ../librz/arch/isa/msp430/msp430_il.c
In file included from ../librz/arch/isa/msp430/msp430_il.c:208:
../librz/arch/isa/msp430/msp430_instruction_parse.inc:113: error: identifier expected
librz/arch/isa/msp430/msp430_il.c
Outdated
@@ -0,0 +1,208 @@ | |||
// SPDX-FileCopyrightText: 2023 Mostafa Mahmoud <ubermenchun@gmail.com> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't it 2024?
static ut16 parse_reg(char *reg); | ||
static ut16 parse_hex(char *num, int *end_offset); | ||
|
||
static Msp430SourceAddressingMode parse_src(char *src_str, RZ_OUT ut32 *src, RZ_OUT int *dst_offset) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of parsing string, I wonder if it's better to modify the librz/arch/isa/msp430/msg430_disas.c
instead and provide this kind of information as a C structure, similarly how other disassemblers do it, e.g. capstone-based ones.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Parsing the operands is definitely pretty ugly, but it's way easier to do than messing with the internals of the disassembler.
Someone, very likely me, will probably touch the MSP430 plugin soon to add assembly anyway (there is no MSP430 assembler at the moment, I'm using an external Python script I found somewhere on Github to obtain the binary of instructions), that will be a suitable moment to rethink disas.c
..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recommend focusing on improving the disassembly, analysis, and RzIL first and foremost before spending time adding the assembler. These are higher priority, and improving disassembler to provide instruction details for the analysis is much more important than writing an assembler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're actually right, the parsing code is a constant source of overflow headache and bugs, it keeps making AddressSanitizer complain about illegal reads. I decided to remove the problem from its root and delete the code, I will instead make the disassembler bubble up the information I need.
@moste00 any updates on this PR? |
Really sorry for being held up this long, many many interviews and disruptions in my irl life. What remains isn't that much. Just a rebase into a newer dev, and debugging the issue of why All in all, most of the major work is done IMO, the supported instructions pass into the lifter and come out as IL trees. It's just a matter of seeing why the CI doesn't think so from now on. |
7c45856
to
c0aa04c
Compare
@moste00 please check Mattermost. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The typo in the license identifier was likely the reason for the failing clang-format
628dac4
to
a85f5f5
Compare
525219e
to
4d28ad6
Compare
@moste00 please rebase |
4d28ad6
to
81c01a4
Compare
81c01a4
to
95b6070
Compare
Your checklist for this pull request
Detailed description
This PR is the first step towards lifting of the MSP430 instruction set to the RzIL representation. MSP430 is a quite simple architecture with comprehensive references here (http://elec327.github.io/assets/documents/slau144j_userguide.pdf), here (https://www.ti.com/sc/docs/products/micro/msp430/userguid/ag_b.pdf) and (https://www.ti.com/sc/docs/products/micro/msp430/userguid/ag_04.pdf).
In this PR, only lifting of the following 3 single-operand instruction is implemented. This is deliberate to keep the diff small. The rest of the lifting could be either new commits on this same branch and PR or future PRs.
Instructions lifted:
RRC (Rotate Right through Carry): takes a single operand which is also the destination, shifts it right by 1 position. The MSB becomes the carry flag, and the carry flag becomes the discarded LSB (from before the shift). Other flags are updated as specified in the references. Instruction has word-sized and byte-sized variants.
SWPB (Swap Bytes): takes a single operand which is also the destination, and exchanges the 2 bytes in it. Doesn't update the flags or any other CPU state. Always a word-sized operation.
SXT (Sign eXTension): takes a single operand which is also the destination, and replicates the MSB of the lower byte of the operand to the entire upper byte (e.g. if the lower byte is less than 128, the upper byte is 255, else the upper byte is 0). Always a word-sized operation that reads the lower byte and writes the upper byte of the operand.
Each instruction in the MSP430 has seven addressing modes, only one of them is implemented in this PR, namely the Register Addressing Mode, where the operand(s) are a register index. Other indexing modes including memory access and immediates should be straightforward to add later.
Test plan
...
Closing issues
...