Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lift MSP430 machine language to RzIL #4379

Open
wants to merge 1 commit into
base: dev
Choose a base branch
from

Conversation

moste00
Copy link
Contributor

@moste00 moste00 commented Mar 22, 2024

Your checklist for this pull request

  • I've read the guidelines for contributing to this repository
  • I made sure to follow the project's coding style
  • I've documented or updated the documentation of every function and struct this PR changes. If not so I've explained why.
  • I've added tests that prove my fix is effective or that my feature works (if possible)
  • I've updated the rizin book with the relevant information (if needed)

Detailed description

This PR is the first step towards lifting of the MSP430 instruction set to the RzIL representation. MSP430 is a quite simple architecture with comprehensive references here (http://elec327.github.io/assets/documents/slau144j_userguide.pdf), here (https://www.ti.com/sc/docs/products/micro/msp430/userguid/ag_b.pdf) and (https://www.ti.com/sc/docs/products/micro/msp430/userguid/ag_04.pdf).

In this PR, only lifting of the following 3 single-operand instruction is implemented. This is deliberate to keep the diff small. The rest of the lifting could be either new commits on this same branch and PR or future PRs.

Instructions lifted:
 

  • RRC (Rotate Right through Carry): takes a single operand which is also the destination, shifts it right by 1 position. The MSB becomes the carry flag, and the carry flag becomes the discarded LSB (from before the shift). Other flags are updated as specified in the references. Instruction has word-sized and byte-sized variants.

  • SWPB (Swap Bytes): takes a single operand which is also the destination, and exchanges the 2 bytes in it. Doesn't update the flags or any other CPU state. Always a word-sized operation.

  • SXT (Sign eXTension): takes a single operand which is also the destination, and replicates the MSB of the lower byte of the operand to the entire upper byte (e.g. if the lower byte is less than 128, the upper byte is 255, else the upper byte is 0). Always a word-sized operation that reads the lower byte and writes the upper byte of the operand.

Each instruction in the MSP430 has seven addressing modes, only one of them is implemented in this PR, namely the Register Addressing Mode, where the operand(s) are a register index. Other indexing modes including memory access and immediates should be straightforward to add later.

Test plan

...

Closing issues

...

Copy link
Member

@DMaroo DMaroo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excited about this! Feel free to reach out if you need any help.

librz/arch/isa/msp430/msp430_il.c Outdated Show resolved Hide resolved
librz/arch/isa/msp430/msp430_il.c Outdated Show resolved Hide resolved
Copy link
Member

@XVilka XVilka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. You need to update Meson files to include them to be built
  2. It is necessary to add per-instruction RzIL tests in test/db/asm/msp430

Copy link
Member

@Rot127 Rot127 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice! Would be awesome if you finish it!

It is advisable to think about how to group several instructions. E.g. for ARM, PPC etc. we grouped them by arithmetic, shift, branch etc instructions. Mostly because the operands of those groups are almost always the same. So it is easier to initialize the operands at the start of the function and than use them to build the operations.
Check the ISA, if they group instructions into classes or something. Usually the grouping from the ISA makes the most sense.

@Rot127 Rot127 added the MSP430 label Mar 23, 2024
@moste00
Copy link
Contributor Author

moste00 commented Mar 23, 2024

Very nice! Would be awesome if you finish it!

Thanks a lot! I do intend to finish it, fingers crossed. The architecture is a bit obscure and the only available emulator (https://msp430.online/) requires a compilation toolchain because it expects an ELF or something like that, on the one hand this is good because adding support for obscure architectures is precisely why ESIL and RzIL were introduced in the first place, but on the other hand that means it's very easy to come up with bugs and - even worse - subtle deviations from the behavior of the CPU due to vague English descriptions. Add to that I'm not very familiar with RzIL API (e.g. the DUP bug above never crossed my mind), and that's why I'm taking it very slow and careful here.

It is advisable to think about how to group several instructions. E.g. for ARM, PPC etc. we grouped them by arithmetic, shift, branch etc instructions.

It appears that MSP430 instructions are very orthogonal, there is almost no difference at all between instructions of the same arity, the only 2 differences is whether instructions update the flags and whether they're emulated (syntactic sugar for specific uses of other instructions) or not.

I'm generally not familiar enough yet with the architecture to split by myself, so I'm just grouping by arity and whether the instruction is a jump instruction, this is the same approach taken by the disassembler, so I'm following it now but keeping my mind open if there are any patterns that hint if there is a better grouping.

Unlike the disassembler, which naturally doesn't mention emulated instructions (since those instructions don't exist in the byte stream input to the disassembler), it might make sense in my case to add the emulated instruction because it's more efficient. For example it's inefficient to implement a SETC (set carry flag) using the architecture's emulation, which uses a more complex instruction BIS with 2 operands and more general logic. SETC is much simpler. But I'm implementing the bare basics and worrying about efficiency and auxiliaries later.

@XVilka
Copy link
Member

XVilka commented Mar 25, 2024

Don't forget to rebase when you do your next change.

Copy link
Member

@XVilka XVilka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Please add a test
  2. See the warning.

librz/arch/isa/msp430/msp430_disas.h Outdated Show resolved Hide resolved
Copy link
Member

@XVilka XVilka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also rebase on top of the latest dev

*dst = parse_4digit_hex(dst_str + 1);
return DST_ABS;
case '0':
int end_offset;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

 tcc -Ilibrz/arch/librz_arch.so.0.8.0.p -I. -I.. -Ilibrz -I../librz -Ilibrz/include -I../librz/include -I../librz/arch/isa -I../librz/arch/isa_gnu -Ilibrz/arch -I../librz/arch -Ilibrz/util/sdb/src -I../librz/util/sdb/src -I../librz/bin/format -I../librz/type/parser -I../subprojects/capstone-next/include -I../subprojects/capstone-next/include/capstone -Isubprojects/rzspp -I../subprojects/rzspp -Wall -O2 -g -Wimplicit-fallthrough=3 -DRZ_PLUGIN_INCORE=1 -D_GNU_SOURCE -Werror=sizeof-pointer-memaccess -fvisibility=hidden -MD -MF librz/arch/librz_arch.so.0.8.0.p/isa_msp430_msp430_il.c.o.d -o librz/arch/librz_arch.so.0.8.0.p/isa_msp430_msp430_il.c.o -c ../librz/arch/isa/msp430/msp430_il.c
In file included from ../librz/arch/isa/msp430/msp430_il.c:208:
../librz/arch/isa/msp430/msp430_instruction_parse.inc:113: error: identifier expected

@@ -0,0 +1,208 @@
// SPDX-FileCopyrightText: 2023 Mostafa Mahmoud <ubermenchun@gmail.com>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't it 2024?

librz/arch/isa/msp430/msp430_instruction_parse.inc Outdated Show resolved Hide resolved
static ut16 parse_reg(char *reg);
static ut16 parse_hex(char *num, int *end_offset);

static Msp430SourceAddressingMode parse_src(char *src_str, RZ_OUT ut32 *src, RZ_OUT int *dst_offset) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of parsing string, I wonder if it's better to modify the librz/arch/isa/msp430/msg430_disas.c instead and provide this kind of information as a C structure, similarly how other disassemblers do it, e.g. capstone-based ones.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Parsing the operands is definitely pretty ugly, but it's way easier to do than messing with the internals of the disassembler.

Someone, very likely me, will probably touch the MSP430 plugin soon to add assembly anyway (there is no MSP430 assembler at the moment, I'm using an external Python script I found somewhere on Github to obtain the binary of instructions), that will be a suitable moment to rethink disas.c..

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recommend focusing on improving the disassembly, analysis, and RzIL first and foremost before spending time adding the assembler. These are higher priority, and improving disassembler to provide instruction details for the analysis is much more important than writing an assembler.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're actually right, the parsing code is a constant source of overflow headache and bugs, it keeps making AddressSanitizer complain about illegal reads. I decided to remove the problem from its root and delete the code, I will instead make the disassembler bubble up the information I need.

@XVilka
Copy link
Member

XVilka commented Apr 30, 2024

@moste00 any updates on this PR?

@moste00
Copy link
Contributor Author

moste00 commented Apr 30, 2024

@moste00 any updates on this PR?

Really sorry for being held up this long, many many interviews and disruptions in my irl life.

What remains isn't that much. Just a rebase into a newer dev, and debugging the issue of why git-clang-format gives the green on my machine but somehow fails in the CI even though the 2 versions match. There is a complaint by tinyCC about a switch case somewhere, but I think it's a nitpick and I have a fix in mind. I will push later today and ask for review.

All in all, most of the major work is done IMO, the supported instructions pass into the lifter and come out as IL trees. It's just a matter of seeing why the CI doesn't think so from now on.

@moste00 moste00 force-pushed the msp430_il_lifting branch 2 times, most recently from 7c45856 to c0aa04c Compare May 1, 2024 19:14
@XVilka
Copy link
Member

XVilka commented May 5, 2024

@moste00 please check Mattermost.

Copy link
Member

@Rot127 Rot127 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The typo in the license identifier was likely the reason for the failing clang-format

@XVilka XVilka added this to the 0.8.0 milestone May 5, 2024
@moste00 moste00 force-pushed the msp430_il_lifting branch 4 times, most recently from 628dac4 to a85f5f5 Compare May 5, 2024 16:34
@moste00 moste00 force-pushed the msp430_il_lifting branch 2 times, most recently from 525219e to 4d28ad6 Compare May 23, 2024 16:06
@XVilka
Copy link
Member

XVilka commented Jun 1, 2024

@moste00 please rebase

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants