Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generate instruction templates as C structs #60

Merged
merged 5 commits into from
Apr 22, 2022
Merged

Generate instruction templates as C structs #60

merged 5 commits into from
Apr 22, 2022

Conversation

thestr4ng3r
Copy link
Member

@thestr4ng3r thestr4ng3r commented Apr 10, 2022

DO NOT SQUASH

Any instruction-specific info is now generated as constant C structures, rather than different executable code for each instruction. To achieve this, a lot of logic has been moved from the generator to the C code. The result is a drastic improvent wrt. compile times and code size, in both source and binary form.

Changes, like refactors, that would affect the old codegen too have been extracted into individual commits, so only the last commit in this pr is the "real" change and the previous are preparation. That is so any potential breakage in those changes can be tracked down much more easily.

Some comparison of why this change is beneficial for the compilation of hexagon_disasm.c:

before after
compile time -O0 -g 2.4s 0.3s
compile time -O3 7.2s 0.4s
object file size -O0 -g 4.0M 767K
object file size -O3 1.5M 553K
size of hexagon_disasm.c 5.1M 2.7M

Compiler is Apple clang 13.1.6 on M1.

@thestr4ng3r thestr4ng3r force-pushed the compact branch 2 times, most recently from b05ecad to a3c6bb0 Compare April 10, 2022 15:50
Immediate.py Outdated Show resolved Hide resolved
@thestr4ng3r thestr4ng3r force-pushed the compact branch 3 times, most recently from de459c1 to ab8ff6a Compare April 17, 2022 14:36
@thestr4ng3r thestr4ng3r changed the title Compact code Generate instruction templates as C structs Apr 17, 2022
@thestr4ng3r thestr4ng3r marked this pull request as ready for review April 17, 2022 15:10
@thestr4ng3r thestr4ng3r requested a review from Rot127 April 17, 2022 15:22
@thestr4ng3r thestr4ng3r force-pushed the compact branch 3 times, most recently from 038848c to e1aa083 Compare April 19, 2022 11:24
handwritten/hexagon_disas_c/functions.c Outdated Show resolved Hide resolved
Copy link
Member

@XVilka XVilka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. I hope it doesn't affect the runtime performance of the disassembly itself.
Nice to see you modernized Python code as well.

@thestr4ng3r
Copy link
Member Author

I hope it doesn't affect the runtime performance of the disassembly itself.

Time complexity is identical. The rest depends on many small factors like code and data caches, optimization, etc., but nothing concrete that would make one significantly faster than the other.
If performance is a concern, then the iteration over the instructions should be resolved into for example one or more hashtable lookups, but that is identical in both cases.

@Rot127
Copy link
Member

Rot127 commented Apr 21, 2022

I get an error running the tests:

======================================================================
ERROR: test_shifting_c_code (testEncoding.TestInstructionEncoding)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/user/repos/rz-hexagon/Tests/testEncoding.py", line 99, in test_shifting_c_code
    SparseMask(InstructionEncoding(self.json["A2_combineii"]["Inst"]).operand_masks["Ii"]).c_expr,
AttributeError: 'SparseMask' object has no attribute 'c_expr'

----------------------------------------------------------------------
Ran 20 tests in 41.994s

FAILED (errors=1)

@Rot127
Copy link
Member

Rot127 commented Apr 21, 2022

flake complains about formatting:

flake8 --max-line-length=120
./LLVMImporter.py:19:1: F401 'helperFunctions.indent_code_block' imported but unused
./LLVMImporter.py:36:1: E302 expected 2 blank lines, found 1
./LLVMImporter.py:434:9: F841 local variable 'var' is assigned to but never used
./LLVMImporter.py:435:9: F841 local variable 'signed_imm_array' is assigned to but never used
./LLVMImporter.py:595:121: E501 line too long (124 > 120 characters)
./HardwareRegister.py:142:1: W391 blank line at end of file
./Operand.py:12:1: F401 'PluginInfo' imported but unused
./Operand.py:16:1: E302 expected 2 blank lines, found 1
./Operand.py:78:1: E302 expected 2 blank lines, found 1
./InstructionTemplate.py:10:1: F401 'Register' imported but unused
./InstructionTemplate.py:11:1: F401 'HardwareRegister.HardwareRegister' imported but unused
./InstructionTemplate.py:19:31: E261 at least two spaces before inline comment
./InstructionTemplate.py:21:1: E302 expected 2 blank lines, found 1
./InstructionTemplate.py:184:9: F841 local variable 'var' is assigned to but never used
./InstructionTemplate.py:190:121: E501 line too long (128 > 120 characters)
./Register.py:11:1: F401 'Operand.SparseMask' imported but unused
./Immediate.py:10:1: F401 'PluginInfo' imported but unused
./Immediate.py:11:1: F401 'Operand.SparseMask' imported but unused
./Tests/testEncoding.py:12:1: F401 'Operand.Operand' imported but unused

Please run black -l 120 *.py && flake8 --max-line-length=120 again or change the formatting to something you find more appropriate and update the readme.

Copy link
Member

@Rot127 Rot127 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much for the effort! It's lovely!

Operand.py Show resolved Hide resolved
handwritten/hexagon_disas_c/types.c Show resolved Hide resolved
handwritten/hexagon_disas_c/functions.c Outdated Show resolved Hide resolved
@thestr4ng3r
Copy link
Member Author

@Rot127 How do you run the unit tests? I get module import errors using the command from the readme:
Bildschirmfoto 2022-04-21 um 19 04 15

* Op mask is now a data structure instead of final string
* This mask is injected into the Opcode's opcode_mask member
* c_opcode_parsing is a pure property of Opcode instead of the effectful
  add_code_for_opcode_parsing()
Many instructions have a `Rx` operand, followed by `Rxin` later. In such
a case, the re.sub() was replacing both `Rx` occurences in the string
with `%s`, causing the `Rxin` one to be incorrectly printed as `%sin`.
@Rot127
Copy link
Member

Rot127 commented Apr 22, 2022

@thestr4ng3r Have you installed it with pip3 install -e .?

@Rot127
Copy link
Member

Rot127 commented Apr 22, 2022

BTW just saw that the README install instructions are wrong.
Could you updated them in this PR?
They should be:

git clone https://github.com/rizinorg/rz-hexagon.git
cd rz-hexagon/
pip3 install -r requirements.txt
# If you enjoy some colors
pip3 install -r optional_requirements.txt
# Install as develop package
pip3 install -e .

Edit: Will move this in another PR.

@thestr4ng3r
Copy link
Member Author

@Rot127 Thanks, now the tests work. I have adjusted/added comments in the code according to your review. Please check again now.

Copy link
Member

@Rot127 Rot127 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.
You can merge after those two flake warnings are fixed.

flake8 --max-line-length=120
./Tests/testEncoding.py:12:1: F401 'Operand.Operand' imported but unused
./Tests/testEncoding.py:95:9: F841 local variable 'hex_insn' is assigned to but never used

@Rot127
Copy link
Member

Rot127 commented Apr 22, 2022

Or better lets wait with merging until the rizin repo PR is green.

Any instruction-specific info is now generated as constant C structures,
rather than different executable code for each instruction. To achieve
this, a lot of logic has been moved from the generator to the C code.
The result is a drastic improvent wrt. compile times and code size,
in both source and binary form.
@thestr4ng3r
Copy link
Member Author

Fixed the flake warnings.

@thestr4ng3r
Copy link
Member Author

green

@thestr4ng3r thestr4ng3r merged commit 1ca78d0 into master Apr 22, 2022
@thestr4ng3r thestr4ng3r deleted the compact branch April 22, 2022 15:50
@Rot127 Rot127 mentioned this pull request Apr 24, 2022
7 tasks
@Rot127
Copy link
Member

Rot127 commented Oct 11, 2022 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants