Disassembler #77
Comments
I think an integrated disassembler/debugger would absolutely be useful! I haven't thought much about it in the design of r68k, though, and I'm going to focus on getting the cpu part usable first, but I am led to believe that you know a little something about both disassemblers and debuggers so you are welcome to come up with some designs how that might work in/with r68k! |
Sounds good :) I will try to think of something. |
If we're going to implement a disassembler in Rust at some point, it would, in my opinion, be a requirement to be able to QC it against a known-good implementation, much like we did with the CPU. I can't imagine trying without it, in fact. |
Yeah, that would be good. I'm not exactly sure how to do it, though. |
Would it be possible to create something like libdissasembler.a based on a working program, set up a memory buffer with some bytes corresponding to some instruction, asking for a disassembly of that buffer and checking that both generate the same output? |
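A minimal sketch of that comparison idea, assuming a hypothetical `Disassembler` trait that both the Rust implementation and a binding to the reference library would implement. None of these names exist in r68k; `Stub` is only a stand-in so the harness can be exercised without a real binding.

```rust
/// Hypothetical interface: anything that can turn raw bytes into one line of text.
pub trait Disassembler {
    fn disassemble(&self, code: &[u8]) -> String;
}

/// Outcome of running both implementations on the same buffer.
#[derive(Debug, PartialEq)]
pub enum Verdict {
    Agree(String),
    Disagree { ours: String, reference: String },
}

/// Disassemble `code` with both implementations and compare the text.
pub fn compare(ours: &dyn Disassembler, reference: &dyn Disassembler, code: &[u8]) -> Verdict {
    let a = ours.disassemble(code);
    let b = reference.disassemble(code);
    // Ignore case and spacing so purely cosmetic differences don't count as failures.
    if normalise(&a) == normalise(&b) {
        Verdict::Agree(a)
    } else {
        Verdict::Disagree { ours: a, reference: b }
    }
}

/// Collapse runs of whitespace and lowercase the text.
fn normalise(s: &str) -> String {
    s.split_whitespace().collect::<Vec<_>>().join(" ").to_lowercase()
}

/// Stand-in implementation that always returns the same text (for illustration only).
pub struct Stub(pub &'static str);

impl Disassembler for Stub {
    fn disassemble(&self, _code: &[u8]) -> String {
        self.0.to_string()
    }
}
```

Normalising before comparing is a design choice: two disassemblers rarely agree on case and column alignment, and those differences are not the bugs we would be hunting for.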
Sure. Or actually generate a huge program from the QC tests that we already have here for valid instructions. |
I can likely add Capstone (slimmed down to only use the m68k backend) and add a basic Rust interface for it so it can be called from the QC tests. Capstone also supports several instances running in parallel, so that can be used for the comparison. |
Yes, the optable contains useful data for the disassembler! It would be able to find the matching entry for the instruction it was looking at, but there's not enough information about how to interpret the "holes" in the mask, such as X and Y: whether they represent data or address registers, or something else. It also doesn't know the addressing mode, apart from the hints usually present in the function name. So more information would be needed. Not having to use semaphores to enforce single-threaded access would also be great! |
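To make the "more information would be needed" point concrete, here is a rough sketch of an optable entry extended with a description of its X and Y holes. The struct, field names and two-entry table are assumptions for illustration, not the actual r68k optable; the mask/match values follow the standard 68000 encoding of `ADD.W <ea>,Dn`.

```rust
/// What a "hole" in the opcode mask stands for (hypothetical; a real table needs more kinds).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Field {
    DataReg,
    AddrReg,
}

/// A mask/match entry plus the extra metadata a disassembler would need.
pub struct OpEntry {
    pub mask: u16,
    pub matching: u16,
    pub name: &'static str,
    pub x: Field, // meaning of bits 11-9
    pub y: Field, // meaning of bits 2-0
}

/// Two illustrative entries: ADD.W Dy,Dx and ADD.W Ay,Dx.
pub fn sample_table() -> Vec<OpEntry> {
    vec![
        OpEntry { mask: 0xF1F8, matching: 0xD040, name: "ADD.W", x: Field::DataReg, y: Field::DataReg },
        OpEntry { mask: 0xF1F8, matching: 0xD048, name: "ADD.W", x: Field::DataReg, y: Field::AddrReg },
    ]
}

/// Find the first matching entry and render its operands as "source,destination".
pub fn decode(table: &[OpEntry], opcode: u16) -> Option<String> {
    let entry = table.iter().find(|e| (opcode & e.mask) == e.matching)?;
    let x = (opcode >> 9) & 7;
    let y = opcode & 7;
    Some(format!("{} {},{}", entry.name, reg(entry.y, y), reg(entry.x, x)))
}

/// Render a register number according to the kind of field it came from.
fn reg(field: Field, n: u16) -> String {
    match field {
        Field::DataReg => format!("D{}", n),
        Field::AddrReg => format!("A{}", n),
    }
}
```

With this table, `decode(&sample_table(), 0xD041)` gives `ADD.W D1,D0`, and an opcode with no entry (such as `0x4E71`, NOP) gives `None`.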
True. I guess it may actually be possible to just try all 65536 opcode combos (0x0000 - 0xFFFF). There will be a bunch of illegals in there, but those would be good for validating that it all works anyway (there might be bugs on both sides). |
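The brute-force sweep could look roughly like this. It is a sketch under the assumption that each disassembler can be wrapped as a function from a 16-bit opcode to `Some(text)` for a legal instruction or `None` for an illegal one; the name `sweep` is made up here, and extension words are ignored.

```rust
/// Run both decoders over every 16-bit opcode and collect the disagreements.
/// Each decoder returns `Some(text)` for a legal opcode and `None` for an illegal one.
pub fn sweep<A, B>(ours: A, reference: B) -> Vec<(u16, Option<String>, Option<String>)>
where
    A: Fn(u16) -> Option<String>,
    B: Fn(u16) -> Option<String>,
{
    (0..=u16::MAX)
        .filter_map(|opcode| {
            let a = ours(opcode);
            let b = reference(opcode);
            // Keep only the opcodes where the two sides differ, including
            // "legal on one side, illegal on the other".
            if a == b {
                None
            } else {
                Some((opcode, a, b))
            }
        })
        .collect()
}
```

Returning the full list of mismatches, rather than stopping at the first one, makes it easier to spot whole instruction families that differ.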
I can try to get a basic version of Capstone (68k disassembler part) in over the weekend and send a PR. |
Ok, I'm thinking we should do that work in a dev-branch for now, I just created the "disassembler"-branch for this. |
Sure! |
I took a shot at an initial implementation yesterday, and got something I was not entirely unhappy with, by adding a disassembly module alongside the cpu module. However, I was really bugged by the fact that any trivial change there resulted in a minute's wait to recompile 12K lines of unrelated stuff (which after macro expansion seems to be more like 50K lines). I guess this is the non-incremental compilation rearing its ugly head. It made me want to rip out a few constants and other stuff to depend on, and work in an unrelated project, but I hope there's some better way. You seem to have a much better grasp of cargo and crates than I have, so I wondered if there was some smarter way to divide stuff into crates or submodules that would allow us to work on the disassembler, and let it use constants/enums/structs/traits that we've already defined, without needing to recompile everything every time. |
Also, I could push my WIP to the disassembler branch if you want to have a peek. |
Sure! |
Pushed now. I made a few constants and other stuff public in the old stuff, in order to be able to reuse it here. Also, I reused the LoggingMem to read ops out of "memory", but I guess that interface is not really useful if you are not disassembling a current r68k session with in-memory code. Feel free to change any and all things as well, this was just to get this part going somewhere :) |
What you could do is split it up into three separate crates
Now it would be possible to work "inside" the Disassembler crate only running In that case it's possible to add things under the example directory inside the |
A useful command to run/visualize the test I wrote: |
Yeah, I saw the |
Before release it should be a library for sure (that is the way people would use it anyway) |
Also, I'm not sure if you have pushed the |
Oops, you are right. It'll be pushed shortly! |
Now I pushed myself to push the missing file... ;) |
👍 |
Also got some time to update the disassembler/assembler to a state where I'm happy with the design. If you're interested, take a look at either the disassembler branch, or the new library branch. I've yet to actually use capstone, but it was quite fun to get the disassembler/assembler working in concert (anything that can be disassembled should also assemble back to the starting opcode). The disassembler/assembler just knows a subset of the ADD opcodes at this point. Adding more of the same kind of instructions (formats) with already implemented encodings should be trivial. Other instruction formats will need new decode/encode fn support. The assembler is quite primitive, and extremely picky about syntax at the moment - it will basically only accept exactly the syntax that the disassembler generates. The parser is also completely regex based, which is probably not that efficient (saw extreme speedups when I started compiling complex regexes once, instead of once per opcode) |
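The round-trip property mentioned above (anything that disassembles should assemble back to the same opcode) can be illustrated with a small sketch. This is not the code in the branch: the function names are hypothetical, it covers only `ADD.B/W/L Dy,Dx`, and it uses plain string splitting instead of regexes. Like the real assembler at this stage, it only accepts exactly the syntax the disassembler emits.

```rust
/// Disassemble the `ADD.<size> Dy,Dx` subset; returns None for anything else.
pub fn disassemble(opcode: u16) -> Option<String> {
    // 1101 xxx 0ss 000 yyy: ADD <ea>,Dn with a data register as source.
    if (opcode & 0xF138) != 0xD000 {
        return None;
    }
    // Size bits 11 would make this ADDA.W, which is outside the subset.
    let size = ["B", "W", "L"].get(((opcode >> 6) & 3) as usize)?;
    Some(format!("ADD.{} D{},D{}", size, opcode & 7, (opcode >> 9) & 7))
}

/// Assemble exactly the syntax that `disassemble` produces.
pub fn assemble(text: &str) -> Option<u16> {
    let (mnemonic, operands) = text.split_once(' ')?;
    let size: u16 = match mnemonic {
        "ADD.B" => 0,
        "ADD.W" => 1,
        "ADD.L" => 2,
        _ => return None,
    };
    let (src, dst) = operands.split_once(',')?;
    Some(0xD000 | (data_reg(dst)? << 9) | (size << 6) | data_reg(src)?)
}

/// Parse "D0".."D7" into a register number; anything else is rejected.
fn data_reg(s: &str) -> Option<u16> {
    match s.as_bytes() {
        [b'D', d @ b'0'..=b'7'] => Some((*d - b'0') as u16),
        _ => None,
    }
}
```

A QC-style check then loops over all 16-bit opcodes and asserts that `assemble(&disassemble(op)?) == Some(op)` for every opcode the disassembler accepts.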
Cool. I would suggest looking at https://github.com/Geal/nom for parsing in the assembler. |
I updated the assembler parser to be based on pest, and it now parses the 10K lines of this basic interpreter in 68k assembly successfully(*). That is a big step forward: the old regex-based parser was very limited, but the new parser accepts actual code. Note that while the parser is now good, the assembler itself still just supports a handful of opcodes. I looked at nom, but found pest to be much more approachable. *) Well, almost anyway. I decided to only support semicolon comments at the moment, so I edited the file slightly first, and it doesn't recognize the register lists of the movem instruction, nor the IF-statements (conditional assembly) yet. Movem needs to be supported when I get around to actually implementing movem support in the disassembler/assembler, but conditional assembly is not a big priority at this point. |
Something that is very useful to have in an emulator core is the ability to disassemble instructions, for various reasons.
Currently r68k doesn't have one. I have implemented one (in C here) https://github.com/aquynh/capstone/blob/next/arch/M68K/M68KDisassembler.c
This was also based on Musashi, but with a fair number of bugs fixed. This version also doesn't just do instruction printing: it allows you to see which registers, addressing modes, etc. are being used by an instruction.
Rewriting this code in Rust is certainly possible, but it would be a bunch of work. An alternative would be to rework the C code a bit and put a Rust wrapper around it, so the user of r68k would only 'see' the Rust part.
Just wanted to hear your thoughts about it.