Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generating cycle tables #82

Closed
emoon opened this issue May 2, 2016 · 23 comments
Closed

Generating cycle tables #82

emoon opened this issue May 2, 2016 · 23 comments

Comments

@emoon
Copy link
Contributor

emoon commented May 2, 2016

Hi,

Given that r68k has cycle counts and a fair amount of info I wonder if it would be possible to generate tables like this?

| add.b.w  | Dn | An | (An) | (An)+ | -(An) | d(An) | d(An,Dn) | xxx.W | xxx.L |
|----------|----|----|------|-------|-------|-------|----------|-------|-------|
| Dn       |  4 |  8 |  12  |  12   |   16  |   16  |     20   |   16  |   20  |
| An       |  4 |  8 |   *  |   *   |    *  |    *  |      *   |    *  |    *  |
| (An)+    |  8 | 12 |   *  |   *   |    *  |    *  |      *   |    *  |    *  |
| -(An)    | 12 | 16 |   *  |   *   |    *  |    *  |      *   |    *  |    *  |
| d(An)    | 12 | 16 |   *  |   *   |    *  |    *  |      *   |    *  |    *  |
| d(An,Dn) | 16 | 20 |   *  |   *   |    *  |    *  |      *   |    *  |    *  |
| xxx.W    | 12 | 16 |   *  |   *   |    *  |    *  |      *   |    *  |    *  |
| xxx.L    | 16 | 20 |   *  |   *   |    *  |    *  |      *   |    *  |    *  |
| d(Pc)    | 12 | 16 |   *  |   *   |    *  |    *  |      *   |    *  |    *  |
| d(Pc,Dn) | 16 | 20 |   *  |   *   |    *  |    *  |      *   |    *  |    *  |
| #xxx     |  8 |  * |  20  |  20   |  22   |   24  |    26    |   24  |   26  |

I was about to do this manually but given that we have lots of info in the code I wonder if you think it would be possible to do it?

@marhel
Copy link
Owner

marhel commented May 2, 2016

Well, if you look at the cycle counts in the M68000UM, you'll see several different tables in various formats, with footnotes and so on. So generating comprehensible and compact output would be a lot of work I think.

Also the cycle counts are not tabulated anywhere in memory, but are just parameters to macros, and hard coded as Cycles() return values from op-funs. So there's work to be done even if you wanted to output something in a very flat format such as;

add.b.w Dn, Dn 4
add.b.w Dn, An 8
add.b.w Dn, (An) 12

or even simpler, by just listing the function name (which encodes the same info) and the cycle count...

@emoon
Copy link
Contributor Author

emoon commented May 2, 2016

Yeah I guess so.. just trying to figure out if there is a way to automate the process :)

@marhel
Copy link
Owner

marhel commented May 2, 2016

If we need/want the cycle counts in the disassember metadata about each operation, we could consider moving cycle counts to the op_entry macro, or a separate table, which has the side effect of making this data available for other purposes (like yours) as well. But even with a table, some cycle counts, like for shifts, depend on actual parameters and need some kind of footnote anyway.

@emoon
Copy link
Contributor Author

emoon commented May 2, 2016

I have actually thought about adding cycle counts to Capstone also.

@emoon
Copy link
Contributor Author

emoon commented May 2, 2016

Ok I got a crazy idea now :)

Take this table https://github.com/ezrec/vasm/blob/master/cpus/m68k/opcodes.h

And from it generate a big source file, compile it and then run it through r68k and collect all the timings :)

@marhel
Copy link
Owner

marhel commented May 2, 2016

I... don't quite follow. :) Perhaps due to looking at that table on my phone. What would be the difference compared to recording the output from a run through all opcodes? Also, just executing an op will not generate all possible timings (shift, trap cases, etc)

@emoon
Copy link
Contributor Author

emoon commented May 2, 2016

In this case I would have slightly more control and know if there are ranges for the op (for example lsl is #1-8) Also I'm not interested in a complete table but mostly when you do regular program (trap isn't often used in 'regular' code)

@marhel
Copy link
Owner

marhel commented May 2, 2016

Performance analysis / optimization support?

@emoon
Copy link
Contributor Author

emoon commented May 2, 2016

This would be a table you use when investigating code or writing new code.

@emoon
Copy link
Contributor Author

emoon commented May 2, 2016

I could of course do it manually but figured it would be easier to generate it :)

@emoon
Copy link
Contributor Author

emoon commented May 2, 2016

Here is a better looking example (text itself needs to be done manually of course)

https://gist.github.com/emoon/5d19bdcee4ff3ca8bcf030b4347d7e1d

@marhel
Copy link
Owner

marhel commented May 2, 2016

Ooh, very nice! I'm seeing a Dash docset ;)

@emoon
Copy link
Contributor Author

emoon commented May 2, 2016

Yeah I think that would be nice. I guess the question is how much info should one include? In my case I just want what it does and cycles but for more complete opcode and the bits needs to be there also.

@emoon
Copy link
Contributor Author

emoon commented Aug 19, 2016

I figured out a way to do this. It's a bit manual but I take help of by vasm by calling it also :) here is the code https://github.com/emoon/68k_cycle_table_gen It actually doesn't use r68k at all so I just define it myself and use the assembler to verify which instructions that are legal. Right now it will print out something like this for add

     Running `target/debug/68k_cycle_table_gen`
| add.w    | Dn | An | (An) | (An)+ | -(An) | d(An) | d(An,Dn) | xxx.W | xxx.L |
|----------|----|----|------|-------|-------|-------|----------|-------|-------|
| Dn       | 4  | 8  |  12  |  12   |  14   |  16   |    18    |  16   |  20   |
| An       | 4  | 8  |  *   |   *   |   *   |   *   |    *     |   *   |   *   |
| (An)     | 8  | 12 |  *   |   *   |   *   |   *   |    *     |   *   |   *   |
| (An)+    | 8  | 12 |  *   |   *   |   *   |   *   |    *     |   *   |   *   |
| -(An)    | 10 | 14 |  *   |   *   |   *   |   *   |    *     |   *   |   *   |
| d(An)    | 12 | 16 |  *   |   *   |   *   |   *   |    *     |   *   |   *   |
| d(An,Dn) | 14 | 18 |  *   |   *   |   *   |   *   |    *     |   *   |   *   |
| xxx.W    | 12 | 16 |  *   |   *   |   *   |   *   |    *     |   *   |   *   |
| xxx.L    | 16 | 20 |  *   |   *   |   *   |   *   |    *     |   *   |   *   |
| d(Pc)    | 12 | 16 |  *   |   *   |   *   |   *   |    *     |   *   |   *   |
| d(Pc,Dn) | 14 | 18 |  *   |   *   |   *   |   *   |    *     |   *   |   *   |
| #xxx     | 8  | 12 |  16  |  16   |  18   |  20   |    22    |  20   |  24   |

@emoon emoon closed this as completed Aug 19, 2016
@emoon
Copy link
Contributor Author

emoon commented Dec 19, 2016

Just a small update on this.

I generate this now: https://gist.github.com/emoon/5d19bdcee4ff3ca8bcf030b4347d7e1d

So the way it works:

  1. Descriptions has been taken from the 68k manuals. May be copyright but I doubt anyone really cares so :)
  2. I generate permutations of of possible combinations of instructions pairs according to the specs here https://github.com/emoon/68k_documentation_gen/blob/master/src/main.rs#L263
  3. Generate those to files then using vasm to compile the files (in parallel using Rayon)
  4. I gather up all the output files (generated assembly) for the cases the compiler said it was ok and generate a new file of valid (binary) 68k code.
  5. I then use Musashi to emulate the code and report back the cycle count for each instruction and then I feed that into the table.

The table needs work and some things are missing (asr/lsr/ror/etc) but still fairly good.

I went with Musashi as I didn't know the state of r68k as a crate. Is that possible to use now? Some of these numbers generated in the tables doesn't look fully correct (jsr) but it might be me doing something wrong also.

@marhel
Copy link
Owner

marhel commented Dec 19, 2016

Nifty!

Yes, r68k has been converted to a lib now (the CPU now lives in r68k-emu, the assembler, disassembler and SRecord support in r68k-tools, and some common constants in r68k-common).

There's still some work to do regarding creation of a new instance of r68k (a Core instance), as we still need to explicitly insert the real instruction set instead of a fake testing one, and I'd like to inject the memory implementation as a type parameter. This is currently hardcoded to be LoggingMem which is really only practical for testing (as it records each and every memory access) but should allow any implementation of the AddressBus trait. Especially since I think memory mapped devices should be implemented behind an implementor of that trait. I don't want the user to have to specify too many type parameters either, that wouldn't be ergonomic. I really haven't looked at the lib from the perspective of a user enough.

But once the r68k assembler is completed, I guess you could acheive the same thing without needing to shell out to vasm. But it's also a "one time thing" I guess, once the tables are good enough, they're done.

@emoon
Copy link
Contributor Author

emoon commented Dec 21, 2016

Yeah I guess you are right that this is more of a "one time thing" It's kinda weird that Musashi reports 4 cycles for move.l #x,dn but maybe I'm doing something wrong :)

@marhel
Copy link
Owner

marhel commented Dec 21, 2016

Ah, I think I know what your issue is! Vasm has a few optimizations, and I just tested and it seems that it optimizes move.l #x,Dn to moveq #x,Dn as long as #x fits in a byte. You'd want to use the -no-opt flag to vasm to avoid such surprises.

@emoon
Copy link
Contributor Author

emoon commented Dec 21, 2016

Ah yes :) that is very likely the case. Thanks for pointing that out.

@marhel
Copy link
Owner

marhel commented Dec 21, 2016

One could also imagine a 'snippet cycle cost analyzer' feature in the assembler, where you give it a snippet, and it steps that through the emulator, and outputs a listing file annotated with the cycle cost of each line, as well as a total cost. Trivial in the linear case at least, but less so with loops and branches.

Also, the emulator could potentially have some kind of performance instrumentation execution mode so that it could run a piece of code, including loops or anything, and then output cycle counts and how many times it hit each line, how many times it took a branch (or not) etc. This would help finding hotspots and where most of the time is spent in a loop and so on.

@emoon
Copy link
Contributor Author

emoon commented Dec 21, 2016

Yeah I'm planning to code a new assembler at some point. It will likely have some feature like that. For loops you can try to detect "basic blocks" (meaning loops that and what goes into them) and show an approximate cost (excluding external calls as if you have jsr it can count for the call cost but not what goes into it)

Something I want in the assembler is register allocation. A simple allocator can be found over here https://github.com/deplinenoise/deluxe68 it's quite manual but still fairly useful when writing assembly code.

@marhel
Copy link
Owner

marhel commented Dec 21, 2016

Mm, but at some point along that line you end up with C :)

And then when the translation from the language to the assembly becomes non-trivial/non-obvious, instruction cycle counts starts becoming less useful. Cost per basic block might still be a useful metric, even though you don't really care about the individual instructions in the block anymore - unless you need to figure out the reason for an extreme block cost.

In my own high-level goals, where r68k is a piece of a larger puzzle, an more featureful assembler isn't that important (I need to use existing tools, or I will reinvent too many wheels along the way), but it was just too much fun not to do a simple one now :)

@emoon
Copy link
Contributor Author

emoon commented Dec 21, 2016

Mm, but at some point along that line you end up with C :)

Depends. Deluxe68 for example doesn't emit any register spilling on the stack unless you tell it to. If you have too many registers it will fail to compile and you will have to deal with it.
In C you can have any number of variables and hope the compiler figures it out. Here you write code without any surprises.

Also depending if you write to 000 or 060 you may care more or less. 000 for example is quite slow so you need to take quite good care that an likely want to know the cycle count in detail. 060 is a different beast where you can pair instructions to have them execute in one cycle (has it has two integer pipes)

For me when writing assembly I want to be in full control but it's a bit tedious to keep track in the head all the time in which register you put a variable and having even a basic register allocator helps with that and also prevents you from using a register you though would be free.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants