-march=...Xcustom no longer working? #190
I think @palmer-dabbelt talked to me about this a little while ago, but I don't remember what the result was.
We decided to not upstream the Xcustom support because the RoCC ISA doesn't have a proper document describing the ISA. This stuff was kind of a CS250 hack anyway, and we're trying to keep those out of the official releases. The idea was that we could support this with CPP macros instead of proper assembler support, since those don't have to go upstream (and then be supported forever). The general idea is that you'd emit a ".word" instead of an instruction. I believe something like
should do it. We'll have to resurrect RoCC support next time a class is taught, but it probably won't happen before that: Hwacha is the only accelerator we really use, and that has an assembler fork, so there's proper support for the instructions that use the customN opcodes. If this approach doesn't work, then I'd be amenable to bringing back some amount of RoCC support, but it'd have to be slightly cleaner. If you do make it work, I'd love to get it upstream somewhere. Sorry we broke your stuff!
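Because a custom instruction is just a 32-bit word, the ".word" trick comes down to packing the instruction fields by hand. The following is a minimal sketch of that packing in plain C, assuming the RoCC R-type layout (custom-0..3 major opcodes 0x0b, 0x2b, 0x5b, 0x7b; funct3 used as the xd/xs1/xs2 flags). The function name rocc_encode is hypothetical and not part of any toolchain.

```c
#include <assert.h>
#include <stdint.h>

/* Major opcodes reserved for custom extensions (the same values as the
   CUSTOM_0..CUSTOM_3 macros used later in this thread). */
static const uint32_t custom_opcode[4] = {0x0b, 0x2b, 0x5b, 0x7b};

/* Pack a RoCC-style R-type instruction into the 32-bit word that a
   ".word" directive would emit. funct3 carries the RoCC flags:
   bit 2 = xd, bit 1 = xs1, bit 0 = xs2. */
static uint32_t rocc_encode(unsigned x, unsigned rd, unsigned rs1,
                            unsigned rs2, int xd, int xs1, int xs2,
                            unsigned funct7)
{
    assert(x < 4 && rd < 32 && rs1 < 32 && rs2 < 32 && funct7 < 128);
    uint32_t funct3 = (xd ? 4u : 0u) | (xs1 ? 2u : 0u) | (xs2 ? 1u : 0u);
    return custom_opcode[x]
         | (uint32_t)rd << 7
         | funct3 << 12
         | (uint32_t)rs1 << 15
         | (uint32_t)rs2 << 20
         | (uint32_t)funct7 << 25;
}
```

For example, rocc_encode(0, 10, 11, 12, 1, 1, 1, 0) gives 0x00c5f50b, the same custom-0 word that shows up in the disassembly posted later in this thread.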
Ah, that makes total sense. Thanks for the informative answer, Palmer. Upstreaming undocumented stuff seems ill-advised (though the fact that you all are getting this upstreamed is super exciting). I'll give this a shot and update all public code that I have documenting this (including basic example code for exercising the rocket-chip RoCC examples). This should be fine, as I'm already using macros; it's just no longer mapping to some
FWIW, if there's any way that you all can keep me in the loop of what's going on with extension / RoCC support, I would really appreciate it. As you guys know, I've been managing my own stuff and trying to keep it semi-synced with rocket-chip master (possibly ill-advised, but it's not too bad once I'm running my own regressions). However, I'm now the local go-to guy for all "my rocket / accelerator doesn't work / is broken" queries. Having some additional information straight from you all would dramatically help.
I actually had no idea anyone outside of Berkeley was using RoCC for anything. That's why I just talked to Colin about this, because he was the last 250 TA. I'll try to remember you're using it in the future, but now that we're upstream, any assembler changes will have to go through the binutils mailing list. If you're doing odd RISC-V assembly things, it might be worth subscribing (they have digests). Sorry, again, for the trouble!
I'm continually evangelizing, so there may be more in the future... Also, I do know that Michael Taylor (calling @anujnr) is doing something with it, though he's been using Chisel wrappers to jump to his preferred SystemVerilog. Thanks for the pointer. No problem at all. You guys are cranking out good stuff and it's fun to be in that ecosystem.
Following your approach, @palmer-dabbelt, the following gets the job done:

```c
#define STR1(x) #x
#define STR(x) STR1(x)
#define EXTRACT(a, size, offset) (((~(~0 << size) << offset) & a) >> offset)

#define CUSTOMX_OPCODE(x) CUSTOM_##x
#define CUSTOM_0 0b0001011
#define CUSTOM_1 0b0101011
#define CUSTOM_2 0b1011011
#define CUSTOM_3 0b1111011

// Expands to a textual expression; CUSTOMX_R_R_R stringifies it with STR(),
// so it is the assembler (not the C compiler) that evaluates it in the .word.
#define CUSTOMX(X, rd, rs1, rs2, funct) \
  CUSTOMX_OPCODE(X)                   | \
  (rd                   << (7))       | \
  (0x7                  << (7+5))     | \
  (rs1                  << (7+5+3))   | \
  (rs2                  << (7+5+3+5)) | \
  (EXTRACT(funct, 7, 0) << (7+5+3+5+5))

// Copies the operands into a4/a5, emits the custom instruction with
// rd = a5 (x15), rs1 = a4 (x14), rs2 = a5 (x15), then copies a5 back out.
// volatile keeps the compiler from deleting or moving the accelerator access.
#define CUSTOMX_R_R_R(X, rd, rs1, rs2, funct)                      \
  asm volatile ("mv a4, %[_rs1]\n\t"                               \
                "mv a5, %[_rs2]\n\t"                               \
                ".word " STR(CUSTOMX(X, 15, 14, 15, funct)) "\n\t" \
                "mv %[_rd], a5"                                    \
                : [_rd] "=r" (rd)                                  \
                : [_rs1] "r" (rs1), [_rs2] "r" (rs2)               \
                : "a4", "a5");
```

This introduces three additional moves vs. the old rocc-support toolchain, but it works. Edit: Fixed clobbering issue with
Great! Thanks for posting this; we'll need it as well. If the moves are a performance problem, then I can try to figure something out (maybe adding something to either the assembler or compiler); I don't want to have a performance regression because of a cleanup.
No problem.
Is there some way to do this without the overhead of the extra moves? We are using the RoCC interface extensively, and the overhead of those extra moves will be prohibitive. We want the same effect as our current single-instruction inline assembly, where the compiler can do the register allocation and we don't need to do any extra moves.
@cbatten You should be able to use the register keyword with an asm constraint, so the compiler will only emit the moves if necessary:
A better solution would be to allow assembler plugins so that you can add custom instructions without modifying the binutils source code, but I don't know how to do this in a way that the binutils maintainers would allow. @palmer-dabbelt any ideas?
This needs to be inlined to avoid any performance overhead, so I am not sure the above approach, which basically still does explicit register allocation, will work well. The alternative is that we will have to fork the upstream repo and add the custom0 instructions back into binutils -- which is kind of annoying ... I mean, we are already forking the RISC-V compiler flow for research, but it would be nice if the custom0 stuff worked out of the box, since we also use that in courses so that students can download the upstream RISC-V cross-compiler (instead of our fork) and use it for projects. I guess for teaching we can use CSRs to create a register-mapped interface for our accelerators. That is actually what we were doing before, but I was thinking of moving to RoCC. Sticking with a register-mapped interface might be simpler now, though ...
I just meant to show an example... I agree you'd need to inline this or put it in a macro. We took the customX opcodes out because of a policy decision not to have any nonstandard extensions supported by the upstream tools. I think an assembler plugin, if possible, is the right way to add them back (and is potentially better than the customX approach, because it is not limited to RoCC and provides more expressive [dis]assembly).
Makes sense -- although binutils doesn't support assembler plugins. I think we will just end up using CSRs. (As an aside: looking forward to receiving my Founder's Edition HiFive1 board :)
:)
Merging @aswaterman's suggestion does appear to produce reasonable assembly output with
The above macro then becomes (source):

```c
#include <stdint.h>

#define STR1(x) #x
#define STR(x) STR1(x)
#define EXTRACT(a, size, offset) (((~(~0 << size) << offset) & a) >> offset)

#define CUSTOMX_OPCODE(x) CUSTOM_ ## x
#define CUSTOM_0 0b0001011
#define CUSTOM_1 0b0101011
#define CUSTOM_2 0b1011011
#define CUSTOM_3 0b1111011

#define CUSTOMX(X, rd, rs1, rs2, funct) \
  CUSTOMX_OPCODE(X)                   | \
  (rd                   << (7))       | \
  (0x7                  << (7+5))     | \
  (rs1                  << (7+5+3))   | \
  (rs2                  << (7+5+3+5)) | \
  (EXTRACT(funct, 7, 0) << (7+5+3+5+5))

// Standard macro that passes rd, rs1, and rs2 via registers
#define ROCC_INSTRUCTION(X, rd, rs1, rs2, funct) \
  ROCC_INSTRUCTION_R_R_R(X, rd, rs1, rs2, funct, 10, 11, 12)

// rd, rs1, and rs2 are data
// rd_n, rs1_n, and rs2_n are the register numbers to use; they must be integer
// literals because they are pasted into both the register names and the .word
#define ROCC_INSTRUCTION_R_R_R(X, rd, rs1, rs2, funct, rd_n, rs1_n, rs2_n) do { \
    register uint64_t rd_  asm ("x" # rd_n);                                   \
    register uint64_t rs1_ asm ("x" # rs1_n) = (uint64_t) rs1;                 \
    register uint64_t rs2_ asm ("x" # rs2_n) = (uint64_t) rs2;                 \
    asm volatile (                                                             \
        ".word " STR(CUSTOMX(X, rd_n, rs1_n, rs2_n, funct)) "\n\t"             \
        : "=r" (rd_)                                                           \
        : [_rs1] "r" (rs1_), [_rs2] "r" (rs2_));                               \
    rd = rd_;                                                                  \
  } while (0)
```

From that, an example load into the accumulator example is then (source):

```c
#define k_DO_WRITE 0
#define XCUSTOM_ACC 0

#define doWrite(y, rocc_rd, data) \
  ROCC_INSTRUCTION(XCUSTOM_ACC, y, data, rocc_rd, k_DO_WRITE)
```

Compiling the following program with

```c
#include <assert.h>
#include <stdint.h>

// doRead, doAccum, doTranslate, and doLoad come from the same linked source
// as doWrite and are not reproduced here.
int main() {
  uint64_t data [] = {0xdead, 0xbeef, 0x0bad, 0xf00d}, y;
  uint16_t addr = 1;

  doWrite(y, addr, data[0]);
  doRead(y, addr);
  assert(y == data[0]);

  uint64_t data_accum = -data[0] + data[1];
  doAccum(y, addr, data_accum);
  assert(y == data[0]);
  doRead(y, addr);
  assert(y == data[1]);

  uint64_t data_addr;
  doTranslate(data_addr, &data[2]);
  doLoad(y, addr, data_addr);
  assert(y == data[1]);
  doRead(y, addr);
  assert(y == data[2]);

  return 0;
}
```

gets me this assembly, which seems to be great:

```
15290: 00f13023  sd   a5,0(sp)
15294: 00e13423  sd   a4,8(sp)
15298: 00d13823  sd   a3,16(sp)
1529c: 00078593  mv   a1,a5
152a0: 00100613  li   a2,1
152a4: 00c5f50b  0xc5f50b
152a8: 00000593  li   a1,0
152ac: 02c5f50b  0x2c5f50b
152b0: 04f51e63  bne  a0,a5,1530c <main+0xa8>
152b4: 00050813  mv   a6,a0
152b8: 40a705b3  sub  a1,a4,a0
152bc: 06c5f50b  0x6c5f50b
152c0: 0b051e63  bne  a0,a6,1537c <main+0x118>
152c4: 00000593  li   a1,0
152c8: 02c5f50b  0x2c5f50b
152cc: 00050793  mv   a5,a0
152d0: 08e51863  bne  a0,a4,15360 <main+0xfc>
152d4: 01010593  addi a1,sp,16
152d8: 00000613  li   a2,0
152dc: 00c5f52b  0xc5f52b
152e0: 00100613  li   a2,1
152e4: 00050593  mv   a1,a0
152e8: 04c5f50b  0x4c5f50b
152ec: 04f51c63  bne  a0,a5,15344 <main+0xe0>
152f0: 00000593  li   a1,0
152f4: 02c5f50b  0x2c5f50b
152f8: 02d51863  bne  a0,a3,15328 <main+0xc4>
```

Note, the macro above is going to always set the
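objdump can only print the custom instructions in the listing above as raw words (0xc5f50b, 0x2c5f50b, ...). A small decoder makes it easy to check that each one carries the intended opcode, registers, and funct7. This is a sketch assuming the same RoCC R-type layout that the CUSTOMX macro produces (xd/xs1/xs2 in funct3 bits 2..0); struct rocc_fields, rocc_decode, and rocc_print are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Fields of a RoCC-style custom instruction (R-type layout). */
struct rocc_fields {
    int custom;             /* 0..3 for custom-0..custom-3, -1 otherwise */
    unsigned rd, rs1, rs2;
    unsigned xd, xs1, xs2;  /* funct3 bits 2, 1, 0 */
    unsigned funct7;
};

static struct rocc_fields rocc_decode(uint32_t word)
{
    static const uint32_t opcodes[4] = {0x0b, 0x2b, 0x5b, 0x7b};
    struct rocc_fields f;
    f.custom = -1;
    for (int i = 0; i < 4; i++)
        if ((word & 0x7f) == opcodes[i])
            f.custom = i;
    f.rd     = (word >> 7)  & 0x1f;
    f.xs2    = (word >> 12) & 1;
    f.xs1    = (word >> 13) & 1;
    f.xd     = (word >> 14) & 1;
    f.rs1    = (word >> 15) & 0x1f;
    f.rs2    = (word >> 20) & 0x1f;
    f.funct7 = (word >> 25) & 0x7f;
    return f;
}

/* Print one word from an objdump listing in a readable form. */
void rocc_print(uint32_t word)
{
    struct rocc_fields f = rocc_decode(word);
    if (f.custom < 0) {
        printf("0x%08lx: not a custom-N opcode\n", (unsigned long)word);
        return;
    }
    printf("0x%08lx: custom%d rd=x%u rs1=x%u rs2=x%u xd=%u xs1=%u xs2=%u funct7=%u\n",
           (unsigned long)word, f.custom, f.rd, f.rs1, f.rs2,
           f.xd, f.xs1, f.xs2, f.funct7);
}
```

Applied to the listing, 0x00c5f50b, 0x02c5f50b, 0x06c5f50b, and 0x04c5f50b decode as custom-0 with rd = x10 (a0), rs1 = x11 (a1), rs2 = x12 (a2) and funct7 = 0, 1, 3, and 2, while 0x00c5f52b is custom-1. All of them have xd = xs1 = xs2 = 1, the funct3 = 0x7 that the macro always sets.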
Nothing like trying to write an assembler using pre-processor macros ...
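The other solution is to write code pretending you have an extended assembler, then write a small program to process the assembly between gcc -S and as. Glasgow Haskell did this for many years to work around the lack of tail-call optimization in GCC at the time and to implement an ad-hoc optimization for single-entry vtables (very common in Haskell). It was called the "evil mangler"; nobody liked it, but it worked well most of the time.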
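A minimal sketch of such a filter that sits between gcc -S and as. It assumes a made-up mnemonic, "roccN rd, rs1, rs2, funct7" (N = 0..3), that C code would emit through inline asm and that no real assembler understands. The filter rewrites each such line into a ".word" with funct3 = 0x7 (xd, xs1, and xs2 all set, as in the CUSTOMX macro earlier in the thread) and copies every other line unchanged. The mnemonic and the function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* ABI register names, indexed by register number x0..x31. */
static const char *const abi_names[32] = {
    "zero", "ra", "sp",  "gp",  "tp", "t0", "t1", "t2",
    "s0",   "s1", "a0",  "a1",  "a2", "a3", "a4", "a5",
    "a6",   "a7", "s2",  "s3",  "s4", "s5", "s6", "s7",
    "s8",   "s9", "s10", "s11", "t3", "t4", "t5", "t6",
};

/* Map "a0", "fp", "x10", ... to a register number; -1 if unknown. */
static int reg_number(const char *name)
{
    if (strcmp(name, "fp") == 0)
        return 8;
    for (int i = 0; i < 32; i++)
        if (strcmp(name, abi_names[i]) == 0)
            return i;
    if (name[0] == 'x') {
        char *end;
        long n = strtol(name + 1, &end, 10);
        if (end != name + 1 && *end == '\0' && n >= 0 && n < 32)
            return (int)n;
    }
    return -1;
}

/* Rewrite one line of "gcc -S" output. A line of the hypothetical form
 *     roccN rd, rs1, rs2, funct7        with N = 0..3
 * becomes a ".word"; any other line is copied unchanged.
 * Returns 1 if the line was rewritten, 0 otherwise. */
static int rewrite_line(const char *line, char *out, size_t outsz)
{
    static const unsigned long opcode[4] = {0x0b, 0x2b, 0x5b, 0x7b};
    char rd[16], rs1[16], rs2[16];
    unsigned x, funct;
    int d, s1, s2;

    assert(out != NULL && outsz > 0);
    if (sscanf(line, " rocc%u %15[^, \t] , %15[^, \t] , %15[^, \t] , %u",
               &x, rd, rs1, rs2, &funct) == 5
        && x < 4 && funct < 128
        && (d = reg_number(rd)) >= 0
        && (s1 = reg_number(rs1)) >= 0
        && (s2 = reg_number(rs2)) >= 0) {
        unsigned long word = opcode[x] | (unsigned long)d << 7 | 0x7UL << 12
                           | (unsigned long)s1 << 15 | (unsigned long)s2 << 20
                           | (unsigned long)funct << 25;
        snprintf(out, outsz, "\t.word 0x%08lx\n", word);
        return 1;
    }
    snprintf(out, outsz, "%s", line);
    return 0;
}

/* Stream filter: reads assembly from `in`, writes rewritten assembly to `out`. */
void mangle(FILE *in, FILE *out)
{
    char line[512], buf[512];
    while (fgets(line, sizeof line, in)) {
        rewrite_line(line, buf, sizeof buf);
        fputs(buf, out);
    }
}
```

With a two-line main that calls mangle(stdin, stdout), the flow would look like gcc -S foo.c -o - | ./mangle | as -o foo.o, using the RISC-V gcc and as. A real tool would also need to cope with comments, long lines, and whatever operand syntax the compiler actually emits.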
If I'm understanding correctly, the impression is that upstream would not accept the Custom* opcodes because they are non-standard? I don't understand why this would be a problem. I know that the nios2 assembler has support for custom instruction opcodes.
The reason is that we are not willing to support other people's nonstandard extensions, so supporting Berkeley's would be unfair.
I'm sure others will want toolchain support for their non-standard extensions. What's the recommended way to accomplish this? Inline assembler macros as in @seldridge's comment? Or would a GCC/LLVM plugin make sense? It would be great if we could provide an example of how to make use of non-standard extensions. Maybe we should package up @seldridge's header file into a repo -- along with some documentation on how to add/use extensions?
I am using assembler macros like the following to wrap the
This allows me to write in my assembler code statements like this:
In order to convert those macros to use
I think this is where LLVM provides a simpler solution, since you can contain the custom instruction definitions in their own standalone file. In addition, parts of LLVM already load shared libraries for out-of-tree passes, so the LLVM developers would likely be more receptive to having a true plugin than the GNU people.
The above macro relies on C constructs, so when writing assembly I've been using the additional assembly-only macro:

```c
/* Assembly-only: expands to a .word directive for use in .S files */
#define ROCC_INSTRUCTION_RAW_R_R_R(x, rd, rs1, rs2, funct) \
  .word CUSTOMX(x, rd, rs1, rs2, funct)
```

With the example you give above, something like the following should work:

```
.macro timer rd rs
  .word CUSTOMX(0, \rd, \rs, 0, 5)
.endm
```

However, that's not giving you the "nice" register names, i.e.,

@arunthomas, I'm fine with going that route, but this is all a giant kludge. @colinschmidt, yes, LLVM would clearly be the cleanest way forward.
Given that RISC-V has a limited number of 32-bit instruction formats, what if we added pseudo-instructions for each format? In other words, these would be the same:
That's good to know, @colinschmidt. It may make sense for LLVM to be the default answer for assembling and compiling to non-standard extensions, and then the GCC flow can abdicate responsibility for that.
Using LLVM for this makes a lot of sense.
So I still think that having register- and immediate-aware xx_type pseudos would be useful to the community. Would something like that have a chance of being accepted if done?
I'm mostly concerned about assembling at this point. Presumably people can use GCC for codegen and assemble with LLVM-AS (once our ABI issues are all sorted out). I get the impression from @colinschmidt that it isn't an especially onerous task to extend the assembler, and we can always provide a template for people adding instructions using the existing standard formats. That said, I'm not outright opposed to providing a way to assemble arbitrary R/I-type instructions (I guess you provide the opcode, funct3, and funct7 as additional arguments). This is more generic than Xcustom, and the burden on the assembler is relatively low.
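To make the "provide the opcode, funct3, and funct7 as additional arguments" idea concrete, here is a sketch of the bit packing such a generic R/I-type form would have to perform. These C helpers (itype_encode, rtype_encode) are hypothetical and only pack bits; they are not the assembler's implementation.

```c
#include <assert.h>
#include <stdint.h>

/* I-type: imm[11:0] | rs1 | funct3 | rd | opcode */
static uint32_t itype_encode(uint32_t opcode, uint32_t funct3,
                             uint32_t rd, uint32_t rs1, int32_t imm)
{
    assert(opcode < 128 && funct3 < 8 && rd < 32 && rs1 < 32);
    assert(imm >= -2048 && imm <= 2047);   /* 12-bit signed immediate */
    return ((uint32_t)imm & 0xfffu) << 20
         | rs1 << 15 | funct3 << 12 | rd << 7 | opcode;
}

/* R-type: funct7 | rs2 | rs1 | funct3 | rd | opcode */
static uint32_t rtype_encode(uint32_t opcode, uint32_t funct3, uint32_t funct7,
                             uint32_t rd, uint32_t rs1, uint32_t rs2)
{
    assert(opcode < 128 && funct3 < 8 && funct7 < 128);
    assert(rd < 32 && rs1 < 32 && rs2 < 32);
    return funct7 << 25 | rs2 << 20 | rs1 << 15
         | funct3 << 12 | rd << 7 | opcode;
}
```

As a check against the disassembly earlier in the thread, itype_encode(0x13, 0, 12, 0, 1) is 0x00100613 (li a2,1), itype_encode(0x13, 0, 11, 2, 16) is 0x01010593 (addi a1,sp,16), and rtype_encode(0x33, 0, 0x20, 11, 14, 10) is 0x40a705b3 (sub a1,a4,a0). rtype_encode(0x0b, 7, 1, 10, 11, 12) reproduces the custom-0 word 0x02c5f50b, which suggests a generic R-type form would also cover what Xcustom used to provide.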
@aswaterman
I like the generic instruction approach. Can someone open an issue on riscv-binutils-gdb that asks for this support? We're probably not going to be able to get to it right away, because we're trying to get everything cleaned up for GCC upstreaming, which means we need to get the ABI stuff sorted out. I think the GCC 7 window closes in about a month, so we're going to have to focus on stability instead of features for a bit.
+1 for the generic instruction approach, where binutils supports some generic R/I-type instructions. Seems super useful for rapid prototyping, where you just want to play around with a new instruction before going through the hassle of adding it to the assembler directly -- plus it seems generic enough that it doesn't fall into the "customX is too custom to support upstream" category.
The macro doesn't work for LLVM:
It also doesn't work for the GCC assembler.
I'm using both the GCC assembler and Clang because (1) I want to use Clang and (2) Clang's assembler has a bug with the medium memory model. So I need a way to use custom instructions and CSRs with both GCC and Clang... and clearly it works for neither of them. I'm wondering what's going wrong. The macro:
ASM
And inline C:
The only thing that works is
You need to provide a definition of MATCH_CSRRW. I don't know why you expect the compiler to define that. The assembler provides the .insn directive that can be used to create instructions. See the gas docs.
I thought it was defined because I also included encoding.h, but it seems not. Thank you. I will also try
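MATCH_CSRRW follows the MATCH_/MASK_ convention of the encoding.h generated by riscv-opcodes. MATCH_ holds the fixed opcode and funct3 bits of an instruction, and MASK_ selects which bits are fixed. For csrrw (SYSTEM opcode 0x73, funct3 = 1) that gives 0x1073 and 0x707f. Below is a minimal sketch that defines both constants locally, assuming your copy of encoding.h does not provide them. csrrw_encode and is_csrrw are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Defined locally in case the included encoding.h lacks them:
   SYSTEM opcode 0x73 with funct3 = 1, and the mask over opcode + funct3. */
#define MATCH_CSRRW 0x1073
#define MASK_CSRRW  0x707f

/* Build "csrrw rd, csr, rs1" by OR-ing the fields into MATCH_CSRRW. */
static uint32_t csrrw_encode(uint32_t rd, uint32_t csr, uint32_t rs1)
{
    assert(rd < 32 && rs1 < 32 && csr < 4096);
    return (uint32_t)MATCH_CSRRW | csr << 20 | rs1 << 15 | rd << 7;
}

/* A word is a csrrw exactly when its masked bits equal MATCH_CSRRW. */
static int is_csrrw(uint32_t word)
{
    return (word & MASK_CSRRW) == MATCH_CSRRW;
}
```

For example, csrrw_encode(0, 0x340, 10) yields 0x34051073, i.e. csrw mscratch, a0.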
Somewhere between 8000750 (good) and 55f8308 (bad), -march seems to have stopped respecting Xcustom as an option. For example, when compiling the following program in test.c, -march doesn't seem to do anything. Compiling with and without the option:
Am I doing something wrong, or did this support get inadvertently dropped during the -march changes?