memset optimizer(?) still thinks DCPU is 8-bit addressable #14

Open
a1k0n opened this Issue · 12 comments

2 participants

@a1k0n
llvm-dcpu16 member

The optimizer seems to think a 16-bit set is enough to clear two adjacent words in some cases. The code is correct without -O.

$ cat memset3.c
void* mymemclr4(int *buf, int len) {
  int *end = buf+len;
  while(buf != end) {
    *buf++ = 0;
    *buf++ = 0;
    *buf++ = 0;
    *buf++ = 0;
  }
}

extern int buf[128];
int main() {
  mymemclr4(buf, sizeof(buf));
}

$ bin/clang -target dcpu16 -S memset3.c -O -o -
memset3.c:9:1: warning: control reaches end of non-void function [-Wreturn-type]
}
^
        ; .file "memset3.c"
        .text
        .globl  mymemclr4
        ; .align        1
:mymemclr4
        IFE     B, 0x0
        SET     PC, .LBB0_2
:.LBB0_1
        SET     [0x2+A], 0x0       ; <--- it somehow thinks set [a+2],0 and set [a], 0
        SET     [A], 0x0           ; <--- is sufficient to clear 4 words.
        ADD     A, 0x4
        ADD     B, 0xfffc    ; <-- it's also somewhat perplexing that it doesn't generate sub B, 4 here
        IFN     B, 0x0
        SET     PC, .LBB0_1
:.LBB0_2
        SET     PC, POP

        .globl  main
        ; .align        1
:main
        SET     PUSH, J
        SET     J, SP
        SET     A, SP
        SET     [A], 0x80
        SET     A, buf
        SET     B, 0x0
        SET     C, 0x0
        JSR     memset
        SET     A, 0x0
        SET     J, POP
        SET     PC, POP
@krasin
llvm-dcpu16 member

Thanks for spotting this. This is a serious issue. :(

@a1k0n
llvm-dcpu16 member

Actually, I didn't even notice this at first, but it's not even calling "mymemclr4" -- it infers that I'm doing a memset and then generates an (incorrect) call to memset with a length of 0, per my other bug. :(

@ghost

I'm pretty sure that this is an LLVM bug with the same cause as issue #153 of LLVM-DCPU16.

@a1k0n
llvm-dcpu16 member

Well, you and Blei had to do a ton of work to make LLVM even familiar with the concept of 16-bit bytes, so I imagine this is going to be a continuation of that work rather than a bug in the original code per se. It looked to me like the memset generator computes these store spans using BitsPerByte, so it's probably an incomplete conversion, but I haven't found the fix yet.

Still, when did we last sync with upstream?

@krasin
llvm-dcpu16 member

Thanks for the reminder. Syncing to upstream now...

@krasin
llvm-dcpu16 member

The merge is completed. See llvm-dcpu16/llvm-dcpu16#165 and #15

@a1k0n
llvm-dcpu16 member

Ohho!

define i16 @main() nounwind {
entry:
  store i32 0, i32* bitcast (i16* getelementptr inbounds ([128 x i16]* @buf, i16 0, i16 124) to i32*), align 1
  store i32 0, i32* bitcast (i16* getelementptr inbounds ([128 x i16]* @buf, i16 0, i16 120) to i32*), align 1
  store i32 0, i32* bitcast (i16* getelementptr inbounds ([128 x i16]* @buf, i16 0, i16 116) to i32*), align 1
  store i32 0, i32* bitcast (i16* getelementptr inbounds ([128 x i16]* @buf, i16 0, i16 112) to i32*), align 1
  store i32 0, i32* bitcast (i16* getelementptr inbounds ([128 x i16]* @buf, i16 0, i16 108) to i32*), align 1
  store i32 0, i32* bitcast (i16* getelementptr inbounds ([128 x i16]* @buf, i16 0, i16 104) to i32*), align 1
  store i32 0, i32* bitcast (i16* getelementptr inbounds ([128 x i16]* @buf, i16 0, i16 100) to i32*), align 1

It thinks it can make 32-bit stores. That's why it's doing this crazy stuff.

@a1k0n
llvm-dcpu16 member

Okay, I have a patch which fixes a bunch of these issues in LLVM (basically replacing <8> with getBitsPerByte() everywhere). However, clang needs similar patching: it generates code which assumes, for instance, that a 32-bit store is 4 "bytes" wide, and so produces invalid code. I'll have a pull request ready later.

I'm starting to wonder whether this is too invasive and we should just somehow force pointer alignment to 2 8-bit bytes.

Also, the memset length argument is 32 bits wide, and the call puts the lower 16 bits of it on the stack via [A], which I guess is correct. I hadn't noticed that before. We need to implement a memset builtin to avoid that.

@krasin
llvm-dcpu16 member

Okay, I have a patch which fixes a bunch of these issues in LLVM (basically replacing <8> with getBitsPerByte() everywhere).

Sounds promising.

I'm starting to wonder whether this is too invasive and we should just somehow force pointer alignment to 2 8-bit bytes.

It might be that forcing pointer alignment to 2 8-bit bytes and tricking the assembler printer is easier, but it's clearly a hack.
The getBitsPerByte approach appears to be as invasive as we feared, but in the long term it might be cleaned up enough to be upstreamed. The llvmdev mailing list has a number of requests to add support for non-8-bit bytes, but it has never been implemented properly. It would be nice if the dcpu16 community could accomplish this.

Also, the memset length argument is 32 bits wide, and the call puts the lower 16 bits of it on the stack via [A], which I guess is correct. I hadn't noticed that before. We need to implement a memset builtin to avoid that.

Agree.

@a1k0n a1k0n closed this
@ghost

Just for reference: how did we fix this? With the LLVM patch from Blei? (8675f9174f35bb539082709a65b83cf8b1a376b8)

@a1k0n a1k0n reopened this
@a1k0n
llvm-dcpu16 member

Again it discarded my comment. In any case, I shouldn't have closed this one.

It seems Blei's fix-framepointer patch doesn't fix this as I had thought. If, in the above example, you change the loop to *buf++ = 1, it works as intended -- this is specifically a bug in clearing memory, and it seems to stem from clang generating incorrect LLVM code.

@a1k0n
llvm-dcpu16 member

I have just verified with the latest LLVM that clang -fno-builtin cures this problem, so I was right to suspect the memset built-in initially.
