runtime: TestSegv/Segv failure with 'unknown pc' on linux-riscv64-unmatched #52963

Open
bcmills opened this issue May 18, 2022 · 16 comments
Labels
arch-riscv NeedsInvestigation

@bcmills
Member

@bcmills bcmills commented May 18, 2022

--- FAIL: TestSegv (0.00s)
    --- FAIL: TestSegv/Segv (0.02s)
        testenv.go:460: [/tmp/workdir-host-linux-riscv64-unmatched/tmp/go-build3484739273/testprogcgo.exe Segv] exit status: exit status 2
        crash_cgo_test.go:611: SIGSEGV: segmentation violation
            PC=0x3fe45b3c1a m=3 sigcode=0
            
            goroutine 0 [idle]:
            runtime: g 0: unknown pc 0x3fe45b3c1a
            stack: frame={sp:0x3fb7ffe3e0, fp:0x0} stack=[0x3fb77feb60,0x3fb7ffe760)

greplogs -l -e 'FAIL: TestSegv/Segv .*(?:\n[ ]{8}.*)*unknown pc' --since=2022-01-01
2022-05-16T19:48:35-99d6300/linux-riscv64-unmatched

Compare #50979, #47537.

(attn @golang/riscv64; CC @golang/runtime)
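For readers who want to see what the greplogs expression above selects, here is a minimal sketch that applies the same pattern with Go's regexp package. It assumes greplogs hands the pattern to a Go-compatible regexp engine; the helper name matchesFailure and the abbreviated sampleLog are illustrative and not part of any tool.

```go
package main

import (
	"fmt"
	"regexp"
)

// failurePattern is the expression given to greplogs -e in the report.
// It starts at the subtest failure line, then walks over any number of
// following lines indented by at least eight spaces, and finally requires
// the text "unknown pc" somewhere in that indented output.
var failurePattern = regexp.MustCompile(`FAIL: TestSegv/Segv .*(?:\n[ ]{8}.*)*unknown pc`)

// sampleLog is an abbreviated copy of the failing output quoted in the report.
const sampleLog = "--- FAIL: TestSegv (0.00s)\n" +
	"    --- FAIL: TestSegv/Segv (0.02s)\n" +
	"        crash_cgo_test.go:611: SIGSEGV: segmentation violation\n" +
	"            PC=0x3fe45b3c1a m=3 sigcode=0\n" +
	"            goroutine 0 [idle]:\n" +
	"            runtime: g 0: unknown pc 0x3fe45b3c1a\n"

// matchesFailure reports whether log contains a TestSegv/Segv failure whose
// indented output mentions "unknown pc". (Hypothetical helper.)
func matchesFailure(log string) bool {
	return failurePattern.MatchString(log)
}

func main() {
	fmt.Println(matchesFailure(sampleLog))
	fmt.Println(matchesFailure("    --- FAIL: TestSegv/Segv (0.02s)\nok\n"))
}
```

The indentation requirement keeps the match inside one subtest's output, so an "unknown pc" printed by an unrelated test further down the log is not attributed to TestSegv/Segv.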

@bcmills bcmills added NeedsInvestigation arch-riscv labels May 18, 2022
@bcmills bcmills added this to the Backlog milestone May 18, 2022
@mengzhuo
Contributor

@mengzhuo mengzhuo commented May 20, 2022

The core dump shows that the command that failed is testprogcgo CgoExternalThreadSignal crash.

I'm confused that the stack shows "net.(*file).getLineFromData+0x58", which is an offset to net.data.cap inside the function.

@mengzhuo
Contributor

@mengzhuo mengzhuo commented May 23, 2022

@prattmic @cherrymui any advice?

@cherrymui
Member

@cherrymui cherrymui commented May 23, 2022

The signal PC 0x3fe45b3c1a is weird. It doesn't look like a Go PC. If you have a core dump, can you check what that address is? Does it belong to (say) a memory mapping of a C shared library?

@mengzhuo
Contributor

@mengzhuo mengzhuo commented May 24, 2022

No mapping around 0x3fe45b3c1a.

Mapped address spaces:

          Start Addr           End Addr       Size     Offset objfile
             0x10000           0x22c000   0x21c000        0x0 /tmp/workdir-host-linux-riscv64-unmatched/tmp/go-build3484739273/testprogcgo.exe
            0x22c000           0x22d000     0x1000   0x21b000 /tmp/workdir-host-linux-riscv64-unmatched/tmp/go-build3484739273/testprogcgo.exe
            0x22d000           0x249000    0x1c000   0x21c000 /tmp/workdir-host-linux-riscv64-unmatched/tmp/go-build3484739273/testprogcgo.exe
        0x3fc176b000       0x3fc186e000   0x103000        0x0 /usr/lib/riscv64-linux-gnu/libc-2.33.so
        0x3fc186e000       0x3fc1871000     0x3000   0x102000 /usr/lib/riscv64-linux-gnu/libc-2.33.so
        0x3fc1871000       0x3fc1874000     0x3000   0x105000 /usr/lib/riscv64-linux-gnu/libc-2.33.so
        0x3fc187c000       0x3fc1890000    0x14000        0x0 /usr/lib/riscv64-linux-gnu/libpthread-2.33.so
        0x3fc1890000       0x3fc1891000     0x1000    0x13000 /usr/lib/riscv64-linux-gnu/libpthread-2.33.so
        0x3fc1891000       0x3fc1892000     0x1000    0x14000 /usr/lib/riscv64-linux-gnu/libpthread-2.33.so
        0x3fc18a2000       0x3fc18bd000    0x1b000        0x0 /usr/lib/riscv64-linux-gnu/ld-2.33.so
        0x3fc18be000       0x3fc18bf000     0x1000    0x1b000 /usr/lib/riscv64-linux-gnu/ld-2.33.so
        0x3fc18bf000       0x3fc18c1000     0x2000    0x1c000 /usr/lib/riscv64-linux-gnu/ld-2.33.so

maintenance info sections:
...
[71]     0x3fc18a2000->0x3fc18a3000 at 0x05a8c000: load47a ALLOC LOAD READONLY CODE HAS_CONTENTS
[72]     0x3fc18a3000->0x3fc18bd000 at 0x05a8d000: load47b ALLOC READONLY CODE
[73]     0x3fc18be000->0x3fc18bf000 at 0x05a8d000: load48 ALLOC LOAD READONLY HAS_CONTENTS
[74]     0x3fc18bf000->0x3fc18c1000 at 0x05a8e000: load49 ALLOC LOAD HAS_CONTENTS
[75]     0x3fffea7000->0x3fffec8000 at 0x05a90000: load50 ALLOC LOAD HAS_CONTENTS
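The "no mapping around 0x3fe45b3c1a" conclusion can be checked mechanically against the table above. The following is a small sketch, not a debugger feature: the mapping type and findMapping helper are hypothetical names, and the ranges are copied by hand from the Start Addr and End Addr columns.

```go
package main

import "fmt"

// mapping is a half-open address range [start, end) with the object file name.
type mapping struct {
	start, end uint64
	name       string
}

// mappings copies the Start Addr / End Addr columns of the table above.
var mappings = []mapping{
	{0x10000, 0x22c000, "testprogcgo.exe"},
	{0x22c000, 0x22d000, "testprogcgo.exe"},
	{0x22d000, 0x249000, "testprogcgo.exe"},
	{0x3fc176b000, 0x3fc186e000, "libc-2.33.so"},
	{0x3fc186e000, 0x3fc1871000, "libc-2.33.so"},
	{0x3fc1871000, 0x3fc1874000, "libc-2.33.so"},
	{0x3fc187c000, 0x3fc1890000, "libpthread-2.33.so"},
	{0x3fc1890000, 0x3fc1891000, "libpthread-2.33.so"},
	{0x3fc1891000, 0x3fc1892000, "libpthread-2.33.so"},
	{0x3fc18a2000, 0x3fc18bd000, "ld-2.33.so"},
	{0x3fc18be000, 0x3fc18bf000, "ld-2.33.so"},
	{0x3fc18bf000, 0x3fc18c1000, "ld-2.33.so"},
}

// findMapping returns the mapping that contains addr, if any.
func findMapping(addr uint64) (mapping, bool) {
	for _, m := range mappings {
		if addr >= m.start && addr < m.end {
			return m, true
		}
	}
	return mapping{}, false
}

func main() {
	const signalPC = 0x3fe45b3c1a // PC reported by the signal handler
	if m, ok := findMapping(signalPC); ok {
		fmt.Printf("%#x is inside %s\n", uint64(signalPC), m.name)
	} else {
		fmt.Printf("%#x is not inside any listed mapping\n", uint64(signalPC))
	}
}
```

With the listed ranges, the signal PC lies above the highest file-backed mapping, which ends at 0x3fc18c1000.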

@prattmic
Member

@prattmic prattmic commented May 24, 2022

Just to double check, that is from the core dump of the exact crash shown above, not a different run, correct? Because those mappings seem similar enough for 0x3fe45b3c1a to be the randomized load address in a different run.

@mengzhuo
Contributor

@mengzhuo mengzhuo commented May 24, 2022

Yes, it's from coredumpctl, and here is the file:
core_issue_52963.gz

I've run the test 100 times and could not reproduce the failure.

@cherrymui
Member

@cherrymui cherrymui commented May 24, 2022

Thanks. Then the address is not a PC at all. It's unclear to me how we get there, or how it gets into the signal context.

@mengzhuo
Contributor

@mengzhuo mengzhuo commented May 25, 2022

Looking at the debug logs, something is weird.
The failed log shows that frame.pc should be 3fb7ffe548, but findfunc got 3fe45b3c1a.

            0x0000003fb7ffe3e0: <0x0000003fb7ffe480  0x00000000000f4240 <net.(*file).getLineFromData+0x0000000000000058> 
            0x0000003fb7ffe3f0:  0x0000003fb7ffe548  0x000000000010efb6 
            0x0000003fb7ffe400:  0x00000000000f4240 <net.(*file).getLineFromData+0x0000000000000058>  0x0000003fb7ffe560 

Here frame.lr != 0, so there should be two defers at frame.sp 0x3fb7ffe560, but unfortunately the core file didn't dump that memory.

IMHO there are three possible explanations:

  1. A compiler error that miscompiled the two defers (I will look into it)
  2. The stack frame had changed before the frame.pc defers (highly likely)
  3. regabi misusing a register (an accidental clobber? I doubt it)

@cherrymui
Member

@cherrymui cherrymui commented May 25, 2022

frame.pc should be 3fb7ffe548

Where did you see that? I didn't see it.

PC=0x3fe45b3c1a m=3 sigcode=0

This line is printed from the signal handler, which indicates the signal context's PC is 0x3fe45b3c1a. For such a bad PC I don't think the traceback code can do anything useful. The question is why we get such a PC in the signal context.

@mengzhuo
Contributor

@mengzhuo mengzhuo commented May 26, 2022

frame.pc should be 3fb7ffe548

Where did you see that? I didn't see it.

It's just my understanding of the stack frame:

            0x0000003fb7ffe3e0: <0x0000003fb7ffe480  0x00000000000f4240 <net.(*file).getLineFromData+0x0000000000000058> 
            0x0000003fb7ffe3f0:  0x0000003fb7ffe548  0x000000000010efb6 
            0x0000003fb7ffe400:  0x00000000000f4240 <net.(*file).getLineFromData+0x0000000000000058>  0x0000003fb7ffe560 
            0x0000003fb7ffe410:  0x000000000010efb6  0x8458d8a6295c2900 

which aligns with the stkframe struct:

// stack traces
type stkframe struct {
	fn       funcInfo   // 0x0000003fb7ffe480  0x00000000000f4240
	pc       uintptr    // 0x0000003fb7ffe548
	continpc uintptr    // 0x000000000010efb6
	lr       uintptr    // 0x00000000000f4240 
	sp       uintptr    // 0x0000003fb7ffe560 
	fp       uintptr    // 0x000000000010efb6  
	varp     uintptr    // 0x8458d8a6295c2900 
        .....
}

The question is why we get such a PC in the signal context.

The code seems fine (why sigcode=0?)
The ucontext with cgo padding wasn't generated by cgo; could it be misaligned?

runtime/defs_linux_riscv64.go

type ucontext struct {
	uc_flags     uint64
	uc_link      *ucontext
	uc_stack     stackt
	uc_sigmask   usigset
	uc_x__unused [0]uint8 // according to signal.h it's 0
	uc_pad_cgo_0 [8]byte
	uc_mcontext  sigcontext
}
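To make the padding question concrete, the sketch below computes where uc_mcontext lands with and without the 8-byte uc_pad_cgo_0 field. Everything except the ucontext field list is an assumption: stackt and usigset are stand-ins sized like a 64-bit stack_t (24 bytes) and a 128-byte signal mask, sigcontext is a placeholder, pointers are replaced by uint64, and the offsets hold only on a 64-bit host where uint64 is 8-byte aligned. My understanding, not confirmed in this thread, is that the C definition aligns uc_mcontext to 16 bytes.

```go
package main

import (
	"fmt"
	"unsafe"
)

// Stand-in types (assumptions, see the note above).
type stackt struct {
	ss_sp    uint64 // pointer in the real struct
	ss_flags int32
	ss_size  uint64
}

type usigset struct {
	val [16]uint64 // 128 bytes
}

type sigcontext struct {
	pc uint64 // placeholder; the real struct holds all registers
}

// ucontext mirrors the field list quoted above.
type ucontext struct {
	uc_flags     uint64
	uc_link      uint64 // pointer in the real struct
	uc_stack     stackt
	uc_sigmask   usigset
	uc_x__unused [0]uint8
	uc_pad_cgo_0 [8]byte
	uc_mcontext  sigcontext
}

// ucontextNoPad is the same layout with the padding field removed.
type ucontextNoPad struct {
	uc_flags     uint64
	uc_link      uint64
	uc_stack     stackt
	uc_sigmask   usigset
	uc_x__unused [0]uint8
	uc_mcontext  sigcontext
}

// mcontextOffsets returns the byte offset of uc_mcontext in both layouts.
func mcontextOffsets() (withPad, withoutPad uintptr) {
	return unsafe.Offsetof(ucontext{}.uc_mcontext),
		unsafe.Offsetof(ucontextNoPad{}.uc_mcontext)
}

func main() {
	withPad, withoutPad := mcontextOffsets()
	fmt.Printf("with pad:    offset %d, 16-byte aligned: %v\n", withPad, withPad%16 == 0)
	fmt.Printf("without pad: offset %d, 16-byte aligned: %v\n", withoutPad, withoutPad%16 == 0)
}
```

Under these assumptions the padded layout puts uc_mcontext at offset 176, a multiple of 16, while dropping the padding moves it to 168, which is not.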

@cherrymui
Member

@cherrymui cherrymui commented May 26, 2022

No. That hex dump is not the stkframe structure. They are unrelated. Also, traceback probably doesn't matter here. We start with a bad signal PC in the first place.

why sigcode=0?

sigcode=0 (i.e. _SI_USER) is expected. The test is testing a user-sent SIGSEGV.

Hmmm, if the ucontext fields are wrong, I'd think many things would go wrong, not just this failure.
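As a reference for the sigcode discussion, here is a small sketch that names the Linux si_code values most likely to appear next to SIGSEGV. The numeric constants are the Linux values as I know them; the function name describeSegvCode and its wording are illustrative, not something the runtime prints.

```go
package main

import "fmt"

// Linux si_code values relevant to SIGSEGV.
const (
	siUser     = 0    // sent by kill(2) or raise(3)
	siKernel   = 0x80 // sent by the kernel for an unspecified reason
	siQueue    = -1   // sent by sigqueue(3)
	siTkill    = -6   // sent by tkill(2) or tgkill(2)
	segvMaperr = 1    // address not mapped to an object
	segvAccerr = 2    // invalid permissions for the mapped object
)

// describeSegvCode returns a short description of a SIGSEGV si_code.
// (Hypothetical helper for illustration.)
func describeSegvCode(code int32) string {
	switch code {
	case siUser:
		return "SI_USER: sent by a process, not a real fault"
	case siKernel:
		return "SI_KERNEL: sent by the kernel"
	case siQueue:
		return "SI_QUEUE: sent by sigqueue"
	case siTkill:
		return "SI_TKILL: sent by tkill/tgkill"
	case segvMaperr:
		return "SEGV_MAPERR: address not mapped"
	case segvAccerr:
		return "SEGV_ACCERR: invalid permissions"
	default:
		return fmt.Sprintf("unknown si_code %d", code)
	}
}

func main() {
	// The failing log reports sigcode=0.
	fmt.Println(describeSegvCode(0))
}
```

A real memory fault would normally show SEGV_MAPERR or SEGV_ACCERR, so sigcode=0 is consistent with the test sending the signal itself.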

@mengzhuo
Contributor

@mengzhuo mengzhuo commented May 26, 2022

You're right about ucontext fields.
I tried setting this padding to 0, and it wouldn't even build.

@cherrymui
Member

@cherrymui cherrymui commented May 26, 2022

The LR is 0x10f0f2. Could you look at the core and see what that address is? It is not a Go PC either. But it belongs to the program's mapping. PLT stub?

@mengzhuo
Contributor

@mengzhuo mengzhuo commented May 27, 2022

The original binary had been deleted, so I rebuilt it at commit 99d6300.
0x10f0f2 -> gcc_linux_riscv64.c:36

err = _cgo_try_pthread_create(&p, &attr, threadentry, ts);
pthread_sigmask(SIG_SETMASK, &oset, nil);
if (err != 0) {
	fatalf("pthread_create failed: %s", strerror(err));
}

@cherrymui
Member

@cherrymui cherrymui commented May 27, 2022

Thanks. Could you disassemble the function and see if it is a PC immediately after a call instruction? It looks like it is. It is calling into libc (or libpthread). Maybe the C library is doing something weird (especially when it is manipulating the signal mask)?
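One way to answer the "is this PC right after a call" question without reading a full disassembly is to decode the 32-bit word that precedes the address. The sketch below checks for the RISC-V JAL (opcode 0x6f) and JALR (opcode 0x67, funct3 0) encodings that write a link register. The function name linkRegister is hypothetical, and the sketch deliberately ignores 16-bit compressed forms such as c.jalr, so a negative result is not conclusive.

```go
package main

import "fmt"

const (
	opJAL  = 0x6f // jump and link
	opJALR = 0x67 // jump and link register
)

// linkRegister decodes a 32-bit RISC-V instruction word. It returns the
// destination register number and true if the instruction is a JAL or JALR
// that stores a return address (rd != x0). (Hypothetical helper.)
func linkRegister(insn uint32) (rd uint32, ok bool) {
	opcode := insn & 0x7f
	rd = (insn >> 7) & 0x1f
	switch opcode {
	case opJAL:
		return rd, rd != 0
	case opJALR:
		funct3 := (insn >> 12) & 0x7
		return rd, funct3 == 0 && rd != 0
	}
	return 0, false
}

func main() {
	// Sample words: a JALR that links into x6 (t1), an AUIPC, and a plain
	// "ret" (jalr x0, 0(ra)), which does not link.
	for _, insn := range []uint32{0x000e0367, 0x00232e17, 0x00008067} {
		rd, ok := linkRegister(insn)
		fmt.Printf("%#010x: links=%v rd=x%d\n", insn, ok, rd)
	}
}
```

Applied to the four bytes before 0x10f0f2 in the rebuilt binary, a true result with rd equal to x1 (ra) would support the reading that the LR is an ordinary return address from a C call.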

@mengzhuo
Contributor

@mengzhuo mengzhuo commented May 27, 2022

Here is the binary testprogcgo.zip

0000000000013940 <pthread_sigmask@plt>:                                                      
   13940:       00232e17                auipc   t3,0x232                                     
   13944:       340e3e03                ld      t3,832(t3) # 245c80 <pthread_sigmask@GLIBC_2.32>
   13948:       000e0367                jalr    t1,t3                                        
   0x3ff7f0cbec <__GI___pthread_sigmask>           auipc   a6,0xa5                                            
   0x3ff7f0cbf0 <__GI___pthread_sigmask+4>         ld      a6,1948(a6)                                        
   0x3ff7f0cbf4 <__GI___pthread_sigmask+8>         ld      a4,0(a6)                                           
   0x3ff7f0cbf8 <__GI___pthread_sigmask+12>        addi    sp,sp,-160                                         
   0x3ff7f0cbfa <__GI___pthread_sigmask+14>        mv      a5,a1                                              
   0x3ff7f0cbfc <__GI___pthread_sigmask+16>        sd      ra,152(sp)                                         
   0x3ff7f0cbfe <__GI___pthread_sigmask+18>        sd      a4,136(sp)                                         
   0x3ff7f0cc00 <__GI___pthread_sigmask+20>        li      a1,0                                               
   0x3ff7f0cc02 <__GI___pthread_sigmask+22>        beqz    a5,0x3ff7f0cc10 <__GI___pthread_sigmask+36>        
   0x3ff7f0cc04 <__GI___pthread_sigmask+24>        ld      a3,0(a5)                                           
   0x3ff7f0cc06 <__GI___pthread_sigmask+26>        li      a4,3                                               
   0x3ff7f0cc08 <__GI___pthread_sigmask+28>        slli    a4,a4,0x1f                                         
   0x3ff7f0cc0a <__GI___pthread_sigmask+30>        and     a4,a4,a3                                           
   0x3ff7f0cc0c <__GI___pthread_sigmask+32>        bnez    a4,0x3ff7f0cc3c <__GI___pthread_sigmask+80>        
   0x3ff7f0cc0e <__GI___pthread_sigmask+34>        mv      a1,a5                                              
   0x3ff7f0cc10 <__GI___pthread_sigmask+36>        li      a7,135                                             
   0x3ff7f0cc14 <__GI___pthread_sigmask+40>        li      a3,8                                               
   0x3ff7f0cc16 <__GI___pthread_sigmask+42>        ecall                                                      
   0x3ff7f0cc1a <__GI___pthread_sigmask+46>        lui     a4,0xfffff                                         
   0x3ff7f0cc1c <__GI___pthread_sigmask+48>        sext.w  a3,a0                                              
   0x3ff7f0cc20 <__GI___pthread_sigmask+52>        mv      a5,a0                                              
   0x3ff7f0cc22 <__GI___pthread_sigmask+54>        li      a0,0                                               
   0x3ff7f0cc24 <__GI___pthread_sigmask+56>        bgeu    a4,a3,0x3ff7f0cc2c <__GI___pthread_sigmask+64>     
   0x3ff7f0cc28 <__GI___pthread_sigmask+60>        negw    a0,a5                                              
   0x3ff7f0cc2c <__GI___pthread_sigmask+64>        ld      a4,136(sp)                                         
   0x3ff7f0cc2e <__GI___pthread_sigmask+66>        ld      a5,0(a6)                                           
   0x3ff7f0cc32 <__GI___pthread_sigmask+70>        bne     a4,a5,0x3ff7f0cc7c <__GI___pthread_sigmask+144>  
   0x3ff7f0cc36 <__GI___pthread_sigmask+74>        ld      ra,152(sp)                                      
   0x3ff7f0cc38 <__GI___pthread_sigmask+76>        addi    sp,sp,160                                       
   0x3ff7f0cc3a <__GI___pthread_sigmask+78>        ret                                                     
   0x3ff7f0cc3c <__GI___pthread_sigmask+80>        addi    a1,sp,8                                         
   0x3ff7f0cc3e <__GI___pthread_sigmask+82>        mv      a4,a1                                           
   0x3ff7f0cc40 <__GI___pthread_sigmask+84>        addi    t5,a5,128                                       
   0x3ff7f0cc44 <__GI___pthread_sigmask+88>        ld      t4,0(a5)                                        
   0x3ff7f0cc48 <__GI___pthread_sigmask+92>        ld      t3,8(a5)                                        
   0x3ff7f0cc4c <__GI___pthread_sigmask+96>        ld      t1,16(a5)                                       
   0x3ff7f0cc50 <__GI___pthread_sigmask+100>       ld      a7,24(a5)                                       
   0x3ff7f0cc54 <__GI___pthread_sigmask+104>       sd      t4,0(a4)                                        
   0x3ff7f0cc58 <__GI___pthread_sigmask+108>       sd      t3,8(a4)                                        
   0x3ff7f0cc5c <__GI___pthread_sigmask+112>       sd      t1,16(a4)                                       
   0x3ff7f0cc60 <__GI___pthread_sigmask+116>       sd      a7,24(a4)                                       
   0x3ff7f0cc64 <__GI___pthread_sigmask+120>       addi    a5,a5,32                                        
   0x3ff7f0cc68 <__GI___pthread_sigmask+124>       addi    a4,a4,32                                        
   0x3ff7f0cc6c <__GI___pthread_sigmask+128>       bne     a5,t5,0x3ff7f0cc44 <__GI___pthread_sigmask+88>  
   0x3ff7f0cc70 <__GI___pthread_sigmask+132>       li      a5,-3                                           
   0x3ff7f0cc72 <__GI___pthread_sigmask+134>       slli    a5,a5,0x1f                                      
   0x3ff7f0cc74 <__GI___pthread_sigmask+136>       addi    a5,a5,-1                                        
   0x3ff7f0cc76 <__GI___pthread_sigmask+138>       and     a3,a3,a5                                        
   0x3ff7f0cc78 <__GI___pthread_sigmask+140>       sd      a3,8(sp)                                        
   0x3ff7f0cc7a <__GI___pthread_sigmask+142>       j       0x3ff7f0cc10 <__GI___pthread_sigmask+36>        
   0x3ff7f0cc7c <__GI___pthread_sigmask+144>       jal     ra,0x3ff7f5bf40 <__stack_chk_fail>              
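One detail a reader can check from the listing: the instruction after the ecall sits at 0x3ff7f0cc1a, and the signal PC reported in the original failure is 0x3fe45b3c1a. The two runs have different load addresses, so the full values cannot match, but the offset within a page survives address randomization. The sketch below compares those offsets, assuming 4 KiB pages; pageOffset is a hypothetical helper. This is only an observation and does not by itself explain the failure.

```go
package main

import "fmt"

// pageSize is assumed to be 4 KiB.
const pageSize = 4096

// pageOffset returns the offset of addr within its page.
func pageOffset(addr uint64) uint64 {
	return addr & (pageSize - 1)
}

func main() {
	const (
		signalPC   = 0x3fe45b3c1a // PC from the failing run's signal context
		afterEcall = 0x3ff7f0cc1a // __GI___pthread_sigmask+46 in the rebuilt binary's run
	)
	fmt.Printf("signal PC page offset:   %#x\n", pageOffset(signalPC))
	fmt.Printf("after-ecall page offset: %#x\n", pageOffset(afterEcall))
	fmt.Println("same offset:", pageOffset(signalPC) == pageOffset(afterEcall))
}
```

Both addresses have page offset 0xc1a. The ecall in question is issued with a7 set to 135, which to my knowledge is rt_sigprocmask in the generic Linux syscall table, so a signal delivered on return from that system call would report the following instruction as its PC.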
