
There is an inexplicable error when running convnet_cuda, and I have no clue how to solve it. Can you provide some ideas? #537

Open
pengbotter opened this issue Nov 2, 2022 · 3 comments


@pengbotter

2022/11/02 16:14:34 Batches 600
Epoch 0 0 / 600 [------------------------------------------------------] 0.00%Exception 0xc0000006 0x0 0xe05ba6400 0x13f4632
PC=0x13f4632

gorgonia.org/gorgonia.CloneValue({0x17161e8, 0xe05ba6400})
E:/code/selfCode/gorgonia/values_utils.go:94 +0xf2 fp=0xc00052cff8 sp=0xc00052cd78 pc=0x13f4632
gorgonia.org/gorgonia.constantDV({0x17161e8, 0xe05ba6400})
E:/code/selfCode/gorgonia/dual.go:123 +0xd4 fp=0xc00052d0c8 sp=0xc00052cff8 pc=0x13640f4
gorgonia.org/gorgonia.dvUnit({0x17161e8, 0xe05ba6400})
E:/code/selfCode/gorgonia/dual.go:160 +0xe5 fp=0xc00052d150 sp=0xc00052d0c8 pc=0x1364365
gorgonia.org/gorgonia.(*execOp).exec(0xc000328730, 0xc000186000)
E:/code/selfCode/gorgonia/vm_tape_cuda.go:164 +0x18d6 fp=0xc00052d9d0 sp=0xc00052d150 pc=0x14021b6
gorgonia.org/gorgonia.(*tapeMachine).runall(0xc000186000, 0xc00031a120, 0xc00031a180)
E:/code/selfCode/gorgonia/vm_tape.go:262 +0x28a fp=0xc00052dfa0 sp=0xc00052d9d0 pc=0x13f9b0a
gorgonia.org/gorgonia.(*tapeMachine).RunAll.func2()
E:/code/selfCode/gorgonia/vm_tape.go:223 +0x47 fp=0xc00052dfe0 sp=0xc00052dfa0 pc=0x13f97e7
runtime.goexit()
D:/software/go/src/runtime/asm_amd64.s:1571 +0x1 fp=0xc00052dfe8 sp=0xc00052dfe0 pc=0x610a41
created by gorgonia.org/gorgonia.(*tapeMachine).RunAll
E:/code/selfCode/gorgonia/vm_tape.go:223 +0x225

goroutine 1 [select, locked to thread]:
gorgonia.org/gorgonia.(*tapeMachine).RunAll(0xc000186000)
E:/code/selfCode/gorgonia/vm_tape.go:225 +0x34d
main.main()
E:/code/selfCode/gorgonia/examples/convnet_cuda/main.go:296 +0x1d74

goroutine 20 [syscall]:
os/signal.signal_recv()
D:/software/go/src/runtime/sigqueue.go:151 +0x2f
os/signal.loop()
D:/software/go/src/os/signal/signal_unix.go:23 +0x1d
created by os/signal.Notify.func1.1
D:/software/go/src/os/signal/signal.go:151 +0x2e

goroutine 21 [IO wait]:
internal/poll.runtime_pollWait(0xc00032a000?, 0x72)
D:/software/go/src/runtime/netpoll.go:302 +0x45
internal/poll.(*pollDesc).wait(0xc0003201b8, 0x72, 0x0)
D:/software/go/src/internal/poll/fd_poll_runtime.go:83 +0x88
internal/poll.execIO(0xc000320018, 0xc0000955a0)
D:/software/go/src/internal/poll/fd_windows.go:175 +0x2d0
internal/poll.(*FD).acceptOne(0xc000320000, 0x304, {0xc00032a000, 0x2, 0x2}, 0xc000320018)
D:/software/go/src/internal/poll/fd_windows.go:942 +0xfd
internal/poll.(*FD).Accept(0xc000320000, 0xc0000959c8)
D:/software/go/src/internal/poll/fd_windows.go:976 +0x43f
net.(*netFD).accept(0xc000320000)
D:/software/go/src/net/fd_windows.go:139 +0xc5
net.(*TCPListener).accept(0xc000306090)
D:/software/go/src/net/tcpsock_posix.go:139 +0x55
net.(*TCPListener).Accept(0xc000306090)
D:/software/go/src/net/tcpsock.go:288 +0x67
net/http.(*Server).Serve(0xc000106000, {0x1713638, 0xc000306090})
D:/software/go/src/net/http/server.go:3039 +0x4c8
net/http.(*Server).ListenAndServe(0xc000106000)
D:/software/go/src/net/http/server.go:2968 +0x165
net/http.ListenAndServe({0x162f82a, 0xe}, {0x0, 0x0})
D:/software/go/src/net/http/server.go:3222 +0xf6
main.main.func1()
E:/code/selfCode/gorgonia/examples/convnet_cuda/main.go:186 +0x2d
created by main.main
E:/code/selfCode/gorgonia/examples/convnet_cuda/main.go:185 +0x1b4

goroutine 25 [select, locked to thread]:
gorgonia.org/gorgonia/cuda.(*Engine).Run(0xc0000001e0)
E:/code/selfCode/gorgonia/cuda/external.go:248 +0x2d1
created by gorgonia.org/gorgonia/cuda.(*Engine).doInit
E:/code/selfCode/gorgonia/cuda/external.go:168 +0x128c

goroutine 26 [chan receive]:
gorgonia.org/gorgonia.(*ExternMetadata).collectWork(0xc000186000, 0x0, 0xc00008e9c0)
E:/code/selfCode/gorgonia/cuda.go:283 +0x39
created by gorgonia.org/gorgonia.(*ExternMetadata).init
E:/code/selfCode/gorgonia/cuda.go:256 +0x678

goroutine 43 [select]:
main.cleanup(0xc0000e0660, 0xc0000dd180, 0x0)
E:/code/selfCode/gorgonia/examples/convnet_cuda/main.go:324 +0xb3
created by main.main
E:/code/selfCode/gorgonia/examples/convnet_cuda/main.go:258 +0x1334

goroutine 44 [select]:
gopkg.in/cheggaaa/pb%2ev1.(*ProgressBar).refresher(0xc000854000)
C:/Users/Administrator/go/pkg/mod/gopkg.in/cheggaaa/pb.v1@v1.0.28/pb.go:493 +0xbd
created by gopkg.in/cheggaaa/pb%2ev1.(*ProgressBar).Start
C:/Users/Administrator/go/pkg/mod/gopkg.in/cheggaaa/pb.v1@v1.0.28/pb.go:124 +0x14c
rax 0xc0001b4110
rbx 0x26d3eeab3e0
rcx 0xe05ba6400
rdi 0x0
rsi 0x0
rbp 0xc00052cfe8
rsp 0xc00052cd78
r8 0xc0001b4110
r9 0x1
r10 0x0
r11 0x0
r12 0xc00052cdf8
r13 0x0
r14 0xc0003196c0
r15 0x20
rip 0x13f4632
rflags 0x10202
cs 0x33
fs 0x53
gs 0x2b

Debugger finished with the exit code 0

That's the error message.
There is an inexplicable error when running convnet_cuda, and I have no clue how to solve it. Can you provide some ideas?

@chewxy
Member

chewxy commented Nov 3, 2022

This looks like some sort of driver/compute inconsistency (the dump is a clue).

What versions are you using?
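Because the suspicion is a driver/compute inconsistency, it helps to report the CUDA version supported by the driver and the CUDA toolkit version separately (for example from `nvidia-smi` and `nvcc --version`). The CUDA APIs `cudaDriverGetVersion` and `cudaRuntimeGetVersion` return a version as the integer 1000*major + 10*minor, so 11.6 appears as 11060. The sketch below uses a hypothetical helper, `decodeCUDAVersion` (not part of Gorgonia), to turn such an integer back into major.minor and prints it next to the Go version and platform; the hard-coded 11060 is an assumption standing in for whatever your own installation reports.

```go
package main

import (
	"fmt"
	"runtime"
)

// decodeCUDAVersion splits a CUDA version integer, encoded as
// 1000*major + 10*minor (the format used by cudaDriverGetVersion and
// cudaRuntimeGetVersion), into its major and minor parts.
// Hypothetical helper for illustration only.
func decodeCUDAVersion(v int) (major, minor int) {
	return v / 1000, (v % 1000) / 10
}

func main() {
	// Assumption: 11060 is what a CUDA 11.6 driver or runtime reports;
	// replace it with the value from your own system.
	const reported = 11060
	major, minor := decodeCUDAVersion(reported)

	// Print everything a maintainer usually asks for in one place.
	fmt.Printf("CUDA %d.%d\n", major, minor)
	fmt.Printf("Go %s on %s/%s\n", runtime.Version(), runtime.GOOS, runtime.GOARCH)
}
```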

@pengbotter
Author

CUDA 11.6
Windows 10
Go 1.19.3

Now there is no error, but the epoch is always stuck at 0%.

@pengbotter
Author

There is no longer an error, but it stays stuck at 0%:
2022/11/11 18:10:35 m.out.Shape (100, 10), y.Shape (100, 10)
2022/11/11 18:10:37 Batches 600
Epoch 0 0 / 600 [------------------------------------------------------] 0.00%
