
Memory leak in ImageNet.LoadImageAndResize224 #24

Closed
Claus1 opened this issue Apr 9, 2021 · 25 comments · Fixed by #29
Labels
bug Something isn't working

Comments

@Claus1

Claus1 commented Apr 9, 2021

The code

t, err := imageNet.LoadImageAndResize224(fn)
if err != nil {
     log.Fatal(err)
}
t.MustDrop()

causes a memory leak of about 1-1.5 MB on every call.
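The leaked memory in a case like this is most likely allocated inside libtorch (C++), which Go's runtime.MemStats does not track, so a per-call figure like the 1-1.5 MB above is better measured on the process's resident set size (RSS). A minimal sketch, assuming Linux (/proc/self/status); the helper names are hypothetical and not part of gotch, and the workload in main is only a stand-in for the snippet above:

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseVmRSS extracts the resident set size (in kB) from the contents of a
// Linux /proc/<pid>/status file, whose line looks like "VmRSS:   12345 kB".
func parseVmRSS(status string) (int64, error) {
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		line := sc.Text()
		if !strings.HasPrefix(line, "VmRSS:") {
			continue
		}
		fields := strings.Fields(line) // e.g. ["VmRSS:", "12345", "kB"]
		if len(fields) < 2 {
			return 0, fmt.Errorf("malformed VmRSS line: %q", line)
		}
		return strconv.ParseInt(fields[1], 10, 64)
	}
	if err := sc.Err(); err != nil {
		return 0, err
	}
	return 0, fmt.Errorf("no VmRSS line found")
}

// currentRSS returns the resident set size of this process in kB (Linux only).
func currentRSS() (int64, error) {
	b, err := os.ReadFile("/proc/self/status")
	if err != nil {
		return 0, err
	}
	return parseVmRSS(string(b))
}

// rssGrowthPerCall runs f n times and returns the average RSS growth per call
// in kB. Unlike runtime.MemStats, RSS includes memory allocated by C/C++ code.
func rssGrowthPerCall(n int, f func()) (float64, error) {
	before, err := currentRSS()
	if err != nil {
		return 0, err
	}
	for i := 0; i < n; i++ {
		f()
	}
	after, err := currentRSS()
	if err != nil {
		return 0, err
	}
	return float64(after-before) / float64(n), nil
}

func main() {
	// Stand-in workload that deliberately keeps 1 MiB per call alive.
	// Replace the closure body with the LoadImageAndResize224 + MustDrop code.
	var kept [][]byte
	growth, err := rssGrowthPerCall(50, func() {
		b := make([]byte, 1<<20)
		for i := range b {
			b[i] = 1 // touch the pages so they count towards RSS
		}
		kept = append(kept, b)
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("~%.0f kB per call (%d buffers kept)\n", growth, len(kept))
}
```

A steady positive figure over many calls points to a leak; occasional up-and-down movement of RSS is normal allocator behaviour.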

@sugarme
Owner

sugarme commented Apr 10, 2021

Hi @Claus1 ,

Thank you for letting me know. I have reworked the code to fix the issue and tested it on both CPU and GPU. On GPU, memory usage stayed completely flat. On CPU, it fluctuated up and down within a range of a few dozen MB, so I hope it is fixed.

Please test it on the image branch.
You can do so by running go get -u github.com/sugarme/gotch@image. If your machine is set up with a GPU, that should be enough. For CPU, you need to run setup.sh as described in the gotch installation instructions.

Please let me know how the fix works for you. If everything is okay, I will merge it and tag a new version.

sugarme added the bug (Something isn't working) label Apr 10, 2021
@Claus1
Author

Claus1 commented Apr 10, 2021

Hi @sugarme! I cannot compile the @image branch directly on my Radeon (non-CUDA) machine:

$ go get -u github.com/sugarme/gotch@image
go: github.com/sugarme/gotch image => v0.3.9-0.20210410094112-93dc63424aef
# github.com/sugarme/gotch/libtch
/usr/bin/ld: cannot find -lcuda
/usr/bin/ld: cannot find -lcudart
/usr/bin/ld: cannot find -lcublas
/usr/bin/ld: cannot find -lcudnn
/usr/bin/ld: cannot find -lcaffe2_nvrtc
/usr/bin/ld: cannot find -lnvrtc-builtins
/usr/bin/ld: cannot find -lnvrtc
/usr/bin/ld: cannot find -lnvToolsExt
/usr/bin/ld: cannot find -lc10_cuda
/usr/bin/ld: cannot find -ltorch_cuda
collect2: error: ld returned 1 exit status

So I ran setup.sh from the folder created by Go, go/pkg/mod/github.com/sugarme/gotch@v0.3.9-0.20210410094112-93dc63424aef, which has the updated files. After that I ran my program and got the same leak. Possibly Go is still using the previous library.

@Claus1
Author

Claus1 commented Apr 10, 2021

Yep, it was using the old code. After about 800 calls of the code from my first post, an error appears:

fatal error: unexpected signal during runtime execution
[signal SIGSEGV: segmentation violation code=0x1 addr=0x1c1 pc=0x1c1]

runtime stack:
runtime.throw(0x12fa241, 0x2a)
	/usr/local/go/src/runtime/panic.go:1116 +0x72
runtime.sigpanic()
	/usr/local/go/src/runtime/signal_unix.go:726 +0x269

goroutine 1 [syscall]:
runtime.cgocall(0x101fee0, 0xc0002b9ab8, 0x12f8d7f)
	/usr/local/go/src/runtime/cgocall.go:133 +0x5b fp=0xc0002b9a70 sp=0xc0002b9a38 pc=0x461e7b
github.com/sugarme/gotch/libtch._Cfunc_atg_totype(0x7f3eec2fc660, 0x7f3eec2fc680, 0x6)
	_cgo_gotypes.go:22434 +0x45 fp=0xc0002b9ab8 sp=0xc0002b9a70 pc=0xe6b905
github.com/sugarme/gotch/libtch.AtgTotype.func1(0x7f3eec2fc660, 0x7f3eec2fc680, 0xc000000006)
	/home/george/go/pkg/mod/github.com/sugarme/gotch@v0.3.8/libtch/c-generated.go:6146 +0xac fp=0xc0002b9b10 sp=0xc0002b9ab8 pc=0xee030c
github.com/sugarme/gotch/libtch.AtgTotype(0x7f3eec2fc660, 0x7f3eec2fc680, 0xc000000006)
	/home/george/go/pkg/mod/github.com/sugarme/gotch@v0.3.8/libtch/c-generated.go:6146 +0x4c fp=0xc0002b9b60 sp=0xc0002b9b10 pc=0xe958cc
github.com/sugarme/gotch/tensor.(*Tensor).Totype(0xc0002dad88, 0x13be3c0, 0x1164a40, 0x0, 0x0, 0x0, 0x0)
	/home/george/go/pkg/mod/github.com/sugarme/gotch@v0.3.8/tensor/tensor-generated.go:17916 +0x106 fp=0xc0002b9c00 sp=0xc0002b9b60 pc=0xfedd46
github.com/sugarme/gotch/vision.(*ImageNet).Normalize(0xc00000f700, 0xc0002dad88, 0x0, 0x0, 0x0)
	/home/george/go/pkg/mod/github.com/sugarme/gotch@v0.3.8/vision/imagenet.go:36 +0xfb fp=0xc0002b9d08 sp=0xc0002b9c00 pc=0x1006fdb
github.com/sugarme/gotch/vision.(*ImageNet).LoadImageAndResize224(0xc00000f700, 0xc0002d0cd0, 0x46, 0x0, 0x0, 0x0)
	/home/george/go/pkg/mod/github.com/sugarme/gotch@v0.3.8/vision/imagenet.go:149 +0x24d fp=0xc0002b9dd8 sp=0xc0002b9d08 pc=0x10083ed
main.ProbsImage(0xc0002d0cd0, 0x46, 0xc000010218, 0x0, 0x0, 0x0)
	/home/george/Projects/unigui-go/examples/engine/main.go:220 +0xae fp=0xc0002b9e90 sp=0xc0002b9dd8 pc=0x100c28e
main.main()
	/home/george/Projects/unigui-go/examples/engine/main.go:200 +0x20d fp=0xc0002b9f88 sp=0xc0002b9e90 pc=0x100c10d
runtime.main()
	/usr/local/go/src/runtime/proc.go:204 +0x1cf fp=0xc0002b9fe0 sp=0xc0002b9f88 pc=0x49df6f
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1374 +0x1 fp=0xc0002b9fe8 sp=0xc0002b9fe0 pc=0x4d4421

goroutine 6 [select]:
go.opencensus.io/stats/view.(*worker).start(0xc00007e7d0)
	/home/george/go/pkg/mod/go.opencensus.io@v0.22.0/stats/view/worker.go:154 +0x19e
created by go.opencensus.io/stats/view.init.0
	/home/george/go/pkg/mod/go.opencensus.io@v0.22.0/stats/view/worker.go:32 +0x5b

@sugarme
Owner

sugarme commented Apr 11, 2021

Hi @Claus1 ,

Running setup.sh seems to have messed up your system. Sorry about that. I tried to make gotch simple to use by providing a single shell script, setup.sh; however, it causes trouble when updating the gotch version.

In the next version, there will be separate setup-libtorch.sh and setup-gotch.sh files:

  • setup-libtorch.sh: installs the PyTorch C++ API (Libtorch) system-wide at /usr/local/lib and leaves it to users to update the paths on their machine (e.g. in the .bashrc file).
  • setup-gotch.sh: installs gotch for either CPU or GPU.

On your machine:

  • Go to $GOPATH/pkg/mod/github.com/sugarme and delete all previous versions of gotch.
  • Delete unused gotch paths from your $HOME/.bashrc file.
  • Then follow this guide to install a fresh version of gotch. I have tagged a new version, v0.3.9-rc1, for you to test. When running setup-gotch.sh, you may see some logs like:
# github.com/sugarme/gotch/libtch
/usr/bin/ld: cannot find -lcuda
/usr/bin/ld: cannot find -lcudart
/usr/bin/ld: cannot find -lcublas
/usr/bin/ld: cannot find -lcudnn
/usr/bin/ld: cannot find -lcaffe2_nvrtc
/usr/bin/ld: cannot find -lnvrtc-builtins
/usr/bin/ld: cannot find -lnvrtc
/usr/bin/ld: cannot find -lnvToolsExt
/usr/bin/ld: cannot find -lc10_cuda
/usr/bin/ld: cannot find -ltorch_cuda
collect2: error: ld returned 1 exit status

Just ignore them. Make sure you add/update these lines to your .bashrc (CPU version):

    export GOTCH_LIBTORCH="/usr/local/lib/libtorch"
    export LIBRARY_PATH="$LIBRARY_PATH:$GOTCH_LIBTORCH/lib"
    export CPATH="$CPATH:$GOTCH_LIBTORCH/lib:$GOTCH_LIBTORCH/include:$GOTCH_LIBTORCH/include/torch/csrc/api/include"
    export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:$GOTCH_LIBTORCH/lib"

Hope that helps.

@Claus1
Author

Claus1 commented Apr 11, 2021

I did it another way. The last error was from the @image branch.

@sugarme
Owner

sugarme commented Apr 11, 2021

@Claus1 ,

Your last error (the SIGSEGV stack trace posted above) showed that you haven't installed the new version: its gotch frames all point to gotch@v0.3.8, so your program was still using v0.3.8, or the versions may be mixed up. I think you should:

  • go clean -cache
  • install a fresh gotch@v0.3.9-rc1 as described above (go get -u github.com/sugarme/gotch@v0.3.9-rc1 alone won't work, as it defaults to GPU) so that gotch is linked properly to libtorch C++.
  • Then go to your own program and go get -u github.com/sugarme/gotch@v0.3.9-rc1

I have tested on CPU and GPU and fresh Google Colab machines and all worked well.

@Claus1
Author

Claus1 commented Apr 11, 2021

@sugarme No, I just renamed the image version to 0.3.8 in pkg so that its sources would be used. I checked it under the debugger: it was running the @image sources. So the error occurs after more than 800 calls. I checked with different picture sets: each time, around 800 pictures and then a crash.

@sugarme
Owner

sugarme commented Apr 11, 2021

@Claus1

To be able to reproduce the issue, please do a fresh installation of gotch v0.3.9-rc1 and run go clean && go clean -cache. Please also provide the error message if it occurs, so that we can trace where the issue is.
FYI, I ran 2k images on various machines without any problems. Thanks.

@Claus1
Author

Claus1 commented Apr 13, 2021

@sugarme, I followed the instructions and got an error in Step 2:

engine$ export CUDA_VER=cpu && export GOTCH_VER=v0.3.9-rc1 && bash setup-gotch.sh
GOPATH:'/home/george/go'
GOTCH_VERSION: 'v0.3.9-rc1'
CUDA_VERSION: 'cpu'
go: creating new go.mod: module github.com/sugarme/gotch-test
mv: cannot stat '/home/george/go/pkg/mod/github.com/sugarme/gotch@v0.3.9-rc1/libtch/dummy_cuda_dependency.cpp': No such file or directory
mv: cannot stat '/home/george/go/pkg/mod/github.com/sugarme/gotch@v0.3.9-rc1/libtch/fake_cuda_dependency.cpp.cpu': No such file or directory

@sugarme
Owner

sugarme commented Apr 13, 2021

@Claus1 ,

I thought you already had v0.3.9-rc1 on your system (maybe because of your hacky way of renaming the previous version to that name?).
Anyway, please do:

  • sudo rm -rf /home/george/go/pkg/mod/github.com/sugarme/gotch@v0.3.9-rc1 then
  • export CUDA_VER=cpu && export GOTCH_VER=v0.3.9-rc1 && bash setup-gotch.sh

@Claus1
Author

Claus1 commented Apr 14, 2021

@sugarme, I removed 0.3.9-rc1, repeated all the steps, even removed and reinstalled libtorch, and got the same error.

@Claus1
Author

Claus1 commented Apr 14, 2021

If I create the missing files manually, I get this error:

export CUDA_VER=cpu && export GOTCH_VER=v0.3.9-rc1 && bash setup-gotch.sh
GOPATH:'/home/george/go'
GOTCH_VERSION: 'v0.3.9-rc1'
CUDA_VERSION: 'cpu'
go: creating new go.mod: module github.com/sugarme/gotch-test
# github.com/sugarme/gotch/libtch
/tmp/go-build172016193/b033/_x010.o: In function `dummy_cuda_dependency':
/home/george/go/pkg/mod/github.com/sugarme/gotch@v0.3.9-rc1/libtch/fake_cuda_dependency.cpp:5: multiple definition of `dummy_cuda_dependency'
/tmp/go-build172016193/b033/_x009.o:/home/george/go/pkg/mod/github.com/sugarme/gotch@v0.3.9-rc1/libtch/dummy_cuda_dependency.cpp:11: first defined here
/tmp/go-build172016193/b033/_x009.o: In function `dummy_cuda_dependency':
/home/george/go/pkg/mod/github.com/sugarme/gotch@v0.3.9-rc1/libtch/dummy_cuda_dependency.cpp:11: undefined reference to `at::cuda::warp_size()'
collect2: error: ld returned 1 exit status
mv: cannot stat '/home/george/go/pkg/mod/github.com/sugarme/gotch@v0.3.9-rc1/libtch/fake_cuda_dependency.cpp.cpu': No such file or directory

The install script removes the files I added and then reports that they are missing.

@sugarme
Owner

sugarme commented Apr 14, 2021

@Claus1 ,

If you look at the setup-gotch.sh script, you can see these 2 lines:

sudo mv $GOTCH_PATH/libtch/dummy_cuda_dependency.cpp $GOTCH_PATH/libtch/dummy_cuda_dependency.cpp.gpu
sudo mv $GOTCH_PATH/libtch/fake_cuda_dependency.cpp.cpu $GOTCH_PATH/libtch/fake_cuda_dependency.cpp

They just swap the default GPU setup to CPU. If you look at gotch v0.3.9-rc1, you will see the two files there: dummy_cuda_dependency.cpp and fake_cuda_dependency.cpp.cpu. Hence, if you check and see a file named fake_cuda_dependency.cpp (this file is for CPU) and there is no dummy_cuda_dependency.cpp, then gotch should compile fine.

I saw from your log that your prompt is engine$ while GOPATH is /home/george/go; I am not sure whether that may be the issue (unlikely, though, because the other commands were okay).

@sugarme
Owner

sugarme commented Apr 14, 2021

@Claus1 ,

Please delete the dummy_cuda_dependency.cpp file (that file is for GPU), keep fake_cuda_dependency.cpp, and it should be okay.

@Claus1
Author

Claus1 commented Apr 14, 2021

@sugarme How can I delete it while setup-gotch.sh is installing? I removed the package and libtorch before installing, worked from the go directory as you advised, and got this state with the error:

  inflating: /usr/local/lib/libtorch/lib/libqnnpack.a  
  inflating: /usr/local/lib/libtorch/lib/libjitbackend_test.so  
  inflating: /usr/local/lib/libtorch/lib/libcaffe2_protos.a  
  inflating: /usr/local/lib/libtorch/lib/libtorchbind_test.so  
 extracting: /usr/local/lib/libtorch/build-version  
george@george-MS-7B84:~/go$ go clean && go clean -cache
george@george-MS-7B84:~/go$ export CUDA_VER=cpu && export GOTCH_VER=v0.3.9-rc1 && bash /home/george/Downloads/temp/gotch/setup-gotch.sh
GOPATH:'/home/george/go'
GOTCH_VERSION: 'v0.3.9-rc1'
CUDA_VERSION: 'cpu'
go: creating new go.mod: module github.com/sugarme/gotch-test
mv: cannot stat '/home/george/go/pkg/mod/github.com/sugarme/gotch@v0.3.9-rc1/libtch/dummy_cuda_dependency.cpp': No such file or directory
mv: cannot stat '/home/george/go/pkg/mod/github.com/sugarme/gotch@v0.3.9-rc1/libtch/fake_cuda_dependency.cpp.cpu': No such file or directory
george@george-MS-7B84:~/go$ 

@sugarme
Owner

sugarme commented Apr 14, 2021

@Claus1 ,

Those file names look okay to me. Can you go to your project, update to v0.3.9-rc1 by running go get -u github.com/sugarme/gotch@v0.3.9-rc1, and see whether gotch compiles?

@Claus1
Author

Claus1 commented Apr 14, 2021

@sugarme

george@george-MS-7B84:~/go$ go clean && go clean -cache
george@george-MS-7B84:~/go$ go get -u github.com/sugarme/gotch@v0.3.9-rc1
go: cannot use path@version syntax in GOPATH mode
george@george-MS-7B84:~/go$ export GO111MODULE=on
george@george-MS-7B84:~/go$ go get -u github.com/sugarme/gotch@v0.3.9-rc1
# github.com/sugarme/gotch/libtch
/usr/bin/ld: cannot find -lcuda
/usr/bin/ld: cannot find -lcudart
/usr/bin/ld: cannot find -lcublas
/usr/bin/ld: cannot find -lcudnn
/usr/bin/ld: cannot find -lcaffe2_nvrtc
/usr/bin/ld: cannot find -lnvrtc-builtins
/usr/bin/ld: cannot find -lnvrtc
/usr/bin/ld: cannot find -lnvToolsExt
/usr/bin/ld: cannot find -lc10_cuda
/usr/bin/ld: cannot find -ltorch_cuda
collect2: error: ld returned 1 exit status

@sugarme
Owner

sugarme commented Apr 14, 2021

@Claus1 ,

Please do:

  • sudo rm /home/george/go/pkg/mod/github.com/sugarme/gotch@v0.3.9-rc1/libtch/lib.go
  • sudo mv /home/george/go/pkg/mod/github.com/sugarme/gotch@v0.3.9-rc1/libtch/lib.go.cpu /home/george/go/pkg/mod/github.com/sugarme/gotch@v0.3.9-rc1/libtch/lib.go

@Claus1
Author

Claus1 commented Apr 14, 2021

@sugarme, I removed all the gotch export variables and reinstalled libtorch and gotch again from scratch.
I got the same error. After running

sudo rm /home/george/go/pkg/mod/github.com/sugarme/gotch@v0.3.9-rc1/libtch/lib.go
sudo mv /home/george/go/pkg/mod/github.com/sugarme/gotch@v0.3.9-rc1/libtch/lib.go.cpu /home/george/go/pkg/mod/github.com/sugarme/gotch@v0.3.9-rc1/libtch/lib.go

I switched to VS Code and tried to compile the code (what else can I do?).
In the VS Code debug panel I got this:

# github.com/sugarme/gotch/libtch
torch_api.cpp:1:9: fatal error: torch/csrc/autograd/engine.h: No such file or directory
 #include<torch/csrc/autograd/engine.h>
         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
exit status 2
Process exiting with code: 1

@sugarme
Owner

sugarme commented Apr 14, 2021

@Claus1 ,

Your libtorch is not in the path. Can you run printenv and check? Also, check whether libtorch is installed in the right location, /usr/local/lib.

export GOTCH_LIBTORCH="/usr/local/lib/libtorch"
export LIBRARY_PATH="$LIBRARY_PATH:$GOTCH_LIBTORCH/lib"
export CPATH="$CPATH:$GOTCH_LIBTORCH/lib:$GOTCH_LIBTORCH/include:$GOTCH_LIBTORCH/include/torch/csrc/api/include"
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:$GOTCH_LIBTORCH/lib"

If not, add those lines to your .bashrc (or whatever you have set up on your machine; I guess you run on Windows), or just paste them at the command line to set the environment for the current shell.

Then run sudo ldconfig to update the shared library links.

@Claus1
Author

Claus1 commented Apr 14, 2021

@sugarme Yes, GOTCH_LIBTORCH pointed to the libtorch inside the gotch pkg. Fixed.
I removed and reinstalled gotch and ran go clean. The error:

# github.com/sugarme/gotch/libtch
/usr/bin/ld: cannot find -lcuda
/usr/bin/ld: cannot find -lcudart
/usr/bin/ld: cannot find -lcublas
/usr/bin/ld: cannot find -lcudnn
/usr/bin/ld: cannot find -lcaffe2_nvrtc
/usr/bin/ld: cannot find -lnvrtc-builtins
/usr/bin/ld: cannot find -lnvrtc
/usr/bin/ld: cannot find -lnvToolsExt
/usr/bin/ld: cannot find -lc10_cuda
/usr/bin/ld: cannot find -ltorch_cuda
collect2: error: ld returned 1 exit status
exit status 2
Process exiting with code: 1

After I run

sudo rm /home/george/go/pkg/mod/github.com/sugarme/gotch@v0.3.9-rc1/libtch/lib.go
sudo mv /home/george/go/pkg/mod/github.com/sugarme/gotch@v0.3.9-rc1/libtch/lib.go.cpu /home/george/go/pkg/mod/github.com/sugarme/gotch@v0.3.9-rc1/libtch/lib.go
sudo ldconfig

and then open VS Code, I see the same error in the Problems panel.

@sugarme
Owner

sugarme commented Apr 14, 2021

@Claus1 ,

The error is because lib.go is configured for CUDA. Can you manually replace the content of the file /home/george/go/pkg/mod/github.com/sugarme/gotch@v0.3.9-rc1/libtch/lib.go with the following code:

package libtch

// #cgo CFLAGS: -I${SRCDIR} -O3 -Wall -Wno-unused-variable -Wno-deprecated-declarations -Wno-c++11-narrowing -g -Wno-sign-compare -Wno-unused-function
// #cgo CFLAGS: -I/usr/local/include
// #cgo CFLAGS: -D_GLIBCXX_USE_CXX11_ABI=1
// #cgo LDFLAGS: -lstdc++ -ltorch -lc10 -ltorch_cpu -L/lib64
// #cgo CXXFLAGS: -std=c++17 -I${SRCDIR} -g -O3
// #cgo CFLAGS: -I${SRCDIR}/libtorch/lib -I${SRCDIR}/libtorch/include -I${SRCDIR}/libtorch/include/torch/csrc/api/include -I${SRCDIR}/libtorch/include/torch/csrc
// #cgo LDFLAGS: -L${SRCDIR}/libtorch/lib
// #cgo CXXFLAGS: -I${SRCDIR}/libtorch/lib -I${SRCDIR}/libtorch/include -I${SRCDIR}/libtorch/include/torch/csrc/api/include -I${SRCDIR}/libtorch/include/torch/csrc
import "C"

Then run sudo ldconfig.

The current setup seems to make installing and updating very hard, and I am preparing a different approach to improve it.

@sugarme
Owner

sugarme commented Apr 14, 2021

@Claus1 ,

If you are still struggling to get gotch up and running, please try this new shell script:

  1. sudo rm -rf /home/george/go/pkg/mod/github.com/sugarme/gotch@v0.3.9-rc1
  2. wget https://gist.githubusercontent.com/sugarme/53805b2d030ee9023e56ec366dff7fbe/raw/04d501aa9490136676da5a80fbf84f618b227980/setup-gotch.sh && sudo chmod +x setup-gotch.sh
  3. export CUDA_VER=cpu && export GOTCH_VER=v0.3.9-rc1 && bash setup-gotch.sh (make sure you delete the old setup-gotch.sh)

@Claus1
Author

Claus1 commented Apr 14, 2021

@sugarme Please ignore my previous message. Now it compiles fine with 0.3.9 on CPU, as desired.
But I still get a runtime error in LoadImageAndResize224 after ~800 calls:


fatal error: unexpected signal during runtime execution
[signal SIGSEGV: segmentation violation code=0x1 addr=0x105 pc=0x7f4c5aa5c538]

runtime stack:
runtime.throw(0x130a821, 0x2a)
	/usr/local/go/src/runtime/panic.go:1116 +0x72
runtime.sigpanic()
	/usr/local/go/src/runtime/signal_unix.go:726 +0x269

goroutine 1 [syscall]:
runtime.cgocall(0x102e1f5, 0xc000545ab8, 0x130935f)
	/usr/local/go/src/runtime/cgocall.go:133 +0x5b fp=0xc000545a70 sp=0xc000545a38 pc=0x461e7b
github.com/sugarme/gotch/libtch._Cfunc_atg_totype(0x7f4c005fc090, 0x7f4c002fc5e0, 0x6)
	_cgo_gotypes.go:22434 +0x45 fp=0xc000545ab8 sp=0xc000545a70 pc=0xe6b905
github.com/sugarme/gotch/libtch.AtgTotype.func1(0x7f4c005fc090, 0x7f4c002fc5e0, 0xc000000006)
	/home/george/go/pkg/mod/github.com/sugarme/gotch@v0.3.9-rc1/libtch/c-generated.go:6146 +0xac fp=0xc000545b10 sp=0xc000545ab8 pc=0xee030c
github.com/sugarme/gotch/libtch.AtgTotype(0x7f4c005fc090, 0x7f4c002fc5e0, 0xc000000006)
	/home/george/go/pkg/mod/github.com/sugarme/gotch@v0.3.9-rc1/libtch/c-generated.go:6146 +0x4c fp=0xc000545b60 sp=0xc000545b10 pc=0xe958cc
github.com/sugarme/gotch/tensor.(*Tensor).Totype(0xc0005999b8, 0x13ce9a0, 0x1175020, 0x0, 0x0, 0x0, 0x0)
	/home/george/go/pkg/mod/github.com/sugarme/gotch@v0.3.9-rc1/tensor/tensor-generated.go:17916 +0x106 fp=0xc000545c00 sp=0xc000545b60 pc=0xfedd46
github.com/sugarme/gotch/vision.(*ImageNet).Normalize(0xc00000f680, 0xc0005999b8, 0x0, 0x0, 0x0)
	/home/george/go/pkg/mod/github.com/sugarme/gotch@v0.3.9-rc1/vision/imagenet.go:36 +0xfb fp=0xc000545d08 sp=0xc000545c00 pc=0x1006fdb
github.com/sugarme/gotch/vision.(*ImageNet).LoadImageAndResize224(0xc00000f680, 0xc0005930e0, 0x46, 0x0, 0x0, 0x0)
	/home/george/go/pkg/mod/github.com/sugarme/gotch@v0.3.9-rc1/vision/imagenet.go:149 +0x24d fp=0xc000545dd8 sp=0xc000545d08 pc=0x10083ed
main.ProbsImage(0xc0005930e0, 0x46, 0xc000010208, 0x0, 0x0, 0x0)
	/home/george/Projects/unigui-go/examples/engine/main.go:227 +0x85 fp=0xc000545e90 sp=0xc000545dd8 pc=0x100c2c5
main.main()
	/home/george/Projects/unigui-go/examples/engine/main.go:206 +0x28f fp=0xc000545f88 sp=0xc000545e90 pc=0x100c18f
runtime.main()
	/usr/local/go/src/runtime/proc.go:204 +0x1cf fp=0xc000545fe0 sp=0xc000545f88 pc=0x49df6f
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1374 +0x1 fp=0xc000545fe8 sp=0xc000545fe0 pc=0x4d4421

goroutine 6 [select]:
go.opencensus.io/stats/view.(*worker).start(0xc00007e7d0)
	/home/george/go/pkg/mod/go.opencensus.io@v0.22.0/stats/view/worker.go:154 +0x19e
created by go.opencensus.io/stats/view.init.0
	/home/george/go/pkg/mod/go.opencensus.io@v0.22.0/stats/view/worker.go:32 +0x5b

I am trying to figure out the reason.

@Claus1
Author

Claus1 commented Apr 14, 2021

One of my JPG files causes this error, although it looks normal. So the latest setup-gotch.sh works fine: it fixes the build process, and the memory leak is fixed. Thanks a lot!

Claus1 closed this as completed Apr 14, 2021
sugarme mentioned this issue May 11, 2021

2 participants