Iterator.Chan() considered harmful #135
Comments
Note: the solution to this was to not use |
I'm getting a few compilation errors when trying to run it:
Yeah, that's fine.. the |
Is this intentional that With small values the second data contains the actual value instead of a pointer to the value (https://research.swtch.com/interfaces see Memory Optimization). Effectively Header.Ptr is not an |
That's not true since Go 1.4 right (because I recall being quite upset I can no longer stuff small values into interfaces)? Also I don't think |
Preliminary tests seem to agree with your information. There still seems to be some code for it: https://github.com/golang/go/blob/964639cc338db650ccadeafb7424bc8ebb2c0f6c/src/runtime/typekind.go#L42 ... but I wasn't able to construct one, nor could I find in the compiler the cases in which it might be constructed.
First interesting datapoint, changing |
With
This script should help find your own breaking point:
if Both these segfaults are interesting - they relate to strings and pointers from strings. Will read more
So I ran this
There were no segfaults.
Ran this on a MacBook with 1.8.3 ... can't find the poison address either. What the hell is happening?
I wasn't able to crash it with 1.9rc2 on Windows either, but I was able to get a few different values:
The only thing I can deduce is that something is corrupting the memory. I was trying to create boundaries to check content overwriting or misuse https://github.com/egonelbre/gorgonia/tree/debugcheck... but there's probably some unsafe use I missed. Effectively, write and check Before/After...
But, yeah... I give up for now. The only thing I can think of is to start replacing all unsafe parts and properly containing them... On a sidenote - this looks like a bug:
Thanks for that bug finding. :) I'll go dick around with your example and see what happens
This issue was moved to gorgonia/tensor#5
sketch space for describing how to create a chan int of negative length, and how to reproduce it
Background/Context of the Issue
Gorgonia is a library for representing and executing mathematical equations, and performing automatic differentiation. It's like TensorFlow and PyTorch for Go. It is currently undergoing a major internal refactor (which will not affect the public APIs much).
I was improving the backend tensor package by splitting up the data structure into a data structure + pluggable execution engine, instead of having built-in methods (see also #128). The reason is to make it easier to swap out execution backends: CPU, GPU, even a network CPU. (An actual experiment I did was to run a small neural network on a Raspberry Pi with all computation offshored to my workstation, and vice versa, which turned out to be a supremely bad idea.)
Another reason was that I wanted to do some experiments at work using algorithms that involve sparse tensors (see also #127) for matrix factorization tasks.
Lastly, I wanted to clean up the generics support of the tensor package. The current master branch of the tensor package had a lot of code to support arbitrary tensor types. With the split of execution engines and data structure, more of this support could be offloaded to the execution engine instead. This package provides a default execution engine (type StdEng struct{}: https://github.com/chewxy/gorgonia/blob/debugrace/tensor/defaultengine.go), which could be extended (example: https://github.com/chewxy/gorgonia/blob/debugrace/tensor/example_extension_test.go). The idea was to have an internal/execution package which held all the code for the default execution engine.
Data Structures
The most fundamental data structure is storage.Header, which is an analogue for a Go slice: it's a three-word structure. It was chosen because it is a ridiculously simple structure that can store Go-allocated memory, C-allocated memory and device-allocated memory (like CUDA).
On top of storage.Header is tensor.array. It's essentially a storage.Header with an additional field for the type. The v field will eventually be phased out once the refactor is complete.
On top of tensor.array are the various implementations of tensor.Tensor. Chief amongst these is the tensor.Dense struct. Essentially it's a tensor.array coupled with some access patterns and meta information.
Access to the data in the tensor.Tensor can be achieved by use of Iterators. The Iterator basically assumes that the data is held in a flat slice, and returns the next index on the slice. There are auxiliary methods like NextValidity to handle special-case tensors like masked tensors, where some elements are masked from operations.
The bug happens in the Chan method of the FlatIterator type.
How to reproduce
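
To make the subject of the reproduction concrete, here is a minimal sketch of a channel-returning iterator of the kind described under Data Structures. The names flatIter, Next, Chan, errDone and collect are hypothetical stand-ins, not the actual FlatIterator code from the tensor package; the sketch only assumes a flat index space and a producer goroutine.

```go
package main

import (
	"errors"
	"fmt"
)

// flatIter is a hypothetical stand-in for an iterator over a flat slice.
// It is NOT the tensor package's FlatIterator; it only tracks an index.
type flatIter struct {
	size int // number of elements in the (assumed) flat backing slice
	next int // next index to hand out
}

var errDone = errors.New("iterator exhausted")

// Next returns the next index into the flat slice, or errDone at the end.
func (it *flatIter) Next() (int, error) {
	if it.next >= it.size {
		return -1, errDone
	}
	i := it.next
	it.next++
	return i, nil
}

// Chan streams every remaining index over an unbuffered channel.
// The producer goroutine blocks on each send, so a consumer that stops
// reading early leaves that goroutine blocked forever.
func (it *flatIter) Chan() <-chan int {
	ch := make(chan int)
	go func() {
		defer close(ch)
		for i, err := it.Next(); err == nil; i, err = it.Next() {
			ch <- i
		}
	}()
	return ch
}

// collect drains Chan for an iterator of the given size.
func collect(size int) []int {
	it := &flatIter{size: size}
	var out []int
	for i := range it.Chan() {
		out = append(out, i)
	}
	return out
}

func main() {
	fmt.Println(collect(4)) // prints [0 1 2 3]
}
```

Ranging over the channel only terminates cleanly when the consumer drains it. That coupling between producer goroutine and consumer is a general property of channel-based iteration in Go; whether it plays any part in the failure reported here is not established.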
The branch where the bug is known to exist is the debugrace branch, which can be found here: 1dee6d2.
git checkout debugrace
Run the tests with GOMAXPROCS set, like so:
GOMAXPROCS=1 go test -run=.
Try it with various GOMAXPROCS values; one of them is bound to trigger the issue.
There is a recover statement here: https://github.com/chewxy/gorgonia/blob/debugrace/tensor/dense_viewstack_specializations.go#L636. Removing the deferred function causes an index out of bounds panic.
The relevant test is for the Stack function: https://github.com/chewxy/gorgonia/blob/debugrace/tensor/dense_matop_test.go#L768. If only the Stack test is run (for example GOMAXPROCS=1 go test -run=Stack), it is unlikely the problem will show up (I wrote a tiny Python script to run it as many times as possible with many GOMAXPROCS configurations and none of them caused an error).
You should get something like this:
Environments
I've managed to reproduce the issue on OS X with Go 1.8, and on Ubuntu 16.10 with Go 1.8.2 and Go tip (whatever gvm thinks is Go tip). I've no access to Go on a Windows box, so I can't test it on Windows.
Magic and Unsafe Use
As part of the refactoring, there are a few magic bits being used. Here I attempt to list them all (may not be exhaustive):
A slice header holding an unsafe.Pointer is used instead of a standard one like reflect.SliceHeader, which stores a uintptr. This is because I want Go to keep a reference to the actual slice. This may affect the runtime and memory allocation; I'm not too sure.
//go:linkname is used in some internal packages (specific example here: https://github.com/chewxy/gorgonia/blob/debugrace/tensor/internal/execution/generic_arith_vv.go). It's basically just a rename of functions in github.com/chewxy/vecf32 and github.com/chewxy/vecf64. Those packages contain optional AVX/SSE vector operations such as arithmetic. However, those have to be manually enabled via a build tag. By default it uses Go algorithms, not SSE/AVX operations.
//go:linkname is used in unsafe.go: https://github.com/chewxy/gorgonia/blob/debugrace/tensor/unsafe.go#L105. However, it should be noted that memmove is never called, as after some tests I decided it would be too unsafe to use (this also explains why there are comments that say TODO: implement memmove).
What I suspect
I suspect that there may be some naughty things happening in memory (because it only happens when all the tests are run). The problem is I don't know exactly where to start looking.