Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mat: calling qr.Factorize leads to OOM for matrixes with many rows #1968

Open
ericwenn opened this issue Apr 17, 2024 · 5 comments · May be fixed by #1970
Open

mat: calling qr.Factorize leads to OOM for matrixes with many rows #1968

ericwenn opened this issue Apr 17, 2024 · 5 comments · May be fixed by #1970

Comments

@ericwenn
Copy link

ericwenn commented Apr 17, 2024

What are you trying to do?

After upgrading gonum to 0.15.0 calls to (VecDense).SolveVec(a,b) where a.Dims() == (10k+, 3) leads to OOM.
The stack where OOM happens:

runtime.systemstack_switch()
        /nix/store/8yw3g52r95h7cv09lcrran92n212997b-go-1.22.0/share/go/src/runtime/asm_amd64.s:474 +0x8 fp=0xc000087230 sp=0xc000087220 pc=0x4788a8
runtime.(*mheap).alloc(0x3667b52000?, 0x1b33da9?, 0x80?)
        /nix/store/8yw3g52r95h7cv09lcrran92n212997b-go-1.22.0/share/go/src/runtime/mheap.go:958 +0x5b fp=0xc000087278 sp=0xc000087230 pc=0x42fc1b
runtime.(*mcache).allocLarge(0x100?, 0x3667b50b88, 0x1?)
        /nix/store/8yw3g52r95h7cv09lcrran92n212997b-go-1.22.0/share/go/src/runtime/mcache.go:234 +0x85 fp=0xc0000872c0 sp=0xc000087278 pc=0x41ce45
runtime.mallocgc(0x3667b50b88, 0xd67320, 0x1)
        /nix/store/8yw3g52r95h7cv09lcrran92n212997b-go-1.22.0/share/go/src/runtime/malloc.go:1165 +0x597 fp=0xc000087348 sp=0xc0000872c0 pc=0x4141d7
runtime.makeslice(0xc0000124b0?, 0x18?, 0xc0000874b0?)
        /nix/store/8yw3g52r95h7cv09lcrran92n212997b-go-1.22.0/share/go/src/runtime/slice.go:107 +0x49 fp=0xc000087370 sp=0xc000087348 pc=0x459989
gonum.org/v1/gonum/mat.NewDense(...)
        /home/ericwenn/go/pkg/mod/gonum.org/v1/gonum@v0.15.1-0.20240413203616-1b7d9ca04ac9/mat/dense.go:60
gonum.org/v1/gonum/mat.(*QR).updateQ(0xc000087630)
        /home/ericwenn/go/pkg/mod/gonum.org/v1/gonum@v0.15.1-0.20240413203616-1b7d9ca04ac9/mat/qr.go:107 +0x78 fp=0xc0000874c0 sp=0xc000087370 pc=0xb5fd38
gonum.org/v1/gonum/mat.(*QR).factorize(0xc000087630, {0x1070598, 0xc0004c6540}, 0x49)
        /home/ericwenn/go/pkg/mod/gonum.org/v1/gonum@v0.15.1-0.20240413203616-1b7d9ca04ac9/mat/qr.go:101 +0x316 fp=0xc0000875e0 sp=0xc0000874c0 pc=0xb5fc36
gonum.org/v1/gonum/mat.(*QR).Factorize(...)
        /home/ericwenn/go/pkg/mod/gonum.org/v1/gonum@v0.15.1-0.20240413203616-1b7d9ca04ac9/mat/qr.go:82
gonum.org/v1/gonum/mat.(*Dense).Solve(0xc0004c6700, {0x1070598, 0xc0004c6540}, {0x1070598, 0xc0004c6740})
        /home/ericwenn/go/pkg/mod/gonum.org/v1/gonum@v0.15.1-0.20240413203616-1b7d9ca04ac9/mat/solve.go:65 +0x245 fp=0xc000087700 sp=0xc0000875e0 pc=0xb63545
gonum.org/v1/gonum/mat.(*VecDense).SolveVec(0xc000110360, {0x1070598, 0xc0004c6540}, {0x10735c0, 0xc0001103c0})
        /home/ericwenn/go/pkg/mod/gonum.org/v1/gonum@v0.15.1-0.20240413203616-1b7d9ca04ac9/mat/solve.go:118 +0x6fa fp=0xc000087930 sp=0xc000087700 pc=0xb63e9a

I've traced this down to changes in mat.QR.updateQ here.

What updateQ does is allocate a new Dense with dimensions MxM, which for our value on M would mean allocating 100M cell matrix. Based on my understanding of the code, the resulting matrix should have dimensions (MxN).

What version of Go and Gonum are you using?

$ go version
go version go1.22.0 linux/amd64

$ grep gonum.org/v1/gonum go.sum
gonum.org/v1/gonum v0.15.0 h1:2lYxjRbTYyxkJxlhC+LvJIx3SsANPdRybu1tGj9/OrQ=
gonum.org/v1/gonum v0.15.0/go.mod h1:xzZVBJBtS+Mz4q0Yl2LJTk+OxOg4jiXZ7qBoM0uISGo=

Does this issue reproduce with the current master?

Yes

@kortschak
Copy link
Member

kortschak commented Apr 17, 2024

Can you provide a minimal reproducer for testing and fix the link in the report please?

Note also from the documentation that Q is m×m.

@tvkn
Copy link

tvkn commented Apr 17, 2024

I believe the issue arises when one tries to solve very large systems.

The issue is introduced by this commit which explicitly computes and stores the Q matrix (instead of just the reflectors tau).

The Q matrix doesn't seem to be explicitly needed in the QR SolveVec, but the new update forces its computation, which in practice can limit the size of the linear systems the function can solve.

@ericwenn
Copy link
Author

ericwenn commented Apr 17, 2024

Minimal reproduction test:

func TestReproduction(t *testing.T) {
	const (
		M = 100_000
		N = 3
	)
	systemMat := mat.NewDense(M, N, make([]float64, M*N))
	measurementsVec := mat.NewVecDense(M, make([]float64, M))

	var results mat.VecDense
	err := results.SolveVec(systemMat, measurementsVec)

	fmt.Println(err)
}

On version 0.14.0 and below this correctly returns error matrix singular or near-singular with condition number +Inf, on version 0.15.0 it crashes due to OOM.

PS I've updated the link in original issue, thanks for pointing that out.

@kortschak
Copy link
Member

Thanks. Yeah, for situations where the Q is not needed, this is a problem. I think we can get the performance benefit of the single pre-calculation by lazily calculating, but storing the value so the work is only done once. I'll take a look in the weekend.

@ericwenn
Copy link
Author

Thanks. Yeah, for situations where the Q is not needed, this is a problem. I think we can get the performance benefit of the single pre-calculation by lazily calculating, but storing the value so the work is only done once. I'll take a look in the weekend.

That would be great, thanks for the quick response!

@kortschak kortschak linked a pull request Apr 17, 2024 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants