Train fail: shuffle batch failed - matX: Not yet implemented: native matrix for colmajor or unpacked matrices #3

Closed
owulveryck opened this issue Jan 16, 2021 · 4 comments · Fixed by #7
Labels
bug Something isn't working

@owulveryck
Member

owulveryck commented Jan 16, 2021

I am trying to run the simple example of tic-tac-toe as is:

package agogo

import (
	"log"
	"time"

	dual "github.com/gorgonia/agogo/dualnet"
	"github.com/gorgonia/agogo/encoding/mjpeg"
	"github.com/gorgonia/agogo/game"
	"github.com/gorgonia/agogo/game/mnk"
	"github.com/gorgonia/agogo/mcts"

	_ "net/http/pprof"
)

func encodeBoard(a game.State) []float32 {
	board := EncodeTwoPlayerBoard(a.Board(), nil)
	for i := range board {
		if board[i] == 0 {
			board[i] = 0.001
		}
	}
	playerLayer := make([]float32, len(a.Board()))
	next := a.ToMove()
	if next == game.Player(game.Black) {
		for i := range playerLayer {
			playerLayer[i] = 1
		}
	} else if next == game.Player(game.White) {
		// vecf32.Scale(board, -1)
		for i := range playerLayer {
			playerLayer[i] = -1
		}
	}
	retVal := append(board, playerLayer...)
	return retVal
}

func ExampleAZ() {
	conf := Config{
		Name:            "Tic Tac Toe",
		NNConf:          dual.DefaultConf(3, 3, 10),
		MCTSConf:        mcts.DefaultConfig(3),
		UpdateThreshold: 0.52,
	}
	conf.NNConf.BatchSize = 100
	conf.NNConf.Features = 2 // write a better encoding of the board, and increase features (and that allows you to increase K as well)
	conf.NNConf.K = 3
	conf.NNConf.SharedLayers = 3
	conf.MCTSConf = mcts.Config{
		PUCT:           1.0,
		M:              3,
		N:              3,
		Timeout:        100 * time.Millisecond,
		PassPreference: mcts.DontPreferPass,
		Budget:         1000,
		DumbPass:       true,
		RandomCount:    0,
	}

	conf.Encoder = encodeBoard
	outEnc := mjpeg.NewEncoder(300, 300)
	conf.OutputEncoder = outEnc

	g := mnk.TicTacToe()
	a := New(g, conf)

	err := a.Learn(1, 1, 10, 1) // 1 epoch, 1 episode, 10 NN iters, 1 game.
	if err != nil {
		log.Fatal(err)
	}
	// output:
}

Running the test fails with this error.

❯ go test -run=^Example
2021/01/16 17:22:18 Self Play for epoch 0. Player A 0xc00043e070, Player B 0xc00043e2a0
2021/01/16 17:22:18 Using Dummy
2021/01/16 17:22:18 Set up selfplay: Switch To inference for A. A.NN 0xc0000c9380 (*dual.Dual)
2021/01/16 17:22:18 Set up selfplay: Switch To inference for B. B.NN 0xc0000c9450 (*dual.Dual)
2021/01/16 17:22:18     Episode 0
2021/01/16 17:22:19 Train fail: shuffle batch failed - matX: Not yet implemented: native matrix for colmajor or unpacked matrices
exit status 1
FAIL    github.com/gorgonia/agogo       1.229s

This error is triggered from:

agogo/dualnet/meta.go

Lines 71 to 73 in 05cf5f1

if matXs, err = native.MatrixF32(Xs); err != nil {
	return errors.Wrapf(err, "shuffle batch failed - matX")
}

It looks like the tensor library is faulty here.

I will investigate. In the meantime, any hints are welcome.

As a workaround, disabling the shuffleBatch method in dualnet makes the run succeed.
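
Rather than disabling the shuffle entirely, a guard that skips it for an empty batch might be enough. The following is only a sketch under assumptions: shuffleRows is a hypothetical stand-in that works on plain [][]float32 rows (the kind of value native.MatrixF32 returns on success), and it is not the actual shuffleBatch code from dualnet.

```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
)

// shuffleRows is a hypothetical stand-in for the shuffle step. It permutes
// the rows of xs and pis in lockstep, so example i stays paired with policy i.
// An empty batch is treated as a no-op instead of an error.
func shuffleRows(xs, pis [][]float32, seed int64) error {
	if len(xs) != len(pis) {
		return errors.New("shuffle batch failed - row count mismatch")
	}
	if len(xs) == 0 {
		return nil // nothing to shuffle
	}
	rng := rand.New(rand.NewSource(seed))
	rng.Shuffle(len(xs), func(i, j int) {
		xs[i], xs[j] = xs[j], xs[i]
		pis[i], pis[j] = pis[j], pis[i]
	})
	return nil
}

func main() {
	xs := [][]float32{{1}, {2}, {3}}
	pis := [][]float32{{10}, {20}, {30}}
	if err := shuffleRows(xs, pis, 1); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(xs, pis)

	// An empty batch no longer fails.
	fmt.Println(shuffleRows(nil, nil, 1))
}
```

This only hides the symptom in the shuffle step; whether an empty batch should reach training at all is a separate question.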

@owulveryck owulveryck added the bug Something isn't working label Jan 16, 2021
@owulveryck owulveryck self-assigned this Jan 16, 2021
@owulveryck
Member Author

owulveryck commented Jan 16, 2021

The origin of the error is here:

https://github.com/gorgonia/tensor/blob/d5ff158e8ba02c3e4ad3de4932959c4be1bfce94/dense.go#L630

Here are two debugging sessions:

dlv test
Type 'help' for list of commands.
(dlv) funcs test.Example
(dlv) break matXs meta.go:71
Breakpoint matXs set at 0x1d5fbc1 for github.com/gorgonia/agogo/dualnet.shuffleBatch() ./meta.go:71
(dlv) continue
> [matXs] github.com/gorgonia/agogo/dualnet.shuffleBatch() ./meta.go:71 (hits goroutine(20):1 total:1) (PC: 0x1d5fbc1)
    66:         }()
    67:         Xs.Reshape(as2D(Xs.Shape())...)
    68:         π.Reshape(as2D(π.Shape())...)
    69:
    70:         var matXs, matPis [][]float32
=>  71:         if matXs, err = native.MatrixF32(Xs); err != nil {
    72:                 return errors.Wrapf(err, "shuffle batch failed - matX")
    73:         }
    74:         if matPis, err = native.MatrixF32(π); err != nil {
    75:                 return errors.Wrapf(err, "shuffle batch failed - pi")
    76:         }
(dlv) print Xs
*gorgonia.org/tensor.Dense {
        AP: gorgonia.org/tensor.AP {
                shape: gorgonia.org/tensor.Shape len: 2, cap: 2, [32,2178],
                strides: []int len: 2, cap: 2, [2178,1],
                fin: true,
                o: 0,
                Δ: NotTriangle (0),},
        array: gorgonia.org/tensor.array {
                Header: (*"gorgonia.org/tensor/internal/storage.Header")(0xc00026db38),
                t: (*"gorgonia.org/tensor.Dtype")(0xc00026db50),},
        flag: 0,
        e: gorgonia.org/tensor.Engine(gorgonia.org/tensor.StdEng) {
                E: gorgonia.org/tensor/internal/execution.E {},},
        oe: gorgonia.org/tensor.standardEngine(gorgonia.org/tensor.StdEng) {
                E: gorgonia.org/tensor/internal/execution.E {},},
        old: gorgonia.org/tensor.AP {
                shape: gorgonia.org/tensor.Shape len: 0, cap: 0, nil,
                strides: []int len: 0, cap: 0, nil,
                fin: false,
                o: 0,
                Δ: NotTriangle (0),},
        transposeWith: []int len: 0, cap: 0, nil,
        viewOf: 0,
        mask: []bool len: 0, cap: 0, nil,
        maskIsSoft: false,}
(dlv)

dlv test
Type 'help' for list of commands.
(dlv) funcs test.Example
(dlv) break badMatrice /Users/olivierwulveryck/GOPROJECTS/pkg/mod/gorgonia.org/tensor@v0.9.18/dense.go:630
Breakpoint badMatrice set at 0x193a8d7 for gorgonia.org/tensor.(*Dense).RequiresIterator() /Users/olivierwulveryck/GOPROJECTS/pkg/mod/gorgonia.org/tensor@v0.9.18/dense.go:630
(dlv) continue
2021/01/16 17:46:46 Self Play for epoch 0. Player A 0xc00020a000, Player B 0xc00020a070
2021/01/16 17:46:46 Using Dummy
2021/01/16 17:46:46 Set up selfplay: Switch To inference for A. A.NN 0xc000177a00 (*dual.Dual)
2021/01/16 17:46:46 Set up selfplay: Switch To inference for B. B.NN 0xc000177ad0 (*dual.Dual)
2021/01/16 17:46:46     Episode 0
> [badMatrice] gorgonia.org/tensor.(*Dense).RequiresIterator() /Users/olivierwulveryck/GOPROJECTS/pkg/mod/gorgonia.org/tensor@v0.9.18/dense.go:630 (hits goroutine(1):1 total:1) (PC: 0x193a8d7)
   625: func (t *Dense) RequiresIterator() bool {
   626:         if t.len() == 1 {
   627:                 return false
   628:         }
   629:         // non continuous slice, transpose, or masked. If it's a slice and contiguous, then iterator is not required
=> 630:         if !t.o.IsContiguous() || !t.old.IsZero() || t.IsMasked() {
   631:                 return true
   632:         }
   633:         return false
   634: }
   635:
(dlv) print t
*gorgonia.org/tensor.Dense {
        AP: gorgonia.org/tensor.AP {
                shape: gorgonia.org/tensor.Shape len: 2, cap: 2, [0,18],
                strides: []int len: 2, cap: 2, [18,1],
                fin: true,
                o: 0,
                Δ: NotTriangle (0),},
        array: gorgonia.org/tensor.array {
                Header: (*"gorgonia.org/tensor/internal/storage.Header")(0xc0000bd038),
                t: (*"gorgonia.org/tensor.Dtype")(0xc0000bd050),},
        flag: 0,
        e: gorgonia.org/tensor.Engine(gorgonia.org/tensor.StdEng) {
                E: gorgonia.org/tensor/internal/execution.E {},},
        oe: gorgonia.org/tensor.standardEngine(gorgonia.org/tensor.StdEng) {
                E: gorgonia.org/tensor/internal/execution.E {},},
        old: gorgonia.org/tensor.AP {
                shape: gorgonia.org/tensor.Shape len: 0, cap: 0, nil,
                strides: []int len: 0, cap: 0, nil,
                fin: false,
                o: 0,
                Δ: NotTriangle (0),},
        transposeWith: []int len: 0, cap: 0, nil,
        viewOf: 0,
        mask: []bool len: 0, cap: 0, nil,
        maskIsSoft: false,}
(dlv) 

@owulveryck
Member Author

The error occurs when there are fewer examples than the batch size.
In that case the batches variable is zero, so a tensor with shape (0,x,y,z) is created, and that tensor triggers the error in the training phase.
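
To illustrate the arithmetic, here is a minimal sketch. It assumes the batch count comes from an integer division of the number of examples by the batch size; the thread only states that batches ends up as zero. numBatches and ErrNotEnoughExamples are hypothetical names, not part of agogo.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNotEnoughExamples is a hypothetical sentinel error for this sketch.
var ErrNotEnoughExamples = errors.New("fewer examples than the batch size")

// numBatches returns how many full batches fit into the examples.
// Integer division truncates, so examples < batchSize yields 0, which
// would otherwise lead to a tensor whose first dimension is 0.
func numBatches(examples, batchSize int) (int, error) {
	if batchSize <= 0 {
		return 0, errors.New("batch size must be positive")
	}
	batches := examples / batchSize
	if batches == 0 {
		return 0, ErrNotEnoughExamples
	}
	return batches, nil
}

func main() {
	// 100 matches conf.NNConf.BatchSize in the example above;
	// the example counts are made up for illustration.
	for _, n := range []int{9, 100, 250} {
		b, err := numBatches(n, 100)
		fmt.Println(n, b, err)
	}
}
```

Returning an explicit error (or skipping training for that round) makes the failure easier to read than the "colmajor or unpacked matrices" message.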

@chewxy
Member

chewxy commented Jan 19, 2021

why is it colmajor?

@owulveryck
Member Author

> why is it colmajor?

What do you mean? Where do you see that?

@carleeto carleeto mentioned this issue Feb 7, 2021
owulveryck added a commit that referenced this issue Feb 14, 2021