Wrong model architecture in residual network? #15

Elvenson · 2021-05-08T04:50:11Z

I wonder whether the model architecture is misconfigured, specifically in this function: https://github.com/gorgonia/agogo/blob/master/dualnet/ermahagerdmonards.go#L67
As I understand it, the paper (link), page 8/18, says:

Each residual block applies the following modules sequentially to its input:
(1) A convolution of 256 filters of kernel size 3 × 3 with stride 1
(2) Batch normalization
(3) A rectifier nonlinearity
(4) A convolution of 256 filters of kernel size 3 × 3 with stride 1
(5) Batch normalization
(6) A skip connection that adds the input to the block
(7) A rectifier nonlinearity

Point 6 means the add operation should take the block's input as one operand, and the modules should be applied in sequence. I wonder whether this is the correct implementation:

func (m *maebe) share(input *G.Node, filterCount, layer int) (*G.Node, batchNormOp, batchNormOp) {
	layer1, l1Op := m.res(input, filterCount, fmt.Sprintf("Layer1 of Shared Layer %d", layer))
	layer2, l2Op := m.res(layer1, filterCount, fmt.Sprintf("Layer2 of Shared Layer %d", layer))
	added := m.do(func() (*G.Node, error) { return G.Add(input, layer2) })
	retVal := m.rectify(added)
	return retVal, l1Op, l2Op
}


owulveryck · 2021-05-08T09:39:54Z

Thank you.
It looks good to me, but I am not an expert in this algorithm (still in the learning phase).
This would explain why I cannot get a “good player” in a reasonable time.

owulveryck · 2021-05-09T14:17:48Z

I tried to apply your suggestion (without further investigation), but it fails:

panic: 
Non Differentiable WRTs: 
    map[
       FilterLayer1 of Shared Layer 0 :: Tensor-4 float32:{} Aᵀ{0, 3, 1, 2}(%15)_γ :: Tensor-4 float32:{} Aᵀ{0, 3, 1, 2}(%15)_β :: Tensor-4 float32:{} 
       FilterLayer1 of Shared Layer 1 :: Tensor-4 float32:{} Aᵀ{0, 3, 1, 2}(%32)_γ :: Tensor-4 float32:{} Aᵀ{0, 3, 1, 2}(%32)_β :: Tensor-4 float32:{} 
       FilterLayer1 of Shared Layer 2 :: Tensor-4 float32:{} Aᵀ{0, 3, 1, 2}(%4f)_γ :: Tensor-4 float32:{} Aᵀ{0, 3, 1, 2}(%4f)_β :: Tensor-4 float32:{}]

Elvenson · 2021-05-09T15:06:38Z

Weird, it seems to work on my side 😓 Can I take a look at your modification?

owulveryck · 2021-05-11T12:19:58Z

I made this patch:

@@ -65,9 +65,9 @@ func (m *maebe) res(input *G.Node, filterCount int, name string) (*G.Node, batch
 }

 func (m *maebe) share(input *G.Node, filterCount, layer int) (*G.Node, batchNormOp, batchNormOp) {
-       layer1, l1Op := m.res(input, filterCount, fmt.Sprintf("Layer1 of Shared Layer %d", layer))
+       _, l1Op := m.res(input, filterCount, fmt.Sprintf("Layer1 of Shared Layer %d", layer))
        layer2, l2Op := m.res(input, filterCount, fmt.Sprintf("Layer2 of Shared Layer %d", layer))
-       added := m.do(func() (*G.Node, error) { return G.Add(layer1, layer2) })
+       added := m.do(func() (*G.Node, error) { return G.Add(input, layer2) })
        retVal := m.rectify(added)
        return retVal, l1Op, l2Op
 }

and the learning function is basically:

func learn() error {
	conf := agogo.Config{
		Name:            "Tic Tac Toe",
		NNConf:          dual.DefaultConf(3, 3, 10),
		MCTSConf:        mcts.DefaultConfig(3),
		UpdateThreshold: 0.52,
	}
	conf.NNConf.BatchSize = 100
	conf.NNConf.Features = 2 // write a better encoding of the board, and increase features (and that allows you to increase K as well)
	conf.NNConf.K = 3
	conf.NNConf.SharedLayers = 3
	conf.MCTSConf = mcts.Config{
		PUCT:           1.0,
		M:              3,
		N:              3,
		Timeout:        50 * time.Millisecond,
		PassPreference: mcts.DontPreferPass,
		Budget:         1000,
		DumbPass:       true,
		RandomCount:    0,
	}

	outEnc := NewEncoder()
	go func(h http.Handler) {
		mux := http.NewServeMux()
		mux.Handle("/ws", h)
		mux.Handle("/static/", http.StripPrefix("/static/", http.FileServer(http.Dir("./htdocs"))))

		log.Println("go to http://localhost:8080/static")
		http.ListenAndServe(":8080", mux)
	}(outEnc)

	conf.Encoder = encodeBoard
	conf.OutputEncoder = outEnc

	g := mnk.TicTacToe()
	a := agogo.New(g, conf)
	reader := bufio.NewReader(os.Stdin)
	fmt.Print("press enter when ready")
	reader.ReadString('\n')

	//a.Learn(5, 30, 200, 30) // 5 epochs, 30 episodes, 200 NN iters, 30 games.
	err := a.Learn(5, 50, 100, 100) // 5 epochs, 50 episode, 100 NN iters, 100 games.
	if err != nil {
		return err
	}
	err = a.Save("example.model")
	if err != nil {
		return err
	}
	return nil
}

Elvenson · 2021-05-12T10:01:13Z

I see, but I think it should look like this. Could you give it a try?

	layer1, l1Op := m.res(input, filterCount, fmt.Sprintf("Layer1 of Shared Layer %d", layer))
	layer2, l2Op := m.res(layer1, filterCount, fmt.Sprintf("Layer2 of Shared Layer %d", layer))
