# 04. Faster Kibo Generation

Based on `mock5.py/gen-record` in https://github.com/lumiknit/mock5.py

## From TicTacToe to Omok

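Before diving into Omok (five-in-a-row) in earnest, there is one thing to check.

So far we have implemented a naive Q-learning with a DNN and a naive Monte Carlo Tree Search (MCTS).
Given the set of Omok board states $\mathcal{B}$ and
the set of game results $R=\{0, 0.5, 1\}$,
both try to approximate, in one form or another, the function
$$E: \mathcal{B} \to R$$
that takes a board and returns the result when both sides play optimally.
Q-learning does this by repeatedly updating $Q$ for each move
with the immediate reward plus the discounted future reward
from the eventual outcome,
while MCTS finishes a game and then uses its result
to update a win-rate function $w$ for the states
that appear in that game's record.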

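Because of this, if either method stores its function as a
finite-dimensional vector covering every possible case (a lookup table),
it only improves the approximation for states it has actually experienced.
(With a different model, updating one state can also change
the values of neighboring states,
so the story is a bit different.)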
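
To make the point about lookup tables concrete, here is a minimal sketch of the two update rules in tabular form. It is a simplified illustration, not the code from the earlier chapters: the names `Q`, `wins`, `visits`, `q_update`, `mc_update`, and `win_rate`, and the default `alpha` and `gamma` values, are all assumptions, and results are scored from a single player's perspective.

```python
from collections import defaultdict

# Hypothetical tabular stores, keyed by any hashable board state.
Q = defaultdict(float)      # Q[(state, action)] -> estimated value
wins = defaultdict(float)   # wins[state] -> sum of results seen through state
visits = defaultdict(int)   # visits[state] -> number of games through state

def q_update(state, action, reward, next_state, next_actions,
             alpha=0.1, gamma=0.9):
    """One Q-learning step: move Q(s, a) toward r + gamma * max Q(s', a')."""
    best_next = max((Q[(next_state, a)] for a in next_actions), default=0.0)
    target = reward + gamma * best_next
    Q[(state, action)] += alpha * (target - Q[(state, action)])

def mc_update(visited_states, result):
    """Monte Carlo style update: credit the final result to every visited state."""
    for s in visited_states:
        visits[s] += 1
        wins[s] += result

def win_rate(state):
    """Estimated w(state); falls back to 0.5 for states never visited."""
    return wins[state] / visits[state] if visits[state] else 0.5
```

In both cases only the entries for states that actually occurred in play are changed, which is why a table learns nothing about positions it has not seen.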

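If we wanted pure reinforcement learning with no hints,
the right approach would be to start from a random function
and keep playing against itself, but that takes
far too long.
So it seems necessary to tune the initial values
so that the model can at least use the tactics
that humans already know.
(That is what the analysis-based algorithm
in mock5.py is for.)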

The problem that arises here, however, is that
*generating game records takes a very long time*.


## Mock5.py

First, we bring in the minified Mock5 and the analysis-based agent.
(https://github.com/lumiknit/mock5.py)

In [1]:
_C=False
_B=True
_A=None
EMPTY=0
BLACK=1
WHITE=2
_STONE_CHAR=['.','O','X']
def _digit_to_int(s,offset=0):
	v=ord(s[offset])
	if 48<=v and v<48+10:return v-48
	elif 65<=v and v<65+26:return v-65+10
	elif 97<=v and v<97+26:return v-97+10
	else:return _A
def _int_to_digit(v):
	if 0<=v and v<10:return chr(48+v)
	elif 10<=v and v<10+26:return chr(65+v-10)
	else:return _A
class Mock5:
	def __init__(self,height=15,width=15,board=_A,history=_A):
		if type(height)is not int or type(width)is not int:raise TypeError
		if height<=0 or height>36 or width<=0 or width>36:raise Exception('Mock5 board size should be between 1 and 36!')
		self.height=height;self.width=width
		if board is not _A:
			if len(board)!=height*width:raise ValueError
			self.board=list(board)
		else:self.board=[0]*(self.height*self.width)
		self.player=1;self.history=[]
		if history is not _A:
			for idx in history:r,c=self._expand_index(idx);self.place_stone(r,c)
	def __str__(self):
		A=' {}';r='=====================================';r+="\n [ Turn {:3d} ; {}P's turn (stone = {}) ]".format(len(self.history),self.player,_STONE_CHAR[self.player]);r+='\n  |'
		for i in range(self.width):r+=A.format(_int_to_digit(i))
		r+='\n--+'
		for i in range(self.width):r+='--'
		for i in range(self.height):
			r+='\n{} |'.format(_int_to_digit(i))
			for j in range(self.width):r+=A.format(_STONE_CHAR[self.board[i*self.width+j]])
		return r
	def _reduce_index(self,r,c):return r*self.width+c
	def _expand_index(self,idx):return idx//self.width,idx%self.width
	def _check_key(self,key):
		if type(key)is not tuple or len(key)!=2:raise TypeError
		if type(key[0])is not int or type(key[1])is not int:raise TypeError
		if key[0]<0 or key[0]>=self.height:raise IndexError
		if key[1]<0 or key[1]>=self.width:raise IndexError
	def __getitem__(self,key):self._check_key(key);return self.board[self._reduce_index(key[0],key[1])]
	def __setitem__(self,key,value):
		self._check_key(key)
		if type(value)is not int or value<0 or value>2:raise TypeError
		self.board[self._reduce_index(key[0],key[1])]=value
	def duplicate(self):return self.__class__(self.height,self.width,board=self.board)
	def replay(self):return self.__class__(self.height,self.width,history=self.history)
	def rotate_ccw(self):
		def rotate_idx(idx):r,c=self._expand_index(idx);c=self.width-c-1;return c*self.height+r
		new_history=list(map(rotate_idx,self.history));return self.__class__(self.width,self.height,history=new_history)
	def flip_vertical(self):
		def flip_idx(idx):r,c=self._expand_index(idx);r=self.height-r-1;return r*self.width+c
		new_history=list(map(flip_idx,self.history));return self.__class__(self.height,self.width,history=new_history)
	class _IndexIter:
		def __init__(self,game,sr,sc,dr,dc):
			if dr==0 and dc==0:raise ValueError
			self.game=game;self.r=sr;self.c=sc;self.dr=dr;self.dc=dc
			if dr>=0:self.br=self.game.height
			else:self.br=-1
			if dc>=0:self.bc=self.game.width
			else:self.bc=-1
		def __iter__(self):return self
		def __next__(self):
			if self.r==self.br or self.c==self.bc:raise StopIteration
			else:ret=self.r,self.c;self.r+=self.dr;self.c+=self.dc;return ret
	def first_of_row(self,idx):
		if idx<0 or idx>=self.height:raise IndexError
		return idx,0
	def first_of_column(self,idx):
		if idx<0 or idx>=self.width:raise IndexError
		return 0,idx
	def first_of_right_down(self,idx):
		r,c=0,0
		if idx<0:raise IndexError
		elif idx<self.height:r=self.height-idx-1
		elif idx<self.width+self.height-1:c=idx-self.height+1
		else:raise IndexError
		return r,c
	def first_of_left_down(self,idx):
		r,c=0,self.width-1
		if idx<0:raise IndexError
		elif idx<self.width:c=idx
		elif idx<self.width+self.height-1:r=idx-self.width+1
		else:raise IndexError
		return r,c
	def iter_row(self,idx):r,c=self.first_of_row(idx);return self._IndexIter(self,r,c,0,1)
	def iter_column(self,idx):r,c=self.first_of_column(idx);return self._IndexIter(self,r,c,1,0)
	def iter_right_down(self,idx):r,c=self.first_of_right_down(idx);return self._IndexIter(self,r,c,1,1)
	def iter_left_down(self,idx):r,c=self.first_of_left_down(idx);return self._IndexIter(self,r,c,1,-1)
	def slice_row(self,idx):return[self[(r,c)]for(r,c)in self.iter_row(idx)]
	def slice_column(self,idx):return[self[(r,c)]for(r,c)in self.iter_column(idx)]
	def slice_right_down(self,idx):return[self[(r,c)]for(r,c)in self.iter_right_down(idx)]
	def slice_left_down(self,idx):return[self[(r,c)]for(r,c)in self.iter_left_down(idx)]
	def can_place_at(self,r,c,player=_A):
		if type(r)is not int or type(c)is not int:raise TypeError
		if r<0 or r>=self.height or c<0 or c>=self.width:raise IndexError
		if player==_A:player=self.player
		is_empty=self[(r,c)]==0;return is_empty
	def place_stone(self,r,c,player=_A):
		if not self.can_place_at(r,c,player):return _C
		if player==_A:player=self.player;self.player=3-self.player;self.history.append(self._reduce_index(r,c))
		self[(r,c)]=player;return _B
	def place_stone_at_index(self,idx,player=_A):r,c=self._expand_index(idx);return self.place_stone(r,c,player)
	def history_depth(self):return len(self.history)
	def undo(self):
		if len(self.history)>0:
			idx=self.history.pop()
			if self.board[idx]!=3-self.player:raise Exception('Board is corrupted!')
			self.board[idx]=0;self.player=3-self.player
		return len(self.history)
	def _scan_with_iter(self,iter):
		cnt=1;p=self[iter.__next__()]
		for (r,c) in iter:
			cnt=cnt+1 if self[(r,c)]==p else 1;p=self[(r,c)]
			if cnt>=5 and p>0:return p
		return _A
	def check_win(self):
		for x in range(self.height):
			v=self._scan_with_iter(self.iter_row(x))
			if v is not _A:return v
		for x in range(self.width):
			v=self._scan_with_iter(self.iter_column(x))
			if v is not _A:return v
		for x in range(self.height+self.width-1):
			v=self._scan_with_iter(self.iter_right_down(x))
			if v is not _A:return v
			v=self._scan_with_iter(self.iter_left_down(x))
			if v is not _A:return v
		if len(self.history)>=self.width*self.height:return 0
		return _A
	def play(self,input1=_A,input2=_A,random_first=_B,print_intermediate_state=_B,print_messages=_B):
		def user_input(game):
			while _B:
				v=input("row-col (e.g. 3a, 77) ; 'gg' ; 'undo' > ").strip()
				try:
					if v=='gg':return _C,0
					elif v=='undo':return _C,1
					v=[y for x in map(list,v.split())for y in x]
					if len(v)!=2:raise Exception()
					r=_digit_to_int(v[0]);c=_digit_to_int(v[1])
					if r is _A or c is _A:raise Exception()
					if not self.can_place_at(r,c):raise IndexError()
					return r,c
				except IndexError:print('Cannot place stone at {}, {}!'.format(v[0],v[1]))
				except Exception:print('Wrong input!')
				finally:0
		user_input.name='user';import random;exchanged=_C
		if random_first and random.random()<0.5:exchanged=_B;input1,input2=input2,input1
		if input1 is _A:input1=user_input
		if input2 is _A:input2=user_input
		pif=[_A,input1,input2]
		def player_name(idx):
			A='{}p ({})'
			if hasattr(pif[idx],'name'):return A.format(idx,pif[idx].name)
			else:return A.format(idx,pif[idx])
		winner=_A
		while _B:
			if print_intermediate_state:print(str(self))
			ret=pif[self.player](self)
			if ret is _A:r,c=_C,0
			else:r,c=ret
			if r is _A or r is _C:
				if c==1:self.undo()
				elif c==0:
					if print_messages:print('{} give up!'.format(player_name(self.player)))
					winner=3-self.player;break
			elif not self.place_stone(r,c):
				if print_messages:print('{} cheats! (try to place stone at {}, {})'.format(player_name(self.player),r,c))
				winner=3-self.player;break
			winner=self.check_win()
			if winner!=_A:
				if print_messages:
					print(str(self))
					if winner==0:print('Draw!')
					else:print('{} win!'.format(player_name(winner)))
				break
		if winner is not _A and winner>0 and exchanged:winner=3-winner
		return winner
	def _map_for_player(self,player=_A):
		if player==_A:player=self.player
		return[0,player,3-player]
	def board_for(self,player=_A):m=self._map_for_player(player);return[m[x]for x in self.board]
	def one_hot_encoding(self,player=_A):
		m=self._map_for_player(player);a=[[0]*(self.height*self.width)for _ in range(3)]
		for i in range(self.height*self.width):a[m[self.board[i]]][i]=1
		return a
	def numpy(self,player=_A,one_hot_encoding=_B,rank=_A,dtype=_A):
		import numpy as np
		if dtype is _A:dtype=float
		if one_hot_encoding:
			a=self.one_hot_encoding(player=player);n=np.array(a,dtype=dtype)
			if rank==1:n=n.reshape(-1)
			elif rank==2:0
			else:n=n.reshape(3,self.height,self.width)
			return n
		else:
			a=self.board_for(player=player);n=np.array(a)
			if rank==2:n=n.reshape(self.height,self.width)
			return n
	def tensor(self,player=_A,one_hot_encoding=_B,rank=_A,dtype=_A):
		import torch
		if dtype is _A:dtype=torch.float
		if one_hot_encoding:
			a=self.one_hot_encoding(player=player);n=torch.tensor(a,dtype=dtype)
			if rank==1:n=n.view(-1)
			elif rank==2:0
			else:n=n.view(3,self.height,self.width)
			return n
		else:
			a=self.board_for(player=player);n=torch.tensor(a)
			if rank==2:n=n.view(self.height,self.width)
			return n
	def empty_array(self,empty=_B,non_empty=_C):m=[empty,non_empty,non_empty];return[m[self.board[idx]]for idx in range(self.height*self.width)]
	def empty_numpy(self,rank=1,empty=1.0,non_empty=0.0,dtype=_A):
		import numpy as np
		if dtype is _A:dtype=type(empty)
		em=self.empty_array(empty,non_empty);arr=np.array(em,dtype=dtype)
		if rank==2:return arr.reshape(self.height,self.width)
		else:return arr
	def empty_tensor(self,rank=1,empty=_B,non_empty=_C,dtype=_A):
		import torch
		if dtype is _A:dtype=type(empty)
		em=self.empty_array(empty,non_empty);tensor=torch.tensor(em,dtype=dtype)
		if rank==2:return tensor.view(self.height,self.width)
		return tensor

In [2]:
def agent_random(game):
  import random
  idx = random.randrange(game.height * game.width)
  for off in range(game.height * game.width):
    rdx = (idx + off) % (game.height * game.width)
    r, c = game._expand_index(rdx)
    if game.can_place_at(r, c):
      return (r, c)

In [3]:
def _sign(x):
  if x > 0: return 1
  elif x < 0: return -1
  else: return 0

N_OVER_5 = 7
N_5 = 6
N_OPEN_4 = 5
N_OPEN_3 = 4
N_4 = 3
N_3 = 2
N_2 = 1

B_OVER_5 = 0b1000000
B_5 = 0b100000
B_OPEN_4 = 0b10000
B_OPEN_3 = 0b1000
B_4 = 0b100
B_3 = 0b10
B_2 = 0b1

def B_str(bm):
  if bm & B_OVER_5: return ">5"
  elif bm & B_5: return "=5"
  elif bm & B_OPEN_4: return "+4"
  elif bm & B_4: return "4"
  elif bm & B_OPEN_3: return "+3"
  elif bm & B_3: return "3"
  elif bm & B_2: return "2"
  return "."

class Analysis:
  """ Omok Analyzer

  It takes a Mock5 game board and analyzes the state:
  - (TODO) (semi-)open 4 connections
  - (TODO) 3-3, 3-4, 4-4 after one move
  """
  def __init__(self, game):
    self.game = game
    self.sz = game.height * game.width
    self.result = [None,]
    for i in range(2):
      a = [[0] * self.sz for i in range(4)]
      self.result.append(a)
    self.run_analysis()

  def print_result(self, c, dir):
    r = "====================================="
    r += "\n Analysis, Color {}, Dir {}".format(c, dir)
    r += "\n   |"
    for i in range(self.game.width): r += " {:2d}".format(i)
    r += "\n--+"
    for i in range(self.game.width): r += "--"
    for i in range(self.game.height):
      r += "\n{:2d} |".format(i)
      for j in range(self.game.width):
        if self.game[i, j] > 0:
          r += " {:>2}".format(
              'o' if self.game[i, j] == c else 'X')
        else:
          r += " {:>2}".format(
              B_str(self.result[c][dir][i * self.game.width + j]))
    print(r)

  def is_marked(self, color, dir, idx, bm):
    return 0 != (self.result[color][dir][idx] & bm)

  def mark(self, color, dir, idx, bm):
    self.result[color][dir][idx] |= bm

  class _Window:
    def __init__(self, an, dir):
      self.board = [3] * 7
      self.dir = dir
      self.idx = [None] * 7
      self.n_color = [0, 0, 0, 5]
      self.p = 0
      self.an = an

    def color_at(self, off):
      return self.board[(self.p + off) % 7]

    def push(self, color=3, idx=None):
      self.idx[self.p] = idx
      self.board[self.p] = color
      self.p = (self.p + 1) % 7
      self.n_color[self.color_at(0)] -= 1
      self.n_color[self.color_at(5)] += 1

    def mark(self, c, off, bm):
      return self.an.mark(c, self.dir, self.idx[(self.p + off) % 7], bm)

    def _check_5(self, c):
      # Find empty idx
      j = 1
      while self.color_at(j) != 0: j += 1
      # If one of boundary has same color,
      # it is >=6-connect
      if self.color_at(0) == c or self.color_at(6) == c:
        self.mark(c, j, B_OVER_5)
      else:
        self.mark(c, j, B_5)

    def _check_4(self, c):
      for i in range(1, 6): self.mark(c, i, B_4)
      if self.color_at(1) == 0 and self.color_at(6) == 0:
        for i in range(2, 6):
          self.mark(c, i, B_OPEN_4)
      if self.color_at(0) == 0 and self.color_at(5) == 0:
        for i in range(1, 5):
          self.mark(c, i, B_OPEN_4)

    def _check_3(self, c):
      for i in range(1, 6): self.mark(c, i, B_3)
      if self.color_at(1) != 0 or self.color_at(5) != 0: return
      if self.color_at(2) == 0:
        if self.color_at(0) == 0:
          self.mark(c, 1, B_OPEN_3)
          self.mark(c, 2, B_OPEN_3)
        elif self.color_at(6) == 0:
          self.mark(c, 2, B_OPEN_3)
      elif self.color_at(4) == 0:
        if self.color_at(6) == 0:
          self.mark(c, 4, B_OPEN_3)
          self.mark(c, 5, B_OPEN_3)
        elif self.color_at(0) == 0:
          self.mark(c, 4, B_OPEN_3)
      else:
        if self.color_at(0) == 0:
          self.mark(c, 1, B_OPEN_3)
          self.mark(c, 3, B_OPEN_3)
        elif self.color_at(6) == 0:
          self.mark(c, 3, B_OPEN_3)
          self.mark(c, 5, B_OPEN_3)

    def _check_2(self, c):
      j = 1
      while self.color_at(j) == 0: j += 1
      for i in range(max(1, j - 2), min(j + 2, 5) + 1):
        self.mark(c, i, B_2)

    def check_connection(self):
      if self.n_color[3] == 0:
        c = 0
        if self.n_color[1] == 0: c = 2
        elif self.n_color[2] == 0: c = 1
        if c > 0:
          if self.n_color[c] == 4: self._check_5(c)
          elif self.n_color[c] == 3: self._check_4(c)
          elif self.n_color[c] == 2: self._check_3(c)
          elif self.n_color[c] == 1: self._check_2(c)

    def push_and_check(self, color=3, idx=None):
      self.push(color, idx)
      self.check_connection()

  def fill_result(self):
    # Row
    for i in range(self.game.height):
      w = self._Window(self, 0)
      for (r, c) in self.game.iter_row(i):
        w.push_and_check(self.game[r, c], self.game._reduce_index(r, c))
      w.push_and_check()
    # Col
    for i in range(self.game.width):
      w = self._Window(self, 1)
      for (r, c) in self.game.iter_column(i):
        w.push_and_check(self.game[r, c], self.game._reduce_index(r, c))
      w.push_and_check()
    # Diagonal
    for i in range(self.game.width + self.game.height - 1):
      w = self._Window(self, 2)
      for (r, c) in self.game.iter_right_down(i):
        w.push_and_check(self.game[r, c], self.game._reduce_index(r, c))
      w.push_and_check()
      w = self._Window(self, 3)
      for (r, c) in self.game.iter_left_down(i):
        w.push_and_check(self.game[r, c], self.game._reduce_index(r, c))
      w.push_and_check()

  def get_critical_at(self, color, dir, idx):
    bm = self.result[color][dir][idx]
    if bm & B_OVER_5: return N_OVER_5
    elif bm & B_5: return N_5
    elif bm & B_OPEN_4: return N_OPEN_4
    elif bm & B_4: return N_4
    elif bm & B_OPEN_3: return N_OPEN_3
    elif bm & B_3: return N_3
    elif bm & B_2: return N_2
    return 0

  def run_analysis(self):
    self.fill_result()

In [4]:
def agent_analysis_based(game):
  import math
  import random

  a = Analysis(game)

  my = game.player
  op = 3 - game.player

  max_s = -float('inf')
  max_i = None

  score = [0] * (game.height * game.width)
  for i in range(game.height * game.width):
    r, c = game._expand_index(i)
    if game[r, c] != 0: continue
    dr = r - game.height / 2
    dc = c - game.width / 2
    # Center is preferred
    score[i] -= math.sqrt((dr * dr) + (dc * dc))
    # Make some noise for random choice
    score[i] += random.random()
    for dir in range(4):
      m = a.get_critical_at(my, dir, i)
      o = a.get_critical_at(op, dir, i)
      score[i] += (10 ** m) + (10 ** o) * 0.7
    if score[i] > max_s:
      max_s = score[i]
      max_i = i
  if max_i is None: return None
  return game._expand_index(max_i)

Now let's check how long it takes for the computer to play against itself.

In [8]:
%%time
for i in range(10):
  Mock5(11, 11).play(agent_analysis_based, agent_analysis_based, 
                    print_intermediate_state=False, print_messages=False)

CPU times: user 3.79 s, sys: 9.94 ms, total: 3.8 s
Wall time: 3.81 s


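This varies a bit by environment, but on Google Colab
ten games take about 4 seconds.
These two agents usually finish a game in somewhere between
20 and 70 moves, so ten games yield
roughly 500 moves at best. In other words, it costs about
1 second to get around 100 samples.
With a GPU, on the other hand, feeding in tens of thousands of samples
and running backpropagation takes only a few seconds,
yet we would have to wait several minutes just to generate that many samples.

So this bottleneck needs to be removed.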

## Getting Help from Multithreading

`mock5.py/gen-record` https://github.com/lumiknit/mock5.py/tree/main/gen-record

There is nothing particularly fancy here:
a C++ program generates the data in parallel with OpenMP.

In [39]:
%%file gen.cpp
/* gen.cpp
 * Omok record generator based on algorithmic moves for ML
 * Author: lumiknit (aasr4r4@gmail.com)
 *
 * The algorithm is based on mock5/analysis and mock5/agent_analysis_based.
 *
 * Compilation: YOU MUST ADD OPTION for OpenMP!
 * e.g. g++ gen.cpp -o gen -fopenmp
 * Usage: ./gen <OUT_FILE_NAME> <NUM_GAME> <HEIGHT> <WIDTH> <RANDOMNESS>
 *  - randomness is a float value in 0.0-1.0.
 *    larger value means agent choose more random moves
 *    0.0 means there is no random move,
 *    1.0 means every move is quite random
 *    Random moves may look like mistakes.
 *    In my experience, 0.05-0.15 is best for 11x11
 * Output Format:
 *  Each case begins with two integers '<WINNER> <NUM_MOVES>' separated by
 *  a space. WINNER is 1 or 2. Following the first line, NUM_MOVES integers
 *  are given. Moves are separated by newlines. Each move is a single integer,
 *  'y * W + x'. e.g. On a board of width 11 and height 11,
 *  move 50 = 4 * 11 + 6 means the current player places a stone at
 *  (y, x) = (4, 6). Coordinates are zero-based.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#include <omp.h>

int M = 1, W = 15, H = 15;
float rnd = 0.01;

FILE *f;

#define V_5 0x400
#define V_O4 0x100
#define V_O3 0x40
#define V_4 0x10
#define V_3 0x4
#define V_2 0x1

float randf() {
  return (float) rand() / (float) RAND_MAX;
}

float dist(int idx) {
  float y = (idx / W) - (H - 1) / 2;
  y /= (float) H;
  if(y < 0) y = -y;
  float x = (idx % W) - (W - 1) / 2;
  x /= (float) W;
  if(x < 0) x = -x;
  return x > y ? x : y;
}

typedef struct State {
  char *bd;
  int *his;
  int *val[2][5];
  int n;

  int result;

  char q[7];
  char qp;
  int idx[7];
  int n_c[4];
  int dir;
} State;

int getNext(State *S) {
#define ON 8
  float ov[ON];
  int oi[ON];
  for(int i = 0; i < ON; i++) {
    ov[i] = -1000.0f;
    oi[i] = -1;
  }
  int op = 0;
  int fl = 0;
  int off = rand() % (H * W);
  for(int k = 0; k < H * W; k++) {
    int i = (off + k) % (H * W);
    if(S->bd[i] == 0) {
      float score = 0;
      if(S->n <= 2) score -= dist(i) * 10;
      int m = S->val[S->n % 2][0][i];
      int o = S->val[(S->n + 1) % 2][0][i];
      score += m + o * 0.7f;
      score += randf() * 0.5f;
      if(score > ov[op]) {
        oi[op] = i;
        ov[op] = score;
        if(score >= V_O3) fl = 1;
        op = (op + 1) % ON;
      }
    }
  }
  if(randf() < rnd / 2) {
    int x = rand() % ON;
    while(oi[x] < 0) x = (x + 1) % ON;
    return oi[x];
  } else {
    if(fl || randf() >= rnd) {
      int mi = -1;
      float mv = -100.0f;
      for(int i = 0; i < ON; i++) {
        if(oi[i] >= 0 && mv < ov[i]) {
          mi = oi[i];
          mv = ov[i];
        }
      }
      return mi;
    } else {
      int x = rand() % ON;
      while(oi[x] < 0) x = (x + 1) % ON;
      return oi[x];
    }
  }
}

void resetQ(State *S) {
  for(int i = 0; i < 7; i++) {
    S->q[i] = 3;
  }
  S->qp = 0;
  S->n_c[0] = 0, S->n_c[1] = 0, S->n_c[2] = 0, S->n_c[3] = 5;
}

void mark(State *S, int idx, char c, int val) {
  int i = S->idx[(S->qp + idx) % 7];
  if(0 == (S->val[c - 1][S->dir][i] & val)) {
    S->val[c - 1][S->dir][i] |= val;
    S->val[c - 1][0][i] += val;
  }
}

#define Q_AT(X) (S->q[((X) + S->qp + 7) % 7])
void check5(State *S, char c) {
  for(int i = 1; i <= 5; i++)
    mark(S, i, c, V_5);
}

void check4(State *S, char c) {
  for(int i = 1; i <= 5; i++)
    mark(S, i, c, V_4);
  if(Q_AT(0) == 0 && Q_AT(5) == 0)
    for(int i = 1; i <= 4; i++)
      mark(S, i, c, V_O4);
  if(Q_AT(1) == 0 && Q_AT(6) == 0)
    for(int i = 2; i <= 5; i++)
      mark(S, i, c, V_O4);
}

void check3(State *S, char c) {
  for(int i = 1; i <= 5; i++)
    mark(S, i, c, V_3);
  if(Q_AT(1) != 0 || Q_AT(5) != 0) return;
  if(Q_AT(2) == 0) {
    if(Q_AT(0) == 0) {
      mark(S, 1, c, V_O3);
      mark(S, 2, c, V_O3);
    } else if(Q_AT(6) == 0) {
      mark(S, 2, c, V_O3);
    }
  } else if(Q_AT(4) == 0) {
    if(Q_AT(6) == 0) {
      mark(S, 4, c, V_O3);
      mark(S, 5, c, V_O3);
    } else if(Q_AT(0) == 0) {
      mark(S, 4, c, V_O3);
    }
  } else {
    if(Q_AT(0) == 0) {
      mark(S, 1, c, V_O3);
      mark(S, 3, c, V_O3);
    }
    if(Q_AT(6) == 0) {
      mark(S, 3, c, V_O3);
      mark(S, 5, c, V_O3);
    }
  }
}

void check2(State *S, char c) {
  for(int i = 1; i <= 5; i++) {
    mark(S, i, c, V_2);
  }
}

void pushAndCheck(State *S, int idx, char v) {
  int p = S->qp;
  S->qp = (S->qp + 1) % 7;
  S->q[p] = v;
  S->idx[p] = idx;
  S->n_c[S->q[(p + 6) % 7]]++;
  S->n_c[S->q[(p + 1) % 7]]--;

  /*
  printf("[");
  for(int i = 0; i < 7; i++) {
    printf(" %d", S->q[(i + S->qp) % 7]);
  }
  printf("] (");
  for(int i = 0; i < 4; i++) {
    printf(" %d", S->n_c[i]);
  }
  printf(")\n");
  */

  if(S->n_c[3] == 0) {
    char c = 0;
    if(S->n_c[2] == 0) c = 1;
    else if(S->n_c[1] == 0) c = 2;
    if(c > 0) {
      switch(S->n_c[c]) {
      case 5: S->result = c; /* fall through: also mark the cells */
      case 4: check5(S, c); break;
      case 3: check4(S, c); break;
      case 2: check3(S, c); break;
      case 1: check2(S, c); break;
      }
    }
  }
}

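void removeVal(State *S, int idx) {
  S->val[0][S->dir][idx] = 0;
  S->val[1][S->dir][idx] = 0;
  // Rebuild the per-cell totals (slot 0) from the four directional slots.
  S->val[0][0][idx] = 0;
  S->val[1][0][idx] = 0;
  for(int i = 1; i <= 4; i++) {
    S->val[0][0][idx] += S->val[0][i][idx];
    S->val[1][0][idx] += S->val[1][i][idx];
  }
}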

int placeStone(State *S, int idx) {
  int y = idx / W;
  int x = idx % W;
  S->bd[idx] = 1 + (S->n % 2);
  S->n++;

  S->result = 0;
  // Hori
  S->dir = 1;
  for(int i = 0; i < W; i++)
    removeVal(S, y * W + i);
  resetQ(S);
  for(int i = 0; i < W; i++)
    pushAndCheck(S, y * W + i, S->bd[y * W + i]);
  pushAndCheck(S, -1, 3);
  // Vert
  S->dir = 2;
  for(int i = 0; i < H; i++)
    removeVal(S, i * W + x);
  resetQ(S);
  for(int i = 0; i < H; i++)
    pushAndCheck(S, i * W + x, S->bd[i * W + x]);
  pushAndCheck(S, -1, 3);
  // RD-Diag
  S->dir = 3;
  int rd = x < y ? x : y;
  int rdx = x - rd;
  int rdy = y - rd;
  for(int i = 0; rdx + i < W && rdy + i < H; i++)
    removeVal(S, (rdy + i) * W + (rdx + i));
  resetQ(S);
  for(int i = 0; rdx + i < W && rdy + i < H; i++)
    pushAndCheck(S, (rdy + i) * W + (rdx + i),
        S->bd[(rdy + i) * W + (rdx + i)]);
  pushAndCheck(S, -1, 3);
  // LD-Diag
  S->dir = 4;
  int ld = (W - x - 1) < y ? (W - x - 1) : y;
  int ldx = x + ld;
  int ldy = y - ld;
  for(int i = 0; ldx - i >= 0 && ldy + i < H; i++)
    removeVal(S, (ldy + i) * W + (ldx - i));
  resetQ(S);
  for(int i = 0; ldx - i >= 0 && ldy + i < H; i++)
    pushAndCheck(S, (ldy + i) * W + (ldx - i),
        S->bd[(ldy + i) * W + (ldx - i)]);
  pushAndCheck(S, -1, 3);
  return S->result;
}

void runGame() {
  printf("Thread %d Start\n", omp_get_thread_num());

  State S;
  S.bd = (char*) malloc(sizeof(char) * W * H);
  S.his = (int*) malloc(sizeof(int) * W * H);
  for(int i = 0; i < 2; i++)
    for(int j = 0; j < 5; j++) {
      S.val[i][j] = (int*) malloc(sizeof(int) * W * H);
    }

  int finished = 0;

  while(1) {
    memset(S.bd, 0x00, sizeof(char) * W * H);
    for(int i = 0; i < 2; i++) {
      for(int j = 0; j < 5; j++) {
        memset(S.val[i][j], 0x00, sizeof(int) * W * H);
      }
    }
    S.n = 0;

    //printState(&S);
    while(S.n < W * H) {
      int idx = getNext(&S);
      S.his[S.n] = idx;
      finished = placeStone(&S, idx);
    //printState(&S);

      if(finished) {
#pragma omp critical(WRITE_FILE)
        {
          fprintf(f, "%d %d\n", S.result, S.n);
          for(int i = 0; i < S.n; i++) {
            fprintf(f, "%d\n", S.his[i]);
          }
        }
        goto L_DONE;
      } 
    }
  }
L_DONE:
  free(S.bd);
  free(S.his);
  for(int i = 0; i < 2; i++)
    for(int j = 0; j < 5; j++)
      free(S.val[i][j]);

  printf("Thread %d Finished\n", omp_get_thread_num());
}

int main(int argc, char **argv) {
  srand(time(NULL));
  printf("ARGC = %d\n", argc);
  if(argc < 6) {
    fprintf(stderr,
        "Usage: %s <OUT_NAME> <MAX> <HEIGHT> <WIDTH> <RND>\n",
        argv[0]);
    return 1;
  }
  M = atoi(argv[2]);
  H = atoi(argv[3]);
  W = atoi(argv[4]);
  rnd = atof(argv[5]);
  printf("OUT=%s MAX=%d H=%d W=%d RND=%f\n", argv[1], M, H, W, rnd);

  f = fopen(argv[1], "w");
  if(f == NULL) {
    fprintf(stderr, "Cannot open file %s\n", argv[1]);
    return 1;
  }

#pragma omp parallel for num_threads(32)
  for(int i = 0; i < M; i++) {
    runGame();
  }

  fclose(f);
  return 0;
}

Overwriting gen.cpp


In [40]:
!g++ -O2 -fopenmp -o gen gen.cpp

Running the program as shown below plays
40000 games with the analysis-based agent and
writes them to `example.out`.

In [44]:
%%time
!./gen example.out 40000 11 11 0.1 > /dev/null

CPU times: user 61.9 ms, sys: 28.9 ms, total: 90.8 ms
Wall time: 9.27 s


Checking the output file as below shows that,
(on Google Colab) in about 9.3 seconds,
it produced a 3.8 MB file containing about 1.2M board positions.

In [46]:
!ls -lh example.out
!wc -l example.out

-rw-r--r-- 1 root root 3.8M Apr 26 06:08 example.out
1264206 example.out


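The file is structured as follows.

- The first line of each game contains two integers `n` and `m`:
`n` is the player who won, and
`m` is the total number of moves played.
- The following `m` lines list the stone positions in the order they were played.
Each position is encoded as `y * W + x`; for example, a stone placed at $(y, x) = (2, 7)$ on an $11 \times 11$ board
is printed as `2 * 11 + 7 = 29`.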
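
The format above can be decoded without torch. The following is a minimal sketch, assuming well-formed input; `parse_records` and the sample text are hypothetical and are not part of mock5.py.

```python
def parse_records(text, width):
    """Parse gen-record style text into a list of (winner, [(y, x), ...]).

    Every token is an integer, so the text can be consumed as one stream:
    a '<WINNER> <NUM_MOVES>' header followed by NUM_MOVES move indices.
    """
    tokens = iter(map(int, text.split()))
    games = []
    for winner in tokens:
        n_moves = next(tokens)
        # divmod(idx, width) yields (idx // width, idx % width) == (y, x).
        moves = [divmod(next(tokens), width) for _ in range(n_moves)]
        games.append((winner, moves))
    return games

# Made-up sample: two tiny "games" on an 11x11 board, for illustration only.
sample = "1 3\n60\n61\n29\n2 2\n0\n120\n"
games = parse_records(sample, width=11)
print(games)
```

Here index `29` decodes back to `(2, 7)`, matching the example in the list above.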

In [47]:
!head -n 10 example.out

2 24
60
61
49
71
27
51
81
41
31


The following function converts this file into `torch.tensor`s.

`mock5.py/gen-record/read_record.py`

In [48]:
import torch

def read_record_from_file(h, w, filename):
  """Record (generated by gen.cpp) reader for torch

  Returns:
    X (torch.tensor(<NUM_SAMPLE>, 3, h, w)): one-hot-encoding version tensor
    Y (torch.tensor(<NUM_SAMPLE>, 1)): label. 1 = good, -1 = bad
  """
  X = []
  Y = []
  with open(filename) as file:
    left = 0
    v = 0
    for line in file:
      if left <= 0:
        z = [torch.ones(h, w, dtype=torch.float),
             torch.zeros(h, w, dtype=torch.float),
             torch.zeros(h, w, dtype=torch.float)]
        bd = torch.stack(z)
        v, left = map(int, line.split())
      else:
        idx = int(line)
        y = idx // w
        x = idx % w
        bd[0][y][x] = 0
        bd[v][y][x] = 1
        X.append(bd.clone())
        Y.append(torch.tensor([3 - v * 2], dtype=torch.float))
        v = 3 - v
        left -= 1
  Xs = torch.stack(X)
  Ys = torch.stack(Y)
  return Xs, Ys
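
The channel and label bookkeeping in `read_record_from_file` is easy to misread, so here is a small pure-Python sketch that mirrors only that part. `encode_moves` is a hypothetical helper, not part of mock5.py, and it assumes player 1 always moves first, as in `gen.cpp`.

```python
def encode_moves(winner, n_moves):
    """Mirror the channel/label bookkeeping of read_record_from_file.

    Returns one (channel, label) pair per move, in playing order.
    """
    out = []
    v = winner                      # same starting value as in the reader
    for _ in range(n_moves):
        out.append((v, 3 - v * 2))  # channel index, label (+1 or -1)
        v = 3 - v                   # alternate between 1 and 2
    return out

print(encode_moves(1, 3))  # player 1 won
print(encode_moves(2, 3))  # player 2 won
```

Under these assumptions the winner's stones always land in channel 1 with label +1 and the loser's stones in channel 2 with label -1, regardless of which player won.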

In [49]:
%%time
X, Y = read_record_from_file(11, 11, "example.out")

CPU times: user 33 s, sys: 15.9 s, total: 48.9 s
Wall time: 48.8 s


In [50]:
print(X.shape)
print(Y.shape)

torch.Size([1224206, 3, 11, 11])
torch.Size([1224206, 1])


On Google Colab, reading the 1.2M boards took
about 49 seconds.
In other words, in the time the Python version takes to generate roughly 6k samples,
we can generate 1.2M boards and load them into a torch.tensor, so this is roughly 200 times faster.
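
The rough factor of 200 can be reproduced from the timings reported in this notebook. This is only a back-of-the-envelope sketch; the variable names are made up, and the Python throughput uses the rounded estimate of 100 samples per second from earlier.

```python
# Timings and counts copied from the cell outputs above.
python_rate = 100        # samples per second, rounded estimate for the Python agent
gen_seconds = 9.27       # wall time of ./gen for 40000 games
read_seconds = 48.8      # wall time of read_record_from_file
boards = 1_224_206       # number of samples in X

elapsed = gen_seconds + read_seconds    # total time for the C++ pipeline
python_samples = python_rate * elapsed  # what Python yields in the same time
speedup = boards / python_samples

print(f"Python in {elapsed:.1f} s: ~{python_samples:.0f} samples")
print(f"Speed-up: ~{speedup:.0f}x")
```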

Instead of writing to a file, we could also spawn a separate process
and receive its output through a pipe,
but a file can be reused later,
so we will go with this approach for now.