## Bubble Breaker in Python / Javascript

The key 'board' data structure is a ```numpy``` array, which is (for efficiency) stored on its side (with the bottom-right phone cell being the board[0,0] cell):

In [1]:
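import numpy as np
try:
  from importlib import reload  # Python 3: reload() is no longer a builtin
except ImportError:
  pass                          # Python 2: reload() is a builtin
from model.game import crush
from model.game import crush_ui
reload(crush)
reload(crush_ui)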

crush.new_board(10, 14, n_colours=5)
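A quick sketch of the 'on its side' layout, converting between a phone-screen position (row 0 at the top, column 0 at the left) and the array index ```board[x, y]```. The orientation is an assumption, inferred from ```board[0,0]``` being the bottom-right cell and from the feature code further down, which treats larger ```y``` as 'above' and larger ```x``` as 'to the left'. One plausible reason this layout is efficient is that each on-screen column becomes a single array row ```board[x]```, so letting bubbles fall is just compacting that row. ```phone_to_board```, ```board_to_phone``` and ```drop_column``` are hypothetical helpers (working on a plain list-of-lists indexed ```board[x][y]```), not part of ```crush```.

```python
def phone_to_board(row, col, width, height):
    """Map a phone-screen cell (row 0 = top, col 0 = left) to board[x, y].

    Assumes board[0, 0] is the bottom-right cell, x grows to the left
    and y grows upwards, so board[x] holds one on-screen column.
    """
    return width - 1 - col, height - 1 - row


def board_to_phone(x, y, width, height):
    """Inverse of phone_to_board: board[x, y] -> (row, col) on the phone."""
    return height - 1 - y, width - 1 - x


def drop_column(column):
    """Apply 'gravity' to one on-screen column stored as board[x].

    Empty cells are 0 and index 0 is the bottom, so the surviving bubbles
    are packed towards the start of the list, keeping their order.
    """
    kept = [c for c in column if c > 0]
    return kept + [0] * (len(column) - len(kept))


# The bottom-right cell of a 10-wide, 14-tall board is board[0, 0]
print(phone_to_board(13, 9, 10, 14))   # (0, 0)
print(board_to_phone(0, 0, 10, 14))    # (13, 9)
print(drop_column([3, 0, 2, 0, 1]))    # [3, 2, 1, 0, 0]
```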

The key trick in this notebook is Jupyter's ability to go 'round-trip' from the Python back-end to Javascript in the browser, and back again.  There's a block of helper Javascript in the (Python) file ```crush_ui```:


In [2]:
from IPython.display import HTML

#HTML(crush_ui.javascript_test)
HTML(crush_ui.javascript_base)
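The contents of ```crush_ui.javascript_base``` aren't shown here, so the following is only a hedged sketch of how a browser-to-Python round trip is commonly wired up in the classic Jupyter Notebook (not JupyterLab, which has no ```IPython.notebook``` global). The Javascript calls ```IPython.notebook.kernel.execute(code, callbacks)```, the kernel runs ```code```, and anything the Python side prints comes back through the ```iopub.output``` callback as a 'stream' message. Here the Python handler is assumed to print a snippet of Javascript that redraws the board, which the browser then ```eval```s. ```make_click_js```, ```send_click``` and the handler name ```on_board_click``` are all hypothetical.

```python
# Javascript template; __HANDLER__ is replaced by the name of a Python function.
CLICK_JS_TEMPLATE = """
function send_click(div_id, h, v) {
  // Ask the kernel to run e.g. on_board_click("#board_small", 3, 4)
  var code = '__HANDLER__("' + div_id + '", ' + h + ', ' + v + ')';
  IPython.notebook.kernel.execute(code, {
    iopub: {
      output: function (msg) {
        // Whatever the Python handler print()s arrives as a 'stream' message
        if (msg.msg_type === "stream") { eval(msg.content.text); }
      }
    }
  });
}
"""


def make_click_js(handler="on_board_click"):
    """Return the click-handling Javascript with the Python handler name filled in."""
    return CLICK_JS_TEMPLATE.replace("__HANDLER__", handler)
```

In the notebook this would be shown with ```HTML('<script>' + make_click_js() + '</script>')```, and the Python side would define ```on_board_click(div_id, h, v)``` to apply the move and ```print()``` the Javascript that redraws the board.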

Having imported that base code, we can now create UI elements for Javascript to manipulate:

In [3]:
javascript = """
<div id="board_10_14_trial"></div>
<script type="text/Javascript">create_board("#board_10_14_trial",10,14,5);</script>
"""
HTML(javascript)
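The same ```<div>``` plus ```create_board(...)``` pattern is repeated for every board in this notebook, always with the arguments (selector, width, height, n_colours). A small hypothetical helper, not part of ```crush_ui```, that builds the pair for any element id and board size:

```python
def board_container_html(div_id, width, height, n_colours):
    """Build the <div> + <script> pair that asks create_board() to draw an empty board.

    create_board(selector, width, height, n_colours) is the Javascript function
    loaded from crush_ui.javascript_base; this helper only formats the call.
    """
    return (
        '<div id="%s"></div>\n'
        '<script type="text/Javascript">create_board("#%s",%d,%d,%d);</script>\n'
        % (div_id, div_id, width, height, n_colours)
    )


print(board_container_html("board_small", 5, 8, 4))
```

In the notebook it would be used as ```HTML(board_container_html("board_10_14_trial", 10, 14, 5))```.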

Now initialise a board and display it:

In [4]:
board = crush.new_board(10, 14, n_colours=5)
HTML(crush_ui.display_via_javascript_script("#board_10_14_trial", board))

Because of the Python-Javascript-Python round-tripping, you can now play the game (click on linked cells)!

Once you run out of moves, the game is over.  You can restart it by re-running the board-generation cell above.
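Assuming the usual Bubble Breaker rule (a move clicks a group of two or more orthogonally connected cells of the same colour), the game is over exactly when no non-empty cell has a same-coloured neighbour. A pure-Python sketch of that check, and of finding the clicked group by flood fill, on a list-of-lists board indexed ```board[x][y]``` where 0 means empty (as the ```np.greater(board, 0)``` mask further down assumes). ```has_moves``` and ```linked_group``` are hypothetical; ```crush.potential_moves``` may well be implemented differently.

```python
def linked_group(board, x, y):
    """Return the set of (x, y) cells connected to board[x][y] by the same colour."""
    colour = board[x][y]
    if colour <= 0:                      # empty cell: nothing to click
        return set()
    group, stack = {(x, y)}, [(x, y)]
    while stack:                         # iterative flood fill
        cx, cy = stack.pop()
        for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
            if (0 <= nx < len(board) and 0 <= ny < len(board[nx])
                    and (nx, ny) not in group and board[nx][ny] == colour):
                group.add((nx, ny))
                stack.append((nx, ny))
    return group


def has_moves(board):
    """True if any non-empty cell has an orthogonal neighbour of the same colour.

    Adjacency is symmetric, so checking the +x and +y neighbours is enough.
    """
    for x in range(len(board)):
        for y in range(len(board[x])):
            c = board[x][y]
            if c <= 0:
                continue
            if x + 1 < len(board) and board[x + 1][y] == c:
                return True
            if y + 1 < len(board[x]) and board[x][y + 1] == c:
                return True
    return False


example = [[1, 2, 0],
           [1, 3, 0],
           [2, 3, 3]]
print(sorted(linked_group(example, 1, 1)))               # [(1, 1), (2, 1), (2, 2)]
print(has_moves(example), has_moves([[1, 2], [2, 1]]))   # True False
```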

## Smaller Board (for training NOW)

In [34]:
width, height, n_colours = 5,8,4
javascript = """
<div id="board_small"></div>
<script type="text/Javascript">create_board("#board_small",%d,%d,%d);</script>
""" % (width, height, n_colours)
HTML(javascript)

In [35]:
board = crush.new_board(width, height, n_colours=n_colours)
HTML(crush_ui.display_via_javascript_script("#board_small", board))

There's quite a lot of machinery required for Q() learning, so we'll take it one step at a time.
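As a roadmap, here is the target that ```play_game()``` (further down) trains towards, written out from its code. For a board $s$ and each legal move $m$, let $s'_m$ be the board after the move (with the unknown added columns left blank), $r_m$ the ```n_cols``` value returned by ```crush.after_move``` (the code uses this rather than ```score```), $\phi$ the feature layers, $Q$ the network and $\gamma$ = ```per_step_discount_factor``` = 0.95:

$$
m^* = \arg\max_{m} \big[\, r_m + \gamma\, Q(\phi(s'_m)) \,\big]
$$

$$
y(s) = \begin{cases}
r_{m^*} + \gamma\, Q(\phi(s'_{m^*})) & \text{if } s \text{ has at least one move,} \\
0 & \text{if } s \text{ has no moves (game over),}
\end{cases}
\qquad
\text{loss} = \operatorname{mean}\big[\, (Q(\phi(s)) - y(s))^2 \,\big]
$$

With probability ```prob_exploration``` a random move is played instead of $m^*$, and no training pair $(\phi(s), y(s))$ is recorded for that step.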

### Convert a Board to Features

In [9]:
def make_features_in_layers(board):
  feature_layers = [] # These are effectively 'colours' for the CNN
  
  mask     = np.greater( board[:, :], 0 )*1.
  feature_layers.append( mask.astype('float32') )
  
  # This works out whether each cell is the same as the cell 1 to 5 rows 'above it'
  for shift_down in [1,2,3,4,5,]:
    sameness = np.zeros_like(board, dtype='float32')
    sameness[:,:-shift_down] = np.equal( board[:, :-shift_down], board[:, shift_down:] )*1.
    feature_layers.append( sameness )
  
  # This works out whether each cell is the same as the cell 1, 2 or 3 columns 'to the left of it'
  for shift_right in [1,2,3,]:
    sameness = np.zeros_like(board, dtype='float32')
    sameness[:-shift_right,:] = np.equal( board[:-shift_right, :], board[shift_right:, :] )*1.
    feature_layers.append( sameness )
  
  stacked = np.dstack( feature_layers )
  return np.rollaxis( stacked, 2, 0 )

features_shape = make_features_in_layers(board).shape
print("('feature layers', width, height) : %s" %(features_shape, ))
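The 'sameness' layers are easiest to see on a tiny board. Below is a pure-Python equivalent of a single layer, written for illustration only (```sameness_layer``` is hypothetical): entry ```[x][y]``` is 1 when ```board[x][y]``` equals the cell ```shift``` places along the chosen axis, and 0 where that partner would fall off the board. Note that two empty (0) cells also count as 'the same'. With one mask layer, five vertical shifts and three horizontal shifts, ```make_features_in_layers``` produces 1 + 5 + 3 = 9 layers.

```python
def sameness_layer(board, shift, axis):
    """Pure-Python version of one 'sameness' layer from make_features_in_layers.

    axis=1 compares board[x][y] with board[x][y + shift] (the cell 'above');
    axis=0 compares board[x][y] with board[x + shift][y] (the cell 'to the left').
    Cells whose partner would lie off the board are 0.
    """
    w, h = len(board), len(board[0])
    layer = [[0.0] * h for _ in range(w)]
    for x in range(w):
        for y in range(h):
            nx, ny = (x + shift, y) if axis == 0 else (x, y + shift)
            if nx < w and ny < h and board[x][y] == board[nx][ny]:
                layer[x][y] = 1.0
    return layer


tiny_board = [[1, 1, 2],
              [2, 1, 0],
              [2, 0, 0]]
print(sameness_layer(tiny_board, 1, axis=1))  # [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
print(sameness_layer(tiny_board, 1, axis=0))  # [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 0.0, 0.0]]
```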

### Build the CNN to Evaluate the Board's Features

In [27]:
import theano

import lasagne
import pickle

def build_cnn(input_var, features_shape):
    # Create a CNN of two convolution layers and a fully-connected hidden layer in front of the output layer
    
    lasagne.random.set_rng( np.random )

    # input layer
    network = lasagne.layers.InputLayer(shape=(None, features_shape[0], features_shape[1], features_shape[2]), input_var=input_var)

    # Two convolutional layers (no dropout, no pooling)
    network = lasagne.layers.Conv2DLayer(
      network, num_filters=32, filter_size=(3,3),
      nonlinearity=lasagne.nonlinearities.rectify,
      W=lasagne.init.GlorotUniform(),
    )
    
    network = lasagne.layers.Conv2DLayer(
      network, num_filters=16, filter_size=(3,3),
      nonlinearity=lasagne.nonlinearities.rectify,
    )

    # Two fully-connected layers - leading to ONE output value : the Q(features(board))
    network = lasagne.layers.DenseLayer(
      network, num_units=32,
      nonlinearity=lasagne.nonlinearities.rectify,
    )

    network = lasagne.layers.DenseLayer(
      network, num_units=1,
      nonlinearity=lasagne.nonlinearities.linear,
    )

    return network

board_input = theano.tensor.tensor4('inputs')
board_score = theano.tensor.vector('targets')

np.random.seed(0) # This is for the initialisation inside the CNN
model=build_cnn(board_input, features_shape)

# This is for running the model (training, etc)
estimate_q_value = lasagne.layers.get_output(model)  # 'running'
model_evaluate_features = theano.function([board_input], estimate_q_value)

# This is for repeatedly testing the model (deterministic)
predict_q_value  = lasagne.layers.get_output(model, deterministic=True)
model_evaluate_features_deterministic = theano.function([board_input], predict_q_value)

# This is used for training
model_squared_error = lasagne.objectives.squared_error(estimate_q_value.reshape( (-1,) ), board_score).mean()

model_params  = lasagne.layers.get_all_params(model, trainable=True)

#model_updates = lasagne.updates.nesterov_momentum( model_squared_error, model_params, learning_rate=0.01, momentum=0.9 )
model_updates = lasagne.updates.adam( model_squared_error, model_params )
#model_updates = lasagne.updates.rmsprop( model_squared_error, model_params ) 

model_train = theano.function([board_input, board_score], model_squared_error, updates=model_updates)

model_params
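Both 3×3 convolutions use Lasagne's default ```pad=0``` with stride 1 (a 'valid' convolution), so each one trims 2 cells from both board dimensions, and the size of the first dense layer therefore depends on the board size. The hand calculation below mirrors ```build_cnn``` (```cnn_param_counts``` is a hypothetical helper) for the 9 feature layers: the 5×8 training board gives 9,361 trainable parameters, while the 10×14 board gives 38,033, almost all of the difference being in the first dense layer. This is why the model must be rebuilt when the board changes size.

```python
def cnn_param_counts(n_layers, width, height, filters=(32, 16), kernel=3, hidden=32):
    """Count trainable parameters, layer by layer, for the build_cnn() architecture.

    Assumes unpadded ('valid') stride-1 convolutions, so each one shrinks width
    and height by kernel - 1; every layer has a weight tensor and a bias vector.
    """
    counts = []
    channels, w, h = n_layers, width, height
    for n_filters in filters:
        counts.append(channels * n_filters * kernel * kernel + n_filters)
        channels, w, h = n_filters, w - (kernel - 1), h - (kernel - 1)
    flat = channels * w * h                  # DenseLayer flattens the conv output
    counts.append(flat * hidden + hidden)    # hidden fully-connected layer
    counts.append(hidden * 1 + 1)            # single linear output: Q(features(board))
    return counts


small = cnn_param_counts(9, 5, 8)
large = cnn_param_counts(9, 10, 14)
print(small, sum(small))   # [2624, 4624, 2080, 33] 9361
print(large, sum(large))   # [2624, 4624, 30752, 33] 38033
```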

### Logic to run 1 game

The cell below defines ```play_game()```, which plays one game and collects training examples, and a ```run_test()``` function that evaluates the current network by playing a fixed set of 10 games deterministically.

In [11]:
def play_game(game_id, model, per_step_discount_factor=0.95, prob_exploration=0.1, capture_boards=None):
  training_data = dict( board=[], target=[])
  
  np.random.seed(game_id)
  board = crush.new_board(width, height, n_colours) # Same as portrait phone  1 screen~1k,  high-score~14k

  score_total, new_cols_total, moves_total, game_step = 0,0,0,0
  while True: 
    if capture_boards:
      capture_boards(board)

    moves = crush.potential_moves(board)
    moves_total += len(moves)
    
    if len(moves)==0:
      # Need to add a training example : This is a zero-score outcome
      training_data['board'].append( make_features_in_layers(board) )
      training_data['target'].append( 0. )
      
      break

    # Let's find the highest-scoring of those moves:  First, get all the features
    next_step_features = []
    next_step_target = []
    for (h,v) in moves:  # [0:2]
      b, score, n_cols = crush.after_move(board, h,v, -1)  # Added columns are unknown
      
      next_step_features.append( make_features_in_layers(b) )
      #next_step_target.append( score )
      next_step_target.append( n_cols )
      
    # Now evaluate the Q() values of the resulting position for each possible move in one go
    all_features = np.array(next_step_features)  # , dtype='float32'
    
    remember_training, i = False, -1
    if prob_exploration<0:  # This is testing only - just need to pick the best move
      next_step_q = model_evaluate_features_deterministic( all_features )
    else:
      if np.random.uniform(0.0, 1.0)<prob_exploration:
        ## Choose a random move, and do it
        i = np.random.randint( len(moves) )
      else:
        next_step_q = model_evaluate_features( all_features )
        remember_training=True

    if i<0:
      next_step_aggregate = ( np.array( next_step_target, dtype='float32') + 
                              per_step_discount_factor * next_step_q.flatten() )
      i = np.argmax( next_step_aggregate )
    
    (h,v) = moves[i]
    
    #print("Move : (%2d,%2d)" % (h,v))
    #crush.show_board(board, highlight=(h,v))
    
    if remember_training:  # Only collect training data if not testing
      training_data['board'].append( make_features_in_layers(board) )
      # This value includes a Q() that looks at the 'blank cols', rather than the actuals
      training_data['target'].append( next_step_aggregate[i] )   
    
    board, score, new_cols = crush.after_move(board, h,v, n_colours)  # Now we do the move 'for real'
    
    score_total += score
    new_cols_total += new_cols
    
    #print("Move[%2d]=(%2d,%2d) -> Score : %3d, new_cols=%1d" % (i, h,v, score,new_cols))
    #crush.show_board(board, highlight=(0,0))

    game_step += 1
    
  stats=dict(
    steps=game_step,
    av_potential_moves=float(moves_total) / max(game_step, 1),  # guard against a dealt board with no moves
    score=score_total, new_cols=new_cols_total,
  )
  return stats, training_data

def stats_aggregates(log, prefix, last=None):
  stats_cols = "steps av_potential_moves new_cols score model_err".split()
  if last:
    stats_overall = np.array([ [s[c] for c in stats_cols] for s in log[-last:] ])
  else:
    stats_overall = np.array([ [s[c] for c in stats_cols] for s in log ])

  print(prefix + "    #steps   #moves_av  new_cols   score   model_err")
  print(" Min  : ", ["%6.1f" % (v,) for v in np.min(stats_overall, axis=0).tolist()] )
  print(" Max  : ", ["%6.1f" % (v,) for v in np.max(stats_overall, axis=0).tolist()] )
  print(" Mean : ", ["%6.1f" % (v,) for v in np.mean(stats_overall, axis=0).tolist()] )
  
def run_test(i):
  # Run a test set of 10 games (fixed seeds, not among those used for the training games)
  stats_test_log=[]
  for j in range(0,10):
    stats_test, _ = play_game(1000*1000*1000+j, model, prob_exploration=-1.0)  
    stats_test['model_err'] = -999.
    stats_test_log.append( stats_test )

  stats_aggregates(stats_test_log, "=Test[%5d]" % (i,))

# Initial run, testing the score of an untrained network
run_test(0)
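The move choice inside ```play_game()``` is an ε-greedy rule over the one-step look-ahead value ```n_cols + γ·Q(next board)```. Stripped of the board and the network, it reduces to the pure-Python sketch below; ```choose_move``` is a hypothetical helper, and ```q_values``` stands in for the network's output on the candidate boards. As in ```play_game()```, a negative ```prob_exploration``` means test mode (always greedy), and ties go to the first move, as with ```np.argmax```.

```python
import random


def choose_move(rewards, q_values, discount=0.95, prob_exploration=0.1, rng=random):
    """Return (index, explored) for an epsilon-greedy pick over candidate moves.

    rewards[i] plays the role of n_cols for move i, and q_values[i] the network's
    estimate for the board after move i. A negative prob_exploration never explores.
    """
    if prob_exploration >= 0 and rng.random() < prob_exploration:
        return rng.randrange(len(rewards)), True
    values = [r + discount * q for r, q in zip(rewards, q_values)]
    return max(range(len(values)), key=values.__getitem__), False


# Move 1 has the best immediate reward, but move 0 leads to a better-valued board
print(choose_move([0, 1, 0], [2.0, 0.5, 1.0], prob_exploration=-1.0))  # (0, False)
```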

### Ready to Train the Network...

In [13]:
import datetime
t0,i0 = datetime.datetime.now(),0
t_start=t0

#n_games=50*1000
n_games=1*1000
batchsize=512

stats_log=[]
training_data=dict( board=[], target=[])
for i in range(0, n_games):
  stats, training_data_new = play_game(i, model)
  
  if False:
    print("game[%d]" % (i,))
    print("  steps         = %d" % (stats['steps'],))
    print("  average moves = %5.1f" % (stats['av_potential_moves'], ) )
    print("  new_cols      = %d" % (stats['new_cols'],))
    print("  score_total   = %d" % (stats['score'],))
  
  training_data['board'] += training_data_new['board']
  training_data['target'] += training_data_new['target']

  # This keeps the window from growing too big
  if len(training_data['target'])>batchsize*2:
    training_data['board'] = training_data['board'][-batchsize:]
    training_data['target'] = training_data['target'][-batchsize:]

  for _ in range(0,1):  # a single training pass per game
    err = model_train( training_data['board'][-batchsize:], training_data['target'][-batchsize:] )
  
  stats['model_err'] = err
  
  stats_log.append( stats )
  
  if ((i+1) % 10)==0:
    t_now = datetime.datetime.now()
    t_elapsed = (t_now - t0).total_seconds()
    t_end_projected = t0 + datetime.timedelta( seconds=(n_games-i0) * (t_elapsed/(i-i0)) )
    print("    100 games in %6.1f seconds, Projected end at : %s, stored_data.length=%d" % 
           (100.*t_elapsed/(i-i0), t_end_projected.strftime("%H:%M"), len(training_data['target']), ))
    t0, i0 = datetime.datetime.now(), i
    
  if ((i+1) % 100)==0:
    stats_aggregates(stats_log, "Train[%5d]" % (i,), last=1000)

  if ((i+1) % 100)==0:
    run_test(i)

stats_aggregates(stats_log, "FINAL[%5d]" % (n_games,) )
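The training loop keeps a rolling window of recent examples: once more than ```2*batchsize``` are stored it trims back to the newest ```batchsize```, and each call to ```model_train``` sees at most the newest ```batchsize```. If only the slice passed to ```model_train``` matters, a bounded ```collections.deque(maxlen=batchsize)``` yields the same slice without manual trimming. This is a swapped-in alternative, not what the notebook does; the sketch below checks the equivalence on a stream of dummy examples (both functions are hypothetical).

```python
from collections import deque


def window_by_trimming(stream, batchsize):
    """Mimic the notebook: trim to the newest batchsize once over 2*batchsize."""
    stored = []
    for item in stream:
        stored.append(item)
        if len(stored) > batchsize * 2:
            stored = stored[-batchsize:]
    return stored[-batchsize:]          # the slice model_train would receive


def window_by_deque(stream, batchsize):
    """The same training slice, kept by a bounded deque."""
    return list(deque(stream, maxlen=batchsize))


assert window_by_trimming(range(1000), 512) == window_by_deque(range(1000), 512)
```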

### Using the model - let's see how it plays

In [17]:
javascript = """
<div id="board_small_watch"></div>
<script type="text/Javascript">create_board("#board_small_watch",%d,%d,%d);</script>
""" % (width, height, n_colours)
HTML(javascript)

In [25]:
seed = 5
board = crush.new_board(width, height, n_colours=n_colours)
boards_for_game=[]
def capture_boards(b): boards_for_game.append(b)
stats, _ = play_game(seed, model, capture_boards=capture_boards)
HTML( crush_ui.display_gameplay("#board_small_watch", boards_for_game, 0.1) )

## Now for a Pretrained Network
This was trained in ~5 hours using a Titan X GPU.  

However, the GPU speed-up factor isn't very large here (2-3x), since much of the training time is spent in the Python game-play and feature-generation code.  Moreover, the network is small, so GPU time is dominated by host-to-device (PCIe) transfers rather than computation.
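This ceiling is Amdahl's law. If a fraction $p$ of the CPU-only run time is spent in the network and the GPU speeds that part up by a factor $k$, while the Python game-play and feature code (fraction $1-p$) runs at the same speed, the overall speed-up is

$$
S = \frac{1}{(1-p) + p/k} \;<\; \frac{1}{1-p}
$$

For example, if half of the CPU-only run time were spent in the Python game and feature code ($p = 0.5$), no GPU could make the whole run more than 2× faster.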

### Loading the pre-trained model

In [26]:
with open('data/game/rl_10x14x5_2016-06-21_03-27.049999.pkl', 'rb') as f:
  param_dictionary=pickle.load(f)

#width, height, n_colours = 10,14,5
width, height, n_colours = ( param_dictionary[k] for k in 'width height n_colours'.split() )
board = crush.new_board(width, height, n_colours=n_colours)
features_shape = make_features_in_layers(board).shape
print("('feature layers', width, height) : %s" %(features_shape, ))
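The pickled file is a plain dictionary that stores the board size alongside the network weights; the cell above reads its ```width```, ```height```, ```n_colours``` and ```param_values``` keys. A hedged sketch of the matching save step: ```save_model_params``` and ```load_model_params``` are hypothetical helpers, and in the notebook ```param_values``` would presumably come from ```lasagne.layers.get_all_param_values(model)```, the counterpart of the ```set_all_param_values``` call used further down. The example uses plain lists so it runs without Lasagne.

```python
import os
import pickle
import tempfile


def save_model_params(path, width, height, n_colours, param_values):
    """Pickle the board size together with the network's parameter arrays."""
    with open(path, 'wb') as f:
        pickle.dump(dict(width=width, height=height, n_colours=n_colours,
                         param_values=param_values), f)


def load_model_params(path):
    """Read back the dictionary written by save_model_params."""
    with open(path, 'rb') as f:
        return pickle.load(f)


path = os.path.join(tempfile.mkdtemp(), 'example_params.pkl')
save_model_params(path, 10, 14, 5, param_values=[[0.1, 0.2], [0.0]])
restored = load_model_params(path)
print(restored['width'], restored['height'], restored['n_colours'])  # 10 14 5
```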

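Now re-execute the cell above that defines ```model=``` (under 'Build the CNN to Evaluate the Board's Features'): ```features_shape``` has just changed, which changes the network's input and dense-layer sizes, so the model and the compiled Theano functions defined in that cell must all be rebuilt.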

### Loading the Parameters into the resized model

In [28]:
lasagne.layers.set_all_param_values(model, param_dictionary['param_values'])

In [29]:
javascript = """
<div id="board_10_14"></div>
<script type="text/Javascript">
create_board("#board_10_14",%d,%d,%d);
</script>
""" % (width, height, n_colours)
HTML(javascript)

In [33]:
seed = 1
board = crush.new_board(width, height, n_colours=n_colours)
boards_for_game=[]
def capture_boards(b): boards_for_game.append(b)
stats, _ = play_game(seed, model, capture_boards=capture_boards)
HTML( crush_ui.display_gameplay("#board_10_14", boards_for_game, 0.1) )