# Extracting Game Features

*RS Turley 8/3/2020*

A key input to any strategy is the worth of a ship. To make decisions about spawning, attacking, sacrificing, and so on, we need some measure of a ship's value, expressed in terms of its halite production or its accomplishments in attack or defense. This notebook extracts features from high-level games played by others in order to estimate the value of a ship under various conditions.

### Libraries

In [1]:
import pandas as pd
import numpy as np
from scipy import signal
import os
import json
#from kaggle_environments.envs.halite.helpers import *

In [2]:
# to better display 21x21 matrices of halite
np.set_printoptions(linewidth=180,precision=1)

### Data

In [3]:
SCRAPED_DIR = '../kaggle_games/'
MIN_AVG_RATING = 950
episodes_df_file = 'episodes_df.pkl'

In [4]:
episode_files = []
episode_files_full = []
for dirpath, subdirs, files in os.walk(SCRAPED_DIR):
    for x in files:
        if x.endswith(".json"):
            episode_files_full.append(os.path.join(dirpath, x))
            episode_files.append(x)

episode_files = [ef.split('.')[0] for ef in episode_files]
episode_files_df = pd.DataFrame(episode_files_full,index=episode_files)
# drop any file whose name is not a purely numeric episode ID
episode_files_df.drop(episode_files_df.index[[not name.isdigit() for name in episode_files_df.index]], inplace=True)
episode_files_df.index = [int(f) for f in episode_files_df.index]
episode_files_df = episode_files_df.loc[~episode_files_df.index.duplicated(keep='first')]
print('{} games in existing library'.format(len(episode_files_df)))

21214 games in existing library


In [5]:
all_episodes_df = pd.read_pickle(SCRAPED_DIR+episodes_df_file)
library_episodes_df = all_episodes_df.loc[all_episodes_df.id.isin(episode_files_df.index)]
select_episodes_df = library_episodes_df.drop(library_episodes_df.loc[library_episodes_df['avg_score'].values < MIN_AVG_RATING].index, inplace = False)
select_episodes_df.sort_values('avg_score', inplace = True)
select_episodes_df['file_location'] = episode_files_df.loc[select_episodes_df['id']].values
print('{} games with qualifying avg_score >= {}'.format(len(select_episodes_df), MIN_AVG_RATING))

19272 games with qualifying avg_score >= 950


# Helpful Functions

The following functions will be helpful in analyzing game boards and player behavior:

* **cardinal_distance(start_point, end_point)** calculates the shortest signed displacement between two points on the wrapped board and returns it as the tuple `(dist_west2east, dist_north2south)`. The first element is the number of steps eastward (negative values mean westward) and the second is the number of steps southward (negative values mean northward).

* **make_cardinal_distance_list()** creates a list with precalculated cardinal distances for all points on the board. Very fast!

* **cardinal_distance_list[start_point][end_point]** is a list, not a function. It is created by `make_cardinal_distance_list` and outputs the cardinal distances as steps east and steps south as explained above.

* **destination_cell(start_point, move_distance)** is the reverse of the above functions. The inputs are a start point and a tuple representing steps eastward and southward, `(dist_west2east , dist_north2south)`, and the output is the end point on the map. All points are in the raw index fashion used on the game board.

* **distance_kernel(maxdist)** creates a matrix that can be used in a convolution with the game board to sum all squares within distance `maxdist`

* **halite_sum_matrix(halite_data, maxdist)** takes a list of halite values for the 441 cells on the board and returns the sum of all halite within `maxdist` of each cell


In [6]:
# make a distance list that preserves the shortest travel route


def cardinal_distance(start_point,end_point,boardsize=21):
    # returns the distance needed to travel across a wrapped board of size [boardsize] where the 
    # first output is the west to east distance (or a negative value if faster to travel westbound)
    # and the second output is the north to south distance (or a negative value if shorter to 
    # travel northbound).
    #
    # The inputs, start_point and end_point are expected to be integers where value zero is the northwest
    # point on the board and value boardsize*boardsize-1 is the southeast point on the board.
    
    # Calculate the distance traveling east (1st element) or west (2nd element)
    dist_west2east = ((end_point - start_point) % boardsize, 
                      (boardsize - ( (end_point - start_point) % boardsize) ))
    # return the signed minimum distance, negative values means travel west
    dist_west2east = min(dist_west2east)*(-1)**dist_west2east.index(min(dist_west2east))

    # Calculate the distance traveling south (1st element) or north (2nd element)
    dist_north2south = ((end_point//boardsize - start_point//boardsize) % boardsize, 
                        ( boardsize - ( (end_point//boardsize - start_point//boardsize) % boardsize) ))
    # return the signed minimum distance, where negative values mean traveling north
    dist_north2south = min(dist_north2south)*(-1)**dist_north2south.index(min(dist_north2south))

    return dist_west2east, dist_north2south

def make_cardinal_distance_list(boardsize=21):
    startpoints = np.arange(boardsize**2)
    endpoints = np.arange(boardsize**2)
    cardinal_distance_list = []
    for start_point in startpoints:
        # pass boardsize through so that non-default board sizes are handled correctly
        cardinal_distance_list.append([cardinal_distance(start_point,end_point,boardsize) for end_point in endpoints])
    return cardinal_distance_list

cardinal_distance_list = make_cardinal_distance_list()

def destination_cell(start_point, move_distance = (0,0), boardsize=21):
    # returns the destination cell for a move distance tuple ordered in terms of
    # (move_west2east, move_north2south) so that a value of (1,3) moves one cell east and 3 cells
    # south and a value of (-3,-2) represents the cell that is 3 cells west and 2 cells to the north
    return ((start_point + move_distance[0]) % boardsize + 
            ((start_point//boardsize + move_distance[1])%boardsize) * boardsize)
    


In [7]:
def distance_kernel(maxdist):
    # Creates a matrix of ones where the distance from center is less than
    # or equal to maxdist, and zeros elsewhere
    # e.g. distance_kernel(1) = [[0,1,0],[1,1,1],[0,1,0]]
    kernelmat = np.zeros((1+2*maxdist,1+2*maxdist))
    for i in range(1+2*maxdist):
        for j in range(1+2*maxdist):
            if abs(i-maxdist)+abs(j-maxdist)<=maxdist:
                kernelmat[i,j] = 1
    return kernelmat

def halite_sum_matrix(halite_data, maxdist = 0, boardsize = 21):
    # Creates a matrix from the stepdata representing the board
    # where each cell has the sum of all halite within maxdist.
    # The halite data is the raw format =(stepdata[0]['observation']['halite'])
    # halite_sum_matrix(stepdata, maxdist = 0, boardsize = 21)
    halite_matrix = np.reshape(halite_data,(boardsize,boardsize))
    return signal.convolve2d(halite_matrix, 
                             distance_kernel(maxdist), 
                             mode='same', boundary='wrap', fillvalue=0)

def halite_sum_array(halite_data, maxdist = 0, boardsize = 21):
    return np.reshape(halite_sum_matrix(halite_data, maxdist, boardsize),(boardsize*boardsize))
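
As a cross-check on the wrap-around behavior, the same neighborhood sum can be computed without a convolution by shifting the board with `np.roll` and adding the shifted copies. This is only a minimal sketch of an alternative technique, not the method used in this notebook. The name `halite_sum_by_rolling` is hypothetical and the 3x5 array is made-up test data.

```python
import numpy as np

def halite_sum_by_rolling(halite_matrix, maxdist=1):
    # Hypothetical helper: sum every cell within Manhattan distance maxdist,
    # wrapping at the edges, by adding shifted copies of the board.
    halite_matrix = np.asarray(halite_matrix, dtype=float)
    total = np.zeros_like(halite_matrix)
    for dy in range(-maxdist, maxdist + 1):
        for dx in range(-maxdist, maxdist + 1):
            if abs(dx) + abs(dy) <= maxdist:
                # np.roll wraps values around the board edges
                total += np.roll(halite_matrix, shift=(dy, dx), axis=(0, 1))
    return total

# Made-up test board (not game data)
small_board = np.array([[1, 2, 3, 4, 5],
                        [0, 1, 0, 1, 0],
                        [0, 2, 4, 6, 8]])
print(halite_sum_by_rolling(small_board, maxdist=1))
```

On boards that are large compared with `maxdist`, such as the 21x21 game board, this should agree with the wrapped convolution in `halite_sum_matrix`.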

### Create pattern recognition scheme for distance of 3 around a ship

The space around a ship has 24 cells within a distance of three. We will number them in the order below so that the smallest numbers represent the closest distance. The chart below shows the array indices used in saving the information.

 

|   |   |   |   |   |   |   |
|---|---|---|---|---|---|---|
|   |   |   |13 |   |   |   |
|   |   |24 | 5 |14 |   |   |
|   |23 |12 | 1 | 6 |15 |   |
|22 |11 | 4 | 0 | 2 | 7 |16 |
|   |21 |10 | 3 | 8 |17 |   |
|   |   |20 | 9 |18 |   |   |
|   |   |   |19 |   |   |   |
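
The numbering in the chart can be reproduced programmatically, which is a convenient way to check it. The sketch below assumes the ordering shown above: rings of increasing distance, each starting at north and proceeding clockwise. The names `ring_offsets` and `index_chart` are hypothetical and are not part of the notebook.

```python
def ring_offsets(max_distance=3):
    """(east, south) offsets ordered by distance, clockwise from north within each ring."""
    offsets = [(0, 0)]
    for d in range(1, max_distance + 1):
        offsets += [(k, k - d) for k in range(d)]    # north toward east
        offsets += [(d - k, k) for k in range(d)]    # east toward south
        offsets += [(-k, d - k) for k in range(d)]   # south toward west
        offsets += [(k - d, -k) for k in range(d)]   # west toward north
    return offsets

def index_chart(max_distance=3):
    """Square grid (rows run north to south) holding each cell's array index, or None."""
    size = 2 * max_distance + 1
    chart = [[None] * size for _ in range(size)]
    for idx, (east, south) in enumerate(ring_offsets(max_distance)):
        chart[south + max_distance][east + max_distance] = idx
    return chart

for row in index_chart(3):
    print(' '.join('  ' if value is None else f'{value:2d}' for value in row))
```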


The features in this pattern will define the ships or shipyards that could be present:

0. None
1. Friendly shipyard
2. Friendly ship
3. Enemy shipyard
4. "Fat" enemy ship carrying more halite
5. Lean enemy ship carrying less or equal halite

Since location zero holds the ship in question, we will focus on the contents of the other 24 cells within a distance of three. With six possible features per cell, every pattern can be represented by an integer between zero and 6^24 - 1 (about 4.7 x 10^18), which fits in a 64-bit integer because it is below the signed maximum of 2^63 - 1 (about 9.2 x 10^18). We note that many patterns are functionally identical under rotation and reflection. The following functions make these calculations:

* **cells_in_distance(location, max_distance)** returns the index references for all cells within a specific distance from the location
* **nearby_features(ship_id, team, max_distance)** returns a `nearby_features_array` (25 elements for the default `max_distance` of 3) with the features labeled as in the description above. For example, a value of 2 at array index 3 indicates that a friendly ship is immediately to the south
* **rotational_equivalents(nearby_features_array)** returns the 8 feature arrays (the original plus its rotations and reflections) that are equivalent to the `nearby_features_array`
* **nearby_feature_code(nearby_features_array)** transforms a 25-element `nearby_features_array` to a unique 64-bit integer

In [8]:
def cells_in_distance(location, max_distance=1):
    # returns the index references for all cells within the specific distance
    celllist = [location]
    if max_distance>0:
        # for each distance from 1 to max_distance, find all cells
        for dist in range(1,1+max_distance):
            # rotate around all the combinations of west-east and north-south adjustments that equal distance=dist
            west2east = np.concatenate( (np.arange(0,dist,step=1),np.arange(dist,-dist,step=-1),np.arange(-dist,0,step=1)))
            north2south = np.concatenate( (np.arange(-dist,dist,step=1),np.arange(dist,-dist,step=-1)))
            for idx in range(len(west2east)):
                celllist.append(destination_cell(location,( west2east[idx],north2south[idx])))
    return celllist

In [9]:
def nearby_features(ship_id, team, max_distance=3):
    # outputs the feature array for the cells within max_distance of the ship
    # (note: reads the current step from the global variable `stepdata`)
    ship_location = stepdata[0]['observation']['players'][team][2][ship_id][0]
    ship_halite = stepdata[0]['observation']['players'][team][2][ship_id][1]
    nearby_cells = cells_in_distance(ship_location,max_distance)
    nearby_features_array = np.zeros(shape=len(nearby_cells),dtype=np.uint64)
    
    # 1 - Friendly shipyard
    for shipyard_location in list(stepdata[0]['observation']['players'][team][1].values()):
        if shipyard_location in nearby_cells:
            nearby_features_array[nearby_cells.index(shipyard_location)] = 1
        
    # 2 - Friendly ships
    friendly_ship_locations = [shipdata[0] for shipdata in list(stepdata[0]['observation']['players'][team][2].values())]
    for friendly_ship_location in friendly_ship_locations:
        # compare integer locations by value (!=), not by identity (is not)
        if friendly_ship_location in nearby_cells and friendly_ship_location != ship_location:
            nearby_features_array[nearby_cells.index(friendly_ship_location)] = 2
    
    
    for other_team in range(4):
        if other_team != team:
    
            # 3 - Enemy shipyards
            for shipyard_location in list(stepdata[0]['observation']['players'][other_team][1].values()):
                if shipyard_location in nearby_cells:
                    nearby_features_array[nearby_cells.index(shipyard_location)] = 3
                    
            # 4 - Enemy fat ships
            enemy_fleet_data = list(stepdata[0]['observation']['players'][other_team][2].values())
            for enemy_ship_data in enemy_fleet_data:
                if (enemy_ship_data[0] in nearby_cells) and (enemy_ship_data[1] > ship_halite):
                    nearby_features_array[nearby_cells.index(enemy_ship_data[0])] = 4
                    
            # 5 - Enemy lean ships
            for enemy_ship_data in enemy_fleet_data:
                if (enemy_ship_data[0] in nearby_cells) and (enemy_ship_data[1] <= ship_halite):
                    nearby_features_array[nearby_cells.index(enemy_ship_data[0])] = 5
                    
    return nearby_features_array

In [10]:
def rotational_equivalents(nearby_features_array):
    # returns the "nearby_features" arrays that are strategically equivalent, 
    # i.e. the same pattern on a rotated or reflected game board
    # order of manipulations is: identity, rotation counter-clockwise once, twice, three times, 
    # mirror left/right, mirror up/down, mirror northwest/southeast, mirror northeast/southwest
    
    rotations = np.array([[0,0,0,0,0,0,0,0],
                            [1,2,3,4,1,3,2,4],
                            [2,3,4,1,4,2,1,3],
                            [3,4,1,2,3,1,4,2],
                            [4,1,2,3,2,4,3,1],
                            [5,7,9,11,5,9,7,11],
                            [6,8,10,12,12,8,6,10],
                            [7,9,11,5,11,7,5,9],
                            [8,10,12,6,10,6,12,8],
                            [9,11,5,7,9,5,11,7],
                            [10,12,6,8,8,12,10,6],
                            [11,5,7,9,7,11,9,5],
                            [12,6,8,10,6,10,8,12],
                            [13,16,19,22,13,19,16,22],
                            [14,17,20,23,24,18,15,21],
                            [15,18,21,24,23,17,14,20],
                            [16,19,22,13,22,16,13,19],
                            [17,20,23,14,21,15,24,18],
                            [18,21,24,15,20,14,23,17],
                            [19,22,13,16,19,13,22,16],
                            [20,23,14,17,18,24,21,15],
                            [21,24,15,18,17,23,20,14],
                            [22,13,16,19,16,22,19,13],
                            [23,14,17,20,15,21,18,24],
                            [24,15,18,21,14,20,17,23]])
    
    rotational_indices = []
    for r in range(8):
        rotational_indices.append( [nearby_features_array[rotations[idx,r]] for idx in range(25)] )
    
    return rotational_indices
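
The hand-typed `rotations` table can be cross-checked by deriving it from cell coordinates. The sketch below is an independent reconstruction under two assumptions: cells are numbered ring by ring, clockwise from north, and each column applies the coordinate transform named in the comment above. All names in it (`ring_offsets`, `TRANSFORMS`, `build_symmetry_table`, `canonical_pattern`) are hypothetical. Taking the minimum over the 8 equivalents is one possible way to pick a single representative per symmetry class, not something the notebook prescribes.

```python
def ring_offsets(max_distance=3):
    """(east, south) offsets ordered by distance, clockwise from north within each ring."""
    offsets = [(0, 0)]
    for d in range(1, max_distance + 1):
        offsets += [(k, k - d) for k in range(d)]    # north toward east
        offsets += [(d - k, k) for k in range(d)]    # east toward south
        offsets += [(-k, d - k) for k in range(d)]   # south toward west
        offsets += [(k - d, -k) for k in range(d)]   # west toward north
    return offsets

# Each transform maps a cell's offset to the offset whose contents it receives.
TRANSFORMS = [
    lambda x, y: (x, y),     # identity
    lambda x, y: (-y, x),    # rotate once
    lambda x, y: (-x, -y),   # rotate twice
    lambda x, y: (y, -x),    # rotate three times
    lambda x, y: (-x, y),    # mirror left/right
    lambda x, y: (x, -y),    # mirror up/down
    lambda x, y: (-y, -x),   # mirror northwest/southeast
    lambda x, y: (y, x),     # mirror northeast/southwest
]

def build_symmetry_table(max_distance=3):
    offsets = ring_offsets(max_distance)
    index_of = {offset: idx for idx, offset in enumerate(offsets)}
    return [[index_of[t(x, y)] for t in TRANSFORMS] for (x, y) in offsets]

def canonical_pattern(features, table):
    """Smallest of the 8 equivalent feature tuples, used as a class representative."""
    variants = [tuple(features[row[r]] for row in table) for r in range(len(TRANSFORMS))]
    return min(variants)

table = build_symmetry_table()
print(table[14])
```

With this representative, a friendly ship directly north and a friendly ship directly east of the ship in question map to the same canonical pattern.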

In [11]:
# transform the nearby cells into a single integer
def nearby_feature_code(nearby_features_array):
    return np.sum( np.multiply(nearby_features_array[1:],     # skip first element which contains the ship itself
                               np.power(6, np.arange(0,24,dtype=np.uint64))))
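
The encoding treats indices 1 to 24 as the digits of a base-6 number, with index 1 as the least significant digit. The sketch below restates that idea with plain Python integers so that the round trip and the size bound are easy to verify. The names `encode_features` and `decode_features` are hypothetical, and index 0 (the ship's own cell) is not stored in the code, so decoding sets it to zero.

```python
def encode_features(features):
    """Pack features[1:] (values 0-5) into one integer, index 1 as the lowest base-6 digit."""
    code = 0
    for value in reversed(features[1:]):
        code = code * 6 + int(value)
    return code

def decode_features(code, n_cells=24):
    """Inverse of encode_features; index 0 is not stored, so it is returned as 0."""
    digits = []
    for _ in range(n_cells):
        code, digit = divmod(code, 6)
        digits.append(digit)
    return [0] + digits

example = [0] * 25
example[1] = 2   # friendly ship immediately north
example[3] = 5   # lean enemy ship immediately south
code = encode_features(example)
print(code)                              # 2*6**0 + 5*6**2 = 182
print(decode_features(code) == example)  # True

# The largest possible code still fits in a signed 64-bit integer
print(encode_features([0] + [5] * 24) == 6**24 - 1, 6**24 - 1 < 2**63 - 1)
```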

### Examples of the helpful functions

In [12]:
# Examples of cardinal_distance_list

# start at point (0,19), aka index #19, and end at point (2,3), aka index #45
start_point = 19
end_point = 45
print('The distance from cell#19 to cell#45')
print(cardinal_distance_list[start_point][end_point])
print(' ')

# start at point (2,3), aka index #45, and end at point (0,19), aka index #19
start_point = 45
end_point = 19
print('The distance from cell#45 to cell#19')
print(cardinal_distance_list[start_point][end_point])

The distance from cell#19 to cell#45
(5, 2)
 
The distance from cell#45 to cell#19
(-5, -2)


In [13]:
# Example of destination_cell

start_point = 19 # start at cell 19
move_dist = (5,2) # move 5 cells eastward and 2 cells southward

print('Destination that starts from cell 19 and moves 5 cells east and 2 cells south:')
print(destination_cell(start_point,move_dist))

Destination that starts from cell 19 and moves 5 cells east and 2 cells south:
45
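
The two examples above suggest an invariant: moving from any start cell by its cardinal distance to an end cell should land exactly on that end cell. The sketch below checks this for every pair of cells on a 21x21 board. It uses compact restatements of the two helpers so that it runs on its own; the `_compact` names and `signed_wrap_delta` are hypothetical and are not the notebook's functions.

```python
def signed_wrap_delta(delta, boardsize=21):
    # Shortest signed step count along one wrapped axis; positive means east or south.
    forward = delta % boardsize
    backward = boardsize - forward
    return forward if forward <= backward else -backward

def cardinal_distance_compact(start_point, end_point, boardsize=21):
    return (signed_wrap_delta(end_point - start_point, boardsize),
            signed_wrap_delta(end_point // boardsize - start_point // boardsize, boardsize))

def destination_cell_compact(start_point, move_distance=(0, 0), boardsize=21):
    return ((start_point + move_distance[0]) % boardsize
            + ((start_point // boardsize + move_distance[1]) % boardsize) * boardsize)

boardsize = 21
for start in range(boardsize ** 2):
    for end in range(boardsize ** 2):
        move = cardinal_distance_compact(start, end, boardsize)
        assert destination_cell_compact(start, move, boardsize) == end
print('round trip holds for all', boardsize ** 4, 'start/end pairs')
```

The number of steps a ship needs is then the sum of the absolute values of the two components, for example 5 + 2 = 7 steps from cell 19 to cell 45.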


In [15]:
print('Using game file:')
print(select_episodes_df.iloc[0]['file_location'])
print(' ')

with open(select_episodes_df.iloc[0]['file_location']) as json_file:
    gamedata = json.load(json_file)

print('At step 109 the halite looks like this')
stepnum = 109
boardsize = 21
stepdata = gamedata['steps'][stepnum]
halite_data = stepdata[0]['observation']['halite']
print(np.reshape(halite_data,(boardsize,boardsize)))
print(' ')
print('The sum of halite within distance 1')
print(halite_sum_matrix(halite_data, maxdist = 1, boardsize = 21))

Using game file:
../kaggle_games/run_31/1176808.json
 
At step 109 the halite looks like this
[[ 27.6   0.    0.   16.4   0.   15.5  15.9   0.    0.    0.    0.    0.    0.    0.   66.3  42.    0.   17.7   0.    0.   26.6]
 [ 24.   25.8   0.   18.2  17.5   0.    0.    0.    0.   17.3   0.   16.7   0.    0.    0.    0.   18.5  16.4   0.   19.3  18.5]
 [ 27.5  23.8   0.    0.   18.   16.4   0.    0.   58.1   0.    0.    0.   19.5   0.    0.   14.4  18.    0.    0.   21.7  60.9]
 [  0.    0.   54.3  14.2   0.    0.    0.   69.3   0.    0.    0.    0.    0.   16.2   0.    0.    0.   73.9  70.3   0.    0. ]
 [  0.    0.    0.    0.    0.    0.    0.    0.    0.   63.    0.   25.7   0.    0.    0.    0.    0.    0.    0.    0.    0. ]
 [  0.    0.   26.8   0.   23.8   0.    0.   22.   49.    0.   31.4   0.   21.5  18.2   0.    0.   78.4   0.   70.4   0.    0. ]
 [  0.   23.6   0.    0.    0.    0.    0.   16.3   0.    0.   33.2   0.    0.   17.    0.    0.    0.    0.    0.   65.7   0. ]
 [ 

# Download data for one game

To test out the code, here is an example of a download of the steps and ships for one game.

In [None]:
np.power(6, np.arange(0,24,dtype=np.uint64))

In [None]:
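# Board comes from the kaggle_environments SDK (the import in the Libraries cell is commented out)
from kaggle_environments.envs.halite.helpers import Board
board = Board(gamedata['steps'][stepnum][0]['observation'],gamedata['configuration'])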
print(board)

In [None]:
len(nearby_cells_contents[1:])

In [None]:
np.sum( np.multiply(nearby_cells_contents[1:],
                               np.power(6, np.arange(0,24,dtype=np.uint64))))

In [None]:
[shipdata[0] for shipdata in list(stepdata[0]['observation']['players'][team][2].values())]

In [None]:
friendly_ship_locations = [shipdata[0] for shipdata in list(stepdata[0]['observation']['players'][team][2].values())]
for friendly_ship_location in friendly_ship_locations:
    if friendly_ship_location in nearby_cells and friendly_ship_location != ship_location:
        nearby_cells_contents[nearby_cells.index(friendly_ship_location)] = 2
nearby_cells_contents

In [None]:
stepdata[0]['observation']['players']

In [None]:
ship_data = list(stepdata[0]['observation']['players'][0][2].values())[0]
print(ship_data)

In [None]:
stepnum = 109
stepdata = gamedata['steps'][stepnum]


#old_ship_names = new_ship_names.copy()
#old_ship_location = new_ship_location.copy()
#old_ship_halite = new_ship_halite.copy()
new_ship_data = list(stepdata[0]['observation']['players'][0][2].values())
new_ship_names = list(stepdata[0]['observation']['players'][0][2].keys())
new_ship_location = [sdata[0] for sdata in new_ship_data]
new_ship_halite = [sdata[1] for sdata in new_ship_data]

In [None]:
new_ship_data

## Build data for each episode

The target variable we hope to predict is ship value. This is defined as the quantity of halite delivered to shipyards, with an additional credit of +500 for destroying an enemy shipyard or saving a friendly shipyard.

We could derive a formula or calibrate this to actual game data as a function of the following factors:

Factors that affect ship value
* steps left in game
* total halite on the board
* halite in reachable distance on board
* location of the ship
* ability to defend loss of shipyard
* expected life before being destroyed by enemy
* ability to destroy enemy shipyard (+500)
* ability to protect home base (+500)


### Characteristics each step
* add one for each further step the ship stays alive
* add the halite the ship produced
* add enemy ship kills
* add kills made while protecting a friendly shipyard
* mark if the ship killed an enemy shipyard
* give halite in distance 1,2,3,4,5
* give enemy ships in distance 1,2,3
* give friendly ships in distance 1,2,3
* give closest distance to friendly shipyard
* give total halite on board
* give board patterns
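
The "stays alive" counter can be sketched as a backward pass over the steps. The sketch assumes that each step provides a dictionary mapping ship ID to `[location, halite]`, as in `stepdata[0]['observation']['players'][team][2]`, and that ship IDs are never reused. The function name `future_life` and the toy data are hypothetical.

```python
def future_life(ships_by_step):
    """For each step, map ship ID -> number of further steps the ship stays alive.

    ships_by_step: list with one dict per step, each mapping ship ID -> [location, halite].
    """
    remaining = {}
    result = [None] * len(ships_by_step)
    # Walk backward so each ship's count builds on its count at the next step.
    for step in range(len(ships_by_step) - 1, -1, -1):
        alive = ships_by_step[step]
        # A ship missing from the next step (or seen at the final step) gets 0.
        remaining = {ship_id: remaining.get(ship_id, -1) + 1 for ship_id in alive}
        result[step] = dict(remaining)
    return result

# Toy data (not from a real game): ship 'b' disappears after step 1
toy_steps = [
    {'a': [10, 0], 'b': [11, 0]},
    {'a': [31, 5], 'b': [12, 3]},
    {'a': [52, 9]},
]
print(future_life(toy_steps))
```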

| Type of Variable           | Name               | type   | description                                              |
| -------------------------- | ------------------ | ------ | -------------------------------------------------------- |
| Identification             | episode            | int    | episode ID number                                        |
|                            | step               | int    | step number (0-399)                                      |
|                            | agentID            | int    | agentID number for this step                             |
|                            | shipID             | string | shipID string used in game                               |
|                            | altShipID          | int    | alternate numerical ID for each ship                     |
| Achievement                | futureLife         | int    | number of future steps ship stays alive                  |
|                            | haliteDelivered    | float  | quantity of halite delivered to friendly base            |
|                            | shipKills          | int    | number of enemy ships destroyed                          |
|                            | shipyardKill       | int    | \=1 if destroyed enemy shipyard                          |
|                            | shipyardProtect    | int    | number of times killed enemy attacking friendly shipyard |
| Current: Halite on Board   | haliteBoard        | float  | sum of all halite on board                               |
|                            | haliteDist0        | float  | total quantity of halite at distance 0                   |
|                            | haliteDist1        | float  | total quantity of halite at distance 1                   |
|                            | haliteDist2        | float  | total quantity of halite at distance 2                   |
|                            | haliteDist3        | float  | total quantity of halite at distance 3                   |
|                            | haliteDist4        | float  | total quantity of halite at distance 4                   |
|                            | haliteDist5        | float  | total quantity of halite at distance 5                   |
|                            | haliteDist6        | float  | total quantity of halite at distance 6                   |
| Current: Fat Enemies Near  | enemyFatDist1      | int    | number enemy ships carrying more halite at distance 1    |
|                            | enemyFatDist2      | int    | number enemy ships carrying more halite at distance 2    |
|                            | enemyFatDist3      | int    | number enemy ships carrying more halite at distance 3    |
| Current: Lean Enemies Near | enemyLeanDist1     | int    | number enemy ships carrying less or equal halite at distance 1 |
|                            | enemyLeanDist2     | int    | number enemy ships carrying less or equal halite at distance 2 |
|                            | enemyLeanDist3     | int    | number enemy ships carrying less or equal halite at distance 3 |
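
The target variable described earlier can be written directly in terms of the achievement columns in the table. The sketch below is one possible reading, not a settled definition: it assumes that every shipyard kill and every shipyard-protection event earns the +500 credit. The class `ShipAchievement`, the function `ship_value`, and the constant `SHIPYARD_CREDIT` are hypothetical names.

```python
from dataclasses import dataclass

SHIPYARD_CREDIT = 500  # credit stated in the text for destroying or saving a shipyard

@dataclass
class ShipAchievement:
    # Field names follow the "Achievement" rows of the table
    haliteDelivered: float = 0.0
    shipyardKill: int = 0
    shipyardProtect: int = 0

def ship_value(row):
    # Assumption: each kill or protection event counts once toward the credit
    return row.haliteDelivered + SHIPYARD_CREDIT * (row.shipyardKill + row.shipyardProtect)

print(ship_value(ShipAchievement(haliteDelivered=120.5, shipyardKill=1, shipyardProtect=2)))
```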

In [None]:
select_episodes_df.iloc[0]['file_location']

In [None]:
with open(select_episodes_df.iloc[0]['file_location']) as json_file:
    gamedata = json.load(json_file)

In [None]:
gamedata['configuration']

In [None]:
for stepnum,stepdata in enumerate(gamedata['steps']):
    #stepdata[0]['observation']['players'][3]  # player 4 ships
    #p4shiplist.append()
    pass  # placeholder so the cell runs; the loop body has not been written yet

In [None]:
stepnum = 109
stepdata = gamedata['steps'][stepnum]
old_ship_names = new_ship_names.copy()
old_ship_location = new_ship_location.copy()
old_ship_halite = new_ship_halite.copy()
new_ship_data = list(stepdata[0]['observation']['players'][0][2].values())
new_ship_names = list(stepdata[0]['observation']['players'][0][2].keys())
new_ship_location = [sdata[0] for sdata in new_ship_data]
new_ship_halite = [sdata[1] for sdata in new_ship_data]

In [None]:
halite_matrix = np.reshape(gamedata['steps'][stepnum][0]['observation']['halite'], 
                           (gamedata['configuration']['size'],
                            gamedata['configuration']['size']))

In [None]:
# we can use the SDK if helpful
from kaggle_environments import make
from kaggle_environments.envs.halite.helpers import Board

environment = make("halite", configuration={"size": 21, "startingHalite": 500})
agent_count = 4
environment.reset(agent_count)
state = environment.state[0]

board = Board(gamedata['steps'][stepnum][0]['observation'],gamedata['configuration'])


In [None]:
print(board)

In [None]:
print(board.ships)
print(board.ships['3-1'].position)

In [None]:
boardsize = 21
enemy_ship_carry = np.zeros(boardsize*boardsize,dtype=int)
playerlist = [0,1,2,3]
current_player = 2
enemy_players = playerlist.copy()
enemy_players.remove(current_player)
for player in enemy_players:
    # index by the enemy player in the loop, not by player 0
    for shipdata in list(stepdata[0]['observation']['players'][player][2].values()):
        enemy_ship_carry[shipdata[0]] = shipdata[1]

In [None]:
enemy_carry_matrix = np.reshape(enemy_ship_carry,(boardsize,boardsize))
enemy_carry_matrix

In [None]:
# CODE TESTING

In [None]:
A = np.array([[1,2,3,4,5],[0,1,0,1,0],[0,2,4,6,8]])
B = np.array([[0,1,0],[1,1,1],[0,1,0]])
print(A)
print(B)

In [None]:
signal.convolve2d(A, B, mode='same', boundary='wrap', fillvalue=0)

board patterns

how many squares at distance one could be covered by lean enemies?



In [None]:
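def cells_within_dist(cell_index,maxdist=1,boardsize=21):
    # Unfinished draft; cells_in_distance above covers the same task.
    close_indices = np.zeros(maxdist**2 + (maxdist-1)**2)
    for distance in range(maxdist):
        pass  # loop body was never written; `pass` keeps the cell syntactically valid
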

In [None]:
stepdata[1]['action']

In [None]:
stepdata[2]['action']

In [None]:
stepdata[3]['action']

In [None]:
stepdata[0]['observation']['players'][3]

In [None]:
halite_maps = np.zeros((21,21,400),dtype=int)

In [None]:
stepnum = 108
stepdata = gamedata['steps'][stepnum]

In [None]:
# list of ships
list(stepdata[0]['observation']['players'][3][2].keys())

In [None]:
stepdata_old = gamedata['steps'][6]

In [None]:
stepdata[0]['observation']['players'][3][2]

In [None]:
gamedata['steps'][4][0]['observation']['players'][3][2]