In [None]:
from IPython.display import Image
Image("../input/image1/Picture1.png")


## Table of Contents

* [Credits](#tag1)
* [Idea](#tag2)
* [Helper functions](#tag3)
* [Execution](#tag4)
* [Improvement](#tag5)



<a id='tag1'></a>

## Credits to...

* [Swarm Intelligence with SDK](https://www.kaggle.com/kwabenantim/swarm-intelligence-with-sdk): I used its Controller() class, with a few changes to how actions are assigned to the agents. I like its organized structure, and the way it keeps track of cells to avoid collisions is very clever. Please make sure to check it out!
* [Getting started with halite](https://www.kaggle.com/alexisbcook/getting-started-with-halite) I used functions from this notebook as well.

<a id='tag2'></a>

## Idea

Many algorithms in reinforcement learning are built around the idea of [**balancing exploration and exploitation**](https://en.wikipedia.org/wiki/Multi-armed_bandit). In this game, both strategies have advantages and disadvantages:

* Exploration: the ship scans the surrounding area and heads for the cells with the most halite, but traveling there takes time and risks collisions with other ships.
* Exploitation: the ship either stays at its current position to mine or moves only slightly into the surrounding area, but it may find little by looking around aimlessly.

There are different ways to switch between exploration and exploitation. Here I use a probability parameter, EPSILON, to decide which way to go: each turn a ship draws a random number between 0 and 1, explores if it is below EPSILON, and exploits otherwise. This is similar to the [epsilon-greedy algorithm](https://imaddabbura.github.io/post/epsilon-greedy-algorithm/).
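The lever can be illustrated in isolation from the game. This is a minimal sketch assuming only Python's standard `random` module; `choose_mode` is a hypothetical helper, not part of the Controller class. It shows that with EPSILON = 0.4, roughly 40% of draws lead to exploration.

```python
import random

EPSILON = 0.4  # same value as self.EPSILON in the Controller


def choose_mode(epsilon, rng=random):
    """Return "explore" with probability epsilon, otherwise "exploit".

    Mirrors the rule used here: draw u ~ Uniform(0, 1) and explore when u < epsilon.
    """
    u = rng.uniform(0, 1)
    return "explore" if u < epsilon else "exploit"


# Estimate how often each mode is picked over many independent draws
rng = random.Random(0)
picks = [choose_mode(EPSILON, rng) for _ in range(10_000)]
explore_share = picks.count("explore") / len(picks)
print(f"explore share ~ {explore_share:.2f}")  # should land close to EPSILON
```

One difference from textbook epsilon-greedy: there, "exploit" means picking the action with the best estimated value, whereas in this agent it simply means mining the current cell.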

<a id='tag3'></a>

## Helper functions

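These functions are all methods of the Controller() class from [this notebook](https://www.kaggle.com/kwabenantim/swarm-intelligence-with-sdk), so if you want to test the agent, you can plug them in directly. They assume the names used by that notebook (`np`, `Point`, `ShipAction`, `choice`, `BOARD_SIZE`) are already in scope.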

In [None]:
## variables declared inside Controller.__init__()

# epsilon: probability of exploring; used as a lever to switch between explore and exploit
self.EPSILON = .4

# search range: how many cells in each direction to scan for neighbouring halite
self.SEARCH_RANGE = 4

In [None]:
def get_neighbour_halite(self, point1, search_range, board):
    """Get the cells within search_range of point1 and the halite on each one.

    The window is clipped at the board edges (it does not wrap around).
    """
    size = board.configuration.size
    point_x_neighbours = np.arange(max(0, point1.x - search_range), min(point1.x + search_range + 1, size))
    point_y_neighbours = np.arange(max(0, point1.y - search_range), min(point1.y + search_range + 1, size))
    # .T turns the (2, ny, nx) meshgrid stack into one (x, y) pair per row
    neighbour_list = np.array(np.meshgrid(point_x_neighbours, point_y_neighbours)).T.reshape(-1, 2)
    halite_list = [board.cells[Point(int(x), int(y))].halite for x, y in neighbour_list]
    return neighbour_list, halite_list

def find_large_halite(self, neighbour_list, halite_list):
    """Find the cell with the most halite; fall back to a random neighbour if none has any"""
    best = int(np.argmax(halite_list))
    if halite_list[best] > 0:
        return Point(int(neighbour_list[best][0]), int(neighbour_list[best][1]))
    result = choice(neighbour_list)
    return Point(int(result[0]), int(result[1]))

def getDirTo(self, fromPos, toPos, size):
    """from https://www.kaggle.com/alexisbcook/getting-started-with-halite

    Moves along y first, then x. Returns None when both positions are the same,
    and does not take the shorter route across the wrapping board edge.
    """
    fromX, fromY = divmod(fromPos[0],size), divmod(fromPos[1],size)
    toX, toY = divmod(toPos[0],size), divmod(toPos[1],size)
    if fromY < toY: return ShipAction.NORTH
    if fromY > toY: return ShipAction.SOUTH
    if fromX < toX: return ShipAction.EAST
    if fromX > toX: return ShipAction.WEST

#### explore option
def explore(self, ship, current_prob):
    """explore the surrounding area and move towards the largest halite neighbour"""
    if current_prob < self.EPSILON:
        # get list of neighbours and halites within a search range
        neighbours, halites = self.get_neighbour_halite(ship.position, self.SEARCH_RANGE, self.board)
        # get the destination from the cells with the largest halites
        destination = self.find_large_halite(neighbours, halites)
        # if this largest cell is not ship's position
        if destination and destination != ship.position:          
            direction = self.getDirTo(ship.position, destination, BOARD_SIZE)
            d_cell = ship.cell.neighbor(direction.to_point())
            # Check if direction is safe
            if (self.clear(d_cell) and not self.hostile_ship_near(d_cell, ship.halite)):
                self.move(ship, direction)
                return True
        # if it is the current ship's position
        elif destination and destination == ship.position:
            if self.stay_on_cell(ship):
                return True
    return False

#### exploit option
def exploit(self, ship, current_prob):
    """exploit: mine the current cell if it still has halite"""
    # check lever
    if current_prob >= self.EPSILON:
        # if current position has halite, then stay
        if self.board[ship.position].halite > 0:
            if self.stay_on_cell(ship):
                return True
    return False
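
The window search and "largest halite" pick above can be tried without a running game. This sketch reproduces that logic on a plain NumPy array; `neighbour_window`, `largest_halite_cell`, the `grid[x, y]` indexing and the 21 x 21 grid size are hypothetical stand-ins (the grid size is assumed to match the default board), not the kaggle_environments API. Like the helpers above, it clips at the edges and does not model the wrapping board.

```python
import random

import numpy as np


def neighbour_window(x, y, search_range, size):
    """All (x, y) cells in the square window around (x, y), clipped at the board edges."""
    xs = np.arange(max(0, x - search_range), min(x + search_range + 1, size))
    ys = np.arange(max(0, y - search_range), min(y + search_range + 1, size))
    # meshgrid stack has shape (2, len(ys), len(xs)); .T gives (len(xs), len(ys), 2)
    return np.array(np.meshgrid(xs, ys)).T.reshape(-1, 2)


def largest_halite_cell(halite_grid, x, y, search_range, rng=random):
    """Pick the richest cell in the window, or a random one if the window holds no halite."""
    size = halite_grid.shape[0]
    cells = neighbour_window(x, y, search_range, size)
    # halite_grid is indexed [x, y] in this sketch
    values = [halite_grid[cx, cy] for cx, cy in cells]
    best = int(np.argmax(values))
    if values[best] > 0:
        return tuple(int(v) for v in cells[best])
    return tuple(int(v) for v in rng.choice(cells))


grid = np.zeros((21, 21))
grid[3, 5] = 500.0    # rich cell inside the window around (2, 2)
grid[15, 15] = 900.0  # richer cell, but outside that window
print(len(neighbour_window(10, 10, 4, 21)))  # 81 cells: a 9 x 9 window
print(len(neighbour_window(0, 0, 4, 21)))    # 25 cells: clipped at the corner
print(largest_halite_cell(grid, 2, 2, 4))    # (3, 5)
```

With SEARCH_RANGE = 4 a ship in open water scans (2 * 4 + 1)^2 = 81 cells, so the search range directly trades scan cost against how far a ship is willing to travel.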


<a id='tag4'></a>

## Execution

Again, these are only minor changes to the Controller class so that it calls the explore() and exploit() functions.

In [None]:
def actions_of_ships(self):
    """ Next actions of every ship """
    for ship in self.player.ships:


        # Act according to first acceptable tactic
        if self.endgame(ship):
            continue
        if self.build_shipyard(ship):
            continue
        if self.safety_convert(ship):
            continue
        if self.crash_shipyard(ship):
            continue
        if self.unload_halite(ship):
            continue

        # Use epsilon to choose between explore and exploit: one draw per ship,
        # explore() acts if it is below EPSILON, exploit() if it is at or above
        current_prob = uniform(0, 1)
        if self.explore(ship, current_prob):
            #print("EXPLORE")
            continue  
        if self.exploit(ship, current_prob):
            #print("EXPLOIT")
            continue

        # Default random action
        self.move(ship, choice(DIRECTIONS + [None]))
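
The "first acceptable tactic" loop can be sketched with stand-in tactics to show how the shared random draw plays out. `make_explore_exploit` and `run_tactics` are hypothetical helpers; the real explore() and exploit() also depend on board state (safe cells, stay_on_cell), which is reduced here to two boolean flags.

```python
EPSILON = 0.4


def make_explore_exploit(current_prob, explore_finds_move, cell_has_halite):
    """Hypothetical stand-ins for explore()/exploit() that share one random draw."""

    def explore(ship):
        # fires only below EPSILON, and only if a safe move toward halite exists
        return current_prob < EPSILON and explore_finds_move

    def exploit(ship):
        # fires only at or above EPSILON, and only if the current cell still has halite
        return current_prob >= EPSILON and cell_has_halite

    return [explore, exploit]


def run_tactics(ship, tactics):
    """Return the name of the first tactic that claims the ship's turn."""
    for tactic in tactics:
        if tactic(ship):
            return tactic.__name__
    return "random_move"  # stands in for self.move(ship, choice(DIRECTIONS + [None]))


print(run_tactics("ship", make_explore_exploit(0.2, True, True)))   # explore
print(run_tactics("ship", make_explore_exploit(0.2, False, True)))  # random_move
print(run_tactics("ship", make_explore_exploit(0.7, True, True)))   # exploit
print(run_tactics("ship", make_explore_exploit(0.7, True, False)))  # random_move
```

Because both methods read the same draw, a failed explore() never hands over to exploit(); the ship falls straight through to the random default move.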

<a id='tag5'></a>

## Improvement

* Compute a reward function: estimate the future reward and the number of steps needed to collect it, and use that to decide whether to explore or exploit.

* Dynamic epsilon: adjust epsilon as the game progresses, with a higher value in the early stage (agents explore more) and a lower value in the late stage (agents exploit more).
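Both ideas could be prototyped along these lines. This is a minimal sketch under stated assumptions, not a tested strategy: `dynamic_epsilon`, `halite_per_step` and `prefer_explore` are hypothetical helpers, `total_steps=400` is assumed to match the episode length, and the start/end epsilon values are arbitrary. The reward rate ignores collection rate, halite regeneration and collision risk.

```python
def dynamic_epsilon(step, total_steps=400, eps_start=0.8, eps_end=0.1):
    """Linearly decay epsilon from eps_start (first step) to eps_end (last step)."""
    frac = min(max(step / max(total_steps - 1, 1), 0.0), 1.0)
    return eps_start + (eps_end - eps_start) * frac


def halite_per_step(halite, steps_to_reach):
    """Crude reward rate: halite available divided by the turns to reach it and mine once."""
    return halite / (steps_to_reach + 1)


def prefer_explore(current_halite, best_remote_halite, steps_to_remote):
    """Explore only if the remote cell's reward rate beats mining where the ship already is."""
    return halite_per_step(best_remote_halite, steps_to_remote) > halite_per_step(current_halite, 0)


print(round(dynamic_epsilon(0), 2), round(dynamic_epsilon(399), 2))  # 0.8 0.1
print(prefer_explore(100, 500, 3))  # True: 500 / 4 = 125 beats 100 / 1
print(prefer_explore(100, 300, 3))  # False: 300 / 4 = 75 loses to 100
```

In actions_of_ships, one option would be to set self.EPSILON = dynamic_epsilon(step) once per turn, where `step` is the current turn number.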

#### Please comment, give feedback, and upvote if you like it

In [None]:
Image("../input/image2/Picture2.jpg")
