# my-notebook

This notebook contains my initial approach and familiarization with Pommerman.

I will be building my solution for the Team Radio competition ([competitions page](https://www.pommerman.com/competitions)).

## TODOS

- Check difference between `radio_v2` and `radio_competition` envs. --> No difference: the two IDs lead to the same environment.
- Figure out how to get the information from the envs.
- Figure out if there's any way a human could play.
- Ask about the direction, position and flame values on Discord.

## Importing packages

The most important package is `pommerman`, which lives in the root directory and provides the modules imported below.

In [1]:
# Just checking that we're running the Python version we want
import sys
print(sys.executable)

import os
import numpy as np

import pommerman
from pommerman.agents import SimpleAgent, RandomAgent, PlayerAgent, BaseAgent
from pommerman.configs import radio_v2_env, radio_competition_env
from pommerman.envs.v0 import Pomme
from pommerman.characters import Bomber
from pommerman import utility

/home/jazz/Projects/Personal/playground/env/bin/python3


## Initializing Radio environment

As far as I can tell, the available environments are built on OpenAI Gym: they expose the usual `seed`, `reset`, `step`, `render` and `close` methods.

There is a `configs.py` file with the configurations of each environment. To create a `PommeRadioCompetition-v2` environment, we execute the following:

In [2]:
# Print all possible environments in the Pommerman registry
print(pommerman.REGISTRY)

# Create a set of agents (exactly four)
agent_list = [
    RandomAgent(),
    RandomAgent(),
    RandomAgent(),
    RandomAgent(),
    # agents.DockerAgent("pommerman/simple-agent", port=12345),
]

# Make the Team Radio competition environment using the agent list
env = pommerman.make('PommeRadioCompetition-v2', agent_list)

['PommeFFACompetition-v0', 'PommeFFACompetitionFast-v0', 'PommeFFAFast-v0', 'PommeFFA-v1', 'OneVsOne-v0', 'PommeRadioCompetition-v2', 'PommeRadio-v2', 'PommeTeamCompetition-v0', 'PommeTeamCompetitionFast-v0', 'PommeTeamCompetition-v1', 'PommeTeam-v0', 'PommeTeamFast-v0']


In [3]:
# Seed and reset the environment
env.seed(0)
obs = env.reset()

# Run the random agents until we're done
done = False
while not done:
    env.render()
    actions = env.act(obs)
    obs, reward, done, info = env.step(actions)
env.render(close=True)
env.close()

## Actions and Observations

As described [here](https://www.pommerman.com/about), [here](https://docs.pommerman.com/environment/) and [here](https://github.com/MultiAgentLearning/playground/blob/master/docs/environment.md).

When playing in `PommeRadioCompetition-v2`, `obs` is a list of observation dictionaries, one per agent. A team controls two of the four agents, so two of those observations belong to the agents we control.

### Alive 

A list containing the IDs of the agents that are still alive (10 is Agent0, 11 is Agent1, ...)

### Board

The board is an 11x11 numpy array, where each value represents one of the following items:

- 0 = Passage
- 1 = Rigid Wall -- cannot be broken
- 2 = Wooden Wall -- can be broken and half of them have power-ups
- 3 = Bomb
- 4 = Flames
- 5 = Fog (only applicable in partially observed scenarios like 2v2 Team Radio)
- 6 = Extra Bomb Power-Up -- adds ammo
- 7 = Increase Range Power-Up -- increases the `blast_strength`
- 8 = Can Kick Power-Up -- a player can kick bombs by touching them
- 9 = Agent Dummy
- 10 = Agent0
- 11 = Agent1
- 12 = Agent2
- 13 = Agent3

### Bomb_Blast_Strength


An 11x11 numpy matrix that marks the position of every bomb in the agent's field of view. A cell is `0` if there is no bomb there; any other number is the blast strength of the bomb in that cell.

A non-zero value represents how many squares the flame of the bomb occupies. An agent starts with a bomb strength of `3`.

### Bomb_Life

An 11x11 numpy matrix that contains the life of all bombs in the agent's field of view. A bomb has a lifetime of **10 timesteps**. The number in the matrix indicates the number of timesteps left until it blows.
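
Since `bomb_blast_strength` and `bomb_life` describe the same bombs, they can be read together. The following is a rough sketch, assuming both matrices have the same shape and are non-zero in exactly the cells that hold a bomb. `list_bombs` is a hypothetical helper, and the 3x3 matrices are made-up data, not real game output.

```python
def list_bombs(bomb_blast_strength, bomb_life):
    """Return (row, col, strength, life) for every cell that holds a bomb.

    Works on nested lists or numpy arrays of the same shape.
    """
    bombs = []
    for r, row in enumerate(bomb_blast_strength):
        for c, strength in enumerate(row):
            if strength != 0:
                bombs.append((r, c, int(strength), int(bomb_life[r][c])))
    # Bombs that explode soonest come first.
    return sorted(bombs, key=lambda bomb: bomb[3])


# Made-up example: two bombs on a tiny 3x3 board.
strength_map = [
    [0, 0, 0],
    [0, 3, 0],
    [2, 0, 0],
]
life_map = [
    [0, 0, 0],
    [0, 7, 0],
    [2, 0, 0],
]

bombs = list_bombs(strength_map, life_map)
print(bombs)
```

Sorting by remaining life puts the most urgent bomb at the front of the list, which is usually the one to move away from first.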

### Bomb_Moving_Direction

An 11x11 numpy array that contains the moving direction of all the bombs in the agent's field of view. A bomb travels at **1 unit per timestep** in the direction it was kicked.

TODO: I am unsure about what values correspond to which direction.
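
One way to settle this TODO empirically is to log the non-zero cells of `bomb_moving_direction` at every step and compare them with how the bomb actually moves on the board. A minimal sketch, assuming a cell is `0` when no bomb is moving there; `moving_bombs` is a hypothetical helper, and the direction values in the sample are placeholders, not the real encoding.

```python
def moving_bombs(bomb_moving_direction):
    """Map (row, col) -> raw direction value for every non-zero cell."""
    return {
        (r, c): int(value)
        for r, row in enumerate(bomb_moving_direction)
        for c, value in enumerate(row)
        if value != 0
    }


# Made-up 3x3 matrix; the values 4 and 1 are placeholders.
direction_map = [
    [0, 0, 4],
    [0, 0, 0],
    [0, 1, 0],
]

moving = moving_bombs(direction_map)
print(moving)
```

Printing this dictionary on consecutive steps, next to the bomb positions on the board, should show which value goes with which direction.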

### Flame_Life

An 11x11 numpy array that contains the life of the flame in a given position.

TODO: I do not know the value of the flame life when the bomb blows up.
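
A possible way to find that value is to track the largest `flame_life` entry seen over a whole game. This sketch assumes that flame life counts down after the explosion, so the maximum observed is the value a flame starts with. `FlameLifeTracker` is a hypothetical helper, and the matrices below are made-up data.

```python
class FlameLifeTracker:
    """Remember the largest flame life seen across observations."""

    def __init__(self):
        self.max_seen = 0

    def update(self, flame_life):
        """Feed one flame_life matrix and return the running maximum."""
        step_max = max(
            (int(value) for row in flame_life for value in row), default=0
        )
        self.max_seen = max(self.max_seen, step_max)
        return self.max_seen


tracker = FlameLifeTracker()
# Made-up matrices standing in for three consecutive observations.
tracker.update([[0, 0], [0, 2]])
tracker.update([[1, 0], [0, 0]])
tracker.update([[0, 3], [0, 0]])
print(tracker.max_seen)
```

In the game loop, `tracker.update(obs[0]['flame_life'])` could be called after every `env.step`, assuming the observation dictionary uses the lowercase key `flame_life`.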

### Game_Type

TODO: Dunno

### Game_Env

A string with the ID of the environment being used for the game.

### Position

A tuple of 2 ints giving the position of the current agent on the board. Each integer is in the range **[0, 10]**.

TODO: Where is (0,0)?
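
This can be checked by searching the board for the agent's own ID and comparing the result with `position`. A minimal sketch: `find_on_board` is a hypothetical helper, and the sample rows are the first three rows of the board printed later in this notebook. If the two tuples match, `position` is `(row, column)` and `(0, 0)` is the first entry of the printed array.

```python
def find_on_board(board, value):
    """Return the (row, col) of the first cell equal to value, or None."""
    for r, row in enumerate(board):
        for c, cell in enumerate(row):
            if int(cell) == value:
                return (r, c)
    return None


# First three rows of the board from the example observation.
sample_rows = [
    [10, 4, 1, 1, 1, 5, 5, 5, 5, 5, 5],
    [4, 4, 4, 0, 2, 5, 5, 5, 5, 5, 5],
    [1, 4, 0, 2, 0, 5, 5, 5, 5, 5, 5],
]

agent0_cell = find_on_board(sample_rows, 10)  # 10 is Agent0
print(agent0_cell)
```

The search returns `None` when the agent is not on the visible part of the board, for example because it is dead or hidden by fog.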

### Blast_Strength

An int with the agent's current blast strength.

### Can_Kick

A boolean which says if the agent can kick bombs or not.

### Teammate

TODO: Not sure about the type.

### Ammo

An int with the agent's current ammo (number of bombs it can drop right now).

### Enemies

TODO: Not sure about the type.

### Step_Count

Number of steps since the beginning of the game.

### Message

A list of two Ints, each in [0, 8]. The message being relayed from the teammate. Both ints are zero when a teammate is dead or it's the first step. Otherwise they are in [1, 8].

In [4]:
# Example of an observation

# Seed and reset the environment
env.seed(0)
obs = env.reset()

print(env.action_space)

# Run the random agents for 11 steps, then print the observations
done = False
it = 0
while not done:
    env.render()
    actions = env.act(obs)
    print(actions)
    obs, reward, done, info = env.step(actions)

    it += 1
    if it > 10:
        print(obs)
        break
env.render(close=True)
env.close()

print(info)

Tuple(Discrete(6), Discrete(8), Discrete(8))
[(5, 3, 4), (5, 1, 5), (3, 0, 0), (0, 5, 5)]
[(1, 0, 4), (1, 6, 5), (5, 3, 1), (0, 0, 4)]
[(3, 3, 1), (5, 7, 5), (0, 0, 7), (1, 2, 0)]
[(3, 6, 7), (5, 1, 1), (4, 0, 5), (0, 3, 2)]
[(3, 2, 4), (3, 6, 6), (5, 5, 7), (3, 0, 3)]
[(0, 4, 2), (3, 5, 5), (4, 7, 0), (2, 3, 3)]
[(1, 7, 5), (3, 6, 4), (5, 5, 3), (3, 7, 1)]
[(1, 1, 7), (3, 6, 2), (2, 4, 3), (1, 2, 3)]
[(5, 7, 7), (5, 5, 1), (4, 2, 2), (0, 1, 5)]
[(0, 7, 4), (0, 3, 2), (5, 6, 5), (0, 6, 0)]
[(5, 1, 1), (0, 3, 0), (3, 0, 4), (4, 0, 1)]
[{'alive': [10, 11, 13], 'board': array([[10,  4,  1,  1,  1,  5,  5,  5,  5,  5,  5],
       [ 4,  4,  4,  0,  2,  5,  5,  5,  5,  5,  5],
       [ 1,  4,  0,  2,  0,  5,  5,  5,  5,  5,  5],
       [ 1,  0,  2,  0,  2,  5,  5,  5,  5,  5,  5],
       [ 1,  2,  0,  2,  0,  5,  5,  5,  5,  5,  5],
       [ 5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5],
       [ 5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5],
       [ 5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5],
  