In [1]:
import numpy as np
import PIL
import neat
import cv2

import mxnet as mx
import torch
from mxnet import gluon


### **SOW-MKI49-2020-SEM1-V: NeurIPS**
#### Project: Neurosmash

This is the info document on the Neurosmash environment that you will be using for your Final Assignment. It contains background info and skeleton code to get you started.

### Project

During the 2nd period, you will work exclusively on this final group project in the practicals. You are expected to form groups of 4-5 people. The goal is to take what has been discussed in class and what you have already worked on in the earlier practicals, and apply it to an RL problem in a novel environment. This project will constitute 25% of your final grade.

Your project grade will be based on the following components:
- Online demonstration
- Source code
- Written report (a 4-page report in NeurIPS workshop paper format: https://www.overleaf.com/latex/templates/neurips-2020/mnshsmqkjsqz)

These components will be evaluated based on performance, creativity, elegance, rigor and plausibility.

While you can use the material from earlier practicals (e.g., REINFORCE, DQN, etc.) as boilerplate, you are also free to take any other approach, be it imitation learning or world models.

As a deep learning library, use of mxnet is preferred. Still, you are free to use whatever you want.

In addition to the practical sessions, we will provide additional support in the coming weeks. You can email any of us to set up an appointment for discussing your project.

### Environment

Briefly, there are two agents: Red and Blue. Red is controlled by you. Blue is controlled by the environment "AI".\*\* Both agents always run forward at a speed of 3.5 m/s.\* If one of them gets within reach of the other (a frontal sphere with a 0.5 m radius), it gets pushed away automatically at a speed of 3.5 m/s. The only thing the agents can do is turn left or right with an angular speed of 180 degrees/s. This means that there are three possible discrete actions that your agent can take at every step: do not turn, turn left, and turn right. For convenience, there is also a fourth built-in action, which turns left or right with uniform probability.

An episode begins when you reset the environment and ends when one of the agents falls off the platform. At the end of the episode, the winning agent gets a reward of 10 while the other gets nothing. Therefore, your goal is to train an agent that maximizes its reward by pushing the other agent off the platform or by making it fall off the platform by itself.

\* Note that all times are simulation time, that is, 0.02 s per step when the timescale is set to one.

\*\* Basically, Blue is artificial but not really intelligent. Every 0.5 s, it updates its destination to the current position of Red plus some random variation (within a surrounding circle with a radius of 1.75 m) and smoothly turns toward that position.
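
To make the numbers above concrete, the following sketch computes how far an agent moves and turns in a single step. The speeds and the 0.02 s step time come from the description above; the assumption that simulated time per step scales linearly with the timescale is ours, and `per_step_motion` is a hypothetical helper that is not part of Neurosmash.

```python
# Constants taken from the environment description.
FORWARD_SPEED = 3.5     # m/s
TURN_SPEED = 180.0      # degrees/s
BASE_STEP_TIME = 0.02   # simulated seconds per step at timescale 1


def per_step_motion(timescale=1.0):
    """Return (simulated seconds, metres moved, degrees turned) per step.

    Assumption: simulated time per step scales linearly with `timescale`.
    """
    dt = BASE_STEP_TIME * timescale
    return dt, FORWARD_SPEED * dt, TURN_SPEED * dt


for ts in (1, 10, 20):
    dt, metres, degrees = per_step_motion(ts)
    print(f"timescale={ts}: dt={dt:.2f} s, forward={metres:.2f} m, turn={degrees:.1f} deg")
```

Under this assumption, a high timescale makes each action coarse: at a timescale of 20, one "turn" action would rotate the agent by roughly 72 degrees.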

Note to macOS users: you should first make the environment executable in the terminal (see the command below) and then run it from the context menu (i.e., not by double-clicking).

`chmod -R +x [Path of Mac.app (which is in the .zip file)]/Contents/MacOS`

### Skeleton code

- You should first add the Neurosmash file to your working directory or Python path.
- Next, start the Neurosmash app.
- Make sure to set the right values in the Ip, Port, Size and Timescale fields (see below). These must correspond to the values you specify in the Python script.
- Start the server by pressing the play button.
- The fastest simulations can be obtained by turning off rendering (x button).

In [2]:
import Neurosmash

# These are the default environment arguments. They must be the same as the values that are set in the environment GUI.
ip         = "127.0.0.1" # Ip address that the TCP/IP interface listens to (127.0.0.1 by default)
port       = 13000       # Port number that the TCP/IP interface listens to (13000 by default)

# This is the size of the texture that the environment is rendered to.
# This is set to 784 by default, which will result in a crisp image but slow speed.
# You can change the size to a value that works well for your environment, but it should not go too low.
size       = 240

# This is the simulation speed of the environment. This is set to 1 by default.
# Setting it to n will make the simulation n times faster.
# In other words, less (if n < 1) or more (if n > 1) simulation time will pass per step.
# You might want to increase this value to around 10 if you cannot train your models fast enough
# so that they can sample more states in a shorter number of steps at the expense of precision.
timescale  = 20

# This is an example agent.
# It has a step function, which gets reward/state as arguments and returns an action.
# Right now, it always outputs a random action (3) regardless of reward/state.
# The real agent should output one of the following three actions:
# none (0), left (1) and right (2)
agent = Neurosmash.Agent() 

# This is the main environment.
# It has a reset function, which is used to reset the environment before episodes.
# It also has a step function, which advances the simulation by one time step.
# It gets an action (as defined above) as input and outputs the following:
# end (true if the episode has ended, false otherwise)
# reward (10 if won, 0 otherwise)
# state (flattened size x size x 3 vector of pixel values)
# The state can be converted into an image as follows:
# image = np.array(state, "uint8").reshape(size, size, 3)
# You can also use the Neurosmash.Environment.state2image(state) function, which returns
# the state as a PIL image.
environment = Neurosmash.Environment(ip, port, size, timescale) 
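
The interaction described in the comments above can be sketched as a plain episode loop. Because the real classes need a running Neurosmash server, this sketch uses two hypothetical stand-ins: `StubEnvironment` only mimics the `reset`/`step` return values described above, and `RandomAgent.step(end, reward, state)` uses an assumed signature.

```python
import random

import numpy as np

size = 8  # tiny frame for the stand-in; the real value must match the GUI


class StubEnvironment:
    """Hypothetical stand-in mimicking the reset/step interface."""

    def __init__(self, size, episode_length=5):
        self.size = size
        self.episode_length = episode_length
        self.t = 0

    def _state(self):
        # Flattened size x size x 3 list of pixel values.
        return [0] * (self.size * self.size * 3)

    def reset(self):
        self.t = 0
        return 0, 0, self._state()

    def step(self, action):
        self.t += 1
        end = int(self.t >= self.episode_length)
        reward = 10 if end else 0  # the stand-in pretends Red always wins
        return end, reward, self._state()


class RandomAgent:
    """Hypothetical agent; the step signature is an assumption."""

    def step(self, end, reward, state):
        return random.choice([0, 1, 2])  # none, left, right


def run_episode(environment, agent):
    """Play one episode and return (total reward, number of steps)."""
    end, reward, state = environment.reset()
    total_reward, steps = 0, 0
    while not end:
        # Conversion of the flat state into an image, as described above.
        image = np.array(state, "uint8").reshape(environment.size, environment.size, 3)
        assert image.shape == (environment.size, environment.size, 3)
        action = agent.step(end, reward, state)
        end, reward, state = environment.step(action)
        total_reward += reward
        steps += 1
    return total_reward, steps


print(run_episode(StubEnvironment(size), RandomAgent()))
```

Swapping the stand-ins for `Neurosmash.Environment` and `Neurosmash.Agent` should leave the loop itself unchanged, provided the real interfaces match the description.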



In [4]:
class autoencoder(gluon.Block):
    def __init__(self):
        super(autoencoder, self).__init__()
        with self.name_scope():
            self.encoder = gluon.nn.Sequential('encoder_')
            with self.encoder.name_scope():
                self.encoder.add(gluon.nn.Dense(2, in_units = 4))

            self.decoder = gluon.nn.Sequential('decoder_')
            with self.decoder.name_scope():
                self.decoder.add(gluon.nn.Dense(4, in_units = 2))

    def encode(self, x):
        return self.encoder(x)

    def decode(self, x):
        return self.decoder(x)

    def forward(self, x):
        x = self.encoder(x)
        x = self.decoder(x)
        return x

In [64]:
action_dict = {0: 'n', 1: 'l', 2: 'r'}


def get_color_coords(image):
    """Return the mean (row, col) position of each agent, relative to the image centre."""
    # RGB bounds (lower, upper) for each agent's colour.
    boundaries = [
        ([17, 15, 130], [80, 100, 200]),  # Blue
        ([86, 31, 4], [220, 88, 50]),     # Red
    ]
    coords = []
    for lower, upper in boundaries:
        # Create NumPy arrays from the boundaries.
        lower = np.array(lower, dtype="uint8")
        upper = np.array(upper, dtype="uint8")
        # Binary mask of the pixels whose colour lies within the boundaries.
        mask = cv2.inRange(image, lower, upper)
        pixels = np.argwhere(mask)
        # Fall back to (0, 0) when no pixel of this colour is visible.
        coords.append(pixels.mean(axis=0) if len(pixels) else np.zeros(2))
    # Uses the global `size` to shift the origin to the image centre.
    return np.array(coords) - size / 2
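
One way to sanity-check the colour bounds without a running environment is to apply the same idea to a synthetic image. The sketch below is a NumPy-only equivalent of the mask-and-average step (it does not call `cv2.inRange`); `color_centroid` and the patch colours are hypothetical and were chosen to fall inside the bounds used above.

```python
import numpy as np


def color_centroid(image, lower, upper):
    """Mean (row, col) of the pixels whose RGB values lie within [lower, upper]."""
    lower = np.asarray(lower)
    upper = np.asarray(upper)
    # A pixel is selected only if all three channels are within the bounds.
    mask = np.all((image >= lower) & (image <= upper), axis=-1)
    pixels = np.argwhere(mask)
    return pixels.mean(axis=0) if len(pixels) else None


size = 16
image = np.zeros((size, size, 3), dtype="uint8")
image[2:4, 10:12] = (30, 50, 180)   # a patch inside the Blue bounds
image[12:14, 4:6] = (150, 60, 20)   # a patch inside the Red bounds

blue = color_centroid(image, [17, 15, 130], [80, 100, 200])
red = color_centroid(image, [86, 31, 4], [220, 88, 50])
print(blue, red)
```

Running the same check on a few real frames would show whether the bounds also pick up background pixels, which would bias the centroids.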

In [6]:
sample_coords = []

In [134]:
def fitness(genomes, config):
    # init ae model
    model = autoencoder()
    ctx = mx.cpu()
    model.load_parameters('model/simple_autoencoder.params', ctx=ctx)
    
    for g_id, g in genomes:
        g.fitness = 0
        net = neat.nn.FeedForwardNetwork.create(g, config)
        
        action_ls = []
        
        end, reward, state = environment.reset()
        
        # calculate previous coords for first iteration
        image = np.array(state, "uint8").reshape(size, size, 3)
        coords = get_color_coords(image).flatten()
        encoding_coords = model.encode(mx.nd.array([coords])).asnumpy()[0]
        
        while (end == 0):
            image = np.array(state, "uint8").reshape(size, size, 3)
            coords = get_color_coords(image).flatten()
            sample_coords.append(coords)
#             print('coords and encoding')
#             print(coords)
#             print(encoding_coords)
#             print('-'*64)
            # add encoding of previous coords to input
            inputs = np.append(coords, encoding_coords)
#             print('inputs')
#             print(inputs)
            # Encode current state coords
            encoding_coords = model.encode(mx.nd.array([coords])).asnumpy()[0]

            output = net.activate(inputs)
   
            action = np.argmax(output)
            action_ls.append(action_dict[action])
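            end, reward, state = environment.step(action)
            # Small per-step penalty, so that faster wins score higher.
            g.fitness -= 0.01
        # Add the terminal reward (10 if won, 0 otherwise).
        g.fitness += reward
        # print(action_ls)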
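
The fitness computed above is the terminal reward minus 0.01 per step, so a reported fitness can be read back as an episode length. The helpers below are a hypothetical sketch of that arithmetic, assuming one episode per genome per generation as in the loop above.

```python
STEP_PENALTY = 0.01  # subtracted once per environment step
WIN_REWARD = 10      # reward returned by the environment when Red wins


def episode_fitness(steps, won):
    """Fitness of a single episode that lasted `steps` steps."""
    return (WIN_REWARD if won else 0) - STEP_PENALTY * steps


def steps_from_fitness(fitness, won):
    """Invert `episode_fitness` to recover the episode length."""
    return round(((WIN_REWARD if won else 0) - fitness) / STEP_PENALTY)


print(episode_fitness(14, won=True))
print(steps_from_fitness(9.86, won=True))
print(episode_fitness(30, won=False))
```

Read this way, a fitness of 9.86 corresponds to a win after 14 steps, while a loss always yields a slightly negative fitness.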

In [135]:
def run(config_file):
    # Load configuration.
    config = neat.Config(neat.DefaultGenome, neat.DefaultReproduction,
                         neat.DefaultSpeciesSet, neat.DefaultStagnation,
                         config_file)

    # Create the population, which is the top-level object for a NEAT run.
    p = neat.Population(config)

    # Add a stdout reporter to show progress in the terminal.
    p.add_reporter(neat.StdOutReporter(True))
    stats = neat.StatisticsReporter()
    p.add_reporter(stats)

    # Run for up to 25 generations, using `fitness` to evaluate the genomes.
    winner = p.run(fitness, 25)

#     # Display the winning genome.
#     print('\nBest genome:\n{!s}'.format(winner))

#     # Show output of the most fit genome against training data.
#     print('\nOutput:')
#     winner_net = neat.nn.FeedForwardNetwork.create(winner, config)
#     for xi, xo in zip(xor_inputs, xor_outputs):
#         output = winner_net.activate(xi)
#         print("input {!r}, expected output {!r}, got {!r}".format(xi, xo, output))

#     node_names = {-1:'A', -2: 'B', 0:'A XOR B'}
#     visualize.draw_net(config, winner, True, node_names=node_names)
#     visualize.plot_stats(stats, ylog=False, view=True)
#     visualize.plot_species(stats, view=True)

#     p = neat.Checkpointer.restore_checkpoint('neat-checkpoint-4')
#     p.run(eval_genomes, 10)
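
The `config-feedforward` file is not shown here, but the network dimensions it needs can be derived from the `fitness` function. This sketch rebuilds the input vector with placeholder zeros; that the file's `num_inputs` and `num_outputs` settings must equal these numbers is our inference from the code, not something the file confirms.

```python
import numpy as np

# (row, col) for Blue and Red, flattened -> 4 values.
coords = np.zeros((2, 2)).flatten()
# Autoencoder bottleneck of the previous coordinates (Dense(2)) -> 2 values.
encoding_coords = np.zeros(2)
inputs = np.append(coords, encoding_coords)

num_inputs = inputs.shape[0]
num_outputs = 3  # one score per action: none (0), left (1), right (2)
print(num_inputs, num_outputs)
```

With `np.argmax` selecting the action, more than three outputs could produce an index that is missing from `action_dict`.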

In [136]:
config_path =  'config-feedforward'
run(config_path)


 ****** Running generation 0 ****** 

Population's average fitness: 9.60667 stdev: 0.18373
Best fitness: 9.86000 - size: (3, 18) - species 1 - id 1
Average adjusted fitness: 0.177
Mean genetic distance 0.823, standard deviation 0.425
Population of 3 members in 1 species:
   ID   age  size  fitness  adj fit  stag
     1    0     3      9.9    0.177     0
Total extinctions: 0
Generation time: 10.247 sec

 ****** Running generation 1 ****** 

Population's average fitness: 2.78000 stdev: 4.93032
Best fitness: 9.73000 - size: (3, 18) - species 1 - id 1
Average adjusted fitness: 0.363
Mean genetic distance 0.638, standard deviation 0.335
Population of 3 members in 1 species:
   ID   age  size  fitness  adj fit  stag
     1    1     3      9.7    0.363     1
Total extinctions: 0
Generation time: 14.283 sec (12.265 average)

 ****** Running generation 2 ****** 

Population's average fitness: 6.23667 stdev: 5.04652
Best fitness: 9.85000 - size: (3, 18) - species 1 - id 1
Average adjusted fitne

Population's average fitness: 6.18000 stdev: 4.48645
Best fitness: 9.72000 - size: (4, 15) - species 1 - id 19
Average adjusted fitness: 0.641
Mean genetic distance 0.544, standard deviation 0.283
Population of 3 members in 1 species:
   ID   age  size  fitness  adj fit  stag
     1   18     3      9.7    0.641    18
Total extinctions: 0
Generation time: 10.092 sec (25.273 average)

 ****** Running generation 19 ****** 

Population's average fitness: 2.46000 stdev: 5.16444
Best fitness: 9.71000 - size: (3, 13) - species 1 - id 22
Average adjusted fitness: 0.377
Mean genetic distance 0.909, standard deviation 0.281
Population of 3 members in 1 species:
   ID   age  size  fitness  adj fit  stag
     1   19     3      9.7    0.377    19
Total extinctions: 0
Generation time: 17.803 sec (26.238 average)

 ****** Running generation 20 ****** 

Population's average fitness: 5.43333 stdev: 4.32586
Best fitness: 8.69000 - size: (4, 14) - species 1 - id 20
Average adjusted fitness: 0.652
Mean ge

In [23]:
# save coordinate data
np.savetxt("sample_coords.csv", sample_coords, delimiter=",")
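
The saved file can be read back with `np.loadtxt`, for example to reuse the coordinates as training data later. The sketch below round-trips two placeholder rows through a temporary file; the values are invented for illustration, and the column order (Blue row, Blue col, Red row, Red col) follows from `get_color_coords`.

```python
import os
import tempfile

import numpy as np

# Placeholder rows standing in for the recorded samples.
demo_coords = [np.array([1.0, -2.0, 3.5, 4.0]), np.array([0.0, 0.5, -1.0, 2.0])]

path = os.path.join(tempfile.mkdtemp(), "sample_coords.csv")
np.savetxt(path, demo_coords, delimiter=",")

# ndmin=2 keeps the result two-dimensional even if only one sample was saved.
loaded = np.loadtxt(path, delimiter=",", ndmin=2)
print(loaded.shape)
```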
