# Maze Harvest

Maze Harvest is an environment in which an agent is placed in a 2D grid where fruits spawn at random. The agent can collect two types of fruit: red fruits, which have a power of 10, and green fruits, which have a power of 5.

If the agent does not have down syndrome (`nds=True`), eating a red fruit grows its body, up to a maximum size of $\lfloor (X+Y)/2 \rfloor$, where X and Y are the dimensions of the 2D grid.

The agent has limited visibility: it can only see fruits and walls within a window of n units in four directions (up, down, left, and right). However, it can smell fruits anywhere on the grid in four diagonal regions: front left, front right, back left, and back right. Think of four Venn-diagram sets, each overlapping the two adjacent directions it lies between, but not the center.


The goal of the agent is to eat as many fruits as possible and to survive the maze.
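The sensing rules above can be sketched as follows. This is a hypothetical illustration, not the `maze_harvest.py` code: it assumes the visibility window is a plus shape that walls do not block, that fruits on the agent's own row or column are not smelled, and that smell is a plain sum of fruit powers per region (the environment's `euclidean` and `gaussian_kernel` utilities hint at distance weighting, which this sketch omits).

```python
# Hypothetical sketch of the agent's senses; NOT the maze_harvest implementation.
# Coordinates are (row, col) with rows growing downward. Headings follow the
# action encoding: 0 left, 1 up, 2 right, 3 down.
HEADING_VECTORS = {0: (0, -1), 1: (-1, 0), 2: (0, 1), 3: (1, 0)}


def visible_cells(agent, n, shape):
    """Cells up to n steps up, down, left and right of the agent (a plus shape).

    Assumption: walls do not block the line of sight in this sketch.
    """
    rows, cols = shape
    r, c = agent
    cells = []
    for dr, dc in HEADING_VECTORS.values():
        for k in range(1, n + 1):
            rr, cc = r + dr * k, c + dc * k
            if 0 <= rr < rows and 0 <= cc < cols:
                cells.append((rr, cc))
    return cells


def smell(agent, heading, fruits):
    """Summed fruit power in the [front-left, front-right, back-left, back-right] regions.

    `fruits` maps (row, col) -> power. Fruits on the agent's own row or column
    are skipped: they lie on the lines of sight, not inside a diagonal region.
    Assumption: plain sums, no distance weighting.
    """
    r, c = agent
    fwd_r, fwd_c = HEADING_VECTORS[heading]
    right_r, right_c = fwd_c, -fwd_r  # forward vector rotated 90 degrees clockwise
    regions = [0.0, 0.0, 0.0, 0.0]
    for (row, col), power in fruits.items():
        dr, dc = row - r, col - c
        forward = dr * fwd_r + dc * fwd_c  # > 0 means in front of the agent
        side = dr * right_r + dc * right_c  # > 0 means to the agent's right
        if forward == 0 or side == 0:
            continue
        idx = (0 if forward > 0 else 2) + (1 if side > 0 else 0)
        regions[idx] += power
    return regions


# Agent at (5, 5) facing up: a red fruit front-left, green fruits front-right
# and back-right; the red fruit at (5, 9) is on the agent's row, so it is skipped.
print(smell((5, 5), 1, {(2, 3): 10, (2, 8): 5, (7, 7): 5, (5, 9): 10}))
# -> [10.0, 5.0, 0.0, 5.0]
```

Because the regions are defined relative to the heading, the same fruit can move from "front" to "back" when the agent turns, which is what lets four numbers summarize the whole grid.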

### Environment and its limits
> $Avg = \lfloor (X+Y)/2 \rfloor$


- Environment size limit: $10\le X,Y \le 50$; a dimension outside this range is set to 10.
- Maximum fruit spawn: $Avg$
- Number of walls: at most $30\%$ of the total cells ($20\%$ to $25\%$ works best).
- Maximum body size: $Avg$
- Default maximum moves allowed: 10000 (configurable via `max_moves`).
- Action space: 4 (0 = left, 1 = up, 2 = right, 3 = down).
- Default window size: 10. The window size must be less than or equal to $Avg$; otherwise the default is used.
- During reset, the wall proportion can be set and down syndrome enabled or disabled (`nds`, short for "no down syndrome"); the defaults are 0.25 and `False`.
- State size: 16, made up of a one-hot encoded direction (4), danger and food in each of the 4 directions (8), and the smell of fruits (4).
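The limits above can be turned into concrete numbers with a small helper. This is a hypothetical sketch, not part of `maze_harvest.py`: it assumes wall proportions above 30% are capped rather than rejected, and that the 16-value state is concatenated in the order listed (direction, danger, food, smell).

```python
import numpy as np

DEFAULT_SIZE = 10
DEFAULT_WINDOW = 10


def derived_limits(x, y, window=DEFAULT_WINDOW, walls=0.25):
    """Hypothetical helper turning the environment limits into concrete numbers."""
    # Out-of-range dimensions fall back to 10.
    x = x if 10 <= x <= 50 else DEFAULT_SIZE
    y = y if 10 <= y <= 50 else DEFAULT_SIZE
    avg = (x + y) // 2  # Avg = floor((X + Y) / 2)
    # The window must not exceed Avg; otherwise the default is used.
    if window > avg:
        window = DEFAULT_WINDOW
    # Assumption: wall proportions above 30% are capped at 30% of the cells.
    max_walls = (3 * x * y) // 10
    n_walls = min(int(walls * x * y), max_walls)
    return {"shape": (x, y), "avg": avg, "max_fruits": avg,
            "max_body": avg, "window": window, "n_walls": n_walls}


def build_state(heading, danger, food, smell):
    """Assumed layout of the 16-value state: direction, danger, food, smell."""
    direction = np.zeros(4, dtype=np.float32)
    direction[heading] = 1.0  # one-hot heading (0 left, 1 up, 2 right, 3 down)
    state = np.concatenate([direction,
                            np.asarray(danger, dtype=np.float32),
                            np.asarray(food, dtype=np.float32),
                            np.asarray(smell, dtype=np.float32)])
    assert state.shape == (16,)
    return state


limits = derived_limits(60, 15, window=14)  # 60 is out of range -> 10
print(limits["avg"], limits["window"], limits["n_walls"])  # -> 12 10 37
```

Since $X, Y \ge 10$ always gives $Avg \ge 10$, the default window of 10 is valid for every allowed grid size.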

### Reward System
- Default: $0 \to Reward$
- If the agent hits its body or a wall, or the maximum number of moves is reached: $-10 \to Reward$
- Else, if the agent ate a fruit: $10 \cdot power + Reward \to Reward$
- Then the reward is poisoned by the powers of the $F$ fruits still in the environment: $Reward - \frac{1}{10}\sum_{i=1}^{F} power_i \to Reward$

- If the agent reaches the maximum number of moves, or hits its body or a wall, the game is done.
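A minimal, hypothetical sketch of the reward rules above (`step_reward` is not part of `maze_harvest.py`). It assumes the poisoning term is applied on every step, including terminal ones, and that the fruit just eaten no longer counts as remaining. Under this reading, every step that leaves fruit on the board costs reward, which pushes the agent to eat quickly.

```python
def step_reward(hit, max_moves_reached, eaten_power, remaining_powers):
    """Hypothetical per-step reward following the rules above.

    hit: the agent ran into its body or a wall.
    eaten_power: power of the fruit eaten this step (10 red, 5 green), or None.
    remaining_powers: powers of the fruits still in the environment after the
    move (assumption: the eaten fruit is no longer counted).
    """
    reward = 0.0  # default
    if hit or max_moves_reached:
        reward = -10.0
    elif eaten_power is not None:
        reward = 10 * eaten_power + reward
    reward = reward - sum(remaining_powers) / 10  # poisoning term
    done = hit or max_moves_reached
    return reward, done


print(step_reward(False, False, 10, [10, 5, 5]))  # -> (98.0, False)
print(step_reward(True, False, None, [10, 5]))    # -> (-11.5, True)
```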


**File:** `maze_harvest.py`
- `Environment`: initializes a new environment with the given parameters.
- `play_frames`: replays recorded frames; requires a lambda function that clears the shell output.
    - Example: `play_frames(frames,lambda : clear_output(wait=True),sleep=0.3)`
- **Utils**:
    - Class: `ActionSpace`
    - Functions: `euclidean`, `gaussian_kernel` & `nxt_direction`
    - Variables: `color_map` & `directions`

**File:** `dqn_tf.py`
- `DQN`: requires the architecture and activation functions (other parameters have default values).
- **Utils**:
    - Classes: `ReplayMemory` & `QNetwork`

In [1]:
from maze_harvest import Environment, play_frames
from dqn_tf import DQN
import numpy as np

In [2]:
from time import sleep
from IPython.display import clear_output
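def play(net,env,slow=0.1,walls=.2,nds=False,record=False,print_now=True):
    """Run one greedy episode with network `net`, rendering every step.
    If record=True, the recorded frames are returned (for play_frames)."""
    nxt_state = env.reset(walls=walls,nds=nds)
    done = False
    if record: env.record(True)  # start recording frames
    env.render(print_now)
    while not done:
        state = nxt_state
        sleep(slow)
        # greedy action: index of the largest predicted Q-value
        action = np.argmax(net(np.array([state])))
        nxt_state,r,done = env.step(action)
        clear_output(wait=True)
        env.render(print_now)

    if record:
        return env.record(False)  # stop recording and return the frames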

In [3]:
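def train(agent,env,num_episodes=100,batch_size=32,C=100,ep=10,walls=.2,nds=False):
    """Train `agent` on `env` for num_episodes episodes. The target network is
    updated every C steps; progress is printed every `ep` episodes."""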
    steps=0
    for i in range(1,num_episodes+1):
        try:
            episode_loss = 0
            t = 0

            # Sample Phase
            agent.decay_epsilon()
            nxt_state = env.reset(walls=walls,nds=nds)
            done = False
            while not done:
                state = nxt_state
                action = agent.e_greedy(state,env)
                nxt_state,reward,done = env.step(action)

                # Learning Phase
                episode_loss += agent.learn((state,action,reward,nxt_state,done),batch_size)
                steps +=1
                t+=1

                if steps % C == 0: agent.update_target_network()

            if i%ep==0: print(f"Episode:{i} Score:{env.score} Moves:{env.move_count} Loss:{episode_loss/t}")
        except KeyboardInterrupt:
            print(f"Training Terminated at Episode {i}")
            return 
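The `train` loop relies on `agent.decay_epsilon`, `agent.e_greedy`, `agent.learn` and `agent.update_target_network` from `dqn_tf.py`, whose internals are not shown here. The sketch below shows what these steps usually look like in standard DQN, under stated assumptions: multiplicative epsilon decay at an assumed rate of 0.995, a discount factor `gamma` whose value in `DQN` is not given here, and targets computed from the target network, which `train` refreshes every `C` steps. The functions are hypothetical stand-alone versions, not the agent's actual methods.

```python
import numpy as np


def decay_epsilon(epsilon, epsilon_min, decay=0.995):
    """Multiplicative decay with a floor (the decay rate is an assumption)."""
    return max(epsilon_min, epsilon * decay)


def e_greedy(q_values, epsilon, rng):
    """Random action with probability epsilon, otherwise the greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))


def td_targets(q_next_target, rewards, dones, gamma=0.99):
    """DQN targets y = r + gamma * max_a' Q_target(s', a'); y = r on terminal steps."""
    not_done = 1.0 - np.asarray(dones, dtype=np.float64)
    return rewards + gamma * q_next_target.max(axis=1) * not_done


# A batch of two transitions; the second one ended the episode.
q_next = np.array([[1.0, 3.0, 0.0, 2.0],
                   [2.0, 0.5, 1.0, 0.0]])
y = td_targets(q_next, np.array([1.0, -10.0]), np.array([False, True]), gamma=0.5)
print(y.tolist())  # -> [2.5, -10.0]
```

Freezing the target network between updates keeps the regression targets `y` stable for `C` steps, which is why `C` trades off learning speed against stability.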

## Agent Network Init

Architecture: 16->12($Lin$)->8($ReLU$)->4($Lin$)

Hyperparameters: `eta=5e-4, epsilon=0.7, epsilon_min=0.01`

In [4]:
arch = [16,12,8,4]
af = ["linear","relu","linear"]
agent = DQN(arch,af,eta=5e-4,epsilon=0.7,epsilon_min=0.01)
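As a rough, hypothetical NumPy mirror of this network, the following assumes `af[i]` is applied after the i-th dense layer of `arch` and uses Glorot-uniform initialization; `QNetwork` in `dqn_tf.py` may differ. The network has 344 trainable parameters. Note that because the first activation is linear, the 16->12 and 12->8 layers compose into a single affine map before the ReLU, so the 12-unit layer adds parameters but not expressiveness.

```python
import numpy as np

ACTIVATIONS = {
    "linear": lambda z: z,
    "relu": lambda z: np.maximum(z, 0.0),
}


def init_params(arch, rng):
    """One (W, b) pair per consecutive pair of layer sizes, Glorot-uniform weights."""
    params = []
    for fan_in, fan_out in zip(arch[:-1], arch[1:]):
        limit = np.sqrt(6.0 / (fan_in + fan_out))
        W = rng.uniform(-limit, limit, size=(fan_in, fan_out))
        b = np.zeros(fan_out)
        params.append((W, b))
    return params


def forward(params, af, x):
    """Apply each dense layer followed by its activation; returns Q-values."""
    h = x
    for (W, b), name in zip(params, af):
        h = ACTIVATIONS[name](h @ W + b)
    return h


arch = [16, 12, 8, 4]
af = ["linear", "relu", "linear"]
params = init_params(arch, np.random.default_rng(0))
q = forward(params, af, np.ones((1, 16)))  # one state -> shape (1, 4)
action = int(np.argmax(q))                  # greedy action, as in play()
```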

## Two different Environments

env1 = 10x10

env2 = 20x20

In [5]:
env1 = Environment(max_moves=50)

In [6]:
env2 = Environment(20,20,max_moves=100)

## Agent Training

In [7]:
before_train = play(agent.Q,env1,record=True,walls=.25) # before training

[1;32;42m ! [0m[1;32;42m ! [0m[1;37;47m   [0m[1;37;47m   [0m[1;37;47m   [0m[1;35;47m @ [0m[0;30;40m   [0m[0;30;40m   [0m[1;31;41m ! [0m[1;31;41m ! [0m
[1;37;47m   [0m[1;37;47m   [0m[0;30;40m   [0m[0;30;40m   [0m[1;37;47m   [0m[0;30;40m   [0m[1;37;47m   [0m[0;30;40m   [0m[0;30;40m   [0m[0;30;40m   [0m
[1;37;47m   [0m[1;37;47m   [0m[0;30;40m   [0m[1;37;47m   [0m[1;37;47m   [0m[1;32;42m ! [0m[1;37;47m   [0m[1;37;47m   [0m[0;30;40m   [0m[1;37;47m   [0m
[1;37;47m   [0m[0;30;40m   [0m[0;30;40m   [0m[0;30;40m   [0m[0;30;40m   [0m[1;37;47m   [0m[0;30;40m   [0m[1;37;47m   [0m[1;37;47m   [0m[1;37;47m   [0m
[1;37;47m   [0m[1;37;47m   [0m[1;31;41m ! [0m[0;30;40m   [0m[1;37;47m   [0m[0;30;40m   [0m[1;37;47m   [0m[1;37;47m   [0m[1;37;47m   [0m[1;37;47m   [0m
[1;32;42m ! [0m[0;30;40m   [0m[1;37;47m   [0m[1;37;47m   [0m[0;30;40m   [0m[1;37;47m   [0m[1;37;47m   [0m[1;37;47m   [0m[1;37;47

### 1) walls 1% (to give the agent more chances to eat fruits)

In [8]:
train(agent,env1,500,32,C=20,ep=100,walls=.01)

Episode:100 Score:9 Moves:50 Loss:996.9736236572265
Episode:200 Score:12 Moves:50 Loss:1072.8812066650391
Episode:300 Score:3 Moves:14 Loss:1157.971444266183
Episode:400 Score:4 Moves:21 Loss:1433.3675391787574
Episode:500 Score:13 Moves:50 Loss:1294.1780529785156


In [9]:
play(agent.Q,env1,walls=0.05)

*[rendered 10x10 board: ANSI color output, truncated]*

### 2) walls 20% (to learn how to avoid walls and to eat)

In [10]:
train(agent,env1,4000,42,C=50,ep=100,walls=.20)

Episode:100 Score:0 Moves:2 Loss:2453.2225341796875
Episode:200 Score:0 Moves:1 Loss:955.8154907226562
Episode:300 Score:0 Moves:2 Loss:2321.2350463867188
Episode:400 Score:0 Moves:2 Loss:1181.4935913085938
Episode:500 Score:0 Moves:2 Loss:1683.8015747070312
Episode:600 Score:0 Moves:1 Loss:931.844482421875
Episode:700 Score:0 Moves:2 Loss:1386.17919921875
Episode:800 Score:1 Moves:5 Loss:1029.0156127929688
Episode:900 Score:0 Moves:3 Loss:1204.8654378255208
Episode:1000 Score:0 Moves:2 Loss:1068.2022399902344
Episode:1100 Score:1 Moves:9 Loss:1178.8419121636284
Episode:1200 Score:0 Moves:2 Loss:1151.8776550292969
Episode:1300 Score:0 Moves:1 Loss:757.3826293945312
Episode:1400 Score:1 Moves:7 Loss:697.5479082380023
Episode:1500 Score:8 Moves:27 Loss:799.1183347348814
Episode:1600 Score:0 Moves:1 Loss:939.2730712890625
Episode:1700 Score:0 Moves:2 Loss:513.3000030517578
Episode:1800 Score:14 Moves:46 Loss:796.6308613652768
Episode:1900 Score:10 Moves:50 Loss:757.2430194091797
Episode:2

In [11]:
play(agent.Q,env1,walls=0.2)

*[rendered 10x10 board: ANSI color output, truncated]*

In [12]:
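# Saving weights (create the folder first in case it does not exist)
import os
os.makedirs("networks/maze_harvest", exist_ok=True)

agent.Q.save_weights("networks/maze_harvest/Qc1.h5")
agent.Q_target.save_weights("networks/maze_harvest/Qtc1.h5")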

In [14]:
env1.max_moves = 500

In [16]:
play(agent.Q,env1,walls=0.2,nds=True)

*[rendered 10x10 board: ANSI color output, truncated]*

### 3) walls 20%, nds=True

In [17]:
train(agent,env1,2000,42,C=50,ep=100,nds=True)

Episode:100 Score:0 Moves:3 Loss:835.4608764648438
Episode:200 Score:7 Moves:30 Loss:1056.6042338053385
Episode:300 Score:7 Moves:24 Loss:1063.4635111490886
Episode:400 Score:1 Moves:9 Loss:1163.044650607639
Episode:500 Score:0 Moves:1 Loss:1246.7003173828125
Episode:600 Score:0 Moves:6 Loss:854.3956298828125
Episode:700 Score:11 Moves:45 Loss:983.4987813313802
Episode:800 Score:2 Moves:18 Loss:990.4209628634983
Episode:900 Score:7 Moves:30 Loss:878.3282623291016
Episode:1000 Score:17 Moves:77 Loss:918.2928070464692
Episode:1100 Score:2 Moves:9 Loss:801.5443725585938
Episode:1200 Score:12 Moves:44 Loss:945.5208282470703
Episode:1300 Score:11 Moves:40 Loss:1016.7399688720703
Episode:1400 Score:0 Moves:2 Loss:1005.1677856445312
Episode:1500 Score:5 Moves:28 Loss:1011.4890888759068
Episode:1600 Score:1 Moves:8 Loss:1055.7688446044922
Episode:1700 Score:0 Moves:3 Loss:949.4847208658854
Episode:1800 Score:1 Moves:6 Loss:1011.351318359375
Episode:1900 Score:4 Moves:19 Loss:812.1162478798315


In [25]:
play(agent.Q,env1,walls=0.2,nds=True)

*[rendered 10x10 board: ANSI color output, truncated]*

### 4) walls 30%, nds=True

In [26]:
train(agent,env1,3000,42,C=50,ep=100,walls=.3,nds=True)

Episode:100 Score:0 Moves:2 Loss:1142.634521484375
Episode:200 Score:0 Moves:8 Loss:933.158203125
Episode:300 Score:3 Moves:15 Loss:993.9643778483073
Episode:400 Score:1 Moves:6 Loss:989.9650014241537
Episode:500 Score:6 Moves:28 Loss:962.7591912405832
Episode:600 Score:0 Moves:5 Loss:1000.1607299804688
Episode:700 Score:1 Moves:6 Loss:848.4299062093099
Episode:800 Score:10 Moves:52 Loss:782.9282326331505
Episode:900 Score:0 Moves:2 Loss:835.8400421142578
Episode:1000 Score:4 Moves:17 Loss:949.1423483455883
Episode:1100 Score:0 Moves:2 Loss:903.3777465820312
Episode:1200 Score:1 Moves:8 Loss:823.1120910644531
Episode:1300 Score:3 Moves:13 Loss:724.2413846529447
Episode:1400 Score:0 Moves:2 Loss:761.7478942871094
Episode:1500 Score:7 Moves:32 Loss:821.6943225860596
Episode:1600 Score:0 Moves:6 Loss:654.2755533854166
Episode:1700 Score:0 Moves:2 Loss:917.6478576660156
Episode:1800 Score:0 Moves:4 Loss:755.2197113037109
Episode:1900 Score:0 Moves:3 Loss:1122.2071126302083
Episode:2000 Sco

In [28]:
play(agent.Q,env1,walls=0.3,nds=True)

*[rendered 10x10 board: ANSI color output, truncated]*

In [31]:
play(agent.Q,env1,walls=0.2,nds=True)

*[rendered 10x10 board: ANSI color output, truncated]*

In [44]:
play(agent.Q,env2,walls=0.2,nds=True)

*[rendered 20x20 board: ANSI color output, truncated]*

### Loading weights

In [45]:
agent.Q.load_weights("networks/maze_harvest/Qc1.h5")
agent.Q_target.load_weights("networks/maze_harvest/Qtc1.h5")

In [67]:
play(agent.Q,env1,walls=0.2)

*[rendered 10x10 board: ANSI color output, truncated]*

Next:
- Train a separate network for nds=True.
- Try a sigmoid activation for the first layer.

Agent Training Notebook V1