# Relational Deep Reinforcement Learning

**Plan:**
1. Architecture
2. Agent
3. Environment
4. Training cycle

## Architecture

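**Input: (b,n,n,3)** = (batch size, side length, side length, RGB channels)

The shapes in this section are written channels-last; the PyTorch cells below use the channels-first layout (b, 3, n, n).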

**Extract entities: (b,n,n,3) -> (b, m, m, 2k)** 
* convolutional_layer1(kernel_size = (2,2), input_filters = 3, output_filters = k, stride = 1, pad = (1,1))
* convolutional_layer2(kernel_size = (2,2), input_filters = k, output_filters = 2k, stride = 1, pad = (1,1))
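
The grid size m follows from the standard convolution output-size formula. The sketch below is a small helper for checking it; the function names are illustrative and not part of `RelationalNetworks`. For the 12x12 input used in the cells below, the relational module reports 100 entities, i.e. m = 10, which this formula gives for zero padding rather than for pad = (1,1).

```python
def conv_out_size(n, kernel=2, stride=1, pad=0):
    """Output length along one spatial axis of a 2D convolution."""
    return (n + 2 * pad - kernel) // stride + 1


def entity_grid_size(n, pad):
    """Side length m after two stacked 2x2, stride-1 convolutions."""
    return conv_out_size(conv_out_size(n, pad=pad), pad=pad)


print(entity_grid_size(12, pad=1))  # 14
print(entity_grid_size(12, pad=0))  # 10
```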

**Relational block: (b, m, m, 2k) -> (b, m^2, d_m)**
* Positional encoding: (b, m, m, 2k) -> (b, m^2, d_m)
* N multi-head attention blocks: (b, m^2, d_m) -> (b, m^2, d_m)
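
One common way to realise the positional-encoding step is to append two coordinate channels to the feature map, flatten the grid into m^2 entities, and project each entity to d_m with a linear layer. The sketch below assumes this design; `CoordinateEncoding` is a hypothetical stand-in and may differ from what `rnet.PositionalEncoding` actually does. It returns the batch-first layout used in the shape listing above.

```python
import torch
import torch.nn as nn


class CoordinateEncoding(nn.Module):
    """Hypothetical stand-in for rnet.PositionalEncoding (not the notebook's code)."""

    def __init__(self, n_features, d_model):
        super().__init__()
        # +2 for the appended (x, y) coordinate channels
        self.projection = nn.Linear(n_features + 2, d_model)

    def forward(self, x):
        # x: (b, c, m, m), channels-first as produced by PyTorch convolutions
        b, c, m, _ = x.shape
        coords = torch.linspace(-1.0, 1.0, m, device=x.device, dtype=x.dtype)
        ys, xs = torch.meshgrid(coords, coords, indexing="ij")
        grid = torch.stack((xs, ys)).unsqueeze(0).expand(b, -1, -1, -1)  # (b, 2, m, m)
        x = torch.cat((x, grid), dim=1)              # (b, c + 2, m, m)
        x = x.flatten(start_dim=2).transpose(1, 2)   # (b, m^2, c + 2)
        return self.projection(x)                    # (b, m^2, d_model)


features = torch.rand(1, 24, 10, 10)
entities = CoordinateEncoding(n_features=24, d_model=256)(features)
print(entities.shape)  # torch.Size([1, 100, 256])
```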

**Feature-wise max pooling: (b, m^2, d_m) -> (b, d_m)**
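
Feature-wise max pooling keeps, for every feature, the largest value found across all entities, so the result no longer depends on the number or order of entities. A minimal sketch with random data, assuming the batch-first layout (b, m^2, d_m); for a sequence-first tensor (m^2, b, d_m), such as the one printed by the relational module below, the reduction would run over `dim=0` instead.

```python
import torch

entities = torch.rand(4, 100, 256)        # (b, m^2, d_m)
pooled = entities.max(dim=1).values       # (b, d_m): maximum over the entity axis
print(pooled.shape)

# Same reduction for a sequence-first tensor of shape (m^2, b, d_m)
pooled_seq_first = entities.transpose(0, 1).max(dim=0).values
print(torch.equal(pooled, pooled_seq_first))
```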

**Multi-Layer Perceptron: (b, d_m) -> (b, d_m)**
* Four fully connected layers, each (d_m, d_m), with ReLU activations

**Actor output: (b, d_m) -> (b,a)** [a = number of possible actions]
* Single linear layer with softmax at the end

**Critic output: (b, d_m) -> (b,1)** 
* Single linear layer without activation function
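
The two heads can be sketched as a pair of linear layers sharing the same input features. This is an illustration under assumptions, not the notebook's implementation: `ActorCriticHeads` is a hypothetical name, the heads are assumed to take the (b, d_m) output of the MLP, and the actor returns log-probabilities via `F.log_softmax` instead of probabilities (exponentiating them recovers the softmax output).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ActorCriticHeads(nn.Module):
    """Hypothetical actor and critic heads on top of (b, d_m) features."""

    def __init__(self, d_model, n_actions):
        super().__init__()
        self.actor = nn.Linear(d_model, n_actions)
        self.critic = nn.Linear(d_model, 1)

    def forward(self, features):
        log_probs = F.log_softmax(self.actor(features), dim=-1)  # (b, a)
        value = self.critic(features)                            # (b, 1), no activation
        return log_probs, value


heads = ActorCriticHeads(d_model=256, n_actions=4)
log_probs, value = heads(torch.rand(2, 256))

# Sample one action per batch element from the policy distribution
action = torch.distributions.Categorical(logits=log_probs).sample()
print(log_probs.shape, value.shape, action.shape)
```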

In [1]:
import numpy as np
import torch 

import torch.nn as nn
import torch.nn.functional as F

In [2]:
import RelationalNetworks as rnet

In [44]:
from importlib import reload
reload(rnet)

<module 'RelationalNetworks' from '/home/nicola/Nicola_unipd/MasterThesis/RelationalDeepRL/RelationalNetworks.py'>

In [45]:
get_entities = rnet.ExtractEntities(k_out = 24)
pe = rnet.PositionalEncoding(24, 256)
encoder = rnet.EncoderBlock(256, 2)
rel = rnet.RelationalModule(24, 256, 4, 2)

In [46]:
# single frame-like input
x = torch.rand((1,3,12,12))

# Convolutional pass
y = get_entities(x)

# Positional encoding
z = pe(y)

# MHA
w = encoder(z)

In [47]:
# Positional encoding + multiple MHA
w2 = rel(y)
print("w2.shape: ", w2.shape)

w2.shape:  torch.Size([100, 1, 256])


In [48]:
# All together
full_net = rnet.BoxWorldNet()
out = full_net(x)
out.shape

torch.Size([1, 256])

In [49]:
def f(a):
    pass

def g(a, **k):
    f(**k)

In [50]:
d = {'a':1}
g(a=2, **d)

TypeError: g() got multiple values for keyword argument 'a'
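
The error arises because `a` reaches `g` twice: once as the explicit keyword `a=2` and once through the unpacked dictionary `d`. The sketch below reproduces the failure and shows one way around it, merging the dictionary with the explicit value before the call. Here `f` returns its argument and `g` forwards `a`, which are small variations on the cell above made only so the result can be inspected.

```python
def f(a):
    return a


def g(a, **k):
    # 'a' binds to g's own parameter; only the remaining keywords land in k
    return f(a=a, **k)


d = {'a': 1}

try:
    g(a=2, **d)  # 'a' supplied twice: explicitly and through d
except TypeError as err:
    print(err)

# Merging first lets the explicit value override the dictionary entry
merged = {**d, 'a': 2}
print(g(**merged))  # 2
```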

In [52]:
help(F.log_softmax)

Help on function log_softmax in module torch.nn.functional:

log_softmax(input, dim=None, _stacklevel=3, dtype=None)
    Applies a softmax followed by a logarithm.
    
    While mathematically equivalent to log(softmax(x)), doing these two
    operations separately is slower, and numerically unstable. This function
    uses an alternative formulation to compute the output and gradient correctly.
    
    See :class:`~torch.nn.LogSoftmax` for more details.
    
    Arguments:
        input (Tensor): input
        dim (int): A dimension along which log_softmax will be computed.
        dtype (:class:`torch.dtype`, optional): the desired data type of returned tensor.
          If specified, the input tensor is casted to :attr:`dtype` before the operation
          is performed. This is useful for preventing data type overflows. Default: None.
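
The instability mentioned in the docstring can be seen without PyTorch. The sketch below compares a naive log(softmax(x)) with the standard log-sum-exp formulation, which subtracts the maximum before exponentiating. It illustrates the idea only and is not necessarily the exact formulation PyTorch uses internally; both function names are illustrative.

```python
import math


def naive_log_softmax(xs):
    exps = [math.exp(x) for x in xs]  # overflows for large x
    total = sum(exps)
    return [math.log(e / total) for e in exps]


def stable_log_softmax(xs):
    # log_softmax(x)_i = (x_i - c) - log(sum_j exp(x_j - c)), with c = max(x)
    c = max(xs)
    log_total = math.log(sum(math.exp(x - c) for x in xs))
    return [x - c - log_total for x in xs]


logits = [1000.0, 1001.0, 1002.0]

try:
    naive_log_softmax(logits)
except OverflowError as err:
    print("naive version failed:", err)

print(stable_log_softmax(logits))
```

For small inputs the two versions agree; only the stable one survives large logits.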

