# Keras: writing custom layers
```Here you will gain experience writing custom Keras layers. The first two stages are about the layers themselves: in the first you will implement a simple layer, and in the second a more complicated one.```

## Stage 1
```Implement an unpooling layer that acts on matrices as follows:```
```
A = array([[0, 1, 3, 1, 0],
           [2, 0, 1, 2, 4],
           [3, 2, 1, 4, 3],
           [4, 0, 3, 2, 0],
           [4, 1, 2, 0, 2]])
       
unpooling(A) = array([[0, 0, 1, 1, 3, 3, 1, 1, 0, 0],
                      [0, 0, 1, 1, 3, 3, 1, 1, 0, 0],
                      [2, 2, 0, 0, 1, 1, 2, 2, 4, 4],
                      [2, 2, 0, 0, 1, 1, 2, 2, 4, 4],
                      [3, 3, 2, 2, 1, 1, 4, 4, 3, 3],
                      [3, 3, 2, 2, 1, 1, 4, 4, 3, 3],
                      [4, 4, 0, 0, 3, 3, 2, 2, 0, 0],
                      [4, 4, 0, 0, 3, 3, 2, 2, 0, 0],
                      [4, 4, 1, 1, 2, 2, 0, 0, 2, 2],
                      [4, 4, 1, 1, 2, 2, 0, 0, 2, 2]])
```
```Use the following example as a starting point. It is taken from https://keras.io/layers/writing-your-own-keras-layers/.```

```Note: you can't use NumPy functions in your layer's logic. You will have to use functions that are accessed through the backend you use (Theano or TensorFlow).```

```~Ittai Haran```

In [None]:
from keras import backend as K
from keras.layers import Layer  # older Keras 2.x also exposes this as keras.engine.topology.Layer
import numpy as np

class MyLayer(Layer):

    def __init__(self, output_dim, **kwargs):
        self.output_dim = output_dim
        super(MyLayer, self).__init__(**kwargs)

    def build(self, input_shape):
        # Create a trainable weight variable for this layer.
        self.kernel = self.add_weight(name='weight_variable_name', 
                                      shape=(input_shape[1], self.output_dim),
                                      initializer='uniform',
                                      trainable=True)
        super(MyLayer, self).build(input_shape)  # Be sure to call this somewhere!

    def call(self, x):
        # Project the input with the trainable kernel.
        return K.dot(x, self.kernel)

    def compute_output_shape(self, input_shape):
        return (input_shape[0], self.output_dim)
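
To check your unpooling layer against the matrix example in Stage 1, a NumPy reference can be handy. This is only a sketch for testing: NumPy is fine outside the layer, but the layer's own logic still has to go through the backend. The name `unpooling_reference` and the `factor` argument are made up for this illustration.

```python
import numpy as np

def unpooling_reference(a, factor=2):
    """Repeat every entry of a 2D array `factor` times along both axes."""
    a = np.asarray(a)
    return np.repeat(np.repeat(a, factor, axis=0), factor, axis=1)

A = np.array([[0, 1, 3, 1, 0],
              [2, 0, 1, 2, 4],
              [3, 2, 1, 4, 3],
              [4, 0, 3, 2, 0],
              [4, 1, 2, 0, 2]])

reference = unpooling_reference(A)
print(reference.shape)  # (10, 10)
print(reference[:2])    # the first two rows are identical copies
```

Feeding the same `A` through your layer (with a batch axis added) and comparing the prediction with `reference` is one way to verify the result.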

## Stage 2
```Consider the following simple attention mechanism:```

- ```Given a vector v, compute Dense(v), where Dense(v).shape = v.shape.```
- ```Multiply v and Dense(v) element-wise.```
- ```Return the result.```

```What is the purpose of this mechanism? Can you think of what can be achieved with this kind of mechanism?```

```Implement the attention mechanism as a keras layer.```
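
The computation described in this stage can be sketched in NumPy as a reference for testing your Keras layer. The sketch assumes that Dense(v) is a plain affine map with a square kernel `W` and bias `b` and no activation, since the description above does not specify one. The names `simple_attention`, `W` and `b` are illustrative only.

```python
import numpy as np

def simple_attention(v, W, b):
    """Gate v element-wise with a same-shaped dense projection of itself."""
    scores = v @ W + b   # Dense(v): a square kernel keeps the shape of v
    return v * scores    # element-wise product

rng = np.random.default_rng(0)
d = 4
v = rng.normal(size=(3, d))   # a batch of 3 vectors
W = rng.normal(size=(d, d))   # assumed square so that Dense(v).shape == v.shape
b = np.zeros(d)

out = simple_attention(v, W, b)
print(out.shape)  # (3, 4)
```

In a Keras layer the same two lines would be written with backend functions, and `W` and `b` would be trainable weights created in `build`.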

## Stage 3
```Here you will try to solve a problem I once struggled with. The problem is the following:
You are given a set of sequences of symbols. All sequences contain the same "core sequence", but have extra noise in the form of other symbols between the symbols of the core sequence. For example, the sequences could be:```


**1**-**3**-2-**4**-3-**2**-4-**1**-3-2-4

**1**-2-**3**-3-**4**-1-2-**2**-**1**-3-4-2-1-1

**1**-4-4-4-**3**-**4**-1-1-**2**-**1**-1-2

```where the core sequence is 1-3-4-2-1.```
```Your task is, given a dataset of such sequences, to find the core sequence. You may speak to me to learn about the context of this question and the reasons that led to it.```

```Generate a dataset that will simulate this problem. Follow the instructions:```
- ```Use a 4-letter alphabet.```
- ```Generate a core sequence with 10 symbols.```
- ```Create each new sequence symbol by symbol: for each symbol you add to the sequence, put the next symbol of the core sequence with probability p and a random symbol with probability 1-p. Choose p to be 0.5.```
- ```Generate a dataset of 10,000 examples.```
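
One possible reading of these instructions, as a minimal sketch using only the standard library. It assumes the symbols are the integers 1 to 4 and that a sequence stops as soon as the last core symbol has been placed, so unlike the examples above there is no trailing noise. The function names are made up for this illustration.

```python
import random

ALPHABET = [1, 2, 3, 4]  # 4-letter alphabet; the concrete symbols are an arbitrary choice

def make_core(length, rng):
    """Draw a random core sequence."""
    return [rng.choice(ALPHABET) for _ in range(length)]

def make_noisy_sequence(core, p, rng):
    """Emit symbols until every core symbol has been placed, in order."""
    seq, i = [], 0
    while i < len(core):
        if rng.random() < p:
            seq.append(core[i])               # next symbol of the core sequence
            i += 1
        else:
            seq.append(rng.choice(ALPHABET))  # noise symbol
    return seq

def is_subsequence(core, seq):
    """True if the symbols of core appear in seq in order, not necessarily adjacent."""
    it = iter(seq)
    return all(symbol in it for symbol in core)

rng = random.Random(0)
core = make_core(10, rng)
dataset = [make_noisy_sequence(core, 0.5, rng) for _ in range(10000)]

# Sanity check: the core sequence is hidden in every example.
print(all(is_subsequence(core, seq) for seq in dataset))  # True
```

The `is_subsequence` helper is also a quick way to test any candidate core sequence you come up with later.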

```Try solving the problem with simple means.```

## Stage 4
```A possible solution is as follows:```
- ```Given such a dataset of sequences, generate a new dataset of random sequences.```
- ```Train a classifier that will determine whether a sequence belongs to the original dataset or the generated dataset. Make sure that this problem is solvable.```
- ```Now train a model that contains an attention layer. The hope is that the attention mechanism will learn to focus on the core sequence when classifying.```
- ```Visualize the attention weights to find the symbols of the core sequence.```
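
The first two steps above need a labeled dataset. A minimal sketch of how it could be assembled, using the three example sequences from Stage 3 as the "original" data. It assumes that each random sequence copies the length of a real one, so that length alone does not give the label away, and that 0 is a free padding symbol. All names here are illustrative.

```python
import random

ALPHABET = [1, 2, 3, 4]
PAD = 0  # padding symbol, assumed to be outside the alphabet

def random_like(sequences, rng):
    """Draw one uniformly random sequence per real one, with the same length."""
    return [[rng.choice(ALPHABET) for _ in seq] for seq in sequences]

def pad(seq, length):
    """Right-pad a sequence with PAD up to the given length."""
    return seq + [PAD] * (length - len(seq))

def make_classification_data(real, rng):
    """Return shuffled (padded_sequence, label) pairs: 1 = original, 0 = random."""
    fake = random_like(real, rng)
    max_len = max(len(seq) for seq in real)
    examples = [(pad(s, max_len), 1) for s in real]
    examples += [(pad(s, max_len), 0) for s in fake]
    rng.shuffle(examples)
    return examples

real = [
    [1, 3, 2, 4, 3, 2, 4, 1, 3, 2, 4],
    [1, 2, 3, 3, 4, 1, 2, 2, 1, 3, 4, 2, 1, 1],
    [1, 4, 4, 4, 3, 4, 1, 1, 2, 1, 1, 2],
]
data = make_classification_data(real, random.Random(0))
for sequence, label in data:
    print(label, sequence)
```

With the full 10,000-example dataset in place of `real`, the padded sequences can be turned into arrays and passed to a classifier.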

```What are the advantages of this solution? Do you think you can make it work? You will certainly need a different kind of attention mechanism for this task than the simple one you already have.```

```Read the paper Neural Machine Translation by Jointly Learning to Align and Translate by Bahdanau, Cho and Bengio. The paper describes an attention mechanism implemented in the context of machine translation. Implement the attention mechanism the authors suggest as a Keras layer. Use the source code of the keras.layers.recurrent module as a reference. You can find the paper and the source code in the current directory.```
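
As orientation before opening the paper, here is a NumPy sketch of a single step of its additive attention: the energies are e_j = v_a^T tanh(W_a s + U_a h_j), the weights are a softmax over the energies, and the context vector is the weighted sum of the annotations h_j. The function and variable names and the toy dimensions are assumptions made for this illustration. The sketch covers only the scoring step, not the recurrent layer around it.

```python
import numpy as np

def softmax(x):
    x = x - x.max()  # subtract the maximum for numerical stability
    e = np.exp(x)
    return e / e.sum()

def additive_attention(s_prev, H, W_a, U_a, v_a):
    """One attention step.

    s_prev: (n,)   previous decoder state
    H:      (T, m) encoder annotations h_1..h_T
    W_a:    (k, n), U_a: (k, m), v_a: (k,)
    """
    energies = np.tanh(W_a @ s_prev + H @ U_a.T) @ v_a  # (T,)
    alphas = softmax(energies)                          # attention weights, sum to 1
    context = alphas @ H                                # (m,) weighted sum of annotations
    return context, alphas

rng = np.random.default_rng(0)
T, n, m, k = 6, 5, 4, 3  # toy sizes
s_prev = rng.normal(size=n)
H = rng.normal(size=(T, m))
W_a = rng.normal(size=(k, n))
U_a = rng.normal(size=(k, m))
v_a = rng.normal(size=k)

context, alphas = additive_attention(s_prev, H, W_a, U_a, v_a)
print(context.shape, alphas.shape)  # (4,) (6,)
```

The vector `alphas` is what you would plot when visualizing the attention over a sequence.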

```Basic instructions:```
- ```Use your tutor. A lot. This is a hard exercise.```
- ```Open the source code of recurrent neural networks. You will want to implement a layer that inherits from Recurrent.```
- ```Understand the code's flow and the functions you will need to write.```
- ```Start by writing a slightly simpler mechanism: don't return a sequence, but rather a single vector.```
- ```Try solving the above problem using your attention mechanism. What problems do you encounter?```
- ```Complete the full mechanism. Assuming Yoshua Bengio didn't lie in his paper, how do you think their architecture overcomes the problem you found?```