# Improvement of Guided Backprop on R-LSTM
Previously, when applying Guided Backprop (GB), the gradients of sigmoid units were not controlled, which resulted in poor performance of GB on R-LSTM. In particular, the relevance percentage of GB was on par with SA or slightly worse. In this version, I set the gradient of sigmoid units to zero for GB. This gradient overriding seems to give much better results for GB. Below is the plot comparing Deep and R-LSTM with the sigmoid gradient overridden.

![](https://i.imgur.com/6xcib8I.png)
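To make the gradient override concrete, here is a minimal NumPy sketch for a single gated unit `y = sigmoid(a) * relu(b)`. The gated-unit structure and the function names (`gated_relu_forward`, `gated_relu_guided_backward`) are assumptions for illustration only, not the actual R-LSTM implementation. The sketch applies the usual Guided Backprop rule at the ReLU and forces the gradient through the sigmoid to zero, so relevance flows only along the signal path.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gated_relu_forward(a, b):
    # Hypothetical gated unit: a sigmoid gate modulating a ReLU signal.
    return sigmoid(a) * np.maximum(b, 0.0)


def gated_relu_guided_backward(upstream, a, b):
    gate = sigmoid(a)

    # Override: the gradient through the sigmoid gate is set to zero,
    # i.e. the gate is treated as a constant during the backward pass.
    grad_a = np.zeros_like(upstream)

    # Gradient arriving at the ReLU output along the signal path.
    grad_signal = upstream * gate

    # Guided Backprop rule for ReLU: keep only positive gradients
    # at positions where the forward input was positive.
    grad_b = grad_signal * (grad_signal > 0) * (b > 0)

    return grad_a, grad_b


upstream = np.array([1.0, -1.0, 2.0])
a = np.zeros(3)
b = np.array([1.0, 1.0, -1.0])
grad_a, grad_b = gated_relu_guided_backward(upstream, a, b)
print(grad_a, grad_b)
```

In this toy example only the first position passes a gradient to `b`: the second is blocked because the incoming gradient is negative, and the third because the forward input to the ReLU was negative.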

# A better evaluation metric for measuring properly distributed relevance
When inspecting the experiment results, I found some heatmaps in which the rightmost digit/item of the majority group is primarily highlighted, while the other one is assigned barely any relevance. In such cases, **the percentage of relevance in data region** that we currently use to quantify the improvement is very close to 1, even though the other digit/item is not highlighted. The figure below shows one such case.
![](https://i.imgur.com/RwOcuQk.png)

As a result, I am thinking of slightly changing the way we compute **the percentage**. More precisely, the percentage of each digit/item block is capped at $\tau$. With this calculation, architectures that distribute relevance predominantly to only one region can no longer obtain too high a percentage. Below are the figures comparing the adjusted percentage between Deep and R-LSTM with $\tau=0.8$.
![](https://i.imgur.com/97kL2tj.png)
![](https://i.imgur.com/qxtZLP0.png)
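One possible reading of the capped percentage can be sketched as follows. This is an assumption about the computation, not the exact code behind the figures: relevance is first normalised into per-block shares, each share is clipped at $\tau$, and the clipped shares of the data blocks are summed. The function name `capped_relevance_percentage` and its arguments are hypothetical, and relevance is assumed to be non-negative.

```python
import numpy as np


def capped_relevance_percentage(block_relevance, data_mask, tau=0.8):
    # block_relevance: total (non-negative) relevance per digit/item block.
    # data_mask: 1/True for blocks that belong to the data region.
    block_relevance = np.asarray(block_relevance, dtype=float)
    data_mask = np.asarray(data_mask, dtype=bool)

    total = block_relevance.sum()
    if total == 0:
        # No relevance at all: define the percentage as 0.
        return 0.0

    # Share of the total relevance that falls into each block.
    shares = block_relevance / total

    # No single block may contribute more than tau.
    capped = np.minimum(shares, tau)

    return float(capped[data_mask].sum())


# One data block takes almost everything: 0.9 uncapped, 0.8 after capping.
print(capped_relevance_percentage([0.0, 0.1, 0.9], [1, 0, 1], tau=0.8))

# Relevance split evenly over both data blocks is not penalised.
print(capped_relevance_percentage([0.5, 0.0, 0.5], [1, 0, 1], tau=0.8))
```

Under this reading, a heatmap that puts all relevance on a single data block scores at most $\tau$, while a heatmap that spreads relevance over both data blocks can still reach 1.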

# ConvDeep with Literal connections
In the last part, there was a critical mistake in my implementation of ConvDeep with literal connections, denoted as Conv$^+$Deep. After correcting the problem, it turns out that Conv$^+$Deep gives much worse results than ConvDeep.
![](https://i.imgur.com/6c8cZPw.png)
![](https://i.imgur.com/Izh8m4a.png)

# Cosine Similarity


![](https://i.imgur.com/e3y9ZCw.png)

![](https://i.imgur.com/E8ZUAi9.png)

# End

In [2]:
import numpy as np

In [72]:
marks = np.array([
    [1,0,1], 
#     [1,1,0], 
#     [0,1,1],
#     [0,1,1],
#     [0,1,1],
#     [0,1,1],
])
rels = np.array([
    [0.0, 0.1, 0.9],
#     [0.01, 0.9, 0.09],
#     [0.2, 0.3, 0.5],
#     [0.05, 0.05, 0.9],
#     [0.05, 0.15, 0.8],
#     [0, 0, 0]
])
print(rels.shape)

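def compute_length(x):
    # Euclidean (L2) norm of each row of x.
    dist = np.sqrt(np.sum(x * x, axis=1))
    return dist


def cosine_similarity(u, v):
    # Row-wise cosine similarity between u and v (both of shape [n, d]).
    dot_prod = np.sum(u * v, axis=1)
    length = compute_length(u) * compute_length(v)

    # Rows where either vector has zero length are defined to have similarity 0.
    # np.divide with `where` skips those rows, avoiding a divide-by-zero warning.
    cosine_sim = np.divide(dot_prod, length, out=np.zeros_like(length), where=length != 0)

    return cosine_sim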



cosine_similarity(marks, rels)

(1, 3)


array([ 0.70278193])

In [49]:
from scipy.spatial.distance import cosine

In [48]:
np.array([0.01, 0.9, 0.09]).dot(np.array([1,1,0]))

0.91000000000000003

In [53]:
for i in range(marks.shape[0]):
    print(1 - cosine(marks[i, :], rels[i, :]))

1.0
0.711371819486
0.917662935482
0.744097427489
0.823754471048
nan


  dist = 1.0 - np.dot(u, v) / (norm(u) * norm(v))


In [54]:
import re

In [69]:
re.match(r'.+fold-(\d+)$', './final-models-group/shallow-fashion-mnist-3-items-maj-seq-12-fold-5').group(1)

'5'