
# Softmax study
First, produce a plausible vector of ReLU outputs, Z: squaring standard-normal draws gives non-negative values. I am used to the (classes x cases) layout, so Z is a column vector.

In [0]:
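import numpy as np
from numpy.random import default_rng

features = 6                # number of classes (rows of Z)
rng = default_rng(1)        # seed the Generator itself; np.random.seed() does not affect default_rng()
# Squared standard-normal draws are non-negative, like ReLU outputs
Z = rng.standard_normal(features)**2
Z = Z.reshape((-1, 1))      # column vector: (classes x 1 case)
print(Z)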

Next, run Z through Softmax to produce A.

In [0]:
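def softmax(x):
  # Subtract the column max before exponentiating to avoid overflow
  x_shift = x - np.max(x, axis=0, keepdims=True)
  e = np.exp(x_shift)
  return e / np.sum(e, axis=0, keepdims=True)
A = softmax(Z)
print(A)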
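The max shift is safe because softmax is unchanged when the same constant is added to every input: $e^{z_i - c} / \sum_k e^{z_k - c} = e^{z_i} / \sum_k e^{z_k}$. A quick sketch that checks this and shows what happens without the shift; `stable_softmax` and `naive_softmax` are illustrative names, not from the original notebook:

```python
import numpy as np

def stable_softmax(x):
    # Subtract the per-column max so the largest exponent is exp(0) = 1
    e = np.exp(x - np.max(x, axis=0, keepdims=True))
    return e / np.sum(e, axis=0, keepdims=True)

def naive_softmax(x):
    # No shift: np.exp overflows to inf once inputs exceed roughly 709
    e = np.exp(x)
    return e / np.sum(e, axis=0, keepdims=True)

z = np.array([[1.0], [2.0], [3.0]])

# Adding a constant to every entry leaves softmax unchanged
shifted_same = np.allclose(stable_softmax(z), stable_softmax(z + 100.0))

# With large inputs the shifted version stays finite; the naive one gives inf/inf = nan
big = z + 1000.0
with np.errstate(over="ignore", invalid="ignore"):
    naive_big = naive_softmax(big)
stable_big = stable_softmax(big)

print(shifted_same)                     # True
print(np.all(np.isfinite(stable_big)))  # True
print(np.any(np.isnan(naive_big)))      # True
```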

Practice the softmax derivative on a small integer vector first.

In [0]:
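def derivative(A): # This is how Eli says to backprop SoftMax
  # https://eli.thegreenplace.net/2016/the-softmax-function-and-its-derivative/
  # Jacobian of softmax: D[i, j] = A_i * (delta_ij - A_j), i.e. diag(A) - A A^T
  X = np.outer(A, A)          # A_i * A_j (np.outer flattens its inputs)
  I = np.identity(X.shape[0])
  D = A * I - X               # A * I broadcasts to diag(A) for row, column, or 1-D A
  return D
R = rng.integers(7, size=(1,5))   # small integers, just to exercise the arithmetic
print(R)
print(derivative(R))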
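`derivative` implements the softmax Jacobian $J_{ij} = A_i(\delta_{ij} - A_j)$, i.e. $\operatorname{diag}(A) - AA^\top$, which has that meaning only when its argument is a softmax output (the integer vector R just exercises the arithmetic). A minimal sketch that checks the formula against central finite differences; `softmax_jacobian` and `numerical_jacobian` are hypothetical helper names, and the seed, step size, and tolerance are arbitrary choices:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / np.sum(e)

def softmax_jacobian(a):
    # J[i, j] = a_i * (delta_ij - a_j), i.e. diag(a) - a a^T
    a = np.ravel(a)
    return np.diag(a) - np.outer(a, a)

def numerical_jacobian(f, z, eps=1e-6):
    # Central differences: column j holds d f(z) / d z_j
    z = np.asarray(z, dtype=float)
    n = z.size
    J = np.zeros((n, n))
    for j in range(n):
        step = np.zeros(n)
        step[j] = eps
        J[:, j] = (f(z + step) - f(z - step)) / (2 * eps)
    return J

rng = np.random.default_rng(0)
z = rng.standard_normal(5)
a = softmax(z)

J_analytic = softmax_jacobian(a)
J_numeric = numerical_jacobian(softmax, z)

print(np.allclose(J_analytic, J_numeric, atol=1e-8))  # True
print(np.allclose(J_analytic, J_analytic.T))           # True: J is symmetric
print(np.allclose(J_analytic.sum(axis=0), 0.0))        # True: columns sum to zero
```

The zero column sums follow from $\sum_i A_i(\delta_{ij} - A_j) = A_j - A_j \sum_i A_i = 0$, so summing this Jacobian over axis 0 at a real softmax output always gives zeros.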

Generate a simulated one-hot Y vector.

In [0]:
hot = rng.integers(features)
Y = np.zeros((features,1))  
Y[hot] = 1
print(Y) 

Now this should be the back propagation: take the error dA = A - Y and push it through the softmax derivative.

In [0]:
dA = A - Y    # prediction minus one-hot target
print(dA)
dZ = np.sum(derivative(dA), axis=0)   # derivative evaluated at dA, then summed down each column
print(dZ)
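For comparison, here is the chain rule with the Jacobian evaluated at A rather than at dA. This assumes the loss is categorical cross-entropy, $L = -\sum_i Y_i \log A_i$, which the notebook does not state but which is the usual pairing with softmax and a one-hot Y, so that $\partial L / \partial A_i = -Y_i / A_i$:

$$
\frac{\partial L}{\partial z_j}
= \sum_i \frac{\partial L}{\partial A_i}\,\frac{\partial A_i}{\partial z_j}
= \sum_i \left(-\frac{Y_i}{A_i}\right) A_i\,(\delta_{ij} - A_j)
= -Y_j + A_j \sum_i Y_i
= A_j - Y_j
$$

The last step uses $\sum_i Y_i = 1$. Under that assumption A - Y is already $\partial L / \partial Z$, i.e. the output of the softmax backward step rather than its input.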

That worked great, but what if A has multiple cases? There seems to be no Pythonic way to do this (i.e. multiply along an axis), so I give up and use a loop.
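For what the loop below computes, broadcasting is enough: column j of `derivative(c)` sums to $c_j - c_j \sum_i c_i = c_j(1 - \sum_i c_i)$, so the per-case column sums reduce to a single expression over axis 0. A sketch assuming the (classes x cases) layout; `column_sums_loop` and `column_sums_vectorized` are hypothetical names, and the random test matrix is arbitrary:

```python
import numpy as np

def derivative(A):
    # Same as the notebook: diag(A) - outer(A, A)
    X = np.outer(A, A)
    I = np.identity(X.shape[0])
    return A * I - X

def column_sums_loop(M):
    # One case (column) at a time, as in the notebook loop
    out = np.empty_like(M)
    for case in range(M.shape[1]):
        out[:, case] = np.sum(derivative(M[:, case]), axis=0)
    return out

def column_sums_vectorized(M):
    # Closed form of each column sum, broadcast over all cases at once
    return M * (1 - np.sum(M, axis=0, keepdims=True))

rng = np.random.default_rng(0)
M = rng.standard_normal((6, 3))   # 6 classes x 3 cases, arbitrary values
print(np.allclose(column_sums_loop(M), column_sums_vectorized(M)))  # True
```

When every column of M sums to zero, the factor $(1 - \sum_i c_i)$ is 1 and the result is M itself.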

In [0]:
dA_mult = np.hstack((dA, dA, dA))   # three identical cases, (classes x cases)
feat, cases = dA_mult.shape
print(dA_mult)
dZ = np.empty_like(dA_mult)         # preallocate instead of growing with hstack
for case in range(cases):
  C = dA_mult[:, case]
  dZ[:, case] = np.sum(derivative(C), axis=0)   # one case at a time
print(dZ)

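Actually that did not work so well, because the whole exercise only succeeds in setting dZ = dA. That can't be right. The collapse is pure arithmetic: column j of `derivative(C)` sums to $C_j(1 - \sum_i C_i)$, and since A and Y each sum to 1, every column of dA sums to 0, so each column sum is just $C_j$.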
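A minimal sketch of the chain rule applied per case, assuming categorical cross-entropy $L = -\sum_i Y_i \log A_i$ (the notebook does not name its loss). Here the Jacobian is evaluated at the softmax output A, not at dA, and is multiplied by the upstream gradient $\partial L / \partial A = -Y / A$ instead of being summed; `np.einsum` builds one Jacobian per case without a Python loop. The shapes (6 classes x 3 cases), the seed, and the names `softmax_cols` and `softmax_backward` are illustrative assumptions:

```python
import numpy as np

def softmax_cols(Z):
    # Column-wise softmax for a (classes x cases) matrix
    e = np.exp(Z - np.max(Z, axis=0, keepdims=True))
    return e / np.sum(e, axis=0, keepdims=True)

def softmax_backward(A, dA):
    # Batched Jacobians J[k] = diag(A[:, k]) - A[:, k] A[:, k]^T, shape (cases, classes, classes)
    n = A.shape[0]
    J = np.einsum('ik,ij->kij', A, np.eye(n)) - np.einsum('ik,jk->kij', A, A)
    # dZ[:, k] = J[k]^T @ dA[:, k] (J[k] is symmetric; the transpose is kept for clarity)
    return np.einsum('kji,jk->ik', J, dA)

rng = np.random.default_rng(1)
classes, cases = 6, 3
Z = rng.standard_normal((classes, cases)) ** 2
A = softmax_cols(Z)

# One-hot targets, one column per case
Y = np.zeros((classes, cases))
Y[rng.integers(classes, size=cases), np.arange(cases)] = 1

dA = -Y / A                  # dL/dA for cross-entropy (A is strictly positive)
dZ = softmax_backward(A, dA)

print(np.allclose(dZ, A - Y))  # True
```

Because the Jacobian product with $-Y/A$ collapses to A - Y, many frameworks fuse softmax with cross-entropy and use A - Y directly; the explicit Jacobian-vector product matters when the upstream gradient comes from a different loss.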