# CTC loss Python examples
Test Python script that generates various examples for CTC loss and gradient calculation.
Let's start by importing TensorFlow.

In [125]:
import tensorflow as tf
from tensorflow.python.ops import math_ops
from tensorflow.python.framework import dtypes

print("TensorFlow version:", tf.__version__)

TensorFlow version: 2.11.0


Let's define two variables: one with the hypothetical truth, and the other with the logits (basically, this is what comes out of your model). Note that Python's CTC calculation requires the labels to be provided as a list of numbers that correspond to the embedding's index, whereas the TFJS implementation does not allow that, since the labels' shape must be the same as the logits' shape. So in TFJS you would have a tensor like this:
``` JS
[[
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1]
]]
```
whereas in Python, you would have this:
``` python
[[0, 1, 2, 3]]
```
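To move between the two label formats, a one-hot label tensor can be collapsed to index form by taking the position of the 1 in each row. A minimal sketch in plain Python, where `one_hot_to_indices` is a hypothetical helper name and not part of TF or TFJS:

```python
def one_hot_to_indices(one_hot_batch):
    """Collapse [batch, frames, num_labels] one-hot rows to [batch, frames] indices."""
    # The position of the largest entry in each row is the label index.
    return [[row.index(max(row)) for row in sequence] for sequence in one_hot_batch]


tfjs_style_labels = [[
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]]
python_style_labels = one_hot_to_indices(tfjs_style_labels)
print(python_style_labels)  # [[0, 1, 2, 3]]
```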
Also note that the default structure of the logits is a tensor of shape `[frames, batch_size, num_labels]`. If `logits_time_major == False`, the shape is `[batch_size, frames, num_labels]`. The JS implementation goes with the latter structure.
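The two logits layouts differ only in the order of the first two axes. A small sketch with nested lists, assuming a hypothetical helper `to_batch_major`, shows the reordering:

```python
def to_batch_major(time_major):
    """Reorder [frames][batch][num_labels] into [batch][frames][num_labels]."""
    frames = len(time_major)
    batch_size = len(time_major[0])
    # Swap the frame and batch axes; the label axis is left untouched.
    return [[time_major[t][b] for t in range(frames)] for b in range(batch_size)]


# Two frames, one batch item, three labels.
time_major_logits = [
    [[0.1, 0.2, 0.7]],
    [[0.6, 0.3, 0.1]],
]
batch_major_logits = to_batch_major(time_major_logits)
print(batch_major_logits)  # [[[0.1, 0.2, 0.7], [0.6, 0.3, 0.1]]]
```

On actual tensors, `tf.transpose(x, perm=[1, 0, 2])` performs the same axis swap.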

In [202]:
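# Note: this row holds 5 entries, but labels_length below is 4,
# so only the first four labels, [0, 1, 0, 1], enter the loss.
label = [[0, 1, 0, 1, 3]]
# Shape: [batch_size, frames, num_labels]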
logits = [[
    [1.0, 0.0, 0.0, 0.0], 
    [0.0, 0.0, 0.0, 1.0], 
    [1.0, 0.0, 0.0, 0.0], 
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0]
]]
labels_length = 4
logits_length = 5

Let's convert everything to tensors, because that's what TF likes.

In [203]:
labels_tensor = tf.convert_to_tensor(label, dtype=tf.int32)
logits_tensor = tf.Variable(tf.convert_to_tensor(logits, dtype=tf.float32))
labels_length_tensor = tf.convert_to_tensor([labels_length], dtype=tf.int32)
logits_length_tensor = tf.convert_to_tensor([logits_length], dtype=tf.int32)

Playing around with the TF features used by the Python implementation: `tf.nn.ctc_loss` performs a `log_softmax` on the input to normalize it. This follows from the implementation, which generates the gradients relative to the *unnormalized* inputs, so the softmax must be inside the loss calculation.
For future reference: `log_softmax` is just softmax followed by an elementwise logarithm.
The algorithm frequently calculates with infinities, so I included some of the measures used here: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/ops/ctc_ops.py#L880-L1033 The main thing is, if you see losses in the 700s region, it is practically infinity: log(1e-307) ≈ -706.89 is the finite stand-in for log(0).

In [204]:
print("softmax(logits):", tf.nn.softmax(logits_tensor).numpy())
print("log(softmax(logits)):", tf.math.log(tf.nn.softmax(logits_tensor)).numpy())
print("log_softmax logits:", tf.nn.log_softmax(logits_tensor).numpy())
print("log(zero):", math_ops.log(0.0))
print("casted logZero for TPU:", math_ops.cast(math_ops.log(math_ops.cast(0, dtypes.float64) + 1e-307), dtypes.float32))

softmax(logits): [[[0.47536692 0.17487772 0.17487772 0.17487772]
  [0.17487772 0.17487772 0.17487772 0.47536692]
  [0.47536692 0.17487772 0.17487772 0.17487772]
  [0.17487772 0.17487772 0.17487772 0.47536692]
  [0.17487772 0.17487772 0.17487772 0.47536692]]]
log(softmax(logits)): [[[-0.7436683 -1.7436683 -1.7436683 -1.7436683]
  [-1.7436683 -1.7436683 -1.7436683 -0.7436683]
  [-0.7436683 -1.7436683 -1.7436683 -1.7436683]
  [-1.7436683 -1.7436683 -1.7436683 -0.7436683]
  [-1.7436683 -1.7436683 -1.7436683 -0.7436683]]]
log_softmax logits: [[[-0.7436683 -1.7436683 -1.7436683 -1.7436683]
  [-1.7436683 -1.7436683 -1.7436683 -0.7436683]
  [-0.7436683 -1.7436683 -1.7436683 -1.7436683]
  [-1.7436683 -1.7436683 -1.7436683 -0.7436683]
  [-1.7436683 -1.7436683 -1.7436683 -0.7436683]]]
log(zero): tf.Tensor(-inf, shape=(), dtype=float32)
casted logZero for TPU: tf.Tensor(-706.8936, shape=(), dtype=float32)
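The printed values can be reproduced without TensorFlow, which is handy when porting. The following is a sketch in plain Python; `log_softmax` here is a local helper written for illustration, not the TF function, and it uses the usual shift-by-maximum trick for numerical stability:

```python
import math


def log_softmax(row):
    """Numerically stable log-softmax of one row of unnormalized scores."""
    peak = max(row)
    # log(sum(exp(x))), computed after shifting by the maximum to avoid overflow.
    log_norm = peak + math.log(sum(math.exp(v - peak) for v in row))
    return [v - log_norm for v in row]


# First frame of the logits defined earlier.
first_frame = log_softmax([1.0, 0.0, 0.0, 0.0])
print(first_frame)

# Finite stand-in for log(0), mirroring the 1e-307 offset used in the cell above.
log_zero_substitute = math.log(1e-307)
print(log_zero_substitute)
```

The first entry comes out near -0.7436683 and the rest near -1.7436683, matching the first row of the TF output.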


Let's calculate the CTC loss and the gradient. Any other implementation should be able to reproduce these numbers. Note that `blank_index=-1` selects the last class (index 3 here) as the blank label.

In [205]:
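with tf.GradientTape() as tape:
    # logits_tensor is a tf.Variable, so the tape already tracks it;
    # the explicit watch is harmless and kept for clarity.
    tape.watch(logits_tensor)
    loss = tf.nn.ctc_loss(
        labels_tensor,
        logits_tensor,
        labels_length_tensor,
        logits_length_tensor,
        logits_time_major=False,
        blank_index=-1,
        name="test"
    )

# Gradient of the loss with respect to the unnormalized logits.
grads = tape.gradient(loss, logits_tensor)

print("CTC loss:", loss.numpy())
print("CTC gradients: ", grads.numpy())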

CTC loss: [4.448741]
CTC gradients:  [[[-0.51064587  0.17487772  0.17487772  0.16089036]
  [ 0.12286875 -0.6697599   0.17487772  0.37201348]
  [-0.29322746 -0.01850624  0.17487772  0.13685611]
  [-0.15988137 -0.20941935  0.17487772  0.1944232 ]
  [ 0.17487772 -0.5441786   0.17487772  0.19442326]]]
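
As a cross-check on the numbers above, here is a sketch of the standard CTC forward recursion in log space, written in plain Python. It is not the TF implementation: the function names (`ctc_loss_reference`, `numeric_grad`) are hypothetical, it assumes a single unbatched sequence, the labels `[0, 1, 0, 1]` (the first four entries of `label`), and class 3 as the blank. The gradient is only approximated by a central finite difference.

```python
import math

NEG_INF = float("-inf")


def log_softmax(row):
    peak = max(row)
    log_norm = peak + math.log(sum(math.exp(v - peak) for v in row))
    return [v - log_norm for v in row]


def logsumexp(values):
    peak = max(values)
    if peak == NEG_INF:  # every term is log(0)
        return NEG_INF
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def ctc_loss_reference(logits, labels, blank):
    """Negative log-likelihood for one sequence; logits is [frames][num_labels]."""
    log_probs = [log_softmax(row) for row in logits]
    # Extended sequence: blank, l1, blank, l2, ..., blank.
    ext = [blank]
    for label in labels:
        ext += [label, blank]
    size = len(ext)
    # A path may start in the leading blank or in the first label.
    alpha = [NEG_INF] * size
    alpha[0] = log_probs[0][ext[0]]
    if size > 1:
        alpha[1] = log_probs[0][ext[1]]
    for t in range(1, len(log_probs)):
        new_alpha = [NEG_INF] * size
        for s in range(size):
            candidates = [alpha[s]]
            if s >= 1:
                candidates.append(alpha[s - 1])
            # Skipping a blank is allowed only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                candidates.append(alpha[s - 2])
            new_alpha[s] = logsumexp(candidates) + log_probs[t][ext[s]]
        alpha = new_alpha
    # A path may end in the last label or in the trailing blank.
    return -logsumexp(alpha[-2:] if size > 1 else alpha)


def numeric_grad(logits, labels, blank, t, k, eps=1e-4):
    """Central finite difference of the loss with respect to logits[t][k]."""
    up = [row[:] for row in logits]
    down = [row[:] for row in logits]
    up[t][k] += eps
    down[t][k] -= eps
    return (ctc_loss_reference(up, labels, blank)
            - ctc_loss_reference(down, labels, blank)) / (2 * eps)


logits = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0],
]
loss = ctc_loss_reference(logits, [0, 1, 0, 1], blank=3)
grad_00 = numeric_grad(logits, [0, 1, 0, 1], 3, t=0, k=0)
print("reference loss:", loss)
print("numeric d(loss)/d(logits[0][0]):", grad_00)
```

Both values should land close to the TF results above: a loss of about 4.4487 and a first gradient entry of about -0.5106.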


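This cell just generates another set of test inputs from a random uniform distribution. Note that `maxval=logits_length` lets the sampled labels reach 4, which is out of range for logits with only 4 classes (valid indices 0 to 3).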

In [206]:
test_label = tf.random.uniform(
    [1, labels_length],
    minval=1, 
    maxval=logits_length, 
    dtype=tf.int64
)
print('test_label', test_label.numpy())
# Time-major layout: [num_frames, batch_size, num_labels]
test_logits = tf.random.uniform([4, 1, 4])
print('test_logits', test_logits.numpy())

test_label [[1 2 4 3]]
test_logits [[[0.84265196 0.7786586  0.4704218  0.3375883 ]]

 [[0.00791407 0.37151897 0.74040926 0.21611357]]

 [[0.4604901  0.52297366 0.22432959 0.20178378]]

 [[0.5362582  0.7950176  0.8498398  0.67813313]]]
