
Softmax in Model Output, then using CE Loss #23

Open
kerrgarr opened this issue Dec 9, 2021 · 1 comment

Comments


kerrgarr commented Dec 9, 2021

Thank you for the interesting work here.

I've just encountered one issue with the code. The ConvLSTM model applies a Softmax as its last layer, but the training script then uses CrossEntropyLoss. In PyTorch, CrossEntropyLoss already applies a log-softmax to its input, so the current setup effectively computes a softmax of a softmax. Instead, the ConvLSTM should output the raw logits from the final classification (Linear) layer and feed those into the loss. Softmax probabilities can still be computed later, in the test-set evaluation step, to report accuracy.
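To illustrate the point, here is a minimal sketch in plain Python (not code from this repository; the `softmax` and `cross_entropy` helpers mimic what `nn.CrossEntropyLoss` does internally). Feeding already-softmaxed outputs into a loss that applies its own log-softmax compresses the probabilities toward uniform and inflates the loss:

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy(inputs, target):
    # Mimics PyTorch's CrossEntropyLoss for a single example:
    # it applies log-softmax internally, so it expects raw logits.
    return -math.log(softmax(inputs)[target])

logits = [2.0, 0.5, -1.0]  # illustrative raw model outputs
target = 0

loss_from_logits = cross_entropy(logits, target)           # intended usage
loss_double_softmax = cross_entropy(softmax(logits), target)  # softmax-of-softmax bug

print(loss_from_logits, loss_double_softmax)
```

The double-softmax version yields a larger loss for the correct class and much weaker gradients near convergence, since the second softmax sees inputs squeezed into [0, 1].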

Please let me know if others agree with this small change to the code.

Also, what type of attention is being used? Is it dot-product attention?

@takekbys

I have the same understanding: the Softmax in ConvLSTM.output_layers is not necessary.
