- Softmax is an activation function typically used in the last layer of neural networks such as CNNs and DNNs.
- The input of the softmax function is a real vector z = {z1, ..., zC}, where C is the number of classes.
- The output of the softmax function is a real (probability) vector a = {a1, ..., aC} whose elements sum to 1.
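For reference, this is the standard softmax definition written with the symbols used above. It is the expression that a direct implementation computes:

```latex
a_i = \frac{\exp(z_i)}{\sum_{j=1}^{C} \exp(z_j)}, \qquad i = 1, \dots, C,
\qquad \text{so that} \qquad \sum_{i=1}^{C} a_i = 1
```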
- The simplest approach is to implement the softmax expression directly, but this has two problems:
+ First, since zi is an arbitrary real number, exp(zi) can become very large, which requires more resources (wider data paths) to store.
+ Second, the expression contains a division, which also consumes a large amount of resources to perform.
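A direct implementation in Python makes the first problem concrete. This is only an illustration: it uses double-precision floats rather than the hardware number formats, so the overflow threshold differs (math.exp overflows just above 709), but the failure is the same in kind. The name naive_softmax is illustrative.

```python
import math


def naive_softmax(z):
    """Direct implementation: a_i = exp(z_i) / sum_j exp(z_j)."""
    exps = [math.exp(zi) for zi in z]  # can overflow when some z_i is large
    total = sum(exps)
    return [e / total for e in exps]   # one division per output element


print(naive_softmax([1.0, 2.0, 3.0]))  # small inputs work fine

try:
    naive_softmax([1000.0, 999.0])
except OverflowError as err:
    # math.exp raises OverflowError once the result exceeds the double range
    print("overflow:", err)
```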
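- To address these problems:
+ Scale exp(zi) down to exp(zi - zmax), where zmax is the maximum value of the input vector z. This leaves the result unchanged, because the common factor exp(-zmax) cancels between numerator and denominator; and since every zi - zmax <= 0, each exp(zi - zmax) lies in (0, 1].
+ Transform the expression to remove the division operator.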
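The README does not spell out how the division is removed. One common approach, assumed here rather than taken from the RTL, moves the normalization into the log domain: ai = exp(zi - zmax - ln(sum_j exp(zj - zmax))), so the division becomes a subtraction followed by one more exponential. A minimal floating-point sketch combining both improvements (softmax_division_free is an illustrative name):

```python
import math


def softmax_division_free(z):
    # Step 1: shift by the maximum so every exponent is <= 0.
    z_max = max(z)
    shifted = [zi - z_max for zi in z]
    # Step 2: exp of a non-positive value lies in (0, 1], so the sum cannot overflow.
    # The sum is >= 1 because the maximum element contributes exp(0) = 1.
    log_sum = math.log(sum(math.exp(s) for s in shifted))
    # Step 3: a_i = exp(z_i - z_max - ln(sum)); a subtraction replaces the division.
    return [math.exp(s - log_sum) for s in shifted]


print(softmax_division_free([1000.0, 999.0]))  # finite, no overflow
```

Since log_sum >= 0, every exponent passed to math.exp is at most 0, which is what makes this form attractive when exp and ln are approximated in hardware.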
- Block diagram of the softmax module.
- The module takes 32-bit single-precision floating-point inputs and converts them to a 16-bit format.
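The README does not say which 16-bit format the module converts to. The sketch below assumes a signed two's-complement fixed-point format with a hypothetical FRAC_BITS = 10 fractional bits, rounding to nearest and saturating out-of-range values; IEEE-754 half precision would be another possible 16-bit target. The function names are illustrative.

```python
import struct

FRAC_BITS = 10  # hypothetical: fractional bits of the assumed 16-bit fixed-point format


def float32_to_q16(x: float) -> int:
    """Convert a value (first rounded to float32) to a 16-bit two's-complement pattern."""
    x32 = struct.unpack("<f", struct.pack("<f", x))[0]  # emulate a float32 input
    q = round(x32 * (1 << FRAC_BITS))                   # round to nearest
    q = max(-(1 << 15), min((1 << 15) - 1, q))          # saturate to signed 16-bit range
    return q & 0xFFFF                                   # two's-complement bit pattern


def q16_to_float(bits: int) -> float:
    """Interpret a 16-bit two's-complement pattern in the same assumed format."""
    if bits & 0x8000:
        bits -= 1 << 16
    return bits / (1 << FRAC_BITS)


print(hex(float32_to_q16(-4.541)), q16_to_float(float32_to_q16(-4.541)))
```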
- The design was synthesized and implemented on a Zedboard (Xilinx Zynq-7000 SoC) using Vivado 2018.3.
- Implementation report (clock constrained to a 14 ns period, i.e. about 71.4 MHz).
- The module was simulated with the input vector X = {-4.541, -4.22, -0.464, 4.684, 3.524}.
- The figure below compares the output of the hardware softmax with the output of the software softmax.
- The maximum error for the sample input above is 3e-3.
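A software reference for this comparison can be reproduced in a few lines of Python. This is a sketch: softmax_reference and max_abs_error are illustrative names, and the hardware outputs (which would first have to be decoded from their 16-bit format) are not listed in this README, so they are not filled in here.

```python
import math
from typing import List


def softmax_reference(z: List[float]) -> List[float]:
    """Floating-point softmax used as the software reference."""
    z_max = max(z)
    exps = [math.exp(zi - z_max) for zi in z]
    total = sum(exps)
    return [e / total for e in exps]


def max_abs_error(hw: List[float], sw: List[float]) -> float:
    """Largest absolute difference between hardware and software outputs."""
    if len(hw) != len(sw):
        raise ValueError("output vectors must have the same length")
    return max(abs(h - s) for h, s in zip(hw, sw))


X = [-4.541, -4.22, -0.464, 4.684, 3.524]
sw = softmax_reference(X)
print([round(p, 4) for p in sw])
```

With the decoded hardware outputs in a list hw, max_abs_error(hw, sw) gives the figure reported above.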
6. Experiment - Project Source
- The RTL code was packaged into an IP core with an AXI4-Lite slave interface plus AXI4-Stream slave and master interfaces, and the IP core was integrated into an SoC.
- The figure below shows the SoC, consisting of the Zynq PS, the DMA IP, the Softmax IP, and some other blocks.
- SoC execution flow:
+ The input data is initialized in DDR memory.
+ The DMA IP reads the data and sends it to the Softmax IP over AXI4-Stream (DMA master to Softmax IP slave).
+ When the Softmax IP has finished computing, it sends the output data back to the DMA IP, also over AXI4-Stream.
+ The DMA IP writes the output data back to DDR.
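The execution flow can be mimicked with a small behavioral model in Python. It is not driver code: dma_mm2s, softmax_ip, and dma_s2mm are hypothetical names, each stream beat is modeled as a (data, last) pair standing in for TDATA/TLAST, the data are plain floats rather than the 16-bit format, and DDR is a Python list. The Softmax IP model buffers a whole packet before producing output, since zmax and the normalizing sum depend on every input.

```python
import math
from typing import Iterator, List, Tuple

Beat = Tuple[float, bool]  # (data, last) stands in for (TDATA, TLAST)


def dma_mm2s(ddr: List[float], base: int, length: int) -> Iterator[Beat]:
    """DMA reads `length` words from DDR and streams them out, marking the final beat."""
    for i in range(length):
        yield ddr[base + i], i == length - 1


def softmax_ip(stream: Iterator[Beat]) -> Iterator[Beat]:
    """Collect one packet (up to the last beat), then stream out its softmax."""
    packet: List[float] = []
    for data, last in stream:
        packet.append(data)
        if last:
            break
    z_max = max(packet)
    exps = [math.exp(z - z_max) for z in packet]
    total = sum(exps)
    for i, e in enumerate(exps):
        yield e / total, i == len(exps) - 1


def dma_s2mm(stream: Iterator[Beat], ddr: List[float], base: int) -> int:
    """DMA writes the incoming stream to DDR until the last beat; returns the word count."""
    count = 0
    for data, last in stream:
        ddr[base + count] = data
        count += 1
        if last:
            break
    return count


ddr = [0.0] * 16
x = [-4.541, -4.22, -0.464, 4.684, 3.524]
ddr[0:len(x)] = x                                           # step 1: input in DDR
n = dma_s2mm(softmax_ip(dma_mm2s(ddr, 0, len(x))), ddr, 8)  # steps 2 to 4
print(ddr[8:8 + n])
```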
- An ILA (Integrated Logic Analyzer) was used to monitor the AXI4-Stream interfaces of the Softmax IP.
- Input in simulation.
- Input captured by the ILA.
- Output in simulation.
- Output captured by the ILA.
- The AXI4-Stream data bus (TDATA) is 32 bits wide, but only the first 16 bits are used to carry data.
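A host-side sketch of that layout, assuming "the first 16 bits" means bits [15:0] of TDATA and that buffers in DDR are little-endian (as on the Zynq ARM cores). pack_tdata_words and unpack_tdata_words are hypothetical helper names.

```python
import struct
from typing import List


def pack_tdata_words(samples: List[int]) -> bytes:
    """Pack 16-bit samples into 32-bit little-endian words, data in bits [15:0]."""
    return b"".join(struct.pack("<I", s & 0xFFFF) for s in samples)


def unpack_tdata_words(buf: bytes) -> List[int]:
    """Recover the 16-bit samples, discarding the unused upper 16 bits of each word."""
    return [w & 0xFFFF for (w,) in struct.iter_unpack("<I", buf)]


buf = pack_tdata_words([0x1234, 0xFC00])
print(buf.hex(), [hex(v) for v in unpack_tdata_words(buf)])
```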
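- Taken together, the simulation results and the ILA captures show that the hardware module computes the softmax function correctly, with outputs matching the software reference to within the error reported above.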