Conversational Bot using Transformer Model
The core idea behind the Transformer model is self-attention—the ability to attend to different positions of the input sequence to compute a representation of that sequence.
The Transformer is built from stacks of these self-attention layers; the mechanism itself is explained below in the sections Scaled dot product attention and Multi-head attention. A minimal sketch of the scaled dot-product step follows.
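To make the idea concrete, here is a minimal NumPy sketch of scaled dot-product attention, computing softmax(Q K^T / sqrt(d_k)) V. The function name, shapes, and toy inputs are illustrative assumptions, not taken from any particular library.

import numpy as np

def scaled_dot_product_attention(q, k, v):
    # q: (seq_len_q, d_k) queries
    # k: (seq_len_k, d_k) keys
    # v: (seq_len_k, d_v) values
    d_k = q.shape[-1]
    # Similarity of every query with every key, scaled by sqrt(d_k)
    # to keep the softmax in a well-behaved range.
    scores = q @ k.T / np.sqrt(d_k)           # (seq_len_q, seq_len_k)
    # Softmax over the key dimension yields the attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output position is a weighted average of the values.
    return weights @ v                        # (seq_len_q, d_v)

# Toy usage: 4 tokens with model dimension 8.
# Passing the same tensor as Q, K, and V is what makes it SELF-attention:
# every position attends to every other position of the same sequence.
x = np.random.randn(4, 8)
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # (4, 8)

In a full Transformer this computation is repeated in parallel over several heads (multi-head attention) and stacked in layers, as described in the sections that follow.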