Just a bunch of end-to-end architectures for audio classification. The input to each architecture is a vector containing the raw audio signal, and the output is a softmax layer that classifies the audio into 10 classes. The models are derived from the following papers:
RawNet: Jung, Jee-weon, et al. "RawNet: Advanced end-to-end deep neural network using raw waveforms for text-independent speaker verification." arXiv preprint arXiv:1904.08104 (2019).
1DCNN, 1D Gammatone: Abdoli, Sajjad, Patrick Cardinal, and Alessandro Lameiras Koerich. "End-to-end environmental sound classification using a 1D convolutional neural network." Expert Systems with Applications 136 (2019): 252-263.
ENVNETV2: Tokozume, Yuji, Yoshitaka Ushiku, and Tatsuya Harada. "Learning from between-class examples for deep sound recognition." arXiv preprint arXiv:1711.10282 (2017).
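The basic pipeline (raw waveform in, 10-way softmax out) can be illustrated with a minimal forward pass in pure Python. This is a sketch under stated assumptions, not one of the models in this repo: it uses a single convolutional layer with randomly initialized filters, ReLU, global average pooling, and one dense layer. The layer sizes and the names `init_params` and `forward` are hypothetical; the referenced architectures stack many more layers and are trained, not random.

```python
import math
import random

NUM_CLASSES = 10  # the softmax output covers 10 classes


def conv1d(signal, kernels, stride):
    """Valid 1D convolution (cross-correlation, as in deep learning libraries).

    Returns one feature map (list of floats) per filter in `kernels`.
    """
    maps = []
    for k in kernels:
        width = len(k)
        out = []
        for start in range(0, len(signal) - width + 1, stride):
            out.append(sum(k[j] * signal[start + j] for j in range(width)))
        maps.append(out)
    return maps


def relu(xs):
    return [max(0.0, x) for x in xs]


def global_average_pool(maps):
    # Collapse each feature map to one value, giving a fixed-size vector
    # regardless of the waveform length.
    return [sum(m) / len(m) for m in maps]


def dense(features, weights, biases):
    # weights[c][i] connects feature i to class c
    return [sum(w * f for w, f in zip(row, features)) + b
            for row, b in zip(weights, biases)]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def init_params(num_filters=16, kernel_size=64, stride=2, seed=0):
    # Hypothetical sizes chosen for illustration only; weights are random, not trained.
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(kernel_size)
    return {
        "kernels": [[rng.gauss(0.0, scale) for _ in range(kernel_size)]
                    for _ in range(num_filters)],
        "stride": stride,
        "dense_w": [[rng.gauss(0.0, 0.1) for _ in range(num_filters)]
                    for _ in range(NUM_CLASSES)],
        "dense_b": [0.0] * NUM_CLASSES,
    }


def forward(waveform, params):
    """Raw samples -> conv -> ReLU -> pooling -> dense -> class probabilities."""
    maps = conv1d(waveform, params["kernels"], params["stride"])
    maps = [relu(m) for m in maps]
    features = global_average_pool(maps)
    logits = dense(features, params["dense_w"], params["dense_b"])
    return softmax(logits)


if __name__ == "__main__":
    sr = 16000  # assumed sample rate for the synthetic test tone
    waveform = [math.sin(2 * math.pi * 440 * t / sr) for t in range(sr // 10)]
    probs = forward(waveform, init_params())
    print("predicted class:", probs.index(max(probs)))
```

Global average pooling is used here only so the sketch accepts waveforms of any length; the actual models may instead fix the input length and flatten, depending on the paper.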