efficient-char-level-document-classification-cnn-rnn.md

TLDR; The authors use a CNN to extract features from character-based document representations. These features are then fed into an RNN to make a final prediction. This model, called ConvRec, has significantly fewer parameters (10-50x) than comparable convolutional models with more layers, but achieves similar or better performance on large-scale document classification tasks.

Key Points

  • Shortcomings of the word-level approach: each word is treated as distinct despite common roots, OOV words cannot be handled, and the embedding table needs many parameters.
  • Character-level Convnets need many layers to capture long-term dependencies due to the small sizes of the receptive fields.
  • Network architecture: (1) 8-dim character embedding; (2) Convnet: 2-5 layers, filter widths of 5 and 3, pooling size of 2, ReLU activation; (3) LSTM with a 128-dim hidden state. Dropout after the conv and recurrent layers.
  • Training: 96-character vocabulary, Adadelta, batch size of 128, examples padded and masked to the longest sequence in the batch, gradient norm clipping at 5, early stopping.
  • The model tends to outperform large CNNs on smaller datasets. Maybe because the larger models overfit?
  • More convolutional layers or more filters don't impact model performance much.
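
To make the receptive-field point concrete, here is a minimal sketch of the arithmetic for a small conv stack like the one described above. It assumes unpadded, stride-1 convolutions and non-overlapping pooling, which these notes do not state; the function name `conv_stack_summary` and the 1000-character input length are hypothetical choices for illustration only.

```python
def conv_stack_summary(seq_len, layers):
    """Track sequence length and receptive field through conv + pool layers.

    layers: list of (kernel_size, pool_size) tuples, applied in order.
    Assumes unpadded stride-1 convolutions and non-overlapping pooling
    (an assumption; the notes do not give padding or stride).
    Returns (output_length, receptive_field_in_characters).
    """
    length = seq_len
    receptive_field = 1  # characters seen by one position
    jump = 1             # input characters between adjacent positions
    for kernel, pool in layers:
        # Convolution: loses kernel - 1 positions, widens the field.
        length = length - kernel + 1
        receptive_field += (kernel - 1) * jump
        # Pooling: shrinks length by the pool size, widens the field.
        length = length // pool
        receptive_field += (pool - 1) * jump
        jump *= pool
        if length < 1:
            raise ValueError("input too short for this conv stack")
    return length, receptive_field


# Two layers with filter widths 5 and 3 and pooling size 2,
# on a hypothetical 1000-character document.
steps, field = conv_stack_summary(1000, [(5, 2), (3, 2)])
print(f"RNN sees {steps} steps, each covering {field} characters")
```

Under these assumptions each feature handed to the RNN covers only a dozen characters, so a purely convolutional model would need many more layers to relate distant parts of a document, while the recurrent layer handles that over a sequence roughly four times shorter than the raw text.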

Notes/Questions

  • Would've been nice to graph the effect of #params on the model performance. How much do additional filters and conv layers help?
  • What about training time? How does it compare?
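
For the parameter-count question, a rough estimate can be computed by hand. The sketch below uses the standard weight-count formulas for 1-D convolutions and LSTMs; the vocabulary size, embedding size, filter widths, and hidden size come from the notes above, while the filter count (128), the number of classes (4), a single unidirectional LSTM, and all function names are assumptions made only for illustration.

```python
def conv1d_params(in_channels, out_channels, kernel_size):
    # One weight per (input channel, output channel, kernel position), plus biases.
    return in_channels * out_channels * kernel_size + out_channels


def lstm_params(input_size, hidden_size):
    # Four gates, each with input weights, recurrent weights, and one bias vector.
    # (Some libraries store two bias vectors per gate, which adds a little more.)
    return 4 * (hidden_size * (input_size + hidden_size) + hidden_size)


def convrec_params(vocab=96, emb=8, filters=128, kernels=(5, 3),
                   hidden=128, classes=4):
    """Rough parameter count for an embedding -> conv stack -> LSTM -> softmax model.

    filters and classes are hypothetical values, not taken from the notes.
    """
    total = vocab * emb                  # character embedding table
    in_channels = emb
    for kernel in kernels:
        total += conv1d_params(in_channels, filters, kernel)
        in_channels = filters
    total += lstm_params(in_channels, hidden)
    total += hidden * classes + classes  # final linear classifier
    return total


# Compare a 2-layer and a 3-layer conv stack under the same assumptions.
for kernels in [(5, 3), (5, 3, 3)]:
    print(kernels, convrec_params(kernels=kernels))
```

Sweeping `filters` and `kernels` in a loop like this would give the x-axis for the kind of params-versus-accuracy plot asked for above; the accuracy values themselves would still have to come from training runs.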