💫 Add experimental ULMFit/BERT/Elmo-like pretraining #2931
Add support for a new command,
My solution is instead to load a pre-trained vectors file and use that vector space as the objective. This means we only need to predict a 300d vector for each word, instead of trying to softmax over 10,000 IDs or whatever. It also means the vocabulary we can learn is very large, which is quite satisfying.
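A minimal sketch of that objective in plain numpy, with illustrative names rather than the actual spaCy internals: the encoder's per-token output is scored against that token's row in the static pre-trained vectors table, so there's no vocabulary-sized softmax anywhere.

```python
import numpy

def vector_prediction_loss(predicted, target_vectors):
    """L2 loss between the encoder's predicted vectors and the static
    pre-trained vectors for the same tokens.

    predicted:      (n_tokens, 300) array output by the encoder
    target_vectors: (n_tokens, 300) array looked up from the vectors file
    """
    diff = predicted - target_vectors
    loss = (diff ** 2).sum() / predicted.shape[0]
    d_predicted = 2.0 * diff / predicted.shape[0]  # gradient w.r.t. the predictions
    return loss, d_predicted

# Toy usage: 4 tokens, 300d vectors.
rng = numpy.random.RandomState(0)
predicted = rng.normal(size=(4, 300)).astype("float32")
targets = rng.normal(size=(4, 300)).astype("float32")
loss, grad = vector_prediction_loss(predicted, targets)
```

The target table is fixed, so the cost of the output layer doesn't grow with the vocabulary: any word that has a row in the vectors file can serve as a training target.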
To make it easier to use the pre-trained weights,
I thought of this trick a long time ago, and had it implemented in a half-finished state in the
In preliminary tests, I've already achieved pretty strong improvements for text classification with small training sets. Training on 1,000 documents from the IMDB corpus, I'm able to reach 87% accuracy with pre-training, which is roughly what Jeremy and Sebastian report in their ULMFit paper (Figure 3). Without pre-training, the best I could get was 85%.
Interestingly, the technique seems to work better if the vectors are also used as part of the input. I find this completely surprising -- I expected the opposite. I don't know what's going on with this, but I've currently hard-coded that the vectors should be used as features.
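For concreteness, here's a hedged sketch of that wiring, again with made-up names and shapes: the same static table is concatenated into the token's input features, and also supplies the targets the output layer is trained to reconstruct.

```python
import numpy

# Illustrative shapes: a small trainable table plus the frozen 300d pre-trained vectors.
rng = numpy.random.RandomState(0)
learned_embed = rng.normal(size=(10000, 96)).astype("float32")    # trainable
static_vectors = rng.normal(size=(10000, 300)).astype("float32")  # frozen, pre-trained

def featurize(token_ids, learned_embed, static_vectors):
    """Concatenate a learned embedding with the frozen pre-trained vector,
    so the static vectors are part of the input features..."""
    learned = learned_embed[token_ids]              # (n_tokens, 96)
    static = static_vectors[token_ids]              # (n_tokens, 300)
    return numpy.concatenate([learned, static], axis=-1)

# ...while the same static_vectors rows also serve as the targets the
# encoder is trained to predict, e.g. via vector_prediction_loss() above.
token_ids = numpy.array([5, 17, 42, 8])
features = featurize(token_ids, learned_embed, static_vectors)    # (4, 396)
```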
The obvious thing to try is running something like ULMFit or BERT as the target for the CNN to learn from, rather than just using static vectors. I expect that should work better.
Quick update in case people revisit this: