Bug fix update for loading the pretrained `TransfoXLModel` from s3, added fallback for `OpenAIGPTTokenizer` when SpaCy is not installed
Mostly a bug fix update for loading the `TransfoXLModel` from s3:
- Fixes a bug in the loading of the pretrained `TransfoXLModel` from the s3 dump (which is a converted `TransfoXLLMHeadModel`), in which the weights were not loaded.
- Added a fallback of `OpenAIGPTTokenizer` on BERT's `BasicTokenizer` when SpaCy and ftfy are not installed. Using BERT's `BasicTokenizer` instead of SpaCy should be fine in most cases as long as you have a relatively clean input (SpaCy+ftfy were included to exactly reproduce the paper's pre-processing steps on the Toronto Book Corpus). This also lets us use the `never_split` option to avoid splitting special tokens like `[CLS]`, `[SEP]`..., which is easier than adding the tokens after tokenization.
- Updated the README on the tokenizer options and methods, which was lagging behind a bit.
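To illustrate the fallback and the effect of a `never_split` option, here is a minimal standalone sketch. It is not the library's code: `pick_backend` and `basic_tokenize` are hypothetical names, and the splitting rule (lowercase, then separate words from punctuation) is a simplified assumption standing in for a `BasicTokenizer`-style pass.

```python
import importlib.util
import re


def pick_backend():
    """Hypothetical helper: prefer SpaCy+ftfy, otherwise fall back to a basic tokenizer."""
    # find_spec only checks whether the packages are installed; it does not import them.
    have_spacy = importlib.util.find_spec("spacy") is not None
    have_ftfy = importlib.util.find_spec("ftfy") is not None
    return "spacy+ftfy" if (have_spacy and have_ftfy) else "basic"


def basic_tokenize(text, never_split=("[CLS]", "[SEP]")):
    """Simplified stand-in for a basic tokenizer (assumption, not the real implementation)."""
    tokens = []
    for chunk in text.split():
        if chunk in never_split:
            # Special tokens are kept whole instead of being lowercased and split.
            tokens.append(chunk)
        else:
            # Lowercase, then separate runs of word characters from punctuation.
            tokens.extend(re.findall(r"\w+|[^\w\s]", chunk.lower()))
    return tokens


protected = basic_tokenize("[CLS] Hello, world! [SEP]")
unprotected = basic_tokenize("[CLS] Hello, world! [SEP]", never_split=())
```

Without the protected list, `[CLS]` is broken into `[`, `cls`, `]`, so the special token would have to be re-added after tokenization. Passing it in `never_split` keeps it intact in a single pass.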