
Better model/tokenizer serialization, relax network connection requirements, new scripts and bug fixes

@thomwolf released this 25 Apr 19:47

General updates:

  • Better serialization for all models and tokenizers (BERT, GPT, GPT-2 and Transformer-XL), with best practices for saving/loading documented in the readme and examples (see the sketch after this list).
  • Relaxed network connection requirements: we now fall back on the last model downloaded in the cache when AWS can't be reached to check the ETag.
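
As a minimal sketch of the save/load pattern described in the readme: it assumes `WEIGHTS_NAME`/`CONFIG_NAME` are importable from `pytorch_pretrained_bert.file_utils` and uses `BertForSequenceClassification` as a stand-in (other models follow the same pattern); see the readme for the canonical version.

```python
import os
import torch
from pytorch_pretrained_bert import BertForSequenceClassification, BertTokenizer
from pytorch_pretrained_bert.file_utils import WEIGHTS_NAME, CONFIG_NAME

output_dir = "./saved_model"  # hypothetical output directory
os.makedirs(output_dir, exist_ok=True)

model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# Save: weights, config and vocabulary each go to a well-known filename
torch.save(model.state_dict(), os.path.join(output_dir, WEIGHTS_NAME))
model.config.to_json_file(os.path.join(output_dir, CONFIG_NAME))
tokenizer.save_vocabulary(output_dir)

# Reload: from_pretrained also accepts a local directory
model = BertForSequenceClassification.from_pretrained(output_dir, num_labels=2)
tokenizer = BertTokenizer.from_pretrained(output_dir)
```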

Breaking changes:

  • The warmup_linear method in OpenAIAdam and BertAdam has been replaced by flexible schedule classes for linear, cosine and multi-cycle schedules (see the sketch below).
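
A rough sketch of the new API, assuming the schedule classes live in `pytorch_pretrained_bert.optimization` under names such as `WarmupCosineWithHardRestartsSchedule` and that optimizers accept either a schedule name or a schedule object; check that module for the exact set of classes.

```python
import torch
from pytorch_pretrained_bert.optimization import BertAdam, WarmupCosineWithHardRestartsSchedule

model = torch.nn.Linear(10, 2)  # stand-in for a real model

# Previously the learning rate was decayed manually via warmup_linear().
# Now you can pass a schedule name as a string...
optimizer = BertAdam(model.parameters(), lr=5e-5,
                     warmup=0.1, t_total=1000, schedule="warmup_linear")

# ...or a schedule object, e.g. cosine decay with hard restarts over two cycles
schedule = WarmupCosineWithHardRestartsSchedule(warmup=0.1, t_total=1000, cycles=2)
optimizer = BertAdam(model.parameters(), lr=5e-5, schedule=schedule)
```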

Bug fixes and improvements to the library modules:

  • Add a flag in BertTokenizer to skip basic tokenization (@john-hewitt; see the sketch after this list)
  • Allow tokenization of sequences longer than 512 tokens (@CatalinVoss)
  • Clean up and extend the learning rate schedules in BertAdam and OpenAIAdam (@lukovnikov)
  • Update the GPT/GPT-2 loss computation (@CatalinVoss, @thomwolf)
  • Make the TensorFlow conversion tool more robust (@marpaia)
  • Fix the BertForMultipleChoice model init and forward pass (@dhpollack)
  • Fix gradient overflow in GPT-2 FP16 training (@SudoSharma)
  • Catch the exception when pathlib is not installed (@potatochip)
  • Use a dropout layer in OpenAIGPTMultipleChoiceHead (@pglock)
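
A minimal sketch illustrating the first two items, assuming the new flag is named `do_basic_tokenize` and that over-length sequences now trigger a warning rather than an error.

```python
from pytorch_pretrained_bert import BertTokenizer

# Skip the basic (whitespace/punctuation) pre-splitting and run WordPiece
# directly, useful when the input text is already pre-tokenized.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased",
                                          do_basic_tokenize=False)
print(tokenizer.tokenize("unaffable"))  # WordPiece sub-tokens, e.g. ['una', '##ffa', '##ble']

# Sequences longer than the model's maximum (512 for BERT) can now be
# tokenized and converted to ids; a warning is logged instead of failing.
long_tokens = tokenizer.tokenize("hello " * 1000)
ids = tokenizer.convert_tokens_to_ids(long_tokens)
print(len(ids))
```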

New scripts and improvements to the example scripts: