
Better model/tokenizer serialization, relax network connection requirements, new scripts and bug fixes

@thomwolf released this 25 Apr 19:47

General updates:

  • Better serialization for all models and tokenizers (BERT, GPT, GPT-2 and Transformer-XL), with best practices for saving/loading documented in the readme and examples (see the sketch after this list).
  • Relaxed network connection requirements: we now fall back on the last model downloaded in the cache when AWS can't be reached to check the ETag.
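
As a minimal sketch of the save/load pattern described in the readme: it assumes `WEIGHTS_NAME`/`CONFIG_NAME` are importable from `pytorch_pretrained_bert.file_utils` and uses `BertForSequenceClassification` as a stand-in (other models follow the same pattern); see the readme for the canonical version.

```python
import os
import torch
from pytorch_pretrained_bert import BertForSequenceClassification, BertTokenizer
from pytorch_pretrained_bert.file_utils import WEIGHTS_NAME, CONFIG_NAME

output_dir = "./saved_model"  # hypothetical output directory
os.makedirs(output_dir, exist_ok=True)

model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# Save: weights, config and vocabulary each go to a well-known filename
torch.save(model.state_dict(), os.path.join(output_dir, WEIGHTS_NAME))
model.config.to_json_file(os.path.join(output_dir, CONFIG_NAME))
tokenizer.save_vocabulary(output_dir)

# Reload: from_pretrained also accepts a local directory
model = BertForSequenceClassification.from_pretrained(output_dir, num_labels=2)
tokenizer = BertTokenizer.from_pretrained(output_dir)
```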

Breaking changes:

  • The warmup_linear method in OpenAIAdam and BertAdam has been replaced by flexible schedule classes for linear, cosine and multi-cycle schedules (see the sketch below).
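
A rough sketch of the new API, assuming the schedule classes live in `pytorch_pretrained_bert.optimization` under names such as `WarmupCosineWithHardRestartsSchedule` and that optimizers accept either a schedule name or a schedule object; check that module for the exact set of classes.

```python
import torch
from pytorch_pretrained_bert.optimization import BertAdam, WarmupCosineWithHardRestartsSchedule

model = torch.nn.Linear(10, 2)  # stand-in for a real model

# Previously the learning rate was decayed manually via warmup_linear().
# Now you can pass a schedule name as a string...
optimizer = BertAdam(model.parameters(), lr=5e-5,
                     warmup=0.1, t_total=1000, schedule="warmup_linear")

# ...or a schedule object, e.g. cosine decay with hard restarts over two cycles
schedule = WarmupCosineWithHardRestartsSchedule(warmup=0.1, t_total=1000, cycles=2)
optimizer = BertAdam(model.parameters(), lr=5e-5, schedule=schedule)
```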

Bug fixes and improvements to the library modules:

  • Add a flag in BertTokenizer to skip basic tokenization (@john-hewitt; see the sketch after this list)
  • Allow tokenization of sequences longer than 512 tokens (@CatalinVoss)
  • Clean up and extend the learning rate schedules in BertAdam and OpenAIAdam (@lukovnikov)
  • Update the GPT/GPT-2 loss computation (@CatalinVoss, @thomwolf)
  • Make the TensorFlow conversion tool more robust (@marpaia)
  • Fix the BertForMultipleChoice model init and forward pass (@dhpollack)
  • Fix gradient overflow in GPT-2 FP16 training (@SudoSharma)
  • Catch the exception when pathlib is not installed (@potatochip)
  • Use a dropout layer in OpenAIGPTMultipleChoiceHead (@pglock)
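
A minimal sketch illustrating the first two items, assuming the new flag is named `do_basic_tokenize` and that over-length sequences now trigger a warning rather than an error.

```python
from pytorch_pretrained_bert import BertTokenizer

# Skip the basic (whitespace/punctuation) pre-splitting and run WordPiece
# directly, useful when the input text is already pre-tokenized.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased",
                                          do_basic_tokenize=False)
print(tokenizer.tokenize("unaffable"))  # WordPiece sub-tokens, e.g. ['una', '##ffa', '##ble']

# Sequences longer than the model's maximum (512 for BERT) can now be
# tokenized and converted to ids; a warning is logged instead of failing.
long_tokens = tokenizer.tokenize("hello " * 1000)
ids = tokenizer.convert_tokens_to_ids(long_tokens)
print(len(ids))
```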

New scripts and improvements to the example scripts: