
transformer models for language model training and tag prediction instead of LSTM's #68

Closed
mittalsuraj18 opened this issue Aug 15, 2018 · 26 comments
Labels
feature A new feature help wanted Extra attention is needed wontfix This will not be worked on

Comments

@mittalsuraj18
Contributor

I recently read OpenAI's generative pre-training paper.
According to the benchmarks, fine-tuning the OpenAI model on a custom dataset takes much less time than an LSTM-based approach.
The model has also been shown to improve the state of the art on many tasks.
So I was wondering whether it would be possible to replace the current pipeline with the transformer-based model implemented by OpenAI.

@mittalsuraj18 mittalsuraj18 changed the title transformer models for language model training and tag prediction instead if LSTM's transformer models for language model training and tag prediction instead of LSTM's Aug 15, 2018
@alanakbik
Collaborator

Great idea - we've been discussing this internally and really want to try it out, and compare the two approaches! Any help / pointers are appreciated :)

@mittalsuraj18
Contributor Author

mittalsuraj18 commented Aug 15, 2018

https://github.com/huggingface/pytorch-openai-transformer-lm has an implementation of the transformer model in PyTorch and scripts to load the OpenAI transformer weights.
I'll take a look at it this weekend and check the feasibility of the implementation.

@alanakbik
Collaborator

Great, thanks! Perhaps this code can be the basis of new transformer-based LanguageModel and LanguageModelTrainer classes!
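
For illustration, here is a minimal sketch of what a transformer-based LanguageModel core could look like, built on PyTorch's own nn.TransformerEncoder. All names are hypothetical and this is not flair's actual API, just a starting point for discussion:

```python
import math
import torch
import torch.nn as nn


class SketchTransformerLM(nn.Module):
    """Hypothetical causal-LM core; not flair's actual LanguageModel class."""

    def __init__(self, vocab_size, d_model=512, nhead=8, num_layers=6,
                 max_len=512, dropout=0.1):
        super().__init__()
        self.d_model = d_model
        self.token_embed = nn.Embedding(vocab_size, d_model)
        self.pos_embed = nn.Embedding(max_len, d_model)  # learned positions
        layer = nn.TransformerEncoderLayer(d_model, nhead,
                                           dim_feedforward=4 * d_model,
                                           dropout=dropout)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.decoder = nn.Linear(d_model, vocab_size)

    def forward(self, tokens):
        # tokens: (seq_len, batch) of token ids
        seq_len = tokens.size(0)
        positions = torch.arange(seq_len, device=tokens.device).unsqueeze(1)
        x = self.token_embed(tokens) * math.sqrt(self.d_model) + self.pos_embed(positions)
        # causal mask: position i may only attend to positions <= i
        mask = torch.triu(torch.full((seq_len, seq_len), float("-inf"),
                                     device=tokens.device), diagonal=1)
        hidden = self.encoder(x, mask=mask)
        return self.decoder(hidden)  # next-token logits, (seq_len, batch, vocab)
```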

@stefan-it
Member

stefan-it commented Aug 16, 2018

A deep Transformer model now also achieves state-of-the-art results in language modeling, see this paper. So I think integrating such an architecture into flair would be awesome ❤️

But don't look at the evaluation section of the paper mentioned above ;) training took more than 7 days on a single Cloud TPU 😱

@mittalsuraj18
Contributor Author

64 layers, wow...
I don't think implementing such a huge network would be feasible, since it would slow down the training of further models in the pipeline quite considerably. However, their 12-layer network also yielded decent results.
The concept of auxiliary losses is interesting; I'll have to test it out and see how it works.
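
For reference, the auxiliary-loss idea from that paper is, roughly, to add an LM prediction loss at intermediate layers as well and sum the terms. A hedged sketch (the function name and weighting here are illustrative, not the paper's exact recipe):

```python
import torch
import torch.nn.functional as F

def lm_loss_with_aux(layer_outputs, targets, proj, aux_weight=0.5):
    """layer_outputs: list of (seq_len, batch, d_model) tensors, one per layer.
    targets: (seq_len, batch) next-token ids.
    proj: a shared nn.Linear(d_model, vocab_size) projection head."""
    # main loss from the top layer
    main = F.cross_entropy(proj(layer_outputs[-1]).flatten(0, 1), targets.flatten())
    # auxiliary losses: the same prediction task applied to every intermediate layer
    aux = sum(
        F.cross_entropy(proj(h).flatten(0, 1), targets.flatten())
        for h in layer_outputs[:-1]
    )
    return main + aux_weight * aux / max(len(layer_outputs) - 1, 1)
```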

@tabergma
Collaborator

Small update: We are going to add BERT embeddings (see #251) to flair in the next release. They are based on transformers.

We are still thinking about adding our own transformer model at some point, but not in the near future.

@tabergma tabergma added help wanted Extra attention is needed feature A new feature labels Dec 13, 2018
@mittalsuraj18
Contributor Author

alright 👍

@stefan-it
Member

stefan-it commented Jan 10, 2019

@alanakbik and @tabergma : Here's another great paper about a Transformer-based LM:

Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

Yesterday they released both a TensorFlow and a PyTorch implementation of the model. I'm going to play with the implementation now; maybe I can find a way to get embeddings for a sentence (as is done with FlairEmbeddings).

@alanakbik
Collaborator

Wow this looks really interesting!

@stefan-it
Member

stefan-it commented Jan 28, 2019

Two PRs from the pytorch-pretrained-BERT repository are very interesting:

Once they're merged I would like to add them to flair :)

Training a Transformer-XL model is possible, but on a single GPU I had to use a smaller Transformer model (I'm currently running some experiments with it...)

@alanakbik
Collaborator

Yeah that would be great! :) Also, we'd be very interested to hear about your experiments with Transformer-XL!

@stefan-it
Member

stefan-it commented Feb 11, 2019

Version 0.5.0 is out now: https://github.com/huggingface/pytorch-pretrained-BERT/releases/tag/v0.5.0

I'll check the integration of OpenAI GPT and the Transformer-XL now :)

@alanakbik
Collaborator

Wow awesome!

@gccome

gccome commented Feb 12, 2019

Wow, this is awesome. Really looking forward to transformer-based models and fine-tuning-based models.

@stefan-it
Member

Two current caveats:

  • OpenAI GPT needs two libraries that are not covered by pytorch-pretrained-BERT's dependency management: ftfy and spacy. For spacy you also need to manually install the English model with python -m spacy download en. After that it works fine; I was able to get embeddings for a sentence (see the sketch after this list).
  • Transformer-XL: I wasn't able to get proper embeddings; a "nan" tensor was returned. But I opened an issue, see here :)
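
For anyone who wants to try the GPT part, it can be reproduced roughly like the sketch below, written against the pytorch-pretrained-BERT 0.5.x API as I understand it (the exact signatures and return values may differ slightly; the example sentence is arbitrary):

```python
import torch
from pytorch_pretrained_bert import OpenAIGPTTokenizer, OpenAIGPTModel

# Extra requirements: pip install ftfy spacy && python -m spacy download en
tokenizer = OpenAIGPTTokenizer.from_pretrained("openai-gpt")
model = OpenAIGPTModel.from_pretrained("openai-gpt")
model.eval()

text = "Berlin is a city in Germany ."
ids = torch.tensor([tokenizer.convert_tokens_to_ids(tokenizer.tokenize(text))])
with torch.no_grad():
    # expected: last-layer hidden states, roughly (1, num_subword_pieces, 768)
    hidden_states = model(ids)
print(hidden_states.shape)
```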

@alanakbik
Collaborator

Ah, thanks for the update - do you know why OpenAI GPT requires spacy, and why the English model? Only for tokenization?

@thomwolf

Hi guys, I've made some updates and a new release for this: https://github.com/huggingface/pytorch-pretrained-BERT/releases/tag/v0.5.1

Keep up the good work on flair.

@stefan-it
Member

I've implemented an early draft of TransformerXLEmbeddings and I'm currently training on the CoNLL 2003 dataset. I'll report the results here soon :)
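
For illustration only, this is roughly the shape such an embeddings class could take in flair. It is not the actual draft: the class name is made up, the hidden size is an assumed value, and the token alignment is deliberately simplified.

```python
import torch
from flair.embeddings import TokenEmbeddings
from pytorch_pretrained_bert import TransfoXLTokenizer, TransfoXLModel


class SketchTransformerXLEmbeddings(TokenEmbeddings):
    """Illustrative sketch only; not the real TransformerXLEmbeddings draft."""

    def __init__(self, model_name: str = "transfo-xl-wt103"):
        self.name = model_name
        self.static_embeddings = True
        self.tokenizer = TransfoXLTokenizer.from_pretrained(model_name)
        self.model = TransfoXLModel.from_pretrained(model_name)
        self.model.eval()
        super().__init__()

    @property
    def embedding_length(self) -> int:
        return 1024  # assumed hidden size of transfo-xl-wt103

    def _add_embeddings_internal(self, sentences):
        for sentence in sentences:
            # wt103 is word-level, so we assume one subtoken per flair token here;
            # a real implementation needs proper word/subword alignment.
            ids = self.tokenizer.convert_tokens_to_ids(
                [token.text for token in sentence]
            )
            with torch.no_grad():
                # assumed to return (batch, seq_len, hidden) plus the new memory
                hidden, _ = self.model(torch.tensor([ids]))
            for token, vector in zip(sentence, hidden[0]):
                token.set_embedding(self.name, vector)
        return sentences
```

A real implementation would also need batching and proper handling of out-of-vocabulary words.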

@stefan-it
Member

Btw: the second version of GPT is out: https://github.com/openai/gpt-2/blob/master/README.md

@gccome

gccome commented Feb 15, 2019

@stefan-it In my understanding, TransformerXLEmbeddings supports variable sentence lengths, so it won't run into the out-of-index issue of BertEmbeddings, because BERT has a fixed maximum length of 512 tokens. Is that correct?
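
For reference, a common workaround for the 512-subtoken limit is to window the input before embedding. A rough sketch, assuming pytorch-pretrained-BERT's BertTokenizer and BertModel; the checkpoint name is only an example:

```python
import torch
from pytorch_pretrained_bert import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
model = BertModel.from_pretrained("bert-base-cased")
model.eval()

def embed_long_text(text, max_pieces=510):
    """Split subtokens into chunks so each fits BERT's 512-position limit."""
    pieces = tokenizer.tokenize(text)
    chunks = [pieces[i:i + max_pieces] for i in range(0, len(pieces), max_pieces)]
    hidden = []
    for chunk in chunks:
        ids = tokenizer.convert_tokens_to_ids(["[CLS]"] + chunk + ["[SEP]"])
        with torch.no_grad():
            layers, _ = model(torch.tensor([ids]))
        hidden.append(layers[-1][0, 1:-1])  # last layer, drop [CLS]/[SEP]
    return torch.cat(hidden, dim=0)  # one vector per subtoken piece
```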

@alanakbik
Collaborator

@stefan-it @thomwolf wow that's great - really looking forward to seeing this in action! And very interested to hear how well it does on CoNLL 03 and other tasks.

@stefan-it
Member

Here's another Transformer-based architecture that uses a new approach to pretraining (a cloze-style token reconstruction task is used during training):

https://arxiv.org/abs/1903.07785

It also achieves new SOTA on CoNLL-2003 NER: 93.5% (compared to flair: 93.18%)

@alanakbik
Collaborator

Very impressive results - look forward to taking a closer look at this!

@stefan-it
Member

stefan-it commented Mar 26, 2019

One major drawback is the enormous amount of training data required 🤣 Unfortunately, there's currently no implementation or model available.

@stefan-it
Member

stefan-it commented Mar 26, 2019

I just asked @michaelauli whether they plan to release the code and model :) [I could imagine that it will be integrated into fairseq, but this is just speculation]

@stale

stale bot commented Apr 30, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix This will not be worked on label Apr 30, 2020
@stale stale bot closed this as completed May 7, 2020