transformer models for language model training and tag prediction instead of LSTMs #68
Comments
Great idea - we've been discussing this internally and really want to try it out, and compare the two approaches! Any help / pointers are appreciated :)
Great, thanks! Perhaps this code can be the basis of a new transformer-based language model.
A deep Transformer model now achieves state-of-the-art results in language modeling as well, see this paper. So I think integrating such an architecture in flair would be worth trying. But don't look at the evaluation section in the paper mentioned above ;) training took more than 7 days on a single Cloud TPU 😱
64 layers wow...
Small update: We are going to add BERT embeddings (see #251) to flair in the next release. They are based on transformers. We are still thinking of adding our own transformer model at some point, but not in the near future.
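Once released, usage would presumably mirror flair's other embedding classes. A rough, untested sketch, assuming the class is called `BertEmbeddings` and exposes the usual `TokenEmbeddings` interface:

```python
from flair.data import Sentence
from flair.embeddings import BertEmbeddings

# Assumed usage once #251 lands: BERT embeddings behave like any other
# flair TokenEmbeddings and can be stacked with word or Flair embeddings.
embedding = BertEmbeddings('bert-base-uncased')

sentence = Sentence('Berlin is a city in Germany .')
embedding.embed(sentence)

for token in sentence:
    # one fixed-size vector per token, derived from the BERT subword pieces
    print(token.text, token.embedding.shape)
```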
alright 👍
@alanakbik and @tabergma: Here's another great paper about a Transformer-based LM: Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context. Yesterday they provided both a TensorFlow and a PyTorch implementation of the model. I'm going to play with the implementation now; maybe I'll find a way to get embeddings for a sentence (like it is done with FlairEmbeddings).
Wow this looks really interesting!
Two PRs in the pytorch-pretrained-BERT repository add OpenAI GPT and Transformer-XL support. Once they're merged I would like to add them to flair. Training a Transformer-XL model is possible, but on one GPU I had to use a smaller Transformer model (I'm currently doing some experiments with it...)
Yeah that would be great! :) Also, we'd be very interested to hear about your experiments with Transformer-XL!
Version 0.5.0 is out now: https://github.com/huggingface/pytorch-pretrained-BERT/releases/tag/v0.5.0
I'll check the integration of OpenAI GPT and Transformer-XL now :)
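A minimal sketch, assuming pytorch-pretrained-BERT v0.5.0 is installed, of how the new Transformer-XL classes could produce per-token hidden states for a sentence (model identifier and return values follow the release notes; untested here):

```python
import torch
from pytorch_pretrained_bert import TransfoXLTokenizer, TransfoXLModel

# Load the pretrained WikiText-103 model shipped with v0.5.0
tokenizer = TransfoXLTokenizer.from_pretrained('transfo-xl-wt103')
model = TransfoXLModel.from_pretrained('transfo-xl-wt103')
model.eval()

text = "Berlin and Munich are cities in Germany ."
tokens = tokenizer.tokenize(text)
input_ids = torch.tensor([tokenizer.convert_tokens_to_ids(tokens)])

with torch.no_grad():
    # the forward pass returns the last layer's hidden states plus the updated memory
    last_hidden_state, mems = model(input_ids)

# one vector per token - this is what a flair-style embedding class could expose
print(last_hidden_state.shape)  # (1, num_tokens, hidden_size)
```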
Wow awesome!
Wow this is awesome. Really look forward to transformer-based models and fine-tuning-based models.
Two current caveats:
Ah thanks for the update - do you know why OpenAI requires spacy, and why the English models? Only for tokenization? |
Hi guys, I've made some updates and a new release for this stuff: https://github.com/huggingface/pytorch-pretrained-BERT/releases/tag/v0.5.1
Keep up the good work on flair.
I've implemented an early draft of TransformerXLEmbeddings.
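For illustration, a rough and untested sketch of what such a class could look like on top of pytorch-pretrained-BERT, following flair's TokenEmbeddings interface; the actual draft may differ, and the 1:1 token mapping and config attribute access are assumptions:

```python
from typing import List

import torch
from pytorch_pretrained_bert import TransfoXLTokenizer, TransfoXLModel
from flair.data import Sentence
from flair.embeddings import TokenEmbeddings


class TransformerXLEmbeddings(TokenEmbeddings):
    """Sketch: one embedding per token, taken from the last Transformer-XL layer."""

    def __init__(self, model_name: str = 'transfo-xl-wt103'):
        super().__init__()
        self.name = model_name
        self.static_embeddings = True
        self.tokenizer = TransfoXLTokenizer.from_pretrained(model_name)
        self.model = TransfoXLModel.from_pretrained(model_name)
        self.model.eval()
        # assumption: the loaded model exposes its config with the hidden size
        self.__embedding_length = self.model.config.d_model

    @property
    def embedding_length(self) -> int:
        return self.__embedding_length

    def _add_embeddings_internal(self, sentences: List[Sentence]) -> List[Sentence]:
        with torch.no_grad():
            for sentence in sentences:
                # simplification: assume flair tokens map 1:1 to Transformer-XL tokens
                ids = self.tokenizer.convert_tokens_to_ids(
                    [token.text for token in sentence.tokens])
                hidden_states, _ = self.model(torch.tensor([ids]))
                for token, vector in zip(sentence.tokens, hidden_states[0]):
                    token.set_embedding(self.name, vector)
        return sentences
```

Usage would then mirror the other embedding classes, e.g. stacking it with WordEmbeddings in a StackedEmbeddings before feeding a SequenceTagger.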
Btw: the second version of GPT is out: https://github.com/openai/gpt-2/blob/master/README.md
@stefan-it In my understanding, TransformerXLEmbeddings supports variable sentence lengths, so it won't have the out-of-index issue of BertEmbeddings, because BERT has a fixed maximum length of 512. Is that correct?
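Roughly, yes: BERT's released checkpoints use a learned absolute position embedding table of 512 positions, so longer inputs have to be truncated or windowed, while Transformer-XL relies on relative positional encodings plus a segment-level memory and has no comparable hard limit. A small illustration; the config attribute names are assumptions about the pytorch-pretrained-BERT configs:

```python
from pytorch_pretrained_bert import BertModel, TransfoXLModel

bert = BertModel.from_pretrained('bert-base-uncased')
# BERT's position embedding table is fixed when the model is created,
# so inputs longer than this raise an index error unless truncated.
print(bert.config.max_position_embeddings)  # 512

transfo_xl = TransfoXLModel.from_pretrained('transfo-xl-wt103')
# Transformer-XL uses relative positional encodings and a recurrence memory;
# the practical limits are the memory length and GPU memory, not a position table.
print(transfo_xl.config.mem_len)
```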
@stefan-it @thomwolf wow that's great - really looking forward to seeing this in action! And very interested to hear how well it does on CoNLL 03 and other tasks.
Here's another Transformer-based architecture that uses a new approach to pretraining (a cloze-style token reconstruction task is used during training): https://arxiv.org/abs/1903.07785
It also achieves a new SOTA on CoNLL-2003 NER: 93.5% (compared to flair: 93.18%)
Very impressive results - look forward to taking a closer look at this!
One major drawback is the ridiculous amount of training data 🤣 Unfortunately, there's currently no implementation/model available.
I just asked @michaelauli if they plan to release the code and model :) [I could imagine that it will be integrated in
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
I recently read the generative pre-training (GPT) paper from OpenAI.
According to the benchmarks, fine-tuning the OpenAI model on a custom dataset takes far less time than an LSTM-based approach.
The model has also been shown to improve SOTA on a lot of tasks.
So I was wondering if it is possible to replace the pipeline with a transformer-based model as implemented by OpenAI.