
Does the code support end-to-end fine-tuning, including the retriever? #4

Open
shamanez opened this issue Jan 29, 2022 · 7 comments

@shamanez

The REALM paper highlights that for downstream tasks they kept the retriever frozen. What about a task like domain-specific open-domain question answering? In that kind of scenario, can we train the entire REALM model with this code?

If yes, we might be able to compare results with RAG-end2end:

https://github.com/huggingface/transformers/tree/master/examples/research_projects/rag-end2end-retriever

@qqaatw (Owner) commented Jan 29, 2022

As you saw in the paper, the evidence blocks are frozen during fine-tuning, which means that index updates are not performed at that time. Therefore, for domain-specific QA, we would have to first pre-train REALM to obtain domain-specific evidence blocks (retriever), and then further fine-tune on the given dataset.
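
For a rough picture of the (frozen-retriever) fine-tuning step, here is a minimal sketch using the REALM classes and the google/realm-orqa-nq-openqa checkpoint from Hugging Face Transformers; treat it as an illustration rather than this repo's actual training script, and note that the exact arguments may differ:

```python
# Minimal sketch: one fine-tuning step of REALM open-domain QA with the retriever
# (evidence blocks) kept frozen. Assumes the REALM classes in Hugging Face Transformers.
import torch
from transformers import RealmForOpenQA, RealmRetriever, RealmTokenizer

checkpoint = "google/realm-orqa-nq-openqa"
retriever = RealmRetriever.from_pretrained(checkpoint)      # pre-computed evidence blocks
tokenizer = RealmTokenizer.from_pretrained(checkpoint)
model = RealmForOpenQA.from_pretrained(checkpoint, retriever=retriever)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)

question_ids = tokenizer(["who wrote the declaration of independence"], return_tensors="pt")
answer_ids = tokenizer(
    ["thomas jefferson"],
    add_special_tokens=False,
    return_token_type_ids=False,
    return_attention_mask=False,
).input_ids

model.train()
reader_output, _ = model(**question_ids, answer_ids=answer_ids, return_dict=False)
reader_output.loss.backward()   # marginal likelihood over the retrieved candidate blocks
optimizer.step()
optimizer.zero_grad()
```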

@shamanez (Author) commented Jan 29, 2022 via email

@qqaatw (Owner) commented Jan 29, 2022

Exactly, but the pre-training part has not been fully ported to PyTorch, especially the asynchronous MIPS index refreshes and the Inverse Cloze Task (ICT), which is used to warm-start retriever training. Thus, to pre-train REALM, we would have to use the original TF implementation, and then fine-tune it in PyTorch.
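
To give a feel for what the un-ported refresh logic amounts to, here is a self-contained toy sketch (illustrative names only, with a brute-force inner-product search standing in for a real MIPS index; this is not the TF implementation):

```python
# Toy sketch of asynchronous MIPS index refreshes during REALM pre-training:
# the embedders keep training, but the block index is only rebuilt every
# `refresh_interval` steps, so retrieval in between uses slightly stale embeddings.
import torch

torch.manual_seed(0)
hidden, num_blocks, refresh_interval, k = 128, 10_000, 500, 8

# Stand-ins for REALM's BERT-style query/document embedders and tokenized evidence blocks.
query_embedder = torch.nn.Linear(hidden, hidden)
doc_embedder = torch.nn.Linear(hidden, hidden)
block_features = torch.randn(num_blocks, hidden)
optimizer = torch.optim.Adam(
    list(query_embedder.parameters()) + list(doc_embedder.parameters()), lr=1e-4
)

@torch.no_grad()
def rebuild_index():
    """Re-embed every evidence block with the current doc embedder (the 'refresh')."""
    return doc_embedder(block_features)                    # (num_blocks, hidden)

block_index = rebuild_index()
for step in range(2_000):
    if step % refresh_interval == 0:
        block_index = rebuild_index()                      # done asynchronously in the TF impl.
    query_emb = query_embedder(torch.randn(4, hidden))     # stand-in for a batch of queries
    scores = query_emb @ block_index.T                     # brute-force MIPS over the stale index
    top_ids = scores.topk(k, dim=-1).indices               # top-k candidate blocks per query
    # Re-encode only the retrieved blocks with gradients; REALM marginalizes over them.
    cand_emb = doc_embedder(block_features[top_ids])       # (batch, k, hidden)
    retrieval_logits = torch.einsum("bh,bkh->bk", query_emb, cand_emb)
    loss = -torch.log_softmax(retrieval_logits, dim=-1)[:, 0].mean()  # toy objective
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

ICT warm-starting is a separate piece: the retriever is first trained to retrieve the passage a sentence was taken from, so that retrieval scores are meaningful before pre-training starts.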

@shamanez (Author)

Thanks a lot for your insight. Anyway, this end-to-end fine-tuning will be very expensive.

@robbohua commented Feb 1, 2022

@qqaatw is it part of the roadmap to port the pre-training part to PyTorch?

@qqaatw (Owner) commented Feb 2, 2022

> @qqaatw is it part of the roadmap to port the pre-training part to PyTorch?

It was part of the roadmap, but now I'm wondering whether it is worth porting.

You can see the configuration of their experiments:

> Pre-training: We pre-train for 200k steps on 64 Google Cloud TPUs, with a batch size of 512 and a learning rate of 3e-5, using BERT's default optimizer. The document embedding step for the MIPS index is parallelized over 16 TPUs. For each example, we retrieve and marginalize over 8 candidate documents, including the null document ∅.

This setup leveraged an array of resources and is extremely expensive for ordinary users and researchers. I don't have such resources, and I think a regular deep learning workstation would not be able to reproduce results similar to theirs.

@shamanez (Author) commented Feb 3, 2022

@qqaatw "It was part of the roadmap, but now I'm wondering whether it is worth porting." Yeah, this seems to be a problem, and I agree.
