
Parameters of the retriever in fine-tuning #9

Open
catalwaysright opened this issue Mar 19, 2022 · 17 comments

@catalwaysright

Hi! I am wondering why the retriever is frozen during fine-tuning. I would expect the retriever to learn more if it were updated during fine-tuning. I am not very familiar with TensorFlow; is it possible to update the retriever's parameters during fine-tuning with this repository? How?

@qqaatw
Owner

qqaatw commented Mar 19, 2022

See #5 and #6, and see the papers.

@catalwaysright
Author

> See #5 #6, and see the papers.

Thanks for your reply! I have checked the issues and the paper, and I just want to double-check that I understand correctly: the parameters of the query embedder are actually updated during fine-tuning, but the document embeddings are not re-computed with the updated embedder. Thus the embedding of the same question changes as the query embedder is optimized, so we may retrieve different top-k relevant documents over the course of fine-tuning even for the same input question.
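As a toy illustration of that point (this is not the repository's code; the vectors and the `top_k` helper below are made up), frozen document embeddings combined with an evolving query embedding can change which documents score highest for the same question:

```python
# Toy sketch of inner-product (MIPS) retrieval with frozen document embeddings.
# The numbers and helper below are illustrative, not REALM code.
doc_embs = [
    [1.0, 0.0],   # doc 0
    [0.0, 1.0],   # doc 1
    [0.7, 0.7],   # doc 2
]

def top_k(query, k=1):
    # Score each frozen document embedding against the query embedding.
    scores = [sum(q * d for q, d in zip(query, doc)) for doc in doc_embs]
    return sorted(range(len(scores)), key=lambda i: -scores[i])[:k]

# The same question, embedded before and after the query embedder is updated:
q_before = [1.0, 0.1]
q_after = [0.1, 1.0]   # hypothetical embedding after some fine-tuning steps
print(top_k(q_before), top_k(q_after))  # different top-1 documents
```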

@qqaatw
Owner

qqaatw commented Mar 19, 2022 via email

@catalwaysright
Author

Another question: I downloaded the natural_questions dataset locally, but when I tried to load it with the load function provided in data.py, it reported Dataset path currently not supported. — presumably because I passed a local OS path. How can I fix this and load the local natural_questions dataset?

@qqaatw
Owner

qqaatw commented Mar 20, 2022 via email

@catalwaysright
Author

> How did you download NQ?


By using `gsutil -m cp -R gs://natural_questions/v1.0 <path to your data directory>`, and the structure looks like this:

(screenshot of the downloaded v1.0 directory structure)

@qqaatw
Owner

qqaatw commented Mar 20, 2022 via email

@catalwaysright
Author

Thank you so much for answering my questions so patiently! I ran into another problem when running run_finetune.py with exactly the same args as your experiment: I got a CUDA out-of-memory error.

(screenshot of the CUDA out-of-memory traceback)

I am running it on one V100 GPU with 15 GB of memory and I set the batch size to 1. Is that still not enough? How can I reduce the memory consumption and reproduce the experiment?

@qqaatw
Owner

qqaatw commented Mar 27, 2022

Hi, fine-tuning with the default configuration can be run on a single RTX 2080 Ti, so a V100 with 15 GB of memory is more than sufficient. You may find the causes/solutions by googling the error message.

@catalwaysright Hey, sorry, I forgot to mention this: if you installed transformers from master, you may need to add the line model.block_embedding_to("cpu") after sending the model to the GPU. The latest REALM patch registers the block_emb tensor on the model by default, so it gets moved to the GPU along with model.cuda() and occupies appreciable GPU memory.

@catalwaysright
Author

Sorry for bothering you again. Could you show me exactly where to add model.block_embedding_to("cpu")? When I add it after sending the model to the GPU in run_finetune.py, I get AttributeError: 'RealmForOpenQA' object has no attribute 'block_embedding_to'. Thanks!

@qqaatw
Owner

qqaatw commented Apr 12, 2022

Hi, which version of transformers are you using? You can install transformers==4.18.0, which includes the latest REALM patch.

https://huggingface.co/docs/transformers/model_doc/realm#transformers.RealmForOpenQA.block_embedding_to
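For reference, the intended placement looks roughly like this (a sketch, assuming transformers>=4.18.0; the checkpoint name is illustrative, and this requires downloading the model):

```python
from transformers import RealmForOpenQA, RealmRetriever

# Illustrative checkpoint; use whatever checkpoint your script actually loads.
retriever = RealmRetriever.from_pretrained("google/realm-orqa-nq-openqa")
model = RealmForOpenQA.from_pretrained(
    "google/realm-orqa-nq-openqa", retriever=retriever
)

model.cuda()                      # move the model weights to the GPU
model.block_embedding_to("cpu")   # keep the large block_emb tensor on the CPU
```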

@catalwaysright
Author

I tried your approach and it still shows CUDA out of memory, but I figured it may be expected: only 8 GB of memory is free on my V100, which is not enough to load and optimize the whole model. How much memory did the run use on your RTX 2080 Ti?

@qqaatw
Owner

qqaatw commented Apr 16, 2022 via email

@catalwaysright
Author

Hi! I am now modifying this model to use multiple retrievers and trying to train it. However, during training I found that the retriever loss and reader loss are both 0.0 most of the time, and the reader loss was also often 0.0 when I trained the original model. Why are there so many 0.0 values? Is this normal at the beginning, or are there other tricks to training this model?

@qqaatw
Owner

qqaatw commented May 18, 2022

If the ground truth is not present in any retrieved context or in the predicted answer span, the corresponding loss is set to zero to prevent ineffective updates.

https://github.com/huggingface/transformers/blob/v4.19.2/src/transformers/models/realm/modeling_realm.py#L1662-L1663

This is likely to happen when you train the model from scratch without loading a pre-trained checkpoint such as cc_news, or without a proper warm-up.
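A simplified sketch of that zero-loss behavior (not the linked implementation; the helper name and values are made up):

```python
def retriever_loss(per_block_loss, block_contains_answer):
    """Sum the loss over retrieved blocks that contain the gold answer,
    returning 0.0 when none do, mirroring the masking described above."""
    if not any(block_contains_answer):
        return 0.0  # no supervision signal: zero out to avoid an ineffective update
    return sum(l for l, ok in zip(per_block_loss, block_contains_answer) if ok)

# Early in training, retrieval rarely surfaces the answer, so the loss is 0.0:
print(retriever_loss([1.3, 0.7], [False, False]))  # 0.0
print(retriever_loss([1.3, 0.7], [False, True]))   # 0.7 once retrieval improves
```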

@catalwaysright
Author

Oh, I see! So it will be fine after more steps, right?

@qqaatw
Owner

qqaatw commented May 18, 2022

For training from scratch, you should follow the steps in the REALM/ORQA papers to pre-train/warm up your model; otherwise, the model is unlikely to improve further. If you are fine-tuning from cc_news or another proper pre-trained checkpoint, you can keep training and monitor the improvement of the losses.
