
Parameters of the retriever in fine-tuning #9

Open
catalwaysright opened this issue Mar 19, 2022 · 17 comments

@catalwaysright

Hi! I am wondering why the retriever is frozen during fine-tuning. I would expect the retriever to learn more if it were updated during fine-tuning. I am not very familiar with TensorFlow; is it possible to update the retriever's parameters during fine-tuning with this repository? How?

@qqaatw
Owner

qqaatw commented Mar 19, 2022

See #5 and #6, and see the papers.

@catalwaysright
Author

> See #5 #6, and see the papers.

Thanks for your reply! I have checked the issues and the paper, and I just want to double-check that I understand correctly: the parameters of the query embedder are actually updated during fine-tuning, but the document embeddings are not re-computed with the updated embedder. Thus the embedding of the same question changes as the query embedder is optimized, so we may retrieve different top-k relevant documents over the course of fine-tuning even for the same input question.
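As a toy illustration of that point (this is not the repository's code; the vectors and the `top_k` helper below are made up), frozen document embeddings combined with an evolving query embedding can change which documents score highest for the same question:

```python
# Toy sketch of inner-product (MIPS) retrieval with frozen document embeddings.
# The numbers and helper below are illustrative, not REALM code.
doc_embs = [
    [1.0, 0.0],   # doc 0
    [0.0, 1.0],   # doc 1
    [0.7, 0.7],   # doc 2
]

def top_k(query, k=1):
    # Score each frozen document embedding against the query embedding.
    scores = [sum(q * d for q, d in zip(query, doc)) for doc in doc_embs]
    return sorted(range(len(scores)), key=lambda i: -scores[i])[:k]

# The same question, embedded before and after the query embedder is updated:
q_before = [1.0, 0.1]
q_after = [0.1, 1.0]   # hypothetical embedding after some fine-tuning steps
print(top_k(q_before), top_k(q_after))  # different top-1 documents
```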

@qqaatw
Owner

qqaatw commented Mar 19, 2022 via email

@catalwaysright
Author

Another question: I downloaded the natural_questions dataset locally, but when I tried to load it with the load function provided in data.py, it reported Dataset path currently not supported. — presumably because I passed a local OS path. How can I fix this and load the local natural_questions dataset?

@qqaatw
Owner

qqaatw commented Mar 20, 2022 via email

@catalwaysright
Author

> How did you download NQ?


By using `gsutil -m cp -R gs://natural_questions/v1.0 <path to your data directory>`, and the structure looks like this:

(screenshot of the downloaded v1.0 directory structure)

@qqaatw
Owner

qqaatw commented Mar 20, 2022 via email

@catalwaysright
Author

Thank you so much for answering my questions so patiently! I ran into another problem when running run_finetune.py with exactly the same args as your experiment: I got a CUDA out-of-memory error.

(screenshot of the CUDA out-of-memory traceback)

I am running it on one V100 GPU with 15 GB of memory and I set the batch size to 1. Is that still not enough? How can I reduce the memory consumption and reproduce the experiment?

@qqaatw
Owner

qqaatw commented Mar 27, 2022

Hi, fine-tuning with the default configuration can be run on a single RTX 2080 Ti, so a V100 with 15 GB of memory is more than sufficient. You may find the causes/solutions by googling the error message.

@catalwaysright Hey, sorry, I forgot to mention this: if you installed transformers from master, you may need to add the line model.block_embedding_to("cpu") after sending the model to the GPU. The latest REALM patch registers the block_emb tensor on the model by default, so it gets moved to the GPU along with model.cuda() and occupies appreciable GPU memory.

@catalwaysright
Author

Sorry for bothering you again. Could you show me exactly where to add model.block_embedding_to("cpu")? When I add it after sending the model to the GPU in run_finetune.py, I get AttributeError: 'RealmForOpenQA' object has no attribute 'block_embedding_to'. Thanks!

@qqaatw
Owner

qqaatw commented Apr 12, 2022

Hi, which version of transformers are you using? You can install transformers==4.18.0, which includes the latest REALM patch.

https://huggingface.co/docs/transformers/model_doc/realm#transformers.RealmForOpenQA.block_embedding_to
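For reference, the intended placement looks roughly like this (a sketch, assuming transformers>=4.18.0; the checkpoint name is illustrative, and this requires downloading the model):

```python
from transformers import RealmForOpenQA, RealmRetriever

# Illustrative checkpoint; use whatever checkpoint your script actually loads.
retriever = RealmRetriever.from_pretrained("google/realm-orqa-nq-openqa")
model = RealmForOpenQA.from_pretrained(
    "google/realm-orqa-nq-openqa", retriever=retriever
)

model.cuda()                      # move the model weights to the GPU
model.block_embedding_to("cpu")   # keep the large block_emb tensor on the CPU
```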

@catalwaysright
Author

I tried your approach and it still shows CUDA out of memory, but I figured it may be expected: only 8 GB of memory is free on my V100, which is not enough to load and optimize the whole model. How much memory did the run use on your RTX 2080 Ti?

@qqaatw
Owner

qqaatw commented Apr 16, 2022 via email

@catalwaysright
Author

Hi! I am now modifying this model to use multiple retrievers and trying to train it. However, during training I found that the retriever loss and reader loss are both 0.0 most of the time, and the reader loss was also often 0.0 when I trained the original model. Why are there so many 0.0 values? Is this normal at the beginning, or are there other tricks to training this model?

@qqaatw
Owner

qqaatw commented May 18, 2022

If the ground truth is not present in any retrieved context or in the predicted answer span, the corresponding loss is set to zero to prevent ineffective updates.

https://github.com/huggingface/transformers/blob/v4.19.2/src/transformers/models/realm/modeling_realm.py#L1662-L1663

This is likely to happen when you train the model from scratch without loading a pre-trained checkpoint such as cc_news, or without a proper warm-up.
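A simplified sketch of that zero-loss behavior (not the linked implementation; the helper name and values are made up):

```python
def retriever_loss(per_block_loss, block_contains_answer):
    """Sum the loss over retrieved blocks that contain the gold answer,
    returning 0.0 when none do, mirroring the masking described above."""
    if not any(block_contains_answer):
        return 0.0  # no supervision signal: zero out to avoid an ineffective update
    return sum(l for l, ok in zip(per_block_loss, block_contains_answer) if ok)

# Early in training, retrieval rarely surfaces the answer, so the loss is 0.0:
print(retriever_loss([1.3, 0.7], [False, False]))  # 0.0
print(retriever_loss([1.3, 0.7], [False, True]))   # 0.7 once retrieval improves
```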

@catalwaysright
Author

Oh, I see! So it will be fine after more steps, right?

@qqaatw
Owner

qqaatw commented May 18, 2022

For training from scratch, you should follow the steps in the REALM/ORQA papers to pre-train/warm up your model; otherwise, the model is unlikely to improve further. If you are fine-tuning from cc_news or another proper pre-trained checkpoint, you can keep training and monitor the improvement of the losses.
