Question about reader training. #69
I found that there used … If I construct the reader training data myself, can I keep this file unchanged, or is there anything I should watch out for? The retriever training data I used is the one released with your DPR work, and in the end we used 58,812 of its queries.
Or maybe I made a mistake when constructing the test pkl file? I used the top-100 retrieved results for the 3,610 test questions and constructed a JSON file in the same format as your … Did I make a mistake?
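For reference, here is a minimal sketch of building such a top-100 retrieval-results JSON. It assumes the record layout `question` / `answers` / `ctxs` with `id`, `title`, `text`, `score` and `has_answer` per passage, which is my understanding of the retriever output format; please diff the field names against the released file before relying on it. The `has_answer` check here is a simplified SQuAD-style normalized string match, not necessarily the tokenized matching the repository uses, and `build_results` / `Hit` are hypothetical names.

```python
import re
import string
from typing import Dict, List, Sequence, Tuple

# Hypothetical tuple type: (passage_id, title, text, score), best-scoring first.
Hit = Tuple[str, str, str, float]

_PUNCT = set(string.punctuation)


def normalize(text: str) -> str:
    """SQuAD-style normalization: lowercase, drop punctuation and articles."""
    text = "".join(ch for ch in text.lower() if ch not in _PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def has_answer(passage: str, answers: Sequence[str]) -> bool:
    """Simplified check: does any normalized answer appear as whole words?"""
    padded = f" {normalize(passage)} "
    return any(f" {normalize(a)} " in padded for a in answers if normalize(a))


def build_results(questions: Sequence[str], answers: Sequence[List[str]],
                  retrieved: Sequence[List[Hit]], top_k: int = 100) -> List[Dict]:
    """Assemble one record per question with its top_k retrieved passages."""
    records = []
    for question, gold_answers, hits in zip(questions, answers, retrieved):
        ctxs = [
            {"id": pid, "title": title, "text": text, "score": score,
             "has_answer": has_answer(text, gold_answers)}
            for pid, title, text, score in hits[:top_k]
        ]
        records.append({"question": question, "answers": list(gold_answers), "ctxs": ctxs})
    return records


if __name__ == "__main__":
    import json

    demo = build_results(
        ["who wrote hamlet"],
        [["William Shakespeare"]],
        [[("doc1", "Hamlet", "Hamlet is a tragedy by William Shakespeare.", 80.5),
          ("doc2", "Macbeth", "Macbeth is a play.", 70.1)]],
    )
    print(json.dumps(demo, indent=2))
```

Computing `has_answer` yourself matters: if it is missing or computed differently from the training data, the reader sees a different positive/negative split at test time than it did in training.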
Our reader training data is taken directly from the retriever results; that is the idea of our paper. However, for reader training on datasets that provide gold passages, we apply some special data-filtering logic.
gold-info is the parameter for datasets that provide ctx+ (gold positive contexts): the data composition logic, which is used only for reader training, applies some special heuristics when those are available.
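As a rough illustration of the kind of gold-aware filtering described above, here is a hypothetical sketch. It is not the repository's code: the title-matching rule, the `max_negatives` cap, and the field names (`has_answer`, `title`) are assumptions, and the real heuristics in the reader preprocessing may differ.

```python
from typing import Dict, List, Optional, Tuple


def select_passages(ctxs: List[Dict], gold: Optional[Dict],
                    max_negatives: int = 50) -> Tuple[List[Dict], List[Dict]]:
    """Hypothetical heuristic: split retrieved ctxs into positives and negatives.

    When gold info exists for the question, keep only the answer-bearing passages
    from the gold passage's article (matched by title), falling back to all
    answer-bearing passages if none of them come from that article.
    """
    positives = [c for c in ctxs if c["has_answer"]]
    negatives = [c for c in ctxs if not c["has_answer"]][:max_negatives]
    if gold is not None:
        gold_title = gold["title"].strip().lower()
        same_article = [c for c in positives if c["title"].strip().lower() == gold_title]
        # Only narrow the positives if at least one comes from the gold article.
        if same_article:
            positives = same_article
    return positives, negatives
```

The idea behind such a filter is that a passage can contain the answer string by coincidence; restricting positives to the gold article reduces those false positives. Questions left with no positives would typically be skipped when building a training set.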
Got it, so I need to construct a gold-info file for my training data in which each query has a positive passage (first passage of …). Thank you so much for your friendly help! P.S. Is it OK to use the same test gold-info file as yours, since the test data seems to contain 3,610 queries as well?
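A minimal sketch of that construction, assuming the "…" above refers to the `positive_ctxs` list of each sample in the released retriever training JSON (where positives carry `title` and `text`). The output layout `{"data": [{"question", "title", "context"}]}` is an assumption about what the gold-info loader reads, so compare it with the provided NQ gold-info file first; `build_gold_info` and `save_gold_info` are hypothetical helper names.

```python
import json
from typing import Dict, List


def build_gold_info(train_samples: List[Dict]) -> Dict:
    """Use the first positive_ctxs entry of each sample as its gold passage."""
    data = []
    for sample in train_samples:
        positives = sample.get("positive_ctxs") or []
        if not positives:
            continue  # no gold passage for this question; leave it out
        first = positives[0]
        data.append({
            "question": sample["question"],
            "title": first["title"],
            "context": first["text"],
        })
    return {"data": data}


def save_gold_info(path: str, train_samples: List[Dict]) -> None:
    """Write the gold-info dictionary as JSON (the output path is up to you)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(build_gold_info(train_samples), f, ensure_ascii=False)
```

The questions in this file must match the questions in the retrieval-results file exactly, since any lookup by question text will silently miss entries that differ in whitespace or casing.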
Hi,
Hi Vladimir, thank you!
Hi,
Thank you! I have another question :) Regarding Line 96 in 27a8436:
I'd like to ask about the effect of include_gold_passage. I noticed that it defaults to False; how does the performance change if I set it to True?
Thank you again!
Hi,
I guess I can close this now.
Hi, I'm here again :)
I built test data from my own retrieved passages on the NQ dataset and used it to test the reader model trained on your released training data, but the results are not very good, even though the retrieval performance is quite good.
I suspect the problem may be that the training data does not match my data, so I'd like to ask how your reader training data is structured. For example, what are the queries, and what is the passage source?
Thank you!