A question about "passages_index" #10

chuzhumin98 · 2021-07-23T09:47:51Z

Hi authors, I'm trying to replicate your FiD project and have a question about the data preprocessing.

I found that the "passages_index" files for the Natural Questions and TriviaQA datasets are simply downloaded from "https://dl.fbaipublicfiles.com/FiD/data/[dataset-name].tar.gz". However, I could not find any details on how these passages_index files were generated. Are the passages just ranked in descending order of their Lucene BM25 scores (excluding passages that do not contain an answer)? Or did you use another method to generate them?
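To make the question concrete, here is a minimal sketch of the ranking I have in mind. The passage dictionaries, the `score` field, and the `rank_passages` helper are all hypothetical and are not taken from the FiD code or data files; answer matching is assumed to be a simple case-insensitive substring check.

```python
from typing import Dict, List


def rank_passages(passages: List[Dict], answers: List[str]) -> List[Dict]:
    """Hypothetical ranking: keep answer-bearing passages, sort by score descending."""
    lowered = [a.lower() for a in answers]
    # Keep only passages whose text contains at least one answer string.
    kept = [p for p in passages if any(a in p["text"].lower() for a in lowered)]
    # Highest retrieval score (e.g. BM25) first.
    return sorted(kept, key=lambda p: p["score"], reverse=True)


# Toy data, invented purely for illustration.
passages = [
    {"id": "a", "text": "Paris is the capital of France.", "score": 11.2},
    {"id": "b", "text": "France borders Spain.", "score": 14.5},
    {"id": "c", "text": "The Eiffel Tower is in Paris.", "score": 12.8},
]
ranked = rank_passages(passages, ["Paris"])
print([p["id"] for p in ranked])  # passage "b" is dropped: it has no answer string
```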

Looking forward to your reply.

gizacard · 2021-08-24T16:20:31Z

Hi,

The passages released in our repository were obtained by distilling the reader into the retriever. The method is described here: https://arxiv.org/pdf/2012.04584.pdf. The retriever can be downloaded from the repo.

gizacard closed this as completed Aug 24, 2021
