A question about `passages_index` #10

Closed
chuzhumin98 opened this issue Jul 23, 2021 · 1 comment

chuzhumin98 commented Jul 23, 2021

Hi authors, I am trying to replicate your FiD project and have a question about the data preprocessing.

I found that the `passages_index` files for the Natural Questions and TriviaQA datasets are simply downloaded from `https://dl.fbaipublicfiles.com/FiD/data/[dataset-name].tar.gz`. However, I could not find any details on how these `passages_index` files were generated. Are the passages simply ranked in descending order of their Lucene BM25 scores (excluding passages that do not contain the answer)? Or did you use a different method to generate `passages_index`?

Looking forward to your reply.

@gizacard (Contributor)

Hi,

The passages released in our repository were obtained by distilling the reader into the retriever; the method is described here: https://arxiv.org/pdf/2012.04584.pdf. The retriever can be downloaded from the repo.
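The reply says the released passages come from a retriever trained by distilling the reader, but it does not spell out how a ranked passage list is produced from that retriever. As a minimal sketch, assuming a dense dual-encoder retriever that scores each passage by the inner product of a question embedding and a passage embedding, a per-question top-k list could be built as below. `rank_passages`, `dot`, and the toy embeddings are hypothetical illustrations, not code from the FiD repository.

```python
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product, assumed here to be the retriever's relevance score."""
    return sum(a * b for a, b in zip(u, v))


def rank_passages(question_emb: Sequence[float],
                  passage_embs: List[Sequence[float]],
                  k: int) -> List[int]:
    """Return indices of the top-k passages, highest score first.

    Hypothetical helper: the embeddings are assumed to come from an
    already-trained (e.g. reader-distilled) retriever.
    """
    scores = [dot(question_emb, p) for p in passage_embs]
    # sorted() is stable, so passages with equal scores keep their input order.
    order = sorted(range(len(passage_embs)), key=lambda i: scores[i], reverse=True)
    return order[:k]


if __name__ == "__main__":
    q = [1.0, 0.0]
    passages = [[0.2, 0.9], [0.8, 0.1], [0.5, 0.5]]
    print(rank_passages(q, passages, k=2))  # [1, 2]
```

With the toy vectors above the scores are 0.2, 0.8 and 0.5, so passage 1 ranks first and passage 2 second. A real pipeline would compute embeddings with the trained retriever and would typically use an approximate nearest-neighbour index over the full passage collection instead of scoring every passage in a Python loop.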
