Iterative retrieval in case of non-unique top-k retrieval #25

dhdhagar · 2022-04-11T04:43:52Z

Hi! Thanks for this amazing work, and for making your code open-source.

I'm trying to figure out where the code handles non-unique passage retrieval so that the final k results are unique. According to a footnote on page 3 of your paper "Phrase Retrieval Learns Passage Retrieval, Too", it seems that you perform iterative retrieval to achieve this. Could you point me to the code where this happens?


jhyuklee · 2022-04-11T05:10:15Z

Hi @dhdhagar! Thanks for your issue.

In this Makefile line (DensePhrases/Makefile, line 490 at commit 3a0365e):

--top_k 200 \

you can see that we retrieve the top 200 phrases first. The aggregation into unique passages happens here:

https://github.com/princeton-nlp/DensePhrases/blob/main/densephrases/index.py#L430

For the datasets we currently provide, this is enough to output 100 passages per query. In a very small number of edge cases, fewer than 100 unique passages are retrieved even with this setting; in that case you can increase --top_k to 400 in the Makefile. Currently there is no automatic procedure for this.
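For readers who want to see the idea in code, here is a minimal sketch of over-retrieving phrases, collapsing them to unique passages, and enlarging top-k when too few passages survive. It is not the DensePhrases implementation: the (passage_id, score) hit format, the `search` callable, the max-score aggregation rule, and the `max_top_k` cap are all assumptions made for illustration, and the doubling loop is a hypothetical automation of the manual step described above.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical phrase hit: (passage_id, score). Not the DensePhrases data format.
PhraseHit = Tuple[str, float]


def unique_passages(phrase_hits: List[PhraseHit], psg_top_k: int) -> List[PhraseHit]:
    """Keep the best-scoring phrase per passage and return up to psg_top_k passages."""
    best: Dict[str, float] = {}
    for passage_id, score in phrase_hits:
        # Assumption: a passage is scored by its highest-scoring phrase.
        if passage_id not in best or score > best[passage_id]:
            best[passage_id] = score
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return ranked[:psg_top_k]


def retrieve_passages(
    search: Callable[[int], List[PhraseHit]],
    psg_top_k: int = 100,
    top_k: int = 200,
    max_top_k: int = 1600,
) -> List[PhraseHit]:
    """Double top_k until psg_top_k unique passages are found or max_top_k is reached."""
    while True:
        passages = unique_passages(search(top_k), psg_top_k)
        if len(passages) >= psg_top_k or top_k >= max_top_k:
            return passages
        top_k *= 2  # e.g. 200 -> 400, mirroring the manual change in the Makefile
```

Here `search(top_k)` stands in for whatever function returns the top-k phrase hits for one query. The cap keeps the loop from running forever on queries whose phrases come from fewer than psg_top_k distinct passages.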

dhdhagar · 2022-04-11T05:20:09Z

That makes it clear, thank you! So, --top_k is used to fetch a larger number of phrases, and the length of the final results depends on --psg_top_k.

dhdhagar closed this as completed Apr 11, 2022
