
Unable to Reproduce Passage Retrieval Results on NQ #21

Closed
alexlimh opened this issue Nov 20, 2021 · 9 comments

@alexlimh

Hi Jinhyuk,

I was trying to reproduce the third row of Table 1 in your paper (https://arxiv.org/pdf/2109.08133.pdf). I'm using the index and pre-trained ckpt on NQ you gave me several days ago. Here's my results:

Top-1 = 34.32%
Top-5 = 54.13%
Top-20 = 66.59%
Top-100 = 76.43%
Acc@1 when Acc@100 = 44.91%
MRR@20 = 43.12
P@20 = 14.61
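For reference, the metrics above can be computed roughly as follows. This is a minimal sketch over per-question relevance judgments, not the actual `eval_phrase_retrieval.py` implementation, which may differ in details such as answer normalization:

```python
# Sketch of the reported passage-retrieval metrics.
# Each inner list marks whether the i-th retrieved passage
# contains a gold answer, in rank order.

def topk_accuracy(results, k):
    """Fraction of questions with a correct passage in the top k."""
    return sum(any(r[:k]) for r in results) / len(results)

def mrr_at_k(results, k):
    """Mean reciprocal rank of the first correct passage within top k."""
    total = 0.0
    for r in results:
        for rank, correct in enumerate(r[:k], start=1):
            if correct:
                total += 1.0 / rank
                break
    return total / len(results)

def precision_at_k(results, k):
    """Mean fraction of correct passages among the top k."""
    return sum(sum(r[:k]) / k for r in results) / len(results)

results = [
    [True, False, True],    # correct at rank 1
    [False, False, True],   # correct at rank 3
    [False, False, False],  # no correct passage retrieved
]
print(topk_accuracy(results, 1))  # 1/3
print(mrr_at_k(results, 3))       # (1 + 1/3 + 0) / 3
print(precision_at_k(results, 3))
```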

Here's the command I use:

make eval-index-psg MODEL_NAME=densephrases-nq-query-nq DUMP_DIR=densephrases-nq_wiki-20181220-p100/dump/ TEST_DATA=open-qa/nq-open/test_preprocessed.json

Any idea what I might do wrong? Thanks in advance.

Minghan

@jhyuklee
Member

This is strange. I used exactly the same setting and got the same results as in the paper. What does the EM top-1 look like?

@jhyuklee
Member

It would be nice if you could post the exact command that was run by eval-index-psg. You might have missed the --truecase option if you used TEST_DATA instead of the nq-open-data dependency.

@alexlimh
Author

It would be nice if you could post the exact command that was run by eval-index-psg. You might have missed the --truecase option if you used TEST_DATA instead of the nq-open-data dependency.

Oh, I see. That could very well be the reason. I'll try the original command and see if it works.

Thanks!

@alexlimh
Author

Ah, it's because I didn't use the nq-open-data dependency, as you mentioned.
Now the results look good:

Top-1 = 45.41%
Top-5 = 63.01%
Top-20 = 73.31%
Top-100 = 81.71%
Acc@1 when Acc@100 = 55.57%
MRR@20 = 53.33
P@20 = 14.48

Thanks again!

@jhyuklee
Member

It still looks a bit lower. You should be able to achieve the same performance as in the paper.

@jhyuklee
Member

Posting the entire command will help :)

@alexlimh
Author

alexlimh commented Nov 21, 2021

Right, here's the command that I used:

make eval-index-psg MODEL_NAME=densephrases-nq-query-nq DUMP_DIR=densephrases-nq_wiki-20181220-p100/dump/
# agg_strat=opt2 means passage retrieval
eval-index-psg: dump-dir model-name large-index nq-open-data
	python eval_phrase_retrieval.py \
		--run_mode eval \
		--model_type bert \
		--pretrained_name_or_path SpanBERT/spanbert-base-cased \
		--cuda \
		--dump_dir $(DUMP_DIR) \
		--index_name start/$(NUM_CLUSTERS)_flat_$(INDEX_TYPE) \
		--load_dir $(SAVE_DIR)/$(MODEL_NAME) \
		--test_path $(DATA_DIR)/$(TEST_DATA) \
		--save_pred \
		--aggregate \
		--agg_strat opt2 \
		--top_k 200 \
		--eval_psg \
		--psg_top_k 100 \
		$(OPTIONS)
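
Since the target ends with $(OPTIONS), missing flags could presumably also be supplied at invocation time without editing the Makefile. A hypothetical invocation, assuming --truecase is the flag the nq-open-data dependency would otherwise set:

```sh
# Hypothetical: pass the extra flag through $(OPTIONS);
# exact behavior depends on the DensePhrases Makefile.
make eval-index-psg \
    MODEL_NAME=densephrases-nq-query-nq \
    DUMP_DIR=densephrases-nq_wiki-20181220-p100/dump/ \
    OPTIONS='--truecase'
```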

@alexlimh reopened this Nov 21, 2021
@jhyuklee
Member

I don't think there's a problem here. Did you change the code, or is this for NaturalQuestions?

@alexlimh
Author

You're right, it seems I changed some other code in the Makefile, which caused the difference.
I re-downloaded the repo and got the correct results:

Top-1 = 50.06%
Top-5 = 69.53%
Top-20 = 79.78%
Top-100 = 85.04%
Acc@1 when Acc@100 = 58.86%
MRR@20 = 58.69
P@20 = 20.55

Thanks, Jinhyuk!
