TOP-K results on NQ datasets #136
Comments
Hi,
Thanks, but when I run generate_dense_embeddings.py, I only generate one file of about 1.3 GB with the default settings. So how do I obtain a list of glob-expressions?
Sent from my iPhone
… On Apr 20, 2021, at 1:17 AM, vlad-karpukhin ***@***.***> wrote:
Hi,
encoded_ctx_files parameter is supposed to be a list of glob-expressions.
You need to make it an expression that points to ALL of the embeddings, not just a single file.
So, if your generated files have the pattern /home/v-nuochen/DPR/outputs/2021-04-17/08-29-34/nq-generate-emd_*, please use the following param value (with double quotes):
encoded_ctx_files=["/home/v-nuochen/DPR/outputs/2021-04-17/08-29-34/nq-generate-emd_*"]
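The glob expansion described above can be sketched as follows. This is a hypothetical demo using dummy files in /tmp rather than real embedding shards; the point is only that a single pattern ending in `*` matches every shard file at once:

```shell
# Hypothetical demo: create three dummy shard files following the same
# naming pattern as DPR's embedding shards, then show that one glob
# expression matches all of them.
mkdir -p /tmp/dpr_glob_demo
for i in 0 1 2; do
  touch "/tmp/dpr_glob_demo/nq-generate-emd_$i"
done
# The trailing * is what makes this a glob: it expands to every shard.
ls /tmp/dpr_glob_demo/nq-generate-emd_*
```

The double quotes in the Hydra override keep the shell from expanding the `*` itself, so the pattern reaches the script intact.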
Thank you for answering again! But when I set
python generate_dense_embeddings.py \
model_file=/home/v-nuochen/DPR/outputs/2021-04-15/05-14-43/nq_results/dpr_biencoder.37 \
shard_id=50 num_shards=50 \
out_file=nq-generate-emb
It still generates only one embedding file.
… On Apr 20, 2021, at 3:06 PM, hnt4499 ***@***.***> wrote:
When you use num_shards=m, you should issue m commands, each with shard_id ranging from 0 to m - 1.
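That loop over shard ids can be sketched in shell. The commands are only echoed here rather than executed (running them for real needs the DPR checkout and the checkpoint path from this thread); in practice each of the 50 commands is typically launched on its own GPU or machine:

```shell
# Sketch: emit one generate_dense_embeddings.py command per shard.
# For num_shards=50, shard_id runs 0..49 -- never 50 itself.
for SHARD in $(seq 0 49); do
  echo "python generate_dense_embeddings.py" \
       "model_file=/home/v-nuochen/DPR/outputs/2021-04-15/05-14-43/nq_results/dpr_biencoder.37" \
       "shard_id=$SHARD num_shards=50" \
       "out_file=nq-generate-emb"
done
```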
OK, thanks. On 04/20/2021 16:19, hnt4499 wrote:
Firstly, you should only set shard_id up to 49.
Secondly, you need to issue 50 separate commands, each with a different shard_id ranging from 0 to 49. Each command will then generate an embedding file for the corresponding shard (with the shard id as the suffix).
Thank you!!!
On 04/20/2021 ***@***.***> wrote:
50 shards is also a configurable parameter, but a single generate_dense_embeddings.py run produces only one of them.
You can of course set shard_id=0 num_shards=1 and generate all embeddings in one go, but this is going to take 40-60 hours on a 2-GPU server.
As @hnt4499 correctly noted above, please generate all shards by running multiple generate_dense_embeddings.py commands, one per shard_id.
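For a rough sense of scale, here is a back-of-envelope estimate of the per-shard workload, assuming the standard DPR Wikipedia split of 21,015,324 passages (the script's actual partitioning logic may differ slightly; this is only an approximation):

```shell
# Back-of-envelope: approximate passages encoded by each of 50 shards.
TOTAL=21015324   # standard DPR Wikipedia passage count (assumption)
SHARDS=50
PER_SHARD=$(( (TOTAL + SHARDS - 1) / SHARDS ))   # ceiling division
echo "$PER_SHARD passages per shard"
```

At roughly 420K passages per shard, spreading the 50 shards across machines is what brings the 40-60 hour single-run estimate down to something practical.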
Hi, I ran dense_retriever.py to obtain results on the NQ dataset. But I get the following, which is far below the results in the paper.
Validation results: top k documents hits accuracy [0.13268698060941828, 0.17506925207756233, 0.20498614958448755, 0.22548476454293628, 0.24681440443213296, 0.2614958448753463, 0.2742382271468144, 0.2839335180055402, 0.29418282548476454, 0.30193905817174516, 0.3080332409972299, 0.31329639889196675, 0.31994459833795014, 0.3238227146814404, 0.328808864265928, 0.33518005540166207, 0.3371191135734072, 0.3401662049861496, 0.3443213296398892, 0.34709141274238225, 0.3518005540166205, 0.35650969529085874, 0.35983379501385043, 0.3634349030470914, 0.36925207756232686, 0.3717451523545706, 0.3731301939058172, 0.37590027700831025, 0.378393351800554, 0.3797783933518006, 0.3817174515235457, 0.3847645429362881, 0.38725761772853184, 0.3897506925207756, 0.3914127423822715, 0.3930747922437673, 0.3939058171745152, 0.3958448753462604, 0.3972299168975069, 0.4, 0.4005540166204986, 0.4024930747922438, 0.40470914127423824, 0.4069252077562327, 0.40941828254847645, 0.4113573407202216, 0.41301939058171744, 0.4138504155124654, 0.4149584487534626, 0.41606648199445984, 0.41772853185595565, 0.42049861495844876, 0.4224376731301939, 0.42382271468144045, 0.4249307479224377, 0.42548476454293627, 0.4279778393351801, 0.4293628808864266, 0.4310249307479224, 0.4326869806094183, 0.4337950138504155, 0.43490304709141275, 0.4357340720221607, 0.43656509695290857, 0.4373961218836565, 0.43822714681440444, 0.4404432132963989, 0.4409972299168975, 0.44182825484764543, 0.4437673130193906, 0.4451523545706371, 0.44626038781163435, 0.44681440443213294, 0.4473684210526316, 0.44792243767313017, 0.4490304709141274, 0.450415512465374, 0.4506925207756233, 0.4518005540166205, 0.4520775623268698, 0.45290858725761773, 0.4534626038781163, 0.45373961218836567, 0.45457063711911355, 0.4551246537396122, 0.45595567867036013, 0.45706371191135736, 0.45789473684210524, 0.4581717451523546, 0.4587257617728532, 0.4590027700831025, 0.4592797783933518, 0.4598337950138504, 0.46066481994459835, 0.46094182825484764, 0.46121883656509693, 
0.4614958448753463, 0.46232686980609417, 0.4628808864265928, 0.4634349030470914]
python dense_retriever.py
model_file=[checkpoint]
qa_dataset=nq_test
ctx_datatsets=[dpr_wiki]
encoded_ctx_files=[/home/v-nuochen/DPR/outputs/2021-04-17/08-29-34/nq-generate-emd_0]
out_file=nq_retrieval_07_08
The model_file is downloaded from your released checkpoint, but encoded_ctx_files was generated by myself from generate_dense_embeddings.py with the default settings.
Could you please tell me why?
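The likely cause is visible in the command above: encoded_ctx_files points at a single shard file (nq-generate-emd_0), so retrieval searches only about 1/50 of the corpus, which caps the achievable accuracy well below the paper's numbers. A sketch of the corrected call, once all 50 shards have been generated, would pass the glob expression instead (paths as in this thread; `[checkpoint]` is the placeholder from the original command):

```shell
python dense_retriever.py \
	model_file=[checkpoint] \
	qa_dataset=nq_test \
	ctx_datatsets=[dpr_wiki] \
	encoded_ctx_files=["/home/v-nuochen/DPR/outputs/2021-04-17/08-29-34/nq-generate-emd_*"] \
	out_file=nq_retrieval_07_08
```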