Hi, thanks for releasing this library.

I am planning to use ColBERT for a ranking task on a Wikipedia corpus (as part of the TREC Fair Ranking track: https://fair-trec.github.io/). Briefly, given a keyword query consisting of terms related to Wikipedia articles, the task is to generate a ranked list of Wikipedia documents. I have a couple of questions about using the model on this task:

1. Wikipedia documents are typically very long. If I truncate each document (say, keep the first 500 words) before feeding it to the model, will that hurt performance?
2. I want to use the query and document embeddings from ColBERT as features in another model. Is there a way to get the query and document embeddings after training?

Thanks.
I strongly recommend using a passage-level Wikipedia corpus. It's common in the Open-QA literature (e.g., our ColBERT-QA paper) to divide Wikipedia into 100-word or (say) 200-token passages, keeping the title of the page at the start of each passage.
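A minimal sketch of that preprocessing step, assuming a plain whitespace word split and a hypothetical "title | text" passage format (the separator and window size are choices, not anything the library mandates):

```python
def to_passages(title, text, words_per_passage=100):
    """Split one Wikipedia page into fixed-size word windows,
    prepending the page title to each passage. The 'title | text'
    layout here is illustrative; adjust it to your pipeline."""
    words = text.split()
    passages = []
    for start in range(0, len(words), words_per_passage):
        chunk = " ".join(words[start:start + words_per_passage])
        passages.append(f"{title} | {chunk}")
    return passages
```

Indexing these passages instead of whole pages avoids the truncation question entirely: every part of a long article remains retrievable, and you can map retrieved passages back to their source page for the document-level ranking.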
For your second question: encoding the corpus (or the queries) with colbert.index will produce files containing all the embeddings. Alternatively, you can use the ModelInference class from colbert/modeling/inference.py, in particular its queryFromText and docFromText methods. See the existing uses in the codebase for how to do this; it's pretty simple!
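The ModelInference route might look like the sketch below. The import path and method names come from colbert/modeling/inference.py, but the exact constructor arguments and return shapes vary between versions, so verify against your checkout before relying on it:

```python
def extract_embeddings(colbert_model, queries, docs):
    """Sketch: pull token-level query/doc embeddings out of a trained
    ColBERT model for use as features in another model. Assumes the
    ModelInference API from colbert/modeling/inference.py; extra
    constructor options (e.g. mixed precision) may exist in your version."""
    from colbert.modeling.inference import ModelInference

    inference = ModelInference(colbert_model)
    Q = inference.queryFromText(queries)  # per-token embeddings for each query
    D = inference.docFromText(docs)       # per-token embeddings for each passage
    return Q, D
```

Note that ColBERT produces a matrix of token-level embeddings per text, not a single vector, so if your downstream model expects one vector per query or document you will need to pool (e.g. mean or max over tokens) yourself.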