ColBERT --> Next step #20
I believe I have nailed down the problem.
@yogeswarl
TCT-ColBERT is a dense retriever. Unlike BM25, which scores a query against documents with bag-of-words term matching, dense retrieval compares learned vector representations of queries and passages. TCT-ColBERT is an improved version of ColBERT: it trains a bi-encoder on query-passage pairs for efficient single-vector retrieval, although it is orders of magnitude more expensive in time than BM25. I will write a literature review on this paper to explain more about TCT-ColBERT.
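To make the contrast concrete, here is a toy sketch of the two scoring styles. The two-dimensional "embeddings" and all values are made up for illustration; a real dense encoder such as TCT-ColBERT produces high-dimensional learned vectors.

```python
# Toy contrast between sparse (bag-of-words) and dense scoring.
# The 2-d "embeddings" below are invented for illustration only;
# a real encoder (e.g. TCT-ColBERT) produces ~768-d learned vectors.

def sparse_score(query_terms, doc_terms):
    """Bag-of-words overlap: only exact term matches contribute."""
    return len(set(query_terms) & set(doc_terms))

def dense_score(query_vec, doc_vec):
    """Dot product of embeddings: rewards semantic similarity."""
    return sum(q * d for q, d in zip(query_vec, doc_vec))

# "car" and "automobile" share no terms, so sparse matching fails...
query = ["cheap", "car"]
doc = ["affordable", "automobile"]
print(sparse_score(query, doc))  # -> 0: no lexical overlap

# ...but made-up embeddings that encode the shared meaning still agree.
q_vec = [0.9, 0.1]
d_vec = [0.8, 0.2]
print(round(dense_score(q_vec, d_vec), 2))  # -> 0.74
```

This is also why dense retrieval is slower: every query must be encoded and compared against vectors rather than looked up in an inverted index.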
@yogeswarl don't go for the best dense retrievers, since we just want to show their application using our gold standards. So the simplest dense model is enough.
For MS MARCO passage, ColBERT is already available; I am using the top retrieval index. I am going to go for something simpler and check if that helps. Thanks for your help. But for AOL, we will have to train them!
Hello @hosseinfani, I did try every dense model. They can all only do a maximum of 10 iterations per second. I reread the paper, and this is one of its acknowledged drawbacks: in achieving effectiveness, they sacrifice retrieval speed.
I would like to suggest we try it on a sample, probably 10,000 queries or some similarly large number. Please advise.
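A minimal sketch of the sampling idea: run dense retrieval on a fixed-size random subset of queries instead of the full set. The TSV path, the (qid, text) row layout, and the default of 10,000 are assumptions for illustration.

```python
# Sketch: sample at most k queries from a TSV file of (qid, text) rows.
# The file layout and k=10,000 default are assumptions, not project code.
import csv
import random

def sample_queries(path, k=10_000, seed=42):
    """Return at most k (qid, text) rows, sampled reproducibly."""
    with open(path, newline="") as f:
        rows = [tuple(r) for r in csv.reader(f, delimiter="\t")]
    if len(rows) <= k:
        return rows  # fewer queries than k: keep them all
    random.seed(seed)  # fixed seed so the sample is reproducible
    return random.sample(rows, k)
```

Fixing the seed matters here: if the sample changes between runs, MAP scores computed on it are not comparable.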
@yogeswarl
It is the retrieval step. The dense retrieval model is already prebuilt. |
@hosseinfani, as discussed in the lab, here are our next approaches to tackling the dense retrieval problem.
I will update you with my findings by next week.
@hosseinfani Update:
@hosseinfani, ColBERT is almost complete. I will get the results by this evening, after computing MAP scores, and add them to our docs.
@hosseinfani. This is complete. I have added the results to our Google Docs!
Hello @hosseinfani, I have an issue: TCT-ColBERT works when I run it in a single process, which is how it was tested on the original queries. But when I multiprocess with the predicted queries, it runs into a memory error, and I cannot fix it by any means with the debugger. Please let me know if you will be available in the lab this weekend so we can sort out this issue with your guidance.
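One common cause of memory errors like this is that every task (or every process) ends up with its own copy of the large encoder. A sketch of the usual fix: load the model once per worker via a `Pool` initializer and keep the worker count small. `load_encoder` and `search` below are lightweight stand-ins, not the real TCT-ColBERT API.

```python
# Sketch: load a large model once per worker instead of once per task.
# load_encoder/search are hypothetical stand-ins for the real dense
# retrieval calls; only the multiprocessing pattern is the point here.
from multiprocessing import Pool

_encoder = None  # populated once in each worker process

def load_encoder():
    # Stand-in: in practice this would load the (large) dense encoder.
    return {"name": "dummy-encoder"}

def search(encoder, query):
    # Stand-in: in practice this would run dense retrieval for one query.
    return (query, encoder["name"])

def _init_worker():
    # Runs once when each worker starts, so the model is loaded per
    # worker rather than pickled and shipped along with every query.
    global _encoder
    _encoder = load_encoder()

def _run_query(query):
    return search(_encoder, query)

def retrieve_all(queries, workers=2):
    # Few workers plus chunked submission keeps peak memory bounded.
    with Pool(processes=workers, initializer=_init_worker) as pool:
        return pool.map(_run_query, queries, chunksize=8)
```

If memory is still exhausted with two workers, that suggests a single encoder copy is close to the machine's limit, and batching queries in one process may be the safer route.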
Thanks