Distilling Knowledge from Reader to Retriever

Implementation of the retriever distillation procedure, as outlined in the paper Distilling Knowledge from Reader to Retriever, in PyTorch. The authors propose training the retriever using the reader's cross-attention scores as pseudo-labels. SOTA on open-domain question answering.
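
To make the idea concrete, below is a minimal sketch of one such distillation objective, assuming the reader's cross-attention scores have already been aggregated into a single score per retrieved passage. The paper considers several objectives, of which KL divergence between the two distributions is one; the function name and tensor shapes here are illustrative and not this repository's API.

```python
import torch
import torch.nn.functional as F

def retriever_distill_loss(retriever_scores, cross_attn_scores):
    # retriever_scores:  (batch, num_passages) question-passage similarity scores from the retriever
    # cross_attn_scores: (batch, num_passages) reader cross-attention scores, aggregated over
    #                    layers, heads and tokens, used as pseudo-labels (treated as constants)
    log_probs = F.log_softmax(retriever_scores, dim = -1)
    targets   = F.softmax(cross_attn_scores.detach(), dim = -1)
    return F.kl_div(log_probs, targets, reduction = 'batchmean')

# toy usage
retriever_scores  = torch.randn(2, 100, requires_grad = True)  # e.g. dot products of question / passage embeddings
cross_attn_scores = torch.rand(2, 100)                         # aggregated attention scores from the reader
loss = retriever_distill_loss(retriever_scores, cross_attn_scores)
loss.backward()
```

Only the retriever receives gradients here; the reader's attention scores act purely as targets, which is the essence of the reader-to-retriever distillation.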

Update: The BM25 gains actually do not look as impressive as the BERT gains. Also, it seems like distilling with BERT as the starting point never gets to the same level as BM25.

I am considering whether it makes more sense to modify Marge (https://github.com/lucidrains/marge-pytorch) so that, during training, one minimizes a loss between the output of an extra prediction head on top of the retriever and the cross-attention scores, as sketched below.
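
A hedged sketch of what that extra prediction head could look like, using a plain MSE auxiliary loss against the reader's cross-attention scores. All names and shapes are hypothetical and not part of marge-pytorch.

```python
import torch
import torch.nn.functional as F
from torch import nn

class AttnScorePredictionHead(nn.Module):
    # hypothetical extra head mapping each retrieved document embedding
    # produced by the retriever to a scalar, to be matched against the
    # reader's cross-attention score for that document
    def __init__(self, dim):
        super().__init__()
        self.to_score = nn.Linear(dim, 1)

    def forward(self, doc_embeds):                    # (batch, num_docs, dim)
        return self.to_score(doc_embeds).squeeze(-1)  # (batch, num_docs)

# toy usage, with random embeddings standing in for the retriever's output
doc_embeds        = torch.randn(2, 8, 512, requires_grad = True)
cross_attn_scores = torch.rand(2, 8)  # pseudo-labels from the reader

head = AttnScorePredictionHead(512)
aux_loss = F.mse_loss(head(doc_embeds), cross_attn_scores.detach())
aux_loss.backward()
```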

Citations

@misc{izacard2020distilling,
    title={Distilling Knowledge from Reader to Retriever for Question Answering}, 
    author={Gautier Izacard and Edouard Grave},
    year={2020},
    eprint={2012.04584},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
