SIGIR 2023: A unified generative retriever for knowledge-intensive language tasks via prompt learning
UGR

This is the source code for the paper "A Unified Generative Retriever for Knowledge-Intensive Language Tasks via Prompt Learning" (SIGIR 2023).

Introduction

Overview

Knowledge-intensive language tasks (KILTs) benefit from retrieving high-quality relevant contexts from large external knowledge corpora. Learning task-specific retrievers that return relevant contexts at an appropriate level of semantic granularity (e.g., a document, passage, sentence, or entity retriever) can improve end-to-end task performance. However, a task-specific retriever usually generalizes poorly to new domains and tasks, and deploying a variety of specialized retrievers in practice can be costly.

We propose a unified generative retriever (UGR) that combines task-specific effectiveness with robust performance across different retrieval tasks in KILTs. To achieve this goal, we make two major contributions:

  • To unify different retrieval tasks into a single generative form, we introduce an n-gram-based identifier for relevant contexts at different levels of granularity in KILTs.
  • To address different retrieval tasks with a single model, we employ a prompt learning strategy and investigate three methods to design prompt tokens for each task.
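To make the first point concrete, here is a minimal illustrative sketch (not the repository's actual code) of how n-gram identifiers can serve as generation targets: the set of valid identifier token sequences is stored in a prefix trie, and the decoder is constrained so that, at each step, it may only emit a token that extends some identifier occurring in the corpus. All names and token ids below are hypothetical.

```python
# Illustrative sketch: a prefix trie over tokenized n-gram identifiers,
# used to constrain a generative retriever's decoder so it can only
# generate identifiers that actually occur in the corpus.

class IdentifierTrie:
    """Maps each decoded prefix to the set of token ids allowed next."""

    def __init__(self, sequences):
        self.root = {}
        for seq in sequences:
            node = self.root
            for tok in seq:
                node = node.setdefault(tok, {})

    def allowed_tokens(self, prefix):
        """Return token ids that may follow `prefix`; empty if invalid."""
        node = self.root
        for tok in prefix:
            if tok not in node:
                return []
            node = node[tok]
        return list(node.keys())


# Hypothetical token-id sequences for three corpus n-gram identifiers.
trie = IdentifierTrie([[5, 8, 2], [5, 8, 9], [7, 1]])
print(trie.allowed_tokens([5, 8]))  # tokens that may follow the prefix [5, 8]
```

In practice this constraint is typically plugged into beam search (e.g., as a prefix-allowed-tokens callback), so that one decoder can target identifiers at any level of granularity simply by changing which identifier set populates the trie.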

In this way, the proposed UGR model can not only share common knowledge across tasks for better generalization but also perform different retrieval tasks effectively by distinguishing task-specific characteristics. We train UGR on a heterogeneous set of retrieval corpora with well-designed prompts in a supervised and multi-task fashion. Experimental results on the KILT benchmark demonstrate the effectiveness of UGR on in-domain datasets, out-of-domain datasets, and unseen tasks.
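As a rough illustration of the prompt-learning idea (the paper investigates three prompt-design methods; the sketch below shows only the simplest discrete-prompt variant, with hypothetical prompt strings and function names that are not from the repository):

```python
# Illustrative sketch: one model serves several retrieval tasks by
# prepending a task-specific prompt to the query before encoding.

TASK_PROMPTS = {
    "document": "Retrieve a relevant document:",
    "passage": "Retrieve a relevant passage:",
    "sentence": "Retrieve a relevant sentence:",
    "entity": "Retrieve a relevant entity:",
}

def build_input(task, query):
    """Prepend the task's prompt tokens to the query text."""
    return f"{TASK_PROMPTS[task]} {query}"

print(build_input("passage", "Who wrote Hamlet?"))
```

During multi-task training, each training example is tagged with its task's prompt, which is what lets a single shared model both transfer common knowledge across tasks and keep their task-specific behaviors apart.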

Acknowledgement

License

This project is licensed under the Apache License 2.0.

Citation

If you find our work useful, please consider citing our paper:

@inproceedings{chen2023a,
	author = {Chen, Jiangui and Zhang, Ruqing and Guo, Jiafeng and de Rijke, Maarten and Liu, Yiqun and Fan, Yixing and Cheng, Xueqi},
	title = {A {Unified} {Generative} {Retriever} for {Knowledge}-{Intensive} {Language} {Tasks} via {Prompt} {Learning}},
	booktitle = {Proceedings of the 46th {International} {ACM} {SIGIR} {Conference} on {Research} and {Development} in {Information} {Retrieval}},
	year = {2023},
	pages = {1448--1457},
	organization = {ACM},
}
