quqxui/Awesome-LLM4IE-Papers

Awesome-LLM4IE-Papers

Awesome papers about generative information extraction using LLMs.

The organization of papers is discussed in our survey: Large Language Models for Generative Information Extraction: A Survey.

If you have any suggestions or come across any mistakes or missing information, please let us know via email at derongxu@mail.ustc.edu.cn and chenweicw@mail.ustc.edu.cn. We appreciate your feedback and help in improving our work.

If you find our survey useful for your research, please cite the following paper:

@misc{xu2023large,
    title={Large Language Models for Generative Information Extraction: A Survey}, 
    author={Derong Xu and Wei Chen and Wenjun Peng and Chao Zhang and Tong Xu and Xiangyu Zhao and Xian Wu and Yefeng Zheng and Enhong Chen},
    year={2023},
    eprint={2312.17617},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

💡 News

  • Update Logs
    • The details can be found in ./update_new_papers_list.
    • 2024/03/30 Add 27 papers
    • 2024/03/29 Add 20 papers

Information Extraction tasks

A taxonomy by various tasks.

Named Entity Recognition

Models targeting only NER tasks.

Entity Typing

Paper Venue Date Code
Calibrated Seq2seq Models for Efficient and Generalizable Ultra-fine Entity Typing EMNLP Findings 2023-12 GitHub
Generative Entity Typing with Curriculum Learning EMNLP 2022-12 GitHub

Entity Identification & Typing

Paper Venue Date Code
LinkNER: Linking Local Named Entity Recognition Models to Large Language Models using Uncertainty WWW 2024
Self-Improving for Zero-Shot Named Entity Recognition with Large Language Models NAACL 2024 GitHub
ConsistNER: Towards Instructive NER Demonstrations for LLMs with the Consistency of Ontology and Context AAAI 2024-03
Embedded Named Entity Recognition using Probing Classifiers Arxiv 2024-03 GitHub
ProgGen: Generating Named Entity Recognition Datasets Step-by-step with Self-Reflexive Large Language Models Arxiv 2024-03 GitHub
Rethinking Negative Instances for Generative Named Entity Recognition Arxiv 2024-02 GitHub
NuNER: Entity Recognition Encoder Pre-training via LLM-Annotated Data Arxiv 2024-02
VerifiNER: Verification-augmented NER via Knowledge-grounded Reasoning with Large Language Models Arxiv 2024-02
A Simple but Effective Approach to Improve Structured Language Model Output for Information Extraction Arxiv 2024-02
PaDeLLM-NER: Parallel Decoding in Large Language Models for Named Entity Recognition Arxiv 2024-02
Small Language Model Is a Good Guide for Large Language Model in Chinese Entity Relation Extraction Arxiv 2024-02
C-ICL: Contrastive In-context Learning for Information Extraction Arxiv 2024-02
UniversalNER: Targeted Distillation from Large Language Models for Open Named Entity Recognition ICLR 2024-01 GitHub
2INER: Instructive and In-Context Learning on Few-Shot Named Entity Recognition EMNLP Findings 2023-12
In-context Learning for Few-shot Multimodal Named Entity Recognition EMNLP Findings 2023-12
Large Language Model Is Not a Good Few-shot Information Extractor, but a Good Reranker for Hard Samples! EMNLP Findings 2023-12 GitHub
Learning to Rank Context for Named Entity Recognition Using a Synthetic Dataset EMNLP 2023-12 GitHub
LLMaAA: Making Large Language Models as Active Annotators EMNLP Findings 2023-12 GitHub
Prompting ChatGPT in MNER: Enhanced Multimodal Named Entity Recognition with Auxiliary Refined Knowledge EMNLP Findings 2023-12 GitHub
GPT Struct Me: Probing GPT Models on Narrative Entity Extraction WI-IAT 2023-10 GitHub
GPT-NER: Named Entity Recognition via Large Language Models Arxiv 2023-10 GitHub
Prompt-NER: Zero-shot Named Entity Recognition in Astronomy Literature via Large Language Models Arxiv 2023-10
Inspire the Large Language Model by External Knowledge on BioMedical Named Entity Recognition Arxiv 2023-09
One Model for All Domains: Collaborative Domain-Prefix Tuning for Cross-Domain NER IJCAI 2023-09 GitHub
Chain-of-Thought Prompt Distillation for Multimodal Named Entity Recognition and Multimodal Relation Extraction Arxiv 2023-08
Learning In-context Learning for Named Entity Recognition ACL 2023-07 GitHub
Debiasing Generative Named Entity Recognition by Calibrating Sequence Likelihood ACL Short 2023-07
Entity-to-Text based Data Augmentation for various Named Entity Recognition Tasks ACL Findings 2023-07
Large Language Models as Instructors: A Study on Multilingual Clinical Entity Extraction BioNLP 2023-07 GitHub
NAG-NER: a Unified Non-Autoregressive Generation Framework for Various NER Tasks ACL Industry 2023-07
Unified Named Entity Recognition as Multi-Label Sequence Generation IJCNN 2023-06
PromptNER: Prompting For Named Entity Recognition Arxiv 2023-06
Does Synthetic Data Generation of LLMs Help Clinical Text Mining? Arxiv 2023-04
Structured information extraction from complex scientific text with fine-tuned large language models Arxiv 2022-12 Demo
LightNER: A Lightweight Tuning Paradigm for Low-resource NER via Pluggable Prompting COLING 2022-10 GitHub
De-bias for generative extraction in unified NER task ACL 2022-05
InstructionNER: A Multi-Task Instruction-Based Generative Framework for Few-shot NER Arxiv 2022-03
Document-level Entity-based Extraction as Template Generation EMNLP 2021-11 GitHub
A Unified Generative Framework for Various NER Subtasks ACL 2021-08 GitHub
Template-Based Named Entity Recognition Using BART ACL Findings 2021-08 GitHub

Relation Extraction

Models targeting only RE tasks.

Relation Classification

Paper Venue Date Code
STAR: Boosting Low-Resource Information Extraction by Structure-to-Text Data Generation with Large Language Models AAAI 2024-03
Grasping the Essentials: Tailoring Large Language Models for Zero-Shot Relation Extraction Arxiv 2024-02
Chain of Thought with Explicit Evidence Reasoning for Few-shot Relation Extraction EMNLP Findings 2023-12
GPT-RE: In-context Learning for Relation Extraction using Large Language Models EMNLP 2023-12 GitHub
Guideline Learning for In-context Information Extraction EMNLP 2023-12
Large Language Model Is Not a Good Few-shot Information Extractor, but a Good Reranker for Hard Samples! EMNLP Findings 2023-12 GitHub
LLMaAA: Making Large Language Models as Active Annotators EMNLP Findings 2023-12 GitHub
Improving Unsupervised Relation Extraction by Augmenting Diverse Sentence Pairs EMNLP 2023-12 GitHub
Revisiting Large Language Models as Zero-shot Relation Extractors EMNLP Findings 2023-12
Mastering the Task of Open Information Extraction with Large Language Models and Consistent Reasoning Environment Arxiv 2023-10
Aligning Instruction Tasks Unlocks Large Language Models as Zero-Shot Relation Extractors ACL Findings 2023-07 GitHub
How to Unleash the Power of Large Language Models for Few-shot Relation Extraction? ACL Workshop 2023-07 GitHub
Sequence generation with label augmentation for relation extraction AAAI 2023-06 GitHub
Does Synthetic Data Generation of LLMs Help Clinical Text Mining? Arxiv 2023-04
DORE: Document Ordered Relation Extraction based on Generative Framework EMNLP Findings 2022-12
REBEL: Relation Extraction By End-to-end Language generation EMNLP Findings 2021-11 GitHub

Relation Triplet

Paper Venue Date Code
Consistency Guided Knowledge Retrieval and Denoising in LLMs for Zero-shot Document-level Relation Triplet Extraction WWW 2024
Unlocking Instructive In-Context Learning with Tabular Prompting for Relational Triple Extraction COLING 2024
ERA-CoT: Improving Chain-of-Thought through Entity Relationship Analysis Arxiv 2024-03 GitHub
AutoRE: Document-Level Relation Extraction with Large Language Models Arxiv 2024-03 GitHub
A Simple but Effective Approach to Improve Structured Language Model Output for Information Extraction Arxiv 2024-02
Document-Level In-Context Few-Shot Relation Extraction via Pre-Trained Language Models Arxiv 2024-02 GitHub
Small Language Model Is a Good Guide for Large Language Model in Chinese Entity Relation Extraction Arxiv 2024-02
Efficient Data Learning for Open Information Extraction with Pre-trained Language Models EMNLP Findings 2023-12
Mastering the Task of Open Information Extraction with Large Language Models and Consistent Reasoning Environment Arxiv 2023-10
Document-level Entity-based Extraction as Template Generation EMNLP 2021-11 GitHub

Relation Strict

Paper Venue Date Code
An Autoregressive Text-to-Graph Framework for Joint Entity and Relation Extraction AAAI 2024-03 GitHub
C-ICL: Contrastive In-context Learning for Information Extraction Arxiv 2024-02
REBEL: Relation Extraction By End-to-end Language generation EMNLP Findings 2021-11 GitHub

Event Extraction

Models targeting only EE tasks.

Event Detection

Paper Venue Date Code
Mastering the Task of Open Information Extraction with Large Language Models and Consistent Reasoning Environment Arxiv 2023-10
Unleash GPT-2 Power for Event Detection ACL 2021-08

Event Argument Extraction

Paper Venue Date Code
Context-Aware Prompt for Generation-based Event Argument Extraction with Diffusion Models CIKM 2023-10
Contextualized Soft Prompts for Extraction of Event Arguments ACL Findings 2023-07
AMPERE: AMR-Aware Prefix for Generation-Based Event Argument Extraction Model ACL 2023-07 GitHub
Code4Struct: Code Generation for Few-Shot Event Structure Prediction ACL 2023-07 GitHub
Event Extraction as Question Generation and Answering ACL Short 2023-07 GitHub
Global Constraints with Prompting for Zero-Shot Event Argument Classification EACL Findings 2023-05
Prompt for extraction? PAIE: prompting argument interaction for event argument extraction ACL 2022-05 GitHub

Event Detection & Argument Extraction

Paper Venue Date Code
EventRL: Enhancing Event Extraction with Outcome Supervision for Large Language Models Arxiv 2024-02
Guideline Learning for In-context Information Extraction EMNLP 2023-12
DemoSG: Demonstration-enhanced Schema-guided Generation for Low-resource Event Extraction EMNLP Findings 2023-12 GitHub
Large Language Model Is Not a Good Few-shot Information Extractor, but a Good Reranker for Hard Samples! EMNLP Findings 2023-12 GitHub
DICE: Data-Efficient Clinical Event Extraction with Generative Models ACL 2023-07 GitHub
A Monte Carlo Language Model Pipeline for Zero-Shot Sociopolitical Event Extraction NeurIPS Workshop 2023-10
STAR: Boosting Low-Resource Information Extraction by Structure-to-Text Data Generation with Large Language Models AAAI 2024-03
DEGREE: A Data-Efficient Generative Event Extraction Model NAACL 2022-07 GitHub
ClarET: Pre-training a correlation-aware context-to-event transformer for event-centric generation and classification ACL 2022-05 GitHub
Dynamic prefix-tuning for generative template-based event extraction ACL 2022-05
Text2event: Controllable sequence-to-structure generation for end-to-end event extraction ACL 2021-08 GitHub
Document-level event argument extraction by conditional generation NAACL 2021-06 GitHub

Universal Information Extraction

Unified models targeting multiple IE tasks.

NL-LLMs based

Paper Venue Date Code
ChatUIE: Exploring Chat-based Unified Information Extraction using Large Language Models COLING 2024
YAYI-UIE: A Chat-Enhanced Instruction Tuning Framework for Universal Information Extraction Arxiv 2024-01
Set Learning for Generative Information Extraction EMNLP 2023-12
GIELLM: Japanese General Information Extraction Large Language Model Utilizing Mutual Reinforcement Effect Arxiv 2023-11
InstructUIE: Multi-task Instruction Tuning for Unified Information Extraction Arxiv 2023-04 GitHub
Zero-Shot Information Extraction via Chatting with ChatGPT Arxiv 2023-02 GitHub
GenIE: Generative Information Extraction NAACL 2022-07 GitHub
DEEPSTRUCT: Pretraining of Language Models for Structure Prediction ACL Findings 2022-05 GitHub
Unified Structure Generation for Universal Information Extraction ACL 2022-05 GitHub
Structured prediction as translation between augmented natural languages ICLR 2021-01 GitHub

Code-LLMs based

Paper Venue Date Code
KnowCoder: Coding Structured Knowledge into LLMs for Universal Information Extraction Arxiv 2024-03 GitHub
GoLLIE: Annotation Guidelines improve Zero-Shot Information-Extraction ICLR 2024-01 GitHub
Retrieval-Augmented Code Generation for Universal Information Extraction Arxiv 2023-11
CODEIE: Large Code Generation Models are Better Few-Shot Information Extractors ACL 2023-07 GitHub
CodeKGC: Code Language Model for Generative Knowledge Graph Construction ACM TALLIP 2024-03 GitHub

Learning Paradigms

A taxonomy by Learning Paradigms.

Supervised Fine-tuning

Paper Venue Date Code
An Autoregressive Text-to-Graph Framework for Joint Entity and Relation Extraction AAAI 2024-03 GitHub
AutoRE: Document-Level Relation Extraction with Large Language Models Arxiv 2024-03 GitHub
Embedded Named Entity Recognition using Probing Classifiers Arxiv 2024-03 GitHub
EventRL: Enhancing Event Extraction with Outcome Supervision for Large Language Models Arxiv 2024-02
PaDeLLM-NER: Parallel Decoding in Large Language Models for Named Entity Recognition Arxiv 2024-02
Rethinking Negative Instances for Generative Named Entity Recognition Arxiv 2024-02 GitHub
UniversalNER: Targeted Distillation from Large Language Models for Open Named Entity Recognition ICLR 2024-01 GitHub
GoLLIE: Annotation Guidelines improve Zero-Shot Information-Extraction ICLR 2024-01 GitHub
Set Learning for Generative Information Extraction EMNLP 2023-12
Efficient Data Learning for Open Information Extraction with Pre-trained Language Models EMNLP Findings 2023-12
DemoSG: Demonstration-enhanced Schema-guided Generation for Low-resource Event Extraction EMNLP Findings 2023-12 GitHub
Calibrated Seq2seq Models for Efficient and Generalizable Ultra-fine Entity Typing EMNLP Findings 2023-12
GIELLM: Japanese General Information Extraction Large Language Model Utilizing Mutual Reinforcement Effect Arxiv 2023-11
Context-Aware Prompt for Generation-based Event Argument Extraction with Diffusion Models CIKM 2023-10
Contextualized Soft Prompts for Extraction of Event Arguments ACL Findings 2023-07
AMPERE: AMR-Aware Prefix for Generation-Based Event Argument Extraction Model ACL 2023-07 GitHub
Debiasing Generative Named Entity Recognition by Calibrating Sequence Likelihood ACL Short 2023-07
DICE: Data-Efficient Clinical Event Extraction with Generative Models ACL 2023-07 GitHub
Event Extraction as Question Generation and Answering ACL Short 2023-07 GitHub
NAG-NER: a Unified Non-Autoregressive Generation Framework for Various NER Tasks ACL Industry 2023-07
Sequence generation with label augmentation for relation extraction AAAI 2023-06 GitHub
Unified Named Entity Recognition as Multi-Label Sequence Generation IJCNN 2023-06
InstructUIE: Multi-task Instruction Tuning for Unified Information Extraction Arxiv 2023-04 GitHub
Structured information extraction from complex scientific text with fine-tuned large language models Arxiv 2022-12 Demo
Generative Entity Typing with Curriculum Learning EMNLP 2022-12 GitHub
DORE: Document Ordered Relation Extraction based on Generative Framework EMNLP Findings 2022-12
LasUIE: Unifying Information Extraction with Latent Adaptive Structure-aware Generative Language Model NeurIPS 2022-10 GitHub
LightNER: A Lightweight Tuning Paradigm for Low-resource NER via Pluggable Prompting COLING 2022-10 GitHub
GenIE: Generative Information Extraction NAACL 2022-07 GitHub
DEGREE: A Data-Efficient Generative Event Extraction Model NAACL 2022-07 GitHub
ClarET: Pre-training a correlation-aware context-to-event transformer for event-centric generation and classification ACL 2022-05 GitHub
DEEPSTRUCT: Pretraining of Language Models for Structure Prediction ACL Findings 2022-05 GitHub
Dynamic prefix-tuning for generative template-based event extraction ACL 2022-05
Prompt for extraction? PAIE: prompting argument interaction for event argument extraction ACL 2022-05 GitHub
Unified Structure Generation for Universal Information Extraction ACL 2022-05 GitHub
De-bias for generative extraction in unified NER task ACL 2022-05
Document-level Entity-based Extraction as Template Generation EMNLP 2021-11 GitHub
REBEL: Relation Extraction By End-to-end Language generation EMNLP Findings 2021-11 GitHub
A Unified Generative Framework for Various NER Subtasks ACL 2021-08 GitHub
Template-Based Named Entity Recognition Using BART ACL Findings 2021-08 GitHub
Text2event: Controllable sequence-to-structure generation for end-to-end event extraction ACL 2021-08 GitHub
Document-level event argument extraction by conditional generation NAACL 2021-06 GitHub
Structured prediction as translation between augmented natural languages ICLR 2021-01 GitHub

Few-shot

Few-shot Fine-tuning

Paper Venue Date Code
DemoSG: Demonstration-enhanced Schema-guided Generation for Low-resource Event Extraction EMNLP Findings 2023-12 GitHub
One Model for All Domains: Collaborative Domain-Prefix Tuning for Cross-Domain NER IJCAI 2023-09 GitHub
LightNER: A Lightweight Tuning Paradigm for Low-resource NER via Pluggable Prompting COLING 2022-10 GitHub
Unified Structure Generation for Universal Information Extraction ACL 2022-05 GitHub
InstructionNER: A Multi-Task Instruction-Based Generative Framework for Few-shot NER Arxiv 2022-03
Template-Based Named Entity Recognition Using BART ACL Findings 2021-08 GitHub
Structured prediction as translation between augmented natural languages ICLR 2021-01 GitHub

In-Context Learning

Paper Venue Date Code
Self-Improving for Zero-Shot Named Entity Recognition with Large Language Models NAACL 2024 GitHub
ConsistNER: Towards Instructive NER Demonstrations for LLMs with the Consistency of Ontology and Context AAAI 2024-03
CodeKGC: Code Language Model for Generative Knowledge Graph Construction ACM TALLIP 2024-03 GitHub
Unlocking Instructive In-Context Learning with Tabular Prompting for Relational Triple Extraction COLING 2024
Document-Level In-Context Few-Shot Relation Extraction via Pre-Trained Language Models Arxiv 2024-02 GitHub
LinkNER: Linking Local Named Entity Recognition Models to Large Language Models using Uncertainty WWW 2024
Small Language Model Is a Good Guide for Large Language Model in Chinese Entity Relation Extraction Arxiv 2024-02
C-ICL: Contrastive In-context Learning for Information Extraction Arxiv 2024-02
Chain of Thought with Explicit Evidence Reasoning for Few-shot Relation Extraction EMNLP Findings 2023-12
GPT-RE: In-context Learning for Relation Extraction using Large Language Models EMNLP 2023-12 GitHub
Guideline Learning for In-context Information Extraction EMNLP 2023-12
Large Language Model Is Not a Good Few-shot Information Extractor, but a Good Reranker for Hard Samples! EMNLP Findings 2023-12 GitHub
Retrieval-Augmented Code Generation for Universal Information Extraction Arxiv 2023-11
Mastering the Task of Open Information Extraction with Large Language Models and Consistent Reasoning Environment Arxiv 2023-10
GPT-NER: Named Entity Recognition via Large Language Models Arxiv 2023-10 GitHub
GPT Struct Me: Probing GPT Models on Narrative Entity Extraction WI-IAT 2023-10 GitHub
Learning In-context Learning for Named Entity Recognition ACL 2023-07 GitHub
Aligning Instruction Tasks Unlocks Large Language Models as Zero-Shot Relation Extractors ACL Findings 2023-07 GitHub
Code4Struct: Code Generation for Few-Shot Event Structure Prediction ACL 2023-07 GitHub
CODEIE: Large Code Generation Models are Better Few-Shot Information Extractors ACL 2023-07 GitHub
How to Unleash the Power of Large Language Models for Few-shot Relation Extraction? ACL Workshop 2023-07 GitHub
PromptNER: Prompting For Named Entity Recognition Arxiv 2023-06 GitHub

Zero-shot

Zero-shot Prompting

Paper Venue Date Code
Self-Improving for Zero-Shot Named Entity Recognition with Large Language Models NAACL 2024 GitHub
CodeKGC: Code Language Model for Generative Knowledge Graph Construction ACM TALLIP 2024-03 GitHub
ERA-CoT: Improving Chain-of-Thought through Entity Relationship Analysis Arxiv 2024-03 GitHub
A Simple but Effective Approach to Improve Structured Language Model Output for Information Extraction Arxiv 2024-02
Small Language Model Is a Good Guide for Large Language Model in Chinese Entity Relation Extraction Arxiv 2024-02
Improving Unsupervised Relation Extraction by Augmenting Diverse Sentence Pairs EMNLP 2023-12 GitHub
Prompt-NER: Zero-shot Named Entity Recognition in Astronomy Literature via Large Language Models Arxiv 2023-10
Revisiting Large Language Models as Zero-shot Relation Extractors EMNLP Findings 2023-12
Aligning Instruction Tasks Unlocks Large Language Models as Zero-Shot Relation Extractors ACL Findings 2023-07 GitHub
Code4Struct: Code Generation for Few-Shot Event Structure Prediction ACL 2023-07 GitHub
A Monte Carlo Language Model Pipeline for Zero-Shot Sociopolitical Event Extraction NeurIPS Workshop 2023-10
Global Constraints with Prompting for Zero-Shot Event Argument Classification EACL Findings 2023-05
Zero-Shot Information Extraction via Chatting with ChatGPT Arxiv 2023-02 GitHub

Cross-Domain Learning

Paper Venue Date Code
VerifiNER: Verification-augmented NER via Knowledge-grounded Reasoning with Large Language Models Arxiv 2024-02
Rethinking Negative Instances for Generative Named Entity Recognition Arxiv 2024-02 GitHub
GoLLIE: Annotation Guidelines improve Zero-Shot Information-Extraction ICLR 2024-01 GitHub
UniversalNER: Targeted Distillation from Large Language Models for Open Named Entity Recognition ICLR 2024-01 GitHub
InstructUIE: Multi-task Instruction Tuning for Unified Information Extraction Arxiv 2023-04 GitHub
DEEPSTRUCT: Pretraining of Language Models for Structure Prediction ACL Findings 2022-05 GitHub
Multilingual generative language models for zero-shot cross-lingual event argument extraction ACL 2022-05 GitHub

Cross-Type Learning

Paper Venue Date Code
Document-level event argument extraction by conditional generation NAACL 2021-06 GitHub

Data Augmentation

Data Annotation

Paper Venue Date Code
NuNER: Entity Recognition Encoder Pre-training via LLM-Annotated Data Arxiv 2024-02
LLMaAA: Making Large Language Models as Active Annotators EMNLP Findings 2023-12 GitHub
Improving Unsupervised Relation Extraction by Augmenting Diverse Sentence Pairs EMNLP 2023-12 GitHub
Semi-automatic Data Enhancement for Document-Level Relation Extraction with Distant Supervision from Large Language Models EMNLP 2023-12 GitHub
How to Unleash the Power of Large Language Models for Few-shot Relation Extraction? ACL Workshop 2023-07 GitHub
Large Language Models as Instructors: A Study on Multilingual Clinical Entity Extraction BioNLP Workshop 2023-07 GitHub
Does Synthetic Data Generation of LLMs Help Clinical Text Mining? Arxiv 2023-04
Unleash GPT-2 Power for Event Detection ACL 2021-08

Knowledge Retrieval

Paper Venue Date Code
Consistency Guided Knowledge Retrieval and Denoising in LLMs for Zero-shot Document-level Relation Triplet Extraction WWW 2024
Learning to Rank Context for Named Entity Recognition Using a Synthetic Dataset EMNLP 2023-12 GitHub
Prompting ChatGPT in MNER: Enhanced Multimodal Named Entity Recognition with Auxiliary Refined Knowledge EMNLP Findings 2023-12 GitHub
Chain-of-Thought Prompt Distillation for Multimodal Named Entity Recognition and Multimodal Relation Extraction Arxiv 2023-08

Inverse Generation

Paper Venue Date Code
ProgGen: Generating Named Entity Recognition Datasets Step-by-step with Self-Reflexive Large Language Models Arxiv 2024-03 GitHub
Grasping the Essentials: Tailoring Large Language Models for Zero-Shot Relation Extraction Arxiv 2024-02
Exploiting Asymmetry for Synthetic Training Data Generation: SynthIE and the Case of Information Extraction EMNLP 2023-12 GitHub
Entity-to-Text based Data Augmentation for various Named Entity Recognition Tasks ACL Findings 2023-07
Event Extraction as Question Generation and Answering ACL Short 2023-07 GitHub
STAR: Boosting Low-Resource Information Extraction by Structure-to-Text Data Generation with Large Language Models AAAI 2024-03

Specific Domain

Paper Domain Venue Date Code
Advancing Entity Recognition in Biomedicine via Instruction Tuning of Large Language Models Biomedical Bioinformatics 2024-03 GitHub
Improving LLM-Based Health Information Extraction with In-Context Learning Health Others 2024-03
Structured information extraction from scientific text with large language models Scientific Nat. Commun. 2024-02 GitHub
Combining prompt‑based language models and weak supervision for labeling named entity recognition on legal documents Legal Others 2024-02
Impact of Sample Selection on In-Context Learning for Entity Extraction from Scientific Writing Scientific EMNLP Findings 2023-12 GitHub
Prompting ChatGPT in MNER: Enhanced Multimodal Named Entity Recognition with Auxiliary Refined Knowledge Multimodal EMNLP Findings 2023-12 GitHub
In-context Learning for Few-shot Multimodal Named Entity Recognition Multimodal EMNLP Findings 2023-12
PolyIE: A Dataset of Information Extraction from Polymer Material Scientific Literature Polymer Material Arxiv 2023-11 GitHub
Prompt-NER: Zero-shot Named Entity Recognition in Astronomy Literature via Large Language Models Astronomical Arxiv 2023-10
Inspire the Large Language Model by External Knowledge on BioMedical Named Entity Recognition Biomedical Arxiv 2023-09
Chain-of-Thought Prompt Distillation for Multimodal Named Entity Recognition and Multimodal Relation Extraction Multimodal Arxiv 2023-08
DICE: Data-Efficient Clinical Event Extraction with Generative Models Clinical ACL 2023-07 GitHub
How far is Language Model from 100% Few-shot Named Entity Recognition in Medical Domain Medical Arxiv 2023-07 GitHub
Large Language Models as Instructors: A Study on Multilingual Clinical Entity Extraction Multilingual / Clinical BioNLP 2023-07 GitHub
Does Synthetic Data Generation of LLMs Help Clinical Text Mining? Clinical Arxiv 2023-04
Yes but.. Can ChatGPT Identify Entities in Historical Documents Historical JCDL 2023-03
Zero-shot Clinical Entity Recognition using ChatGPT Clinical Arxiv 2023-03
Structured information extraction from complex scientific text with fine-tuned large language models Scientific Arxiv 2022-12 Demo
Multilingual generative language models for zero-shot cross-lingual event argument extraction Multilingual ACL 2022-05 GitHub

Evaluation and Analysis

Paper Venue Date Code
LLMs for Knowledge Graph Construction and Reasoning: Recent Capabilities and Future Opportunities Arxiv 2024-02 GitHub
IEPile: Unearthing Large-Scale Schema-Based Information Extraction Corpus Arxiv 2024-02 GitHub
Few shot clinical entity recognition in three languages: Masked language models outperform LLM prompting Arxiv 2024-02
Information Extraction from Legal Wills: How Well Does GPT-4 Do? EMNLP Findings 2023-12 GitHub
Information Extraction in Low-Resource Scenarios: Survey and Perspective Arxiv 2023-12 GitHub
GenRES: Rethinking Evaluation for Generative Relation Extraction in the Era of Large Language Models Arxiv 2023-12 GitHub
Empirical Study of Zero-Shot NER with ChatGPT EMNLP 2023-12 GitHub
NERetrieve: Dataset for Next Generation Named Entity Recognition and Retrieval EMNLP Findings 2023-12 GitHub
Preserving Knowledge Invariance: Rethinking Robustness Evaluation of Open Information Extraction EMNLP 2023-12 GitHub
PolyIE: A Dataset of Information Extraction from Polymer Material Scientific Literature Arxiv 2023-11 GitHub
XNLP: An Interactive Demonstration System for Universal Structured NLP Arxiv 2023-08 Demo
A Zero-shot and Few-shot Study of Instruction-Finetuned Large Language Models Applied to Clinical and Biomedical Tasks Arxiv 2023-07
How far is Language Model from 100% Few-shot Named Entity Recognition in Medical Domain Arxiv 2023-07 GitHub
Revisiting Relation Extraction in the era of Large Language Models ACL 2023-07 GitHub
Zero-shot Temporal Relation Extraction with ChatGPT BioNLP 2023-07
InstructIE: A Chinese Instruction-based Information Extraction Dataset Arxiv 2023-05 GitHub
Is Information Extraction Solved by ChatGPT? An Analysis of Performance, Evaluation Criteria, Robustness and Errors Arxiv 2023-05 GitHub
Evaluating ChatGPT's Information Extraction Capabilities: An Assessment of Performance, Explainability, Calibration, and Faithfulness Arxiv 2023-04 GitHub
Exploring the Feasibility of ChatGPT for Event Extraction Arxiv 2023-03
Yes but.. Can ChatGPT Identify Entities in Historical Documents JCDL 2023-03
Zero-shot Clinical Entity Recognition using ChatGPT Arxiv 2023-03
Thinking about GPT-3 In-Context Learning for Biomedical IE? Think Again EMNLP Findings 2022-12 GitHub
Large Language Models are Few-Shot Clinical Information Extractors EMNLP 2022-12 Huggingface

Project and Toolkit

Paper Type Venue Date Link
TechGPT-2.0: A Large Language Model Project to Solve the Task of Knowledge Graph Construction Project Arxiv 2024-01 Link
CollabKG: A Learnable Human-Machine-Cooperative Information Extraction Toolkit for (Event) Knowledge Graph Construction Toolkit Arxiv 2023-07 Link

Datasets

* denotes the dataset is multimodal. # refers to the number of categories or sentences.

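| Task | Dataset | Domain | #Class | #Train | #Val | #Test | Link |
| --- | --- | --- | --- | --- | --- | --- | --- |
| NER | ACE04 | News | 7 | 6202 | 745 | 812 | Link |
| NER | ACE05 | News | 7 | 7299 | 971 | 1060 | Link |
| NER | BC5CDR | Biomedical | 2 | 4560 | 4581 | 4797 | Link |
| NER | Broad Twitter Corpus | Social Media | 3 | 6338 | 1001 | 2000 | Link |
| NER | CADEC | Biomedical | 1 | 5340 | 1097 | 1160 | Link |
| NER | CoNLL03 | News | 4 | 14041 | 3250 | 3453 | Link |
| NER | CoNLLpp | News | 4 | 14041 | 3250 | 3453 | Link |
| NER | CrossNER-AI | Artificial Intelligence | 14 | 100 | 350 | 431 | Link |
| NER | CrossNER-Literature | Literary | 12 | 100 | 400 | 416 | |
| NER | CrossNER-Music | Musical | 13 | 100 | 380 | 465 | |
| NER | CrossNER-Politics | Political | 9 | 199 | 540 | 650 | |
| NER | CrossNER-Science | Scientific | 17 | 200 | 450 | 543 | |
| NER | FabNER | Scientific | 12 | 9435 | 2182 | 2064 | Link |
| NER | Few-NERD | General | 66 | 131767 | 18824 | 37468 | Link |
| NER | FindVehicle | Traffic | 21 | 21565 | 20777 | 20777 | Link |
| NER | GENIA | Biomedical | 5 | 15023 | 1669 | 1854 | Link |
| NER | HarveyNER | Social Media | 4 | 3967 | 1301 | 1303 | Link |
| NER | MIT-Movie | Social Media | 12 | 9774 | 2442 | 2442 | Link |
| NER | MIT-Restaurant | Social Media | 8 | 7659 | 1520 | 1520 | Link |
| NER | MultiNERD | Wikipedia | 16 | 134144 | 10000 | 10000 | Link |
| NER | NCBI | Biomedical | 4 | 5432 | 923 | 940 | Link |
| NER | OntoNotes 5.0 | General | 18 | 59924 | 8528 | 8262 | Link |
| NER | ShARe13 | Biomedical | 1 | 8508 | 12050 | 9009 | Link |
| NER | ShARe14 | Biomedical | 1 | 17404 | 1360 | 15850 | Link |
| NER | SNAP* | Social Media | 4 | 4290 | 1432 | 1459 | Link |
| NER | Temporal Twitter Corpus (TTC) | Social Media | 3 | 10000 | 500 | 1500 | Link |
| NER | Tweebank-NER | Social Media | 4 | 1639 | 710 | 1201 | Link |
| NER | Twitter2015* | Social Media | 4 | 4000 | 1000 | 3357 | Link |
| NER | Twitter2017* | Social Media | 4 | 3373 | 723 | 723 | Link |
| NER | TwitterNER7 | Social Media | 7 | 7111 | 886 | 576 | Link |
| NER | WikiDiverse* | News | 13 | 6312 | 755 | 757 | Link |
| NER | WNUT2017 | Social Media | 6 | 3394 | 1009 | 1287 | Link |
| RE | ACE05 | News | 7 | 10051 | 2420 | 2050 | Link |
| RE | ADE | Biomedical | 1 | 3417 | 427 | 428 | Link |
| RE | CoNLL04 | News | 5 | 922 | 231 | 288 | Link |
| RE | DocRED | Wikipedia | 96 | 3008 | 300 | 700 | Link |
| RE | MNRE* | Social Media | 23 | 12247 | 1624 | 1614 | Link |
| RE | NYT | News | 24 | 56196 | 5000 | 5000 | Link |
| RE | Re-TACRED | News | 40 | 58465 | 19584 | 13418 | Link |
| RE | SciERC | Scientific | 7 | 1366 | 187 | 397 | Link |
| RE | SemEval2010 | General | 19 | 6507 | 1493 | 2717 | Link |
| RE | TACRED | News | 42 | 68124 | 22631 | 15509 | Link |
| RE | TACREV | News | 42 | 68124 | 22631 | 15509 | Link |
| EE | ACE05 | News | 33/22 | 17172 | 923 | 832 | Link |
| EE | CASIE | Cybersecurity | 5/26 | 11189 | 1778 | 3208 | Link |
| EE | GENIA11 | Biomedical | 9/11 | 8730 | 1091 | 1092 | Link |
| EE | GENIA13 | Biomedical | 13/7 | 4000 | 500 | 500 | Link |
| EE | PHEE | Biomedical | 2/16 | 2898 | 961 | 968 | Link |
| EE | RAMS | News | 139/65 | 7329 | 924 | 871 | Link |
| EE | WikiEvents | Wikipedia | 50/59 | 5262 | 378 | 492 | Link |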