FactKG: Fact Verification via Reasoning on Knowledge Graphs

We introduce a new dataset, FactKG: Fact Verification via Reasoning on Knowledge Graphs. It consists of 108k natural-language claims covering five types of reasoning: One-hop, Conjunction, Existence, Multi-hop, and Negation. Furthermore, to increase practicality, FactKG contains a variety of linguistic patterns, including both written-style and colloquial-style claims.

The dataset is released along with our paper (ACL 2023); please refer to the paper for further details.

Getting Started

Dataset

You can download the dataset here.

The factkg_train.pickle file is the training set, stored as a dictionary.

Each dictionary key is a claim; the value of each claim contains the following fields.

  1. 'Label': the label of the claim (True / False)
  2. 'Entity_set': the set of named entities that appear in the claim. These entities can be used as keys into the given DBpedia dictionary.
  3. 'Evidence': the evidence to be found in the KG using the claim and entity set; for each entity, a list of relation paths
    • Example format: {'entity_0': [[rel_1, rel_2], [~rel_3]], 'entity_1': [[rel_3]], 'entity_2': [[~rel_2, ~rel_1]]}
    • It describes a graph that contains two paths: a two-hop path ([entity_0, rel_1, X], [X, rel_2, entity_2]) and a one-hop path (entity_1, rel_3, entity_0)
      • Example claim: A's spouse is B and his child is a person who was born in 1998.
      • Corresponding evidence: {'A': [[child, birthYear], [spouse]], 'B': [[~spouse]], '1998': [[~birthYear, ~child]]}
  4. 'types' (metadata): the types of the claim
    • Claim style: ('written': written style claim, 'coll:model': claim generated by a colloquial style transfer model, 'coll:presup': claim generated from presupposition templates)
    • Reasoning type: ('num1': One-hop, 'multi claim': Conjunction, 'existence': Existence, 'multi hop': Multi-hop, 'negation': Negation)
    • If substitution was used to generate the claim, the list also contains 'substitution'
An example entry of the train set:

{
  'Adam McQuaid weighed 94.8024 kilograms and is from Pleasant Springs, Wisconsin.': {
    'Label': [False],
    'Entity_set': ['Adam_McQuaid', '"94802.4"', 'Pleasant_Springs,_Wisconsin'],
    'Evidence': {
      'Adam_McQuaid': [['weight'], ['placeOfBirth']],
      '"94802.4"': [['~weight']],
      'Pleasant_Springs,_Wisconsin': [['~placeOfBirth']]
    },
    'types': ['coll:model', 'num2', 'substitution', 'multi claim']
  }
}
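
The pickle files can be loaded with Python's pickle module. The snippet below is a minimal sketch for inspecting a single training instance; it assumes factkg_train.pickle has been downloaded to the working directory.

import pickle

# Load the training set: a dict mapping each claim string to its fields.
with open('factkg_train.pickle', 'rb') as f:
    train = pickle.load(f)

claim = next(iter(train))       # pick an arbitrary claim
instance = train[claim]

print(claim)
print(instance['Label'])        # e.g. [True] or [False]
print(instance['Entity_set'])   # entities usable as keys into the DBpedia dict
print(instance['Evidence'])     # entity -> list of relation paths
print(instance['types'])        # style / reasoning-type metadata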

The factkg_dev.pickle file is the dev set, also a dictionary with the same format as the train set.

The factkg_test.pickle file is the test set, in the same dictionary format, except that 'Evidence' is not given.

Data Resource

We created a DB by preprocessing the 2015 version of DBpedia. DBpedia is a directed graph, but we removed the directionality by adding, for every triple, a triple with the corresponding reverse relation.

The KG is stored as a dictionary in the dbpedia_2015_undirected.pickle file.
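
As a toy illustration of this convention (an assumed sketch, not the actual preprocessing code), each triple (head, relation, tail) is indexed from both ends, with the reverse direction stored under a '~'-prefixed relation:

# Toy illustration (assumption, not the released preprocessing script):
# every triple is stored from both ends; the reverse one under '~' + relation.
triples = [('Korea', 'language', 'Korean_language')]

kg = {}
for head, rel, tail in triples:
    kg.setdefault(head, {}).setdefault(rel, []).append(tail)
    kg.setdefault(tail, {}).setdefault('~' + rel, []).append(head)

print(kg['Korea']['language'])             # ['Korean_language']
print(kg['Korean_language']['~language'])  # ['Korea']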

We also release a subset of DBpedia (dbpedia_2015_undirected_light.pickle), built by keeping only the triples whose relations are used in FactKG. You can also use it in place of the whole KG.

import pickle

with open('dbpedia_2015_undirected.pickle', 'rb') as f:
    dbpedia = pickle.load(f)
  1. list(dbpedia) returns all of the entities in the KG.
  2. list(dbpedia[entity]) returns the list of the relations that are connected to the entity.
  3. dbpedia[entity][relation] returns the list of the tails that are connected to the entity with the relation.
  4. There are also reversed relations (prefixed with '~'). For example, the entity 'Korean_language' is in dbpedia['Korea']['language'] and the entity 'Korea' is in dbpedia['Korean_language']['~language'].
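
Putting these access patterns together, a relation path from an 'Evidence' entry can be followed hop by hop. The sketch below is ours, not part of the released code; it assumes dbpedia_2015_undirected_light.pickle is in the working directory.

import pickle

with open('dbpedia_2015_undirected_light.pickle', 'rb') as f:
    dbpedia = pickle.load(f)

def follow_path(kg, entity, path):
    """Entities reachable from `entity` by following the relations in `path`
    (relations may be '~'-prefixed reverse relations)."""
    frontier = {entity}
    for relation in path:
        next_frontier = set()
        for node in frontier:
            # Skip nodes that do not have this relation.
            next_frontier.update(kg.get(node, {}).get(relation, []))
        frontier = next_frontier
    return frontier

# One hop: languages connected to 'Korea'.
print(follow_path(dbpedia, 'Korea', ['language']))
# Two hops: entities reached from those languages via '~language'
# (i.e. entities sharing a language with 'Korea', including 'Korea' itself).
print(follow_path(dbpedia, 'Korea', ['language', '~language']))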

Baseline

Because some errors in the dataset have been corrected, the baseline performance may differ from the numbers reported in the paper.

Environment: pip install -r requirements.txt

Claim Only

cd claim_only

  1. BERT: python bert_classification.py --model_name bert-base-uncased --exp_name bert_log --train_data_path /path/to/factkg_train.pickle --valid_data_path /path/to/factkg_test.pickle --scheduler linear --batch_size 64 --eval_batch_size 64 --total_epoch 3
  2. BlueBERT: python bert_classification.py --model_name bionlp/bluebert_pubmed_mimic_uncased_L-12_H-768_A-12 --exp_name bluebert_log --train_data_path /path/to/factkg_train.pickle --valid_data_path /path/to/factkg_test.pickle --scheduler linear --batch_size 64 --eval_batch_size 64 --total_epoch 3
  3. Flan-T5: python flan_xl_zeroshot.py --valid_data_path /path/to/factkg_test.pickle --model_name google/flan-t5-xl

With Evidence

cd with_evidence

1. Graph Retriever

cd retrieve

Step 1) preprocess data

  1. cd data

  2. python data_preprocess.py --data_directory_path <<<directory path where factkg_{train, dev, test}.pickle are located>>> --output_directory_path ../model/

Step 2) train relation predictor

  1. cd model/relation_predict

  2. python main.py --mode train --config ../config/relation_predict_top3.yaml

  3. python main.py --mode eval --config ../config/relation_predict_top3.yaml --model_path <<<model_path.ckpt>>>

Step 3) train hop predictor

  1. cd model/hop_predict

  2. python main.py --mode train --config ../config/hop_predict.yaml

  3. python main.py --mode eval --config ../config/hop_predict.yaml --model_path ./model.pth

2. Classifier

  1. cd classifier

  2. python baseline.py --data_path <<<directory path where factkg_{train, dev, test}.pickle are located>>> --kg_path /path/to/dbpedia_2015_undirected_light.pickle
