A Python NLP Commonsense Knowledge Inference Toolkit
A system description is available at https://arxiv.org/abs/2211.08451
kogito can be installed using pip.
pip install kogito
It requires Python 3.8 or later.
kogito uses spaCy under the hood for text processing, so a spaCy language package must be installed before running the inference module:
python -m spacy download en_core_web_sm
By default, the CommonsenseInference module uses en_core_web_sm to initialize the spaCy pipeline, but a different language pipeline can be specified as well.
If you would also like to evaluate knowledge models using the METEOR score, you need to download the following NLTK resources:
import nltk
nltk.download("punkt")
nltk.download("wordnet")
nltk.download("omw-1.4")
kogito provides a simple interface for interacting with knowledge inference (commonsense reasoning) models such as COMET to generate inferences from text input. The sample below initializes an inference module and a commonsense reasoning model, then generates a knowledge graph from text on the fly.
from kogito.models.bart.comet import COMETBART
from kogito.inference import CommonsenseInference
# Load pre-trained model from HuggingFace
model = COMETBART.from_pretrained("mismayil/comet-bart-ai2")
# Initialize inference module with a spacy language pipeline
csi = CommonsenseInference(language="en_core_web_sm")
# Run inference
text = "PersonX becomes a great basketball player"
kgraph = csi.infer(text, model)
# Save the output knowledge graph as JSON Lines (one JSON object per line)
kgraph.to_jsonl("kgraph.jsonl")
Here is an excerpt from the result of the above code sample:
{"head": "PersonX becomes a great basketball player", "relation": "Causes", "tails": [" PersonX practices every day.", " PersonX plays basketball every day", " PersonX practices every day"]}
{"head": "basketball", "relation": "ObjectUse", "tails": [" play with friends", " play basketball with", " play basketball"]}
{"head": "player", "relation": "CapableOf", "tails": [" play game", " win game", " play football"]}
{"head": "great basketball player", "relation": "HasProperty", "tails": [" good at basketball", " good at sports", " very good"]}
{"head": "become player", "relation": "isAfter", "tails": [" play game", " become coach", " play with"]}
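Each line of the output is a standalone JSON object with head, relation, and tails keys, so it can be read back with the standard library alone. The following is a minimal sketch, not part of kogito: the helper name load_kgraph_lines is hypothetical, and its normalization (trimming the leading spaces the model emits, dropping a trailing period, and removing duplicate tails) is an illustrative choice. The two sample lines are copied from the excerpt above.

```python
import json


def load_kgraph_lines(lines):
    """Parse JSON Lines records of the form {"head", "relation", "tails"}.

    Hypothetical helper (not part of kogito's API): trims surrounding
    whitespace from each tail, drops a trailing period, and removes
    duplicate tails while preserving their original order.
    """
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        seen = set()
        tails = []
        for tail in record["tails"]:
            norm = tail.strip().rstrip(".")
            if norm and norm not in seen:
                seen.add(norm)
                tails.append(norm)
        records.append(
            {"head": record["head"], "relation": record["relation"], "tails": tails}
        )
    return records


# Two lines taken from the output excerpt above.
sample = [
    '{"head": "PersonX becomes a great basketball player", "relation": "Causes", "tails": [" PersonX practices every day.", " PersonX plays basketball every day", " PersonX practices every day"]}',
    '{"head": "player", "relation": "CapableOf", "tails": [" play game", " win game", " play football"]}',
]

for rec in load_kgraph_lines(sample):
    print(rec["head"], "|", rec["relation"], "|", rec["tails"])
```

Because the function accepts any iterable of strings, an open file handle for a file written by to_jsonl can be passed directly. Note how " PersonX practices every day." and " PersonX practices every day" collapse into a single tail after normalization.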
This is just one way to generate commonsense inferences; kogito offers much more. For complete documentation, check out the kogito docs.
kogito uses Poetry to manage its dependencies.
First, install Poetry using the official installer:
curl -sSL https://install.python-poetry.org | python3 -
Then run the following command to install package dependencies:
poetry install
If you need the ATOMIC or ATOMIC 2020 data to train your knowledge models, you can download it from AI2:
For ATOMIC:
wget https://storage.googleapis.com/ai2-mosaic/public/atomic/v1.0/atomic_data.tgz
For ATOMIC 2020:
wget https://ai2-atomic.s3-us-west-2.amazonaws.com/data/atomic2020_data-feb2021.zip
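Before training, the raw triples usually need to be grouped per head and relation. The sketch below assumes each ATOMIC 2020 split is a tab-separated file with one head, relation, tail triple per row and no header line; that layout is an assumption, so check the extracted files before relying on it. It groups tails into the same head/relation/tails shape as kogito's JSON Lines output. The function name group_triples is hypothetical, and the sample rows are illustrative values adapted from the output excerpt, not rows from the real dataset.

```python
import csv
import io


def group_triples(tsv_file):
    """Group (head, relation, tail) rows into head/relation/tails records.

    Assumes a tab-separated file with exactly three columns and no header
    (an assumption about the ATOMIC 2020 split files, not a documented format).
    """
    grouped = {}  # dicts preserve insertion order in Python 3.7+
    reader = csv.reader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 3:
            continue  # skip empty or malformed rows
        head, relation, tail = (col.strip() for col in row)
        grouped.setdefault((head, relation), []).append(tail)
    return [
        {"head": head, "relation": relation, "tails": tails}
        for (head, relation), tails in grouped.items()
    ]


# Illustrative rows only (adapted from the excerpt above, not from ATOMIC 2020).
sample = io.StringIO(
    "PersonX becomes a great basketball player\tCauses\tPersonX practices every day\n"
    "PersonX becomes a great basketball player\tCauses\tPersonX plays basketball every day\n"
    "basketball\tObjectUse\tplay basketball\n"
)
for record in group_triples(sample):
    print(record)
```

To run it on a real split, open the extracted file with open(path, newline="", encoding="utf-8") and pass the handle instead of the in-memory sample.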
If you want to learn more about the library design, models and data used for this toolkit, check out our paper. The paper can be cited as:
@article{Ismayilzada2022kogito,
title={kogito: A Commonsense Knowledge Inference Toolkit},
author={Mete Ismayilzada and Antoine Bosselut},
journal={ArXiv},
volume={abs/2211.08451},
year={2022}
}
If you work with knowledge models, consider citing the following papers:
@inproceedings{Hwang2020COMETATOMIC,
author = {Jena D. Hwang and Chandra Bhagavatula and Ronan Le Bras and Jeff Da and Keisuke Sakaguchi and Antoine Bosselut and Yejin Choi},
booktitle = {Proceedings of the 35th AAAI Conference on Artificial Intelligence (AAAI)},
title = {COMET-ATOMIC 2020: On Symbolic and Neural Commonsense Knowledge Graphs},
year = {2021}
}
@inproceedings{Bosselut2019COMETCT,
author = {Antoine Bosselut and Hannah Rashkin and Maarten Sap and Chaitanya Malaviya and Asli Çelikyilmaz and Yejin Choi},
booktitle = {Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL)},
title = {COMET: Commonsense Transformers for Automatic Knowledge Graph Construction},
year = {2019}
}
A significant portion of the model training and evaluation code has been adapted from the original codebase for the paper (Comet-) Atomic 2020: On Symbolic and Neural Commonsense Knowledge Graphs.