epfl-nlp/kogito

kogito

A Python NLP Commonsense Knowledge Inference Toolkit

System Description available here: https://arxiv.org/abs/2211.08451

Installation

Installation with pip

kogito can be installed using pip.

pip install kogito

It requires Python 3.8 or later.

Setup

Inference

kogito uses spacy under the hood for text processing, so a spacy language package has to be installed before running the inference module.

python -m spacy download en_core_web_sm

By default, the CommonsenseInference module uses en_core_web_sm to initialize the spacy pipeline, but a different language pipeline can be specified as well.
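Since spacy language packages are installed as regular Python packages, one way to confirm the pipeline is available before running inference is to look it up by name. The following is a minimal sketch using only the standard library; the helper name is_package_installed is hypothetical and not part of kogito or spacy.

```python
import importlib.util


def is_package_installed(name):
    """Return True if a top-level package called `name` can be imported."""
    return importlib.util.find_spec(name) is not None


# Pipeline name taken from the setup step above; swap in another one if needed.
model_name = "en_core_web_sm"
if not is_package_installed(model_name):
    print(f"Missing spacy pipeline, run: python -m spacy download {model_name}")
```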

Evaluation

If you would also like to evaluate knowledge models using the METEOR score, you need to download the following nltk resources:

import nltk

nltk.download("punkt")
nltk.download("wordnet")
nltk.download("omw-1.4")

Quickstart

kogito provides an easy interface to knowledge inference or commonsense reasoning models such as COMET for generating inferences from a text input. The following sample initializes an inference module and a custom commonsense reasoning model, then generates a knowledge graph from text on the fly.

from kogito.models.bart.comet import COMETBART
from kogito.inference import CommonsenseInference

# Load pre-trained model from HuggingFace
model = COMETBART.from_pretrained("mismayil/comet-bart-ai2")

# Initialize inference module with a spacy language pipeline
csi = CommonsenseInference(language="en_core_web_sm")

# Run inference
text = "PersonX becomes a great basketball player"
kgraph = csi.infer(text, model)

# Save output knowledge graph to a JSON Lines file (one JSON object per line)
kgraph.to_jsonl("kgraph.json")

Here is an excerpt from the result of the above code sample:

{"head": "PersonX becomes a great basketball player", "relation": "Causes", "tails": [" PersonX practices every day.", " PersonX plays basketball every day", " PersonX practices every day"]}
{"head": "basketball", "relation": "ObjectUse", "tails": [" play with friends", " play basketball with", " play basketball"]}
{"head": "player", "relation": "CapableOf", "tails": [" play game", " win game", " play football"]}
{"head": "great basketball player", "relation": "HasProperty", "tails": [" good at basketball", " good at sports", " very good"]}
{"head": "become player", "relation": "isAfter", "tails": [" play game", " become coach", " play with"]}
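Because each line of the output is a standalone JSON object, the file can be post-processed with the standard library alone. The sketch below is one possible approach, not part of kogito: it parses two lines copied from the excerpt above and groups the tails by head and relation, stripping the leading spaces. The helper names load_records and tails_by_head are hypothetical.

```python
import json
from collections import defaultdict

# Two lines copied from the excerpt above; a real run would read "kgraph.json" instead.
SAMPLE = """\
{"head": "basketball", "relation": "ObjectUse", "tails": [" play with friends", " play basketball with", " play basketball"]}
{"head": "player", "relation": "CapableOf", "tails": [" play game", " win game", " play football"]}
"""


def load_records(lines):
    """Parse JSON Lines text into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


def tails_by_head(records):
    """Group stripped, de-duplicated tails under (head, relation) keys."""
    grouped = defaultdict(list)
    for record in records:
        key = (record["head"], record["relation"])
        for tail in record["tails"]:
            cleaned = tail.strip()
            if cleaned not in grouped[key]:
                grouped[key].append(cleaned)
    return dict(grouped)


records = load_records(SAMPLE.splitlines())
grouped = tails_by_head(records)
for (head, relation), tails in grouped.items():
    print(f"{head} --{relation}--> {tails}")
```

To process a saved file, pass an open file handle to load_records in place of SAMPLE.splitlines().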

This is just one way to generate commonsense inferences and kogito offers much more. For complete documentation, check out the kogito docs.

Development

Setup

kogito uses Poetry to manage its dependencies.

Install poetry from the official repository first:

curl -sSL https://install.python-poetry.org | python3 -

Then run the following command to install package dependencies:

poetry install

Data

If you need the ATOMIC or ATOMIC 2020 data to train your knowledge models, you can download it from AI2:

For ATOMIC:

wget https://storage.googleapis.com/ai2-mosaic/public/atomic/v1.0/atomic_data.tgz

For ATOMIC 2020:

wget https://ai2-atomic.s3-us-west-2.amazonaws.com/data/atomic2020_data-feb2021.zip
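If wget is not available, the same download can be done from Python. This is a minimal sketch using only the standard library; the helper names download and extract and the default "data" directory are assumptions for illustration, not part of kogito.

```python
import shutil
import urllib.request
from pathlib import Path

# URL copied from the wget command above.
ATOMIC_2020_URL = "https://ai2-atomic.s3-us-west-2.amazonaws.com/data/atomic2020_data-feb2021.zip"


def download(url, dest_dir="data"):
    """Download `url` into `dest_dir` and return the local archive path."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    archive = dest_dir / url.rsplit("/", 1)[-1]
    if not archive.exists():  # skip the download if the file is already there
        urllib.request.urlretrieve(url, str(archive))
    return archive


def extract(archive, dest_dir=None):
    """Unpack a .zip or .tgz archive next to itself, or into `dest_dir`."""
    archive = Path(archive)
    target = Path(dest_dir) if dest_dir else archive.parent
    shutil.unpack_archive(str(archive), str(target))
    return target
```

Calling extract(download(ATOMIC_2020_URL)) would fetch the ATOMIC 2020 archive and unpack it under data/. The same two helpers should also work for the ATOMIC .tgz URL, since shutil.unpack_archive picks the format from the file extension.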

Paper

If you want to learn more about the library design, models and data used for this toolkit, check out our paper. The paper can be cited as:

@article{Ismayilzada2022kogito,
  title={kogito: A Commonsense Knowledge Inference Toolkit},
  author={Mete Ismayilzada and Antoine Bosselut},
  journal={ArXiv},
  volume={abs/2211.08451},
  year={2022}
}

If you work with knowledge models, consider citing the following papers:

@inproceedings{Hwang2020COMETATOMIC,
 author = {Jena D. Hwang and Chandra Bhagavatula and Ronan Le Bras and Jeff Da and Keisuke Sakaguchi and Antoine Bosselut and Yejin Choi},
 booktitle = {Proceedings of the 35th AAAI Conference on Artificial Intelligence (AAAI)},
 title = {COMET-ATOMIC 2020: On Symbolic and Neural Commonsense Knowledge Graphs},
 year = {2021}
}

@inproceedings{Bosselut2019COMETCT,
 author = {Antoine Bosselut and Hannah Rashkin and Maarten Sap and Chaitanya Malaviya and Asli Çelikyilmaz and Yejin Choi},
 booktitle = {Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL)},
 title = {COMET: Commonsense Transformers for Automatic Knowledge Graph Construction},
 year = {2019}
}

Acknowledgements

A significant portion of the model training and evaluation code has been adapted from the original codebase for the paper (Comet-) Atomic 2020: On Symbolic and Neural Commonsense Knowledge Graphs.