A collection of prototypes for LM-assisted ontology engineering.
IDEA provides analytical tools for ontology design based on state-of-the-art natural language processing methods. To date, IDEA extracts competency questions (CQs) from an ontology repository, analyses them to detect inconsistencies and similarities, and projects them onto a sentence-level embedding space for visual exploration, thereby enabling semantic search. IDEA has been used in WP2, where it helped us refine requirements among the pilots in collaboration with the experts, thereby supporting the refactoring of PON. In sum, the framework has proven to create synergies between different stakeholders, and to accelerate and support ontology design activities. A live dashboard is available at this link, with a screenshot reported below.
- Automatic construction of PolifoniaCQ dataset with CQ checks
- Dashboard website
- Automatic dashboard update
- CQ embeddings and interactive visualisation
- Semantic search of CQs via sentence embeddings
- Support of semantic search on dashboard website
- Use of graph generation tools for prototyping
It is recommended to install the development version of IDEA (the one we currently support) in a separate environment. If you use conda, the following commands will do it for you.
```bash
conda create -n 'idea' python=3.9
conda activate idea
```
Once in your environment, you can install the requirements using pip as follows.

```bash
pip install -r requirements.txt
```
The main entry point is the CLI provided by `idea.py`. This is also how content is updated before being committed and pushed to this repository, and how the dashboard is re-computed with the latest data available at this link.
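The full set of subcommands and options can be listed via the help flag; its output is reported below.

```bash
python idea.py --help
```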
```
Command line interface of the IDEA framework.

positional arguments:
  {dataset,embed,search}
                        Either dataset, embed, search.
  input_dir             Directory where the input files will be read.

optional arguments:
  -h, --help            show this help message and exit
  --out_dir OUT_DIR     Directory where output will be saved.
  --model MODEL         Name of the language model to use.
  --validate            Whether to validate the competency questions.
  --search_query SEARCH_QUERY
                        A textual query to search against the CQs.
  --as_session          Whether to keep a session for more searches.
  --search_topk SEARCH_TOPK
                        Number of CQs to retrieve per semantic search.
  --search_threshold SEARCH_THRESHOLD
                        Similarity threshold for semantic search.
  --device DEVICE       The default device to use for computation.
  --n_workers N_WORKERS
                        Number of workers for parallel computation.
```
To create a dataset of competency questions, call the following command, specifying the directory where personas and stories are stored. This will also update the documentation on the online dashboard. In this case, we use a path to another repository in Polifonia.
```bash
python idea.py dataset ../../stories --validate
```
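The `--validate` flag runs sanity checks on the extracted CQs (the "CQ checks" mentioned in the feature list above). The actual checks are implemented in this repository and are not reproduced here; purely as a toy illustration of what a surface-level CQ check might look like:

```python
import re

# Toy illustration only: IDEA's actual --validate checks may differ.
QUESTION_OPENERS = re.compile(
    r"(?i)^(what|which|who|whose|whom|where|when|why|how|is|are|do|does|can)\b"
)

def looks_like_cq(text: str) -> bool:
    """Check that a candidate CQ is a non-empty, well-formed question."""
    text = text.strip()
    return bool(text) and text.endswith("?") and QUESTION_OPENERS.match(text) is not None

print(looks_like_cq("Which instruments are played in a performance?"))  # True
print(looks_like_cq("List all composers"))                              # False
```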
To compute sentence embeddings of the CQs, run the `embed` command on the resulting dataset, optionally specifying the language model and the computation device:

```bash
python idea.py embed ../data/cq_sanity_checks.csv --model all-MiniLM-L6-v2 --device cpu
```
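These embeddings are what the dashboard projects onto an interactive two-dimensional view. IDEA's actual projection method is not documented here; as a minimal sketch of the general idea, assuming `sentence-transformers` and `scikit-learn` are installed and using toy CQs in place of the real dataset:

```python
from sentence_transformers import SentenceTransformer
from sklearn.manifold import TSNE

# Hypothetical toy CQs; the real ones come from the PolifoniaCQ dataset.
cqs = [
    "Which instruments are played in a performance?",
    "What instruments does a musician play?",
    "Who composed a piece of music?",
    "In which historical period was a piece composed?",
]

model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(cqs)  # shape (n_cqs, 384) for this model

# Project to 2D for plotting; perplexity must be smaller than the number of CQs.
coords = TSNE(n_components=2, perplexity=2, random_state=0).fit_transform(embeddings)
for cq, (x, y) in zip(cqs, coords):
    print(f"({x:+7.2f}, {y:+7.2f})  {cq}")
```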
For a single search, just call:

```bash
python idea.py search ../data --search_query instruments --search_topk 20
```
Otherwise, to keep a session for searching CQs without the need to re-load the model:

```bash
python idea.py search ../data --as_session --search_topk 20
```
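Under the hood, semantic search of this kind amounts to encoding the query with the same sentence encoder and ranking CQs by cosine similarity. The snippet below is a minimal sketch of the idea, not IDEA's actual implementation; it assumes `sentence-transformers` is installed and uses toy CQs in place of the real dataset. Loading the model once and reusing it across queries, as done here, is also the cost that `--as_session` amortises.

```python
from sentence_transformers import SentenceTransformer, util

# Toy CQs standing in for the PolifoniaCQ dataset.
cqs = [
    "Which instruments are played in a performance?",
    "What is the tonality of a composition?",
    "Who composed a piece of music?",
]

# Same encoder as in the embed example above; load it once and reuse it.
model = SentenceTransformer("all-MiniLM-L6-v2")
cq_embeddings = model.encode(cqs, convert_to_tensor=True)

# Encode the query and retrieve the top-k most similar CQs by cosine similarity.
query_embedding = model.encode("instruments", convert_to_tensor=True)
hits = util.semantic_search(query_embedding, cq_embeddings, top_k=2)[0]

for hit in hits:
    print(f"{cqs[hit['corpus_id']]}  (score: {hit['score']:.2f})")
```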