Automated parsing, and ontological & machine learning-powered semantic similarity modelling, of the Digital, Data and Technology (DDaT) profession capability framework website.
DDaT ontology visualisation in OntoSpark
1. Introduction
2. Getting Started
2.1. Prerequisites
2.2. Clone Source Code
2.3. Install Python Packages
2.4. Configuration
2.5. Usage
3. License
4. Acknowledgements
5. Useful Links
6. Authors
The DDaT ontology modeller is a Python application that automatically parses the Digital, Data and Technology (DDaT) profession capability framework and performs ontological and machine learning-powered semantic similarity modelling on it. The resulting ontology is intended to enable effective visualisation of the framework, while the semantic similarity modelling is intended to identify potentially duplicate classes, such as skills.
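To illustrate the idea of flagging potentially duplicate classes by semantic similarity, here is a minimal sketch that compares embedding vectors by cosine similarity. It is not the application's actual implementation: the function names, the 0.9 threshold, the skill labels and the toy three-dimensional vectors are all hypothetical. In practice the vectors would presumably come from a Sentence Transformers model rather than being written by hand.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_potential_duplicates(embeddings, threshold=0.9):
    """Return (label_a, label_b, score) for pairs scoring at or above threshold."""
    pairs = []
    for (label_a, vec_a), (label_b, vec_b) in combinations(embeddings.items(), 2):
        score = cosine_similarity(vec_a, vec_b)
        if score >= threshold:
            pairs.append((label_a, label_b, score))
    # Highest-scoring pairs first.
    return sorted(pairs, key=lambda pair: pair[2], reverse=True)


# Hypothetical labels and toy vectors standing in for real sentence embeddings.
toy_embeddings = {
    "Data analysis": [0.9, 0.1, 0.0],
    "Analysing data": [0.88, 0.12, 0.02],
    "Stakeholder management": [0.0, 0.2, 0.95],
}
duplicates = find_potential_duplicates(toy_embeddings)
for label_a, label_b, score in duplicates:
    print(f"{label_a} ~ {label_b}: {score:.3f}")
```

Any pair reported this way is only a candidate for review. The threshold is a tuning choice, and a high score does not by itself prove that two classes are duplicates.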
Please ensure that the following prerequisite software is installed in your environment.
- Git - open source distributed version control system.
- Python 3 - general-purpose programming language.
- ChromeDriver - WebDriver for Chrome.
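As a quick sanity check, the prerequisites listed above can be looked up from Python using the standard library. This is an illustrative sketch rather than part of the application: the function names are hypothetical, and the executable names `git` and `chromedriver` are assumptions about how the tools are installed. ChromeDriver does not need to be on your PATH, since its location is supplied through the application configuration, so a "NOT FOUND" result for it is not necessarily a problem.

```python
import shutil
import sys


def check_prerequisites(tools=("git", "chromedriver")):
    """Map each executable name to its resolved path, or None if not on PATH."""
    return {tool: shutil.which(tool) for tool in tools}


def python_is_supported(minimum=(3, 0)):
    """Return True if the running interpreter is at least the given version."""
    return sys.version_info >= minimum


if __name__ == "__main__":
    for tool, path in check_prerequisites().items():
        print(f"{tool}: {path or 'NOT FOUND'}")
    print(f"Python 3: {'ok' if python_is_supported() else 'NOT FOUND'}")
```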
The open source code for this application is available on GitHub at https://github.com/hyperlearningai/ddat-ontology-modeller. To clone the repository, run the following Git command from your command line, or use your preferred Git GUI tool. The base location of the cloned repository is hereafter referred to as $DDAT_ONTOLOGY_MODELLER_BASE.
# Clone the GitHub public repository
$ git clone https://github.com/hyperlearningai/ddat-ontology-modeller
# Navigate into the base project folder
# This location will hereafter be referred to as $DDAT_ONTOLOGY_MODELLER_BASE
$ cd ddat-ontology-modeller
The DDaT ontology modeller application requires the Pandas, PyYAML, Selenium and Sentence Transformers Python packages to be installed in the relevant Python 3 environment. Install these dependencies either manually or with pip, using the requirements.txt file in $DDAT_ONTOLOGY_MODELLER_BASE, as follows:
# Install the required Python package dependencies in your active Python environment
$ pip install -r requirements.txt
The DDaT ontology modeller application configuration may be found at $DDAT_ONTOLOGY_MODELLER_BASE/ddat/config/config.yaml. Please review and update the following configuration properties as required before running the application.
Property | Description |
---|---|
app.base_working_dir | Absolute path to a readable and writeable local directory to which the DDaT ontology is written as an OWL RDF/XML file, along with other working files and application logs. |
app.webdriver_paths.chromedriver | Absolute path to the Google Chrome WebDriver (see Prerequisites). |
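Before running the application, it can help to confirm that both properties are set to absolute paths. The sketch below is a hypothetical helper, not part of the application. It assumes the dotted property names map onto nested YAML keys (for example `app` containing `base_working_dir`) and that the configuration has already been loaded into a dictionary, for instance with PyYAML's `yaml.safe_load`.

```python
import os

# Dotted property names from the table above, split into nested keys (assumed layout).
REQUIRED_KEYS = (
    ("app", "base_working_dir"),
    ("app", "webdriver_paths", "chromedriver"),
)


def get_nested(config, keys):
    """Walk nested dictionaries; return None if any key along the way is missing."""
    node = config
    for key in keys:
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def validate_config(config):
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    for keys in REQUIRED_KEYS:
        name = ".".join(keys)
        value = get_nested(config, keys)
        if not isinstance(value, str) or not value:
            problems.append(f"{name} is missing or empty")
        elif not os.path.isabs(value):
            problems.append(f"{name} must be an absolute path: {value}")
    return problems
```

The helper only checks that each value is a non-empty absolute path. It does not verify that the working directory is writeable or that the ChromeDriver binary exists at the given location.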
To run the DDaT ontology modeller application, run $DDAT_ONTOLOGY_MODELLER_BASE/main.py as follows:
# Run the DDaT ontology modeller application
$ python main.py
The DDaT ontology modeller application source code is available and distributed under the MIT License. Please refer to LICENSE for further information. The DDaT ontology created by the DDaT ontology modeller application contains public sector information licensed under the Open Government Licence v3.0.
The DDaT ontology created by the DDaT ontology modeller application contains public sector information sourced from the Digital, Data and Technology (DDaT) profession capability framework, which is maintained by the Central Digital and Data Office (CDDO). The framework is publicly available under the Open Government Licence v3.0.
- DDaT profession capability framework
- DDaT ontology visualisation in OntoSpark
- DDaT ontology OWL RDF/XML file
- DDaT ontology in WebProtégé
The DDaT ontology modeller application was developed by the following authors:
- Jillur Quddus
Chief Data Scientist & Principal Polyglot Software Engineer