Prepare release 0.4.0 #102

Merged 5 commits on Mar 9, 2023
76 changes: 2 additions & 74 deletions .gitignore
@@ -1,21 +1,9 @@
# Created by https://www.toptal.com/developers/gitignore/api/jupyternotebooks,python

### JupyterNotebooks ###
# gitignore template for Jupyter Notebooks
# website: http://jupyter.org/
# Jupyter Notebooks

.ipynb_checkpoints
*/.ipynb_checkpoints/*

# IPython
profile_default/
ipython_config.py

# Remove previous ipynb_checkpoints
# git rm -r .ipynb_checkpoints/

### Python ###s
# Byte-compiled / optimized / DLL files
# Python - Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
@@ -72,59 +60,12 @@ cover/
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
.pybuilder/
target/

# Jupyter Notebook

# IPython

# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
# .python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# poetry
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock

# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.venv
env/
@@ -143,9 +84,6 @@ venv.bak/
# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
@@ -160,15 +98,5 @@ dmypy.json
# Cython debug symbols
cython_debug/

# PyCharm
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/

# setuptools_scm generated version file
_version.py

# output dir
outbox/
48 changes: 0 additions & 48 deletions .gitlab-ci.yml

This file was deleted.

30 changes: 30 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,30 @@
# Change log

## Release x.y.z (2023-MM-DD)

Features:

- ...

Bug fixes:

- ...

Changes:

- ...


## Release 0.4.0 (2023-03-dd)

First public release of voc4cat on github.


## Releases before 0.4.0

Before 0.4.0 the code was in alpha state and kept private.
There was no need for high-level documentation of changes.
See git commit log and the issues & milestones in this repository for the early history.

Just before 0.4.0 the code was migrated from a private gitlab instance to github.
The transfer went well overall but not perfectly (GitLab merge requests were not converted cleanly to GitHub pull requests).
110 changes: 48 additions & 62 deletions README.md
@@ -1,56 +1,31 @@
This is the repository
# SKOS vocabulary creation & maintenance with GitHub & Excel

- for working in NFDI4Cat on **Vocabularies for analytics, synthesis and heterogeneous catalysis**.
- for **voc4cat** a script that adds additional options to the original vocexcel tool (without changing or copying any original vocexcel code)
- for developing a **gitlab-based, CI-supported workflow for maintaining vocabularies.**
## Overview

Additional files and notes from the subgroup can be found in the HLRS cloud:
**[voc4cat](https://github.com/nfdi4cat/voc4cat-playground)** is a vocabulary for catalysis developed in NFDI4Cat.

- [Folder of TA1 subgroup Analytics-Synthesis-HeterogeneousCatalysis](https://edocs.hlrs.de/nextcloud/apps/files/?dir=/NFDI4Cat/Project-related%20activities/Task%20Areas/TA1/Subgroup_Analytics-Synthesis-HeterogCatalysis&fileid=155479)
- [Top level folder for TA1](https://edocs.hlrs.de/nextcloud/apps/files/?dir=/NFDI4Cat/Project-related%20activities/Task%20Areas/TA1&fileid=96729)
Related to voc4cat we developed a **toolbox for collaboratively maintaining SKOS vocabularies on github using Excel** (xlsx) as user-friendly interface. It consists of several parts:

# Gitlab-based vocabulary development and maintenance
- **voc4cat-tool** (this repository)
- A commandline tool that adds additional options to [vocexcel](https://github.com/nfdi4cat/VocExcel) without changing or copying any code from the original vocexcel project (GPL 3 licensed).
- **[voc4cat-template](https://github.com/nfdi4cat/voc4cat-template)**
- A GitHub project template for managing SKOS vocabularies using GitHub-based workflows, including automation by GitHub Actions.
- **[voc4cat-playground](https://github.com/nfdi4cat/voc4cat-playground)**
- A testbed for playing with the voc4cat workflow. The playground is a test-deployment of the voc4cat-template.

All vocabularies that have gone through the standard gitlab process of

- submission of merge request,
- review of merge request,
- approval of merge request

finally land in the folder `vocabularies`.

## New vocabularies

Please use the Excel file from the template folder to create vocabularies that are compatible with this repository.
If you already have a turtle file, you may convert it with voc4cat to xlsx (Excel).
Be careful that you use a different name for your vocabulary then the ones already present in the `vocabularies` folder.

The rest of the process is the same as in "Update existing vocabularies" below.

## Update existing vocabularies

To add or change details in an existing vocabulary, submit a pull request with an updated Excel-file that has the same name as the vocabulary that you want to update
(see `vocabularies` folder for the names of existing vocabularies).
If you don't have an Excel file but just the turtle file, you can use voc4cat (see below) to convert the turtle file to xlsx (Excel).

Upon submission of the merge request, the Excel file is automatically processed by a CI pipeline.
The results (updated Excel-file, documentation, dendogram, processing log) will be available after about 1 min as download on the merge request page (as so called "artifact").
If you need to fix something just update the merge request branch. This will trigger the pipeline to run again.

Please describe your changes and the motivation for the changes in the merge request note(s) or link to an issue with this information. This will help reviewers to decide on the proposed change.

Finally, when the proposed merge request is accepted, your changes will be integrated in the vocabularies in the folder `vocabularies`.

# Tool voc4cat
## Commandline tool voc4cat

To support what is not provided by the original vocexcel project we have developed this wrapper "voc4cat" that augments vocexcel with additional options like

- Checking our NFDI4Cat-Excel template
- Enriching our NFDI4Cat-Excel template (e.g. automatically add IRIs)
- Enriching our NFDI4Cat-Excel template (e.g. add IRIs)
- Processing all files in a folder at once
- Generating documentation (with [ontospy](http://lambdamusic.github.io/Ontospy/))
- Support for expressing concept-hierarchies by indentation.

Since mid 2022 voc4cat should be used with our vocexcel fork [nfdi4cat/vocexcel](https://github.com/nfdi4cat/vocexcel). The original project (now at [RDFlib/vocexcel](https://github.com/RDFLib/VocExcel)) changed its templates substantially which made it too cumbersome to keep our code and the customized templates compatible.

## Installation

The installation requires internet access since we install vocexcel directly from GitHub.
@@ -61,81 +36,92 @@ Preconditions:
- Python (3.8 or newer)

voc4cat works on windows, linux and mac. However, the command examples below assume that you work on windows
and that the [launcher](https://docs.python.org/3.10/using/windows.html#python-launcher-for-windows) is also installed.
The launcher is included by default in Windows installers from [python.org](https://www.python.org/downloads/)
and that the [launcher](https://docs.python.org/3.11/using/windows.html#python-launcher-for-windows) is also installed.
The launcher is included by default in Windows installers from [python.org](https://www.python.org/downloads/).
If you don't have the launcher replace `py` by `python` (or `python3`, depending on your OS) in the commands below.

## Installation steps

Checkout this repository

`git clone https://gitlab.fokus.fraunhofer.de/nfdi4cat/ta1-ontologies/voc4cat-tool.git`
`git clone https://github.com/nfdi4cat/voc4cat-tool.git`

Enter the directory to which you cloned.

`cd voc4cat-tool`

Create a virtual environment in a local subfolder ".venv" (Note that the command is for windows. Replace "py" with "python3" on other platforms.):
Create a virtual environment in a local subfolder ".venv" (This command is for windows. Replace "py" with "python3" on other platforms.):

`py -m venv .venv`

Activate the virtual environment (This is again for windows).

`.venv\scripts\activate.bat` (cmd) or `.venv\scripts\Activate.ps1` (powershell)

Update the packages in the virtual environment.
Update pip in the virtual environment.

`py -m pip install -U pip setuptools`
`py -m pip install -U pip`

Install voc4cat into the virtual environment.

`pip install .`

To install voc4cat together with all development tools, use `pip install .[dev]`; for just the test tools, use `pip install .[tests]`. For tests we use [pytest](https://docs.pytest.org).


## Typical use

Run the wrapper and show the help message.
Show a help message for the voc4cat command line tool with all available options.

`voc4cat --help`
`voc4cat --help` (or simply `voc4cat`)

To create a new vocabulary use the NFDI4Cat-adjusted template from the `templates` subfolder.
Typically, when a new vocabulary is created you want to create IRIs from the preferred labels:
To create a new vocabulary use the voc4Cat-adjusted template from the `templates` subfolder.
You may first use simple temporary IRIs such as `new:my_term`. These temporary IRIs may be text-based.

`voc4cat -i Your_Vocabulary.xlsx`
With voc4cat you can later replace all IDs belonging to a given prefix (here `ex`) with numeric IDs, e.g. starting from 1001:

This will fill the IRI-column for all rows with missing IRI entries.
`voc4cat --make-ids ex 1001 --output-directory output example/concept_hierarchy_043_4Cat.xlsx`

Manually filling the Children URI (in sheet "Concepts") and Members URI (in sheet "Collections") with URIs can be tedious.
An easier way to express hierarchies between concepts, is to use indentation. voc4Cat supports Excel-indentation (default).
voc4cat can also convert other indentaions (e.g.by 3 spaces per level) into Excel-indentation.
This will update all IRIs matching the `ex:`-prefix in the sheets "Concepts", "Additional Concept Features" and "Collections".
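
The renumbering idea behind `--make-ids` can be sketched in a few lines of Python. This is only an illustration and not voc4cat's implementation: the function name `renumber_ids`, the plain `prefix:number` output format (no zero padding) and the first-seen ordering are all assumptions made for this sketch.

```python
def renumber_ids(iris, prefix, start):
    """Replace text-based IDs of one prefix by consecutive numeric IDs.

    Hypothetical helper for illustration only; the real tool may format
    and order the new IDs differently.
    """
    mapping = {}      # old IRI -> new IRI, so repeated IRIs stay consistent
    next_id = start
    result = []
    for iri in iris:
        head, sep, _ = iri.partition(":")
        if sep and head == prefix:
            if iri not in mapping:
                mapping[iri] = f"{prefix}:{next_id}"
                next_id += 1
            result.append(mapping[iri])
        else:
            # IRIs with other prefixes are left untouched.
            result.append(iri)
    return result, mapping


if __name__ == "__main__":
    new_iris, mapping = renumber_ids(
        ["ex:catalyst", "ex:support", "other:x", "ex:catalyst"], "ex", 1001
    )
    print(new_iris)
```

Keeping the old-to-new mapping is the important design point: the same IRI can appear in several sheets, and every occurrence has to be replaced by the same new ID.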

Manually filling the Children URI (in sheet "Concepts") and Members URI (in sheet "Collections") with lists of IRIs can be tedious.
An easier way to express hierarchies between concepts is to use indentation.
voc4Cat understands Excel-indentation (the default) for this purpose but can also work with other indentation formats (e.g. by 3 spaces per level).
voc4cat supports converting between indentation-based hierarchy and Children-URI hierarchy (both directions). For example, use

`voc4cat --hierarchy-from-indent --output_directory output example/indent_043_4Cat.xlsx`
`voc4cat --hierarchy-from-indent --output-directory output example/indent_043_4Cat.xlsx`

or if you were using 3 spaces per level

`voc4cat --hierarchy-from-indent --indent-separator " " --output_directory output example/indent_3spaces_043_4Cat.xlsx`
`voc4cat --hierarchy-from-indent --indent-separator " " --output-directory output example/indent_3spaces_043_4Cat.xlsx`

to convert to ChildrenURI-hierarchy. For ChildrenURI-hierarchy to Excel-indenation, use
to convert to ChildrenURI-hierarchy. For ChildrenURI-hierarchy to Excel-indentation, use

`voc4cat --hierarchy-to-indent --output_directory output example/concept_hierarchy_043_4Cat.xlsx`
`voc4cat --hierarchy-to-indent --output-directory output example/concept_hierarchy_043_4Cat.xlsx`
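
To make the conversion from indentation to a children-based hierarchy more concrete, here is a minimal Python sketch. It is not the code used by voc4cat; the function name `children_from_indent` is hypothetical, labels stand in for IRIs, and the default separator of 3 spaces per level is just the example value mentioned above.

```python
def children_from_indent(lines, indent="   "):
    """Build a {parent: [children]} mapping from indented labels.

    Hypothetical helper for illustration; each leading repetition of
    `indent` counts as one hierarchy level.
    """
    children = {}
    stack = []  # stack[i] holds the most recent label seen at level i
    for raw in lines:
        level = 0
        rest = raw
        while rest.startswith(indent):
            rest = rest[len(indent):]
            level += 1
        label = rest.strip()
        if not label:
            continue  # skip empty rows
        if level > len(stack):
            raise ValueError(f"Indentation jumps more than one level at {label!r}")
        children.setdefault(label, [])
        del stack[level:]          # drop deeper levels left over from earlier rows
        if stack:
            children[stack[-1]].append(label)
        stack.append(label)
    return children


if __name__ == "__main__":
    rows = [
        "catalyst",
        "   heterogeneous catalyst",
        "      zeolite",
        "   homogeneous catalyst",
        "reactor",
    ]
    for parent, kids in children_from_indent(rows).items():
        print(parent, "->", kids)
```

The reverse direction (children lists to indentation) is a depth-first walk over the same mapping that prefixes each label with one separator per level.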

Finally, the vocabulary file can be converted to turtle format. In this case the wrapper script passes the job on to vocexcel:

`voc4cat vocabulary.xlsx`

A turtle file `vocabulary.ttl` is created in the same directory where the xlsx-file is located.
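
If you call voc4cat from a script and need to know where the turtle file ends up, the naming rule described above (same directory, same stem, `.ttl` extension) can be expressed with `pathlib`. The helper name `turtle_path_for` is made up for this sketch and is not part of voc4cat.

```python
from pathlib import Path


def turtle_path_for(xlsx_file):
    """Return the expected .ttl path next to the given .xlsx file.

    Hypothetical helper; it only mirrors the naming rule stated in the
    README and does not run any conversion.
    """
    return Path(xlsx_file).with_suffix(".ttl")


if __name__ == "__main__":
    print(turtle_path_for("vocabulary.xlsx"))
```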

It is also possible to create an xlsx file from a turtle file. Optionally a custom template (like we use here) can be specified:
It is also possible to create an xlsx file from a turtle file. Optionally a custom template can be specified:

`voc4cat --template template/VocExcel-template_043_4Cat.xlsx vocabulary.ttl`

Options that are specific for vocexcel can be put at the end of a `voc4cat` command.
Here is an example that forwards the `-e 3` and `-m 3` options to vocexcel and moreover demonstrates a complex combination of options (as used in CI):

`voc4cat --add_IRI --check --forward --docs --output_directory outbox inbox-excel-vocabs/ -e 3 -m 3`
`voc4cat --check --forward --docs --output-directory outbox inbox-excel-vocabs/ -e 3 -m 3`

# Feedback and code contributions
## Feedback and code contributions

Just create an issue here. We appreciate any kind of feedback and reasoned criticism.

If you want to contribute code, that is even better! We advise to create an issue to get feedback on your plans before you spent too much time on the problem.
If you want to contribute code, we suggest creating an issue first to get early feedback on your plans before you spend too much time.

By contributing you agree that your contributions fall under the project's MIT license.

## Acknowledgement

This work was funded by the German Research Foundation (DFG) through the project "[NFDI4Cat](https://www.nfdi4cat.org) - NFDI for Catalysis-Related Sciences" (DFG project no. [441926934](https://gepris.dfg.de/gepris/projekt/441926934)), within the National Research Data Infrastructure ([NFDI](https://www.nfdi.de)) programme of the Joint Science Conference (GWK).