Merge pull request #16 from bond-lab/omw-1.4
Rework release scripts for v1.4
goodmami committed Nov 5, 2021
2 parents 1564270 + 76f60fb commit 3393edf
Showing 29 changed files with 1,387 additions and 118,568 deletions.
43 changes: 29 additions & 14 deletions .github/workflows/release.yml
@@ -7,26 +7,41 @@ on:
jobs:
run:
runs-on: ubuntu-latest
env:
DTD: etc/WN-LMF-1.1.dtd

steps:
- uses: actions/checkout@v2
- name: Get release tag
run: |
tagname=${GITHUB_REF##*/}
echo "TAGNAME=$tagname" >> $GITHUB_ENV
echo "VERSION=${tagname#v}" >> $GITHUB_ENV
- name: Checkout
uses: actions/checkout@v2
with:
ref: ${{ env.TAGNAME }}

- name: Set up Python
uses: actions/setup-python@v2
with:
python-version: '3.x'
- name: Build Assets
python-version: '3.8'

- name: Install dependencies
run: |
tag_name="${GITHUB_REF##*/}"
sudo apt install xmlstarlet
./make-lmf.bash "${tag_name}"
- name: Upload
python3.8 -m pip install -r requirements.txt
- name: Build
run: |
./build.sh "$VERSION"
- name: Validate
run: |
tag_name="${GITHUB_REF##*/}"
for asset in ./release/*.xz; do
name=$( basename ${asset%%.tar.xz} )
label=$( grep "^${name}" ./release/index.tsv | cut -f3 )
lgcode=$( grep "^${name}" ./release/index.tsv | cut -f2 )
gh release upload "${tag_name}" "${asset}#${label} [${lgcode}]"
done
gh release upload "${tag_name}" "./index.toml#index.toml"
./validate.sh "$VERSION"
- name: Package and Publish
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: |
./package.sh --publish "$VERSION" "$TAGNAME"
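
The "Get release tag" step above relies on two shell parameter expansions. As a minimal sketch (the function names `tag_from_ref` and `version_from_tag` are hypothetical and not part of the repository, and `refs/tags/v1.4` is only an example ref), the derivation could be tried locally like this:

```shell
# Hypothetical helpers mirroring the expansions used in the workflow step.

# "${ref##*/}" removes the longest prefix ending in "/",
# so "refs/tags/v1.4" becomes "v1.4".
tag_from_ref() {
    local ref="$1"
    printf '%s\n' "${ref##*/}"
}

# "${tag#v}" removes one leading "v" if present, so "v1.4" becomes "1.4";
# a tag without the prefix passes through unchanged.
version_from_tag() {
    local tag="$1"
    printf '%s\n' "${tag#v}"
}

tagname=$(tag_from_ref "refs/tags/v1.4")
echo "TAGNAME=${tagname}"
echo "VERSION=$(version_from_tag "${tagname}")"
```

Because `##*/` keeps only the last path component, a tag name that itself contains a slash would be truncated to the part after the final slash; the sketch assumes plain tags such as `v1.4`.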
1 change: 1 addition & 0 deletions .gitignore
@@ -1,5 +1,6 @@
# files made when preparing a release
log/
build/
release/

# Emacs backups
37 changes: 37 additions & 0 deletions CITATION.cff
@@ -0,0 +1,37 @@
# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: OMW Data
message: >-
Please cite this dataset using the metadata from
'preferred-citation'.
type: dataset
authors:
- given-names: Francis
family-names: Bond
email: bond@ieee.org
orcid: 'https://orcid.org/0000-0003-4973-8068'
affiliation: Nanyang Technological University
- given-names: Michael Wayne
family-names: Goodman
email: goodman.m.w@gmail.com
orcid: 'https://orcid.org/0000-0002-2896-5141'
repository-code: 'https://github.com/bond-lab/omw-data/'
preferred-citation:
type: conference-paper
authors:
- given-names: Francis
family-names: Bond
email: bond@ieee.org
orcid: 'https://orcid.org/0000-0003-4973-8068'
affiliation: Nanyang Technological University
- family-names: Foster
given-names: Ryan
start: 1352 # First page number
end: 1362 # Last page number
conference:
name: "51st Annual Meeting of the Association for Computational Linguistics: ACL-2013"
title: "Linking and extending an open multilingual wordnet"
year: 2013
url: 'https://aclanthology.org/P13-1133/'
26 changes: 24 additions & 2 deletions README.md
@@ -10,17 +10,39 @@ The raw data (under *wns*) also has the automatically extracted data
for over 150 languages from Wiktionary and the Unicode Common Locale
Data Repository.

## Citation

If you use OMW please cite both the citation below, and the individual wordnets (citation data is included in each wordnet):

Francis Bond and Ryan Foster (2013)
[Linking and extending an open multilingual wordnet](http://aclweb.org/anthology/P/P13/P13-1133.pdf).
In *51st Annual Meeting of the Association for Computational Linguistics: ACL-2013*.
Sofia. 1352–1362


## Notes

The directory *wns* has the wordnet data from OMW 1.2 with some small fixes
* added a citation for the Icelandic wordnet
* added human readable citations in ``omw-citations.tab``
* added PWN 3.0 and 3.1 in OMW 2.0 format

By default the label is the name of the project. If the project has multiple wordnets, then the language is added in parentheses. E.g.:

If you use OMW please cite both the citation below, and the individual wordnets (citation data is included in each wordnet):
`label = "Multilingual Central Repository (Catalan)"`
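
The labelling rule can be illustrated with a small sketch. The function `make_label` and its wordnet-count argument are hypothetical, introduced only for illustration; the actual build scripts may derive labels differently.

```shell
# make_label PROJECT LANGUAGE COUNT
# COUNT is the number of wordnets the project provides (illustrative input).
make_label() {
    local project="$1"
    local language="$2"
    local count="$3"
    if [ "$count" -gt 1 ]; then
        # Several wordnets in one project: disambiguate by language.
        printf '%s (%s)\n' "$project" "$language"
    else
        # A single wordnet: the project name alone is the label.
        printf '%s\n' "$project"
    fi
}

# The count of 2 is an arbitrary illustrative value greater than 1.
make_label "Multilingual Central Repository" "Catalan" 2
```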

The package name (and id) for each wordnet is, by default, `omw-lg`,
with the following exceptions:

* ItalWordnet will be `omw-iwn` not `omw-it` (used by multiwordnet)
* COW will just be `omw-cmn` not `omw-cmn-Hans`
* WN derived from PWN 3.0 will be `omw-en`
* WN derived from PWN 3.1 will be `omw-en31`
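
A sketch of the naming rule and its exceptions follows. The function `package_id` and the short project keys (`iwn`, `cow`, `pwn30`, `pwn31`, `mcr`) are assumptions made for this example and are not identifiers taken from the repository.

```shell
# package_id KEY LANG
# KEY is a hypothetical short project key; LANG is the language code.
package_id() {
    local key="$1"
    local lang="$2"
    case "$key" in
        iwn)   echo "omw-iwn" ;;      # ItalWordnet, since omw-it is used by multiwordnet
        cow)   echo "omw-cmn" ;;      # COW drops the script subtag
        pwn30) echo "omw-en" ;;       # derived from PWN 3.0
        pwn31) echo "omw-en31" ;;     # derived from PWN 3.1
        *)     echo "omw-${lang}" ;;  # default: omw-<language code>
    esac
}

package_id iwn it
package_id cow cmn-Hans
package_id mcr ca
```

Keeping the exceptions in a single `case` statement makes the default visible as the fall-through branch, which matches how the rule is stated above.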

We thank the developers of all of the wordnets! More recent versions
are available for many of these.

Francis Bond and Ryan Foster (2013)
[Linking and extending an open multilingual wordnet](https://aclanthology.org/P13-1133/).
In *51st Annual Meeting of the Association for Computational Linguistics: ACL-2013*.
Sofia. 1352–1362

101 changes: 101 additions & 0 deletions build.sh
@@ -0,0 +1,101 @@
#!/bin/bash

###
### Build the English Wordnets based on Princeton WordNet and the
### other OMW wordnets.
###
### WordNet 3.0 has a loop in the verb taxonomy that is patched here.
### The built wordnets should not be tracked by Git.
###

if [ $# -ne 1 ]; then
echo "usage: build.sh VERSION"
exit 1
fi

VER=$1

# Configuration ########################################################

BUILD="build/omw-${VER}"

WN30_LABEL="OMW English Wordnet based on WordNet 3.0"
WN31_LABEL="OMW English Wordnet based on WordNet 3.1"
WN_CITATION="Christiane Fellbaum (1998, ed.) *WordNet: An Electronic Lexical Database*. MIT Press."
WN_LICENSE="https://wordnet.princeton.edu/license-and-commercial-use"
WN_EMAIL="bond@ieee.org"

mkdir -p "${BUILD}"
mkdir -p etc


# Auxiliary Files ######################################################

echo "Checking auxiliary files in etc/"

if [ ! -d etc/cili ]; then
git clone https://github.com/globalwordnet/cili.git etc/cili
fi

# WordNet 3.0: retrieve, unpack, patch, and build ######################

if [ ! -d etc/WordNet-3.0 ]; then
wget http://wordnetcode.princeton.edu/3.0/WordNet-3.0.tar.bz2 -O - | tar -C etc/ -xj
# Patch a simple loop between inhibit and restrain. The original line,
# abbreviated, has this:
# 02423762 41 v 03 inhibit ... @ 02422663 v 0000 ... ~ 02422663 v 0000 ...
sed -i '/^02423762 /{s/@ 02422663 /@ 00612841 /}' etc/WordNet-3.0/dict/data.verb
# NOTE: The above fix is also applied to the NLTK's distribution
# of the Princeton WordNet 3.0, so there is precedent. Please
# refrain from making any further changes to the data.
fi

## make the lexicon
mkdir -p "${BUILD}/omw-en"
python -m scripts.wndb2lmf \
etc/WordNet-3.0/dict/ \
"${BUILD}/omw-en/omw-en.xml" \
--id='omw-en' \
--version="${VER}" \
--label="${WN30_LABEL}" \
--language='en' \
--email="${WN_EMAIL}" \
--license="${WN_LICENSE}" \
--citation="${WN_CITATION}" \
--ili-map=etc/cili/ili-map-pwn30.tab
# below: cat instead of cp to reset permissions
cat etc/WordNet-3.0/LICENSE > "${BUILD}/omw-en/LICENSE"
cat wns/en30/README.md > "${BUILD}/omw-en/README.md"
cat wns/en30/citation.bib > "${BUILD}/omw-en/citation.bib"


# WordNet 3.1: retrieve, unpack, and build #############################

if [ ! -d etc/WordNet-3.1 ]; then
mkdir etc/WordNet-3.1 # WN3.1 is only distributed with the dict/ subdirectory
wget http://wordnetcode.princeton.edu/wn3.1.dict.tar.gz -O - | tar -C etc/WordNet-3.1 -xz
# NOTE: Do not make changes to the data unless necessary for a
# well-formed and loadable WN-LMF document. Errors are meant to be
# fixed in later versions.
fi

mkdir -p "${BUILD}/omw-en31"
python -m scripts.wndb2lmf \
etc/WordNet-3.1/dict/ \
"${BUILD}/omw-en31/omw-en31.xml" \
--id='omw-en31' \
--version="${VER}" \
--label="${WN31_LABEL}" \
--language='en' \
--email="${WN_EMAIL}" \
--license="${WN_LICENSE}" \
--citation="${WN_CITATION}" \
--ili-map=etc/cili/ili-map-pwn31.tab
cat wns/en31/LICENSE > "${BUILD}/omw-en31/LICENSE"
cat wns/en31/README.md > "${BUILD}/omw-en31/README.md"
cat wns/en31/citation.bib > "${BUILD}/omw-en31/citation.bib"


# Other OMW Lexicons ###################################################

python -m scripts.build --version="${VER}"
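
The WordNet 3.0 loop patch in the script above can be tried in isolation. The sketch below is an illustration under stated assumptions: `patch_verb_loop` is a hypothetical wrapper, the input lines are shortened stand-ins rather than real `data.verb` records, and the `sed` expression is written without the braces used in `build.sh`, which should be equivalent for a single command.

```shell
# Read data.verb-style lines on stdin and rewrite the hypernym pointer
# ("@") of synset 02423762 from 02422663 to 00612841.
patch_verb_loop() {
    sed '/^02423762 /s/@ 02422663 /@ 00612841 /'
}

# Shortened stand-in lines, not actual WordNet records.
printf '%s\n' \
    '02423762 41 v 03 inhibit @ 02422663 v 0000 ~ 02422663 v 0000' \
    '00000001 41 v 01 example @ 02422663 v 0000' \
  | patch_verb_loop
```

Only the record whose offset is `02423762` is touched, and only its `@` pointer; the `~` pointer on the same line and the pointers on other records stay as they are. Running the patch a second time changes nothing, because the pattern no longer matches.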
17 changes: 17 additions & 0 deletions etc/README.md
@@ -0,0 +1,17 @@
# Auxiliary Files for OMW Data Conversion

This directory is for additional files necessary for converting or
analyzing the OMW data. Only files that are not trivial to retrieve
are included in the repository, while the others will be retrieved
when building the lexicons.

* **Included**:
- `wn-core-ili.tab` the core ~5000 concepts derived from the [Princeton WordNet's core word senses](https://wordnet.princeton.edu/download/standoff-files), used for analysis of the lexicons
* Retrieved from <https://github.com/globalwordnet/cili/>:
- `cili/ili-map-pwn30.tab` for mapping WordNet 3.0 synsets to ILIs
- `cili/ili-map-pwn31.tab` for mapping WordNet 3.1 synsets to ILIs
* Retrieved from <https://github.com/globalwordnet/schemas>:
- `WN-LMF-1.1.dtd` for validating the generated XML files
* Retrieved from <http://wordnetcode.princeton.edu/>:
- `WordNet-3.0` WNDB data files for creating the `omw-en` lexicon
- `WordNet-3.1` WNDB data files for creating the `omw-en31` lexicon
File renamed without changes.
