Rework release scripts for v1.4 #16

Merged
merged 30 commits into from
Nov 5, 2021
2468d06
Rework release scripts for v1.4
goodmami Oct 27, 2021
f925a36
update labels and urls
fcbond Oct 29, 2021
bc986be
documentation for wordnet 3.0
fcbond Oct 29, 2021
7ad4bf7
Fix a typo and a redundant url
goodmami Oct 29, 2021
d3d325b
Fix a bug wtih lexfile lookup
goodmami Oct 29, 2021
fa7371e
Make wndb2lmf progress reporting more clear
goodmami Oct 29, 2021
6e2d09d
Use hex-codepoints for chars without known escape
goodmami Oct 29, 2021
1fa618a
Add build.sh and tracked etc/ files
goodmami Oct 29, 2021
72d8ad5
Fix path to CILI files in tsv2lmf.py
goodmami Oct 29, 2021
4b31d2d
Use proper string formatting
goodmami Oct 29, 2021
36bde23
Bump Wn version to 0.8.1
goodmami Oct 29, 2021
5e35694
changed Albanian language code to sq; closes #18
fcbond Oct 30, 2021
53e03de
renamed packages to omw-LG; closes #15
fcbond Oct 30, 2021
1445441
Made the README more readable
fcbond Oct 30, 2021
5999d07
harmonized build directory names; closes #15 again
fcbond Nov 3, 2021
d256135
First stab at a CITATION.cff
fcbond Nov 3, 2021
f848ba0
Code cleanup
goodmami Nov 3, 2021
3c8a38a
Fix #17: separate sense_key and sense ID for omw-en
goodmami Nov 3, 2021
0d06a8f
Remove default url in requires
goodmami Nov 3, 2021
9e59716
Copy extra files to LMF package, if present
goodmami Nov 3, 2021
5c06da1
Change WN3.0 build dir from omw-en30 to omw-en
goodmami Nov 3, 2021
adb1627
Add LICENSE and citation.bib files to omw-en* pkgs
goodmami Nov 3, 2021
745da59
Update release workflow and build script
goodmami Nov 3, 2021
7b2ed2b
Attempt fix of workflow file
goodmami Nov 4, 2021
fbb6d99
Merge branch 'main' into omw-1.4
goodmami Nov 4, 2021
bfc90c3
Bump wn dependency version to 0.8.3
goodmami Nov 4, 2021
94c36c3
Remove stored pwn*.xz files; use stored metadata
goodmami Nov 4, 2021
9bc4e14
Remove old make-lmf.bash script
goodmami Nov 4, 2021
67eebd4
Move most shell commands from workflow to scripts
goodmami Nov 4, 2021
76f60fb
Remove workflow_dispatch action option
goodmami Nov 5, 2021
43 changes: 29 additions & 14 deletions .github/workflows/release.yml
@@ -7,26 +7,41 @@ on:
 jobs:
   run:
     runs-on: ubuntu-latest
+    env:
+      DTD: etc/WN-LMF-1.1.dtd
+
     steps:
-    - uses: actions/checkout@v2
+    - name: Get release tag
+      run: |
+        tagname=${GITHUB_REF##*/}
+        echo "TAGNAME=$tagname" >> $GITHUB_ENV
+        echo "VERSION=${tagname#v}" >> $GITHUB_ENV
+
+    - name: Checkout
+      uses: actions/checkout@v2
+      with:
+        ref: ${{ env.TAGNAME }}
+
     - name: Set up Python
       uses: actions/setup-python@v2
       with:
-        python-version: '3.x'
-    - name: Build Assets
+        python-version: '3.8'
+
+    - name: Install dependencies
       run: |
-        tag_name="${GITHUB_REF##*/}"
         sudo apt install xmlstarlet
-        ./make-lmf.bash "${tag_name}"
-    - name: Upload
+        python3.8 -m pip install -r requirements.txt
+
+    - name: Build
+      run: |
+        ./build.sh "$VERSION"
+
+    - name: Validate
       run: |
-        tag_name="${GITHUB_REF##*/}"
-        for asset in ./release/*.xz; do
-          name=$( basename ${asset%%.tar.xz} )
-          label=$( grep "^${name}" ./release/index.tsv | cut -f3 )
-          lgcode=$( grep "^${name}" ./release/index.tsv | cut -f2 )
-          gh release upload "${tag_name}" "${asset}#${label} [${lgcode}]"
-        done
-        gh release upload "${tag_name}" "./index.toml#index.toml"
+        ./validate.sh "$VERSION"
+
+    - name: Package and Publish
       env:
         GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+      run: |
+        ./package.sh --publish "$VERSION" "$TAGNAME"
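The "Get release tag" step relies on two shell parameter expansions: `${GITHUB_REF##*/}` strips everything up to the last `/`, and `${tagname#v}` strips one leading `v`. A minimal Python sketch of the same logic, for readers less familiar with the shell syntax, could look like this. The function name `parse_release_ref` is hypothetical and not part of this repository.

```python
from typing import Tuple


def parse_release_ref(github_ref: str) -> Tuple[str, str]:
    """Return (tagname, version) the way the workflow step derives them.

    Hypothetical helper for illustration; the workflow itself uses
    shell parameter expansion, not Python.
    """
    # ${GITHUB_REF##*/}: remove the longest prefix that ends in "/"
    tagname = github_ref.rsplit("/", 1)[-1]
    # ${tagname#v}: remove a single leading "v", if there is one
    version = tagname[1:] if tagname.startswith("v") else tagname
    return tagname, version
```

For a ref such as `refs/tags/v1.4`, this yields the tag `v1.4` and the version `1.4`, which correspond to the `TAGNAME` and `VERSION` values passed to `build.sh`, `validate.sh`, and `package.sh`.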
1 change: 1 addition & 0 deletions .gitignore
@@ -1,5 +1,6 @@
# files made when preparing a release
log/
build/
release/

# Emacs backups
37 changes: 37 additions & 0 deletions CITATION.cff
@@ -0,0 +1,37 @@
# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: OMW Data
message: >-
  Please cite this dataset using the metadata from
  'preferred-citation'.
type: dataset
authors:
  - given-names: Francis
    family-names: Bond
    email: bond@ieee.org
    orcid: 'https://orcid.org/0000-0003-4973-8068'
    affiliation: Nanyang Technological University
  - given-names: Michael Wayne
    family-names: Goodman
    email: goodman.m.w@gmail.com
    orcid: 'https://orcid.org/0000-0002-2896-5141'
repository-code: 'https://github.com/bond-lab/omw-data/'
preferred-citation:
  type: conference-paper
  authors:
    - given-names: Francis
      family-names: Bond
      email: bond@ieee.org
      orcid: 'https://orcid.org/0000-0003-4973-8068'
      affiliation: Nanyang Technological University
    - family-names: Foster
      given-names: Ryan
  start: 1352 # First page number
  end: 1362 # Last page number
  conference:
    name: "51st Annual Meeting of the Association for Computational Linguistics: ACL-2013"
  title: "Linking and extending an open multilingual wordnet"
  year: 2013
  url: 'https://aclanthology.org/P13-1133/'
26 changes: 24 additions & 2 deletions README.md
@@ -10,17 +10,39 @@ The raw data (under *wns*) also has the automatically extracted data
for over 150 languages from Wiktionary and the Unicode Common Locale
Data Repository.

## Citation

If you use OMW, please cite both the paper below and the individual wordnets (citation data is included in each wordnet):

Francis Bond and Ryan Foster (2013)
[Linking and extending an open multilingual wordnet](http://aclweb.org/anthology/P/P13/P13-1133.pdf).
In *51st Annual Meeting of the Association for Computational Linguistics: ACL-2013*.
Sofia. 1352–1362


## Notes

The directory *wns* has the wordnet data from OMW 1.2 with some small fixes
* added a citation for the Icelandic wordnet
* added human readable citations in ``omw-citations.tab``
* added PWN 3.0 and 3.1 in OMW 2.0 format

By default the label is the name of the project. If the project has multiple wordnets, then the language is added in parentheses. E.g.:

`label = "Multilingual Central Repository (Catalan)"`

The package name (and id) for each wordnet is, by default, `omw-lg`,
with the following exceptions:

* ItalWordnet will be `omw-iwn` not `omw-it` (used by multiwordnet)
* COW will just be `omw-cmn` not `omw-cmn-Hans`
* WN derived from PWN 3.0 will be `omw-en`
* WN derived from PWN 3.1 will be `omw-en31`
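
The label and package-name rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the project keys (`iwn`, `cow`, `pwn30`, `pwn31`) are hypothetical, and only the resulting ids and the label format come from the README text.

```python
from typing import Optional

# Hypothetical project keys; only the ids on the right come from the README.
PACKAGE_ID_EXCEPTIONS = {
    "iwn": "omw-iwn",     # ItalWordnet; omw-it is used by MultiWordNet
    "cow": "omw-cmn",     # COW; not omw-cmn-Hans
    "pwn30": "omw-en",    # derived from PWN 3.0
    "pwn31": "omw-en31",  # derived from PWN 3.1
}


def package_id(project: str, lang: str) -> str:
    """Default to omw-<language code> unless the project is an exception."""
    return PACKAGE_ID_EXCEPTIONS.get(project, f"omw-{lang}")


def label(project_name: str, language_name: Optional[str] = None) -> str:
    """Project name, with the language in parentheses when one is given."""
    if language_name:
        return f"{project_name} ({language_name})"
    return project_name
```

Under these assumptions, `label("Multilingual Central Repository", "Catalan")` gives the label shown in the example above, and a project with no exception entry and language code `sq` maps to `omw-sq`.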

We thank the developers of all of the wordnets! More recent versions
are available for many of these.

Francis Bond and Ryan Foster (2013)
[Linking and extending an open multilingual wordnet](https://aclanthology.org/P13-1133/).
In *51st Annual Meeting of the Association for Computational Linguistics: ACL-2013*.
Sofia. 1352–1362

101 changes: 101 additions & 0 deletions build.sh
@@ -0,0 +1,101 @@
#!/bin/bash

###
### Build the English Wordnets based on Princeton WordNet and the
### other OMW wordnets.
###
### WordNet 3.0 has a loop in the verb taxonomy that is patched here.
### The built wordnets should not be tracked by Git.
###

if [ $# -ne 1 ]; then
    echo "usage: build.sh VERSION"
    exit 1
fi

VER=$1

# Configuration ########################################################

BUILD="build/omw-${VER}"

WN30_LABEL="OMW English Wordnet based on WordNet 3.0"
WN31_LABEL="OMW English Wordnet based on WordNet 3.1"
WN_CITATION="Christiane Fellbaum (1998, ed.) *WordNet: An Electronic Lexical Database*. MIT Press."
WN_LICENSE="https://wordnet.princeton.edu/license-and-commercial-use"
WN_EMAIL="bond@ieee.org"

mkdir -p "${BUILD}"
mkdir -p etc


# Auxiliary Files ######################################################

echo "Checking auxiliary files in etc/"

if [ ! -d etc/cili ]; then
    git clone https://github.com/globalwordnet/cili.git etc/cili
fi

# WordNet 3.0: retrieve, unpack, patch, and build ######################

if [ ! -d etc/WordNet-3.0 ]; then
    wget http://wordnetcode.princeton.edu/3.0/WordNet-3.0.tar.bz2 -O - | tar -C etc/ -xj
    # Patch a simple loop between inhibit and restrain. The original line,
    # abbreviated, has this:
    #   02423762 41 v 03 inhibit ... @ 02422663 v 0000 ... ~ 02422663 v 0000 ...
    sed -i '/^02423762 /{s/@ 02422663 /@ 00612841 /}' etc/WordNet-3.0/dict/data.verb
    # NOTE: The above fix is also applied to the NLTK's distribution
    #       of the Princeton WordNet 3.0, so there is precedent. Please
    #       refrain from making any further changes to the data.
fi

## make the lexicon
mkdir -p "${BUILD}/omw-en"
python -m scripts.wndb2lmf \
       etc/WordNet-3.0/dict/ \
       "${BUILD}/omw-en/omw-en.xml" \
       --id='omw-en' \
       --version="${VER}" \
       --label="${WN30_LABEL}" \
       --language='en' \
       --email="${WN_EMAIL}" \
       --license="${WN_LICENSE}" \
       --citation="${WN_CITATION}" \
       --ili-map=etc/cili/ili-map-pwn30.tab
# below: cat instead of cp to reset permissions
cat etc/WordNet-3.0/LICENSE > "${BUILD}/omw-en/LICENSE"
cat wns/en30/README.md > "${BUILD}/omw-en/README.md"
cat wns/en30/citation.bib > "${BUILD}/omw-en/citation.bib"


# WordNet 3.1: retrieve, unpack, and build #############################

if [ ! -d etc/WordNet-3.1 ]; then
    mkdir etc/WordNet-3.1  # WN3.1 is only distributed with the dict/ subdirectory
    wget http://wordnetcode.princeton.edu/wn3.1.dict.tar.gz -O - | tar -C etc/WordNet-3.1 -xz
    # NOTE: Do not make changes to the data unless necessary for a
    #       well-formed and loadable WN-LMF document. Errors are meant to be
    #       fixed in later versions.
fi

mkdir -p "${BUILD}/omw-en31"
python -m scripts.wndb2lmf \
       etc/WordNet-3.1/dict/ \
       "${BUILD}/omw-en31/omw-en31.xml" \
       --id='omw-en31' \
       --version="${VER}" \
       --label="${WN31_LABEL}" \
       --language='en' \
       --email="${WN_EMAIL}" \
       --license="${WN_LICENSE}" \
       --citation="${WN_CITATION}" \
       --ili-map=etc/cili/ili-map-pwn31.tab
cat wns/en31/LICENSE > "${BUILD}/omw-en31/LICENSE"
cat wns/en31/README.md > "${BUILD}/omw-en31/README.md"
cat wns/en31/citation.bib > "${BUILD}/omw-en31/citation.bib"


# Other OMW Lexicons ###################################################

python -m scripts.build --version="${VER}"
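
The `sed` expression in the WordNet 3.0 section of `build.sh` only touches the line that starts with `02423762 ` and, on that line, rewrites the first `@ 02422663 ` pointer to `@ 00612841 `. A rough Python equivalent of that one substitution is sketched below. The function name `patch_verb_loop` is hypothetical; the script itself uses `sed -i` on `etc/WordNet-3.0/dict/data.verb`.

```python
def patch_verb_loop(line: str) -> str:
    """Apply the inhibit/restrain hypernym fix to one line of data.verb.

    Mirrors: sed '/^02423762 /{s/@ 02422663 /@ 00612841 /}'
    Hypothetical helper for illustration only.
    """
    if line.startswith("02423762 "):
        # sed's s/// without the g flag replaces the first match only
        return line.replace("@ 02422663 ", "@ 00612841 ", 1)
    return line
```

Because the pattern includes the `@` pointer symbol, the `~ 02422663` pointer on the same line is left alone, and lines for other synsets pass through unchanged.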
17 changes: 17 additions & 0 deletions etc/README.md
@@ -0,0 +1,17 @@
# Auxiliary Files for OMW Data Conversion

This directory is for additional files necessary for converting or
analyzing the OMW data. Only files that are not trivial to retrieve
are included in the repository, while the others will be retrieved
when building the lexicons.

* **Included**:
  - `wn-core-ili.tab` the core ~5000 concepts derived from the [Princeton WordNet's core word senses](https://wordnet.princeton.edu/download/standoff-files), used for analysis of the lexicons
* Retrieved from <https://github.com/globalwordnet/cili/>:
  - `cili/ili-map-pwn30.tab` for mapping WordNet 3.0 synsets to ILIs
  - `cili/ili-map-pwn31.tab` for mapping WordNet 3.1 synsets to ILIs
* Retrieved from <https://github.com/globalwordnet/schemas>:
  - `WN-LMF-1.1.dtd` for validating the generated XML files
* Retrieved from <http://wordnetcode.princeton.edu/>:
  - `WordNet-3.0` WNDB data files for creating the `wn30` lexicon
  - `WordNet-3.1` WNDB data files for creating the `wn31` lexicon
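
As a quick sanity check before a build, the list above can be turned into a small script that reports which auxiliary files are not yet present. This is a sketch under assumptions: the function name `missing_aux_files` is hypothetical, and the relative paths are taken from this README, `build.sh`, and the `DTD` variable in the release workflow.

```python
from pathlib import Path
from typing import List

# Paths relative to etc/, as named in this README and in build.sh.
EXPECTED_AUX_FILES = [
    "wn-core-ili.tab",
    "cili/ili-map-pwn30.tab",
    "cili/ili-map-pwn31.tab",
    "WN-LMF-1.1.dtd",
    "WordNet-3.0",
    "WordNet-3.1",
]


def missing_aux_files(etc_dir: str = "etc") -> List[str]:
    """Return the expected auxiliary files or directories not found in etc_dir."""
    root = Path(etc_dir)
    return [name for name in EXPECTED_AUX_FILES if not (root / name).exists()]
```

Calling `missing_aux_files()` from the repository root returns an empty list once everything has been retrieved; any names it returns are the ones the build would still need to fetch.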
File renamed without changes.