Commit
towards plazi patch; related to globalbioticinteractions/nomer#23
jhpoelen committed Oct 2, 2020
1 parent 07fd4f6, commit 7ad1619
Showing 2 changed files with 115 additions and 0 deletions.
@@ -0,0 +1,35 @@
Global Biotic Interactions: Taxon Graph Patch 20201001-01

2020-10-01

Abstract / Introduction

Plazi.org indexes existing taxonomic treatments and makes them machine readable. This patch adds links to Plazi taxonomic treatments (e.g., http://treatment.plazi.org/id/873F269E609FF24F40BEA54C44321A15), their original publications (e.g., http://dx.doi.org/10.3897/BDJ.3.e6313), and the associated taxon concepts (e.g., http://taxon-concept.plazi.org/id/Animalia/Lethe_appalachia_Chermock_1947).

Methods

Nomer v0.1.17 [1] is used in the provided create_patch.sh script to
link names from taxonMap.tsv.gz in GloBI Taxon Graph v0.3.25 [2] to
Plazi treatments and taxon concepts in the Plazi Treatment RDF Archive [3].

To apply the patch, decompress the original taxonCache/taxonMap files and
the .patch files, then run the "patch" program against the originals on
any Linux-like system.

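As a rough illustration of the apply step described above, the following sketch builds a toy gzipped TSV file and a toy patch, then applies the patch with the "patch" program. The rows, the temporary directory, and the expected.tsv name are made up for the example; only the decompress-then-patch sequence mirrors what create_patch.sh does with the real files.

```shell
#!/bin/sh
# Sketch only: toy data, not real Taxon Graph content.
set -e

workdir=$(mktemp -d)

# stand-in for the original (gzipped) file
printf 'id\tname\nA:1\talpha\n' | gzip > "$workdir/taxonMap.tsv.gz"

# stand-in for the desired result, used here only to produce a patch
printf 'id\tname\nA:1\talpha\nB:2\tbeta\n' > "$workdir/expected.tsv"

# decompress the original so that diff and patch can work on plain text
gunzip -c "$workdir/taxonMap.tsv.gz" > "$workdir/taxonMap.tsv"

# create a gzipped patch in diff's default output format
diff "$workdir/taxonMap.tsv" "$workdir/expected.tsv" \
  | gzip > "$workdir/taxonMap.tsv.patch.gz"

# apply: feed the decompressed patch to "patch", which edits the file in place
gunzip -c "$workdir/taxonMap.tsv.patch.gz" | patch "$workdir/taxonMap.tsv"

# the patched copy should now be identical to the expected result
cmp "$workdir/taxonMap.tsv" "$workdir/expected.tsv" \
  && echo "patched copy matches expected result"
```

With the real files, the same two apply commands would be run against the decompressed taxonMap and taxonCache files and their respective .patch.gz files.
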
Patch files taxonMap.tsv.patch.gz and taxonCache.tsv.patch.gz were
created using create_patch.sh in the period 2020-10-01/2020-10-02.

Result/Discussion

The resulting patch adds Plazi links to GloBI Taxon Graph v0.3.25.

References

[1] Poelen, Jorrit H. (2020). globalbioticinteractions/nomer
(Version 0.1.17). Zenodo. http://doi.org/10.5281/zenodo.4062515

[2] Poelen, Jorrit H. (2020). Global Biotic Interactions: Taxon Graph
(Version 0.3.25) [Data set]. Zenodo. http://doi.org/10.5281/zenodo.3378125

[3] Plazi. (2020). Plazi Treatment RDF Archive (Version 0.1)
[Data set]. Zenodo. http://doi.org/10.5281/zenodo.4062537

@@ -0,0 +1,80 @@
#!/bin/bash
#
# Creates patch files that add Plazi links to the GloBI Taxon Graph.
#

# print each command and stop at the first failing one
set -xe

mkdir -p input output

# get the Plazi Treatment RDF Archive (-L follows HTTP redirects)
curl -L "https://zenodo.org/record/4062537/files/plazi-treatments-rdf.zip" > input/plazi-treatments-rdf.zip

# get GloBI Taxon Graph v0.3.25
curl -L "https://zenodo.org/record/3378125/files/taxonMap.tsv.gz" > input/taxonMap.tsv.gz
curl -L "https://zenodo.org/record/3378125/files/taxonCache.tsv.gz" > input/taxonCache.tsv.gz

# get nomer v0.1.17 (GitHub release downloads are served via a redirect)
curl -L "https://github.com/globalbioticinteractions/nomer/releases/download/0.1.17/nomer.jar" > input/nomer.jar

# link names to Plazi
echo 'nomer.schema.input=[{"column":2,"type":"externalId"},{"column": 3,"type":"name"}]' > input/nomer.properties
echo "nomer.plazi.treatments.archive=file://$PWD/input/plazi-treatments-rdf.zip" >> input/nomer.properties

# keep columns 1, 2 and 4 (with an empty third column), drop the header line,
# de-duplicate, then let nomer append Plazi matches
cat input/taxonMap.tsv.gz \
  | gunzip \
  | awk -F '\t' '{ print $1 "\t" $2 "\t\t" $4 }' \
  | tail -n+2 \
  | sort \
  | uniq \
  | java -jar input/nomer.jar append -p input/nomer.properties plazi \
  | gzip \
  > plazi-matches.tsv.gz

# taxon map additions: drop rows containing NONE, keep columns 1, 2, 6 and 7
zcat plazi-matches.tsv.gz \
  | grep -v NONE \
  | cut -f1,2,6,7 \
  | gzip \
  > output/taxonMapPlazi.tsv.gz

# taxon cache additions: drop rows containing NONE, keep columns 6 and up
zcat plazi-matches.tsv.gz \
  | grep -v NONE \
  | cut -f6- \
  | gzip \
  > output/taxonCachePlazi.tsv.gz

# new taxon cache: header line first (decompress before taking line 1) ...
cat input/taxonCache.tsv.gz \
  | gunzip \
  | head -n1 \
  | gzip \
  > output/taxonCache1.tsv.gz

# ... then the original rows plus the Plazi rows, sorted and de-duplicated
cat input/taxonCache.tsv.gz output/taxonCachePlazi.tsv.gz \
  | gunzip \
  | tail -n+2 \
  | sort \
  | uniq \
  | gzip \
  >> output/taxonCache1.tsv.gz

# new taxon map: same approach
cat input/taxonMap.tsv.gz \
  | gunzip \
  | head -n1 \
  | gzip \
  > output/taxonMap1.tsv.gz

cat input/taxonMap.tsv.gz output/taxonMapPlazi.tsv.gz \
  | gunzip \
  | tail -n+2 \
  | sort \
  | uniq \
  | gzip \
  >> output/taxonMap1.tsv.gz

# create the patches: differences between the original and the new files
diff <(cat input/taxonCache.tsv.gz | gunzip) <(cat output/taxonCache1.tsv.gz | gunzip) | gzip > output/taxonCache.tsv.patch.gz
diff <(cat input/taxonMap.tsv.gz | gunzip) <(cat output/taxonMap1.tsv.gz | gunzip) | gzip > output/taxonMap.tsv.patch.gz

# apply the taxon cache patch to a decompressed copy of the original (-b keeps a backup)
zcat input/taxonCache.tsv.gz > output/taxonCacheToBePatched.tsv
zcat output/taxonCache.tsv.patch.gz | patch -b output/taxonCacheToBePatched.tsv
cat output/taxonCacheToBePatched.tsv | gzip > output/taxonCachePatched.tsv.gz

# apply the taxon map patch the same way
zcat input/taxonMap.tsv.gz > output/taxonMapToBePatched.tsv
zcat output/taxonMap.tsv.patch.gz | patch -b output/taxonMapToBePatched.tsv
cat output/taxonMapToBePatched.tsv | gzip > output/taxonMapPatched.tsv.gz
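
The merge steps in the script (header line first, then all rows sorted and de-duplicated) can be sketched on toy data as follows. The file names and rows are hypothetical; the sketch only assumes, as the script does, that the first line of the first file is a header and that the file with additional rows has none.

```shell
#!/bin/sh
# Sketch only: toy data, hypothetical file names.
set -e

workdir=$(mktemp -d)

# base file with a header line, and extra rows without one
printf 'id\tname\nB:2\tbeta\nA:1\talpha\n' | gzip > "$workdir/base.tsv.gz"
printf 'A:1\talpha\nC:3\tgamma\n' | gzip > "$workdir/extra.tsv.gz"

# header: decompress first, then take line 1
gunzip -c "$workdir/base.tsv.gz" | head -n1 > "$workdir/merged.tsv"

# body: concatenated gzip files decompress as a single stream;
# tail -n+2 drops the header of the first file
cat "$workdir/base.tsv.gz" "$workdir/extra.tsv.gz" \
  | gunzip \
  | tail -n+2 \
  | sort \
  | uniq \
  >> "$workdir/merged.tsv"

cat "$workdir/merged.tsv"
```

The result holds the header followed by three unique rows (A:1, B:2, C:3). Taking the header from the compressed bytes directly would not work, which is why the decompression comes before head.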