towards plazi patch; related to globalbioticinteractions/nomer#23
jhpoelen committed Oct 2, 2020
1 parent 07fd4f6 commit 7ad1619
Showing 2 changed files with 115 additions and 0 deletions.
35 changes: 35 additions & 0 deletions 20201001-01/README
@@ -0,0 +1,35 @@
Global Biotic Interactions: Taxon Graph Patch 20201001-01

2020-10-01

Abstract / Introduction

Plazi.org indexes existing taxonomic treatments and makes them machine readable. This patch adds links to Plazi taxonomic treatments (e.g., http://treatment.plazi.org/id/873F269E609FF24F40BEA54C44321A15), their original publications (e.g., http://dx.doi.org/10.3897/BDJ.3.e6313) and the associated taxon concepts (e.g., http://taxon-concept.plazi.org/id/Animalia/Lethe_appalachia_Chermock_1947).

Methods

Nomer v0.1.17 [1] is used in the provided create_patch.sh script to
link the names in taxonMap.tsv.gz of GloBI Taxon Graph v0.3.25 [2] to
Plazi treatments and taxon concepts in the Plazi Treatment RDF Archive [3].

To apply the patch, decompress the .patch.gz files and run the "patch"
program against the uncompressed original taxonCache/taxonMap files on
any Linux-like system.
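
The step above can be sketched as a small helper. This is a minimal
sketch, not part of the published patch: the function name
apply_gz_patch and the output file name are hypothetical, and it
assumes that gunzip, gzip, patch and mktemp are available.

```shell
#!/bin/bash
# Hypothetical helper (not part of create_patch.sh): apply a gzipped
# diff to a gzipped TSV file and write the result as a new gzipped TSV.
apply_gz_patch() {
  local original_gz="$1" patch_gz="$2" patched_gz="$3"
  local work
  work="$(mktemp)"
  # "patch" works on plain text, so decompress the original first
  gunzip -c "$original_gz" > "$work"
  # read the decompressed diff from stdin and patch the work file in place
  gunzip -c "$patch_gz" | patch "$work"
  # recompress the patched result and clean up
  gzip -c "$work" > "$patched_gz"
  rm -f "$work"
}

# Example call (the output file name is an assumption):
# apply_gz_patch taxonMap.tsv.gz taxonMap.tsv.patch.gz taxonMapPatched.tsv.gz
```

The same call with the taxonCache files patches taxonCache.tsv.gz. The
original .gz file is left untouched because the patch is applied to a
temporary copy.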

Patch files taxonMap.tsv.patch.gz and taxonCache.tsv.patch.gz were
created using create_patch.sh in the period 2020-10-01/2020-10-02.

Result/Discussion

The resulting patch adds Plazi links to GloBI Taxon Graph v0.3.25.
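
One rough way to gauge how much a patch adds is to count the rows it
inserts. The sketch below assumes the patch is in the default ("normal")
diff format, where inserted lines start with "> ". The function name
count_added_rows is hypothetical. Because the patched files are sorted,
the count may include rows that were only moved, so treat it as an
upper bound.

```shell
#!/bin/bash
# Hypothetical helper: count the lines a gzipped normal-format diff inserts.
count_added_rows() {
  local patch_gz="$1"
  # grep -c exits non-zero when the count is 0; "|| true" tolerates that
  gunzip -c "$patch_gz" | grep -c '^> ' || true
}

# Example call:
# count_added_rows taxonMap.tsv.patch.gz
```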

References

[1] Poelen, Jorrit H. (2020). globalbioticinteractions/nomer
(Version 0.1.17). Zenodo. http://doi.org/10.5281/zenodo.4062515

[2] Poelen, Jorrit H. (2020). Global Biotic Interactions: Taxon Graph
(Version 0.3.25) [Data set]. Zenodo. http://doi.org/10.5281/zenodo.3378125

[3] Plazi. (2020). Plazi Treatment RDF Archive (Version 0.1)
[Data set]. Zenodo. http://doi.org/10.5281/zenodo.4062537

80 changes: 80 additions & 0 deletions 20201001-01/create_patch.sh
@@ -0,0 +1,80 @@
#!/bin/bash
#
#

set -xe

mkdir -p input output

# get the Plazi Treatment RDF Archive (-L follows HTTP redirects)
curl -L "https://zenodo.org/record/4062537/files/plazi-treatments-rdf.zip" > input/plazi-treatments-rdf.zip

# get GloBI Taxon Graph v0.3.25
curl -L https://zenodo.org/record/3378125/files/taxonMap.tsv.gz > input/taxonMap.tsv.gz
curl -L https://zenodo.org/record/3378125/files/taxonCache.tsv.gz > input/taxonCache.tsv.gz

# get nomer v0.1.17 (GitHub release downloads redirect, so -L is needed)
curl -L "https://github.com/globalbioticinteractions/nomer/releases/download/0.1.17/nomer.jar" > input/nomer.jar

# link names to Plazi
echo 'nomer.schema.input=[{"column":2,"type":"externalId"},{"column": 3,"type":"name"}]' > input/nomer.properties
echo "nomer.plazi.treatments.archive=file://$PWD/input/plazi-treatments-rdf.zip" >> input/nomer.properties

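# keep columns 1, 2 and 4 (column 3 left empty), drop the header line,
# deduplicate, then let nomer append Plazi matches
cat input/taxonMap.tsv.gz\
| gunzip\
| awk -F '\t' '{ print $1 "\t" $2 "\t\t" $4 }'\
| tail -n+2\
| sort\
| uniq\
| java -jar input/nomer.jar append -p input/nomer.properties plazi\
| gzip\
> plazi-matches.tsv.gz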

# keep matched rows only; select the taxonMap columns
zcat plazi-matches.tsv.gz\
| grep -v NONE\
| cut -f1,2,6,7\
| gzip\
> output/taxonMapPlazi.tsv.gz

# keep matched rows only; select the taxonCache columns
zcat plazi-matches.tsv.gz\
| grep -v NONE\
| cut -f6-\
| gzip\
> output/taxonCachePlazi.tsv.gz

# start the new taxonCache with the (decompressed) header line of the original
cat input/taxonCache.tsv.gz\
| gunzip\
| head -n1\
| gzip\
> output/taxonCache1.tsv.gz

cat input/taxonCache.tsv.gz output/taxonCachePlazi.tsv.gz\
| gunzip\
| tail -n+2\
| sort\
| uniq\
| gzip\
>> output/taxonCache1.tsv.gz

# start the new taxonMap with the (decompressed) header line of the original
cat input/taxonMap.tsv.gz\
| gunzip\
| head -n1\
| gzip\
> output/taxonMap1.tsv.gz

cat input/taxonMap.tsv.gz output/taxonMapPlazi.tsv.gz\
| gunzip\
| tail -n+2\
| sort\
| uniq\
| gzip\
>> output/taxonMap1.tsv.gz

diff <(cat input/taxonCache.tsv.gz | gunzip) <(cat output/taxonCache1.tsv.gz| gunzip) | gzip > output/taxonCache.tsv.patch.gz
diff <(cat input/taxonMap.tsv.gz | gunzip) <(cat output/taxonMap1.tsv.gz| gunzip) | gzip > output/taxonMap.tsv.patch.gz

zcat input/taxonCache.tsv.gz > output/taxonCacheToBePatched.tsv
zcat output/taxonCache.tsv.patch.gz | patch -b output/taxonCacheToBePatched.tsv
cat output/taxonCacheToBePatched.tsv | gzip > output/taxonCachePatched.tsv.gz

zcat input/taxonMap.tsv.gz > output/taxonMapToBePatched.tsv
zcat output/taxonMap.tsv.patch.gz | patch -b output/taxonMapToBePatched.tsv
cat output/taxonMapToBePatched.tsv | gzip > output/taxonMapPatched.tsv.gz
