gnverify

IMPORTANT: This project has moved to https://github.com/gnames/gnverifier

Takes a name or a list of names and verifies them against a variety of biodiversity Data Sources.

Features
Installation
Usage
Copyright

Features

- Small and fast app to verify scientific names against many biodiversity databases.
- Has 4 different match levels:
  - Exact: complete match with a canonical form or full name-string from a data source.
  - Fuzzy: if an exact match did not happen, it tries to match name-strings assuming spelling errors.
  - Partial: strips middle or last epithets from bi- or multi-nomial names and tries to match what is left.
  - PartialFuzzy: the same as Partial, but assuming spelling mistakes.
- Taxonomic resolution. If a database contains taxonomic information, returns the currently accepted name for a name-string if it differs from the matched name.
- The best match is returned according to the match score. Data sources with some manual curation have priority over auto-curated and uncurated datasets. For example, Catalogue of Life and WoRMS are considered curated, GBIF auto-curated, and uBio not curated.
- It is possible to map any checklist of name-strings to any of the registered Data Sources.
- If a Data Source provides a classification for a name, it is returned in the output.
- Works for checking just one name-string, or many name-strings written in a file.
- Supports feeding data via operating system pipes. This feature allows chaining the program together with other tools.
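
The Partial match level listed above can be illustrated with a small Python sketch. It is not gnverify's actual implementation: the function name `partial_candidates` and the simple whitespace splitting are assumptions made for illustration. It only shows the idea of dropping the last or a middle epithet and retrying with what is left.

```python
def partial_candidates(name):
    """Return shorter name-strings made by dropping the last or a middle epithet.

    Illustrative only: real name parsing (authorships, ranks, hybrids)
    is far more involved than splitting on whitespace.
    """
    words = name.split()
    if len(words) < 2:
        # A uninomial has no epithets to strip.
        return []
    # Drop the last epithet: "Aus bus cus" -> "Aus bus".
    candidates = [" ".join(words[:-1])]
    # Drop each middle epithet in turn: "Aus bus cus" -> "Aus cus".
    for i in range(1, len(words) - 1):
        candidates.append(" ".join(words[:i] + words[i + 1:]))
    return candidates
```

For a trinomial such as "Pinus mugo rotundata" the sketch yields "Pinus mugo" and "Pinus rotundata" as candidates to try when the full name-string does not match.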

Installation

Using Homebrew on Mac OS X, Linux, and Linux on Windows (WSL2)

Homebrew is a popular package manager for Open Source software originally developed for Mac OS X. Now it is also available on Linux, and can easily be used on Windows 10, if Windows Subsystem for Linux (WSL) is installed.

To use gnverify with Homebrew:

Install Homebrew
Open a terminal and run the following commands:

brew tap gnames/gn
brew install gnverify

MS Windows

Download the latest release from GitHub and unzip it.

One possible way would be to create a default folder for executables and place gnverify there.

Press the Windows+R key combination and type "cmd". In the terminal window that appears, type:

mkdir C:\Users\your_username\bin
copy path_to\gnverify.exe C:\Users\your_username\bin

Add the C:\Users\your_username\bin directory to your PATH environment variable.

Another, simpler way is to run cd C:\Users\your_username\bin in the cmd terminal window. Windows will then find the gnverify program automatically whenever you run its commands from that directory.

You can also read a more detailed guide for Windows users in a PDF document (use-gnverify-windows.pdf in the repository).

Linux and Mac

Download the latest release from GitHub, untar it, and install the binary somewhere in your PATH.

tar xvf gnverify-linux-0.1.0.tar.xz
# or tar xvf gnverify-mac-0.1.0.tar.gz
sudo mv gnverify /usr/local/bin

Compile from source

Install Go according to its installation instructions, then run:

go get github.com/gnames/gnverify/gnverify

Usage

gnverify takes one name-string, or a text file with one name-string per line, as an argument. It sends a query with these data to a remote gnames server to match the name-strings against many different biodiversity databases, and returns the results to STDOUT in either JSON or CSV format.

As a web service

gnverify -w 8080

You should be able to access the web user interface via a browser at http://localhost:8080

One name-string

gnverify "Monohamus galloprovincialis"

Many name-strings in a file

gnverify /path/to/names.txt

The app assumes that a file contains a simple list of names, one per line.

It is also possible to feed data via STDIN:

cat /path/to/names.txt | gnverify
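
Because names can be fed via STDIN, gnverify can also be driven from a script. The following Python sketch assumes gnverify is installed and on the PATH; the helper name `verify_names` and its `command` parameter are hypothetical and exist only for this illustration. It writes one name per line to the program's STDIN and returns whatever the program prints to STDOUT.

```python
import subprocess


def verify_names(names, command=("gnverify",)):
    """Send names, one per line, to the command's STDIN and return its STDOUT.

    `command` defaults to a bare gnverify call (assumed to be on the PATH);
    it is a parameter so that flags can be added or a stand-in can be used.
    """
    # Drop blank entries and surrounding whitespace, one name per line.
    cleaned = [n.strip() for n in names if n.strip()]
    payload = "".join(line + "\n" for line in cleaned)
    result = subprocess.run(
        list(command),
        input=payload,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout


# Example call (requires gnverify to be installed):
# print(verify_names(["Monohamus galloprovincialis", "Pinus mugo"]))
```

Keeping the command configurable makes it easy to pass flags, for example `command=("gnverify", "-f", "compact")`.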

Options and flags

Flags and options can be given either before or after the name-string or file name.

help

gnverify -h
# or
gnverify --help
# or
gnverify

version

gnverify -V
# or
gnverify --version

web_port

Starts gnverify as a web service on the given port.

gnverify -w 8080

This command runs a user interface accessible from a browser at http://localhost:8080

format

Sets the output format. Supported formats are:

- compact: one-line JSON.
- pretty: prettified JSON with new lines and tabs for easier reading.
- csv: (DEFAULT) CSV representation.

gnverify -f compact file.txt
# or
gnverify --format="pretty" file.csv

Note that a separate JSON "document" is returned for each record, instead of one big JSON document for all records. For large lists this significantly speeds up parsing of the JSON on the user side.
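
Since each record is its own JSON document, output in the compact format can be consumed one line at a time. The Python sketch below assumes the compact format (one JSON document per line); the function name `iter_records` is hypothetical, and no particular field names are assumed because the output schema is not described here.

```python
import json


def iter_records(lines):
    """Yield one decoded JSON document per non-empty line.

    Assumes the compact format, where each record occupies a single line.
    """
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


# Example use with a file produced by: gnverify -f compact file.txt > out.jsonl
# with open("out.jsonl", encoding="utf-8") as fh:
#     for record in iter_records(fh):
#         print(record)
```

Because the generator decodes one line at a time, memory use stays flat even for very long lists. The pretty format spans several lines per record, so it would need a different approach.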

sources

By default gnverify returns only one "best" result of a match. If you have a particular interest in a data set, you can specify it with this option, and all matches that exist for that source will be returned as well. You need to provide the data source id of each dataset. Ids can be found at the following URL. Some of them are also listed in the gnverify help output.

Data from such sources will be returned in the preferred_results section of the JSON output, or in CSV rows that start with the "PreferredMatch" string.

gnverify file.csv -s "1,11,172"
# or
gnverify file.tsv --sources="12"
# or
cat file.txt | gnverify -s '1,12'
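
When working with CSV output, the rows for the requested sources can be picked out by the leading "PreferredMatch" string mentioned above. The Python sketch below is an illustration under assumptions: the function name `preferred_rows` is hypothetical, and nothing is assumed about the remaining columns.

```python
import csv
import io


def preferred_rows(csv_text):
    """Return the CSV rows whose first field is exactly "PreferredMatch"."""
    reader = csv.reader(io.StringIO(csv_text))
    return [row for row in reader if row and row[0] == "PreferredMatch"]


# Example use (requires gnverify output saved beforehand):
# with open("results.csv", encoding="utf-8") as fh:
#     for row in preferred_rows(fh.read()):
#         print(row)
```

Using the csv module rather than splitting on commas keeps quoted fields that themselves contain commas intact.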

preferred_only

Sometimes a user wants to map a list of names to a particular Data Source and is not interested in whether a name matched anywhere else. In such cases, use the preferred_only flag.

gnverify -p -s '12' file.txt
# or
gnverify --preferred_only --sources='1,12' file.tsv

jobs

If the list of names is very large, it is possible to tell gnverify to run requests in parallel. In this example gnverify will run 8 jobs simultaneously. The order of returned names will be somewhat randomized.

gnverify -j 8 file.txt
# or
gnverify --jobs=8 file.tsv

Sometimes it is important to return names in exactly the same order. For such cases, set the jobs flag to 1.

gnverify -j 1 file.txt

Configuration file

If you find yourself using the same flags over and over again, it makes sense to edit the configuration file instead. It is located at $HOME/.config/gnverify.yaml. After that you do not need to pass those options and flags on the command line.

gnverify file.txt

Copyright

Authors: Dmitry Mozzherin
