The goal of the GNames project is to provide accurate and fast verification of scientific names in unlimited quantities. Verification should run at a rate of at least 1000 names per second and include exact and fuzzy matching of input strings against scientific names aggregated from a large number of data-sources.
If you do not need the exact records of matched names from data-sources and just want to know whether a name-string is known, you can use GNmatcher instead of this project. GNmatcher is significantly faster and has simpler output.
- Usage as API
- Usage with GNverifier
- Known limitations of the verification
- Fast verification of an unlimited number of scientific names.
- Multiple levels of verification:
  - Exact matching (exact string match for viruses, exact canonical form match for Plantae, Fungi, Bacteria, and Animalia).
  - Fuzzy matching detects human and/or Optical Character Recognition (OCR) errors without producing a large number of false positives. To avoid false positives, uninomial names are checked only for exact matches.
  - PartialExact matching happens when a match for the full name-string is not found. In such cases middle or end words are removed and each variant is verified. Matches of names with the last word intact are preferred.
  - PartialFuzzy matching is provided for partial matches of species and infraspecies. To avoid false positives, uninomials are checked only for exact matches.
  - Virus matching provides verification of virus names.
  - FacetedSearch allows a flexible query language to be used for searching.
- Providing names information from data-sources that contain a particular name.
- Returning the "best" result. The BestResult is calculated by a scoring algorithm.
- Optionally, limiting results to data-sources that are important to a GNames user.
- Providing outlink URLs to the websites of some data-sources to show the original record of a name.
- Providing meta-information about aggregated data-sources.
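The partial matching described in the list above can be pictured with a small sketch. It is not the GNames implementation: the function name `partial_variants`, the exact order of candidates, and the placeholder name "Aus bus cus" are assumptions made for illustration. It only shows the stated idea that middle or end words are removed and that variants keeping the last word are tried first.

```python
def partial_variants(name: str) -> list[str]:
    """Return shorter candidate name-strings, preferred candidates first.

    Hypothetical helper: the real GNames algorithm may order or prune
    candidates differently.
    """
    words = name.split()
    n = len(words)
    variants = []
    # Group 1 (preferred): keep the last word, drop 1..n-2 words before it.
    for dropped in range(1, n - 1):
        variants.append(" ".join(words[: n - 1 - dropped] + words[-1:]))
    # Group 2: drop words from the end, longest remainder first.
    for kept in range(n - 1, 0, -1):
        variants.append(" ".join(words[:kept]))
    # Remove duplicates while preserving the order of preference.
    return list(dict.fromkeys(variants))


print(partial_variants("Aus bus cus"))
# ['Aus cus', 'Aus bus', 'Aus']
```

Each candidate would then be verified in turn, stopping at the first one that matches.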
Most users do not need to install GNames and can use the remote GNames
API service at
http://verifier.globalnames.org/api/v1 or the command-line
client GNverifier. Nevertheless, it is possible to install a local copy of
- A Linux-based operating system.
- At least 32GB of memory.
- At least 50GB of free disk space.
- A fast Internet connection during installation. After installation, GNames can operate without a remote connection.
- A PostgreSQL database.
We do not cover the basics of PostgreSQL administration here. There are many tutorials and resources for Linux-based operating systems that can help.
Create a database named
gnames. Download the gnames database dump. Restore the database with:
gunzip -c gnames_latest.tar.gz | pg_restore -d gnames
Refer to the GNmatcher documentation for its installation.
Download the latest release of GNames, unpack it and place somewhere in the
gnames -V. It will show you the version of
GNames and also generate
$HOME/.config/gnames.yaml according to your preferences.
Try it by running
gnames rest -p 8888
To load the service automatically, you can create a systemctl configuration for it, if your system supports systemctl.
Alternatively, you can use a Docker image to run GNames. You will need to create a file with the corresponding environment variables, which are described in the .env.example file.
docker pull gnames/gnames:latest
docker run --env-file path_to_env_file -d -i -t -p 8888:8888 \
  gnames/gnames:latest rest -p 8888
We provide an example environment file. Environment variables override configuration file settings.
Configuration settings can either be given in the config file
$HOME/.config/gnames.yaml, or by setting the following
The meaning of each configuration setting is described in the default gnames.yaml.
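As an illustration of the precedence rule (environment variables override the configuration file), here is a minimal sketch. The environment variable names `GN_PORT` and `GN_WEB_LOGS_NSQD_TCP`, the `Port` key, and the function `effective_settings` are hypothetical; the real names are the ones listed in .env.example and in the default gnames.yaml.

```python
import os

# Hypothetical mapping from config-file keys to environment variable names.
# Check .env.example for the names GNames actually uses.
ENV_NAMES = {
    "Port": "GN_PORT",
    "WebLogsNsqdTCP": "GN_WEB_LOGS_NSQD_TCP",
}


def effective_settings(file_settings, environ=None):
    """Merge settings so that an environment variable, when present,
    wins over the value read from the configuration file."""
    environ = os.environ if environ is None else environ
    merged = dict(file_settings)
    for key, env_name in ENV_NAMES.items():
        if env_name in environ:
            # Values from the environment arrive as strings.
            merged[key] = environ[env_name]
    return merged


file_settings = {"Port": "8888", "WebLogsNsqdTCP": ""}
print(effective_settings(file_settings, {"GN_PORT": "8787"}))
# {'Port': '8787', 'WebLogsNsqdTCP': ''}
```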
Please note that the currently developed API (documentation) is
publicly served at
If you installed GNames locally and want to run its API, run:
# to change from default 8888 port
gnames rest -p 8787
Refer to the GNames RESTful API Documentation for details on interacting with the GNames API.
GNverifier also provides a web-based user interface to GNames. To launch it, use something like:
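For orientation, a client call might look like the sketch below. It assumes that the API accepts a POST request to a `/verifications` endpoint with a JSON body containing `nameStrings` and, optionally, `dataSources`; confirm the endpoint and field names against the API documentation before relying on them. The helper names `build_request` and `verify` are illustrative only.

```python
import json
import urllib.request

# Base URL of the remote service mentioned in this README.
API_URL = "http://verifier.globalnames.org/api/v1"


def build_request(names, data_sources=None, base_url=API_URL):
    """Prepare a POST request for verifying a list of name-strings.

    The '/verifications' path and the JSON field names are assumptions.
    """
    payload = {"nameStrings": list(names)}
    if data_sources:
        # Optionally limit results to selected data-source IDs.
        payload["dataSources"] = list(data_sources)
    return urllib.request.Request(
        url=base_url + "/verifications",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def verify(names, data_sources=None, base_url=API_URL, timeout=30):
    """Send the request and return the decoded JSON response."""
    req = build_request(names, data_sources, base_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

Calling `verify(["Abrostola triplasia"])` would query the remote service. For a local installation started with `gnames rest -p 8888`, pass a `base_url` pointing to your own host and port; the path prefix of a local service is also an assumption to check.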
gnverifier -p 8777
By default, logs are not shown. To enable the service logs change
true in the configuration file.
To aggregate logs with an NSQ messaging service, provide the address of
the TCP service of
nsqd, for example
0.0.0.0:4150, by changing
WebLogsNsqdTCP in the configuration file, or
Exact matches of misspellings that exist in poorly curated databases prevent fuzzy matches from better-curated sources from being found.
To increase performance, we stop trying further matches as soon as a name is matched successfully. This prevents fuzzy matching if a misspelled name is found somewhere. It is helpful to check the 'curation' field of the returned result and to see how many data-sources contain the name.
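The effect of stopping at the first successful match can be shown with a toy example. Everything in it is invented for illustration: the two "sources", the misspelling 'Abrostola triplasa', the result keys, and the use of Python's `difflib` as a stand-in for GNames' own fuzzy matcher.

```python
from difflib import get_close_matches

# Toy data: the second source contains a misspelled variant of the name.
SOURCES = {
    "well curated source": {"Abrostola triplasia"},
    "poorly curated source": {"Abrostola triplasia", "Abrostola triplasa"},
}


def sources_with(name):
    """List the sources that contain the exact name-string."""
    return sorted(src for src, names in SOURCES.items() if name in names)


def verify(name):
    """Try an exact match first; try a fuzzy match only if that fails."""
    exact = sources_with(name)
    if exact:
        # Short-circuit: the fuzzy stage is never reached.
        return {"match_type": "Exact", "matched": name, "sources": exact}
    all_names = sorted(set().union(*SOURCES.values()))
    close = get_close_matches(name, all_names, n=1, cutoff=0.9)
    if close:
        return {
            "match_type": "Fuzzy",
            "matched": close[0],
            "sources": sources_with(close[0]),
        }
    return {"match_type": "NoMatch", "matched": None, "sources": []}


result = verify("Abrostola triplasa")
print(result["match_type"], len(result["sources"]))
# Exact 1
```

The misspelled input is reported as an exact match, but only one source contains it, which is the kind of signal the advice above suggests looking for.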
Fuzzy matching of a name where the genus string is broken by a space.
For example, we cannot match 'Abro stola triplasia' to 'Abrostola triplasia'. The edit distance between the strings is only 1; however, we stem specific epithets, so in reality we fuzzy-match 'Abro stol triplas' to 'Abrostola triplas'. The edit distance is then 2, which is usually beyond our threshold.
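The two distances quoted above can be checked with a plain Levenshtein edit distance. This is a generic textbook implementation, not the code GNames uses, and the stemmed strings are copied from the example rather than produced by a stemmer.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance
    (insertions, deletions, and substitutions each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute or keep
            ))
        prev = curr
    return prev[-1]


# Unstemmed strings: removing the space is the only edit needed.
print(levenshtein("Abro stola triplasia", "Abrostola triplasia"))  # 1
# After stemming: the space must be removed and the 'a' restored.
print(levenshtein("Abro stol triplas", "Abrostola triplas"))  # 2
```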
- Install the Go language for your Linux operating system.
- Create a PostgreSQL database as described in the installation section.
- Clone the GNames code.
- Clone GNmatcher and set it up for development.
- Install Docker and Docker Compose.
- Go to your local
- In another terminal window run
go test ./...
GNames code is released under the MIT license.