
how does offline name-finding (non-verification) work? #122

Closed
abubelinha opened this issue Apr 4, 2022 · 2 comments



abubelinha commented Apr 4, 2022

I am wondering how the "verification": false option works.

I would expect gnfinder to check words that look like known Latin/latinized names or epithets and to "guess" that they are part of a scientific name.
So I tried changing one letter in a name (e.g. from Quercus toza to Quercus tozza in example #121) to check whether it would be returned as a fuzzy match (first with verification=true, then with verification=false).

Although Quercus toza is found as a scientific name in both cases, Quercus tozza is not (gnfinder returns only a "Quercus" genus match).

I can understand a genus-only match with verification=true if, for some reason, the fuzzy-matching algorithm could not match tozza to toza.
But with verification=false, I expected gnfinder to find anything that "looks like" a scientific name, even if it was never published, just by recognizing its individual words as parts of other names (e.g. "Homo sylvestris").

... includes names verification against many biological databases. For full functionality it requires an Internet connection
... no external dependencies, only binary gnfinder or gnfinder.exe (~15Mb) is needed. However the internet connection is required for name-verification

So what should I expect with verification=false? How does gnfinder make decisions in that case?
If no Internet connection is used, why does gnfinder treat Quercus toza as a valid name but not Quercus tozza?
It looks as if names are being "verified" against name sources anyway (although no verification appears in the JSON result).

abubelinha changed the title from "how does offline matching (non-verification) work?" to "how does offline name-finding (non-verification) work?" on Apr 4, 2022
dimus (member) commented Apr 4, 2022

Name finding uses dictionaries and heuristic rules as the first pass and a Bayes algorithm as the second pass. So if a name is simple and contains a misspelling, it will usually be ignored. However, if the Bayes algorithm collects enough "points", it will register a name candidate. Because these rules are relaxed, verification is an important step for weeding out name-like combinations.
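The two-pass idea described in this reply can be illustrated with a toy Python sketch. It is not gnfinder's code (gnfinder is written in Go and uses far larger dictionaries and richer features): the word lists, the features, their log-likelihood weights, the prior, and the function names `find_names`, `epithet_features` and `bayes_log_odds` are all hypothetical, chosen only to show why a misspelled epithet with no other supporting evidence ends up as a genus-only match.

```python
from typing import List, Optional, Tuple

# Hypothetical toy dictionaries (gnfinder ships its own, much larger ones).
KNOWN_GENERA = {"Quercus", "Homo"}
KNOWN_EPITHETS = {"toza", "robur", "sapiens"}
COMMON_WORDS = {"the", "and", "is", "in", "was", "found"}

# Hypothetical log-likelihood ratios: log P(feature | epithet) - log P(feature | not epithet).
# Naive Bayes treats features as independent, so their log-ratios simply add up.
FEATURE_LOG_RATIOS = {
    "latin_ending": 1.0,
    "followed_by_author": 1.5,
}
PRIOR_LOG_ODDS = -1.5  # hypothetical: most "Genus word" pairs are not names
LATIN_ENDINGS = ("us", "a", "um", "is", "ii", "ae", "i")


def epithet_features(word: str, next_word: Optional[str]) -> List[str]:
    """Collect the (hypothetical) features that count as evidence for an epithet."""
    feats = []
    if word.endswith(LATIN_ENDINGS):
        feats.append("latin_ending")
    # A capitalized word right after the epithet may be an authorship, e.g. "Bosc".
    if next_word is not None and next_word[:1].isupper():
        feats.append("followed_by_author")
    return feats


def bayes_log_odds(feats: List[str]) -> float:
    """Posterior log-odds that a word is a species epithet."""
    return PRIOR_LOG_ODDS + sum(FEATURE_LOG_RATIOS[f] for f in feats)


def find_names(text: str) -> List[Tuple[str, str]]:
    """Return (name, method) pairs found offline, without any verification."""
    tokens = [t.strip(".,;:()") for t in text.split()]
    results = []
    for i, tok in enumerate(tokens):
        # First pass: dictionary lookup for the genus.
        if tok not in KNOWN_GENERA:
            continue
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        after = tokens[i + 2] if i + 2 < len(tokens) else None
        if nxt is None or nxt in COMMON_WORDS or not (nxt.isalpha() and nxt.islower()):
            results.append((tok, "dictionary"))
        elif nxt in KNOWN_EPITHETS:
            results.append((f"{tok} {nxt}", "dictionary"))
        elif bayes_log_odds(epithet_features(nxt, after)) > 0:
            # Second pass: the candidate collected enough Bayes "points".
            results.append((f"{tok} {nxt}", "bayes"))
        else:
            # Unknown (e.g. misspelled) epithet without enough evidence: genus only.
            results.append((tok, "dictionary"))
    return results


print(find_names("Quercus toza and Quercus tozza were found"))
# [('Quercus toza', 'dictionary'), ('Quercus', 'dictionary')]
print(find_names("Quercus tozza Bosc grows here"))
# [('Quercus tozza', 'bayes')]
```

In this sketch "tozza" only earns the Latin-ending point, which does not outweigh the prior, so only "Quercus" is reported; with an extra clue such as a following authorship it crosses the threshold. Because such relaxed scoring can also accept non-names, a separate verification step remains useful.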

dimus (member) commented Apr 4, 2022

An Internet connection is not needed if the -U option is used and the input is plain UTF-8 text.

dimus closed this as completed Aug 23, 2022