Parse nasty names with ambivalent specific epithets #53

Closed
dimus opened this issue Dec 18, 2020 · 10 comments

@dimus
Member

dimus commented Dec 18, 2020

created by @dimus at https://gitlab.com/gogna/gnparser/-/issues/53

@diatomsRcool, @KatjaSchulz and @joelnitta found the following names:

Acrostichum nudum
Adiantum nudum
Africanthion nudum
Agathidium nudum
Aphaniosoma nudum
Aspidium nudum
Athyrium nudum
Bembidion satellites
Blechnum nudum
Bolivina prion
Boreophilia nomensis
Bottaria nudum
Erateina satellites
Gnathopleustes den
Ithomeis satellites
Lycopodium nudum
Navicula bacterium
Nephodia satellites
Nephrodium nudum
Paralvinella dela
Phelodon nomene
Polypodium nudum
Polystichum nudum
Psilotum nudum
Ruteloryctes bis
Selenops ab
Tortolena dela
Trachyphloeosoma nudum
Xestia cfuscum
Zodarion van

We need to double-check that these names are 'real' and whitelist the real ones in the parser rules O.o
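
One way the whitelist idea could look, as a minimal Go sketch rather than the actual gnparser implementation: a set of genus/epithet pairs that is consulted before an ambiguous word is treated as something other than an epithet. The names `whitelistedBinomials` and `isWhitelisted` are hypothetical, and the entries are copied from the list above without being verified here.

```go
package main

import (
	"fmt"
	"strings"
)

// whitelistedBinomials is a hypothetical whitelist of genus + epithet pairs
// whose epithet could be mistaken for something else. The entries come from
// the list in this issue and still need to be checked as 'real' names.
var whitelistedBinomials = map[string]bool{
	"Navicula bacterium":   true,
	"Bolivina prion":       true,
	"Bembidion satellites": true,
	"Psilotum nudum":       true,
	"Zodarion van":         true,
}

// isWhitelisted reports whether the first two words of name form a
// whitelisted binomial. Extra whitespace between words is ignored.
func isWhitelisted(name string) bool {
	words := strings.Fields(name)
	if len(words) < 2 {
		return false
	}
	return whitelistedBinomials[words[0]+" "+words[1]]
}

func main() {
	for _, name := range []string{"Navicula bacterium", "Psilotum  nudum", "Navicula sp."} {
		fmt.Printf("%q whitelisted: %v\n", name, isWhitelisted(name))
	}
}
```

Keying on the full binomial rather than on the epithet alone keeps a word like "nudum" ambiguous everywhere except in the combinations that were explicitly confirmed.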

@dimus dimus self-assigned this Dec 18, 2020
@dimus
Member Author

dimus commented Dec 18, 2020

created by @dimus at https://gitlab.com/gogna/gnparser/-/issues/43

mentioned in issue #86

@dimus
Member Author

dimus commented Dec 18, 2020

created by @dimus at https://gitlab.com/gogna/gnparser/-/issues/44

changed the description

@dimus
Member Author

dimus commented Dec 18, 2020

created by @dimus at https://gitlab.com/gogna/gnparser/-/issues/45

changed the description

@dimus
Member Author

dimus commented Dec 18, 2020

created by @dimus at https://gitlab.com/gogna/gnparser/-/issues/46

Thanks for the kind words and more names for this ticket @joelnitta. I added them to the description of the issue.

@dimus
Member Author

dimus commented Dec 18, 2020

created by @dimus at https://gitlab.com/gogna/gnparser/-/issues/47

changed the description

@dimus
Member Author

dimus commented Dec 18, 2020

created by @joelnitta at https://gitlab.com/gogna/gnparser/-/issues/48

Thanks for the great program! This is a lifesaver for taxonomic workflows.
These are some additions (all names of ferns) for the whitelist whenever that happens:

Acrostichum nudum, Adiantum nudum, Aspidium nudum, Athyrium nudum, Blechnum nudum, Lycopodium nudum, Nephrodium nudum, Polypodium nudum, Polystichum nudum, Psilotum nudum

@dimus
Member Author

dimus commented Dec 18, 2020

created by @dimus at https://gitlab.com/gogna/gnparser/-/issues/49

Also from GlobalNamesArchitecture/gnparser#331

Looks like "le" is used both as part of an author name and as part of a specific epithet. I also have a suspicion that names with "le" as a specific epithet really have an epithet that was split by a space!

http://gni.globalnames.org/name_strings?commit=Search&page=2&search_term=sp%3Ale

We should probably build a dictionary of the cases where it belongs to an author and the cases where it belongs to a name, and normalize them accordingly... Big job.

@KatjaSchulz

Hi Dima,

Here are a few more names for your whitelist. These are all from the current version of the Catalogue of Life (COL-2021-06-10):

Navicula bacterium (diatom)
Xestia cfuscum (moth)
Bolivina prion (foraminifer)
Bembidion satellites (beetle)
Erateina satellites (moth)
Ithomeis satellites (butterfly)
Nephodia satellites (moth)

Also, names don’t get parsed if the generic name is too short, but there are a few two-letter genera:

Do holotrichius (beetle)
Oo spinosum (arachnid)
Nu aakhu (annelid)

Maybe these could get whitelisted, too?

@dimus
Member Author

dimus commented Aug 1, 2021

@KatjaSchulz thanks for more 'nasty' names, I am going to bump priority up for this issue

@dimus
Member Author

dimus commented Aug 1, 2021

Oh, I thought I had all two-letter genera accounted for:

TwoLetterGenus <- ('Ca' / 'Ea' / 'Ge' / 'Ia' / 'Io' / 'Ix' / 'Lo' / 'Oa' /
'Ra' / 'Ty' / 'Ua' / 'Aa' / 'Ja' / 'Zu' / 'La' / 'Qu' / 'As' / 'Ba')
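
A quick way to see which of the reported genera the rule above does not cover, as a small Go sketch. It only mirrors the alternatives of `TwoLetterGenus` in a set and compares them with the three names from the earlier comment; `twoLetterGenera` and `missingGenera` are hypothetical helper names, not part of gnparser.

```go
package main

import (
	"fmt"
	"strings"
)

// twoLetterGenera mirrors the alternatives listed in the TwoLetterGenus rule.
var twoLetterGenera = map[string]bool{
	"Ca": true, "Ea": true, "Ge": true, "Ia": true, "Io": true, "Ix": true,
	"Lo": true, "Oa": true, "Ra": true, "Ty": true, "Ua": true, "Aa": true,
	"Ja": true, "Zu": true, "La": true, "Qu": true, "As": true, "Ba": true,
}

// missingGenera returns, in input order, every two-letter genus (the first
// word of each name) that is not covered by twoLetterGenera.
func missingGenera(names []string) []string {
	var missing []string
	for _, name := range names {
		words := strings.Fields(name)
		if len(words) == 0 {
			continue
		}
		genus := words[0]
		if len(genus) == 2 && !twoLetterGenera[genus] {
			missing = append(missing, genus)
		}
	}
	return missing
}

func main() {
	reported := []string{"Do holotrichius", "Oo spinosum", "Nu aakhu"}
	fmt.Println(missingGenera(reported)) // prints [Do Oo Nu]
}
```

All three reported genera (Do, Oo, Nu) are absent from the rule, which would explain why those names are not parsed.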
