Parsing genus names with more than one hyphen #203

Closed
tobymarsden opened this issue Nov 13, 2021 · 4 comments

Comments

@tobymarsden

Parsing fails with the genus Prunus-lauro-cerasus. Though this is a synonym, it does appear in the literature so parsing would be helpful, and I can't see any prohibitions in the ICBN against more than one hyphen in a genus name.

@dimus
Member

dimus commented Nov 14, 2021

Good catch, @tobymarsden. I would like this rule to be stricter.

I checked the gnverifier names with ripgrep: rg "^([\p{L}]+-[\p{L}]+){2,}.*?\b" all-names-2021-11-14.txt. It looks like there is nothing reasonable with more than 2 dashes, and only these genera seem to be 'real enough' (with various capitalizations):

Iulo-eido-coprolites
Johnson-sea-linkia
Para-bary-thelphusa
Para-lio-thelphusa
Para-peri-thelphusa
Prunus-lauro-cerasus
Tsugo-piceo-picea

I see nothing useful with 3 or more dashes.

Searching with rg "\b[a-z]([a-z]*-[a-z]*){2,}.*?\b" all-names-2021-11-14.txt gives quite a few 2-dash specific epithets, and there are even a few that seem to be real when I search for 3 dashes or more with rg "\b[a-z]([a-z]*-[a-z]*){3,}.*?\b" all-names-2021-11-14.txt
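A rough way to reproduce this kind of survey without ripgrep is sketched below. It is only an illustration: `hyphen_count` and `survey_first_words` are hypothetical helper names, the logic is not a translation of the `rg` patterns above, and `str.isalpha()` is used as a stand-in for `\p{L}`. The sample reuses genus names listed in this thread plus one deliberately artificial string.

```python
from collections import Counter


def hyphen_count(word: str) -> int:
    """Return the number of hyphens if `word` is letter runs joined by
    single hyphens; return -1 for anything else (digits, '--', etc.)."""
    parts = word.split("-")
    if all(part.isalpha() for part in parts):
        return len(parts) - 1
    return -1


def survey_first_words(names):
    """Tally how many names start with a word holding 2 or more hyphens,
    grouped by hyphen count."""
    tally = Counter()
    for name in names:
        words = name.split()
        if not words:
            continue  # skip blank lines
        n = hyphen_count(words[0])
        if n >= 2:
            tally[n] += 1
    return tally


# Genus names from this thread, one ordinary binomial, and one artificial string.
sample = [
    "Prunus-lauro-cerasus",
    "Johnson-sea-linkia",
    "Para-bary-thelphusa",
    "Homo sapiens",
    "a-b-c-d junk",
]

print(dict(survey_first_words(sample)))
```

Grouping by hyphen count makes it easy to eyeball the 2-dash bucket separately from the 3-or-more bucket, which is the split discussed here.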

So I am on the fence about this one. It seems that allowing up to 2 dashes would keep most false positives unparsed, but would also ignore 2 epithets that have more than 2 dashes. Let me talk to our botanists and zoologists on Monday.

I recalled that we already had this conversation about epithets with our taxonomists, and, as a result, multi-dashes are allowed.
So I think for genera it makes sense to limit them to 2 dashes for now, and if necessity arises, allow for multi-dashes. What do you think @tobymarsden?
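The proposed rule (at most 2 dashes in a genus, any number in an epithet) could be expressed roughly as follows. This is a minimal sketch and not gnparser's actual grammar: the names `GENUS_RE`, `EPITHET_RE`, `is_genus_like`, and `is_epithet_like` are hypothetical, and the patterns assume ASCII letters only.

```python
import re

# Assumption: a genus starts with a capital letter and may carry up to two
# hyphenated parts; parts after a hyphen may start in either case, since the
# names appear "with various capitalizations".
GENUS_RE = re.compile(r"[A-Z][a-z]+(?:-[A-Za-z][a-z]+){0,2}")

# Assumption: an epithet is lowercase and may carry any number of hyphens.
EPITHET_RE = re.compile(r"[a-z]+(?:-[a-z]+)*")


def is_genus_like(word: str) -> bool:
    """True if the whole word fits the genus pattern (0 to 2 hyphens)."""
    return GENUS_RE.fullmatch(word) is not None


def is_epithet_like(word: str) -> bool:
    """True if the whole word fits the epithet pattern (any hyphen count)."""
    return EPITHET_RE.fullmatch(word) is not None


for candidate in ["Prunus-lauro-cerasus", "Tsugo-piceo-picea", "Aa-bb-cc-dd"]:
    print(candidate, is_genus_like(candidate))
```

Capping the repetition at `{0,2}` is what keeps 3-dash strings unparsed as genera; relaxing the rule later would only mean changing that bound.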

@tobymarsden
Author

@dimus Thanks for the explanation! And on your weekend, too. I've updated the PR to accept up to two dashes for genera.

@dimus dimus closed this as completed in aedb0b2 Nov 14, 2021
dimus added a commit that referenced this issue Nov 14, 2021
Support genus names with multiple hyphens; closes #203
@dimus
Member

dimus commented Nov 14, 2021

Looks good, @tobymarsden. I am going to add a couple more tests after the merge.

@tobymarsden
Author

@dimus Amazing - thanks! Now that Kew parses I'll check World Flora 😂
