Improve parsing of complex author strings #184

Closed
KatjaSchulz opened this issue Aug 9, 2021 · 4 comments

Comments

@KatjaSchulz

Some names with complex author strings are not parsed properly: part of the author string is interpreted as an epithet. I have seen this happen when the author string contains any of the following terms: non|nec|fide|vide|ms

Examples:

name string > gnparser FullCanonical
Eulima excellens Verkrüzen fide Paetel, 1887 > Eulima excellens fide
Amathia tricornis Busk ms in Chimonides, 1987 > Amathia tricornis ms
Crisia eburneodenticulata Smitt ms in Busk, 1875 > Crisia eburneodenticulata ms
Procamallanus (Spirocamallanus) soodi Lakshmi & Kumari, 2001 nec (Gupta & Masood, 1988) > Procamallanus soodi nec
Membranipora minuscula Canu, 1911 non Hincks, 1882 > Membranipora minuscula non
Hornera radians Defrance, 1821 non (Lamarck, 1816) > Hornera radians non
Hornera verrucosa Reuss, 1851 non Reuss, 1848 > Hornera verrucosa non
Crisina excavata (d'Orbigny, 1853) non (d'Orbigny, 1853) > Crisina excavata non
Proboscina subechinata Canu & Bassler, 1920 non d'Orbigny, 1853 > Proboscina subechinata non
Diaperoecia rugosa Canu & Bassler, 1920 non Osburn, 1940 > Diaperoecia rugosa non
Plagioecia parvipora (Canu & Bassler, 1929) non Canu, 1922 > Plagioecia parvipora non
Diastopora papyracea (d'Orbigny, 1853) non d'Orbigny, 1851 > Diastopora papyracea non
Mesenteripora foliacea (d'Orbigny, 1852) non (Lamouroux, 1821) > Mesenteripora foliacea non
Crisisina carinata (Römer, 1840) non (Reuss, 1846) > Crisisina carinata non
Berenicea undata Canu & Bassler, 1920 non Canu, 1931 > Berenicea undata non
Berenicea stipata Canu & Bassler, 1920 non Canu, 1917 > Berenicea stipata non
Multicrescis mamillosa Canu & Bassler, 1926 non (Römer, 1840) > Multicrescis mamillosa non
Calloporella lamellaris (Bekker, 1921) non (Modzalevskaya, 1955) > Calloporella lamellaris non
Homotrypa similis Foord, 1883 non Caley, 1936 > Homotrypa similis non
Monticulipora affinis Počta, 1902 non (Ulrich, 1890) > Monticulipora affinis non
Stenopora permiana Yang, 1958 non (Bassler, 1929) > Stenopora permiana non
Stenopora meekana (Girty, 1907) non Ulrich, 1890 > Stenopora meekana non
Meliceritites transversa Canu & Bassler, 1926 non (d'Orbigny, 1852) > Meliceritites transversa non
Antedon longicirra (AH Clark, 1912) non Carpenter, 1888 > Antedon longicirra non
Porina reussi Meneghini in De Amicis, 1885 vide Neviani (1900) > Porina reussi vide

As far as I know, non, nec, ms, fide, and vide are not legitimate epithets for any species or subspecies. Catalogue of Life has a few ciliates with “non” as the infraspecific epithet, but the GSD that provides these names has all kinds of other data quality problems, so I think these epithets are probably also artifacts of similar parsing errors in the past.
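To find other affected names in a dataset, a post-processing check along these lines might help. This is only a sketch, not part of gnparser: the function name `suspicious_epithet` and the `ANNOTATION_TERMS` set are made up for illustration, and it assumes the annotation term ends up as the last word of the full canonical form, as in the examples above.

```python
from typing import Optional

# Terms reported in this issue as authorship annotations rather than epithets.
ANNOTATION_TERMS = {"non", "nec", "fide", "vide", "ms"}


def suspicious_epithet(full_canonical: str) -> Optional[str]:
    """Return the trailing word if it is an annotation term, else None.

    Only canonical forms of three or more words are checked, so a plain
    binomial is never flagged.
    """
    words = full_canonical.split()
    if len(words) >= 3 and words[-1].lower() in ANNOTATION_TERMS:
        return words[-1]
    return None


# Canonical forms taken from the examples in this issue.
for canonical in ["Eulima excellens fide", "Hornera radians non", "Amathia tricornis"]:
    print(canonical, "->", suspicious_epithet(canonical))
```

Checking only the last word keeps the check cheap, but it would miss a term that lands in the middle of a longer canonical form.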

@dimus
Member

dimus commented Aug 10, 2021

@KatjaSchulz, do I understand correctly that 'Aus bus Beck in Ken', 'Aus bus Beck ms in Ken', and 'Aus bus Beck ex Ken' are all variants of the same name?

@KatjaSchulz
Author

Yes, I think these can be variants of the same name, albeit with slight differences in meaning. They all indicate that the author of the name (Beck) is not the author of the work in which the name was published. The ms (or sometimes also MS) apparently indicates that the name originated in an unpublished manuscript, e.g., here's the Chimonides, 1987 reference for "Amathia tricornis Busk ms in Chimonides, 1987": https://www.biodiversitylibrary.org/page/2301924

@dimus
Member

dimus commented Aug 11, 2021

What do you think about treating it this way, @KatjaSchulz? It is similar to how we currently handle in and ex.

{
  "parsed": true,
  "quality": 2,
  "qualityWarnings": [
    {
      "quality": 2,
      "warning": "Ex authors are not required"
    }
  ],
  "verbatim": "Amathia tricornis Busk ms in Chimonides, 1987",
  "normalized": "Amathia tricornis Busk ex Chimonides 1987",
  "canonical": {
    "stemmed": "Amathia tricorn",
    "simple": "Amathia tricornis",
    "full": "Amathia tricornis"
  },
  "cardinality": 2,
  "authorship": {
    "verbatim": "Busk ms in Chimonides, 1987",
    "normalized": "Busk ex Chimonides 1987",
    "authors": [
      "Busk"
    ],
    "originalAuth": {
      "authors": [
        "Busk"
      ],
      "exAuthors": {
        "authors": [
          "Chimonides"
        ],
        "year": {
          "year": "1987"
        }
      }
    }
  }
}
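
For anyone consuming this output downstream, here is a rough sketch of pulling the original and ex authors out of a record shaped like the one above. The helper name `summarize_authorship` is hypothetical, and the embedded JSON is a trimmed copy of the proposed output, not something produced by running gnparser.

```python
import json

# Trimmed copy of the proposed output above (authorship-related fields only).
PROPOSED_OUTPUT = """
{
  "verbatim": "Amathia tricornis Busk ms in Chimonides, 1987",
  "canonical": {"full": "Amathia tricornis"},
  "authorship": {
    "normalized": "Busk ex Chimonides 1987",
    "originalAuth": {
      "authors": ["Busk"],
      "exAuthors": {
        "authors": ["Chimonides"],
        "year": {"year": "1987"}
      }
    }
  }
}
"""


def summarize_authorship(record: dict) -> dict:
    """Collect the canonical name, original authors, and ex authors.

    Missing keys fall back to empty values, so records without an
    exAuthors block are handled too.
    """
    original = record.get("authorship", {}).get("originalAuth", {})
    ex = original.get("exAuthors", {})
    return {
        "canonical": record.get("canonical", {}).get("full"),
        "authors": original.get("authors", []),
        "ex_authors": ex.get("authors", []),
        "ex_year": ex.get("year", {}).get("year"),
    }


summary = summarize_authorship(json.loads(PROPOSED_OUTPUT))
print(summary)
```

With this shape, "ms" no longer appears in the canonical form, and the publishing author (Chimonides, 1987) is kept separately from the name's author (Busk).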

@dimus dimus closed this as completed in 425441e Aug 11, 2021
@KatjaSchulz
Author

Looks perfect, thanks!
