Materials citation validation problem #43
Comments
@gsautter: I'm surprised by this. In the TaxPub DTD you are using, is this not the content model of material-citation:
That model allows a mix of text and other elements or just text.
Thanks ... seems like the server component really uses the wrong DTD (downloaded from Pensoft) ... the validation error I'm getting from that is this:
Can you give me a link to the (obviously more permissive) version of the DTD you mention above?
The "latest" release of TaxPub is here: https://github.com/plazi/TaxPub/tree/v1.0-gamma (point to https://github.com/plazi/TaxPub/blob/v1.0-gamma/tax-treatment-NS0-v1.dtd), and the latest release candidate of what will be the next version is here: https://github.com/plazi/TaxPub/tree/v1.0.0-rc2 (point to https://github.com/plazi/TaxPub/blob/v1.0.0-rc2/tax-treatment-NS0-v1.dtd). Either should have the updated material-citation model.
Thanks a lot, this is exactly what I've been looking for ... will the latest release always be under the same URL?
While testing validation against the URL-provided version of the TaxPub DTD (still thanks for the link), I encountered one error:
@gsautter see: plazi/TaxPub#53 (comment). That is, download the "official" JATS from https://ftp.ncbi.nih.gov/pub/jats/publishing/1.1/JATS-Publishing-1-1-MathML3-DTD.zip and place these files from https://github.com/plazi/TaxPub/tree/v1.0-gamma
alongside the downloaded JATS files, then validate against tax-treatment-NS0-v1.dtd. Does that work?
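A minimal sketch of the validation step described in this comment, assuming the JATS files and the TaxPub files have already been unpacked into one directory and that the `xmllint` command-line tool from libxml2 is installed. The function names and the directory argument are hypothetical, chosen only for illustration; whether validation actually succeeds depends on the DTD files themselves.

```python
import subprocess
from pathlib import Path


def build_xmllint_command(dtd_path, xml_path):
    """Build the xmllint call that validates xml_path against an external DTD."""
    # --noout suppresses the serialized document; --dtdvalid names the DTD to use.
    return ["xmllint", "--noout", "--dtdvalid", str(dtd_path), str(xml_path)]


def validate(dtd_dir, xml_path, dtd_name="tax-treatment-NS0-v1.dtd"):
    """Validate xml_path against dtd_name inside dtd_dir (assumed layout).

    Returns (is_valid, messages), where messages is whatever xmllint
    wrote to stderr.
    """
    dtd_path = Path(dtd_dir) / dtd_name
    if not dtd_path.is_file():
        raise FileNotFoundError(f"DTD not found: {dtd_path}")
    result = subprocess.run(
        build_xmllint_command(dtd_path, xml_path),
        capture_output=True,
        text=True,
    )
    return result.returncode == 0, result.stderr
```

Calling `validate("jats-with-taxpub", "treatment.xml")` (both names are placeholders) would return a success flag together with the validator messages, which makes missing or misnamed DTD modules easier to spot.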
It does, see also my comment in plazi/TaxPub#53. However, this kind of surprise and need for the mentioned workaround is lurking for each and every one attempting to use TaxPub straight from its home repo ... to add to the confusion, the TaxPub repo does contain a good bunch of the required
Another option would be to somewhere prominently state that the TaxPub repo is an extension to the JATS DTD, all of whose files are available at some URL that is linked to right in that very explanation (or maybe something like this is already in place and I simply missed it).
The above approach (downloading JATS from https://ftp.ncbi.nih.gov/pub/jats/publishing/1.1/JATS-Publishing-1-1-MathML3-DTD.zip and adding the TaxPub-specific files from the repo) doesn't work out of the box, either ... looks as though the NCBI-provided JATS ZIP has a few MathML issues in itself, namely seeking
Not our issue, but an upstream one that affects us, and anyone who tries to validate their TaxPub XML ... so maybe we should make the repo self-contained after all, if only to be able to provide the extra files under the required names.
@tcatapano looks as though there is a validation problem with marking materials citations in TaxPub treatments, but not including the details ... https://tb.plazi.org/GgServer/taxPubL1/038187E51607FFA6FF09E976FF5BF838 appears to be perfectly fine, but comes up as invalid anyway, and I think the reason is that `tp:material-citation` does not accept textual content.
A possible solution for the plain materials citations is to enclose the whole string in an extra `named-content` element with `content-type` set to `dwc:verbatimLabel` ... redundant, but it'd most likely fix this one.
A bigger problem will arise at higher detail levels, when we start including the detail annotations ... the details proper transform into `named-content` just fine, but the punctuation marks in between these details will cause the same validation problem described above, as will any interspersed plain text ... dropping all the plain text portions might solve the validation problem, but on the other hand removes any way of using the TaxPub treatments as training data for a materials citation tagger or parser, simply because it would make the plain materials citation string irrecoverable.
Yet another problem arises from how to represent the implied details that result from resolving phrases like "same data as holotype" against preceding materials citations ...
One way of representing this might be to only ever represent materials citations as a single `named-content` with type `dwc:verbatimLabel`, and adding a sibling below the same `tp:material-citation` to act as a container for the parsed and normalized details ... not sure if TaxPub can handle that approach in its current status, though ...
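The wrapping idea described above can be sketched roughly as follows, using only Python's standard library. This is an illustration under assumptions, not the exporter's actual code: the namespace URI, the sample citation, and the function names are placeholders, and whether the resulting structure validates depends on the DTD version in use. The sketch moves the entire mixed content of a `tp:material-citation` into one `named-content` element with `content-type="dwc:verbatimLabel"`, and shows that the plain citation string, including the punctuation between details, stays recoverable.

```python
import xml.etree.ElementTree as ET

# Placeholder sample; the namespace URI below is an assumption for illustration.
SAMPLE = (
    '<tp:material-citation xmlns:tp="http://example.org/taxpub">'
    'Holotype: <named-content content-type="dwc:country">Brazil</named-content>'
    ', <named-content content-type="dwc:stateProvince">Bahia</named-content>.'
    '</tp:material-citation>'
)


def plain_text(element):
    """Recover the plain citation string, including text between child elements."""
    return "".join(element.itertext())


def wrap_verbatim(citation):
    """Move all mixed content of the citation into a single named-content wrapper."""
    wrapper = ET.Element("named-content", {"content-type": "dwc:verbatimLabel"})
    wrapper.text = citation.text           # leading text before the first child
    for child in list(citation):
        citation.remove(child)
        wrapper.append(child)              # each child keeps its tail (the punctuation)
    citation.text = None                   # citation now holds element content only
    citation.append(wrapper)
    return citation
```

After `wrap_verbatim(ET.fromstring(SAMPLE))`, the citation has exactly one child element and no direct text of its own, while `plain_text` still returns the original string, so the data remains usable for training a tagger or parser.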