pandoc metadata as representation of JATS metadata #8359
There isn't currently a standardized structured metadata format that will work optimally with all formats pandoc supports. The JATS writer supports JATS-specific structured metadata, as you've illustrated. But should the JATS reader produce this too? That would be very useful if you're going to re-render as JATS. (Then again, converting JATS to JATS is not so useful.) But if you're going to be rendering some other format, then you'd prefer to have something every pandoc format can handle, which is what the JATS reader currently gives you.
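For context on why the reader's current output is format-neutral: pandoc's metadata is a generic tree of `MetaMap`, `MetaList`, `MetaInlines`, `MetaString`, `MetaBool` and `MetaBlocks` values, with no JATS-specific schema attached. Below is a minimal Python sketch, assuming the JSON form of that tree found in the `meta` field of `pandoc -t json` output, that flattens it into plain dicts, lists and strings. The sample metadata is made up, and the inline flattening is deliberately lossy (links, math, citations and quotes are dropped).

```python
import json

# Illustrative "meta" object in the shape emitted by `pandoc -t json`.
# The field names and values are made up for this sketch.
SAMPLE_META = json.loads("""
{
  "title": {"t": "MetaInlines",
            "c": [{"t": "Str", "c": "A"}, {"t": "Space"}, {"t": "Str", "c": "Title"}]},
  "author": {"t": "MetaList", "c": [
    {"t": "MetaMap", "c": {
      "name": {"t": "MetaInlines",
               "c": [{"t": "Str", "c": "Jane"}, {"t": "Space"}, {"t": "Str", "c": "Doe"}]},
      "orcid": {"t": "MetaString", "c": "0000-0000-0000-0000"}}}]},
  "draft": {"t": "MetaBool", "c": false}
}
""")

# Inline containers whose "c" is simply a list of inlines.
SIMPLE_CONTAINERS = {"Emph", "Strong", "Strikeout", "Superscript",
                     "Subscript", "SmallCaps", "Underline"}


def inlines_to_text(inlines):
    """Flatten pandoc inline elements to plain text (lossy by design)."""
    parts = []
    for node in inlines:
        tag = node["t"]
        if tag == "Str":
            parts.append(node["c"])
        elif tag in ("Space", "SoftBreak", "LineBreak"):
            parts.append(" ")
        elif tag in SIMPLE_CONTAINERS:
            parts.append(inlines_to_text(node["c"]))
        # Links, math, citations, quotes, etc. are dropped in this sketch.
    return "".join(parts)


def meta_to_python(value):
    """Convert one pandoc MetaValue (JSON form) into plain Python data."""
    tag, content = value["t"], value.get("c")
    if tag == "MetaMap":
        return {key: meta_to_python(v) for key, v in content.items()}
    if tag == "MetaList":
        return [meta_to_python(v) for v in content]
    if tag == "MetaInlines":
        return inlines_to_text(content)
    if tag in ("MetaString", "MetaBool"):
        return content
    if tag == "MetaBlocks":
        # Only Para/Plain blocks are kept; other block types are skipped.
        return "\n\n".join(inlines_to_text(block["c"]) for block in content
                           if block["t"] in ("Para", "Plain"))
    raise ValueError(f"unknown MetaValue type: {tag}")


plain = {key: meta_to_python(v) for key, v in SAMPLE_META.items()}
print(plain)
# {'title': 'A Title', 'author': [{'name': 'Jane Doe', 'orcid': '0000-0000-0000-0000'}], 'draft': False}
```

Nothing in this tree says "this is a JATS contributor"; any writer can consume it, which is the trade-off described above.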
I think @tarleb has done some thinking about standardizing structured metadata, e.g. in his scholarly markdown project, so he may want to comment.
Great point that I very much agree with.
For reference, I will use this closed issue as a high-level nexus for other, more specific issues related to pandoc metadata representing JATS metadata. "JATS" is ambiguous since there are so many dialects of JATS. I can suggest some names for dialects. I list them in rough order from least specific to most specific:
@kamoe, here's a summary of issues with pandoc attempting to represent JATS metadata. There are issues where the pandoc reader incorrectly represents JATS metadata:
Then there's PMC and pandoc JATS metadata that the reader doesn't read at all, so it is absent from the resulting pandoc metadata:
Last but not least, in addition to the above, there are more JATS elements that are documented on https://pandoc.org/jats.html and show up in PMC XML but do not appear in the pandoc metadata produced by the JATS reader:
My solution to all these problems is to not use pandoc and instead use an XML parser. The fixes and enhancements I would actually use are improvements to the processing not of metadata, but of marked-up text (e.g. #8847).
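A minimal sketch of the XML-parser route, using only Python's standard `xml.etree.ElementTree`. It pulls a few `<article-meta>` fields (title, DOI, authors) out of a made-up JATS fragment. The element and attribute names (`article-title`, `article-id` with `pub-id-type`, `contrib` with `contrib-type`, `surname`, `given-names`) are standard JATS; which fields to extract, the function name `extract_article_meta`, and the shape of the returned dict are assumptions for illustration. JATS elements themselves are not namespaced, so plain paths work here.

```python
import xml.etree.ElementTree as ET

# Made-up JATS fragment; real PMC files carry far more front matter.
SAMPLE_JATS = """<article dtd-version="1.3" article-type="research-article">
  <front>
    <article-meta>
      <article-id pub-id-type="doi">10.1234/example.5678</article-id>
      <title-group><article-title>An <italic>Example</italic> Title</article-title></title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <name><surname>Doe</surname><given-names>Jane</given-names></name>
        </contrib>
        <contrib contrib-type="editor">
          <name><surname>Roe</surname><given-names>Richard</given-names></name>
        </contrib>
      </contrib-group>
    </article-meta>
  </front>
</article>"""


def text_of(elem):
    """All text inside an element, including nested markup like <italic>."""
    return "".join(elem.itertext()).strip() if elem is not None else None


def extract_article_meta(xml_text):
    """Hypothetical helper: read a few <article-meta> fields into a dict."""
    root = ET.fromstring(xml_text)
    meta = root.find("front/article-meta")
    if meta is None:
        return {}
    authors = []
    # Only contributors marked as authors; editors etc. are skipped.
    for contrib in meta.findall("contrib-group/contrib[@contrib-type='author']"):
        name = contrib.find("name")
        if name is not None:
            authors.append({
                "surname": text_of(name.find("surname")),
                "given-names": text_of(name.find("given-names")),
            })
    return {
        "title": text_of(meta.find("title-group/article-title")),
        "doi": text_of(meta.find("article-id[@pub-id-type='doi']")),
        "authors": authors,
    }


print(extract_article_meta(SAMPLE_JATS))
# {'title': 'An Example Title', 'doi': '10.1234/example.5678', 'authors': [{'surname': 'Doe', 'given-names': 'Jane'}]}
```

Because `itertext()` concatenates nested text, inline markup such as `<italic>` in the title is flattened to plain text; keeping it would require walking the element tree instead.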
Thanks for this @castedo. I've noted all your comments and concerns, and will take a good look at this. I'm very interested from the perspective of the implications for a future BITS reader, so this is all very relevant. The more JATS bugs get addressed, the fewer issues BITS inherits!
In using pandoc, I've encountered issues that I'm not sure whether to consider inside or outside the scope of what pandoc should handle.
This issue/feature of pandoc metadata representing JATS metadata can probably be closed, but I wanted to share my usage scenario and double-check what is out of scope. To frame the scope, I suspect the following question is useful:
What is the pandoc metadata for JATS supposed to be? Is it:
Currently it seems the answer is primarily 1) and optionally 2), but not 3). I'd say pandoc currently does a poor job of 3), which I hope is because that's out of scope.
Here's a concrete use case affecting me that illustrates some of the issues. In my YAML header I have the following metadata for pandoc:
which outputs the following JATS XML:
That JATS XML if converted back into YAML+markdown via pandoc becomes:
If pandoc metadata is supposed to be primarily 1) and secondarily 2), then this seems fine, and this issue can be closed. If not, then I can file more issues. In the meantime, I am starting to use separate Python libraries to extract metadata from JATS XML.
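If round-tripping turns out to be in scope, one way to pin down exactly which fields are lost (and so which further issues might be worth filing) is to diff the YAML header before and after converting markdown to JATS and back, e.g. with `pandoc doc.md -s -t jats -o doc.xml` followed by `pandoc doc.xml -f jats -s -t markdown` (the exact commands used above aren't shown, so these are an assumption). The minimal sketch below assumes both headers have already been loaded into plain Python dicts, for example with a YAML library such as PyYAML; the function name `missing_paths` and the sample data are hypothetical.

```python
def missing_paths(original, roundtrip, prefix=""):
    """Yield key paths present in `original` but absent from `roundtrip`.

    Both arguments are plain nested dicts/lists, e.g. YAML headers loaded
    before and after a markdown -> JATS -> markdown round trip. Changed
    scalar values are not reported, only missing structure.
    """
    if isinstance(original, dict):
        if not isinstance(roundtrip, dict):
            yield prefix or "<root>"  # whole mapping lost or replaced
            return
        for key, value in original.items():
            path = f"{prefix}.{key}" if prefix else str(key)
            if key not in roundtrip:
                yield path
            else:
                yield from missing_paths(value, roundtrip[key], path)
    elif isinstance(original, list):
        if not isinstance(roundtrip, list):
            yield prefix or "<root>"  # whole list lost or replaced
            return
        for i, value in enumerate(original):
            path = f"{prefix}[{i}]"
            if i >= len(roundtrip):
                yield path
            else:
                yield from missing_paths(value, roundtrip[i], path)


# Hypothetical headers; substitute the real YAML before and after the round trip.
before = {
    "title": "Example",
    "author": [{"name": "Jane Doe", "orcid": "0000-0000-0000-0000",
                "affiliation": ["Example University"]}],
}
after = {"title": "Example", "author": [{"name": "Jane Doe"}]}

print(list(missing_paths(before, after)))
# ['author[0].orcid', 'author[0].affiliation']
```

Reporting only missing paths keeps the output focused on metadata the JATS reader drops entirely; comparing changed values as well would also flag fields that survive but are mangled.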
Thank y'all for such a wonderful tool!