BITS reader #7740
Comments
If BITS is an extension of JATS, then it might be good to explore developing this capacity as a modification of the current JATS writer, rather than a new module. (That avoids lots of duplicated code.) Note that the JATS writer already exports several functions for different JATS variants; the same strategy could be used, perhaps, for BITS? (Just to be clear, I wouldn't want to merge a separate BITS module if BITS is too similar to JATS; that just makes maintenance difficult going forward.) |
Absolutely, the question of reusing JATS code as much as possible is very
relevant. I've been looking into this today. The thing is, I realise we
cannot say that any BITS document is also a JATS document, and in that
sense I think we need to still have two separate readers, if that makes
sense? BITS content models do borrow from JATS models, and also expand in
other ways. I am getting familiar with the code to make sense of the best
ways to model this.
|
If BITS is strictly an extension of JATS, then they could be handled in the same reader. The reader could have something in State, for example, that tells it whether to allow BITS extensions. It could export a separate function readBits that enables this. |
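A minimal sketch of the idea above, assuming a toy state record and plain tag names rather than pandoc's actual reader types. `ReaderState`, `allowBits`, `classify`, `readJatsTags` and `readBitsTags` are hypothetical names chosen for illustration; only the general shape (one shared implementation, a flag in the state, two exported entry points) comes from the suggestion.

```haskell
module Main where

-- Toy stand-in for the reader state; NOT pandoc's actual state type.
newtype ReaderState = ReaderState
  { allowBits :: Bool  -- hypothetical flag: accept BITS-only elements?
  }

-- Hypothetical classification of a tag name for this sketch.
data Handling = Shared | BitsOnly | Unknown
  deriving (Eq, Show)

classify :: String -> Handling
classify tag
  | tag `elem` ["sec", "p", "title"]              = Shared    -- used by both
  | tag `elem` ["book", "book-part", "book-body"] = BitsOnly  -- book elements
  | otherwise                                     = Unknown

-- Decide whether the reader processes a tag under the given state.
accepts :: ReaderState -> String -> Bool
accepts st tag = case classify tag of
  Shared   -> True
  BitsOnly -> allowBits st
  Unknown  -> False

-- Two entry points sharing one implementation; only the flag differs.
readJatsTags, readBitsTags :: [String] -> [String]
readJatsTags = filter (accepts (ReaderState False))
readBitsTags = filter (accepts (ReaderState True))

main :: IO ()
main = do
  let tags = ["book-part", "sec", "p", "foo"]
  print (readJatsTags tags)  -- ["sec","p"]
  print (readBitsTags tags)  -- ["book-part","sec","p"]
```

A real reader would thread the flag through its parsing monad instead of filtering a list, but the divergence points would look similar: a check on the flag wherever BITS and JATS behaviour differs.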
If by extension we mean that a JATS XML document should pass a validation
against a BITS DTD or Schema, then no, BITS is not strictly an extension of
JATS.
JATS elements are not a subset of BITS elements. The valid root elements
are different, to start with. BITS just borrows from some JATS content
models, that's all.
The NCBI describes BITS as a "JATS extension", but it's more of an
intersection of content models, really.
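To illustrate the point about root elements, here is a small sketch that picks a variant from the name of the document's root element. It assumes `article` as the JATS root and `book` or `book-part-wrapper` as the BITS roots; `Variant` and `variantFromRoot` are hypothetical names and not part of pandoc.

```haskell
module Main where

-- Hypothetical variant tag for this sketch.
data Variant = JatsDoc | BitsDoc
  deriving (Eq, Show)

-- Map a root element name to a variant, assuming the usual top-level
-- elements of each tag set. Anything else is left unclassified.
variantFromRoot :: String -> Maybe Variant
variantFromRoot "article"           = Just JatsDoc
variantFromRoot "book"              = Just BitsDoc
variantFromRoot "book-part-wrapper" = Just BitsDoc
variantFromRoot _                   = Nothing

main :: IO ()
main = mapM_ (print . variantFromRoot) ["article", "book", "html"]
```

Because the two sets of valid roots do not overlap, a JATS `article` would not validate as a BITS document, even though most of the elements below the root are shared.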
|
Even if BITS isn't strictly a superset of JATS, it still might make sense to implement it as a variant in the JATS module -- it depends on the extent of the divergences, I guess. Another alternative would be to extract some of the common code into an internal module. |
From what I now understand of the JATS reader (15 months after my first comment!), it seems to me that the easiest thing to do would be to just enhance the existing JATS reader (to also support BITS). Just by adding the cases:
to the |
If BITS is basically just JATS plus a few extra elements, then I think that's definitely the way to go. One way to handle this is to have a parameter in the JATSState that controls the "variant" -- settings could be BITS and JATS. The reader could check this variant in places where the behavior would diverge. The module could then export two functions, |
Makes sense. Just to summarize and double check the proposed approach:
Am I getting this right? |
Actually, I just realized there is already a boolean "variant" parameter in the JATSState: pandoc/src/Text/Pandoc/Readers/JATS.hs Lines 53 to 60 in 714be93
Given that the JATS reader was written based on the DocBook reader, and that that spec supports both articles and books, it makes sense that boolean variant existed there (called In DocBook, when the document encounters book-only content, this variant is set to true: pandoc/src/Text/Pandoc/Readers/DocBook.hs Lines 894 to 895 in 509cb9b
pandoc/src/Text/Pandoc/Readers/DocBook.hs Lines 960 to 961 in 509cb9b
And when dealing with article content, it is set to false: pandoc/src/Text/Pandoc/Readers/DocBook.hs Lines 958 to 959 in 509cb9b
Seems like pandoc/src/Text/Pandoc/Readers/JATS.hs Lines 324 to 325 in 714be93
Until now. Seems like we could use this to start to model BITS for |
@jgm After a thorough look, I believe it is possible to have a minimal BITS-enabled reader purely by adding a few lines to the JATS reader. I think this is the simplest way to do it. The main point is to make use of the existing I created a first draft for you to have an idea of what I mean here: #9016 This should already produce a decent AST from a BITS document, but it is by no means definitive. I would add a few more lines to account for a few additional BITS-only elements, via alternative treatment relying on the What do you think? |
Agreed, this plan makes sense. |
Update: I have written a new clean-slate PR here. It incorporates the minimal BITS behaviours required for an equivalent BITS reader (equivalent coverage to JATS, same limitations, etc.). It should still be consistent with the existing JATS behaviour, but I cannot guarantee that until I have finished the unit tests I'd like to add (hence it is still marked as a draft). I will try to complete those this week, and then I think this should be in good shape for a first review. |
@jgm All unit tests are finished and passing. See my latest comment on the PR. |
New BITS reader
Support for BITS XML, the book extension of JATS XML.
As part of an academic project, I am exploring ways to develop a tool to transform BITS XML into DOCX. This is relevant to academic book publishing, where XML archives of previous editions need to be transformed into DOCX so that authors can work on the new edition. This is a recurring scenario, and academic publishers currently spend considerable time and money on third-party conversions that could easily and efficiently be handled in-house.
Since this is a scheduled project with time and deadlines assigned to it (full or partial completion by September 2022 at the latest), I will develop a version of a full or partial tool.
As per recent discussion (https://groups.google.com/g/pandoc-discuss/c/E5J9-qevSEk) this seems to be a relevant and welcome addition to Pandoc.
Alternatives
I have explored OxGarage and transpect as well, and also the option of a completely standalone Java tool developed from scratch. A Pandoc BITS reader (and later a Pandoc BITS writer) seems to be the easiest and most straightforward solution as of now.