Incorrect JATS XML for header markup when a template is used (Pandoc versions 3.1.4+) #9092

Closed
gaj23 opened this issue Sep 21, 2023 · 1 comment
gaj23 commented Sep 21, 2023

Background

When upgrading our application's Pandoc version from 3.1.3 to the newest version (3.1.8), we discovered that the resulting JATS XML no longer groups certain elements in a template the way it did before. Specifically, when a metadata field rendered through a template contains multiple headers, the headers and their content are no longer wrapped in <sec> elements.

How to Repeat the behavior:

  1. Create a file called markdown.md and leave it blank
  2. Create a template.xml with the contents: <abstract>$abstract$</abstract>
  3. Create a meta.json file with the contents:
{
    "abstract": "# Background\nerat velit scelerisque in dictum non consectetur a erat nam at lectus urna duis convallis convallis tellus id interdum velit laoreet id donec ultrices tincidunt arcu non sodales neque sodales ut etiam sit amet nisl purus in mollis nunc sed id semper risus in hendrerit gravida rutrum quisque non tellus\n\n# Methods\nnon sodales neque sodales ut etiam sit amet nisl purus in mollis nunc sed id semper risus in hendrerit gravida rutrum quisque non tellus orci ac auctor augue mauris augue neque gravida in fermentum et sollicitudin ac orci phasellus egestas tellus rutrum tellus pellentesque eu tincidunt tortor aliquam nulla facilisi\n"
}
  4. While on Pandoc version 3.1.3, run the following:
    pandoc --from markdown-space_in_atx_header-auto_identifiers --to jats -o previous_version.xml --template template.xml --wrap none --metadata-file meta.json markdown.md
  5. Make note of the resulting previous_version.xml file
    Pandoc v3.1.3 xml
<abstract>
    <sec>
        <title>Background</title>
        <p>erat velit scelerisque in dictum non consectetur a erat nam at lectus urna duis convallis convallis tellus id interdum velit laoreet id donec ultrices tincidunt arcu non sodales neque sodales ut etiam sit amet nisl purus in mollis nunc sed id semper risus in hendrerit gravida rutrum quisque non tellus</p>
    </sec>
    <sec>
        <title>Methods</title>
        <p>non sodales neque sodales ut etiam sit amet nisl purus in mollis nunc sed id semper risus in hendrerit gravida rutrum quisque non tellus orci ac auctor augue mauris augue neque gravida in fermentum et sollicitudin ac orci phasellus egestas tellus rutrum tellus pellentesque eu tincidunt tortor aliquam nulla facilisi</p>
    </sec>
</abstract>
  6. While on Pandoc version 3.1.4 (or any newer version), run the following:
    pandoc --from markdown-space_in_atx_header-auto_identifiers --to jats -o newer_versions.xml --template template.xml --wrap none --metadata-file meta.json markdown.md
  7. Make note of the resulting newer_versions.xml file

Pandoc v3.1.4 (including all newer releases) xml

<abstract>
    <title>Background</title>
    <p>erat velit scelerisque in dictum non consectetur a erat nam at lectus urna duis convallis convallis tellus id interdum velit laoreet id donec ultrices tincidunt arcu non sodales neque sodales ut etiam sit amet nisl purus in mollis nunc sed id semper risus in hendrerit gravida rutrum quisque non tellus</p>
    <title>Methods</title>
    <p>non sodales neque sodales ut etiam sit amet nisl purus in mollis nunc sed id semper risus in hendrerit gravida rutrum quisque non tellus orci ac auctor augue mauris augue neque gravida in fermentum et sollicitudin ac orci phasellus egestas tellus rutrum tellus pellentesque eu tincidunt tortor aliquam nulla facilisi</p>
</abstract>
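
For anyone who wants to check their own output, the difference between the two results can be detected by counting the <sec> children of <abstract>. Below is a minimal sketch in Python using only the standard library. The function name count_abstract_sections is made up for illustration, and the two sample strings are abbreviated stand-ins for the outputs shown above (paragraph text shortened), not actual pandoc output.

```python
import xml.etree.ElementTree as ET


def count_abstract_sections(xml_text: str) -> int:
    """Return the number of <sec> elements directly under <abstract>."""
    root = ET.fromstring(xml_text)
    # The template in this report makes <abstract> the root element;
    # fall back to a search in case it is nested inside a larger document.
    abstract = root if root.tag == "abstract" else root.find(".//abstract")
    if abstract is None:
        raise ValueError("no <abstract> element found")
    return len(abstract.findall("sec"))


# Abbreviated stand-ins for the two outputs in this report (assumption:
# only the element structure matters, so the paragraph text is shortened).
OLD_OUTPUT = (
    "<abstract>"
    "<sec><title>Background</title><p>erat velit</p></sec>"
    "<sec><title>Methods</title><p>non sodales</p></sec>"
    "</abstract>"
)
NEW_OUTPUT = (
    "<abstract>"
    "<title>Background</title><p>erat velit</p>"
    "<title>Methods</title><p>non sodales</p>"
    "</abstract>"
)

if __name__ == "__main__":
    print("3.1.3-style sections:", count_abstract_sections(OLD_OUTPUT))
    print("3.1.4-style sections:", count_abstract_sections(NEW_OUTPUT))
```

To run it against real files, read previous_version.xml and newer_versions.xml and pass their contents to the function in place of the sample strings.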

Problem

After combing through the changelog, I found no indication that this change in behavior was intentional.

Additionally, when there's only one header in the template, the resulting XML is as expected in both 3.1.3 and 3.1.4+. If we convert the Markdown to JATS XML without a template, the issue does not occur and the headers are sectioned off as expected.

Thus, I've concluded this is a bug.

Solution?

I'm not versed in Haskell, but comparing the changes between 3.1.3 and 3.1.4, perhaps the changes made to src/Text/Pandoc/Writers/JATS.hs in this area are what introduced this issue?

@gaj23 gaj23 added the bug label Sep 21, 2023
Owner

jgm commented Sep 21, 2023

I suspect it's 4f44058, and in particular this change:

  metadata <- metaToContext opts
-                fromBlocks
+               (blocksToJATS opts)
                 (fmap chomp . inlinesToJATS opts)
                 meta

The old fromBlocks not only converted the blocks with blocksToJATS, it also used makeSections to impose section structure. That accounts for the change you're seeing.
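
To illustrate the difference, here is a toy model of the two code paths. It is written in Python rather than Haskell and is not pandoc's implementation: the Block type and the functions render_flat, group_sections, and render_sectioned are all hypothetical. It only handles a flat run of same-level headers and does no XML escaping.

```python
from typing import List, Tuple

# Hypothetical simplified block: (kind, text), kind is "header" or "para".
Block = Tuple[str, str]


def render_flat(blocks: List[Block]) -> str:
    """Render each block on its own, with no section wrappers.

    This mimics converting blocks directly, without a sectioning pass.
    """
    tags = {"header": "title", "para": "p"}
    return "".join(f"<{tags[kind]}>{text}</{tags[kind]}>" for kind, text in blocks)


def group_sections(blocks: List[Block]) -> List[List[Block]]:
    """Start a new group at every header.

    Blocks that appear before the first header form a leading group.
    """
    groups: List[List[Block]] = []
    for block in blocks:
        if block[0] == "header" or not groups:
            groups.append([])
        groups[-1].append(block)
    return groups


def render_sectioned(blocks: List[Block]) -> str:
    """Group blocks into sections first, then render each group.

    Groups that begin with a header are wrapped in <sec>...</sec>.
    """
    out = []
    for group in group_sections(blocks):
        body = render_flat(group)
        out.append(f"<sec>{body}</sec>" if group[0][0] == "header" else body)
    return "".join(out)


blocks = [
    ("header", "Background"),
    ("para", "erat velit"),
    ("header", "Methods"),
    ("para", "non sodales"),
]

if __name__ == "__main__":
    print(render_flat(blocks))       # titles and paragraphs as siblings
    print(render_sectioned(blocks))  # each title/paragraph pair inside <sec>
```

With the sectioning step, each header and the content that follows it end up inside their own <sec>. Without it, the titles and paragraphs are emitted as siblings, which matches the shape of the 3.1.4 output in the report.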

@jgm jgm closed this as completed in a19cd39 Sep 21, 2023