JATS reader - add support for new table features #8408

Open
Opussci opened this issue Nov 1, 2022 · 11 comments

@Opussci

Opussci commented Nov 1, 2022

Hi @tarleb, a table that uses colspan and rowspan is not converted correctly from JATS to HTML or DOCX. Maybe the JATS reader needs to be adjusted to support these table features.

@jgm
Owner

jgm commented Nov 1, 2022

Please give pandoc version, exact command line used, and a small input sample sufficient to reproduce the issue.
If you're not using the latest version, please try with that or on https://pandoc.org/try.

@coryschires

Hi! I'd like to revive this issue, as I've been able to recreate the problem using the latest version of Pandoc.

Problem

Table cell attributes, specifically colspan and rowspan, are not retained when using the JATS reader.

Goal

I'd like to retain at least colspan and rowspan, as I know there is increasing support for this feature in various readers and writers.

Depending on the difficulty, we may be willing to make a pull request to add this functionality. So, if y'all are not interested in coding this feature in the near future, I'd appreciate a little nudge in the right direction, so we can take a crack at it ourselves.

Thanks!


Pandoc Version

pandoc --version
pandoc 3.1
Features: +server +lua
Scripting engine: Lua 5.4

CLI command to reproduce

pandoc sample.xml -f jats -t html -o sample.html

sample.xml

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.1 20151215//EN" "https://jats.nlm.nih.gov/publishing/1.1/JATS-journalpublishing1-mathml3.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" article-type="research-article" dtd-version="1.1" xml:lang="en">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Survey Practice</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>How Weighting by Past Vote Can Improve Estimates of Voting Intentions</article-title>
      </title-group>
    </article-meta>
  </front>
  <body>
    <table-wrap id="attachment-147394">
      <table>
        <thead>
          <tr>
            <td>
              <bold>Benchmark</bold>
            </td>
            <td>
              <bold>Population %</bold>
            </td>
            <td>
              <bold>Source</bold>
            </td>
          </tr>
        </thead>
        <tbody>
          <tr>
            <td colspan="3">
              <bold>Age</bold>
            </td>
          </tr>
          <tr>
            <td>18–24 years </td>
            <td>12.2 </td>
            <td rowspan="6">ABS Estimated Residential Population (ERP), March 2019 adjustment </td>
          </tr>
          <tr>
            <td>25–34 years </td>
            <td>19.3 </td>
          </tr>
          <tr>
            <td>35–44 years </td>
            <td>17.1 </td>
          </tr>
          <tr>
            <td>45–54 years </td>
            <td>16.5 </td>
          </tr>
          <tr>
            <td>55–64 years </td>
            <td>14.9 </td>
          </tr>
          <tr>
            <td>65 or more years </td>
            <td>20.1 </td>
          </tr>
          <tr>
            <td colspan="3">
              <bold>Gender</bold>
            </td>
          </tr>
          <tr>
            <td>Female </td>
            <td>50.9 </td>
            <td rowspan="2">ABS ERP, March 2019 adjustment </td>
          </tr>
          <tr>
            <td>Male </td>
            <td>49.1 </td>
          </tr>
          <tr>
            <td colspan="3">Education </td>
          </tr>
          <tr>
            <td>Bachelor and above </td>
            <td>25.5 </td>
            <td rowspan="2">ABS Census 2016 with ERP March 2019 adjustment </td>
          </tr>
          <tr>
            <td>Below Bachelor </td>
            <td>74.5 </td>
          </tr>
          <tr>
            <td colspan="3">
              <bold>Age by Education</bold>
            </td>
          </tr>
          <tr>
            <td>18–24 </td>
            <td>12.2 </td>
            <td rowspan="11">ABS Census 2016 with ERP March 2019 adjustment </td>
          </tr>
          <tr>
            <td>25–34  </td>
            <td>7.4 </td>
          </tr>
          <tr>
            <td>25–34 Below Bachelor </td>
            <td>11.8 </td>
          </tr>
          <tr>
            <td>35–44 Bachelor and above </td>
            <td>6.2 </td>
          </tr>
          <tr>
            <td>35–44 Below Bachelor </td>
            <td>10.9 </td>
          </tr>
          <tr>
            <td>45–⁠54 Bachelor and above</td>
            <td>4.3 </td>
          </tr>
          <tr>
            <td>45–54 Below Bachelor </td>
            <td>12.2 </td>
          </tr>
          <tr>
            <td>55–64 Bachelor and above </td>
            <td>3.3 </td>
          </tr>
          <tr>
            <td>55–64 Below Bachelor </td>
            <td>11.6 </td>
          </tr>
          <tr>
            <td>65+ Bachelor and above </td>
            <td>2.7 </td>
          </tr>
          <tr>
            <td>65+ Below Bachelor </td>
            <td>17.4 </td>
          </tr>
          <tr>
            <td colspan="2">
              <bold>Geography</bold>
            </td>
            <td rowspan="16">ABS Census 2016 with ERP March 2019 adjustment </td>
          </tr>
          <tr>
            <td>Greater Sydney </td>
            <td>20.7 </td>
          </tr>
          <tr>
            <td>Rest of NSW </td>
            <td>11.3 </td>
          </tr>
          <tr>
            <td>Greater Melbourne </td>
            <td>19.8 </td>
          </tr>
          <tr>
            <td>Rest of VIC </td>
            <td>6.3 </td>
          </tr>
          <tr>
            <td>Greater Brisbane </td>
            <td>9.6 </td>
          </tr>
          <tr>
            <td>Rest of QLD </td>
            <td>10.2 </td>
          </tr>
          <tr>
            <td>Greater Adelaide </td>
            <td>5.5</td>
          </tr>
          <tr>
            <td>Rest of SA </td>
            <td>1.6</td>
          </tr>
          <tr>
            <td>Greater Perth </td>
            <td>8.1</td>
          </tr>
          <tr>
            <td>Rest of WA </td>
            <td>2.2</td>
          </tr>
          <tr>
            <td>Greater Hobart </td>
            <td>0.9</td>
          </tr>
          <tr>
            <td>Rest of TAS </td>
            <td>1.2</td>
          </tr>
          <tr>
            <td>Greater Darwin </td>
            <td>0.6</td>
          </tr>
          <tr>
            <td>Rest of NT </td>
            <td>0.4</td>
          </tr>
          <tr>
            <td>Australian Capital Territory </td>
            <td>1.7</td>
          </tr>
        </tbody>
      </table>
    </table-wrap>
  </body>
</article>

@coryschires

FYI: I reposted this question to the Google Group: https://groups.google.com/g/pandoc-discuss/c/rMee9YDlBkI

@jgm
Owner

jgm commented Mar 21, 2023

Smaller test case:

<table>
  <thead>
    <tr>
      <td>
        <bold>Benchmark</bold>
      </td>
      <td>
        <bold>Population %</bold>
      </td>
      <td>
        <bold>Source</bold>
      </td>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td colspan="3">
        <bold>Age</bold>
      </td>
    </tr>
    <tr>
      <td>18–24 years </td>
      <td>12.2 </td>
      <td rowspan="2">ABS Estimated Residential Population (ERP), March 2019 adjustment </td>
    </tr>
    <tr>
      <td>25–34 years </td>
      <td>19.3 </td>
    </tr>
  </tbody>
</table>

@jgm
Owner

jgm commented Mar 21, 2023

For the fix see l. 301 of src/Text/Pandoc/Readers/JATS.hs.
We have

                      let toRow = Row nullAttr . map simpleCell

where simpleCell is from Text.Pandoc.Builder in pandoc-types. Its definition is here: https://hackage.haskell.org/package/pandoc-types-1.23/docs/src/Text.Pandoc.Builder.html#simpleCell.

So simpleCell hardcodes a rowspan and colspan of 1. Instead of this, you'll want to use cell and add the appropriate alignment, rowspan, and colspan as defined by the cell's XML attributes. This will presumably require changes to parseRow too and maybe some other things.
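
A minimal sketch of that idea follows. It is not the code from the JATS reader or from the eventual fix. It assumes the cell's XML attributes have already been collected into a plain list of key/value pairs, and the helper names `spanAttr`, `alignAttr`, and `attrCell` are hypothetical. Only `cell`, `plain`, `text`, and the `Alignment`, `RowSpan`, and `ColSpan` types come from pandoc-types. Treating a missing or malformed span as 1, and mapping `align` values of `left`, `right`, and `center`, are also assumptions made for the illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import Data.Text (Text)
import qualified Data.Text as T
import Text.Read (readMaybe)
import Text.Pandoc.Builder
  ( Alignment (..), Blocks, Cell, ColSpan (..), RowSpan (..)
  , cell, plain, text )

-- Hypothetical helper: read a span attribute such as "rowspan" or
-- "colspan". Missing, non-numeric, or non-positive values fall back
-- to 1 (an assumption of this sketch), matching what simpleCell uses.
spanAttr :: Text -> [(Text, Text)] -> Int
spanAttr name attrs =
  case lookup name attrs >>= readMaybe . T.unpack of
    Just n | n >= 1 -> n
    _               -> 1

-- Hypothetical helper: map an "align" attribute to a pandoc Alignment.
-- Anything unrecognized becomes AlignDefault.
alignAttr :: [(Text, Text)] -> Alignment
alignAttr attrs =
  case lookup "align" attrs of
    Just "left"   -> AlignLeft
    Just "right"  -> AlignRight
    Just "center" -> AlignCenter
    _             -> AlignDefault

-- Attribute-sensitive replacement for simpleCell: build the cell with
-- `cell`, passing alignment and spans taken from the attributes.
attrCell :: [(Text, Text)] -> Blocks -> Cell
attrCell attrs =
  cell (alignAttr attrs)
       (RowSpan (spanAttr "rowspan" attrs))
       (ColSpan (spanAttr "colspan" attrs))

main :: IO ()
main = do
  -- Corresponds to <td colspan="3"> in the test case above.
  print (attrCell [("colspan", "3")] (plain (text "Age")))
  -- Corresponds to a <td rowspan="2"> cell.
  print (attrCell [("rowspan", "2")] (plain (text "Source")))
```

With no span attributes present, `attrCell []` builds the same cell as `simpleCell`, so rows without spans would be unaffected. In the reader itself, the attribute list would come from the `<td>` or `<th>` element being parsed, which is why `parseRow` would likely need to hand over elements rather than already-parsed cell contents.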

@tarleb
Collaborator

tarleb commented Mar 21, 2023

The JATS table model is close to that used by HTML; it'd probably be possible to re-use large parts of the HTML table parser.

@jgm
Owner

jgm commented Mar 21, 2023

I think the changes should actually be quite easy; most of the structure needed for creating tables is already there in the JATS reader. The only thing that needs changing is to make cell creation sensitive to attributes.

@jgm
Owner

jgm commented Mar 21, 2023

It's something I could do, but if you want to have a go at it @coryschires we're always looking to get more contributors to pandoc!

@noahmalmed
Contributor

Heyyo! I'm one of Cory's devs and I think I'm going to take a crack at this! Thanks for the feedback/direction @jgm and @tarleb

@noahmalmed
Contributor

@jgm @tarleb

Well, I took a crack at it: #8724

I think I'm most of the way there, but I believe I need a bit more guidance. Would appreciate some help when you have a moment!

Thanks!

@noahmalmed
Contributor

Here's the finished PR btw:

#8726

jgm pushed a commit that referenced this issue Apr 5, 2023