Markdown writer breaks math in multiline table #1911

Closed
bjpe opened this issue Jan 29, 2015 · 11 comments

@bjpe
Contributor

bjpe commented Jan 29, 2015

Hi, I discovered that the Markdown writer breaks my math code in a multiline table cell. For instance, consider the following Markdown document:

-------------------------------------------------------------------------------------------------------------------------------------
Anweisung             Maschinencode          Con                    Pos              Kommentar                       Schritte
--------------------  ---------------------  ---------------------  ---------------  ------------------------------  ----------------
`x <- g`              $R \leftarrow M[M_g]$  $R \mapsto \{x\}$      $x \mapsto R$    `getreg()` $= R$ für `x`        b2/getreg2
-------------------------------------------------------------------------------------------------------------------------------------

When I convert this document using

pandoc -t markdown+multiline_tables document.markdown

I get the following result:

  --------------------------------------------------------------------------
  Anweisung   Maschinencod Con          Pos       Kommentar         Schritte
              e                                                     
  ----------- ------------ ------------ --------- ----------------- --------
  `x <- g`    $R \leftarro $R \mapsto \ $x \mapst `getreg()` $= R$  b2/getre
              w M[M_g]$    {x\}$        o R$      für `x`           g2
  --------------------------------------------------------------------------

Here, the math macro \leftarrow is split across lines, which causes an error when converting the result document to HTML+MathJax or PDF.

I expected the Markdown writer to do no line wrapping, but a smarter wrapping strategy may also work. My pandoc version is

pandoc 1.13.2
Compiled with texmath 0.8.0.1, highlighting-kate 0.5.11.1.
@jgm
Owner

jgm commented Jan 29, 2015

It looks as if your source table is considerably wider than
the default. Try using --columns to increase the width.
Pandoc keeps the relative widths of the columns the same as
in the source, so if you shrink the whole table, things are
just not going to fit.
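To make the proportional scaling concrete, here is a minimal Haskell sketch. The name widthsInChars is borrowed from the variable mentioned later in this thread, but the formula is an assumption (multiply each relative width by the --columns value and round down), not pandoc's actual code. It does, however, reproduce the values reported later for the four-column example table.

```haskell
-- Hypothetical sketch: convert relative column widths (fractions of the
-- total line width) into character counts by scaling with the --columns
-- value and rounding down. Not pandoc's actual implementation.
widthsInChars :: Int -> [Double] -> [Int]
widthsInChars columns = map (floor . (fromIntegral columns *))

-- The same relative widths rendered at two different output widths.
wideTable, narrowTable :: [Int]
wideTable   = widthsInChars 72 [0.22, 0.26, 0.26, 0.26]
narrowTable = widthsInChars 4  [0.22, 0.26, 0.26, 0.26]
```

Here wideTable is [15,18,18,18] while narrowTable is [0,1,1,1]: the proportions are preserved, so once --columns is small enough, rounding down leaves a column with no room at all.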

@bjpe
Contributor Author

bjpe commented Jan 30, 2015

Thanks, that works for me. However, I realized two other things:

  1. When I add the option --no-wrap, text paragraphs are no longer wrapped, but tables still are.
  2. When I specify a very small output width with --columns, pandoc goes into an infinite loop.

For the second, you may try the above input and the command timeout 5 pandoc -t markdown --columns=4 document.markdown (the timeout is for keeping your system alive). On my system, pandoc used all available memory before I could kill it.

@mpickering
Collaborator

Regarding the high memory usage: might this be the same as #1785?

@bjpe
Contributor Author

bjpe commented Jan 30, 2015

I'm not sure. I tried to profile it, but even on a 16-core machine with 32 GiB of RAM it eventually failed:

$ time dist/build/pandoc/pandoc +RTS -p -RTS -t markdown --columns=4 document.markdown
Killed

real    3m57.644s
user    2m11.736s
sys 0m28.190s

Unfortunately, nothing was written to the profiling file. Tracing the control flow through the code may be a better option.

@bjpe
Contributor Author

bjpe commented Feb 2, 2015

Hi, using some trace debugging I finally found the source of the error. The problem occurs for the combination of a narrow table column and a narrow output width. I could also reproduce the problem using the table

------------------------------------------------
Header 1   Header 2     Header 3     Header 4
---------  -----------  -----------  -----------
bla bla    bla bla bla  bla bla bla  bla bla bla
------------------------------------------------

and the option --columns=4. When rendering to Markdown, pandoc calls Text.Pandoc.Writers.Markdown.pandocTable, where the parameter widths equals [0.22,0.26,0.26,0.26]. The per-column output width in characters is then computed as widthsInChars, giving [0,1,1,1]. Note the leading zero! Using these values, lblock is called, which in turn calls Text.Pandoc.Pretty.chop with n = 0. This is the source of the infinite loop: chop 0 cs calls take 0 xs : chop 0 (drop 0 xs ++ ys), and since drop 0 xs ++ ys is just cs again, this is chop 0 cs once more. I could manually fix this by adding an equation

chop n cs | n <= 0 = chop 1 cs

but you may even have a better solution.
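The loop can be reproduced outside pandoc with a simplified line splitter. The sketch below is not the real Text.Pandoc.Pretty.chop; it only assumes the recursion shape quoted above (take n xs : chop n (drop n xs ++ ys)), and the names chopNaive and chopGuarded are hypothetical. With n = 0 nothing is ever consumed, so the result is an infinite list of empty lines; laziness lets us inspect a prefix, whereas a pretty printer that consumes the whole list keeps allocating. The guarded variant applies the proposed equation, which amounts to clamping n to 1 at the entry point because the recursive calls never change n.

```haskell
-- Simplified, hypothetical stand-in for a chop-like helper: split text into
-- lines of at most n characters, keeping existing newlines as line breaks.
chopNaive :: Int -> String -> [String]
chopNaive _ [] = []
-- With n <= 0 the second branch emits "" and recurses on the same input
-- (drop 0 xs ++ ys == cs), so it never terminates.
chopNaive n cs = case break (== '\n') cs of
  (xs, ys)
    | length xs <= n -> xs : chopNaive n (drop 1 ys)             -- fits: emit, skip '\n'
    | otherwise      -> take n xs : chopNaive n (drop n xs ++ ys) -- split at n chars

-- The proposed fix: treat any n <= 0 as n = 1.
chopGuarded :: Int -> String -> [String]
chopGuarded n cs
  | n <= 0    = chopNaive 1 cs
  | otherwise = chopNaive n cs
```

In GHCi, take 3 (chopNaive 0 "bla") evaluates to ["","",""] (without the take it would never finish), while chopGuarded 0 "bla" gives ["b","l","a"].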

@jgm
Owner

jgm commented Feb 2, 2015

Thanks for taking the time to track this down, @bjpe. I don't have a better idea than fixing chop in the way you describe.

@bjpe
Contributor Author

bjpe commented Feb 2, 2015

I think the question is whether chop should be changed to also accept n <= 0, or whether chop should throw an error in this case if cs is non-empty, with pandocTable adjusted to compute a minimum column width of 1. Both are reasonable, so it should be a matter of taste between an explicit and an implicit invariant. Either way, I'd be happy to provide a pull request.

@jgm
Owner

jgm commented Feb 2, 2015

It looks like chop is only used by block in Text.Pandoc.Pretty.
It's an auxiliary function, not an exported function.

So rather than adjusting chop (in either of the ways you suggest),
it's probably better to adjust block.

I can think of one more option in addition to your two:

  1. block filler width | width < 1 = const empty
  2. block filler width | width < 1 = block filler width 1
  3. block filler width | width < 1 = error "block called with width < 1"

The first option would just make blocks of width 0 disappear. Of course
this is not going to be what's intended, but neither is a block of width
1!

Probably the best approach would be 3, but this would require
adjustments not just to the Markdown writer, but to other writers that
calculate table cell widths (I can't recall which those are at the
moment, so this would take some looking).

One might also consider whether the formula for calculating block widths
in these tables should be adjusted to ensure that words are kept
together; this would mean that a cell of width 2 with the contents "the
cat is on the mat" would get a block of width 3.
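Keeping words together can be sketched as a minimum width per cell: never go below the length of the longest word. The helper name minCellWidth is hypothetical, and it additionally assumes a floor of 1 so that a cell can never end up with width 0.

```haskell
-- Hypothetical helper: the width a cell should get so that no single word
-- has to be broken across lines. It also never returns less than 1.
minCellWidth :: Int -> String -> Int
minCellWidth requested contents =
  maximum (1 : requested : map length (words contents))
```

For the example above, minCellWidth 2 "the cat is on the mat" is 3, a requested width of 10 for "bla" is left at 10, and minCellWidth 0 "" is 1. The trade-off is that widening cells this way can push the table past the requested --columns.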

@bjpe
Contributor Author

bjpe commented Feb 3, 2015

Please take a look at https://github.com/bjpe/pandoc/tree/fix-aligned-tables where I compute the minimal width for a column to avoid line breaks inside single words. By the way, I also removed the line wrapping in tables when --no-wrap is given. Is this what you intended?

Note that I also added an error message for block, but with an additional check since the Markdown writer contains lblock 0 empty calls. Is this reasonable?
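Option 3, combined with the extra check described here (tolerating width 0 when there is nothing to lay out), might look roughly like this on plain strings. This is a hypothetical sketch, not the merged code: pandoc works on its Doc type rather than String, and the name chopChecked is invented for illustration.

```haskell
-- Hypothetical checked line splitter: a width below 1 is accepted only when
-- there is nothing to lay out (mirroring the "lblock 0 empty" calls);
-- otherwise it fails loudly instead of looping.
chopChecked :: Int -> String -> [String]
chopChecked n cs
  | n < 1 && null cs = []
  | n < 1 = error ("chopChecked: width " ++ show n ++ " with non-empty content")
  | otherwise = go cs
  where
    -- Split into lines of at most n characters, honouring existing newlines.
    go [] = []
    go s = case break (== '\n') s of
      (xs, ys)
        | length xs <= n -> xs : go (drop 1 ys)
        | otherwise -> take n xs : go (drop n xs ++ ys)
```

With this, chopChecked 0 "" returns [], chopChecked 3 "abcdef" returns ["abc","def"], and chopChecked 0 "bla" stops with an error, so a writer that computes a zero width fails immediately and visibly.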

In addition, the pipeTable function in the Markdown writer, the Haddock writer, and the RST writer all share the same code, so they should be updated, too.

@mpickering mpickering added the bug label Feb 9, 2015
@jgm jgm closed this as completed in 2761fec Nov 19, 2016
@jgm
Owner

jgm commented Nov 19, 2016

Better late than never! I merged your code @bjpe.

@bjpe
Contributor Author

bjpe commented Nov 23, 2016

Thanks a lot :)


3 participants