pagebreak filter does not work when converting latex to docx #152

Closed
joelnitta opened this issue Jan 8, 2021 · 6 comments
@joelnitta

I tried converting the sample md file using the pagebreak filter.

pandoc --to docx --lua-filter=pagebreak.lua -o sample.docx sample.md

The resulting docx file has no page break:

[screenshot of the resulting docx]

Tried with both pandoc 2.7.3 and 2.11.3.2.

joelnitta added a commit to joelnitta/moorea_filmies that referenced this issue Jan 9, 2021
@tarleb
Member

tarleb commented Jan 9, 2021

This is the expected output. Compare with the sample.md file and scroll down in the created docx: there is a second page (and a third).

Please reopen if there really isn't more content, then also include the software used to open the file.

@tarleb tarleb closed this as completed Jan 9, 2021
@tarleb tarleb added the invalid label Jan 9, 2021
@joelnitta joelnitta changed the title from "pagebreak filter does not work when outputting to docx" to "pagebreak filter does not work when converting latex to docx" Jan 9, 2021
@joelnitta
Author

D'oh! You're totally correct, sorry for not actually scrolling through the output. My bad.

I realized the problem is not converting markdown to docx but latex to docx:

Here is an example latex file sample_latex.tex:

\documentclass{article}
\begin{document}
  
this is the first page

\newpage

and this is the second page

\end{document}

When I run pandoc --to docx --lua-filter=pagebreak.lua -o sample_latex.docx sample_latex.tex,

this is the output in MS Word:

[screenshot of the output in MS Word]

MS Word v 16.44 on Mac OSX 10.15.7
pandoc v 2.11.3.2

@tarleb
Member

tarleb commented Jan 9, 2021

Ah yes, that makes sense. Pandoc drops unknown TeX commands when reading LaTeX. You can still get the expected result by adding --from=latex+raw_tex.
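
For reference, a sketch of the full invocation with that option added. The file names come from the example above; it assumes pagebreak.lua sits in the current directory, and the guard around the pandoc call is only there so the snippet does nothing where pandoc or the filter is missing.

```shell
# Recreate the example file from the report above.
cat > sample_latex.tex <<'EOF'
\documentclass{article}
\begin{document}

this is the first page

\newpage

and this is the second page

\end{document}
EOF

# Same command as in the report, plus --from=latex+raw_tex so that the
# LaTeX reader passes \newpage through as raw TeX instead of dropping it.
# Assumption: pagebreak.lua is in the current directory.
if command -v pandoc >/dev/null 2>&1 && [ -f pagebreak.lua ]; then
  pandoc --from=latex+raw_tex --to docx --lua-filter=pagebreak.lua \
    -o sample_latex.docx sample_latex.tex
fi
```

Opening sample_latex.docx afterwards should show the second sentence on a new page.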

@joelnitta
Author

Thanks! That fixed it.

If you don't mind another question, I'm still a little confused though... what is the difference between --from=latex and --from=latex+raw_tex? I don't understand why one would need to "extend" latex since \newpage is already standard latex.

joelnitta added a commit to joelnitta/moorea_filmies that referenced this issue Jan 9, 2021
@jgm
Member

jgm commented Jan 10, 2021

Yes it's standard latex, but because \newpage doesn't correspond to anything in the pandoc AST, pandoc can ignore it or pass it through as raw tex. It only does the latter if you explicitly enable raw_tex, which isn't on by default for latex input.
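
One way to see the difference is to print the AST pandoc builds for a small snippet, once with each reader setting. This is only a sketch: the snippet text is made up for illustration, and the exact native output depends on the pandoc version.

```shell
# A tiny LaTeX fragment (hypothetical, for illustration only).
snippet='first page

\newpage

second page'

# printf is used instead of echo so that backslashes are never interpreted.
if command -v pandoc >/dev/null 2>&1; then
  printf '%s\n' '--- --from=latex (raw_tex off: \newpage is ignored)'
  printf '%s\n' "$snippet" | pandoc --from=latex --to=native

  printf '%s\n' '--- --from=latex+raw_tex (\newpage passed through as raw tex)'
  printf '%s\n' "$snippet" | pandoc --from=latex+raw_tex --to=native
fi
```

With raw_tex enabled, the second listing should contain a raw TeX element holding \newpage, which is what the pagebreak filter needs to see in order to act on it.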

@joelnitta
Author

I see, thanks for the explanation.
