docx reader produces level-0 headers #3830
Comments
The attached input docx file demonstrates the behavior when writing a reStructuredText file. Note that the docx file includes as-yet empty lists to eventually show page numbers for figures and tables.
Sure -- I'll try to take a look this afternoon.
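Okay -- so the issue is that this file has a class named "Heading0", in addition to the more normal "Heading1". And "Heading0" is at a lower organizational level than "Heading1". So there are two ways we could deal with this:
1. Use the lowest level of headings as level-1 header (so in this case "Heading0" becomes a level 1 header). This would not know what to do if you named your classes "HeadingA" and "HeadingB", but it would do the right thing in this case.
2. Just bump {n<1}-level headers up to level 1 and leave the rest the same. This would mess up the structure, but it would avoid adding an extra layer of code for what seems like a very rare case (it's been about 3 years before we saw this.)
At first I had been inclined toward option 1, but I think I talked myself into option 2. @jgm?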
I'd be okay with option 2.
+++ Jesse Rosenthal [Aug 03 17 20:26 ]:
Okay -- so the issue is that this file has a class named "Heading0", in
addition to the more normal "Heading1". And "Heading0" is at a lower
organizational level than "Heading1". So there are two ways we could
deal with this:
1. Use the lowest level of headings as level-1 header (so in this case
"Heading0" becomes a level 1 header). This would not know what to
do if you named your classes "HeadingA" and "HeadingB", but it
would do the right thing in this case.
2. Just bump {n<1}-level headers up to level 1 and leave the rest the
same. This would mess up the structure, but it would avoid adding
an extra layer of code for what seems like a very rare case (it's
been about 3 years before we saw this.)
At first I had been inclined toward option 1, but I think I talked
myself into option 2. ***@***.***?
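The two options discussed in this thread can be sketched roughly as follows. This is only an illustration in Python over a plain list of heading levels, not pandoc's actual Haskell code; the names `shift_levels` and `clamp_levels` are hypothetical, and shifting only when the lowest level is below 1 is an assumption about how option 1 would behave.

```python
def shift_levels(levels):
    """Option 1 (assumed behavior): if the lowest heading level is below 1,
    shift every level up so that the lowest one becomes level 1."""
    if not levels:
        return []
    lowest = min(levels)
    if lowest >= 1:
        # Nothing to repair; leave the levels untouched.
        return list(levels)
    offset = 1 - lowest
    return [n + offset for n in levels]


def clamp_levels(levels):
    """Option 2: bump any level below 1 up to 1 and leave the rest alone."""
    return [max(1, n) for n in levels]


# "Heading0" followed by "Heading1", "Heading2", "Heading1"
levels = [0, 1, 2, 1]
print(shift_levels(levels))  # [1, 2, 3, 2]  hierarchy preserved
print(clamp_levels(levels))  # [1, 1, 2, 1]  level 0 and level 1 collapse
```

The example shows the trade-off mentioned above: option 2 is simpler but flattens "Heading0" and "Heading1" into the same level, while option 1 keeps the relative structure at the cost of renumbering every header.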
Here is a snippet of the parse tree produced from a docx file by pandoc 1.19.2.1:
(Still waiting for a docx file to use for testing.)
This causes a problem when rendering to RST, because the RST writer has:
and when level == 0, we get a runtime error for using !! with a negative index.
The Markdown writer doesn't crash, but its output is not ideal either; the Header 0 is not rendered as a header at all.
The readers should never produce Header n with n < 1. Note: in some of the writers, we use Header 0 internally to represent chapters, when --top-level-division=chapter is used. (And we use -1 to represent a part!) This is a bit of a hack, and maybe we should code it differently. In any case, the readers should never produce Header 0.
@jkr, can you see why the docx reader might produce Header 0, and can you see how to fix it?
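To check whether a given document triggers this, one could inspect pandoc's JSON output for Header elements with a level below 1. The following is a minimal Python sketch, assuming the JSON AST layout in which a header is encoded as `{"t": "Header", "c": [level, attr, inlines]}`; the function name `find_bad_headers` and the sample document are made up for illustration.

```python
import json


def find_bad_headers(node, found=None):
    """Recursively collect the levels of all Header elements with level < 1."""
    if found is None:
        found = []
    if isinstance(node, dict):
        if node.get("t") == "Header":
            level = node["c"][0]
            if level < 1:
                found.append(level)
        # Keep walking: headers can be nested inside other blocks.
        for value in node.values():
            find_bad_headers(value, found)
    elif isinstance(node, list):
        for item in node:
            find_bad_headers(item, found)
    return found


# Hand-written sample resembling a document with a level-0 header.
sample = json.loads("""
{"pandoc-api-version": [1, 17, 0, 5],
 "meta": {},
 "blocks": [
   {"t": "Header", "c": [0, ["title", [], []], [{"t": "Str", "c": "Title"}]]},
   {"t": "Header", "c": [1, ["intro", [], []], [{"t": "Str", "c": "Intro"}]]}
 ]}
""")

print(find_bad_headers(sample))  # [0]
```

In practice the input would come from something like `pandoc -t json input.docx` rather than a hand-written string; an empty result means no reader-produced header falls below level 1.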
Linked pandoc-discuss thread.