Title/Date styles are lost when writing docx #1933
It works as expected when I tested it in a slightly different setting: MS Word 2011 on Mac (same Pandoc). The styles were present in both the reference and the test document, and the change I made to the title style in the reference doc shows up in the output. I can also see in the code that Pandoc writes these fields with the expected styles. |
Works as expected under Win 7 (32bit), MS Word 2013 (same as yours) with 1.13.2. |
Could you post or link to the `reference.docx` you're using?
Unfortunately I don't have MS Word 2013 to test with.
Also, can you clarify: when you created the reference.docx, did
you edit it at all, or did you just open and save the default
reference.docx using MS Word 2013?
|
I have tested it a couple of times (modifying the ref.docx) under Win7/MS Office 2013 and I cannot replicate the problem.
|
Thank you for checking. I uploaded the actual and expected files here ([Download 3410480.zip] -> <Download | click here to start download.>)
The problem appears when I just open the default reference.docx file in MS Word 2013 and save it. |
I checked both the actual and expected folders (under Windows 7 with Office 2013). You're right: when using the reference file in the 'actual' folder, the resulting docx doesn't have the expected styles assigned to the title and author lines. But, strangely enough, when I saved that same reference file under a different name and used it as the reference, the result was correct. Could you please do the same and let me know what happens? The result is also correct when using the test.docx from the 'expected' folder as the reference file. So could you modify that test.docx, use it as the reference, and let me know? Since the Pandoc output and the files I saved from Word all seem to work as expected (as reference files), I'm wondering whether the files you saved in Word are somehow different due to regional edition or locale settings. |
Case A': I opened reference.docx from the 'actual' directory in Word and saved it, then ran Pandoc.
Case E': I opened the correct test.docx from the 'expected' directory in Word and saved it, then ran Pandoc using that test.docx as the reference docx. I tried both here, and both lost the title and date styles.
I think my Word writes a wrong file. My locale setting is Japanese in general. |
Saw a similar issue here, may be worth trying: https://askleo.com/why_does_my_microsoft_word_document_display_differently_on_different_computers/
|
This looks very much like the good old Word internationalization issue. See #1607 and #1692 for examples. @jkr and I resolved some of the issues (heading styles and block quotes for the reader), but not all of them, not by a long shot. In all honesty, the OOXML spec wildly diverges from what internationalized Word actually does. So, @nenono, do you happen to use an internationalized version of Word? |
@lierdakil I didn't realize quite a bit of work had been done on this. I don't currently have access to an international version of Word 2013; I'm curious to know what happens when one attaches a template in Word itself, overwriting the styles. f. ex.
|
I almost sent a post on why it will not work, and then realized that it just might. Hold on, let me test. |
Yes, attaching a template works (at least for document title style, which is mangled). But it feels like jumping through hoops. P.S. Word 2013, Russian version. |
@lierdakil Thanks for testing; I wasn't suggesting any permanent workarounds :) I'm not at all familiar with the docx writer; I'm wondering whether template-based generation could be an option (https://worddocgenerator.codeplex.com). |
Short version: No, not unless we want to lock docx output to Mac and Windows and require users to have Word installed.
Long version: Pandoc actually constructs document.xml based on OOXML specs and Word conventions. The problem with international Word versions is that it mangles some
A valid option is to somehow guess which styles are mangled and parse styles.xml for those. #1716 implements this for headings (as something very common), but that's it. I could probably add the same for title and date, that's not rocket science, but I lack a comprehensive list of which styles are mangled and how to guess which is which, so this will be a slow process. Having a US-Word-saved reference.docx, and Normal.dotm to boot, would help a little, but I have no access to a US version of Word. |
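For readers who want to see what "parse styles.xml for those" could look like, here is a rough Python sketch (not pandoc's Haskell code, and not what #1716 does verbatim). It assumes that a localized Word keeps the English `w:name` of a built-in style while changing its `w:styleId`; that assumption, and the helper name `resolve_style_id`, are mine and only for illustration.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used in word/styles.xml
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def resolve_style_id(styles_xml, wanted_name, default_id):
    """Return the styleId of the style whose w:name equals wanted_name.

    ASSUMPTION: localized Word mangles w:styleId but leaves w:name in
    English, so we match on the name (case-insensitively). If no style
    matches, fall back to default_id.
    """
    root = ET.fromstring(styles_xml)
    for style in root.iter(W + "style"):
        name_el = style.find(W + "name")
        if name_el is None:
            continue
        name = name_el.get(W + "val") or ""
        if name.lower() == wanted_name.lower():
            return style.get(W + "styleId", default_id)
    return default_id
```

With a styles.xml where the Title style was saved with the id `a3`, `resolve_style_id(xml, "Title", "Title")` would return `a3`, and a writer could emit that id instead of the hard-coded one.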
@lierdakil I also don't want to lock docx output down to having MS Word installed. Looking at the following helped me understand better what's happening:
So it looks like a good solution would be to use localized style names for those styles that match Word's built-in styles: f. ex. the Title and Subtitle styles match built-in ones, hence they are localized, whereas the Author style is not among the built-in ones, so it remains unchanged. @jgm @mpickering The built-in names can be viewed using this tool. I can also post a doc showing those if needed. The macro from the link above can then be modified to list just Pandoc's styles, to assist with creating a mapping file. |
+++ nkalvi [Feb 21 15 10:18 ]:
Sure, this seems a decent approach to me.
|
@jkr is the one who can comment best. I've not been following very closely. |
It suddenly dawned on me, but something like this lierdakil@5cdd117 will probably work. It's a proof-of-concept, so there are a couple bugs that need catching, but it does indeed work for headings, title and date styles. |
@lierdakil Pardon me, as I'm not familiar with Haskell. But I'm wondering whether it would be easier to replace all the 'hard coded' names with constants/variables and assign the values (or defaults) while initializing the options? |
@nkalvi btw, the tool you linked does indeed show localized style names. The only problem is that it shows style names as they appear in the Word GUI, which bear little resemblance to the IDs used in the actual XML. |
If someone needs an explanation of why lierdakil/pandoc@5cdd117 would work, here it is: By random convention, Word seems to keep
Of course, this is something I slapped together in an afternoon, so there has to be a better implementation. But I think the concept itself is solid(-ish). |
How should I check that? |
Basically, if your Word GUI is not US English.
|
No need to check; the screen capture you posted earlier also shows that the menus etc. are not in English. Besides, the styles.xml (which you can see when you unpack the docx) shows the difference in naming. This localization of the style names happens when you edit and save the doc in an international edition of Word. In your example, the title style has the id 'a3' instead of 'title'; so when pandoc's output (with this as reference) is opened in Word, Word will not find the style and sets it to 'normal'. |
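If anyone wants to check their own reference.docx without unpacking it by hand, a small Python sketch along these lines should do (standard library only; the function name `list_styles` is just something I made up for this example). It prints nothing by itself and simply returns the (styleId, name) pairs found in `word/styles.xml`.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used in word/styles.xml
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def list_styles(docx):
    """Return a list of (styleId, name) pairs from a docx.

    `docx` may be a file path or a binary file-like object. A docx is a
    zip archive, and the style definitions live in word/styles.xml.
    """
    with zipfile.ZipFile(docx) as zf:
        root = ET.fromstring(zf.read("word/styles.xml"))
    pairs = []
    for style in root.iter(W + "style"):
        name_el = style.find(W + "name")
        name = name_el.get(W + "val") if name_el is not None else None
        pairs.append((style.get(W + "styleId"), name))
    return pairs
```

Calling `list_styles("reference.docx")` on a file saved by a localized Word and on the default one, then comparing the two lists, should make differences in the ids visible.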
Thanks. I understand. |
This should be fixed by #1968, which is merged. |
I tried to export a Word docx with a custom `reference.docx` modified by Microsoft Word 2013. I found that the styles of Title and Date are lost in the output. It seems to be a bug.

steps to reproduce

I created a markdown text like this. (test.md)

Then I created reference.docx by the following steps.

1. `pandoc --print-default-data-file reference.docx > reference.docx`
2. Open the file in MS Word 2013, save it (`save as` menu) and close the file.

Then I exported a docx with this command: `pandoc test.md --reference-docx=reference.docx -o test.docx`.

actual results

Opening the created test.docx file in MS Word, I found the following.

`Some Document Title` is the Normal style.
`Some Document Author` is the Author style.
`2015-02-10` is the Normal style.

expected results

Using the original version of reference.docx (created at step 1), I verified the following.

`Some Document Title` is the Title style.
`Some Document Author` is the Author style.
`2015-02-10` is the Date style.

environment