Title/Date styles are lost when writing docx #1933

Closed
nenono opened this issue Feb 10, 2015 · 26 comments

Comments

@nenono

nenono commented Feb 10, 2015

I tried to export a Word docx using a custom reference.docx that had been saved by Microsoft Word 2013. I found that the Title and Date styles are lost in the output. This looks like a bug.

steps to reproduce

I created a markdown text like this. (test.md)

% Some Document Title
% Some Document Author
% 2015-02-10

body

Then I created reference.docx with the following steps:

  1. Run the command: pandoc --print-default-data-file reference.docx > reference.docx
  2. Open reference.docx in MS Word.
  3. Overwrite the file (via the Save As menu) and close it.

Then I exported the docx with the command: pandoc test.md --reference-docx=reference.docx -o test.docx

actual results

Opening the created test.docx file in MS Word, I found the following:

  • Some Document Title is the Normal style.
  • Some Document Author is the Author style.
  • 2015-02-10 is the Normal style.

expected results

Using the original version of reference.docx (created in step 1), I verified the following:

  • Some Document Title is the Title style
  • Some Document Author is the Author style.
  • 2015-02-10 is the Date style.

environment

  • Pandoc 1.13.2
  • Microsoft Word 2013 (32bit, Microsoft Office Professional Plus 2013)
  • Windows 7 (64bit)
@nkalvi

nkalvi commented Feb 10, 2015

It works as expected when I tested it in a slightly different setting: MS Word 2011 on Mac (same Pandoc). The styles were present in both the ref and the test document, and you can see the change in title style (changed in the ref doc). Also I can see that Pandoc is writing the fields with expected styles in the code.

[Image: pandoc-docx test]

@nkalvi

nkalvi commented Feb 10, 2015

Works as expected under Win 7 (32bit), MS Word 2013 (as yours) with 1.13.2.

@jgm
Owner

jgm commented Feb 10, 2015 via email

@nkalvi

nkalvi commented Feb 10, 2015

I have tested it a couple of times (with modifying the ref.docx) under Win7/MS Office 2013 and I cannot replicate the problem.
I’ll be more than happy to take a look at the ref.docx used once we get it. I have access to MS Office 2011, Office 2013 and Office 2014.

On Feb 10, 2015, at 11:35 AM, John MacFarlane notifications@github.com wrote:

Could you post or link to the reference.docx you're using?
Unfortunately I don't have MS Word 2013 to test with.

Also, can you clarify: when you created the reference.docx, did
you edit it at all, or did you just open and save the default
reference.docx using MS Word 2013?


@nenono
Author

nenono commented Feb 12, 2015

Thank you for checking.

I uploaded the actual and expected files here ([Download 3410480.zip] -> <Download | click here to start download.>)

Also, can you clarify: when you created the reference.docx, did
you edit it at all, or did you just open and save the default
reference.docx using MS Word 2013?

It happens when I just open and save the default reference.docx file with MS Word 2013, without editing it.

@nkalvi

nkalvi commented Feb 12, 2015

I checked both the actual and expected folders (under Windows 7 with Office 2013).

You're right - when using the reference file in the 'actual' folder, the resulting docx doesn't have the styles assigned to the title and author lines as expected.

But, strangely enough, when I saved that same reference file under a different name and used it as reference, the result was correct. Could you please do the same and let me know what happens?

Also, the result is correct when using the test.docx from the 'expected' folder as the reference file. So could you modify the test.docx, use it as the reference, and let me know?

Since the Pandoc output versions and the files I saved from Word seem to work as expected (as reference files), I'm wondering whether the files you saved in Word are somehow different due to regional edition or locale settings.

@nenono
Author

nenono commented Feb 12, 2015

But, strangely enough, when I saved that same reference file under a different name and used it as reference, the result was correct. Could you please do the same and let me know what happens?

Case A': I opened reference.docx from the 'actual' directory in Word and saved it, then ran Pandoc.


Also, the result is correct when using the test.docx from the 'expected' folder as the reference file. So could you modify the test.docx, use it as the reference, and let me know?

Case E': I opened the correct test.docx from the 'expected' directory in Word and saved it, then ran Pandoc using that test.docx as the reference-docx.


I tried both here, and both of them lost the title and date styles.

I'm wondering whether the files you saved in Word are somehow different due to regional edition or locale settings.

I think my Word writes a bad file. My locale settings are generally Japanese.

@nkalvi

nkalvi commented Feb 12, 2015

Saw a similar issue here - may be worth trying:

https://askleo.com/why_does_my_microsoft_word_document_display_differently_on_different_computers/

February 22, 2012 at 4:32 am
“I had a similar issue, one of our clients PC suddenly decided to go a bit weird and display all the Word docs they usually use differently to everyone else. It also decided to screw up some of the Outlook fonts too, but not as bad as it screwed Word, which is odd.
Solution in the end was to copy fonts over from a good PC and then for the hell of it go into regional settings, and then to the tab with roman, japanese etc on. from here tick the tickbox at the bottom to reapply language (and i was hoping font size and regularity too). Did a restart after both those things and worked a charm!
Think I got a bit lucky but worth a try if you’ve tried nearly everything else :)
Posted by: Neil at June 7, 2010 2:06 PM”
Just wanted to post my thanks for this, had a verry similar issue at working using a clients custom fonts, installed them to a few machines. Same document, connected to same printers and same word settings, a number of extra pages would randomly been added to any documents using the fonts but revert back when moved to a good machine. Been searching for a week and done the same as above seems to have solved it!!!

@lierdakil
Contributor

This looks very much like the good old Word internationalization issue. See #1607 and #1692 for examples. @jkr and I resolved some of the issues (heading styles and block quotes for the reader), but not all of them, not by a long shot. In all honesty, the OOXML spec wildly diverges from what internationalized Word actually does.

So, @nenono, do you happen to use an internationalized version of Word?

@nkalvi

nkalvi commented Feb 21, 2015

@lierdakil Didn't realize quite a bit of work was done on this.

I don't currently have access to an international version of Word 2013; I'm curious to know what happens when one attaches a template in Word itself, overwriting the styles. For example:

  1. Create a reference Word file using Pandoc, modify the styles, and save it as a Word template.
  2. Create a Word file using Pandoc that has elements using the modified style(s).
  3. Open this file in Word and attach the template, choosing the option to apply the styles. See whether the styles are applied.

To attach a template in Word:

  1. Open the document you want to apply a template to. Click the "File" tab on the Office Ribbon and press the "Options" button.
  2. Select the "Add-Ins" option from the navigation menu on the left side of the Options dialog.
  3. Click on the "Manage" drop-down list and choose "Templates" from the list of options. Press the "Go" button to open the Templates and Add-Ins Window.
  4. Press the "Attach" button to open the Template Attachment Dialog. Select the template you want to attach to your document and press "Open" to close the dialog window.
  5. When prompted, select "Automatically Update Styles" to change the styles of your document to match the styles of your template. Click "Ok" to close the Options window.

@lierdakil
Contributor

I almost sent a post on why it will not work, and then realized that it just might. Hold on, let me test.

@lierdakil
Contributor

Yes, attaching a template works (at least for document title style, which is mangled). But it feels like jumping through hoops.

P.S. Word 2013, Russian version.

@nkalvi

nkalvi commented Feb 21, 2015

@lierdakil Thanks for testing; I wasn't suggesting any permanent workarounds :)

I'm not at all familiar with the docx writer; I'm wondering whether template-based generation could be an option (https://worddocgenerator.codeplex.com).

@lierdakil
Contributor

@nkalvi

Short version: No, not unless we want to lock docx output to Mac and Windows and require users to have Word installed.

Long version: Pandoc constructs document.xml based on the OOXML spec and Word conventions. The problem with international Word versions is that they mangle some styleId fields in styles.xml, and Pandoc doesn't know about that. I'm inclined to believe it's an implementation bug, because the wording in the spec suggests that styleId should be more or less immutable, but that's outside of our control. Anyway, since styles in document.xml and styles.xml are matched by styleId, and a modified reference.docx contains mangled styles, this leads to a lot of confusion on the Word side of things. Applying a template works only because both the document and the template have their styles mangled in the same way (Word mangles styles on opening, believe it or not). So no, unless we leverage Word itself for generation, template-based generation doesn't look like a valid option.

A valid option is to somehow guess which styles are mangled and parse styles.xml for those. #1716 implements this for headings (as something very common), but that's it. I could probably add the same for title and date, that's not rocket science, but I lack a comprehensive list of what styles are mangled and how to guess which is which, so this will be a slow process. Having a US-Word-saved reference.docx and Normal.dotm to boot would help a little, but I have no access to US version of Word.
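
To make the mismatch described above concrete, here is a minimal Python sketch (not Pandoc's code, which is Haskell) that reports which paragraph style references in document.xml have no matching styleId in styles.xml. The function name is hypothetical, and the sketch only looks at w:pStyle references; the namespace URI is the standard WordprocessingML one.

```python
import xml.etree.ElementTree as ET

# Standard WordprocessingML main namespace.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
VAL = f"{{{W}}}val"
STYLE_ID = f"{{{W}}}styleId"


def unresolved_paragraph_styles(document_xml, styles_xml):
    """Return the paragraph style ids referenced in document.xml that
    are not defined as a w:styleId in styles.xml (hypothetical helper)."""
    styles_root = ET.fromstring(styles_xml)
    defined = {s.get(STYLE_ID) for s in styles_root.iter(f"{{{W}}}style")}

    document_root = ET.fromstring(document_xml)
    referenced = {p.get(VAL) for p in document_root.iter(f"{{{W}}}pStyle")}

    # Anything referenced but not defined is a reference Word cannot resolve.
    return sorted(referenced - defined)
```

With a reference.docx whose built-in styles were renamed by a localized Word, references such as Title would be expected to show up in the returned list, while a custom style such as Author would not.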

@nkalvi

nkalvi commented Feb 21, 2015

@lierdakil I also don't want to lock down docx output to having MS Word installed.

Looking at the following helped me to understand what's happening better:
http://python-docx.readthedocs.org/en/latest/user/styles-using.html
http://www.thedoctools.com/index.php?show=mt_create_style_name_list

So it looks like a good solution would be to use localized style names for those styles that match Word's built-in styles: f.ex. Title and Subtitle styles match the built-in ones, hence they are localized; whereas Author style is not among the built-in ones so it will remain unchanged.

@jgm @mpickering
Would it be possible to add an option to specify style name mapping when using docx writer?

The built-in names can be viewed using this tool. I can also post a doc showing those if needed. The macro from the link above can then be modified to list just the Pandoc's styles to assist with creation of mapping file.
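
As a rough illustration of the mapping idea, the following Python sketch rewrites paragraph style references in a document.xml string according to a user-supplied mapping. This is not an existing Pandoc option: the function name and the shape of the mapping (a plain dict from Pandoc's style id to the localized id) are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Standard WordprocessingML main namespace.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
VAL = f"{{{W}}}val"


def apply_style_mapping(document_xml, mapping):
    """Rewrite w:pStyle references using a {pandoc_id: localized_id} dict.

    Hypothetical helper: references not present in the mapping are left
    untouched, so custom styles such as Author keep their ids.
    """
    ET.register_namespace("w", W)  # keep the conventional "w:" prefix on output
    root = ET.fromstring(document_xml)
    for ref in root.iter(f"{{{W}}}pStyle"):
        old = ref.get(VAL)
        if old in mapping:
            ref.set(VAL, mapping[old])
    return ET.tostring(root, encoding="unicode")
```

The mapping itself would still have to come from somewhere, for example from a list of the localized built-in style ids.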

@jgm
Owner

jgm commented Feb 21, 2015

+++ nkalvi [Feb 21 15 10:18 ]:

@jgm @mpickering
Would it be possible to add an option to specify style name mapping when using docx writer?

Sure, this seems a decent approach to me.

The built-in names can be viewed using this tool. I can also post a doc showing those if needed. The macro from the link above can then be modified to list just the Pandoc's styles to assist with creation of mapping file.



@mpickering
Collaborator

@jkr is the one who can comment best. I've not been following very closely.

@lierdakil
Contributor

It suddenly dawned on me, but something like this lierdakil@5cdd117 will probably work. It's a proof-of-concept, so there are a couple bugs that need catching, but it does indeed work for headings, title and date styles.

@nkalvi

nkalvi commented Feb 21, 2015

@lierdakil Pardon me, as I'm not familiar with Haskell. But I'm wondering whether it would be easier to replace all the 'hard coded' names with constants/variables and assign the values (or defaults) while initializing the options?

@lierdakil
Contributor

@nkalvi
Where would you get said values I wonder? As user input? While that would be fine if Word used readable/meaningful style identifiers, it looks much worse when you consider that it doesn't. Anyway, it would be much more involved, spanning multiple source files and touching Pandoc core, not just docx writer.

btw, the tool you linked does indeed show localized style names. The only problem is that it shows style names as they appear in the Word GUI, which bear little resemblance to the IDs used in the actual XML.

@lierdakil
Contributor

If someone needs an explanation on why lierdakil/pandoc@5cdd117 would work, here it is:

By some convention, Word seems to keep the w:name child of w:style nodes in the English locale. This makes it possible to search for a style by the val attribute of its w:name. That's basically where the hard-coded constants come from (since those seem to be hard-coded in Word as well). And then there are custom styles, which have to have hard-coded names.

Of course, this is something I slapped together in an afternoon, so there has to be a better implementation. But I think concept itself is solid(-ish).
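
The lookup idea can be sketched in a few lines of Python. This is an illustration of the concept only, not a port of the linked commit: the function names are hypothetical, and falling back to the English name when no match is found is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET

# Standard WordprocessingML main namespace.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
VAL = f"{{{W}}}val"
STYLE_ID = f"{{{W}}}styleId"


def style_ids_by_name(styles_xml):
    """Map each style's lower-cased w:name value to its w:styleId."""
    lookup = {}
    for style in ET.fromstring(styles_xml).iter(f"{{{W}}}style"):
        name = style.find(f"{{{W}}}name")
        if name is not None and name.get(VAL) is not None:
            lookup[name.get(VAL).lower()] = style.get(STYLE_ID)
    return lookup


def resolve_style_id(lookup, english_name):
    """Return the styleId stored under an English style name.

    Assumption of this sketch: if the name is not found, the English
    name itself is used as the id.
    """
    return lookup.get(english_name.lower(), english_name)
```

A writer could then call resolve_style_id(lookup, "Title") instead of emitting the literal id, so a reference.docx saved by a localized Word would still be matched.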

@nenono
Author

nenono commented Feb 22, 2015

@lierdakil

So, @nenono, do you happen to use an internationalized version of Word?

How should I check that?

@lierdakil
Contributor

lierdakil commented Feb 22, 2015 via email

@nkalvi

nkalvi commented Feb 22, 2015

No need to check; the screen capture you posted earlier shows that the menus etc. are not in English. Besides, the styles.xml (which can be seen when you unpack the docx) shows the difference in naming. This localization of the style names happens when you edit and save the doc in an international edition of Word.

In your example, the title style has the id 'a3' instead of 'title'; so when pandoc's output (generated with this file as the reference) is opened in Word, Word does not find the style and falls back to 'normal'.
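
For anyone who wants to check their own reference.docx without unpacking it by hand, a small Python sketch along these lines lists each style's id next to its name. The function name is hypothetical; the only assumptions are that the file is a regular docx (a zip archive) with the usual word/styles.xml part.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Standard WordprocessingML main namespace.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
VAL = f"{{{W}}}val"
STYLE_ID = f"{{{W}}}styleId"


def list_styles(docx_bytes):
    """Return (styleId, name) pairs read from word/styles.xml in a docx."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        root = ET.fromstring(archive.read("word/styles.xml"))
    pairs = []
    for style in root.iter(f"{{{W}}}style"):
        name = style.find(f"{{{W}}}name")
        pairs.append((style.get(STYLE_ID),
                      name.get(VAL) if name is not None else None))
    return pairs
```

Reading the file with open("reference.docx", "rb").read() and passing the bytes to list_styles should make a pair like ('a3', 'Title') easy to spot.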

@nenono
Author

nenono commented Feb 22, 2015

Thanks. I understand.

@lierdakil
Contributor

This should be fixed by #1968, which is merged.
