<w:bCs> [complex script] being ignored when converting from docx to anything #4947

Closed
remy33 opened this issue Oct 4, 2018 · 5 comments

remy33 commented Oct 4, 2018

Tested on the latest release and on the nightlies.

An important thing to note is that run properties distinguish between two groups of characters, normal script and complex script (Arabic, for instance), and that a property uses a different tag depending on which type of character it affects.

Most normal script property tags have a matching complex script tag with an added "Cs" suffix, specifying that the property applies to complex scripts. For example, <w:i> (italic) becomes <w:iCs>, and the bold tag for normal script, <w:b>, becomes <w:bCs> for complex script.

Sometimes both tags are present, and then there is no bug. But some documents contain only <w:bCs> tags, and in that case the bold is lost in the conversion.

In short, a docx with this:
`<w:r w:rsidRPr="00597F15">
  <w:rPr>
    <w:rFonts w:hint="cs"/>
    <w:b/>
    <w:bCs/>
    <w:rtl/>
  </w:rPr>
  <w:t>שלום</w:t>
</w:r>`

Will work, but this:
`<w:r>
  <w:rPr>
    <w:rFonts w:cs="David" w:hint="cs"/>
    <w:bCs/>
    <w:rtl/>
  </w:rPr>
  <w:t>שלום</w:t>
</w:r>`
won't work.

But if I open the zip and replace every <w:bCs/> with <w:b/>, it works flawlessly.
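The manual workaround above can be scripted. What follows is only a rough sketch in Python, not part of pandoc: the function names `promote_bcs` and `patch_docx` are made up for illustration, and it assumes the tags appear in `word/document.xml` in exactly the self-closing form `<w:bCs/>`. Note that when a run already has both tags this leaves two `<w:b/>` elements, so it is meant for feeding pandoc rather than for producing a clean file for Word.

```python
import zipfile


def promote_bcs(xml: bytes) -> bytes:
    """Literal version of the workaround: rename every <w:bCs/> to <w:b/>."""
    return xml.replace(b"<w:bCs/>", b"<w:b/>")


def patch_docx(src, dst):
    """Copy a .docx (a zip archive), rewriting only word/document.xml.

    `src` and `dst` may be paths or binary file-like objects.
    """
    with zipfile.ZipFile(src) as zin, \
            zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename == "word/document.xml":
                data = promote_bcs(data)
            # Reuse the original entry metadata for every member.
            zout.writestr(item, data)
```

For example, `patch_docx("input.docx", "patched.docx")` would write a copy that can then be passed to pandoc (both file names are placeholders).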

Basically that's a complex script support problem.

Thanks in advance.

remy33 commented Oct 4, 2018

One way to fix it would be to do what MS Word does: if it sees w:hint="cs", it only looks at the tags ending in Cs.
In the example above, if you edit the XML so that it has:
`<w:rFonts w:cs="David" w:hint="cs"/>
<w:b/>`
Word won't show the bold (but Pandoc would convert it as bold).
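To make the suggestion concrete, here is a small Python sketch of that rule. It only encodes the heuristic as described in this comment (pick `w:bCs` when `w:hint="cs"` is set, otherwise `w:b`); it is an assumption about Word's behavior, not the full OOXML rules and not pandoc code. The helper names `is_on`, `run_is_bold_word_like` and `parse_run` are hypothetical.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def is_on(rpr, tag):
    """True if the toggle property (e.g. 'b', 'bCs') is present and not switched off."""
    el = rpr.find(f"w:{tag}", NS)
    if el is None:
        return False
    return el.get(f"{{{W}}}val", "true") not in ("0", "false", "off")


def run_is_bold_word_like(run):
    """Decide bold the way the comment describes Word: hint="cs" means check bCs only."""
    rpr = run.find("w:rPr", NS)
    if rpr is None:
        return False
    fonts = rpr.find("w:rFonts", NS)
    hint = fonts.get(f"{{{W}}}hint") if fonts is not None else None
    return is_on(rpr, "bCs" if hint == "cs" else "b")


def parse_run(fragment):
    """Wrap the children of a run in <w:r> with the namespace declared, then parse."""
    return ET.fromstring(f'<w:r xmlns:w="{W}">{fragment}</w:r>')
```

Under this rule the run with only `<w:bCs/>` counts as bold, while the edited run with `w:hint="cs"` and only `<w:b/>` does not, which matches what Word shows in the two cases above.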

jgm commented Oct 4, 2018

It would be easy to have pandoc bold anything that has w:bCs in its properties (treating this just like w:b). This wouldn't have exactly the same semantics, because w:bCs just says to bold the complex script texts, but my sense is that it would work well enough for 99% of the cases -- is that right?

Also, it would be helpful to know whether there are other similar Cs properties we should be looking at, e.g. w:iCs?
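The simpler approach proposed here, treating a Cs tag exactly like its plain counterpart, could look roughly like the following. This is a Python illustration of the idea only; pandoc's docx reader is written in Haskell and its real code is not shown in this thread. The `STYLE_TAGS` table and the `run_styles` function are invented for the example.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Each output style is switched on by either the plain tag or its Cs twin.
STYLE_TAGS = {"bold": ("b", "bCs"), "italic": ("i", "iCs")}


def run_styles(run_xml):
    """Return the set of styles for one <w:r>, ignoring the script distinction."""
    run = ET.fromstring(run_xml)
    rpr = run.find(f"{{{W}}}rPr")
    styles = set()
    if rpr is None:
        return styles
    for style, tags in STYLE_TAGS.items():
        for tag in tags:
            el = rpr.find(f"{{{W}}}{tag}")
            # A tag counts unless it is explicitly switched off via w:val.
            if el is not None and el.get(f"{{{W}}}val", "true") not in ("0", "false", "off"):
                styles.add(style)
    return styles
```

The trade-off is the one already mentioned: a run that sets `w:bCs` but contains only Latin text would come out bold here even though Word would not bold it.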

remy33 commented Oct 4, 2018

As far as I understand, w:bCs is the same as w:b but for complex scripts (foreign languages), so there is no reason to have a w:bCs tag on regular English text or w:b on Hebrew.
But if a document does use w:bCs without w:hint="cs" (for example, from mixing old Word versions or editing the XML file directly), that 1% will cause a bug that is extremely hard to track down.
From what I saw, though, Word 2016 adds both tags (as in my example above).

As for your second question, I only saw this misbehave with bold and underline; for some reason italic was fine, though I didn't look in depth to see why.

jgm added a commit that referenced this issue Oct 4, 2018
These are variants for "complex scripts" like Arabic and
are now treated just like b, i (bold, italic).

Closes #4947.

jgm commented Oct 4, 2018

I've pushed a fix. It could use testing, if you want to grab the nightly tomorrow.

jgm closed this as completed Oct 4, 2018

remy33 commented Oct 11, 2018

Working flawlessly. :-)
