Docx reader: common class names don't work in different languages. #1692
Comments
I would be thankful for tips on how to avoid using
Well, I don't really like the string argument to Beyond that, though, I'd say that the reuse of
It would be clearer, and avoid the extra parameter.
I'd be happy to try to build off this patch to implement those changes. It might take a day or two for me to get to it, though.
And thanks for your work on this!
By the way, I take it you have an internationalized version of Word, right? What about block quotes?
Yes, I do have an internationalized version of Word. Russian, to be precise. My workflow does not use block quotes, however, so I never checked. Let me check and get back to you. Also, please see my patches for #1607 when you have some time. I think those turned out a little better.
I'll take a look at the writer patches soon -- though probably not till I'll pull these changes into a branch of mine, and then see if I can Lists are actually something else that should be dealt with here, but I
Okay, about block quotes. Same as with headers,
Could you email me your Russian doc? Email address in the header of the docx reader source.
Sure. No idea what you might want besides quotes and headings, but those are included.
Signed-off-by: Jesse Rosenthal <jrosenthal@jhu.edu>
Okay, I made my code a bit cleaner and more extensible in a separate branch. I dislike the fact that I had to either use ad-hoc polymorphism or have two variants of each function called from archiveToStyles, but at least this code is easily extended for other special cases.
Or here is another bright idea: we could probably just "demangle" paragraph
I much prefer the approach in 1692-alt, and it passes tests -- but it doesn't pick up on the block quotes, because
There are some refinements (do we combine two consecutive blockquotes with different styles the way we do two normal blockquotes, or not?) but this will be an improvement. Also: there should be a more informative commit message. Could you repush (or just squash and
Right. I should have these bools as guards. Not sure what I was thinking when pushing. Just reordering is a no-go, since that won't catch inherited styles. I will push in a minute.
This patch builds a tree of paragraph styles, then checks whether a paragraph's style has a style.styleId or style/name.val matching predetermined patterns. It works with "Heading#" (name.val="heading #") for headings and "Quote"|"BlockQuote"|"BlockQuotation" (name.val="Quote"|"Block Text") for block quotes.
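The lookup that commit message describes can be sketched roughly as follows. This is a minimal Python illustration, not pandoc's actual Haskell code: the helper names (parse_styles, ancestry, heading_level, is_block_quote) and the SAMPLE styles document are invented for the example, and only the matching patterns are taken from the commit message. Walking the basedOn chain is what lets inherited styles be recognized.

```python
import re
import xml.etree.ElementTree as ET

# WordprocessingML main namespace (the "w:" prefix in word/styles.xml).
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def q(tag):
    """Qualify a tag or attribute name with the w: namespace."""
    return f"{{{W}}}{tag}"


def parse_styles(styles_xml):
    """Map each paragraph styleId to (name val, basedOn styleId)."""
    styles = {}
    for s in ET.fromstring(styles_xml).iter(q("style")):
        if s.get(q("type")) != "paragraph":
            continue
        name, based = s.find(q("name")), s.find(q("basedOn"))
        styles[s.get(q("styleId"))] = (
            name.get(q("val")) if name is not None else None,
            based.get(q("val")) if based is not None else None,
        )
    return styles


def ancestry(style_id, styles):
    """Yield (styleId, name) for a style and each style it is based on."""
    seen = set()  # guards against cyclic basedOn chains
    while style_id is not None and style_id not in seen:
        seen.add(style_id)
        name, parent = styles.get(style_id, (None, None))
        yield style_id, name
        style_id = parent


# Patterns quoted from the commit message above.
HEADING_ID = re.compile(r"^Heading(\d+)$")
HEADING_NAME = re.compile(r"^heading (\d+)$")
QUOTE_IDS = {"Quote", "BlockQuote", "BlockQuotation"}
QUOTE_NAMES = {"Quote", "Block Text"}


def heading_level(style_id, styles):
    """Return the heading level of a style or its ancestors, else None."""
    for sid, name in ancestry(style_id, styles):
        m = HEADING_ID.match(sid) or (HEADING_NAME.match(name) if name else None)
        if m:
            return int(m.group(1))
    return None


def is_block_quote(style_id, styles):
    """True if the style or any ancestor matches a block quote pattern."""
    return any(sid in QUOTE_IDS or name in QUOTE_NAMES
               for sid, name in ancestry(style_id, styles))


# Hypothetical styles.xml fragment with non-English style IDs (illustration only).
SAMPLE = f"""<w:styles xmlns:w="{W}">
  <w:style w:type="paragraph" w:styleId="1">
    <w:name w:val="heading 1"/>
  </w:style>
  <w:style w:type="paragraph" w:styleId="Custom">
    <w:name w:val="My Title"/>
    <w:basedOn w:val="1"/>
  </w:style>
  <w:style w:type="paragraph" w:styleId="a">
    <w:name w:val="Block Text"/>
  </w:style>
</w:styles>"""
```

In this sketch, a paragraph whose style is "Custom" would be treated as a level 1 heading because its parent style "1" carries the name "heading 1", even though neither styleId starts with "Heading".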
Ok, all done. Squashed into 16a51f7
Pushed. Thanks so much for keeping on top of this! I'll close this issue, since any further refinements should probably be its own thing.
Same as issue #1607, but for the docx reader. The reader has limited internationalization support through a prefix list, but this approach is rather clunky.
I advocate using
<name val="heading #">
as a fallback for non-English locales.