Recognition of LTR (Left to Right) Word(s) in a *RTL* document #5558
Comments
It's unfortunate in this case that Word marks rtl but not ltr. Here's a step toward a solution. Convert the Word document to markdown (even with the released version, since the rtl spans are useless here), and explicitly add spans marking the English passages, like so:
Now convert the resulting markdown file to a PDF, using
You should see proper alignment for the English and Persian phrases. (This approach uses pandoc's default polyglossia.) Maybe not quite what you want, and it requires manual intervention via an intermediary markdown file, but it's a start. What we need is a way to automatically mark up the English bits as english (or alternatively as …). This could probably be done using a Lua filter, but it's a bit complex since you have to put the span over multiple consecutive elements.
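A rough text-level way to automate that markup, without a filter, is a regular expression over the intermediary markdown file. The following is only a minimal sketch: the function name `mark_english` is hypothetical, the `lang=en` span attribute is an assumption about what the manual markup above looked like, and "English" is approximated as runs of ASCII letters, digits and a little punctuation.

```python
import re

# Heuristic: a "Latin word" is a run of ASCII letters/digits (or "&"),
# optionally followed by inner punctuation as in "Co." or "p.".
# This is not real Unicode script detection.
LATIN_WORD = r"[A-Za-z0-9&][A-Za-z0-9.,;:&'\-]*"

# A run is one or more Latin words separated by single spaces.
LATIN_RUN = re.compile(rf"{LATIN_WORD}(?: {LATIN_WORD})*")


def mark_english(text: str, attr: str = "lang=en") -> str:
    """Wrap each run of Latin-script words in a pandoc bracketed span.

    `attr` is the span attribute to attach (an assumption here; `dir=ltr`
    would be another candidate).
    """
    return LATIN_RUN.sub(lambda m: f"[{m.group(0)}]{{{attr}}}", text)


if __name__ == "__main__":
    line = "گل The Wild Flower Key. Frederick Warne & Co. p. 310. است"
    print(mark_english(line))
```

Because it works on raw text, this sketch would also wrap Latin text that is already part of markdown syntax (URLs, attributes, existing spans), and it leaves surrounding parentheses outside the span, so its output still needs to be reviewed by hand.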
@jgm Thank you. So you mean I go into the markdown file and wrap each instance of an English word or group of words in a
It will incrementally find and replace the instances of English words. In some cases, the English group boundaries are not recognized correctly, so I have to stop the
If I understand correctly, the underlying problem is that Word doesn't have a representation for RTL documents (like LaTeX, HTML and pandoc markdown have). In Word, all documents are LTR documents, and some (or all) parts of the text are marked up as RTL. But if you know it's actually an RTL document, you could use a Lua filter to transform pandoc's internal document AST into what the LaTeX writer expects, as @jgm mentioned. With the current pandoc nightly build, the filter would:
It's not the most straightforward filter, but it shouldn't be impossible either.
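The awkward part mentioned earlier in the thread, putting one span over multiple consecutive elements, can be illustrated outside pandoc on the JSON form of the AST. This is a minimal sketch under stated assumptions, not the filter described above: it is in Python rather than Lua, the names `wrap_ltr_runs`, `starts_run` and `continues_run` are hypothetical, RTL text is approximated by a few Unicode block ranges, and the `dir=ltr` attribute is an assumed choice of markup.

```python
import re

LATIN = re.compile(r"[A-Za-z]")
# Approximation of "strongly RTL" text: Hebrew/Arabic blocks and
# their presentation forms. Not a full bidi classification.
RTL = re.compile(r"[\u0590-\u08FF\uFB1D-\uFDFF\uFE70-\uFEFF]")


def starts_run(el):
    """A Str with Latin letters and no RTL characters opens an LTR run."""
    return (el["t"] == "Str" and bool(LATIN.search(el["c"]))
            and not RTL.search(el["c"]))


def continues_run(el):
    """Spaces and non-RTL strings ("&", "310.") extend an open run."""
    return el["t"] == "Space" or (el["t"] == "Str" and not RTL.search(el["c"]))


def wrap_ltr_runs(inlines, attrs=(("dir", "ltr"),)):
    """Group consecutive LTR inlines into one Span (pandoc JSON AST shape)."""
    out, run = [], []

    def flush():
        # Keep trailing spaces outside the span.
        trailing = []
        while run and run[-1]["t"] == "Space":
            trailing.insert(0, run.pop())
        if run:
            attr = ["", [], [list(kv) for kv in attrs]]  # [id, classes, key-values]
            out.append({"t": "Span", "c": [attr, list(run)]})
        out.extend(trailing)
        run.clear()

    for el in inlines:
        if starts_run(el) or (run and continues_run(el)):
            run.append(el)
        else:
            flush()
            out.append(el)
    flush()
    return out
```

In a real filter this function would have to be applied to the inline list of every paragraph-like block, and any other inline type (emphasis, links, footnotes) ends a run in this sketch, so nested English text would need additional handling.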
@mb21 I'm not a programmer, but as far as I know neither markdown nor LaTeX has annotations for RTL. In the case of LaTeX, if you load a package called bidi you can mark RTL text and print it in the output.
The code itself looks good. I do remember, though, that there were some subtleties that kept me from implementing it (or stalled out my motivation): It seems likely that your implementation will take care of the majority. What we want to handle (taking English and Arabic as example languages) is something
The locale is mainly important here, because of how the default … Offhand, I'm not sure if your changes would cover these bases. Unfortunately it's a bit of a hectic week, so I might not get to look at it more closely for a few days. But this sounds like a job for TDD anyway. My brother-in-law is a Hebrew philologist working in the UK, and I think he works on both English and Israeli computers, so I might be able to get the above collection of docs from him, if that would help.
Yes, it would be helpful to have some real-world test documents.
#5545
I think I explained my request (and not really a problem with Pandoc) very clearly, but I will repeat it here. If you find any part ambiguous, please ask me to elaborate further.
fmpandoc.docx
GitHub doesn't support tex files, so I couldn't upload the Pandoc-converted tex file. Suppose you convert this file to tex with Pandoc by:
For the sake of simplicity you can safely remove your tex preamble and add these lines before \begin{document}:
When compiling this file with xelatex, the English group is rendered in reverse order; that is, (The Wild Flower Key. Frederick Warne & Co. p. 310.) is rendered as (.310 p. Co. & Warne Frederick Key. Flower Wild The). If you put it inside an \lr{...} command, the English sentence is rendered in the correct order.
What I had in mind was not distinguishing LTR and RTL words, solely LTR words. I asked whether it was possible to put LTR words inside an \lr{} command using Pandoc. I would like to express my appreciation for your efforts in creating and developing Pandoc; it is really great. Thank you.
Best.
\LTRfootnote{}, and if it contains both RTL and LTR it is \RTLfootnote{}.
With the bidi pkg, you wouldn't need xepersian. You will have to define a Persian font family and put your RTL words in an \RL{...} command or your paragraph in \begin{RTL}...\end{RTL} (case-sensitive), of course with the inclusion of your Persian font command in both. In case you are interested, I can upload an MWE.