RTL support #56

Closed
hussasad opened this issue Dec 15, 2015 · 23 comments

@hussasad

I am trying to write Arabic text using the writeText function, but I am unable to figure out how to set the direction to be RTL. It is mentioned here that RTL support was added to the rendering service, so i guess there must be some way in Hummus to specify the direction?

@galkahana
Owner

Hi,
Yes, you should be able to write RTL using a similar methodology to what i am using in the hummusrenderer project, which is based on hummus.

The following defines a method to compute how bidirectional text should be written when viewed in regular left-to-right order (which is how writeText expects things to be):

var bidi = require('icu-bidi');

function computeTextForItem(inText,inDirection)
{
    var p = bidi.Paragraph(inText,{paraLevel: inDirection == 'rtl' ? bidi.RTL:bidi.LTR});
    return p.writeReordered(bidi.Reordered.KEEP_BASE_COMBINING);
}

The text to pass to writeText should now be computeTextForItem(theText,inDirection).
The extra parameter should be 'rtl' if the text that you are writing is meant to be part of a right-to-left paragraph. Normally this only matters when the text starts with numbers.

If you want multiline support, then i suggest you look into the hummusrenderer relevant code, and we can discuss it if you want.

@hussasad
Author

Yes, I used bidi as you suggested above and that worked in reversing the direction, but i still have an issue with the characters not joining as they should for arabic text (like they would for cursive fonts). so if i want to write the text حمّص , in the PDF i get it as ح مّ ص

this happens in hummusrenderer as well. you can notice it if you try the following object

{
    "externals": {
        "fbLogo": "http://pdfrendering.herokuapp.com/profileImage.jpg"
    },
    "pages": [
        {
            "width": 595,
            "height": 842,
            "boxes": [
                {
                    "bottom": 500,
                    "left": 10,
                    "text": {
                        "text": "خمص",
                        "options": {
                            "fontPath": "./resources/fonts/arial.ttf",
                            "size": 40,
                            "color": "pink"
                        }
                    }
                },
                {
                    "bottom": 600,
                    "left": 10,
                    "image": {
                        "external": "fbLogo"
                    }
                }
            ]
        }
    ]
}

@galkahana
Owner

yes. i kinda figured this would come next. but i'm not sure i know what to do about this.

At this point writeText only knows to translate unicode characters to their matching glyphs in the font. With Arabic characters there are several choices, but the simplistic selection code just picks the same one, which would create the not-connecting effect.

To overcome this something needs to be changed in hummus to choose the right glyph according to position, or that you can provide the glyph ids directly.
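To make "choose the right glyph according to position" concrete, here is a minimal sketch of classifying each Arabic letter as isolated, initial, medial or final. It is not part of hummus; the function names and the small set of right-joining letters are assumptions made for illustration, and the rules are a simplification of the full Unicode joining data.

```javascript
// Letters that join only to the preceding letter (simplified, assumed list).
const RIGHT_JOINING = new Set([
  '\u0622', '\u0623', '\u0624', '\u0625', '\u0627', '\u0629',
  '\u062F', '\u0630', '\u0631', '\u0632', '\u0648',
]);

// Combining marks such as shadda do not interrupt joining.
const isMark = (ch) => ch >= '\u064B' && ch <= '\u0652';
// Basic Arabic letters, leaving out tatweel (U+0640) to keep the sketch small.
const isLetter = (ch) => ch >= '\u0622' && ch <= '\u064A' && ch !== '\u0640';

// Returns the nearest letter in the given direction, skipping marks,
// or null when something other than a letter is found first.
function neighbour(chars, i, step) {
  let j = i + step;
  while (j >= 0 && j < chars.length && isMark(chars[j])) j += step;
  return j >= 0 && j < chars.length && isLetter(chars[j]) ? chars[j] : null;
}

// Returns one entry per character: a form name for letters, null otherwise.
function joiningForms(text) {
  const chars = Array.from(text);
  return chars.map((ch, i) => {
    if (!isLetter(ch)) return null;
    const prev = neighbour(chars, i, -1);
    const next = neighbour(chars, i, +1);
    const joinsPrev = prev !== null && !RIGHT_JOINING.has(prev);
    const joinsNext = next !== null && !RIGHT_JOINING.has(ch);
    if (joinsPrev) return joinsNext ? 'medial' : 'final';
    return joinsNext ? 'initial' : 'isolated';
  });
}

// The word from this thread: hah, meem, shadda, sad.
const forms = joiningForms('\u062D\u0645\u0651\u0635');
```

A glyph resolver would then use the form name, together with the font's own tables, to pick the glyph; that font lookup is not shown here.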

If you already know the correct glyphs, then i can assist you with knowing which hummus commands to use to place those glyphs. Otherwise this needs some changes in hummus, which i'd be glad if you can add. If not, it'll have to wait till i get to it, and it's a lower priority now. So i'm sorry here.

Gal.

@amrnablus

@galkahana i'm interested to patch this, can you show me where to change? Also is it ok to add a dependency? I think the best way to go here is to create a sort of "glyph resolver" library and make Hummus depend on it to pick the correct glyph.

@galkahana
Owner

@amrnablus That would be lovely!
Actually hummus provides a method for placement of text through glyph IDs, so if you can provide them hummus will take care of the rest.

Here is how it works:

writeText is simply an abstraction over lower level text placement commands. This example shows how to use them (i'll explain below):
https://github.com/galkahana/HummusJS/blob/master/tests/SimpleTextUsageTest.js

This is how to place text using the lower level commands:

        var page = pdfWriter.createPage(0,0,595,842);
        var font = pdfWriter.getFontForFile(__dirname + '/TestMaterials/fonts/BrushScriptStd.otf');
        var fontK = pdfWriter.getFontForFile(__dirname + '/TestMaterials/fonts/KozGoPro-Regular.otf');
        pdfWriter.startPageContentContext(page)
            .BT()
            .k(0,0,0,1)
            .Tf(font,1)
            .Tm(30,0,0,30,78.4252,662.8997)
            .Tj('abcd')
            .ET();
        pdfWriter.writePage(page).end();

Instead of calling writeText on the context, you call a series of lower level commands (which is basically what writeText does for you):

-- BT - to start text object
-- k - set the color using CMYK values. 0,0,0,1 means black
-- Tf - sets the font, use a regular PDFUsedFont object as you would with writeText
-- Tm - sets the size and position of the text. The two 30s in the example scale the text to size 30 (Tf set the base font size to 1), and the position is (78.4252,662.8997)
-- Tj - is the command placing the actual text
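As a small illustration of the Tm arguments described above, the sketch below builds the six numbers for an unrotated text matrix from a size and a position. The helper name textMatrixArgs is hypothetical and not part of hummus; it only assumes that Tf was called with a font size of 1, as in the example.

```javascript
// Hypothetical helper: returns the six Tm arguments [a, b, c, d, e, f]
// for text scaled uniformly to `size` and placed at (x, y), with no rotation.
function textMatrixArgs(size, x, y) {
  return [size, 0, 0, size, x, y];
}

const args = textMatrixArgs(30, 78.4252, 662.8997);
console.log(args.join(','));
```

With a content context at hand, the result could be spread into the call, for example `.Tm(...textMatrixArgs(30, 78.4252, 662.8997))`.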

In regular usage Tj accepts a unicode string as in the example, but you can also pass glyph ids.
This is shown in here:
https://github.com/galkahana/HummusJS/blob/master/tests/SimpleTextUsageTest.js#L79

        var page = pdfWriter.createPage(0,0,595,842);
        var font = pdfWriter.getFontForFile(__dirname + '/TestMaterials/fonts/arial.ttf');
        pdfWriter.startPageContentContext(page)
            .BT()
            .k(0,0,0,1)
            .Tf(font,1)
            .Tm(30,0,0,30,78.4252,662.8997)
            .Tj([[68,97],[69,98],[70,99],[71,100]])
            .ET();
        pdfWriter.writePage(page).end();

Note that Tj now has an array of Arrays:
[[68,97],[69,98],[70,99],[71,100]]

Each item in the array marks a glyph. The first number is the glyph ID. The second number is the unicode value that matches it. There might be a third value when the character is encoded as a surrogate pair, but that applies to characters outside the basic range (some CJK characters, for example), so Arabic is in the clear.
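A minimal sketch of building that array-of-arrays from a string follows. The function toGlyphPairs and the glyphIdFor callback are hypothetical names; a real implementation would look glyph IDs up in the font, whereas the lookup used here (code point minus 29) is only an assumption that happens to reproduce the numbers in the example above.

```javascript
// Builds [[glyphId, unicodeValue], ...] for a string.
// `glyphIdFor` is a caller-supplied lookup from code point to glyph ID.
function toGlyphPairs(text, glyphIdFor) {
  return Array.from(text, (ch) => {
    const codePoint = ch.codePointAt(0);
    return [glyphIdFor(codePoint), codePoint];
  });
}

// Assumed lookup, valid only for reproducing the example numbers in this thread.
const exampleLookup = (codePoint) => codePoint - 29;

const pairs = toGlyphPairs('abcd', exampleLookup);
console.log(JSON.stringify(pairs));
```

The resulting array has the same shape as the one passed to Tj in the example, so a glyph resolver for Arabic would only need to supply a smarter glyphIdFor.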

So, if you can prepare something that can take a font and text and get the right glyph IDs that would be awesome.
Is the information provided clear?

@amrnablus

I'm a bit rusty on unicode, so, just to confirm the problem:

  1. the client developer passes an arabic text to hummus, something like "ابجد"
  2. what hummus receives is the disconnected code point for each letter, "ا ب ج د"
  3. hummus "forwards" this text to the pdfwriter which ends up printing disconnected letters as it doesn't know how to get the proper glyphs

If that's the case, i prefer to fix this on the pdfwriter level by writing a cpp converter which will take the "ا ب ج د" and convert it to "ابجد" with the proper code points, the font should take care of the rest once the proper glyphs are selected.

Can you please confirm

@galkahana
Owner

if you wanna do that in PDFWriter all the better. I can point you to the areas in the code that do the translation in there. will that be good?

@amrnablus

Yeah i'd rather do that. Please point me to the code (and if there are cpp samples).

@hussasad
Author

@galkahana - it would be easier to convert the arabic unicode characters from the default block to the Arabic Presentation Forms-B block. This way I don't need to figure out the glyphs. Granted, this block is only meant for compatibility with older systems, but it works fine for my case. I just used this library to convert the string before calling writeText. I just had to reverse the output string for it to render correctly.
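To illustrate the idea (not the library mentioned above), here is a minimal sketch that maps a handful of Arabic letters to their Presentation Forms-B code points and then reverses the result. The table covers only seven letters, the function names are hypothetical, and combining marks or ligatures such as lam-alef are not handled.

```javascript
// Each entry: [isolated form in Presentation Forms-B, joins to the next letter?]
// Dual-joining letters have four consecutive code points:
// isolated (+0), final (+1), initial (+2), medial (+3).
const FORMS_B = {
  0x0627: [0xFE8D, false], // alef
  0x0628: [0xFE8F, true],  // beh
  0x062C: [0xFE9D, true],  // jeem
  0x062D: [0xFEA1, true],  // hah
  0x062F: [0xFEA9, false], // dal
  0x0635: [0xFEB9, true],  // sad
  0x0645: [0xFEE1, true],  // meem
};

function toPresentationForms(text) {
  const cps = Array.from(text, (ch) => ch.codePointAt(0));
  return cps.map((cp, i) => {
    const entry = FORMS_B[cp];
    if (!entry) return String.fromCodePoint(cp); // unknown characters pass through
    const prev = FORMS_B[cps[i - 1]];
    const next = FORMS_B[cps[i + 1]];
    const joinsPrev = Boolean(prev && prev[1]);  // previous letter connects forward
    const joinsNext = Boolean(next && entry[1]); // this letter connects forward
    const offset = joinsPrev ? (joinsNext ? 3 : 1) : (joinsNext ? 2 : 0);
    return String.fromCodePoint(entry[0] + offset);
  }).join('');
}

// Naive reversal for purely right-to-left text without combining marks.
function reverseForVisualOrder(text) {
  return Array.from(text).reverse().join('');
}

const shaped = toPresentationForms('\u0627\u0628\u062C\u062F'); // "ابجد"
const visual = reverseForVisualOrder(shaped);
```

For mixed-direction text, the bidi reordering shown earlier in the thread would be a safer choice than the plain reversal used here.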

@amrnablus

I'm considering using either fribidi (http://fribidi.org/) or pango (http://www.pango.org/) to do the trick. I'll run some POCs and post my findings here.

@galkahana
Owner

@hussasad - brilliant! Didn't realize that this was an option. i'll keep that in mind. definitely the easier path.

@amrnablus - note that fribidi should give you RTL ordering, not the glyphs (it's what i'm using in hummus as 'bidi'; you can look at the source to see how to use it, because it does solve part of the deal, the RTL ordering).
As for sources in PDFWriter: the method that translates a string to glyphs is PDFUsedFont::TranslateStringToGlyphs. You can see it here:
https://github.com/galkahana/PDF-Writer/blob/master/PDFWriter/PDFUsedFont.cpp#L75
currently it simply uses freetype to translate each character to a glyph code, and this is where one can place a helper to change that.
When this is done all text commands will be affected, in particular WriteText.
Examples for using the C++ code with its matching WriteText you can find in:
https://github.com/galkahana/PDF-Writer/blob/master/PDFWriterTestPlayground/HighLevelContentContext.cpp#L107

Thanks and good luck!
Gal.

@amrnablus

I got this from the fribidi mailing list:

On 15-12-20 03:55 PM, Amr Shahin wrote:

Hello,
Does fribidi provide a functionality to convert a set of arabic codepoints to
their corresponding form-b representation? If so i would appreciate a quick
guide on how to do it.

It does. See fribidi_shape_arabic(). I don't remember the details.

b

Will try to write some POCs converting a regular string into its form-b format. If that works out fine, i'll try to apply the same to hummus.

@galkahana
Owner

that's super cool :)

@amrnablus

@galkahana So i tried to write a simple cpp application that demonstrates the problem, but it's printing gibberish instead of the arabic letters (Not even the disconnected version).
The sample code is in the attachments, would appreciate if you can take a look
P.S: I'm using the latest version of PDF-Writer compiled from source, the command i'm using for my sample is
g++ -std=c++17 -o /tmp/testArabic testArabic.cpp -I PDFWriter Build/PDFWriter/libPDFWriter.a Build/LibJpeg/libLibJpeg.a Build/FreeType/libFreeType.a Build/ZLib/libZlib.a Build/LibTiff/libLibTiff.a
testArabic.cpp.zip

@galkahana
Owner

Hi,
The strings that you should provide to PDFWriter should be unicode encoded in utf-8. It seems like what you are trying to do is to provide plain ascii encoding, which will not work.

Regards,
Gal.

@galkahana
Owner

You can use UnicodeString to help you with this.

@amrnablus

Thanks Gal, it was actually a font issue, i'm using "FreeSerif.otf" now and it's working

@amrnablus

@galkahana does the same issue exist for Hebrew? I tested the POC code for both Arabic and Farsi and it seems to be working fine.

@galkahana
Owner

Cool.
Hebrew should be fine with the basic RTL approach of icu-bidi (or any similar algorithm that you are using). For some letters there are different ending characters (for instance, מ appearing at the end of a word would be ם), but they are simply typed with different keys, so there is no need for automatic replacement.

In terms of your implementation (PR), if i may: if you look into adding this directly to PDFWriter (which i'd love if you could, and thank you a lot), my preference would be that it is used as part of AbstractContentContext::WriteText (right at its beginning). For the sake of providing some manual override it would be super nice if you could:

  1. Use a separate class/method for your implementation to do the translation (so i can use it externally in scenarios that don't go through writeText), utf8->utf8.
  2. Add another options struct to WriteText, which by default will use this translation, but will have a single boolean to check, and if "true" (the non-default) will bypass the translation and go to the current implementation of writeText.

I'm also good with leaving it as something that someone can call before calling writeText, if you deem this better. i can add the WriteText implementation on top of it (just keep it utf8 std::string to utf8 std::string please). I'll also take care of integrating this into HummusJS.

If you'll do that i'll thank you very much, and i'm sure others will as well.

@amrnablus

Thanks, that's exactly what i'm doing, except i didn't really separate the bidi conversion from AbstractContentContext (now that you mention it, it makes much more sense).
Is there a certain place in the codebase where you would place such a utility class, say something like 'UnicodeTextUtils'?

@galkahana
Owner

Something like UnicodeTextUtils sounds great

@amrnablus

Ok so the code is done and working fine. Regarding the commit, should I add fribidi as a "sit-in" dependency, the same way you're using LibJpeg, LibTiff, etc.? If so, should it be a git submodule, or should I just clone the repo and copy it to the sources directory?

@galkahana
Owner

Cool :).
Good question. i vote for a sit-in, like LibJpeg and LibTiff. Copy the relevant sources into a folder, like libtiff etc. Note that they have conditions in the cmake files. If possible, i'd rather that you use something similar for the new addition, in terms of a flag that allows building without the icu bidi functionality. In that case, please also make sure that compilation still works safely.
