RTL flips non-RTL characters/symbols #514

Closed
simplej opened this Issue Jul 11, 2013 · 13 comments

@simplej

RTL mode flips A-Z, 0-9, and characters like "(" and ")", which makes it harder than it should be to use with strings loaded from a file (or from user input).

@practicingruby
prawnpdf member

Sorry that it took us so long to respond. I can see this might be a problem, but I'm unsure how it's solved in other applications, or what the usual use cases are like. The issue actually seems to be that Prawn does NOT flip these characters, but instead renders them right-to-left character by character, so something like "(foo)" becomes ")oof(".
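To make the behaviour described above concrete, here is a minimal Ruby sketch. `naive_rtl` mimics plain character-by-character reversal. `crude_rtl` and `MIRROR` are hypothetical names for a rough workaround that keeps ASCII letter and digit runs intact and swaps paired brackets. Neither is Prawn's code, and the workaround is not the Unicode Bidirectional Algorithm.

```ruby
# Character-by-character reversal: the effect described above.
def naive_rtl(s)
  s.reverse
end

# Hypothetical table of bracket pairs to swap after reversal.
MIRROR = { "(" => ")", ")" => "(", "[" => "]", "]" => "[" }.freeze

# Crude workaround (an assumption for illustration only):
# reverse everything, restore ASCII letter/digit runs, swap brackets.
def crude_rtl(s)
  s.reverse
   .gsub(/[A-Za-z0-9]+/) { |run| run.reverse }
   .gsub(/[()\[\]]/) { |ch| MIRROR[ch] }
end

puts naive_rtl("(foo)") # prints ")oof("
puts crude_rtl("(foo)") # prints "(foo)"
```

A real fix would need to handle mixed runs, numbers, and neutral characters according to a spec, which is why a well-defined convention matters here.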

I don't know that there is a smart way for us to detect characters that would need to be flipped, and I'd hesitate to do so unless there was a well-defined spec or a common convention used elsewhere. There may well be one, but this is not an area I'm very familiar with.

Does anyone have insights on this? I will leave this open for a few weeks in the hopes that some extra research gets added to the ticket.

@elad

Hello,

Prawn is the only PDF writer that can deal with RTL, so I'll take a look. Please keep this issue open, but be patient since I'm not familiar with Ruby.

I'm not familiar with the standards but I did find a Unicode Bidirectional Algorithm and an implementation in Ruby. I will see if I can use it to make Prawn generate properly formatted RTL text.
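The Unicode Bidirectional Algorithm works from per-character direction classes. As a much simplified illustration of the detection step only, the sketch below checks whether a string contains any Hebrew or Arabic script characters using Ruby's built-in `\p{...}` regex properties. `RTL_CHARS` and `contains_rtl?` are hypothetical names, not part of Prawn or of the BiDi library, and script membership is only an approximation of the real bidi classes.

```ruby
# Hypothetical helper: matches any Hebrew or Arabic script character.
# This approximates "strong RTL" characters; the real algorithm uses
# the Unicode bidi class of every character.
RTL_CHARS = /[\p{Hebrew}\p{Arabic}]/

def contains_rtl?(string)
  string.match?(RTL_CHARS)
end

puts contains_rtl?("This is an English-only sentence.") # prints "false"
puts contains_rtl?("עברית ו-English")                    # prints "true"
puts contains_rtl?("12345 (foo)")                        # prints "false"
```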

I also found cscott/node-icu-bidi. Because my project is primarily JavaScript and I only call Ruby to generate the PDF, I'll try using that before sending the text to Ruby and see if that might be a workaround... (I assume that if the first option fails, another option would be to port that library to Ruby.)

@practicingruby
prawnpdf member

Thanks @elad, what you've linked here is already helpful. Strong internationalization support is one of the things we care a lot about, so even if you don't implement this, the research will help me or someone else write a patch.

@elad

Sure! It looks like the Ruby bidi code might be useful because all it takes is a string and paragraph direction, both of which are available to us in Prawn. I'll try to test it out as soon as I get a chance. :)

@elad

It looks like the Ruby BiDi library I linked to above helps solve this problem. I found out that if you use its to_visual function and then reverse the text, you get properly rendered RTL text.

Here's my code:

require "prawn"
require "./bidi"

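def render_text(s)
  # Reorder with the BiDi library, then reverse, because Prawn's RTL
  # mode lays characters out right to left one by one.
  Bidi.new.to_visual(s).reverse
end

Prawn::Document.generate("hello.pdf") do
  font_families.update("Arial" => {
    :normal => "Arial.ttf"
  })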
  font("Arial")

  self.text_direction = :rtl

  english_and_hebrew = "משפט עם עברית ו-English. מספרים: 12345 (וגם כל מיני סימני פיסוק) וגם סימן קריאה!\n"
  english_only = "This is an English-only sentence."
  hebrew_only = "זהו משפט בעברית בלבד."

  text render_text(english_and_hebrew)
  text render_text(english_only)
  text render_text(hebrew_only)
end

Note that GitHub doesn't render the sentence correctly, so here's an image of what it should look like:

[Image: screenshot of the sentence as it should be rendered]

(it says "A sentence with Hebrew and English. Numbers: 12345 (and punctuation) and an exclamation mark!")

Here's a picture of the PDF the code above generated:

[Image: screenshot of the generated PDF]

As you can see, it looks exactly the way it should. It also doesn't affect strings that have no RTL characters in them, so there's no need to pass the direction.

@elad

Following up... I guess the question is how should the integration be done. There are two obvious options:

  • A documentation fix telling people who need BiDi support where and how to get it
  • Bundle the BiDi code with Prawn (it would add <1.5 MB, of which 1.3 MB is Unicode data) and expose a render_text-style function

Thoughts?

@practicingruby
prawnpdf member

@elad I think we can combine both approaches. We should create a prawn-bidi gem that depends on Ruby BiDi (either bundled or as a gem dependency), and then add a note in Prawn's documentation pointing those who need bidirectional support to it. Our "no third party dependencies" rule only applies to the prawn gem; it's OK for extensions to have dependencies.

I almost wonder if we should look towards putting most or all RTL-text related functionality into its own gem, so that people can continue to improve it independently of Prawn's release cycle. But that could be a problem to look into later.

@elad

I guess the first step should be to publish Ruby BiDi as a gem, because right now it's only available as a zip from SourceForge... I sent an email to the author and I hope we can get this done soon. I'll also expose a function that reverses the string so developers won't have to wrap it. Once that's done a temporary solution could be a simple documentation note, for example:

If you plan on writing right-to-left PDFs, consider using ruby-bidi and wrapping strings you pass to Prawn's text rendering functions with render_string.

Since I'm not a regular contributor and I'm not familiar with the RTL-related code (amount, complexity, maintainers, rate of change, etc.) I don't think my input will be useful to a decision on whether or not it should be factored out as prawn-bidi.

I'll follow up once the gem is up. I'll also be more than happy to write any documentation notes on how to use it and provide examples.

@practicingruby
prawnpdf member

@elad: Sounds like you have the first steps sorted out. Work on those and then we can figure out the next steps from there.

@elad

I have put the Ruby BiDi code in elad/ruby-bidi and made it into a gem that can be installed with gem install bidi. There are still a few bugs, but those are related to the packaging, not the code, and are probably just due to my complete lack of experience writing Ruby. ;)

After installing it, I can create a file (say, hello.rb) in an empty directory and put this code in it:

require "prawn"
require "bidi"

Prawn::Document.generate("hello.pdf") do
  font_families.update("Arial" => {
    :normal => "Arial.ttf"
  })
  font("Arial")

  self.text_direction = :rtl

  bidi = Bidi.new
  text bidi.render_visual "משפט עם עברית ו-English. מספרים: 12345 (וגם כל מיני סימני פיסוק) וגם סימן קריאה!\n"
end

Throw in the Unicode Arial font, run ruby hello.rb and I get hello.pdf with properly rendered Hebrew and English text.

Should I write some documentation blurb about this for Prawn?

@practicingruby
prawnpdf member

Sure, a documentation patch would be welcome, though the best bet is to put this code snippet in your repository's README (or some other page) and then just link to it from Prawn, so that it doesn't get out of date on our end.

To make a prawn-bidi gem would be even better. It'd do something like automatically call bidi.render_visual on input text whenever :direction => :rtl was set. It's something you can work on if you'd like, but what you have already is a good starting point.
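A minimal, self-contained sketch of what that automatic hook could look like. Everything here is hypothetical: `AutoBidi`, `FakeDocument`, and the `reorderer` attribute are stand-ins invented for illustration, and the lambda only marks strings instead of calling the real bidi.render_visual. The one technique being shown is `Module#prepend`, which lets an extension wrap an existing `text` method.

```ruby
# Hypothetical extension module: wraps #text so that RTL output is
# reordered before the original method draws it.
module AutoBidi
  def text(string, options = {})
    direction = options.fetch(:direction, text_direction)
    string = reorderer.call(string) if direction == :rtl
    super(string, options)
  end
end

# Stand-in for a PDF document; it only records what would be drawn.
class FakeDocument
  attr_accessor :text_direction, :reorderer
  attr_reader :drawn

  def initialize(reorderer)
    @text_direction = :ltr
    @reorderer = reorderer
    @drawn = []
  end

  def text(string, options = {})
    @drawn << string
  end

  prepend AutoBidi
end

# Stand-in for bidi.render_visual: it just tags the string.
fake_bidi = ->(s) { "[bidi]#{s}" }

doc = FakeDocument.new(fake_bidi)
doc.text("plain")                         # document is LTR: untouched
doc.text_direction = :rtl
doc.text("שלום")                          # document is RTL: reordered
doc.text("forced ltr", direction: :ltr)   # per-call override: untouched
p doc.drawn
```

A per-call `:direction` option takes precedence over the document-wide setting in this sketch; whether a real prawn-bidi gem should behave that way is an open design choice.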

@elad

> Sure, a documentation patch would be welcome, though the best bet is to put this code snippet in your repository's README (or some other page) and then just link to it from Prawn, so that it doesn't get out of date on our end.

Done.

I'll submit a pull request with a patch for manual/text/right_to_left_text.rb. A very quick glance suggests the text for the manual is constructed from the comments in that file. If that's not the case, please point me in the right direction.

> To make a prawn-bidi gem would be even better. It'd do something like automatically call bidi.render_visual on input text whenever :direction => :rtl was set. It's something you can work on if you'd like, but what you have already is a good starting point.

That's a great idea, but I don't think I'm the best person to write an official gem for Prawn (see the bugs referenced earlier :). I'll help with anything I can if pinged, though.

@practicingruby
prawnpdf member

@simplej: Please give @elad's solution a try. Closing for now, but will re-open if we find this does not solve the problem for most cases.