Encoding::CompatibilityError with shrink_to_fit #603

Open
stayhero opened this Issue Dec 17, 2013 · 13 comments

@stayhero

Today the following exception was raised when my application tried to create a PDF:

Encoding::CompatibilityError
incompatible encoding regexp match (UTF-8 regexp with ASCII-8BIT string)

in line_wrap.rb, method remember_this_fragment_for_backward_looking_ops:

This line raised the exception:

@previous_fragment_ended_with_breakable = pf =~ /[#{break_chars}]$/

The encoding of break_chars was UTF-8, while the encoding of pf was ASCII-8BIT. Unfortunately I have not yet been able to find a proper fix. It seems to me this only happens when the text to be printed is UTF-8 (in this example Czech) and contains some special characters, and when using the DejaVu font (but it probably happens with other fonts too).
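The mismatch can be reproduced in plain Ruby without Prawn. This is only a sketch: the variable names mirror the ones in the report, but the contents of break_chars are an assumption (a soft hyphen stands in for whatever non-ASCII break character makes the regexp UTF-8), and try_match is a hypothetical helper.

```ruby
# Hypothetical stand-ins for the two values named in the report.
break_chars = " \t\u00AD-"   # UTF-8; the soft hyphen is an assumed non-ASCII member
pf = "lení".b                # same bytes, but tagged ASCII-8BIT (String#b needs Ruby 2.0+)

# Same shape as the line in line_wrap.rb. Because of the non-ASCII
# character the regexp gets a fixed UTF-8 encoding.
pattern = /[#{break_chars}]$/

# Returns the match index (or nil), or the exception class if matching raised.
def try_match(pattern, string)
  string =~ pattern
rescue Encoding::CompatibilityError => e
  e.class
end

result_high_bytes = try_match(pattern, pf)        # binary string with bytes >= 0x80
result_ascii_only = try_match(pattern, "len-".b)  # binary string, 7-bit content only

puts result_high_bytes.inspect
puts result_ascii_only.inspect
```

Ruby only raises when the ASCII-8BIT string actually contains bytes outside the 7-bit range. A binary string that is pure ASCII matches fine, which would be consistent with the error showing up only for certain line breaks.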

I forked the repo and created a branch with a failing test case ( spec/utf8_shrink_to_fit.rb): https://github.com/stayhero/prawn/tree/utf8_shrink_to_fit_bug

Or you can simply copy & paste it from here:

# encoding: utf-8
require File.join(File.expand_path(File.dirname(__FILE__)), "spec_helper")

describe "#shrink_to_fit with special utf-8 text" do
  it "Should not throw an exception" do
    pages = 0
    doc = Prawn::Document.new(page_size: 'A4', margin: [2, 2, 2, 2]) do |pdf|
      add_unicode_fonts(pdf)
      pdf.bounding_box([1, 1], :width => 90, :height => 50) do
        broken_text = " Sample Text\nSAMPLE SAMPLE SAMPLEoddělení ZMĚN\nSAMPLE"
        pdf.text broken_text, :overflow => :shrink_to_fit
      end
    end
  end
end


def add_unicode_fonts(pdf)
  dejavu = "#{::Prawn::BASEDIR}/data/fonts/DejaVuSans.ttf"
  pdf.font_families.update("dejavu" => {
    :normal      => dejavu,
    :italic      => dejavu,
    :bold        => dejavu,
    :bold_italic => dejavu
  })
  pdf.fallback_fonts = ["dejavu"]
end

A quick fix would be to simply check for incompatible encodings before doing the regex match. But I guess it would be more appropriate to prevent @previous_fragment from being ASCII-8BIT in the first place.
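A minimal sketch of what such a guard could look like, assuming it is acceptable to treat an incompatible fragment as "not breakable". The method name ends_with_breakable? and that fallback are hypothetical and not taken from Prawn; Encoding.compatible? is the standard Ruby call.

```ruby
# Hypothetical guard; the method name and the "treat as not breakable"
# fallback are assumptions, not Prawn's code.
def ends_with_breakable?(fragment, break_chars)
  # Encoding.compatible? returns nil when the two strings cannot be combined,
  # which is the same situation in which the regexp match would raise.
  return false unless Encoding.compatible?(fragment, break_chars)

  !(fragment =~ /[#{break_chars}]$/).nil?
end

break_chars = " \t\u00AD-"  # assumed contents, see the note above

puts ends_with_breakable?("len-", break_chars)     # UTF-8 fragment ending in a hyphen
puts ends_with_breakable?("lení".b, break_chars)   # ASCII-8BIT fragment with high bytes
```

This only hides the symptom: a fragment that really ends in a break character but arrives as ASCII-8BIT with high bytes would be reported as not breakable.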

Note that this does not happen when DejaVu is not specified as the fallback font, and even then not every time: only when the bounding box has such a width that Prawn tries to break the line at a specific point.

@practicingruby
Member

Hi, thanks for the detailed bug report and tests. It looks like you're running into this issue when you use one of the built-in fonts. Have you checked if this error happens when you explicitly use a TTF font for the default font as well as the fallback?

I'll verify that myself when I get around to looking at this issue, but I thought I'd mention it as it might be a workaround.

@stayhero

Hi.

It does work with the built-in font if it is explicitly specified like this:

# encoding: utf-8
require File.join(File.expand_path(File.dirname(__FILE__)), "spec_helper")

describe "#shrink_to_fit with special utf-8 text" do
  it "Should not throw an exception" do
    doc = Prawn::Document.new(page_size: 'A4', margin: [2, 2, 2, 2]) do |pdf|
      add_unicode_fonts(pdf)
      pdf.bounding_box([90, 800], :width => 90, :height => 50) do
        pdf.stroke_bounds
        broken_text = " Sample Text\nSAMPLE SAMPLE SAMPLEoddělení ZMĚN\nSAMPLE"
        pdf.font 'dejavu'
        pdf.text broken_text, :overflow => :shrink_to_fit
      end
    end
    file = doc.render
    File.write("/tmp/test.pdf", file)
  end
end


def add_unicode_fonts(pdf)
  dejavu = "#{::Prawn::BASEDIR}/data/fonts/DejaVuSans.ttf"
  pdf.font_families.update("dejavu" => {
    :normal      => dejavu,
    :italic      => dejavu,
    :bold        => dejavu,
    :bold_italic => dejavu
  })
  pdf.fallback_fonts = ["dejavu"]
end

If I keep relying on automatic fallback fonts using the not-built-in font DejaVu, it crashes the same way. So yes, explicitly using the DejaVu font works.

Don't worry about a quick fix but thanks for the hint. ;-) My current workaround was simply catching the exception and using pdf.text with a smaller font and without the shrink_to_fit option. :) I've been using Prawn (master branch) in production for a year or so and it has never had any problems, and this problem seems to be a very rare one.
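Roughly, that workaround could be sketched as below. The helper name text_with_shrink_fallback and the default fallback size are made up for illustration; pdf is assumed to be a Prawn::Document (or anything that responds to text the same way).

```ruby
# Hypothetical wrapper around the workaround described above.
# `pdf` is expected to respond to #text like Prawn::Document does;
# the default `fallback_size` is an arbitrary assumed value.
def text_with_shrink_fallback(pdf, string, fallback_size = 6)
  pdf.text string, :overflow => :shrink_to_fit
rescue Encoding::CompatibilityError
  # Retry without shrink_to_fit, at a fixed smaller size instead.
  pdf.text string, :size => fallback_size
end
```

Inside a bounding box it would be called as text_with_shrink_fallback(pdf, broken_text) in place of the pdf.text call. The text may still overflow the box if the fixed size is too large, so this is a stopgap rather than a fix.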

BTW: Thanks for creating Prawn. It's a very useful piece of software. ;-) I would love to help fix the bug, but I guess for you (or someone familiar with the code) it's probably easier to find the problem than if I tried to fix it by myself. Let me know if you think you could fix it relatively easily. If not, I would try to take some time and understand your code. ;-)

@practicingruby
Member

I've been away from the project for a long time so my research lead time is very slow. If you want to put some effort into investigating the issue, please do! Even getting a little bit farther with it will help me resolve it faster once I get around to looking at it.

@practicingruby
Member

Hi, your test does not fail on master for me right now, so I'm not sure how to investigate further. Can you double check your code to be sure it's failing on current master?

@practicingruby
Member

Closing but will re-open once you confirm the failure.

@stayhero

Yes, I'm sure. :-) I forked your repo just yesterday. Did you copy & paste the test from above? Make sure you copy the first pasted version, as the second one is the "fixed" version. Or check out the branch from

https://github.com/stayhero/prawn/tree/utf8_shrink_to_fit_bug

I just rebased with current master but there were no changes. Maybe I did not correctly add the test to rspec (I'm not familiar with it)... to see the error you need to run

rspec spec/utf8_shrink_to_fit.rb

I don't know why the test is ignored when simply calling "rspec" :)

I ran it with MRI 1.9.3-p194 and 2.0.0-p247.

Today I investigated it a bit more and found out that the error occurs when remember_this_fragment_for_backward_looking_ops is called while @previous_fragment == "lení". The char í is recognized as ASCII format (which I guess is correct), but somehow break_chars stays encoded in UTF-8, which is why the regex match fails.

@practicingruby
Member

Thanks, trying the code from your fork did the trick for me, maybe I just messed something up before.

I have not merged the test yet because I want to revise it a little bit, but the bug is confirmed! Feel free to keep researching a fix, otherwise anyone else is welcome to work on this one if I don't get to it first.

@bpinto
bpinto commented Feb 12, 2014

> I don't know why the test is ignored when simply calling "rspec" :)

@stayhero You have to append _spec to the test filename so that rspec picks it up by default.

@stayhero

@bpinto Thanks. :-)

@practicingruby added this to the 1.0 Wishlist milestone Feb 24, 2014

@johnnyshields
Contributor

+1 for this. Using Prawn in a UTF-8 char environment and this issue is causing random breakage in production. Thank you all!

@johnnyshields added a commit to johnnyshields/prawn that referenced this issue Apr 27, 2014: Commit failing test case for issue #603, based on stayhero/utf8_shrink_to_fit_bug (103b75b)
@johnnyshields added a commit to johnnyshields/prawn that referenced this issue Apr 27, 2014: commit workaround for failing #603 test case (41d1561)

@johnnyshields
Contributor

committed workaround here: #714

@practicingruby
Member

This is probably not its own distinct bug, but a result of #729 and #779. The (unfortunate) reality at present is that if you want to mix AFM (PDF built-in) fonts and TTF fonts in the same Prawn document, you can expect bugs and bad behaviors.

The solution is to use TTF throughout your document whenever you need internationalization support. This is going to most likely continue to be the recommendation even after we clarify behaviors, but in the future you'll get better warnings / errors to detect this sort of problem.

@practicingruby
Member

I've confirmed that not even #793 seems to fix this issue, so reopening.
