New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change text box to return remaining text as UTF-8, improve Win1252 handling internally, raise errors or warnings rather than silently replacing invalid glyphs #793

Closed
wants to merge 7 commits into
base: master
from

Conversation

Projects
None yet
4 participants
@practicingruby
Member

practicingruby commented Oct 19, 2014

Currently if using an AFM-font,Prawn's text box will return overflow text in Win1252 encoding, even though the user must provide their text in UTF-8 format. To render this overflowed text, the user needs to pass :skip_encoding, which is a fairly awkward workflow.

In an ideal setting, we should treat any transcoding to Win1252 (which isn't even supported by Ruby, so we implement it ourselves) as an implementation detail, and never allow the transcoded text to cross the public API boundary. In other words, Prawn should be UTF-8 in, UTF-8 out to keep things simple for users.

See related discussion on #777, and on #779.

@practicingruby

This comment has been minimized.

Member

practicingruby commented Oct 19, 2014

This patch almost works... we see one test failure and one manual example failure currently. I'm a little nervous about merging this even if we can fix the remaining failure, because it constitutes an API change to the 1.0 stable API.

That said, we can preserve nearly perfect backwards compatibility in practice by making :skip_encoding a no-op. This would make it so that old code which passed remaining text back into text_box would still work correctly. The only possible breakage would be if someone was actually doing some of their own processing on the win1252 text to convert it back into UTF-8, but that seems unlikely.

@@ -101,6 +101,10 @@ def normalize_encoding(text)
"Arguments to text methods must be UTF-8 encoded"
end

def to_utf8(text)
text.bytes.pack("U*")

This comment has been minimized.

@practicingruby

practicingruby Oct 19, 2014

Member

TODO: This may not actually be a correct way of converting back into UTF-8. Needs closer investigation.

@practicingruby

This comment has been minimized.

Member

practicingruby commented Oct 20, 2014

In theory, this should be working now! The only thing I haven't completely dealt with yet is making :skip_encoding a no-op, or removing it entirely.

@airblade: Can you please try this code in your original use case and see if it solves the problem for you? It implements UTF-8 in, UTF-8 out, so you shouldn't get any encoding surprises anymore.

@practicingruby

This comment has been minimized.

Member

practicingruby commented Oct 20, 2014

@packetmonkey: This is another "I'm worried it's going to break stuff" pull request. Are you using built-in fonts in any of your non-trivial Prawn code? If so, please take this for a spin.

@practicingruby

This comment has been minimized.

Member

practicingruby commented Oct 20, 2014

Note to all... I am not sure why we were using our own Win1252 implementation... perhaps it wasn't supported in Ruby 1.8 back when we still needed to support 1.8? It's possible to use Ruby's M17N system now for this purpose, so this patch switches to using Ruby's implementation rather than our own under the hood.

@packetmonkey

This comment has been minimized.

Contributor

packetmonkey commented Oct 21, 2014

We use both the built in fonts and additional fonts. I have a lot of mock data with letter accents. All of my text gets added via a call to #formatted_text_box.

I'll give it a look over and see if I trip into anything.

@airblade

This comment has been minimized.

Contributor

airblade commented Oct 21, 2014

@sandal This fixes the problem for me. Thank you!

@practicingruby practicingruby changed the title from Change text box to return remaining text as UTF-8 to Change text box to return remaining text as UTF-8, improve Win1252 handling internally Oct 21, 2014

@practicingruby

This comment has been minimized.

Member

practicingruby commented Oct 21, 2014

a68eef6 contains a scary but probably good change: It raises an error if non-Windows-1252 glyphs are attempted to be rendered in AFM fonts. The behavior of existing versions of Prawn is to replace with an underscore, which is pretty much never what users will want, and masks bugs.

This latest also changes or gets rid of tests that actually only worked because they were exploiting this bad behavior.

@practicingruby practicingruby changed the title from Change text box to return remaining text as UTF-8, improve Win1252 handling internally to Change text box to return remaining text as UTF-8, improve Win1252 handling internally, raise errors or warnings whenever we might be rendering garbage text Oct 21, 2014

@practicingruby practicingruby changed the title from Change text box to return remaining text as UTF-8, improve Win1252 handling internally, raise errors or warnings whenever we might be rendering garbage text to Change text box to return remaining text as UTF-8, improve Win1252 handling internally, raise errors or warnings rather than silently replacing invalid glyphs Oct 21, 2014

@practicingruby

This comment has been minimized.

Member

practicingruby commented Oct 30, 2014

@packetmonkey: I'm going to merge this, but if anything breaks open up issues and we'll get them sorted out.

practicingruby added a commit that referenced this pull request Oct 30, 2014

Vastly improve Prawn's AFM font encoding handling
A squashed merge of #793, containing the following improvements:

* Text for all Prawn methods is now UTF-8-in, UTF-8-out, so the user
does not need to handle Windows-1252 strings.

* Internally, we're now using Ruby's M17n system to handle the encoding
into Windows-1252, so text.encoding will come back as Windows-1252
when `AFM#normalize_encoding` is called, rather than `ASCII-8Bit`

* When using AFM fonts + ASCII only text, no warning will be seen.

* When using AFM fonts + non-ASCII characters that are supported in
 Windows-1252, users will see a warning about the limited
internationalization support, along with a recommendation to use a TTF
 font instead.

* The warning includes instructions on how to disable it (just set
`Prawn::Font::AFM.hide_m17_warning = true`)

* When using AFM fonts + non-ASCII characters that are NOT supported in
* WIndows-1252, an exception will be raised rather than replacing w.
 `_`.

* None of the above will apply to anyone using TTF fonts with sane UTF-8
support, everything should "just work" for those folks.
@practicingruby

This comment has been minimized.

Member

practicingruby commented Oct 30, 2014

Merged in adaf18c!

@airblade

This comment has been minimized.

Contributor

airblade commented Oct 30, 2014

@sandal Great news!

@isaiah

This comment has been minimized.

isaiah commented Oct 22, 2015

a68eef6 contains a scary but probably good change: It raises an error if non-Windows-1252 glyphs are attempted to be rendered in AFM fonts. The behavior of existing versions of Prawn is to replace with an underscore, which is pretty much never what users will want, and masks bugs.

@practicingruby I think this change makes the fallback_fonts ignored.
I have a document that contains Cyrillic, CJK, Arabic, Greek or anything, so I cannot really tell the document which font to use, previously I depends on the fallback_fonts feature, the UTF-8 glyphs are rendered correctly, but it's broken since 2.0.0

@practicingruby

This comment has been minimized.

Member

practicingruby commented Oct 22, 2015

@isaiah Do not mix AFM fonts with fallback fonts... use TTF fonts throughout and you'll be fine!

@isaiah

This comment has been minimized.

isaiah commented Oct 22, 2015

@practicingruby How can I do that? Currently I monkey patched the Prawn::Document#font_families method and #fallback_fonts method, to add my own TTF fonts. Should I disable the original fonts completely?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment