Skip to content

Commit

Permalink
Change TrueType subsetter to avoid generating characters that need to…
Browse files Browse the repository at this point in the history
… be escaped

If text is output into a content stream, it needs to be serialized
according to the PDF spec rules. For serializing strings HexaPDF has
chosen the literal string syntax because the hexadecimal string syntax
would need twice as many characters.

For literal strings there are four characters -- \r, (, ), \ -- that
need to be escaped. If any of theses characters appears in the string,
String#gsub! has to actually do some work modifying the string.

By making sure that the encoded text using a TrueType font doesn't
contain those characters we avoid this work.

This change saves about 31% of object allocations and about 5% runtime
on the raw_text 10x TrueType benchmark.
  • Loading branch information
gettalong committed Jan 16, 2021
1 parent b461323 commit cdb8723
Showing 1 changed file with 12 additions and 3 deletions.
15 changes: 12 additions & 3 deletions lib/hexapdf/font/true_type/subsetter.rb
Original file line number Diff line number Diff line change
Expand Up @@ -63,6 +63,12 @@ def initialize(font)
def use_glyph(glyph_id)
return @glyph_map[glyph_id] if @glyph_map.key?(glyph_id)
@last_id += 1
# Handle codes for ASCII characters \r, (, ) and \ specially so that they never appear in
# the output (PDF serialization would need to escape them)
if @last_id == 13 || @last_id == 40 || @last_id == 41 || @last_id == 92

This comment has been minimized.

Copy link
@mwlang

mwlang Jan 22, 2021

I'm wondering if there's a bug here. What happens if @last_id is 40 here? My take is that you'll enter this conditional block, update the @last_id to 41 and use that (which is also invalid!).

Perhaps, this instead?

        def use_glyph(glyph_id)
          return @glyph_map[glyph_id] if @glyph_map.key?(glyph_id)	          return @glyph_map[glyph_id] if @glyph_map.key?(glyph_id)
          @last_id += 1 
          # Handle codes for ASCII characters \r, (, ) and \ specially so that they never appear in
          # the output (PDF serialization would need to escape them)
          @last_id += 1 while [13, 40, 41, 92].include?(@last_id)
          @glyph_map[:"s#{@last_id}"] = @last_id
        end

This comment has been minimized.

Copy link
@gettalong

gettalong Jan 22, 2021

Author Owner

You are right, there is a bug. Thanks for making me aware!

This comment has been minimized.

Copy link
@gettalong

gettalong Jan 22, 2021

Author Owner

@mwlang I just released 0.14.2 with the fix! Thanks again!

@glyph_map[:"s#{@last_id}"] = @last_id
@last_id += 1
end
@glyph_map[glyph_id] = @last_id
end

Expand Down Expand Up @@ -107,7 +113,7 @@ def build_glyf_table
locations = []

@glyph_map.each_key do |old_gid|
glyph = orig_glyf[old_gid]
glyph = orig_glyf[old_gid.kind_of?(Symbol) ? 0 : old_gid]
locations << table.size
data = glyph.raw_data
if glyph.compound?
Expand All @@ -134,7 +140,7 @@ def build_hmtx_table
hmtx = @font[:hmtx]
data = ''.b
@glyph_map.each_key do |old_gid|
metric = hmtx[old_gid]
metric = hmtx[old_gid.kind_of?(Symbol) ? 0 : old_gid]
data << [metric.advance_width, metric.left_side_bearing].pack('n2')
end
data
Expand Down Expand Up @@ -166,7 +172,10 @@ def build_maxp_table(nr_of_glyphs)
# Adds the components of compound glyphs to the subset.
def add_glyph_components
glyf = @font[:glyf]
@glyph_map.keys.each {|gid| glyf[gid].components&.each {|cgid| use_glyph(cgid) } }
@glyph_map.keys.each do |gid|
next if gid.kind_of?(Symbol)
glyf[gid].components&.each {|cgid| use_glyph(cgid) }
end
end

end
Expand Down

0 comments on commit cdb8723

Please sign in to comment.