color emojis #223

Closed
jacques-quidu opened this issue Aug 3, 2023 · 16 comments

@jacques-quidu

Color emojis don't seem to be supported: when i use emojis with the "Segoe UI Emoji" font, for instance, they are rendered as monochrome emojis in the PDF.
I guess i can work around this by extracting a color bitmap image from the emoji character with FreeType (i think you can also extract an SVG image for some fonts), but i would like to know whether you intend to add support for color emojis in PDF-Writer. Or is there an option i missed in the PDF text methods for enabling color emojis?

@galkahana
Owner

No such intention at the moment. fonts are assumed to be monochrome, with the color determined by the graphic context.
frankly, until this request i wasn't aware that such things are represented by plain fonts.
i wonder what a PDF containing those things looks like, i.e. how this is represented in PDF, where a font is just glyph drawings and color is determined externally.

@galkahana
Owner

galkahana commented Aug 3, 2023

reading about exactly this, i see that what you suggest, i.e. drawing the glyphs as images or line art, is common practice in such cases, given that the regular font support in PDF doesn't allow for color. i wonder how this works with copying and pasting the text, then: understanding text as text in PDF pretty much relies on the text being drawn with fonts.

do you have an example of a PDF that shows the emojis correctly? maybe from indesign or some other app that does it properly?

@jacques-quidu
Author

i have attached 2 PDF files: one printed with Microsoft Print to PDF and one generated using Hummus PDF-Writer (with my fork more precisely).
You can see that in the printed PDF, emojis are rendered with color shapes while with PDF-Writer they are rendered with monochrome shapes.
testPrint.pdf
testExportPDFWriter.pdf

@jacques-quidu
Author

you can also reproduce this by printing to PDF, from Microsoft Word for instance, some text that uses the Segoe UI Emoji font and emojis.

@jacques-quidu
Author

jacques-quidu commented Aug 5, 2023

the font file is bigger in the printed PDF, i guess because of the colored emoji glyphs: they are not vectorized in the page content but stored within the font file.

@jacques-quidu
Author

by the way, by tracing the FreeType font face for Segoe UI Emoji i found that it contains 3 cmaps, so i tried switching to the other cmap that has the same platform id as the one selected by default, but it did not fix anything. Actually the 2 cmaps return the same glyph index for a given unicode character code, so i guess fixing this issue is more complicated than switching the cmap with FreeType...
But as i suggested, i guess i could work around it by extracting a color bitmap from the emoji glyph using FT_Load_Glyph and drawing the bitmap instead of the character itself.

@galkahana
Owner

it makes sense that switching cmaps didn't work. fonts in pdf (unless something changed recently) are inherently monochrome, meaning color info is expected to come from outside and all the PDF glyph drawing operators are just about creating paths. So there has to be some alternative implementation, which might be exactly what you are trying to do. it'd be interesting to see what word did (or indesign, if it supports those... this used to be my go-to). one can use the pdfhummus recrypt method to create a version of a pdf with decrypted content streams. if i get to it, i'll check and let you know what they did.

@galkahana
Owner

galkahana commented Aug 6, 2023

ok, so actually it's not as bad as i thought.
here's the decrypted pdf:
testPrintDecrypt.pdf

if you read the text placement code in the page content stream you'll see that each emoji character is drawn (starting at line 77) by superimposing 4 chars, each with different color values.
for example, the smiling emoji is basically drawn by placing the following glyphs one on top of the other, each time with a different RGB color value (i'm translating the source 0..1 range to 0..255 so it's easier to understand):

0451 with (1.000000 0.690196 0.180392) = [255, 176, 46] (the yellow part)
0452 with (0.972549 0.192157 0.184314) = [248, 49, 47] (the red part)
0453 with (0.250980 0.164706 0.196078) = [64, 42, 50] (the sort of dark area)
0464 with (0.250980 0.164706 0.196078) = [64, 42, 50] (the sort of dark area)
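
The 0..1 to 0..255 translation used above is just a scale by 255 followed by rounding. A quick check in Python, using only the operands quoted above (the function name `to_rgb255` is made up for illustration):

```python
def to_rgb255(components):
    """Scale PDF-style colour components in the 0..1 range to 0..255 integers."""
    return tuple(round(c * 255) for c in components)


# Colour operands quoted above for the smiling emoji layers.
layer_colours = {
    "0451": (1.000000, 0.690196, 0.180392),
    "0452": (0.972549, 0.192157, 0.184314),
    "0453": (0.250980, 0.164706, 0.196078),
}

for glyph, rgb in layer_colours.items():
    print(glyph, to_rgb255(rgb))
```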

So, a couple of conclusions from this:

  1. it's not a direct translation to pdf. you can also see that copying and pasting the text does NOT recreate it (you will see some bs chars instead of the emojis). so there's no direct support in PDF for it (at least as far as we can tell from this file).
  2. i don't believe word has something internal to come up with the colors and which chars to use, so i think this might be internal to the true type font (Segoe UI on my pc at least is true type, which i guess is also the case on your computer). oh... looks like the wikipedia page for open type emojis explains this well. there's a COLR table for that. (there are also other options, cause of course there are, where some fonts implement emojis as svgs or rasters with another table describing them... all the fun).
  3. you don't have to draw bitmaps. rather, figure out the chars and colors... and you can just place them with regular text operators, using the relevant glyph ids (this is where the GlyphUnicodeMapping input to TJ and its other text operator friends might come in handy), or char ids if they have any.
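
To make point 3 concrete, here is a rough Python sketch that emits plain PDF text operators for one layered emoji. It is not PDF-Writer code: the function name, the font resource name `F1`, the coordinates, and the assumption that the font uses a 2-byte encoding where the code equals the glyph id are all made up for illustration.

```python
def layered_glyph_ops(layers, x, y, font="F1", size=12):
    """Return content-stream text for one emoji drawn as stacked glyph layers.

    layers: list of (glyph_id, (r, g, b)) with colour components in 0..1,
            ordered bottom layer first.
    """
    lines = ["BT", f"/{font} {size} Tf"]
    for glyph_id, (r, g, b) in layers:
        # Reset the text matrix so every layer starts at the same origin.
        lines.append(f"1 0 0 1 {x} {y} Tm")
        # rg sets the non-stroking (fill) colour used for glyph outlines.
        lines.append(f"{r:.6f} {g:.6f} {b:.6f} rg")
        # Assumption: 2-byte codes that equal the glyph id, written as hex.
        lines.append(f"<{glyph_id:04X}> Tj")
    lines.append("ET")
    return "\n".join(lines)


# The first two layers listed above, treating the quoted codes as hex.
print(layered_glyph_ops(
    [(0x0451, (1.0, 0.690196, 0.180392)),
     (0x0452, (0.972549, 0.192157, 0.184314))],
    x=72, y=700,
))
```

Resetting the text matrix before each `Tj` is what makes the layers land on top of each other instead of advancing along the line.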

p.s.
later, having read the specs of the colr table... it's quite a lot. maybe if there are rasters, and that option is shorter, it's better to do that as a quick solution.
and look at all this fun :) to support color fonts all one has to do is:

  • support the colr table. two variants: 0 is simple superimposing, 1 includes matrix changes and support for gradients. multiple types of gradients, that is, AND blend modes (that's transparency layering modes).
  • support raster images via the CBDT and sbix tables. the latter may include embedded PDFs (which allows for line art). luckily hummus supports all of the possible image formats. we'll have to have a strategy to reuse all this bollocks so the file doesn't grow too much when chars are repeated.
  • for dessert: SVG-in-OpenType, which basically means being able to render SVG to pdf. well... if we get this then at least we'll get additional support for SVG, which is not such a bad thing to have in general.
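
For the simple case in the first bullet (COLR version 0), the lookup boils down to "base glyph, then a list of (layer glyph, palette index) pairs, then colors from a palette". A minimal Python sketch of that lookup, using plain dicts and tuples as stand-ins for the parsed COLR and CPAL data (all names and sample values here are hypothetical; real code would read them from the font):

```python
from dataclasses import dataclass

FOREGROUND = 0xFFFF  # COLR palette index meaning "use the current text colour"


@dataclass
class Layer:
    glyph_id: int
    palette_index: int


def resolve_layers(base_glyph, colr_v0, palette, foreground):
    """Return [(glyph_id, (r, g, b, a)), ...], bottom layer first.

    colr_v0: dict mapping a base glyph id to its list of Layer records.
    palette: list of (r, g, b, a) tuples, standing in for one CPAL palette.
    """
    layers = colr_v0.get(base_glyph)
    if layers is None:
        # No colour layers: draw the glyph itself in the foreground colour.
        return [(base_glyph, foreground)]
    return [
        (layer.glyph_id,
         foreground if layer.palette_index == FOREGROUND
         else palette[layer.palette_index])
        for layer in layers
    ]


# Hypothetical data: glyph 100 is built from two coloured layers.
colr = {100: [Layer(0x0451, 0), Layer(0x0452, 1)]}
cpal = [(255, 176, 46, 255), (248, 49, 47, 255)]
print(resolve_layers(100, colr, cpal, foreground=(0, 0, 0, 255)))
```

Version 1 (transforms, gradients, blend modes) does not fit this flat shape, which is a big part of why it is so much more work.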

just terrific :)

not exactly a weekend project. i might start something, but can't make promises. there are some nice outcomes (especially the SVG rendering) if i go through with this... but it's a lot.

@jacques-quidu
Author

jacques-quidu commented Aug 7, 2023

Thanks for this detailed breakdown of colored emojis ;)
And wow, it does seem like a lot to implement for full support (especially SVG glyphs in fonts).
Actually i only need support for Segoe UI Emoji color glyphs for now (it is the only font i use for color emojis on Windows): if Segoe UI Emoji color emojis are encoded with a COLR table and a color palette in the font file, i do not need support for SVG glyphs in fonts for now.
Also, you are right that text with these colored emojis does not copy/paste well from the PDF, which is weird...
So monochrome emojis seem to be better for correct PDF text copy/pasting, and because of that, using colored glyphs should remain an option in PDF.
By the way, does the decrypted PDF use only monochrome emojis, and not colored ones?

@jacques-quidu
Author

i think what we need is to use FT_Get_Color_Glyph_Layer to get each color layer's glyph index and color from the base glyph index?

@galkahana
Owner

yes this could work

@galkahana
Owner

@jacques-quidu I sat down and implemented support for COLR table version 0, which is the simplest one. if it's still useful to you, you can pull it. usage is the same as what you originally did, but now the color emojis show up.
attaching a sample PDF.

ColorEmojiColr.pdf

funny enough, copying and pasting the text from the pdf to someplace else, on both my win and mac acrobats, seems to correctly write the unicode text. im guessing that the word version doesn't set the original char value as the unicode value of the glyphs but rather attempts to map the parts, which creates the incorrect output. in some mysterious way (im guessing it's acrobat's algorithm) this also doesn't multiply the chars per layer, but gives just a nice single char per emoji. so what i end up getting is the expected:
Segoe UI Emoji:☺☺☺, and some later text

anyways, figured i'd share. and as a side effect we now have an opacity setting operator in the content context.

@galkahana
Owner

oh, one thing: I implemented this on the WriteText operator, not on Tj and its friends. drawing color emojis is a rather high level function that goes way beyond placing a Tj or something, so i figured i'd only do this on the high level operator to avoid mistakes.

you can see the usage here.

@jacques-quidu
Author

@galkahana thanks Gal, i will give it a look when i find some time.
But yes, i use the low-level Tj operator and not the high-level writeText operator to draw text.

@jacques-quidu
Author

jacques-quidu commented Aug 25, 2023

ok, it works well indeed with the writeText method. so now, only with the Segoe UI Emoji font on Windows, i use the high-level writeText method instead of the low-level text methods so that Segoe UI color emojis are properly rendered in the PDF. i still use the low-level methods with other fonts to avoid resetting the font or color each time i write text.

@galkahana
Owner

Thanks. That's good input.
