Support "Traditional Arabic Windows 3.1 font page" #681

Closed
khaledhosny opened this issue Jan 6, 2018 · 39 comments · Fixed by #3063

Comments

@khaledhosny
Collaborator

khaledhosny commented Jan 6, 2018

(This comment is edited by Ebrahim) Currently it needs Uniscribe support as the client needs to use ScriptGetCMap() or so since the fonts don’t have an Arabic cmap subtable.

However these fonts are still rather popular among Arabic users and there are hundreds of them floating around the web and people often complain about them not working on HarfBuzz-using applications (for example libass/libass#292).

I don’t know if we can just do PUA shaping similar to Thai or if there is some way to identify these fonts like the Windows API does.

@behdad
Member

behdad commented Jan 7, 2018

I'd like to do this, but I don't think we'd have ANY way to detect these.

@dscorbett
Collaborator

dscorbett commented Jan 17, 2018

(Edited by Ebrahim: this comment is related to #992, ignore it for this issue)

It should be possible to detect them. ssee1256.fon declares its dfCharSet as 178 (Arabic), which corresponds to Windows-1256. The font doesn’t fully comply with Windows-1256 but it is similar. Compare with ssee1255.fon, which uses a modification of Windows-1255.

@behdad
Member

behdad commented Jan 17, 2018

(Edited by Ebrahim: this comment is related to #992, ignore it for this issue)

Right. But we don't support .fon files. Client font-funcs should support them before we can shape anything, and they should synthesize the cmap.

I'm open to coming up with a way to implement .fon and Type 1 in HarfBuzz itself if it's not too intrusive and does not distract from OpenType stuff.

@ebraminio
Collaborator

ebraminio commented Apr 12, 2018

(Edited by Ebrahim: this comment is related to #992, ignore it for this issue)

Does the implementation have to be cross-platform (and afterwards support big-endian machines too)? If so, I guess we will have some fun parsing EXE resources (as FON is just an alias, IIUC) and then FNT, writing a limited set of little-endian int parsers. And do we just need to implement something equivalent to ScriptGetCMap, or do we need more? I mean, are the other layers of the font rendering stack ready for it? It will be like my .dfont work but way cooler! (But I don't think I can get to it.)

@ebraminio
Collaborator

ebraminio commented Apr 17, 2018

There are two fonts with names similar to "MS Sans Serif": "MS Sans Serif" itself, a ".fon" file named "sserife.fon", and "Microsoft Sans Serif", named "micross.ttf". Which one is fixed here and referred to here, @jfkthame: the bitmap one or the sfnt one? As far as I can see, "micross.ttf" does not pass the #986 check; maybe I am checking a newer version than the one that was used? And maybe this is a different bug and #991 is heading in the wrong direction?

@ebraminio
Collaborator

ebraminio commented Apr 17, 2018

OK, I am now more confident that we were talking about two separate issues, so let's separate them: #992 for .fon support, and this one for "Traditional Arabic Windows 3.1 font page" support, as per @khaledhosny's finding here.

@ebraminio ebraminio changed the title from 'Enable Windows-1256 private shaping on other platforms' to 'Support "Traditional Arabic Windows 3.1 font page"' Apr 17, 2018
@ebraminio
Collaborator

ebraminio commented Apr 19, 2018

I've found A LOT of them here.

This is a survey of them: fonts.txt

Note the number of fonts HarfBuzz can't render.

Of them, 468 are "Traditional Arabic Windows 3.1 font page", 13 are "Simplified Arabic Windows 3.1 font page", and 3 have 0xEE00 (an unknown font page: "Phyllis ATT Italic.TTF", "LUCASIT.TTF" and "SIGNETRO.TTF").
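
For reference, the font page codes quoted in this survey (such as 0xEE00) sit in the high byte of the OS/2 fsSelection field. A minimal sketch of pulling that value out of raw OS/2 table bytes follows. The function name is hypothetical, and the version-0 check is an assumption about how these legacy fonts are told apart, not a quote of HarfBuzz's code.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: returns the legacy "font page" stored in the high
// byte of OS/2 fsSelection, or 0 if the table is too short or (assumption)
// the OS/2 version is not 0.
uint16_t
legacy_font_page (const uint8_t *os2, std::size_t length)
{
  const std::size_t fs_selection_offset = 62; // fsSelection offset in OS/2
  if (!os2 || length < fs_selection_offset + 2) return 0;

  // OpenType fields are big-endian.
  uint16_t version = (uint16_t) ((os2[0] << 8) | os2[1]);
  uint16_t fs_selection = (uint16_t) ((os2[fs_selection_offset] << 8) |
                                      os2[fs_selection_offset + 1]);

  return version == 0 ? (uint16_t) (fs_selection & 0xFF00u) : (uint16_t) 0;
}
```

Only the high byte is kept, so the ordinary style bits in the low byte do not affect the result.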

@ebraminio
Collaborator

Aha, now I see. As Khaled and David originally stated, this is about Windows-1256 and can probably reuse some of the things from f28b1c8, but fixing this completely is not limited to that. Interestingly, the OS/2 font page list can be completed with the fonts' dfCharSet values, as David noted; even more interesting, we found that some of the dfCharSet values were not available at the link!

@ebraminio
Collaborator

ebraminio commented Apr 19, 2018

Or maybe not: 0xBA refers to "Simplified Farsi Windows 3.1 font page" in the OS/2 font page list but to BALTIC_CHARSET in dfCharSet. At least I guess the fonts I found with 0xEE00 were referring to Windows-1250.

@ebraminio
Collaborator

I guess the fonts I found with 0xEE00 were referring to Windows-1250.

Yes, those fonts had only Windows-1250-related glyphs.

@ebraminio
Collaborator

ebraminio commented Apr 19, 2018

I think I got a clue, there was a specific version of Windows 3.1 for Middle Eastern countries

image

(link)

So they could have used 0xBA for both the Baltic charset and "Farsi" on different Windows versions.
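
To make the ambiguity concrete, here is a sketch of the dfCharSet reading of those bytes, using the standard Windows charset constants. The function name is hypothetical, and the sketch covers only the values that came up in this thread.

```cpp
#include <cstdint>

// Hypothetical helper: maps a Windows charset byte (dfCharSet) to a Windows
// code page number; returns 0 for values this sketch does not cover.
unsigned
charset_to_codepage (uint8_t charset)
{
  switch (charset)
  {
  case 0xB1: return 1255; // HEBREW_CHARSET (177)
  case 0xB2: return 1256; // ARABIC_CHARSET (178)
  case 0xBA: return 1257; // BALTIC_CHARSET (186); read as a font page,
                          // 0xBA means "Simplified Farsi" instead
  case 0xEE: return 1250; // EASTEUROPE_CHARSET (238)
  default:   return 0;
  }
}
```

Under this reading 0xEE maps to Windows-1250, which agrees with the glyphs found in the three 0xEE00 fonts.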

@ebraminio ebraminio self-assigned this Apr 19, 2018
@ebraminio
Collaborator

Now I think I understand the codes around more, hopefully I can fix this eventually.

@ebraminio
Collaborator

Found one with OEM_ARABIC_FONT_PAGE in a font collection. Its name is SHosseinItalic, and it uses the 0xf020 to 0xf0f7 range instead of the 0xf200 to 0xf2fe range that most of the others (the 0xB2 and 0xB3 types) use.

@ebraminio
Collaborator

My findings:

  • Even Firefox doesn't work with such fonts, which means ScriptGetCMap() and f28b1c8 weren't enough. IE works, however.
  • f28b1c8 was about .fon files with fixed character indices. Using dump-fon to look at the glyph order, those numbers all make sense, but I still wonder where Behdad got them from.
  • arabic_fallback_synthesize_lookup is the place to look; we should perhaps write another one for each of the font pages.

@ebraminio
Collaborator

ebraminio commented Apr 21, 2018

arabic_fallback_synthesize_lookup is the place to look; we should perhaps write another one for each of the font pages.

No, we should polyfill these fonts for that code path so digits can be supported as well. Let me see if I can get to it.

@ebraminio
Collaborator

ebraminio commented Apr 21, 2018

A working rendering of the font.

image

What is logically needed is some sort of cmap rewriting between those PUA codes and Arabic presentation forms, taking into account the values that already exist. But that doesn't work when "font-funcs" is not "ot", so I guess we should inject something like the code below when OS/2 meets the condition. Does that sound right?

// Arabic-Indic digits
if (unicode == 0x0660) unicode = 0xF230;
if (unicode == 0x0661) unicode = 0xF231;
if (unicode == 0x0662) unicode = 0xF232;
if (unicode == 0x0663) unicode = 0xF233;
if (unicode == 0x0664) unicode = 0xF234;
if (unicode == 0x0665) unicode = 0xF235;
if (unicode == 0x0666) unicode = 0xF236;
if (unicode == 0x0667) unicode = 0xF237;
if (unicode == 0x0668) unicode = 0xF238;
if (unicode == 0x0669) unicode = 0xF239;

// Aleph
if (unicode == 0x0627) unicode = 0xF242;
if (unicode == 0xFE8D) unicode = 0xF245;
if (unicode == 0xFE8E) unicode = 0xF242;

// Beh
if (unicode == 0x0628) unicode = 0xF24C;
if (unicode == 0xFE8F) unicode = 0xF24C;
if (unicode == 0xFE90) unicode = 0xF24B;
if (unicode == 0xFE91) unicode = 0xF249;
if (unicode == 0xFE92) unicode = 0xF24A;

// Teh
if (unicode == 0x062A) unicode = 0xF250;
if (unicode == 0xFE95) unicode = 0xF250;
if (unicode == 0xFE96) unicode = 0xF24F;
if (unicode == 0xFE97) unicode = 0xF24D;
if (unicode == 0xFE98) unicode = 0xF24E;
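
The digit block in the draft above is a contiguous range, so it can be written as arithmetic instead of ten separate comparisons. A sketch using only the digit values from the draft; the function name is hypothetical, and everything outside the digit range is passed through unchanged.

```cpp
#include <cstdint>

// Hypothetical helper: remaps Arabic-Indic digits U+0660..U+0669 to the PUA
// range 0xF230..0xF239 used in the draft; other code points are returned
// unchanged.
uint32_t
remap_arabic_indic_digit (uint32_t unicode)
{
  if (unicode >= 0x0660u && unicode <= 0x0669u)
    return unicode - 0x0660u + 0xF230u;
  return unicode;
}
```

The letter mappings are not contiguous in the same way, so they would still need an explicit table.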

@ebraminio
Collaborator

ebraminio commented Apr 21, 2018

The more appropriate way is to create a preprocess_text_arabic, for which I drafted something here (too hacky, obviously). That works, but the arabic_fallback_synthesize_lookup_single shaping does not. Is that the right direction?

@ebraminio
Collaborator

A more complete list, which I extracted by feeding an altered version of "Adobe Blank" to Uniscribe (using some Python scripts) to see what happens when I put 0xB3 in fsSelection. It is far from perfect.

static hb_codepoint_t
traditional_arabic_windows31_font_page (hb_codepoint_t u)
{
  switch (u)
  {
  // E'rab
  case 0x064B: return 0xF2E7;
  case 0x064C: return 0xF2E8;
  case 0x064D: return 0xF2EB;
  case 0x064E: return 0xF2E4;
  case 0x064F: return 0xF2E5;
  case 0x0650: return 0xF2EA;
  case 0x0651: return 0xF2E9;
  case 0x0652: return 0xF2E6;

  // Digits
  case 0x0660: return 0xF230;
  case 0x0661: return 0xF231;
  case 0x0662: return 0xF232;
  case 0x0663: return 0xF233;
  case 0x0664: return 0xF234;
  case 0x0665: return 0xF235;
  case 0x0666: return 0xF236;
  case 0x0667: return 0xF237;
  case 0x0668: return 0xF238;
  case 0x0669: return 0xF239;

  // Alphabet
  case 0x0622: return 0xF245;
  case 0x0623: return 0xF243;
  case 0x0624: return 0xF2DA;
  case 0x0625: return 0xF247;
  case 0x0626: return 0xF2D9;
  case 0x0627: return 0xF241;
  case 0x0628: return 0xF24C;
  case 0x0629: return 0xF2D1;
  case 0x062A: return 0xF250;
  case 0x062B: return 0xF254;
  case 0x062C: return 0xF258;
  case 0x062D: return 0xF260;
  case 0x062E: return 0xF264;
  case 0x062F: return 0xF265;
  case 0x0630: return 0xF267;
  case 0x0631: return 0xF269;
  case 0x0632: return 0xF26B;
  case 0x0633: return 0xF270;
  case 0x0634: return 0xF274;
  case 0x0635: return 0xF278;
  case 0x0636: return 0xF27E;
  case 0x0637: return 0xF2A2;
  case 0x0638: return 0xF2A3;
  case 0x0639: return 0xF2AA;
  case 0x063A: return 0xF2AE;
  case 0x0641: return 0xF2B2;
  case 0x0642: return 0xF2B6;
  case 0x0643: return 0xF2BA;
  case 0x0644: return 0xF2BE;
  case 0x0645: return 0xF2C2;
  case 0x0646: return 0xF2C6;
  case 0x0647: return 0xF2CA;
  case 0x0648: return 0xF2CB;
  case 0x0649: return 0xF2D4;
  case 0x064A: return 0xF2D0;

  // Presentation form
  case 0xFE81: return 0xF245;
  case 0xFE82: return 0xF246;
  case 0xFE83: return 0xF243;
  case 0xFE85: return 0xF2DA;
  case 0xFE86: return 0xF2DB;
  case 0xFE87: return 0xF247;
  case 0xFE88: return 0xF248;
  case 0xFE89: return 0xF2D9;
  case 0xFE8A: return 0xF2D8;
  case 0xFE8B: return 0xF2D6;
  case 0xFE8C: return 0xF2D7;
  case 0xF8C6: return 0xF241;
  case 0xFE8F: return 0xF24C;
  case 0xFE90: return 0xF24B;
  case 0xFE91: return 0xF249;
  case 0xFE92: return 0xF24A;
  case 0xFE93: return 0xF2D1;
  case 0xFE94: return 0xF2D2;
  case 0xFE95: return 0xF250;
  case 0xFE96: return 0xF24F;
  case 0xFE97: return 0xF24D;
  case 0xFE98: return 0xF24E;
  case 0xFE99: return 0xF254;
  case 0xFE9A: return 0xF253;
  case 0xFE9B: return 0xF251;
  case 0xFE9C: return 0xF252;
  case 0xFE9D: return 0xF258;
  case 0xFE9E: return 0xF257;
  case 0xFE9F: return 0xF255;
  case 0xFEA0: return 0xF256;
  case 0xFEA1: return 0xF260;
  case 0xFEA2: return 0xF25C;
  case 0xFEA3: return 0xF259;
  case 0xFEA4: return 0xF25A;
  case 0xFEA5: return 0xF264;
  case 0xFEA6: return 0xF263;
  case 0xFEA7: return 0xF261;
  case 0xFEA8: return 0xF262;
  case 0xFEA9: return 0xF265;
  case 0xFEAA: return 0xF266;
  case 0xFEAB: return 0xF267;
  case 0xFEAC: return 0xF268;
  case 0xFEAD: return 0xF269;
  case 0xFEAE: return 0xF26A;
  case 0xFEB0: return 0xF26C;
  case 0xFEAF: return 0xF26B;
  case 0xFEB1: return 0xF270;
  case 0xFEB2: return 0xF26F;
  case 0xFEB3: return 0xF26D;
  case 0xFEB4: return 0xF26E;
  case 0xFEB5: return 0xF274;
  case 0xFEB6: return 0xF273;
  case 0xFEB7: return 0xF271;
  case 0xFEB8: return 0xF272;
  case 0xFEB9: return 0xF278;
  case 0xFEBA: return 0xF277;
  case 0xFEBB: return 0xF275;
  case 0xFEBC: return 0xF276;
  case 0xFEBD: return 0xF27E;
  case 0xFEBE: return 0xF27C;
  case 0xFEBF: return 0xF279;
  case 0xFEC0: return 0xF27A;
  case 0xFEC1: return 0xF2A2;
  case 0xFEC2: return 0xF2A1;
  case 0xFEC3: return 0xF27F;
  case 0xFEC4: return 0xF2F1;
  case 0xFEC5: return 0xF2A3;
  case 0xFEC6: return 0xF2A5;
  case 0xFEC7: return 0xF2A3;
  case 0xFEC8: return 0xF2A4;
  case 0xFEC9: return 0xF2AA;
  case 0xFECA: return 0xF2A9;
  case 0xFECB: return 0xF2A7;
  case 0xFECC: return 0xF2A8;
  case 0xFECD: return 0xF2AE;
  case 0xFECE: return 0xF2AD;
  case 0xFECF: return 0xF2AB;
  case 0xFED0: return 0xF2AC;
  case 0xFED1: return 0xF2B2;
  case 0xFED2: return 0xF2B1;
  case 0xFED3: return 0xF2AF;
  case 0xFED4: return 0xF2B0;
  case 0xFED5: return 0xF2B6;
  case 0xFED6: return 0xF2B5;
  case 0xFED7: return 0xF2B3;
  case 0xFED8: return 0xF2B4;
  case 0xFED9: return 0xF2BA;
  case 0xFEDA: return 0xF2B9;
  case 0xFEDB: return 0xF2B7;
  case 0xFEDC: return 0xF2B8;
  case 0xFEDD: return 0xF2BE;
  case 0xFEDE: return 0xF2BD;
  case 0xF8E0: return 0xF2BC;
  case 0xFEE1: return 0xF2C2;
  case 0xFEE2: return 0xF2C1;
  case 0xFEE3: return 0xF2BF;
  case 0xFEE4: return 0xF2C0;
  case 0xFEE5: return 0xF2C6;
  case 0xFEE6: return 0xF2C5;
  case 0xFEE7: return 0xF2C3;
  case 0xFEE8: return 0xF2C4;
  case 0xFEE9: return 0xF2CA;
  case 0xFEEA: return 0xF2C9;
  case 0xFEEB: return 0xF2C7;
  case 0xFEEC: return 0xF2C8;
  case 0xFEED: return 0xF2CB;
  case 0xFEEE: return 0xF2CC;
  case 0xFEF0: return 0xF2D3;
  case 0xFBE8: return 0xF2D4;
  case 0xFBE9: return 0xF2D4;
  case 0xFEF1: return 0xF2D0;
  case 0xFEF2: return 0xF2CF;
  case 0xFEF3: return 0xF2CD;
  case 0xFEF4: return 0xF2CE;
  case 0xFE4B: return 0xF2E7;
  case 0xFE4C: return 0xF2E8;
  case 0xFE4D: return 0xF2EB;
  case 0xFE4E: return 0xF2E4;
  case 0xFE4F: return 0xF2E5;
  case 0xFE50: return 0xF2EA;
  case 0xFE51: return 0xF2E9;
  default: return u;
  }
}

@behdad
Member

behdad commented Apr 24, 2018

f28b1c8 was about .fon files with fixed character indices. Using dump-fon to look at the glyph order, those numbers all make sense, but I still wonder where Behdad got them from.

I either checked a table on Wikipedia, or opened the font in fontforge.

@ebraminio
Collaborator

Now it has public documentation also,

https://docs.microsoft.com/en-us/typography/legacy/legacy_arabic_fonts

@ebraminio ebraminio removed their assignment Jun 5, 2018
@ebraminio
Collaborator

I wasn't sure about the right approach, so I removed the assignment; perhaps @khaledhosny or @behdad can have another look.

ebraminio added a commit to ebraminio/harfbuzz that referenced this issue Jul 17, 2018
It was a part of history... but added in an attempt to fix harfbuzz#681
which later found to be unrelated.
ebraminio added a commit to ebraminio/harfbuzz that referenced this issue Jul 17, 2018
It had interesting stuffs like EXE parsing and
big-endian parsers but added in an attempt to find
a solution for harfbuzz#681 which later found not related.
ebraminio added a commit that referenced this issue Jul 17, 2018
It had interesting stuffs like EXE parsing and
big-endian parsers but added in an attempt to find
a solution for #681 which later found not related.
fanc999 pushed a commit to fanc999/harfbuzz that referenced this issue Jul 25, 2018
It had interesting stuffs like EXE parsing and
big-endian parsers but added in an attempt to find
a solution for harfbuzz#681 which later found not related.
@ebraminio ebraminio self-assigned this Nov 13, 2018
@ebraminio
Collaborator

ebraminio commented Nov 13, 2018

I guess no one is more motivated for this than me, so I am assigning it to myself again. I don't remember clearly what the issue was, but IIRC we don't know how this should be done: either by altering cmap, which custom cmap callbacks can't benefit from, or by writing another GSUB table specifically for it, similar to what is done for .fon Arabic fonts, and duplicating its logic.

@astiob
Contributor

astiob commented Mar 25, 2020

What is the current state of this issue?

(I’m one of the libass developers; we have an issue on our bug tracker about this that’s mentioned above.)

I see #986 has been merged. It seems it adds code to parse the font page ID, but do I understand correctly that this code isn’t used yet and the font pages themselves are not implemented yet?

In libass, we also look up characters in font charmaps ourselves, in order to do font fallback. I assume that even if/when this gets implemented in HarfBuzz, we’d need to copy the legacy charmap code from HarfBuzz to libass to make it work. Is there a way this could be avoided? We’re currently using FreeType’s FT_Set_Charmap and FT_Get_Char_Index to find fonts that have the raw Unicode characters before passing them to HarfBuzz. Maybe I’m misunderstanding something, but wouldn’t it be best to integrate the legacy Windows 3.1 font pages into FreeType?

@ebraminio
Collaborator

ebraminio commented Mar 25, 2020

What is the current state of this issue?

I found an old font CD at home from 2004 with lots of these fonts, so it is something I would personally like to make happen.

image

.

I found it hard to do with the current state of the code, but I should look at it again. Something like rewriting the cmap table or its callbacks is needed.

Microsoft has also added a page about it to the OpenType spec.

Maybe I’m misunderstanding something, but wouldn’t it be best to integrate the legacy Windows 3.1 font pages into FreeType?

That could happen too, especially for clients that override font callbacks with FreeType, as in your case apparently, but it should happen here separately anyway.

@behdad
Member

behdad commented Apr 16, 2020

So we're truly talking about non-Unicode fonts. I don't like doing this in preprocess. Doing it in OT font-funcs sounds appropriate to me. These already declare a Symbol cmap, which we already detect and do ASCII translation for. Just need to do all non-Unicode mappings there. It sucks, yes, but I prefer it to rewriting in the shaper which would break itemization for anyone using font's get_glyph callback.
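
A rough sketch of that lookup order: try the code point as-is, and only if the font has no glyph for it, retry with a remapped code point. The `std::map` stands in for a real cmap subtable, and `symbol_pua_remap` (code points up to U+00FF shifted into U+F000..U+F0FF) reflects the usual Symbol-font convention as I understand it. All names here are hypothetical and not HarfBuzz API.

```cpp
#include <cstdint>
#include <map>

using codepoint_t = uint32_t;
using cmap_t = std::map<codepoint_t, uint32_t>; // code point -> glyph ID

// Assumed Symbol convention: U+0000..U+00FF live at U+F000..U+F0FF.
// Returns 0 when there is no remapping.
codepoint_t
symbol_pua_remap (codepoint_t u)
{
  return u <= 0x00FFu ? 0xF000u + u : 0;
}

// Looks up `u` directly first, then through `remap`. A font-page-specific
// remap function could be passed in place of symbol_pua_remap.
bool
get_glyph_with_remap (const cmap_t &cmap,
                      codepoint_t (*remap) (codepoint_t),
                      codepoint_t u,
                      uint32_t *glyph)
{
  auto it = cmap.find (u);
  if (it == cmap.end ())
  {
    codepoint_t r = remap (u);
    if (!r) return false;
    it = cmap.find (r);
    if (it == cmap.end ()) return false;
  }
  *glyph = it->second;
  return true;
}
```

Because the remapping lives inside the glyph lookup, anything that asks the font for a glyph gets the same answer as the shaper, which is the itemization concern raised above.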

@khaledhosny
Collaborator Author

I’m trying to implement this in hb-ot-cmap-table.hh by implementing different get_glyph_from_symbol() based on the font page setting in OS2 table, but hb-ot-os2-table.hh includes hb-ot-cmap-table.hh, and including hb-ot-os2-table.hh in hb-ot-cmap-table.hh causes a build error:

In file included from ../src/hb-face.cc:35:
In file included from ../src/hb-ot-cmap-table.hh:30:
../src/hb-ot-os2-table.hh:182:11: error: no member named 'cmap' in namespace 'OT'
      OT::cmap::accelerator_t cmap;
      ~~~~^
1 error generated.
ninja: build stopped: subcommand failed.

diff --git a/src/hb-ot-cmap-table.hh b/src/hb-ot-cmap-table.hh
index add21f115..ee1ecd461 100644
--- a/src/hb-ot-cmap-table.hh
+++ b/src/hb-ot-cmap-table.hh
@@ -27,6 +27,7 @@
 #ifndef HB_OT_CMAP_TABLE_HH
 #define HB_OT_CMAP_TABLE_HH

+#include "hb-ot-os2-table.hh"
 #include "hb-open-type.hh"
 #include "hb-set.hh"

@@ -1524,7 +1525,15 @@ struct cmap

       this->get_glyph_data = subtable;
       if (unlikely (symbol))
-       this->get_glyph_funcZ = get_glyph_from_symbol<CmapSubtable>;
+      {
+       auto font_page = face->table.OS2->get_font_page ();
+       if (font_page == OS2::font_page_t::FONT_PAGE_SIMP_ARABIC)
+         this->get_glyph_funcZ = get_glyph_from_symbol<CmapSubtable>;
+       else if (font_page == OS2::font_page_t::FONT_PAGE_TRAD_ARABIC)
+         this->get_glyph_funcZ = get_glyph_from_symbol<CmapSubtable>;
+       else
+         this->get_glyph_funcZ = get_glyph_from_symbol<CmapSubtable>;
+      }
       else
       {
        switch (subtable->u.format) {

@khaledhosny
Collaborator Author

I’m now not sure that remapping to presentation forms is the best approach, since these fonts contain ligatures, some of which are not even in Unicode (and even if they are, the Arabic fallback shaper does not enable any ligatures other than lam-alef).

@khaledhosny
Collaborator Author

Thinking about this more, I think it needs to be handled in 2 places:

  1. remapping PUA to Unicode in cmap table,
  2. synthesize GSUB table in the Arabic shaper (the synthesized table will need to include Arabic positional features as well as rlig, liga and mset features).

@behdad
Member

behdad commented Jul 13, 2021

Thinking about this more, I think it needs to be handled in 2 places:

  1. remapping PUA to Unicode in cmap table,
  2. synthesize GSUB table in the Arabic shaper (the synthesized table will need to include Arabic positional features as well as rlig, liga and mset features).

Exactly. Thanks.

@behdad
Member

behdad commented Jul 13, 2021

I’m trying to implement this in hb-ot-cmap-table.hh by implementing different get_glyph_from_symbol() based on the font page setting in OS2 table, but hb-ot-os2-table.hh includes hb-ot-cmap-table.hh, and including hb-ot-os2-table.hh in hb-ot-cmap-table.hh causes a build error:

The OS/2 dependency on cmap should be removed. The cmap-related logic in os2::subset is really contrived. I cannot follow it. @garretrieger can you figure out why that block needs to look into cmap directly and cannot use information from the plan?

Okay I think I understand why it's doing it... It should be rewritten though, to access the subsetted cmap table and set values based on it, instead of recalculating what would become subsetted cmap. Anyway; we can fix the dependency for now by moving that code to an out-of-class method implementation.
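
The "out-of-class method implementation" idea can be illustrated with two stand-in structs. The names below are simplified placeholders and not the real HarfBuzz types: `os2_t` only declares the method that needs `cmap_t`, and the body is defined after both types are complete, so the OS/2 side no longer needs the full cmap definition up front.

```cpp
#include <cstdint>

struct cmap_t; // a forward declaration is enough for the declaration below

struct os2_t
{
  uint16_t fs_selection;

  uint16_t get_font_page () const
  { return (uint16_t) (fs_selection & 0xFF00u); }

  // Declared here, defined later once cmap_t is complete.
  unsigned count_mapped (const cmap_t &cmap) const;
};

struct cmap_t
{
  unsigned num_mappings;

  // cmap can use os2_t freely; os2_t is already complete at this point.
  bool is_legacy_page (const os2_t &os2) const
  { return os2.get_font_page () != 0; }
};

// Out-of-class definition: both types are complete here.
inline unsigned
os2_t::count_mapped (const cmap_t &cmap) const
{
  return cmap.num_mappings;
}
```

In a real header layout the out-of-class definition would live in a file that includes both headers.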

@behdad
Member

behdad commented Jul 13, 2021

+       auto font_page = face->table.OS2->get_font_page ();
+       if (font_page == OS2::font_page_t::FONT_PAGE_SIMP_ARABIC)

Use a switch?

@khaledhosny
Collaborator Author

Use a switch?

I used a switch then -Wswitch-enum got me.

@behdad
Member

behdad commented Jul 14, 2021

Use a switch?

I used a switch then -Wswitch-enum got me.

The solution I like is a cast to (int) of the enumerant.

@behdad
Member

behdad commented Jul 14, 2021

The solution I like is a cast to (int) of the enumerant.

$ git grep 'switch ((int)'
gen-arabic-joining-list.py:     print ("  switch ((int) script)")
hb-ot-shape-complex-arabic-joining-list.hh:  switch ((int) script)
hb-ot-shape-complex-indic.hh:  switch ((int) side)
hb-ot-shape-complex-khmer.hh:    switch ((int) pos)
hb-ot-shape-complex-myanmar.hh:    switch ((int) pos)

@behdad
Member

behdad commented Jul 14, 2021

More recently, using unsigned instead:

$ git grep 'switch ((unsigned'
gen-vowel-constraints.py:print ('  switch ((unsigned) buffer->props.script)')
hb-buffer-serialize.cc:  switch ((unsigned) format)
hb-ot-metrics.cc:  switch ((unsigned) metrics_tag)
hb-ot-metrics.cc:  switch ((unsigned) metrics_tag)
hb-ot-shape-complex-vowel-constraints.cc:  switch ((unsigned) buffer->props.script)
hb-style.cc:  switch ((unsigned) style_tag)
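
A small sketch of the cast idiom. With -Wswitch-enum, switching directly on an enum warns about every enumerator that has no explicit case, even when a default label is present; casting the value to an integer type sidesteps that. The enum below is a hypothetical cut-down stand-in, and its numeric values are assumptions for illustration.

```cpp
enum font_page_t
{
  FONT_PAGE_NONE        = 0,
  FONT_PAGE_SIMP_ARABIC = 0xB200, // assumed value
  FONT_PAGE_TRAD_ARABIC = 0xB300, // assumed value
  FONT_PAGE_OTHER       = 0xFF00  // placeholder for all remaining pages
};

// Returns a label for the pages this sketch cares about. The (unsigned)
// cast lets the switch omit enumerators without a -Wswitch-enum warning.
const char *
font_page_name (font_page_t page)
{
  switch ((unsigned) page)
  {
  case FONT_PAGE_SIMP_ARABIC: return "simplified";
  case FONT_PAGE_TRAD_ARABIC: return "traditional";
  default:                    return "other";
  }
}
```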

@khaledhosny
Collaborator Author

OK, I have the basic re-mapping in cmap done (it works only with ot font functions obviously, but let's keep that for later).

Now I need to handle ligatures, and I think I’m not going to do mset, our fallback positioning is actually doing a better job. For ligatures I’m inclined to put everything in rlig and keep things simple, it is a fallback for legacy fonts after all and fine control is low priority.

It looks like hb-ot-shape-complex-arabic-fallback.hh’s arabic_fallback_synthesize_lookup_ligature() is what I need, but it uses a static array for ligatures and I don’t see an easy way of making it handle different ligatures per font. So what should I do: duplicate it (I’ll probably need two copies), or templatize it? (I’m still trying to get away without learning C++ templates.)
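
On the "templatize it" option: one low-effort form is a function template that deduces the array length, so each font page could supply its own static ligature array without duplicating the lookup code. This is a sketch only. The struct, function and table names are hypothetical, and the two lam-alef entries use the standard Unicode presentation forms rather than any font-page-specific code points.

```cpp
#include <cstddef>
#include <cstdint>

struct ligature_entry_t
{
  uint32_t first;    // first component
  uint32_t second;   // second component
  uint32_t ligature; // resulting ligature
};

// Works for a static array of any length; N is deduced at the call site.
// Returns the ligature for (first, second), or 0 if the table has none.
template <std::size_t N>
uint32_t
find_ligature (const ligature_entry_t (&table)[N],
               uint32_t first, uint32_t second)
{
  for (std::size_t i = 0; i < N; i++)
    if (table[i].first == first && table[i].second == second)
      return table[i].ligature;
  return 0;
}

// Example table: lam + alef in Unicode presentation forms.
static const ligature_entry_t unicode_lam_alef[] =
{
  {0xFEDFu, 0xFE8Eu, 0xFEFBu}, // lam initial + alef final -> lam-alef isolated
  {0xFEE0u, 0xFE8Eu, 0xFEFCu}, // lam medial + alef final -> lam-alef final
};
```

A second table for another font page would be passed to the same `find_ligature` without any further template syntax at the call site.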

@behdad
Member

behdad commented Jul 14, 2021

OK, I have the basic re-mapping in cmap done (it works only with ot font functions obviously, but let's keep that for later).

Now I need to handle ligatures, and I think I’m not going to do mset, our fallback positioning is actually doing a better job. For ligatures I’m inclined to put everything in rlig and keep things simple, it is a fallback for legacy fonts after all and fine control is low priority.

Sounds good.

It looks like hb-ot-shape-complex-arabic-fallback.hh’s arabic_fallback_synthesize_lookup_ligature() is what I need, but it uses a static array for ligatures and I don’t see an easy way of making it handle different ligatures per font. So what should I do: duplicate it (I’ll probably need two copies), or templatize it? (I’m still trying to get away without learning C++ templates.)

Well, the other option is to handle these like the win1256 stuff, using a fixed hand-coded GSUB table.

@khaledhosny
Collaborator Author

Well, the other option is to handle these like the win1256 stuff, using a fixed hand-coded GSUB table.

But this needs to use glyph IDs and I’m not sure these fonts are guaranteed to have a fixed glyph order.

@behdad
Member

behdad commented Jul 15, 2021

But this needs to use glyph IDs and I’m not sure these fonts are guaranteed to have a fixed glyph order.

Right...

Just encoding more ligatures in the existing ligature table would be fine with me.

Do you have a repertoire?

@khaledhosny khaledhosny linked a pull request Jul 18, 2021 that will close this issue
5 tasks
@ebraminio ebraminio removed their assignment Jul 19, 2021