Support "Traditional Arabic Windows 3.1 font page" #681
Comments
I'd like to do this, but I don't think we'd have ANY way to detect these. |
(Edited by Ebrahim: this comment is related to #992, ignore it for this issue) It should be possible to detect them. ssee1256.fon declares its |
(Edited by Ebrahim: this comment is related to #992, ignore it for this issue) Right. But we don't support .fon files. Client font-funcs should support them before we can shape anything, and they should synthesize the cmap. I'm open to coming up with a way to implement .fon and Type 1 in HarfBuzz itself if it's not too intrusive and does not distract from OpenType stuff. |
(Edited by Ebrahim: this comment is related to #992, ignore it for this issue) Does the implementation have to be cross-platform (and, later on, support big-endian machines too)? If so, I guess we will have some fun parsing EXE resources (as FON is just an alias, IIUC) and then FNT, writing a limited set of little-endian int parsers. And do we need just to implement something equivalent to |
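For readers wondering what a "limited set of little-endian int parsers" might look like: a minimal sketch, assuming the FNT data is available as a byte buffer. The names read_le16 and read_le32 are hypothetical and are not HarfBuzz API. Reading byte by byte keeps the result independent of the host's endianness and alignment rules.

```cpp
#include <cstdint>

// Hypothetical helpers: assemble unsigned little-endian integers from
// individual bytes, so the same code works on big-endian hosts too.
static inline uint16_t read_le16 (const uint8_t *p)
{
  return static_cast<uint16_t> (p[0] | (p[1] << 8));
}

static inline uint32_t read_le32 (const uint8_t *p)
{
  return  static_cast<uint32_t> (p[0])
       | (static_cast<uint32_t> (p[1]) << 8)
       | (static_cast<uint32_t> (p[2]) << 16)
       | (static_cast<uint32_t> (p[3]) << 24);
}
```

The caller is expected to check that enough bytes remain in the buffer before calling either helper.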
There are two fonts with names similar to "MS Sans Serif": "MS Sans Serif" itself, which is a ".fon" file named "sserife.fon", and "Microsoft Sans Serif", which is in "micross.ttf". Which one was fixed and is being referred to here, @jfkthame? The bitmap one or the sfnt one? As far as I can see, "micross.ttf" does not pass the #986 check; maybe I am checking a newer version than the one that was used back then? And maybe this is a different bug and #991 is headed in the wrong direction? |
OK, I am now more confident that we were talking about two separate issues. Let's separate them: #992 for |
I've found A LOT of them here. This is a survey of them: fonts.txt. Note the number of fonts HarfBuzz can't render. Of them, 468 are "Traditional Arabic Windows 3.1 font page", 13 are "Simplified Arabic Windows 3.1 font page", and 3 have 0xEE00 (an unknown font page: "Phyllis ATT Italic.TTF", "LUCASIT.TTF" and "SIGNETRO.TTF"). |
Aha, now I see. As Khaled and David originally stated, this is about Windows-1256 and can probably reuse some of the things from f28b1c8, but fixing it completely is not limited to that. Interestingly, the OS/2 font page list can be completed with the fonts' dfCharSet values, as David noted, and even more interestingly, we have found that some of the dfCharSet values were not available at the link! |
Or maybe not: 0xBA refers to the Simplified Farsi Windows 3.1 font page in the OS/2 font page list but to BALTIC_CHARSET in dfCharSet. At least the fonts I found with 0xEE00 seem to be referring to Windows-1250. |
Yes, those fonts had only Windows-1250 related glyphs. |
I think I got a clue: there was a specific version of Windows 3.1 for Middle Eastern countries (link), so they could use 0xBA both for Baltic script and for "Farsi" on different Windows versions. |
Now I think I understand the codes better; hopefully I can fix this eventually. |
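To make the font page codes above concrete, here is a rough sketch of reading the page from an OS/2 table. It assumes, as I understand the Microsoft legacy Arabic fonts page and as the 0xB3 fsSelection experiment later in this thread suggests, that the page lives in the high byte of fsSelection and is only meaningful for version-0 OS/2 tables. The enumerator names are hypothetical; 0xBA comes from this thread, 0xB3 is inferred from it, and 0xB2 is an assumption that should be checked against the documentation.

```cpp
#include <cstdint>

// Hypothetical names; values are 16-bit masks over fsSelection.
enum FontPage : uint16_t
{
  FONT_PAGE_NONE_SKETCH        = 0x0000,
  FONT_PAGE_SIMP_ARABIC_SKETCH = 0xB200, // assumption, not stated in the thread
  FONT_PAGE_TRAD_ARABIC_SKETCH = 0xB300, // inferred from the 0xB3 experiment
  FONT_PAGE_SIMP_FARSI_SKETCH  = 0xBA00  // 0xBA per the OS/2 font page list
};

// Returns the font page bits, or 0 when the table is not version 0
// (assumed here to mean "no legacy font page").
inline uint16_t font_page_bits (uint16_t os2_version, uint16_t fs_selection)
{
  return os2_version == 0 ? static_cast<uint16_t> (fs_selection & 0xFF00u) : 0;
}
```

Note that the same byte value can mean something else in dfCharSet (0xBA is BALTIC_CHARSET there), so the two fields should not be mixed up.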
Found one with OEM_ARABIC_FONT_PAGE in a font collection; its name is |
My findings:
|
No, we should polyfill these fonts for that code path so digits could be supported as well. Let me see if I can get to it. |
A working rendering of the font. What is logically needed is some sort of
|
The more appropriate way is to create a |
A more complete list that I extracted by running an altered version of "Adobe Blank" through Uniscribe (using some Python scripts) to see what happens if I put 0xB3 in fsSelection, but it is far from perfect.
|
I either checked a table on Wikipedia, or opened the font in fontforge. |
Now it has public documentation as well: https://docs.microsoft.com/en-us/typography/legacy/legacy_arabic_fonts |
I wasn't sure about the needed approach, so I removed the assignment; perhaps @khaledhosny or @behdad can have another look as well. |
It was a part of history... but it was added in an attempt to fix harfbuzz#681, which was later found to be unrelated.
It had interesting stuff like EXE parsing and big-endian parsers, but it was added in an attempt to find a solution for harfbuzz#681, which was later found to be unrelated.
I guess no one is more motivated for this than me, so I am assigning it to myself again. I don't remember clearly what the issue was, but IIRC it is that we don't know how this should be done: should it be done by altering |
What is the current state of this issue? (I’m one of libass developers; we have an issue on our bug tracker about this that’s mentioned above.) I see #986 has been merged. It seems it adds code to parse the font page ID, but do I understand correctly that this code isn’t used yet and the font pages themselves are not implemented yet? In libass, we also look up characters in font charmaps ourselves, in order to do font fallback. I assume that even if/when this gets implemented in HarfBuzz, we’d need to copy the legacy charmap code from HarfBuzz to libass to make it work. Is there a way this could be avoided? We’re currently using FreeType’s |
I found an old font CD from 2004 at home with lots of these fonts, so this is something I would personally like to make happen. I found it hard to do with the current state of the code, but I should look at it again. Something like rewriting the cmap table or its callbacks is needed. Microsoft has also added a page about it to the OpenType spec.
That may also happen, especially for clients who override the font callbacks with FreeType, as is apparently your case, but it should happen here separately anyway. |
So we're truly talking about non-Unicode fonts. I don't like doing this in preprocess. Doing it in OT font-funcs sounds appropriate to me. These already declare a Symbol cmap, which we already detect and do ASCII translation for. Just need to do all non-Unicode mappings there. It sucks, yes, but I prefer it to rewriting in the shaper which would break itemization for anyone using font's get_glyph callback. |
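As an illustration of the "Symbol cmap with ASCII translation" idea mentioned above, here is a minimal sketch that uses a plain hash map as a stand-in for a cmap subtable. The names Cmap, lookup and lookup_symbol are hypothetical, and this is not the actual HarfBuzz code. It relies on the usual Symbol-encoding convention of placing glyphs at U+F000..U+F0FF.

```cpp
#include <cstdint>
#include <unordered_map>

// Stand-in for a cmap subtable: code point -> glyph ID.
using Cmap = std::unordered_map<uint32_t, uint32_t>;

// Plain lookup; returns false when the code point is not mapped.
inline bool lookup (const Cmap &cmap, uint32_t cp, uint32_t *glyph)
{
  auto it = cmap.find (cp);
  if (it == cmap.end ()) return false;
  *glyph = it->second;
  return true;
}

// Symbol-style lookup: try the code point as is, then retry
// U+0000..U+00FF shifted into the U+F000..U+F0FF range.
inline bool lookup_symbol (const Cmap &cmap, uint32_t cp, uint32_t *glyph)
{
  if (lookup (cmap, cp, glyph)) return true;
  if (cp <= 0x00FFu) return lookup (cmap, 0xF000u + cp, glyph);
  return false;
}
```

A legacy Arabic font page would need one more step of the same shape: translating the incoming Unicode Arabic code point to the font's own code before the lookup. That translation table is not reproduced here because the thread does not spell it out.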
I’m trying to implement this in
diff --git a/src/hb-ot-cmap-table.hh b/src/hb-ot-cmap-table.hh
index add21f115..ee1ecd461 100644
--- a/src/hb-ot-cmap-table.hh
+++ b/src/hb-ot-cmap-table.hh
@@ -27,6 +27,7 @@
 #ifndef HB_OT_CMAP_TABLE_HH
 #define HB_OT_CMAP_TABLE_HH
+#include "hb-ot-os2-table.hh"
 #include "hb-open-type.hh"
 #include "hb-set.hh"
@@ -1524,7 +1525,15 @@ struct cmap
       this->get_glyph_data = subtable;
       if (unlikely (symbol))
-        this->get_glyph_funcZ = get_glyph_from_symbol<CmapSubtable>;
+      {
+        auto font_page = face->table.OS2->get_font_page ();
+        if (font_page == OS2::font_page_t::FONT_PAGE_SIMP_ARABIC)
+          this->get_glyph_funcZ = get_glyph_from_symbol<CmapSubtable>;
+        else if (font_page == OS2::font_page_t::FONT_PAGE_TRAD_ARABIC)
+          this->get_glyph_funcZ = get_glyph_from_symbol<CmapSubtable>;
+        else
+          this->get_glyph_funcZ = get_glyph_from_symbol<CmapSubtable>;
+      }
       else
       {
         switch (subtable->u.format) { |
I’m now not sure that remapping to presentation forms is the best approach, since these fonts contain ligatures, some of which are not even in Unicode (and even if they are, the Arabic fallback shaper does not enable any ligatures other than lam-alef). |
Thinking about this more, I think it needs to be handled in 2 places:
|
Exactly. Thanks. |
The OS/2 dependency on cmap should be removed. The cmap-related logic in Okay I think I understand why it's doing it... It should be rewritten though, to access the subsetted cmap table and set values based on it, instead of recalculating what would become subsetted cmap. Anyway; we can fix the dependency for now by moving that code to an out-of-class method implementation. |
Use a switch? |
I used a switch then -Wswitch-enum got me. |
The solution I like is a cast to |
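The comment above is cut off, so what it casts to is not known. One common way to keep a switch with a default label while staying clear of -Wswitch-enum is to cast the enum to an integer type, so the switch no longer has an enumerated index. The sketch below shows that pattern only; all names in it are hypothetical and it is not taken from the actual patch.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the font page enum and the remap choices.
enum font_page_sketch_t : uint16_t
{
  PAGE_NONE        = 0x0000,
  PAGE_SIMP_ARABIC = 0xB200,
  PAGE_TRAD_ARABIC = 0xB300,
  PAGE_SIMP_FARSI  = 0xBA00
};

enum remap_sketch_t { REMAP_SYMBOL, REMAP_SIMP_ARABIC, REMAP_TRAD_ARABIC };

inline remap_sketch_t pick_remap (font_page_sketch_t page)
{
  // The switch index is an unsigned integer, not an enum, so the compiler
  // does not demand a case for every enumerator.
  switch (static_cast<unsigned> (page))
  {
    case PAGE_SIMP_ARABIC: return REMAP_SIMP_ARABIC;
    case PAGE_TRAD_ARABIC: return REMAP_TRAD_ARABIC;
    default:               return REMAP_SYMBOL;
  }
}
```

The trade-off is that the compiler will no longer flag a newly added enumerator that lacks a case, so the default branch has to be a safe fallback.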
|
More recently, using
|
OK, I have the basic re-mapping in Now I need to handle ligatures, and I think I’m not going to do looks like |
Sounds good.
Well, other option is to handle these like the win1256 stuff, using a fixed handcoded GSUB table. |
But this needs to use glyph IDs and I’m not sure these fonts are guaranteed to have a fixed glyph order. |
Right... Just encoding more ligatures in the existing ligature table would be fine with me. Do you have a repertoire? |
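To illustrate what "encoding more ligatures in the ligature table" could amount to, here is a small sketch of a table keyed by presentation-form code points rather than glyph IDs, which avoids depending on a fixed glyph order. It is not the HarfBuzz fallback shaper. The struct and function names are hypothetical, and only the two standard lam-alef entries are listed; font-specific ligatures would need a repertoire that the thread does not provide.

```cpp
#include <cstddef>
#include <cstdint>

struct LigatureSketch { uint32_t first, second, result; };

static const LigatureSketch kLigatures[] =
{
  {0xFEDFu, 0xFE8Eu, 0xFEFBu}, // lam initial + alef final -> lam-alef isolated
  {0xFEE0u, 0xFE8Eu, 0xFEFCu}, // lam medial  + alef final -> lam-alef final
};

// Replaces matching pairs in place (logical order) and returns the new length.
inline size_t apply_ligatures (uint32_t *text, size_t len)
{
  size_t out = 0;
  for (size_t i = 0; i < len; )
  {
    bool matched = false;
    if (i + 1 < len)
      for (const LigatureSketch &l : kLigatures)
        if (text[i] == l.first && text[i + 1] == l.second)
        {
          text[out++] = l.result; // write index never passes the read index
          i += 2;
          matched = true;
          break;
        }
    if (!matched) text[out++] = text[i++];
  }
  return out;
}
```

Ligatures that have no Unicode code point would still need some private code or a glyph-level rule, which is where the concern about glyph order comes back.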
(This comment is edited by Ebrahim) Currently it needs Uniscribe support, as the client needs to use ScriptGetCMap() or so, since the fonts don’t have an Arabic cmap subtable.
However, these fonts are still rather popular among Arabic users, there are hundreds of them floating around the web, and people often complain about them not working in HarfBuzz-using applications (for example libass/libass#292).
I don’t know if we can just do PUA shaping similar to Thai, or if there is some way to identify these fonts like the Windows API does.