enhance scripting interface to manipulate the forward/backward encoding vector. #3085

Open
HinTak opened this issue Jun 9, 2017 · 26 comments

Comments

@HinTak

HinTak commented Jun 9, 2017

It is a bit daft, really: even the documentation says that if more than one code point encodes to the same glyph, querying the glyph for its encoding with GlyphInfo("Encoding") (or the Python interface) randomly returns one of them. This causes problems when re-encoding, as all except one of them are lost.

This is the 2nd problem with contributing to #3080 .

fontforge needs to provide direct scripting access to the forward and backward encoding vector.

You need that to manipulate the Adobe Source CJK / Google Noto CJK fonts properly. They have about 400 glyphs with multiple encodings. I can work around #3080 with an external script (either parsing ftdump -V with perl or using freetype-py with python) but I'd prefer not to.
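
To make the request concrete, here is a minimal sketch of the two vectors in plain Python. The forward vector maps each code point to a glyph; the backward vector maps each glyph to every code point that reaches it. The glyph names and the sample mapping are hypothetical, and nothing here is FontForge API; it only illustrates the information that is lost when a query returns a single encoding per glyph.

```python
from collections import defaultdict


def build_reverse_cmap(forward):
    """Invert a code point -> glyph mapping, keeping every code point."""
    reverse = defaultdict(list)
    for codepoint in sorted(forward):
        reverse[forward[codepoint]].append(codepoint)
    return dict(reverse)


# Hypothetical forward vector: two code points share the glyph "cid01234".
forward = {0x2F00: "cid01234", 0x4E00: "cid01234", 0x4E8C: "cid05678"}

reverse = build_reverse_cmap(forward)

# Glyphs reachable from more than one code point are the ones at risk
# of losing encodings when the font is re-encoded.
multiply_encoded = {g: cps for g, cps in reverse.items() if len(cps) > 1}
print(multiply_encoded)
```

A scripting interface that exposed something equivalent to `reverse` would let a script restore the extra encodings after re-encoding.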

@HinTak
Author

HinTak commented Jun 9, 2017

To work around #3080, I'll just use freetype-py, assuming eventually fontforge's own scripting interface will provide equivalent functionality.

@HinTak
Author

HinTak commented Jun 9, 2017

It would be nice if MultipleEncodingsToReferences() does the right thing - it currently does not.

@HinTak
Author

HinTak commented Jun 9, 2017

My freetype-py script to fix fontforge's encoding problem is up at

https://github.com/HinTak/freetype-py/blob/fontval-diag/examples/subfonts-script-generate.py

The 20 lines in freetype-py should be possible with fontforge's scripting interface.

@jtanx
Contributor

jtanx commented Jun 9, 2017

@HinTak thanks for reporting, but please refrain from cross-posting the same message across multiple issues. If they are all linked, it should be sufficient to link back to a single issue (this one?) and continue the discussion there (this makes it easier to track what needs to be done).

@HinTak
Author

HinTak commented Jun 10, 2017

Well, there seems to be a number of people and bug reports saying that fontforge cannot process Source Sans properly. Most of them are a bit unclear about what they think is wrong. In my case, I found out what's wrong, and offered my workaround. It works for me. It is really up to these other people to decide whether it fixes their problem too, or offers some insight into where their problem is.

I am okay if these other people want to continue here, but it is their decision. For my part, all I can offer is - this is what I think the problem is, this is my answer. Have a look and see if it works for you too, or can be adapted to work in your case. I don't decide that they should continue here, they do.

@ghost

ghost commented Jun 10, 2017

Well, there seems to be a number of people and bug reports

All the more reason not to file the same report ten times with cut and paste.

@HinTak
Author

HinTak commented Jun 10, 2017

? I don't decide whether others' problems are the same as mine. They decide. I have my answer. Why are you two giving me a hard time trying to help?

At 850+ bug reports and still counting, you are not going to decrease that number if every time somebody tries to offer an answer to some of them, you two come along and say don't bother.

@ghost

ghost commented Jun 10, 2017

Because I've received 35 email notifications from you in the last day and a half, and most of them are duplicates of just one complaint.

@HinTak
Author

HinTak commented Jun 10, 2017

Please feel free not to read, if they don't apply to you.

@ghost

ghost commented Jun 10, 2017

That's the classic excuse of a spammer, and from now on, I will treat communications from you as spam.

@HinTak
Author

HinTak commented Jun 10, 2017

There are those who may find my answer useful. For those who don't, I don't know why they even bother to respond. WTF.

@jtanx
Contributor

jtanx commented Jun 10, 2017

For those who don't, I don't know why they even bother to respond. WTF.

@HinTak, copy-pasting the same post across multiple issues is not necessary and, as mentioned above, creates a lot of noise for those watching these issues. If they are all the same issue, then make one issue and continue the discussion there. You can just mention all affected (or related) issues from the issue that you have made.

I am okay if these other people want to continue here, but it is their decision.

No, if it is all the same issue, then ideally all the other issues should be closed as duplicates, because otherwise it's just making it more difficult than it should be to follow what's happening.

@JoesCat
Contributor

JoesCat commented Jun 11, 2017

Hi @HinTak,
Thanks for posting the errors and helping post solutions too - that is helpful.
If you have other solutions and discover other problems, post those too, but if you see existing errors in the list that match your errors/solutions, it is preferable to post on those existing reports - that will help diagnose existing problems further.
In terms of WTF, you'll need to look at FontForge patching and fixing as a marathon, and not a sprint. Sometimes there are some bugs that appear easy at the surface level, but further diagnosing shows one thing depends on another, which depends on another, so sometimes some fixes may be quick, others will take a little time to dig-in and resolve. 850+ bugs may appear like a lot, but eventually it will whittle-down.
If you have skills and want to help in one way or another, you are welcome. Reading the thread above, I would hesitate to chop out parts of the program simply because they are not working (now), since many things had a purpose for being created in the first place. In these cases, it takes a bit of patience: trying to figure out the intent, and trying to fix or improve it if we can. Other things will sometimes have to wait for another part to be resolved, but you'll have to see this as a marathon - not a sprint.
Hope this helps, sometimes some things are misunderstood, it happens.
...but once again, thanks for your input.

@HinTak
Author

HinTak commented Jun 11, 2017

? Not really sure I want to continue in this thread. Look, it started just over a week ago when somebody else reported a problem of fontforge mis-converting Source CJK for TeXLive's use.

I looked into the issue, filed 4 bugs, and finally offered my workaround to convert Source CJK correctly. I don't have any vested interest in fontforge, and I wish I didn't need to write a python script using freetype's capability to generate a massive fontforge native script to work around fontforge's own problem, but such is the case.

My python script to generate a massive fontforge native script might be the answer to somebody else's problems with Source CJK, and it might not. But it is offered as is, and yes, I am treating this short adventure in fontforge-land as a sprint. I don't want to run a marathon. Forgive me for trying to offer my workaround for others' problems.

@HinTak
Author

HinTak commented Jun 11, 2017

See the entire May/June traffic ( http://lists.nongnu.org/archive/html/cjk-list/ ) . Every post in the archive for the two months is about this issue, btw, despite the branching and apparent change of the "subject".

@khaledhosny
Contributor

khaledhosny commented Jun 12, 2017

FontForge’s own Python scripting interface handles multiple encoded glyphs rather well (glyph.unicode for first encoding value, and glyph.altuni for any others), you may as well use it instead of the “native” scripting interface.
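
As a rough illustration of the two attributes mentioned above, the sketch below collects every encoding of a glyph as (code point, variation selector) pairs. It assumes the documented shape of `glyph.altuni`: either `None` or a tuple of `(unicode-value, variation-selector, reserved)` tuples, with a selector of -1 meaning "no variation selector". The helper name `encodings_of` is made up for this example, and a `SimpleNamespace` stands in for a real glyph so the sketch runs without FontForge installed.

```python
from types import SimpleNamespace


def encodings_of(glyph):
    """List (code point, variation selector) pairs for one glyph.

    The primary encoding (glyph.unicode) comes first with selector -1.
    A glyph whose unicode is negative has no primary encoding.
    """
    pairs = []
    if glyph.unicode >= 0:
        pairs.append((glyph.unicode, -1))
    # altuni is None when the glyph has no alternate encodings.
    for codepoint, selector, _reserved in (glyph.altuni or ()):
        pairs.append((codepoint, selector))
    return pairs


# Stand-in for a FontForge glyph; the values are hypothetical.
sample = SimpleNamespace(unicode=0x4E00, altuni=((0x2F00, -1, 0),))
print(encodings_of(sample))
```

With a real font, the same helper could be applied to each glyph yielded by `font.glyphs()`.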

@HinTak
Author

HinTak commented Jun 12, 2017

Thanks. I have seen the altuni thing on the C code side, when I was looking at the source code of MultipleEncodingsToReferences(). As I mentioned, MultipleEncodingsToReferences() does not work correctly. It may or may not do what needs to be done. I'll let somebody else investigate :-)

The font conversion for TeXLive only needs to be done by the TeXLive package maintainer(s), as a one-off, not by the end users - and I have already finished the conversion and uploaded the corrected Type 1 fonts, so I am not spending any more time on this.

@HinTak
Author

HinTak commented Jul 3, 2017

I have re-written the earlier "freetype-py script to generate a fontforge native script" attempt (subfonts-script-generate.py) into a combo python script which uses both freetype's and fontforge's python APIs and does both steps in one:
https://github.com/HinTak/freetype-py/blob/fontval-diag/examples/subfonts.py

I doubt that it can be done without correcting fontforge at the C level, but @khaledhosny, do you want to have a go at replacing the two blocks of freetype-py-dependent python code with fontforge's? They are marked as such for your convenience :-).

@khaledhosny
Contributor

khaledhosny commented Jul 4, 2017

Untested:

--- subfonts.py	2017-07-04 02:47:27.523883836 +0200
+++ subfonts.py	2017-07-04 03:03:21.950776378 +0200
@@ -10,7 +10,6 @@
 #  Must move /usr/share/fontforge/Adobe-Identity-0.cidmap!
 # -----------------------------------------------------------------------------
 import fontforge
-from freetype import Face
 import time
 import locale
 
@@ -22,20 +21,6 @@
     print( "usage: [fontforge -script] ", sys.argv[0], " master-font name-stem SFD-file" )
     exit(1)
 
-# Use freetype-py to remember the cmap:
-face = Face(sys.argv[1])
-face.set_charmap( face.charmap )
-reverse_lookup = {}
-charcode, gindex = face.get_first_char()
-while ( gindex ):
-    if ( gindex in reverse_lookup.keys() ):
-        reverse_lookup[gindex].append( charcode )
-    else:
-        reverse_lookup[gindex] = [charcode]
-    charcode, gindex = face.get_next_char( charcode, gindex )
-del face
-# first block of freetype-py code ends.
-
 print("Loading ", sys.argv[1], "...")
 font = fontforge.open(sys.argv[1])
 
@@ -44,24 +29,25 @@
 
 font.reencode("ucs4")
 
-# 2nd block of freetype-py code:
-for gindex in reverse_lookup.keys():
-    if ( len(reverse_lookup[gindex]) > 1 ):
-        for x in range( len(reverse_lookup[gindex]) - 1 ):
-            font.selection.select( reverse_lookup[gindex][-1] )
-            if ( not (font[reverse_lookup[gindex][-1]]).isWorthOutputting() ):
+for glyph in font.glyphs():
+    if glyph.unicode < 0:
+        continue
+    if glyph.altuni:
+        unicodes = [u[0] for u in glyph.altuni if u[1] == -1]
+        for uni in unicodes:
+            if not glyph.isWorthOutputting():
                 print( 'Source Empty!' )
+            font.selection.select(glyph.glyphname)
             font.copy()
-            font.selection.select( reverse_lookup[gindex][x] )
+            font.selection.select(uni)
             try:
-                font[reverse_lookup[gindex][x]]
+                font[uni]
             except TypeError:
                 # expect this!
                 pass
             else:
                 print( 'Destination Full!' )
             font.paste()
-# 2nd block of freetype-py code ends.
 
 locale.setlocale(locale.LC_ALL, "C")
 copyright = font.copyright + "\n\nSubfont version " + time.strftime("%F", time.gmtime()) + "."

@HinTak
Author

HinTak commented Jul 4, 2017 via email

@HinTak
Author

HinTak commented Jul 4, 2017

Just FYI @khaledhosny , your suggestion does not work. I also added this line:

--- a/doc/pdfhowto/subfonts.py
+++ b/doc/pdfhowto/subfonts.py
@@ -34,6 +34,7 @@ for glyph in font.glyphs():
         continue
     if glyph.altuni:
         unicodes = [u[0] for u in glyph.altuni if u[1] == -1]
+        print(glyph.unicode, unicodes)
         for uni in unicodes:
             if not glyph.isWorthOutputting():
                 print( 'Source Empty!' )

so that I can watch what it is doing - it does not print anything. If it were working correctly, it should print about 400 lines against the Noto CJK fonts (to be exact, about 360 of the single altuni type, and another 15 or so of the double altuni type). I think it might have worked if it were also written in two blocks like mine, one before the font.reencode("ucs4") line, one after. ATM I'd prefer not to spend more time on what is more or less a curiosity, since what it is meant to do (converting into another format) only needs to be done rarely as a one-off. The conversion takes about 25 minutes on my hardware, so it is just 1/2 hour of CPU time thrown away, not a big deal; I wasn't expecting it to work, so I am not too disappointed.

@khaledhosny
Contributor

The altuni entries in Noto Sans CJK contain variation selectors (the second element in the tuple is not -1), so they are to be activated only for char+VS sequences. If you want to use them then drop the if u[1] == -1 part. Now I’m really confused about what you are trying to do here.
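
The effect of that filter can be seen in a small sketch that splits altuni-style tuples into plain alternate encodings and char+VS sequences. The function name `split_altuni` and the sample values are hypothetical; the only assumption carried over from FontForge is the `(unicode-value, variation-selector, reserved)` tuple layout with -1 meaning no selector.

```python
def split_altuni(altuni):
    """Separate plain alternate encodings from char+VS sequences."""
    plain, sequences = [], []
    # altuni may be None for glyphs without alternate encodings.
    for codepoint, selector, _reserved in (altuni or ()):
        if selector == -1:
            plain.append(codepoint)
        else:
            sequences.append((codepoint, selector))
    return plain, sequences


# Hypothetical entries: one variation sequence and one plain alternate.
sample_altuni = ((0x8FBB, 0xFE00, 0), (0x2F00, -1, 0))
plain, sequences = split_altuni(sample_altuni)
print(plain, sequences)
```

If every entry of a font carries a selector, `plain` stays empty for every glyph, which would match a loop filtered on `u[1] == -1` printing nothing.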

@HinTak
Author

HinTak commented Jul 5, 2017 via email

@ahyangyi
Contributor

To be honest, this bug has been lurking in FontForge for years and has received numerous reports. HinTak seems to me to be the first bug reporter who actually tried to dig into the root cause of this problem as well as presenting a workaround.

Judging by the reaction, this community does not look like it welcomes constructive suggestions.

Of course I still have hope that someone with actual C-level understanding of fontforge could come across this issue and put in a fix. Please, people. Stop judging others by their behaviors when the glaring bug is there unfixed for years.

@frank-trampe
Contributor

@HinTak, could you perhaps propose a specific interface as a starting point? I think that the implementation would not be terribly difficult once we know precisely what we want.

@skef
Contributor

skef commented Feb 2, 2020

850+ bugs may appear like a lot, but eventually it will whittle-down.

sigh
