Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8285308: Win: Japanese logical fonts are drawn with wrong size #8329

Closed
wants to merge 3 commits into from

Conversation

toshiona
Copy link
Contributor

@toshiona toshiona commented Apr 21, 2022

Japanese logical fonts are drawn with wrong size since Java 18.
It's triggered by JEP 400, UTF-8 by Default. sun.awt.FontConfiguration (and sun.awt.windows.WFontConfiguration) seems to expect the native encoding instead of the default encoding. This patch changes to use native encoding.

Tested: jdk_desktop on Windows, Linux, and macOS


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed (1 reviews required, with at least 1 reviewer)

Issue

  • JDK-8285308: Win: Japanese logical fonts are drawn with wrong size

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/8329/head:pull/8329
$ git checkout pull/8329

Update a local copy of the PR:
$ git checkout pull/8329
$ git pull https://git.openjdk.java.net/jdk pull/8329/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 8329

View PR using the GUI difftool:
$ git pr show -t 8329

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/8329.diff

@bridgekeeper
Copy link

bridgekeeper bot commented Apr 21, 2022

👋 Welcome back tnakamura! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Apr 21, 2022

@toshiona The following label will be automatically applied to this pull request:

  • client

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the client client-libs-dev@openjdk.org label Apr 21, 2022
@toshiona toshiona marked this pull request as ready for review April 22, 2022 00:28
@openjdk openjdk bot added the rfr Pull request is ready for review label Apr 22, 2022
@mlbridge
Copy link

mlbridge bot commented Apr 22, 2022

Webrevs

@prrace
Copy link
Contributor

prrace commented Apr 22, 2022

Hmm. I am surprised that the glyphs are actually bigger. Did you come to understand the exact mechanism (steps)
by which this triggers the wrong size ? Without that explanation it isn't apparent to me we've understood the problem
even if this fixes it.

And whilst you tested linux & mac it is a shared code change so it is another reason to understand ..

And are you sure you can't write a test ? Even something that uses heuristics around the metrics ?

@naotoj
Copy link
Member

naotoj commented Apr 22, 2022

With JEP 400, the default charset may differ from the platform encoding, which is the case like this on Windows (mac & Linux won't be affected as they already use UTF-8 natively) and the mitigation here looks fine to me. But the real issue here to me is why the font size gets bigger depending on the encoding (windows-31j vs. UTF-8)?

@toshiona
Copy link
Contributor Author

Thank you for the comments, @prrace @naotoj

The bigger fonts are caused by the behavior of RichEdit component. I was able to recreate the similar behavior with a native application. I attached the code and the screen shot to JBS. RichEdit seems to use fallback font if English font (Arial, for example) was set for Japanese text. It could show Japanese, but its size was not perfect. So, we need to set a proper font.

NG case (Java18): "Arial 9pt" with ANSI_CHARSET was used.
OK case (Java17): "MS Gothic 9pt" with SHIFTJIS_CHARSET was used.

And whilst you tested linux & mac it is a shared code change so it is another reason to understand ..

OK, I moved the fix to Windows part. I need to change setEncoding method to protected, because "encoding" value was used in the constructor.

And are you sure you can't write a test ? Even something that uses heuristics around the metrics ?

Sorry, I tried some ways, but I couldn't. It tightly binds with OS settings.

A pre-submit test was failed once, but passed after rerun.

@prrace
Copy link
Contributor

prrace commented Apr 26, 2022

Hmm. I'm not sure I'm seeing what you see in the native app.
There the Latin font is larger than the JDK case, and the Japanese font is smaller than the JDK case,
and they end up about the same size as each other.
Also you have to remember if you select "8 pt" in a windows native app, that (at 100% scale) is at 96dpi, not 72dpi.
So that's 96/72 times the size of the Java 8 pt font. 8 * 96 / 72 will round up to 11 pixels in native, not the 8 pixels in Java.

Also JDK - in theory - controls the exact fonts that are used even in a TextField.

I'd like to see this resolved but I'm still not feeling like I have a sufficient understanding.

@toshiona
Copy link
Contributor Author

@prrace
Sorry for your confusion. Please allow me to explain with 30-point case. That's too small Japanese characters than alphabets.
(With 8-point case, the native application clipped upper side, but I couldn't find the reason.)

Java code:
t.setFont(new Font(Font.SERIF, Font.PLAIN, 30));
Native code:
CreateFont(-30, 0, 0, 0, 0, 0, 0, 0, ANSI_CHARSET, 0, 0, 0, 0, _TEXT("Times New Roman"));

These codes displayed the exact same glyphs. Screen shot was attached in JBS.

Also JDK - in theory - controls the exact fonts that are used even in a TextField.

Well, I think TextField sets only one font (HFONT) via AwtTextComponent::SetFont(AwtFont* font).
spyxx command recorded only that WM_SETFONT event.

So, we need to pick up a proper font from fontconfig file.
Now, unexpected encoding was set and unexpected font was picked up.

Expected native call:
CreateFont(-30, 0, 0, 0, 0, 0, 0, 0, SHIFTJIS_CHARSET, 0, 0, 0, 0, _TEXT("MS Mincho"));

@naotoj
Copy link
Member

naotoj commented Apr 28, 2022

The following diff seems to choose the right font:

--- a/src/java.desktop/windows/data/fontconfig/fontconfig.properties
+++ b/src/java.desktop/windows/data/fontconfig/fontconfig.properties
@@ -230,7 +230,7 @@ sequence.dialog.x-MS950-HKSCS-XP=alphabetic,chinese-ms950,ch
inese-hkscs,dingbats
 sequence.dialoginput.x-MS950-HKSCS-XP=alphabetic,chinese-ms950,chinese-hkscs,di
ngbats,symbol,chinese-ms950-extb

 sequence.allfonts.UTF-8.hi=alphabetic/1252,devanagari,dingbats,symbol
-sequence.allfonts.UTF-8.ja=alphabetic,japanese,dingbats,symbol
+sequence.allfonts.UTF-8.ja=japanese,alphabetic,dingbats,symbol

 sequence.allfonts.windows-1255=hebrew,alphabetic/1252,dingbats,symbol

This diff intends to choose the japanese set before the alphabetic set, in case for ja locale in UTF-8 file encoding. I am not saying this is the right fix, but could be a starting point.

@prrace
Copy link
Contributor

prrace commented Apr 29, 2022

The following diff seems to choose the right font:

--- a/src/java.desktop/windows/data/fontconfig/fontconfig.properties
+++ b/src/java.desktop/windows/data/fontconfig/fontconfig.properties
@@ -230,7 +230,7 @@ sequence.dialog.x-MS950-HKSCS-XP=alphabetic,chinese-ms950,ch
inese-hkscs,dingbats
 sequence.dialoginput.x-MS950-HKSCS-XP=alphabetic,chinese-ms950,chinese-hkscs,di
ngbats,symbol,chinese-ms950-extb

 sequence.allfonts.UTF-8.hi=alphabetic/1252,devanagari,dingbats,symbol
-sequence.allfonts.UTF-8.ja=alphabetic,japanese,dingbats,symbol
+sequence.allfonts.UTF-8.ja=japanese,alphabetic,dingbats,symbol

 sequence.allfonts.windows-1255=hebrew,alphabetic/1252,dingbats,symbol

This diff intends to choose the japanese set before the alphabetic set, in case for ja locale in UTF-8 file encoding. I am not saying this is the right fix, but could be a starting point.

I've been looking at that exact line since I think it is what we'd use in the new default locale
and also thinking about the "allfonts" issue we've seen with Korean.
And which font is "japanese" in this case ?
Maybe we want to fill out the UTF8. based locales in this file now that is the default

I suspect it is a lot safer to do what Toshio is doing but I just want to understand the mechanism.
I didn't have all the JA fonts installed on my system and I've been trying but they don't seem to have
appeared yet, so specifically I don't have MS Mincho, so there's not much I can test yet.
It migh

@toshiona
Copy link
Contributor Author

toshiona commented May 2, 2022

Hi @prrace
The key variable is WFontConfiguration:textInputCharset.

In Japanese until Java 17, textInputCharset was "SHIFTJIS_CHARSET", which was set in WFontConfiguration:initTables().
WFontConfiguration:getTextComponentFontName(), which used the value, called findFontWithCharset(,"SHIFTJIS_CHARSET"), and it returned "MS Gothic,SHIFTJIS_CHARSET" or "MS Mincho,SHIFTJIS_CHARSET" by specified logical font name.

sequence.serif.windows-31j=alphabetic,japanese,dingbats,symbol

In Japanese after Java 18, textInputCharset is "DEFAULT_CHARSET".
findFontWithCharset(,"DEFAULT_CHARSET") retuned null, and the first font in the target line of fontconfig was used. It's alphabetic (mapped "Times New Roman,ANSI_CHARSET" or "Arial,ANSI_CHARSET").

sequence.allfonts.UTF-8.ja=alphabetic,japanese,dingbats,symbol

When the first one was changed to "japanese", it was used (mapped "MS Gothic,SHIFTJIS_CHARSET" or "MS Mincho,SHIFTJIS_CHARSET").

sequence.allfonts.UTF-8.ja=japanese,alphabetic,dingbats,symbol

So, I tried to modify textInputCharset value only. It almost worked. However, that couldn't pick up the proper line from fontconfig file for Chinese, because it couldn't tell GBK and GB18030 without the native encoding.
That is, the following two lines have the same key.

sequence.allfonts.UTF-8.zh.CN=alphabetic,chinese-ms936,dingbats,symbol,chinese-ms936-extb
sequence.allfonts.UTF-8.zh.CN=alphabetic,chinese-gb18030,dingbats,symbol,chinese-gb18030-extb

@prrace
Copy link
Contributor

prrace commented May 2, 2022

@naotoj
So why can't I reproduce this ? I installed the Japanese (well all international) fonts and ran the test in the bug
report using
/jdk/bin/java -Duser.language=ja -Duser.country=JP fonttest
locale=ja_JP
cs=UTF-8

the print statements are mine that I added
System.out.println("locale="+Locale.getDefault());
System.out.println("cs="+Charset.defaultCharset());

I supposed this is all I need to do to make the JDK use the right settings ?

@naotoj
Copy link
Member

naotoj commented May 2, 2022

I believe this is only reproducible on Japanese Windows, where the system locale is set to Japanese. You can emulate this by
ControlPanel -> Region -> "Administrative" tab -> "Change system locale..." button -> "Current system locale" to Japanese(Japan) on an English Windows.

@prrace
Copy link
Contributor

prrace commented May 3, 2022

Ah, because even in the JDK context the behaviour of the native Text Control in this respect is determined in part by the system locale, not just the JDK's idea of locale ..

@prrace
Copy link
Contributor

prrace commented May 3, 2022

It looks to me as if we specify a latin font as the text component font, some windows fall back behaviour insists
on a minimum size for the Japanese fallback font.
And the way to avoid that is to specify a locale (Japanese) font instead which is what used to happen.


Naoto suggested :
-sequence.allfonts.UTF-8.ja=alphabetic,japanese,dingbats,symbol
+sequence.allfonts.UTF-8.ja=japanese,alphabetic,dingbats,symbol

This did't work for me because it isn't picking up that line anyway

So what I see is that MS Mincho isn't even in the list of names it is considering !
Because we are finding this :-
sequence.allfonts=alphabetic/default,dingbats,symbol

I see Toshio says he saw the UTF-8 entry being used, but I don't see that.
So I need to understand why not the UTF-8 entry - note that I have set my system locale to JA now.
The consequence of this is that the fallback sequence is what provides Japanese and
so it is from the Chinese MingLiu-ExtB font which I do have installed.

Toshio is right that what matters here for the native text component is what is picked up in
the logic around WFontConfiguration.getTextComponentFontName()

The helper method for getTextComponentFontName() is findFontWithCharset()

That has a bit of a questionable behaviour in that it returns the last font in the
list that matches the charset it wants.
So hypothetically if we had the charset as DEFAULT_CHARSET
MS Mincho,DEFAULT_CHARSET
Times New Roman,DEFAULT_CHARSET
and if we had
Times New Roman,DEFAULT_CHARSET
MS Mincho,SHIFTJIS_CHARSET

then in both cases we'd get Times and still have the problem
The latter case seems to actually happen - and so even though the font is there, we ignore it.
Clearly what we want is the "locale" font, and we are using encoding to identify any one
that matches but this breaks down in UTF8.
Toshio pointed out that code in WFontConfiguration initTables() basically says
if we found a font tagged as "japanese" then its subsetCharMap entry is SHIFTJIS_CHARSET
and this used to work because this also mapped windows-31j to SHIFTJIS_CHARSET.
But what do you map UTF-8 to ? The current code maps it to DEFAULT_CHARSET.
There needs to be a different way of doing this for UTF-8 locales.

So this fix is a "band aid" on the problem that in the UTF 8 locale we don't seem to be picking
up the entry we should.
If Toshio confirms for SURE he's seeing the UTF-8 one picked up it would be a useful data point.
I still need to debug why I am not getting it.

UPDATE: pilot error on my part - I set lang to jp .. not ja ..

So back to just the encoding case ..

Regarding what Toshio pointed out that we can't have both
sequence.allfonts.UTF-8.zh.CN=alphabetic,chinese-ms936,dingbats,symbol,chinese-ms936-extb
sequence.allfonts.UTF-8.zh.CN=alphabetic,chinese-gb18030,dingbats,symbol,chinese-gb18030-extb
I think that's just a fact. Once you choose UTF-8 you have to decide which of these you want.

@toshiona
Copy link
Contributor Author

toshiona commented May 4, 2022

Hi @prrace

Yes, my system also picked up "UTF-8.ja" line. "ja" can be specified by locale data, such as "-Duser.language=ja".

However, I was not able to recreate the wrong size issue on the system which changed the primary language from English to Japanese. There may be differences between pure Japanese Windows and English Windows changed the primary language to Japanese.

Unicode (UTF-8) is language independent. So, we need to use a locale data.
I created a trial patch to use locale data. If you prefer this way, I'll also adjust fontconfig file and test some environments I can prepare.

sequence.allfonts=alphabetic/default,dingbats,symbol

"alphabetic/default" assigned to "DEFAULT_CHARSET", but it's only used on this line.

sequence.allfonts.UTF-8.ja=alphabetic,japanese,dingbats,symbol

"alphabetic" assigned to "ANSI_CHARSET". So, if we had "DEFAULT_CHARSET", nothing was matched. Then, the first one was used. (WFontConfiguration.java, l.165)

@prrace
Copy link
Contributor

prrace commented May 4, 2022

Hi @prrace

Yes, my system also picked up "UTF-8.ja" line. "ja" can be specified by locale data, such as "-Duser.language=ja".

Yes, so did I after I fixed my typo.

However, I was not able to recreate the wrong size issue on the system which changed the primary language from English to Japanese. There may be differences between pure Japanese Windows and English Windows changed the primary language to Japanese.

I definitely can reproduce this on my "English" windows by changing the system locale.

Unicode (UTF-8) is language independent. So, we need to use a locale data. I created a trial patch to use locale data. If you prefer this way,

If you mean those changes you have in this latest diff then I was thinking of something like that .. although not exactly.

I'll also adjust fontconfig file and test some environments I can prepare.

I'm not sure what fontconfig changes you are thinking of.

sequence.allfonts=alphabetic/default,dingbats,symbol

"alphabetic/default" assigned to "DEFAULT_CHARSET", but it's only used on this line.

sequence.allfonts.UTF-8.ja=alphabetic,japanese,dingbats,symbol

"alphabetic" assigned to "ANSI_CHARSET". So, if we had "DEFAULT_CHARSET", nothing was matched. Then, the first one was used. (WFontConfiguration.java, l.165)

Yes, I think we have issues there I'd like to look closely at.

Really, I am at the point where I'd like to say "thank you for drawing this to our attention, but I'd prefer to do this fix myself"

I can forsee a bunch of follow up work that we might want to do over time.
This change to UTF-8 seems to be very impactful on this code.

@toshiona
Copy link
Contributor Author

toshiona commented May 5, 2022

OK, I'll leave it to you.

@toshiona toshiona closed this May 5, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
client client-libs-dev@openjdk.org rfr Pull request is ready for review
Development

Successfully merging this pull request may close these issues.

3 participants