Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

File I/O should use UTF-8, rather than system default encoding #360

Open
SkywaveTM opened this issue Sep 7, 2018 · 12 comments
Open

File I/O should use UTF-8, rather than system default encoding #360

SkywaveTM opened this issue Sep 7, 2018 · 12 comments

Comments

@SkywaveTM
Copy link
Contributor

@SkywaveTM SkywaveTM commented Sep 7, 2018

It is a real problem across the project. It leads to unexpected data loss or malfunctions.

Here is an example. For now, if options in a skin are named in Japanese, its values are not properly saved into/loaded from player config files in systems where Japanese is not their primary language. It causes skin setting are not properly saved across sessions and default values are selected whenever the program is booted up. In worst cases, the program die unexpectedly. Those examples are the things I suffered with beatoraja 0.6.2 and LITONE5 0.9 (180820) in Korean Windows 10.

I tried to fix it by myself, but there are some problems.

  1. File IO is not managed by specific class, rather every single classes are opening (and not closing) files. It means that we need to find all statements that opening any files.
  2. If we change the encoding, previous files will not be properly read.

Due to those problems, I just tested a quite-fix without a commit -- changing user config files to use UTF-8 and creating a new user profile -- and the problems I suffered were gone.

Here's the code snippet I tried in bms.player.beatoraja.PlayerConfig.java.

before

	public static PlayerConfig readPlayerConfig(String playerpath, String playerid) {
		PlayerConfig player = new PlayerConfig();
		Path p = Paths.get(playerpath + "/" + playerid + "/config.json");
		Json json = new Json();
		try {
			json.setIgnoreUnknownFields(true);
			player = json.fromJson(PlayerConfig.class, new FileReader(p.toFile()));
			player.setId(playerid);
			player.validate();
		} catch(Throwable e) {
			e.printStackTrace();
		}
		return player;
	}

	public static void write(String playerpath, PlayerConfig player) {
		Json json = new Json();
		json.setOutputType(JsonWriter.OutputType.json);
		Path p = Paths.get(playerpath + "/" + player.getId() + "/config.json");
		try (FileWriter fw = new FileWriter(p.toFile())) {
			fw.write(json.prettyPrint(player));
			fw.flush();
		} catch (IOException e) {
			e.printStackTrace();
		}
	}

after

	public static PlayerConfig readPlayerConfig(String playerpath, String playerid) {
		PlayerConfig player = new PlayerConfig();
		Path p = Paths.get(playerpath + "/" + playerid + "/config.json");
		Json json = new Json();
		try (Reader fr = new InputStreamReader(
				new FileInputStream(p.toFile()), StandardCharsets.UTF_8
		)) {
			json.setIgnoreUnknownFields(true);
			player = json.fromJson(PlayerConfig.class, fr);
			player.setId(playerid);
			player.validate();
		} catch(IOException e) {
			e.printStackTrace();
		}
		return player;
	}

	public static void write(String playerpath, PlayerConfig player) {
		Json json = new Json();
		json.setOutputType(JsonWriter.OutputType.json);
		Path p = Paths.get(playerpath + "/" + player.getId() + "/config.json");
		try (Writer fw = new OutputStreamWriter(
				new FileOutputStream(p.toFile()), StandardCharsets.UTF_8
		)) {
			fw.write(json.prettyPrint(player));
			fw.flush();
		} catch (IOException e) {
			e.printStackTrace();
		}
	}
@SkywaveTM

This comment has been minimized.

Copy link
Contributor Author

@SkywaveTM SkywaveTM commented Sep 7, 2018

Oh... I notice that same issue was opened about 1 year ago. #81

However, I'll not close this issue for now to alert other developers. THIS IS A REALLY HUGE PROBLEM.

@wcko87

This comment has been minimized.

Copy link
Contributor

@wcko87 wcko87 commented Sep 7, 2018

In fact this issue was opened yet another time at #318.
#318 also mentions a similar problem with the encoding of folder names for BMS files. (it crashes beatoraja when trying to play the song twice)

And yes this is a huge problem.

@staraxis1684

This comment has been minimized.

Copy link

@staraxis1684 staraxis1684 commented Oct 8, 2018

I think this problem also concerns with the problem at #143 .(This problem haven't resolved until now)
While the application runnning, it seems that this gets the controller device name by MS932, on the other hand, it looks like that displays by other characterset (probably it's UTF-8) at key config and also writes the config file by SHIFT-JIS.

This inconsistency seems to be due to the fact that the character code is not clarified in various parts iof the application.

(日本語で補足しておくと、IIDX Premiumコントローラーは「2 軸 16 ボタン ジョイスティック」とDirectInput上では認識しているようですが、これがbeatorajaのキーコンフィグではこのように文字化けし、設定ファイルではさらに別の文字コードとして「2 ?? 16 ?{?^?? ?-2」と文字化けしているようです。ゲーム上の文字化けは仕方ないかもしれませんが、設定ファイルとJSONファイルの文字コード等が合っていないことが問題な気がします。
この問題が解決すると個人的には非常に助かります。よろしくお願いいたします。)

@parataxia

This comment has been minimized.

Copy link
Contributor

@parataxia parataxia commented Oct 19, 2018

There is an explicit override for Java to use UTF-8 exclusively for all I/O operations, should be useful if used persistently:
-Dfile.encoding="UTF-8"

Add it to "_JAVA_OPTIONS" environment variable in your batch/command file.

edit Of course we're still faced with the issue of determining specific byte codes for proper charset implementation (i.e, UTF-8/SHIFT-JIS) which gives us arbitrary (it needed to had been solved earlier) solutions. Although the environment variable solution is good, I am scared to commit permanent change to given start-up script files because it would break compatibility possibly between users. Maybe a commented out appendage for the "_JAVA_OPTIONS" instead?

Furthermore there are "guessing" algorithms that use odd statistical analysis for trying to detect character encoding properly, I would think with config & BMS files/folder names it would make a poor solution because the data is too noisy (or not enough?) with indeterminable characters and there are more opaque characters available than unique chars. Also: slow AF I bet and too many files (so sometimes likely to fail as well.) Curious if this helps people because I noticed I posted earlier on accident, oops. Hopefully it helps the config problem at least, the rest need work!

@wcko87

This comment has been minimized.

Copy link
Contributor

@wcko87 wcko87 commented Jan 23, 2019

The UTF-8 encoding flag that parataxia suggested works very well.
I am now running beatoraja in US locale with absolutely no issues.

I added it to my .bat file and have had no encoding problems ever since. (I didn't lose any of my scores, replays, songdb, config or anything else either)

@979CHAN

This comment has been minimized.

Copy link

@979CHAN 979CHAN commented Feb 22, 2019

Hello, I don't know programming. So I ask.
I've added -Dfile.encoding="UTF-8" to .bat, but when I re-run it, the skin settings and IR settings are all reset.
I added it after _JAVA_OPTIONS=' and am I wrong?
I do not have a place to ask, so I post a question on this side.
(Google Translate)

@wcko87

This comment has been minimized.

Copy link
Contributor

@wcko87 wcko87 commented Feb 24, 2019

Hello,
The _JAVA_OPTIONS line should look like this:

set _JAVA_OPTIONS='-Dsun.java2d.opengl=true -Dawt.useSystemAAFontSettings=on -Dswing.aatext=true -Dswing.defaultlaf=com.sun.java.swing.plaf.gtk.GTKLookAndFeel' -Dfile.encoding="UTF-8"

(yes, the -Dfile.encoding="UTF-8" is placed outside of the quotes ' '. It's odd, but it doesn't seem to work inside the quotes)

Skin settings might reset after switching to UTF-8, because the skin settings may have been saved in the default encoding. I think it shouldn't be too much trouble to set the skin settings again, since you only need to do this once.
I'm not sure about IR settings.

Also after switching to UTF-8, if you experience any crashes when loading the skins in game, try switching to the default skin, and back to your current skin. This refreshes the settings and saves them again in the correct encoding.

There should be no more crashes / issues after that.

Regarding places to ask questions, there is the JP BMS Discord, and the International BMS Discord.
But I will answer this here because I think this answer may be useful to other people.

@979CHAN

This comment has been minimized.

Copy link

@979CHAN 979CHAN commented Feb 25, 2019

Thanks for the answer, but I have another problem.
In the cmd window that appears when you run the bat file, what you originally saw in Japanese is broken and looks like Korean. Also, the loading is significantly slower than before.
If you select a folder that contains songs, the songs will not appear and only the folders except the songs will be displayed. If you select that other folder, you can not select it.
And the most important skin problem can not be solved. What should I do?
I do not know where the Discord is, so I ask again here.
(Google Translate)

@979CHAN

This comment has been minimized.

Copy link

@979CHAN 979CHAN commented May 5, 2019

References Links Thank you! But now I think that when I run it on my laptop, the setting has not been reset even though it is the same Korean locale. What happened?
(Google Translate)

@979CHAN

This comment has been minimized.

Copy link

@979CHAN 979CHAN commented May 5, 2019

Also, if you run it as config.bat, it will recognize the controller, but will not recognize the button in the key setting.
If you change the locale to Japanese, the initialization symptom remains.
(Google Translate)

@wcko87

This comment has been minimized.

Copy link
Contributor

@wcko87 wcko87 commented May 6, 2019

Your PC locale should not matter if you have correctly configured the UTF-8 mode.
I suggest checking that you have set the UTF-8 mode flag correctly.

@979CHAN

This comment has been minimized.

Copy link

@979CHAN 979CHAN commented Jun 4, 2019

image
This is done by applying UTF-8.
When I did not apply it, the Japanese that did not break was broken. Is it normal?
No configuration information is saved even when this is applied.
(Google Translate)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
5 participants
You can’t perform that action at this time.