Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DONT MERGE - WIN32: Use Unicode for paths #2914

Open
wants to merge 1 commit into
base: master
from

Conversation

@ZvikaZ
Copy link
Contributor

@ZvikaZ ZvikaZ commented Apr 5, 2021

Before this commit, Windows returned path names with local computer encoding, and then ScummVM converted them to U32String, without knowing what is the proper encoding.

This commit ensures that Windows returns Unicode paths.

When scummvm.ini is opened in GVim, the Hebrew path isn't rendered correctly. However, :e! ++enc=utf8 makes it show correctly. I'm not certain about this - is it normal? Or should we so something for scummvm.ini so editors will automatically know that it's UTF8? (maybe add a BOM? but as far as I know, it's frowned upon in the Linux world)

EDIT

Codacy fails, with:

backends/fs/windows/windows-fs.cpp

Does not handle strings that are not \0-terminated; if given one it may perform an over-read (it could cause a crash if unprotected) (CWE-126).
WideCharToMultiByte(CP_ACP, 0, str, _tcslen(str) + 1, asciiString, sizeof(asciiString), NULL, NULL);

Does not handle strings that are not \0-terminated; if given one it may perform an over-read (it could cause a crash if unprotected) (CWE-126).
MultiByteToWideChar(CP_ACP, 0, str, strlen(str) + 1, unicodeString, sizeof(unicodeString) / sizeof(TCHAR));

However, I have only changed the codepage. The rest of these lines is similar to before. Do we want to change something in these lines? Or can we safely ignore these warnings?

(BTW, it's weird that Codacy complains about the original lines, before my changes - I have modified CP_ACP to CP_UTF8)

EDIT2

grepping found that there's another instance of CP_ACP usage, at engines/ags/shared/util/path.cpp
I assume it should also be changed to CP_UTF8. Either as part of this PR, or maybe after its acceptance. However, I'm completely unfamiliar with AGS engine. @PaulGilbert , what do you say?

EDIT3

We should be aware that this PR doesn't provide safe migration for users that already have non ascii characters in their game paths, as they are encoded in the local encoding, which ScummVM isn't aware of. However, this PR still improves the situation, because anyway, currently these games simply won't work as ScummVM doesn't understand the old path encoding. So, either with this PR, or without it, we're not backward compatible. This PR at least allows the users to re-add those games.

@ZvikaZ ZvikaZ force-pushed the ZvikaZ:z_hebrew_path branch from f579ef5 to 8d58fba Apr 5, 2021
@deepcode-ci-bot
Copy link

@deepcode-ci-bot deepcode-ci-bot bot commented Apr 5, 2021

Congratulations 🎉. DeepCode analyzed your code in 16.834 seconds and we found no issues. Enjoy a moment of no bugs ☀️.

👉 View analysis in DeepCode’s Dashboard | Configure the bot

@sluicebox
Copy link
Member

@sluicebox sluicebox commented Apr 5, 2021

I can't speak to the existing encoding situation or its problems, but unconditionally defining UNICODE is a problem.

windows-fs.cpp is full of #ifndef UNICODE code and this change unconditionally defines UNICODE a few lines earlier so now all that code is dead. That doesn't sound right.

UNICODE (and _UNICODE) should be consistently defined/undefined across an entire program and not per cpp. Otherwise TCHAR usage in headers (windows-fs.h, SDK headers, etc) could be char or wchar_t depending on which cpp gets compiled first. If you're unconditionally defining UNICODE then you don't need TCHARs at all because that gives the illusion of char/wchar_t compatibility when it doesn't exist, but you don't want to be unconditionally defining UNICODE in source at all because...

Most importantly, this will break the beloved Win98 build. (Okay maybe just @SupSuper loves it.) The Win32 API calls that take strings will now always link to the wide-char versions.

I'm not familiar with any of this code, just saw that the change started with #define UNICODE and had Win32 flashbacks. I don't know if this is relevant but since we have code to handle when UNICODE is defined, I'd want to know under what circumstances that occurs.

Before this commit, Windows returned path names with local computer
encoding, and then ScummVM converted them to U32String, without knowing
what is the proper encoding.
This commit ensures that Windows returns Unicode paths.
@ZvikaZ ZvikaZ force-pushed the ZvikaZ:z_hebrew_path branch from 8d58fba to 14d7523 Apr 6, 2021
@ZvikaZ ZvikaZ changed the title WIN32: Use Unicode for paths DONT MERGE - WIN32: Use Unicode for paths Apr 6, 2021
@ZvikaZ
Copy link
Contributor Author

@ZvikaZ ZvikaZ commented Apr 6, 2021

I can't speak to the existing encoding situation or its problems, but unconditionally defining UNICODE is a problem.

windows-fs.cpp is full of #ifndef UNICODE code and this change unconditionally defines UNICODE a few lines earlier so now all that code is dead. That doesn't sound right.

UNICODE (and _UNICODE) should be consistently defined/undefined across an entire program and not per cpp. Otherwise TCHAR usage in headers (windows-fs.h, SDK headers, etc) could be char or wchar_t depending on which cpp gets compiled first. If you're unconditionally defining UNICODE then you don't need TCHARs at all because that gives the illusion of char/wchar_t compatibility when it doesn't exist, but you don't want to be unconditionally defining UNICODE in source at all because...

Most importantly, this will break the beloved Win98 build. (Okay maybe just @SupSuper loves it.) The Win32 API calls that take strings will now always link to the wide-char versions.

I'm not familiar with any of this code, just saw that the change started with #define UNICODE and had Win32 flashbacks. I don't know if this is relevant but since we have code to handle when UNICODE is defined, I'd want to know under what circumstances that occurs.

That's a good point!
I have moved the (_)UNICODE defines to create_project. @SupSuper , is it enough for Win98 support, or should I find another place to put them?

Anyway, now there are many compilation errors, as it seems we have a lot of code that depends on UNICODE being not defined.

I assume I can fix them all, but I'd like first to be sure that I'm on the right path, before investing time and energy in this...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants