Cope with UTF-8 codepage #1468


perennialmind

Add UTF-8 codepage 65001, sometimes referred to as "Unicode (UTF-8 without signature)". Recent versions of Windows allow for locale variants with UTF-8 encoding, making `CP_UTF8` a valid return value for `GetOEMCP()`. Alternatively, you could drop `cp_hr_list` and instead call [GetCPInfoExA](https://docs.microsoft.com/en-us/windows/win32/api/winnls/nf-winnls-getcpinfoexa).
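
For readers who have not seen the table in question, here is a minimal sketch of the kind of lookup the description refers to. The struct layout, the `cp_to_name` helper, and the display names for 437 and 65001 are assumptions made for illustration, not the project's actual `cp_hr_list` definition. Only the 858 name is taken from the log output quoted later in this thread.

```c
#include <stddef.h>

/* Hypothetical entry layout: codepage number plus a human-readable name. */
typedef struct {
    unsigned int cp;
    const char *name;
} cp_name;

/* Illustrative subset only; the real table covers many more codepages. */
static const cp_name cp_names[] = {
    { 437,   "United States" },            /* assumed display name */
    { 858,   "Western-European (Euro)" },
    { 65001, "UTF-8" },                    /* the kind of entry this PR adds; name assumed */
};

/* Returns the display name for a codepage, or NULL when the table has no
 * entry for it. A missing entry is the situation that trips the assert. */
const char *cp_to_name(unsigned int cp)
{
    size_t i;
    for (i = 0; i < sizeof(cp_names) / sizeof(cp_names[0]); i++) {
        if (cp_names[i].cp == cp)
            return cp_names[i].name;
    }
    return NULL;
}
```

Keeping the names in a local table, rather than asking `GetCPInfoExA`, means an unknown codepage shows up as a missing entry instead of being silently accepted.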
@pbatard
Owner

pbatard commented Mar 7, 2020

Good find. Please mention that this triggers an assert, coz that helps me prioritize these matters.

I don't think I want to use GetCPInfoExA because I want to be in control of the displayed name, and I also want to have an idea of what codepages don't work, so that I can add proper support for them (hence the assert).

Besides your patch, I guess I'm going to have to add a special case for UTF-8, because the end result is that 437 rather than 850 (well, 858 since we want the € symbol) is being used on UK/IE localized platforms, which isn't ideal, so I got to think about this some more...

@pbatard pbatard added this to the 3.10 milestone Mar 7, 2020
@pbatard pbatard self-assigned this Mar 7, 2020
@pbatard
Owner

pbatard commented Mar 11, 2020

I think I have a proper solution for this issue now, one that doesn't rely on falling back to the US codepage for all systems that use UTF-8.

But boy is it a pain in the ass to get a Windows system that defaults to UTF-8 to give you an OEM codepage. I'll just leave this for people interested in programmatically finding out the real codepage of a system with default UTF-8 locale:

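// With LOCALE_RETURN_NUMBER, the result is written into the buffer as a number rather than a string
UINT actual_cp;
GetLocaleInfoA(GetUserDefaultUILanguage(), LOCALE_IDEFAULTCODEPAGE | LOCALE_RETURN_NUMBER,
               (char*)&actual_cp, sizeof(actual_cp));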

The above is for OEM. If you want ANSI, you should use LOCALE_IDEFAULTANSICODEPAGE.

With this, one finally gets the expected result:

Will use DOS keyboard 'uk' [UK-English]
Will use codepage 858 [Western-European (Euro)]
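
A slightly more defensive version of the call above might look like the following sketch. The function name `query_ui_language_codepage` and its `want_ansi` parameter are hypothetical. The `#ifdef _WIN32` guard and the non-Windows stub are there only so the snippet compiles anywhere, and returning 0 on failure is an assumed convention.

```c
#ifdef _WIN32
#include <windows.h>
#endif

/* Returns the default OEM codepage (want_ansi == 0) or ANSI codepage
 * (want_ansi != 0) associated with the user's UI language, or 0 if the
 * lookup fails or the platform is not Windows. */
unsigned int query_ui_language_codepage(int want_ansi)
{
#ifdef _WIN32
    DWORD cp = 0;
    LCTYPE type = (want_ansi ? LOCALE_IDEFAULTANSICODEPAGE
                             : LOCALE_IDEFAULTCODEPAGE) | LOCALE_RETURN_NUMBER;
    /* Build a locale identifier from the UI language, default sort order. */
    LCID lcid = MAKELCID(GetUserDefaultUILanguage(), SORT_DEFAULT);

    /* GetLocaleInfoA returns 0 on failure; cp is only valid on success. */
    if (GetLocaleInfoA(lcid, type, (char*)&cp, sizeof(cp)) == 0)
        return 0;
    return (unsigned int)cp;
#else
    (void)want_ansi;
    return 0; /* stub: the locale APIs above exist only on Windows */
#endif
}
```

Checking the return value matters because `actual_cp` would otherwise be read uninitialized if the call fails.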

I will close this PR once I push the relevant commit. Once again, thanks for reporting this!

@pbatard pbatard closed this in 5681c3b Mar 11, 2020
dyeske pushed a commit to dyeske/rufus that referenced this pull request Mar 13, 2020
* Recent versions of Windows can set the default locale to codepage 65001 (UTF-8).
* This produces an assert due to a missing entry in cp_hr_list[], so fix that.
* However, this fix alone is not enough, as a GetOEMCP() that returns 65001 means
  that any systems set to UTF-8 will fall back to codepage 437 for DOS, which is
  definitely not what we want => Add an extra call to determine the actual OEM
  codepage when UTF-8 is detected.
* Closes pbatard#1468
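
The decision logic described in the commit message can be sketched as below. This is not the code from the commit: `resolve_dos_codepage`, the callback type, and the two stub queries are hypothetical names introduced for illustration, and falling back to 437 when the extra lookup fails is an assumption. The locale query is passed in as a function pointer so the logic can be exercised without calling any Windows API.

```c
#include <stddef.h>

#define UTF8_CODEPAGE   65001u  /* same value as CP_UTF8 in the Windows headers */
#define US_DOS_CODEPAGE 437u    /* assumed last-resort fallback */

/* Hypothetical callback: returns the locale's real OEM codepage, 0 on failure. */
typedef unsigned int (*locale_cp_query)(void);

/* Picks the codepage to use for DOS given what the system reported.
 * Anything other than UTF-8 is passed through unchanged; for UTF-8 the
 * extra query is consulted, and 437 is used only if that query fails. */
unsigned int resolve_dos_codepage(unsigned int reported_cp, locale_cp_query query)
{
    unsigned int actual;

    if (reported_cp != UTF8_CODEPAGE)
        return reported_cp;

    actual = (query != NULL) ? query() : 0;
    if (actual == 0 || actual == UTF8_CODEPAGE)
        return US_DOS_CODEPAGE;
    return actual;
}

/* Stand-in queries for experimentation; a real caller would wrap GetLocaleInfoA. */
unsigned int stub_query_western_european(void) { return 850; }
unsigned int stub_query_failure(void) { return 0; }
```

For example, `resolve_dos_codepage(65001, stub_query_western_european)` yields 850 under these assumptions. Mapping 850 to 858 for the € symbol, as mentioned earlier in the thread, would be a separate step.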
@github-actions github-actions bot locked as resolved and limited conversation to collaborators May 26, 2021