gettext(windows): always use UTF-8 #217

Closed

Member

@dscho dscho commented May 20, 2019

The main issue we work around here is that Windows does not have a UTF-8 "code page".

Side note: there is actually a code page for UTF-8: 65001 (see https://docs.microsoft.com/en-us/windows/desktop/Intl/code-page-identifiers). However, when experimenting with it, we ran into a multitude of issues in the Git for Windows project, ranging from various problems with Windows' default console to miscounted file writes. While these issues may have been mitigated in recent Windows 10 versions, older ones (in particular, Windows 7) still seem to have most of them, and Git for Windows specifically still supports even Windows Vista. So from a practical point of view, there is no UTF-8 code page.

Changes since v1:

  • The LC_ALL=C method used by ab/no-kwset to prevent Git from assuming UTF-8-encoded input is now supported.
  • The commit message was enhanced and revamped.

@dscho dscho added the "ready to submit" label May 21, 2019
Member Author

dscho commented Jun 27, 2019

/submit

@dscho dscho removed the "ready to submit" label Jun 27, 2019

gitgitgadget bot commented Jun 27, 2019

Submitted as pull.217.git.gitgitgadget@gmail.com


gitgitgadget bot commented Jun 27, 2019

This branch is now known as kb/windows-force-utf8.


gitgitgadget bot commented Jun 27, 2019

This patch series was integrated into pu via git@39559f5.

@gitgitgadget gitgitgadget bot added the pu label Jun 27, 2019

gitgitgadget bot commented Jun 28, 2019

This patch series was integrated into pu via git@0adb53a.


gitgitgadget bot commented Jun 28, 2019

This patch series was integrated into pu via git@8087c10.


gitgitgadget bot commented Jun 28, 2019

This patch series was integrated into pu via git@d784b2f.


gitgitgadget bot commented Jul 1, 2019

This patch series was integrated into pu via git@52c0c27.


gitgitgadget bot commented Jul 2, 2019

This patch series was integrated into pu via git@003c186.


gitgitgadget bot commented Jul 3, 2019

This patch series was integrated into pu via git@5d18a85.

@dscho dscho force-pushed the gettext-force-utf-8-on-windows branch from ff37a26 to 43f1fd6 on July 3, 2019 20:01
On native Windows, Git exclusively uses UTF-8 for console output (both
with MinTTY and native Win32 Console). Gettext uses `setlocale()` to
determine the output encoding for translated text; however, MSVCRT's
`setlocale()` does not support UTF-8. As a result, translated text is
encoded in the system encoding (as per `GetACP()`), and non-ASCII
characters are mangled in console output.
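
To make the mismatch concrete, here is a minimal sketch of what goes wrong
at the byte level. It assumes, purely for illustration, that the system
encoding is ISO-8859-1 (Latin-1); `latin1_to_utf8()` is a hypothetical
helper and not part of Git. The letter "ä" is the single byte 0xE4 in
Latin-1, but a console expecting UTF-8 needs the two bytes 0xC3 0xA4, so
the untranslated byte shows up as garbage:

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of Git): re-encode a NUL-terminated
 * ISO-8859-1 string as UTF-8. `out` must have room for
 * 2 * strlen(in) + 1 bytes.
 */
void latin1_to_utf8(const char *in, char *out)
{
	const unsigned char *p = (const unsigned char *)in;
	unsigned char *q = (unsigned char *)out;

	for (; *p; p++) {
		if (*p < 0x80) {
			/* ASCII is identical in both encodings. */
			*q++ = *p;
		} else {
			/* Code points 0x80..0xFF need two bytes in UTF-8. */
			*q++ = (unsigned char)(0xC0 | (*p >> 6));
			*q++ = (unsigned char)(0x80 | (*p & 0x3F));
		}
	}
	*q = '\0';
}
```

Pure ASCII text passes through unchanged, which is why only translated
messages with non-ASCII characters are affected.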

Side note: There is actually a code page for UTF-8: 65001. In practice,
it does not work as expected at least on Windows 7, though, so we cannot
use it in Git. Besides, if we overrode the code page, any process
spawned from Git would inherit that code page (as opposed to the code
page configured for the current user), which would quite possibly break
e.g. diff or merge helpers. So we really cannot override the code page.

In `init_gettext_charset()`, Git calls gettext's
`bind_textdomain_codeset()` with the character set obtained via
`locale_charset()`. Let's override the latter function to force the
encoding to UTF-8 on native Windows.

Git for Windows' SDK ships a `libcharset.h`, so we define
`HAVE_LIBCHARSET_H` in the MINGW-specific section of
`config.mak.uname`. The override therefore has to be added before the
code block that is conditionally compiled on that flag.

Rather than simply defining `locale_charset()` to return the string
`"UTF-8"`, though, we are careful not to break `LC_ALL=C`: the
`ab/no-kwset` patch series, for example, needs to have a way to prevent
Git from expecting UTF-8-encoded input.
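
The idea described above can be sketched roughly as follows. This is an
illustration under stated assumptions, not necessarily the code in this
patch: the names `charset_from_locale_value()` and
`sketch_locale_charset()` are hypothetical, and the lookup order
`LC_ALL`, `LC_CTYPE`, `LANG` is the conventional POSIX precedence, assumed
here. The default is UTF-8, but an explicit locale value such as `C` or
`de_DE.ISO-8859-1` is still honored:

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: derive a charset name from one locale value.
 * An unset or empty value falls back to UTF-8; "lang.CHARSET" yields
 * CHARSET; anything without a dot (e.g. "C") is returned as-is.
 */
const char *charset_from_locale_value(const char *value)
{
	const char *dot;

	if (!value || !*value)
		return "UTF-8";

	dot = strchr(value, '.');
	return dot ? dot + 1 : value;
}

/*
 * Hypothetical stand-in for an overridden locale_charset(): consult
 * the environment in the assumed order LC_ALL, LC_CTYPE, LANG.
 */
const char *sketch_locale_charset(void)
{
	const char *env = getenv("LC_ALL");

	if (!env || !*env)
		env = getenv("LC_CTYPE");
	if (!env || !*env)
		env = getenv("LANG");

	return charset_from_locale_value(env);
}
```

With no locale variables set, this sketch reports UTF-8; with `LC_ALL=C`
it reports `C`, so a caller can still tell that UTF-8 input must not be
assumed.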

Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
@dscho dscho force-pushed the gettext-force-utf-8-on-windows branch from 43f1fd6 to 2d2253f on July 3, 2019 20:07
Member Author

dscho commented Jul 3, 2019

/submit


gitgitgadget bot commented Jul 3, 2019

Submitted as pull.217.v2.git.gitgitgadget@gmail.com


gitgitgadget bot commented Jul 3, 2019

This patch series was integrated into pu via git@5adbd93.

Member Author

dscho commented Oct 27, 2019

This already made it into v2.23.0, as gitster@090d1e8.

@dscho dscho closed this Oct 27, 2019
@dscho dscho deleted the gettext-force-utf-8-on-windows branch October 27, 2019 21:14