<locale>: Creating a new locale with a custom facet doesn't work correctly #245

Open
connex-nachtschatten opened this issue Nov 1, 2019 · 6 comments
Labels
bug Something isn't working info needed We need more info before working on this

Comments

@connex-nachtschatten

connex-nachtschatten commented Nov 1, 2019

Describe the bug
I tried to write a workaround for an old STL bug that still hasn't been fixed after several years :(
(DevCom-51443).
I created a ctype facet with my own table. But when I create a new locale with my custom facet, the locale changes from German to classic (I think it's the classic locale). For simplicity, the demo code below uses a std::money_get facet instead of my custom ctype facet, but the behavior is the same.
I'm using VS2019 with the latest update (16.3.7).

#include <iostream>
#include <locale>

int main()
{
   std::locale loc_ger("de");
   std::locale loc_en("en");
   const auto &f = std::use_facet<std::money_get<char>>(loc_en);
   std::locale loc = std::locale(loc_ger, &f);
   std::locale::global(loc);
   // std::cout.imbue(loc);  // tried with this line too
   std::cout << "üöäÜÖÄ" << std::endl;
}

Expected output:
"üöäÜÖÄ"
Actual output:
"³÷õ▄Í─"

I ran the same test on https://rextester.com/l/cpp_online_compiler_gcc and got the correct output there.

Greetings from Germany

Helmut

@StephanTLavavej StephanTLavavej changed the title create new locale with custom facet don´t work correctly <locale>: Creating a new locale with a custom facet doesn't work correctly Nov 1, 2019
@StephanTLavavej StephanTLavavej added the bug Something isn't working label Nov 1, 2019
@MattStephanson
Contributor

@connex-nachtschatten, if you're still interested in this issue, could you clarify your example? It doesn't actually use any facets, so it's not clear what, if anything, is going wrong. It would be better to avoid examining the console output, because it depends on system-specific things like the active code page. The example below, comparing the raw strings, works for me. You can see that it's a mix of German-style number and US-style money formatting.

C:\temp>type gh-245.cpp
#include <cassert>
#include <iomanip>
#include <locale>
#include <sstream>

using namespace std;

int main() {
    std::locale loc(locale{"de-DE.utf-8"}, "en-US.utf-8", locale::monetary);

    std::stringstream ss;
    ss.imbue(loc);
    ss << 1234 << ' ' << std::showbase << std::put_money(567);
    assert(ss.str() == "1.234 $5.67");

    return 0;
}

C:\temp>cl /EHsc /std:c++latest /nologo gh-245.cpp
gh-245.cpp

C:\temp>.\gh-245.exe

C:\temp>

@StephanTLavavej StephanTLavavej added the info needed We need more info before working on this label Jun 30, 2021
@connex-nachtschatten
Author

connex-nachtschatten commented Jul 6, 2021

@MattStephanson

The real problem is the console output for this example.
Here is a working test with a standard locale("de"):

#include <iostream>
#include <locale>
#include <sstream>
#include <cassert>

int main()
{
   std::cout.imbue(std::locale("de"));
   std::cout << "üöäÜÖÄ" << std::endl;

   std::locale loc_ger("de");
   std::locale loc_en("en");
   const auto &f = std::use_facet<std::money_get<char>>(loc_en);
   std::locale loc = std::locale(loc_ger, &f); //(2)
   std::locale::global(loc_ger);  //(1)
   std::cout << "üöäÜÖÄ" << std::endl;
}

In line (1) I used the object loc_ger as the global locale. The output is:

³÷õ▄Í─
üöäÜÖÄ

Now I use the object from line (2), which is loc_ger combined with the money_get facet from loc_en.
The output should be the same as in the first test:

#include <iostream>
#include <locale>
#include <sstream>
#include <cassert>

int main()
{
   std::cout.imbue(std::locale("de"));
   std::cout << "üöäÜÖÄ" << std::endl;

   std::locale loc_ger("de");
   std::locale loc_en("en");
   const auto &f = std::use_facet<std::money_get<char>>(loc_en);
   std::locale loc = std::locale(loc_ger, &f); //(2)
   std::locale::global(loc);  //(1)
   std::cout << "üöäÜÖÄ" << std::endl;
}

But the result is different:

³÷õ▄Í─
³÷õ▄Í─

Another example involves my own ctype facet. The problem is that the mask table used by ctype::is is incorrect.
A small example shows it:

#include <iostream> 
#include <locale> 
#include <cctype>

auto main(int, const char**) -> int 
{ 
   std::locale::global(std::locale::classic()); 
   std::cout << isblank('\n') << std::endl; 
   std::cout << std::use_facet<std::ctype<char>>(std::cout.getloc()).is(std::ctype<char>::blank, static_cast<unsigned char>('\n')) << std::endl; 
   return 0; 
} 

The output is:

0
1

To fix that problem, I made my own table:

template<class CHAR_TYPE, std::size_t SIZE>
inline constexpr auto fix_blank_category(category_table<CHAR_TYPE, SIZE> &table) noexcept -> void
{
   for(std::size_t i = 0; i < SIZE; ++i)
   {
      if(table[i] & std::ctype_base::blank)
      {
         table[i] = table[i] & ~(std::ctype_base::blank);   // delete blank bit
      }
   }

   table[' '] |= std::ctype_base::blank;
   table['\t'] |= std::ctype_base::blank;
}


struct fixed_blank_ctype : std::ctype<char>
{
   fixed_blank_ctype(void) : std::ctype<char>(table())
   {
   }

   static auto table(void) noexcept -> const std::ctype_base::mask*
   {
      static mdt::abi::category_table<std::ctype_base::mask, 256> s_table;
      fix_blank_category(s_table);
      return s_table.data();
   }
}; /* struct fixed_blank_ctype */


inline auto fixed_blank_locale(const std::locale &loc) noexcept -> std::locale
{
   return std::locale(loc, new fixed_blank_ctype());
}

inline auto fixed_blank_locale(const char *loc_name) -> std::locale
{
   return fixed_blank_locale(std::locale(loc_name));
}
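The s_table above has no initializer. Assuming mdt::abi::category_table is an aggregate like std::array (its definition isn't shown), a function-local static of that type is zero-initialized, so after fix_blank_category runs only ' ' and '\t' carry any classification bits at all, and queries such as alpha for 'a' or digit for '7' would report false. A minimal sketch of a variant that starts from std::ctype<char>::classic_table() and only adjusts the blank bit; the class name classic_based_blank_ctype is hypothetical:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <locale>

// Hypothetical facet: the classic "C" ctype table, except that only ' ' and '\t'
// carry the blank bit. All other classification bits are copied unchanged.
struct classic_based_blank_ctype : std::ctype<char> {
    explicit classic_based_blank_ctype(std::size_t refs = 0)
        : std::ctype<char>(fixed_table(), /*del=*/false, refs) {}

private:
    static const mask* fixed_table() {
        // Built once on first use and never modified afterwards.
        static const std::array<mask, table_size> fixed = [] {
            std::array<mask, table_size> t{};
            const mask* classic = classic_table();
            for (std::size_t i = 0; i < table_size; ++i) {
                // Keep every bit except blank.
                t[i] = static_cast<mask>(classic[i] & ~blank);
            }
            t[static_cast<unsigned char>(' ')] |= blank;
            t[static_cast<unsigned char>('\t')] |= blank;
            return t;
        }();
        return fixed.data();
    }
};
```

It is installed the same way as above, e.g. std::locale(loc, new classic_based_blank_ctype), and the locale owns the facet while the table itself stays static (del is false).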

Then I made a small test:

#include <iostream>
#include <locale> 
#include <cctype>
#include "./mdt/abi/ctype.mdt.abi.hpp"   // <- here is my ctype facet and my own table

auto main(int, const char**) -> int 
{ 
   std::locale::global(mdt::abi::fixed_blank_locale("de"));
   std::cout << isblank('\n') << std::endl; 
   std::cout << std::use_facet<std::ctype<char>>(std::cout.getloc()).is(std::ctype<char>::blank, static_cast<unsigned char>('\n')) << std::endl;

   auto table = mdt::abi::fixed_blank_ctype::table();

   std::cout << (table['\n'] & std::ctype<char>::blank) << std::endl;
   return 0;
}

Here I use a German locale combined with my own ctype facet. The result of the test is:

0
1
0

The first 0 is from the isblank function.
The 1 is from std::use_facet and is wrong.
To check whether my table is correct, I queried the table directly. It is correct: that is the second 0.
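The 1 here is consistent with Matt's second point in the reply: std::locale::global() only affects locales constructed afterwards, and std::cout already holds the locale that was global at program startup (the classic one), so std::cout.getloc() does not see a facet installed through the global locale unless the stream is imbued. A minimal sketch that checks this with string streams; marker_facet is a hypothetical do-nothing facet used only to make a locale recognizable through std::has_facet:

```cpp
#include <cassert>
#include <locale>
#include <sstream>

// Hypothetical marker facet: no behavior, it only tags a locale.
struct marker_facet : std::locale::facet {
    static std::locale::id id;
};
std::locale::id marker_facet::id;

struct stream_locale_report {
    bool created_before_global; // stream constructed before std::locale::global()
    bool created_after_global;  // stream constructed after std::locale::global()
    bool imbued_explicitly;     // the older stream after an explicit imbue()
};

stream_locale_report check_stream_locales() {
    std::ostringstream before; // copies the global locale at construction time

    const std::locale marked(std::locale(), new marker_facet);
    const std::locale previous = std::locale::global(marked);

    std::ostringstream after; // copies the new global locale

    stream_locale_report report{};
    report.created_before_global = std::has_facet<marker_facet>(before.getloc());
    report.created_after_global = std::has_facet<marker_facet>(after.getloc());

    before.imbue(marked); // an existing stream changes only when imbued
    report.imbued_explicitly = std::has_facet<marker_facet>(before.getloc());

    std::locale::global(previous); // restore the old global locale
    return report;
}
```

Applied to the test above, this suggests imbuing the stream directly, e.g. std::cout.imbue(mdt::abi::fixed_blank_locale("de")), rather than relying only on the global locale.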

BTW: if I change only std::cout's locale to "de", I can't write umlauts (öäü). It works only with the global locale. Is there another way to write umlauts to the console without setting std::locale::global to "de"?

@MattStephanson
Contributor

There seem to be a couple of different issues going on; let me take them one by one.

  1. It's known that the flags for ctype_base::blank are wrong. That's issue #1121, "<locale>: std::ctype<char>::blank return true for all space-characters", which is marked "vNext", indicating that fixing it is an ABI-breaking change and can't be done in the current release.
  2. You can install your custom fixed_blank_ctype facet and apply the resulting locale to a stream, and it should work as expected. For illustration, I've modified your example to use std::array:
C:\temp>type temp.cpp
#include <array>
#include <iostream>
#include <locale>

using namespace std;

template <class CHAR_TYPE, size_t SIZE>
inline constexpr auto fix_blank_category(array<CHAR_TYPE, SIZE>& table) noexcept -> void {
    for (size_t i = 0; i < SIZE; ++i) {
        if (table[i] & ctype_base::blank) {
            table[i] = table[i] & ~(ctype_base::blank); // delete blank bit
        }
    }

    table[' '] |= ctype_base::blank;
    table['\t'] |= ctype_base::blank;
}


struct fixed_blank_ctype : ctype<char> {
    fixed_blank_ctype(void) : ctype<char>(table()) {}

    static auto table(void) noexcept -> const ctype_base::mask* {
        static array<ctype_base::mask, 256> s_table;
        fix_blank_category(s_table);
        return s_table.data();
    }
}; /* struct fixed_blank_ctype */


inline auto fixed_blank_locale(const locale& loc) noexcept -> locale {
    return locale(loc, new fixed_blank_ctype());
}

inline auto fixed_blank_locale(const char* loc_name) -> locale {
    return fixed_blank_locale(locale(loc_name));
}

int main() {
    cout << isblank('\n', cout.getloc()) << endl;

    locale loc_fix_blank(cout.getloc(), new fixed_blank_ctype);
    cout.imbue(loc_fix_blank);

    cout << isblank('\n', cout.getloc()) << endl;
}

C:\temp>cl /std:c++latest /nologo /EHsc temp.cpp
temp.cpp

C:\temp>.\temp.exe
1
0

C:\temp>
  3. Finally, the problem of the console output depending on the locale, which I think is what your issue is really about. It involves the interaction of several parts of the STL and UCRT.

  • First, the standard says this about std::locale::global(loc):

    If the argument has a name, does setlocale(LC_ALL, loc.name().c_str()); otherwise, the effect on the C locale, if any, is implementation-defined.

  • Second, the encoding of your string literal "üöäÜÖÄ" and the console are generally different. On my computer, by default, they're code pages 1252 and 437, respectively.
  • Third, there is code in the console I/O part of the UCRT to handle translation between different encodings, but it's conditional. In the UCRT source, you can find this comment:

// Double translation is required if both [a] the current locale is not the C
// locale or the file is open in a non-ANSI mode and [b] we are writing to the
// console.

Note that this refers to the C locale, not the C++ locale or the one associated with the stream.

These points together explain the behavior you observe.

  • When std::locale::global() hasn't been called, the C locale is the "C" locale, so UCRT translation is not active and the encoding mismatch results in mojibake.
  • When the C++ locale is set to the named locale loc_ger, the UCRT's translation is active and the output is correct.
  • When the C++ locale is set to the unnamed locale std::locale(loc_ger, &f), the C locale is unchanged, so you continue to get mojibake in your second example.
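Whether a locale is named can be checked with std::locale::name(), which returns "*" for a locale without a name; per the standard, replacing a facet with the locale(other, Facet*) constructor (as std::locale(loc_ger, &f) does) yields such an unnamed locale. A minimal sketch, assuming std::locale::classic() as a stand-in for std::locale("de") so it runs without a German locale installed; the helper names are hypothetical:

```cpp
#include <cassert>
#include <locale>
#include <string>

struct locale_names {
    std::string base;                // name of the original locale
    std::string with_replaced_facet; // name after replacing one facet
};

// Mirrors std::locale(loc_ger, &f): a named locale plus one replacement facet.
locale_names describe_locale_names() {
    const std::locale base = std::locale::classic(); // stand-in for std::locale("de")
    const std::locale combined(base, new std::numpunct<char>); // any non-null facet
    return {base.name(), combined.name()};
}
```

Because the combined locale reports "*", std::locale::global() has no name to pass to setlocale for it, which matches the last bullet above.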

As for your final question, "Is there an other way to write umlauts to console without set std::locale::global to 'de'?", I hesitate to answer because I'm far from an expert and I don't want to mislead you. But some methods I've observed to work are (1) calling SetConsoleOutputCP with the encoding of your string (CP-1252 in my case), so that the UCRT translation is unnecessary; (2) calling _setmode(_fileno(stdout), _O_U16TEXT), so that stdout is no longer in ANSI mode and UCRT translation is activated, then writing UTF-16 output through wcout, fputws, etc. There may be other methods, however, and these ones may have limitations I don't know about.
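The two methods can be sketched as small Windows-only helpers. The function names are hypothetical, 1252 is only the code page from the machine described above (it has to match how the compiler encoded the string literals), and the non-Windows branches exist only so the file compiles elsewhere:

```cpp
#include <cassert>
#include <cstdio>
#ifdef _WIN32
#include <fcntl.h>   // _O_U16TEXT
#include <io.h>      // _setmode, _fileno
#include <windows.h> // SetConsoleOutputCP
#endif

// Method (1): make the console interpret narrow output in the literals' code page.
bool set_console_code_page(unsigned code_page) {
#ifdef _WIN32
    return SetConsoleOutputCP(code_page) != 0;
#else
    (void)code_page;
    return false; // Windows-only API
#endif
}

// Method (2): put stdout into UTF-16 mode; afterwards write wide text only
// (std::wcout, fputws, ...) and avoid narrow output such as std::cout or printf.
bool switch_stdout_to_utf16() {
#ifdef _WIN32
    return _setmode(_fileno(stdout), _O_U16TEXT) != -1;
#else
    return false; // Windows-only CRT function
#endif
}
```

On Windows, usage would be either set_console_code_page(1252) followed by std::cout << "üöäÜÖÄ", or switch_stdout_to_utf16() followed by std::wcout << L"üöäÜÖÄ", not both in the same program.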

@fsb4000
Contributor

fsb4000 commented Jul 12, 2021

@MattStephanson is right.
But if you don't need UTF-8 or UTF-16, just run chcp and save your source as CP-1252 (it's probably already saved in that code page):

#include <iostream>
#include <locale>
#include <cstdlib>

int main()
{
   std::locale loc_ger("de");
   std::locale loc_en("en");
   const auto &f = std::use_facet<std::money_get<char>>(loc_en);
   std::locale loc = std::locale(loc_ger, &f);
   std::locale::global(loc);
   //std::cout.imbue(loc);  // tried with this line too
   system("chcp 1252 > NUL");
   std::cout << "üöäÜÖÄ" << std::endl;
}


@fsb4000
Contributor

fsb4000 commented Jul 12, 2021

Or you can save your sources in code page 850 and do nothing else,
and the output will be correct (on a computer with the same locale settings; I think that's the German default).
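The byte values explain both the original mojibake and why CP-850 source files work. When the source is saved in CP-1252, "üöäÜÖÄ" is stored as FC F6 E4 DC D6 C4; a console using CP-850 displays those bytes as "³÷õ▄Í─", exactly the output reported at the top of this issue. In CP-850 the same letters are 81 94 84 9A 99 8E. A minimal sketch with a partial CP-850 decoding table covering only these twelve byte values (all names are hypothetical):

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <string>

// "üöäÜÖÄ" as stored when the source file is saved in CP-1252.
constexpr std::array<std::uint8_t, 6> kCp1252Bytes{0xFC, 0xF6, 0xE4, 0xDC, 0xD6, 0xC4};

// The same six letters encoded in CP-850.
constexpr std::array<std::uint8_t, 6> kCp850Bytes{0x81, 0x94, 0x84, 0x9A, 0x99, 0x8E};

// Partial CP-850 decoder: the Unicode code point a CP-850 console shows for a byte.
char32_t cp850_to_unicode(std::uint8_t byte) {
    switch (byte) {
        // CP-850 encodings of the umlauts
        case 0x81: return U'\u00FC'; // ü
        case 0x94: return U'\u00F6'; // ö
        case 0x84: return U'\u00E4'; // ä
        case 0x9A: return U'\u00DC'; // Ü
        case 0x99: return U'\u00D6'; // Ö
        case 0x8E: return U'\u00C4'; // Ä
        // What CP-850 shows for the CP-1252 umlaut bytes
        case 0xFC: return U'\u00B3'; // ³
        case 0xF6: return U'\u00F7'; // ÷
        case 0xE4: return U'\u00F5'; // õ
        case 0xDC: return U'\u2584'; // ▄
        case 0xD6: return U'\u00CD'; // Í
        case 0xC4: return U'\u2500'; // ─
        default:   return U'?';      // outside this partial table
    }
}

// Decodes a byte sequence the way a CP-850 console would display it.
std::u32string render_as_cp850(const std::array<std::uint8_t, 6>& bytes) {
    std::u32string shown;
    for (std::uint8_t b : bytes) {
        shown += cp850_to_unicode(b);
    }
    return shown;
}
```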

@fsb4000
Copy link
Contributor

fsb4000 commented Jul 12, 2021

I use "Visual Studio Code" to save files in different code pages (CP850, CP1252, CP1251, or UTF-8).

Projects
None yet
Development

No branches or pull requests

4 participants