Problem with Chinese input from the console #15380

Closed
November20 opened this issue May 18, 2023 · 28 comments
Labels
Issue-Bug It either shouldn't be doing this or needs an investigation. Needs-Attention The core contributors need to come back around and look at this ASAP. Needs-Triage It's a new issue that the core contributor team needs to triage at the next triage meeting

Comments

@November20

Windows Terminal version

No response

Windows build number

10.0.22621.1555

Other Software

No response

Steps to reproduce

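While writing a Unicode program, I can hard-code the C-style string "中文" and print it to the console correctly with std::cout, e.g. std::cout << "中文" << std::endl;
But I cannot read Chinese into a string with std::cin, e.g. string s1; std::cin >> s1; I would like to know what encoding the console actually uses for Chinese input, such that the program cannot read it.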

Expected Behavior

No response

Actual Behavior

Problem with Chinese input from the console

@November20 November20 added Issue-Bug It either shouldn't be doing this or needs an investigation. Needs-Triage It's a new issue that the core contributor team needs to triage at the next triage meeting labels May 18, 2023
@driver1998

Translation of the original issue:

When I write my unicode program, I hard coded a c-style string "中文" in my code, and I can output it just fine with std::cout << "中文" << std::endl.

But I can't input Chinese character to a std::string with std::cin without garbled result.

I wonder what encoding the console uses that causes that.

My answer:
In a Simplified Chinese environment, the default non-wide-char encoding is code page 936, i.e. GBK. You can check it with chcp in cmd, or with GetConsoleCP().

@November20
Author

November20 commented May 19, 2023

With UTF-8 I cannot input Chinese, and with GBK I cannot output Chinese:
E:\sc>type z.cpp
#include <Windows.h>
#include <cstdio>
#include <iostream>
#include <locale.h>
#include <stdio.h>
#include <string>
#include <wchar.h>
using std::cin;
using std::cout;
using std::endl;
using std::string;
using std::wcin;
using std::wcout;
using std::wstring;

int
main ()
{
SetConsoleOutputCP (936);
wprintf (L"除");
cout << "除" << endl;
wcout << L"除" << endl;
return 0;
}

E:\sc>a.exe

@237dmitry

I wonder what encoding the console uses that causes that.

I think this depends on system locale.

Screenshot 2023-05-19 095904

@November20
Author

November20 commented May 19, 2023

#include <Windows.h>
// #include <cstdio>
#include <iostream>
#include <locale.h>
#include <stdio.h>
#include <string>
#include <wchar.h>
using std::cin;
using std::cout;
using std::endl;
using std::string;
using std::wcin;
using std::wcout;
using std::wstring;

int
main ()
{
    SetConsoleOutputCP (936);
    cout << "除" << endl;
    return 0;
}

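This program cannot print the Chinese character "除" in the terminal. I am using the ANSI-encoded C++ STL library, which supports GBK, and the source file is also GBK-encoded.
The output is shown below. chcp also reports 936.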

E:\sc>a.exe
?

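My question now is:
When chcp is 65001 (UTF-8), I cannot input Chinese with UTF-8, yet hard-coded C-style constant wide-string Chinese can be output;
when chcp is 936 (GBK), Chinese cannot be output accurately with GBK (some characters print correctly, others are garbled),
yet Chinese can be input accurately (even characters that are garbled when hard-coded can be typed in and then printed correctly). Why is that?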

@237dmitry

Switch to Unicode, then the output will be normal.

@o-sdn-o

o-sdn-o commented May 19, 2023

For correct input/output of any Unicode characters, regardless of the system locale (a-la chcp ...), the following conditions must be met:

  • The program source code must be saved in UTF-8, so that string literals are stored in UTF-8 encoding.
  • The console output code page must be set to 65001 (SetConsoleOutputCP(CP_UTF8)), so that the output functions interpret the string literals correctly.
  • The console input code page (keyboard input) must also be 65001 (SetConsoleCP(CP_UTF8)).
    Update: UTF-8 input was fixed by "Remove TranslateUnicodeToOem and all related code" #14745. Before that fix, the input code page did not matter, because the Windows console could correctly accept Unicode characters from the user only in UTF-16 encoding (including surrogate pairs, e.g. emoji entered via Win+.).
  • The program must be compiled with the /utf-8 switch, so that string literals are translated correctly from the source code into the binary executable.
  • If possible, use a modern dialect of C++ (e.g. C++20), which fixes shortcomings of std::string/std::wstring such as the absence of a non-const char* std::string::data().

Project properties: (two screenshots in the original comment)

Make sure your source code files are saved in UTF-8 encoding: (three screenshots in the original comment)

Test code for input/output of any Unicode characters:

#include <iostream>
//#include <io.h>     //_setmode()
//#include <fcntl.h>  //
#include <string>
#include "windows.h"

int main()
{
    using namespace std;
    auto out_cp = GetConsoleOutputCP(); // To restore output code page at exit.
    auto inp_cp = GetConsoleCP();       // To restore input  code page at exit.
    SetConsoleOutputCP(CP_UTF8); // Set console output code page to UTF-8 encoding.
    SetConsoleCP(CP_UTF8);       // Set console input  code page to UTF-8 encoding.

    cout << "Test: あああ🙂🙂🙂日本👌中文👍Кириллица" << endl; // Make sure you save your project file with 65001(UTF-8) encoding.

    // Update: Windows console UTF-8 input has been fixed in #14745.
    auto utf8 = string{};
    cout << "Enter text: ";
    cin >> utf8;
    cout << "UTF-8 text: " << utf8 << endl;

    // Outdated.
    //auto wide = wstring{}; 
    //auto utf8 = string{};
    //cout << "Enter text: ";
    //// stdin should be configured in order to receive wchar_t (you can't receive in UTF-8 encoding on Windows yet)
    //_setmode(_fileno(stdin), _O_U16TEXT);
    //wcin >> wide;
    //
    //// Optional: stdout should be configured to output wide strings
    //_setmode(_fileno(stdout), _O_U16TEXT);
    //wcout << L"Wide  text: " << wide << endl;
    //_setmode(_fileno(stdout), _O_TEXT); // Restore to UTF-8.
    //
    //// or convert wide-string to UTF-8 string before output it
    //utf8.resize(wide.size() * 3); // Resize utf8 buffer for the worst case.
    //auto size = WideCharToMultiByte(CP_UTF8, 0, wide.data(), (DWORD)wide.size(), utf8.data(), (DWORD)utf8.size(), 0, 0);
    //utf8.resize(size);
    //cout << "UTF-8 text: " << utf8 << endl;

    SetConsoleOutputCP(out_cp); // Restore original system code pages.
    SetConsoleCP(inp_cp);       //
    return 0;
}
PowerShell.2023-05-19.15-26-44.mp4

@zadjii-msft
Member

That last post by @o-sdn-o was pretty comprehensive and better than anything I could have put together. @November20, does that work for you?

@microsoft-github-policy-service microsoft-github-policy-service bot added the Needs-Author-Feedback The original author of the issue/PR needs to come back and respond to something label May 19, 2023
@November20
Author

November20 commented May 20, 2023

    auto wide = wstring{};
                       ^
s.cpp(19) : Error: '(' expected following simple type name
    auto utf8 = string{};
                      ^
s.cpp(20) : Error: '(' expected following simple type name
    _setmode (_fileno (stdin), _O_U16TEXT);
                                         ^
s.cpp(24) : Error: undefined identifier '_O_U16TEXT'

_O_U16TEXT
wstring{}
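I don't understand these. What I want to know now is: does an old standard prevent the use of Unicode?

I am using a compiler that may only support C++98 (string does not even have a back() member), and I have no way to install VS. I want to know how the old standard supports Chinese.

When writing in GBK, only some Chinese is supported; cout garbles some commonly used Chinese characters. Why is that?

WideCharToMultiByte is a wide-character-to-multibyte function. The official explanation is too brief: "Maps a UTF-16 (wide character) string to a new character string. The new character string is not necessarily from a multibyte character set." It only says that the string is mapped, without discussing any details.
I tried to write one myself, but it still doesn't work.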

#include <Windows.h>
#include <iostream>

int
main ()
{
    SetConsoleOutputCP (CP_UTF8);
    SetConsoleCP (CP_UTF8);
    std::wstring input;
    std::wcout << L"请输入中文字符串:";
    std::getline (std::wcin, input);
    int utf8Length = WideCharToMultiByte (CP_UTF8, 0, input.c_str (), -1,
                                          nullptr, 0, nullptr, nullptr);
    char *utf8Str = new char[utf8Length];
    WideCharToMultiByte (CP_UTF8, 0, input.c_str (), -1, utf8Str, utf8Length,
                         nullptr, nullptr);
    std::cout << "UTF-8编码的字符串:" << utf8Str << std::endl;
    delete[] utf8Str;
    return 0;
}

The output is empty.

E:\sc>a.exe
请输入中文字符串:除法
UTF-8编码的字符串:

@microsoft-github-policy-service microsoft-github-policy-service bot added Needs-Attention The core contributors need to come back around and look at this ASAP. and removed Needs-Author-Feedback The original author of the issue/PR needs to come back and respond to something labels May 20, 2023
@driver1998

C++98... are you using Visual C++ 6? 😅 Try a more modern compiler if you can.
On Windows, wide characters (UTF-16) correspond to wchar_t and std::wstring; Visual C++ 6 should still have them.

@November20
Author

November20 commented May 20, 2023

#include <iostream>

int
main ()
{
    std::string input;
    std::cout << "请输入中文字符串:";
    std::cin >> input;
    std::cout << "UTF-8编码的字符串:" << input << std::endl;
    return 0;
}

Output in Git Bash:

Administrator@DESKTOP-5HP0KEF MINGW64 /e/sc (master)
$ ./a.exe
请输入中文字符串:除法
UTF-8编码的字符串:除法

Output in the terminal:

E:\sc>chcp 65001
Active code page: 65001

E:\sc>a.exe
请输入中文字符串:除法
UTF-8编码的字符串:

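With the same compiler, Git Bash works even without wide characters, so the key to the problem may be the UTF-16 to UTF-8 conversion.
I could not reproduce the earlier problem of some GBK text being garbled, so I am setting it aside; I am now switching back to GBK encoding.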

@driver1998

driver1998 commented May 20, 2023

The UTF-8 input does seem to be a bit odd:

#include <iostream>
#include <locale.h>  // setlocale
#include <stdio.h>
#include <string>
#include <string.h>  // strlen
#include <windows.h>

DWORD write_console(const char* str)
{
    DWORD charsWritten;
    WriteConsoleA(GetStdHandle(STD_OUTPUT_HANDLE), str, (DWORD)strlen(str), &charsWritten, nullptr);
    return charsWritten;
}

DWORD read_console(char* buf, DWORD charsToRead)
{
    DWORD charsRead;
    ReadConsoleA(GetStdHandle(STD_INPUT_HANDLE), (LPVOID)buf, charsToRead, &charsRead, nullptr);
    return charsRead;
}

int main()
{
    setlocale(LC_ALL, "zh_CN.UTF8");
    std::string input;
    char str[1024];

    std::cout << "C++ std::cin, std::cout" << std::endl;
    std::cout << "请输入字符串:";
    std::cin >> input;
    std::cout << "UTF-8编码的字符串:" << input << std::endl << std::endl;


    puts("C scanf/printf/puts");
    printf("请输入字符串:");
    scanf("%s", str);
    printf("UTF-8编码的字符串:");
    puts(str);
    puts("");

    write_console("WriteConsoleA, ReadConsoleA\n");
    write_console("请输入字符串:");
    read_console(str, 1024);
    write_console("UTF-8编码的字符串:");
    write_console(str);
    write_console("\n");
    return 0;
}

Neither std::cin/cout, scanf/printf, nor even the Windows ReadConsoleA/WriteConsoleA functions can get the Chinese characters I typed back:

D:\>chcp
Active code page: 65001

D:\>cl.exe /utf-8 /EHsc test.cpp
用于 x64 的 Microsoft (R) C/C++ 优化编译器 19.36.32532 版
版权所有(C) Microsoft Corporation。保留所有权利。

test.cpp
Microsoft (R) Incremental Linker Version 14.36.32532.0
Copyright (C) Microsoft Corporation.  All rights reserved.

/out:test.exe
test.obj

D:\>test
C++ std::cin, std::cout
请输入字符串:123
UTF-8编码的字符串:123

C scanf/printf/puts
请输入字符串:123
UTF-8编码的字符串:123

WriteConsoleA, ReadConsoleA
请输入字符串:123
UTF-8编码的字符串:123


D:\>test
C++ std::cin, std::cout
请输入字符串:测试
UTF-8编码的字符串:

C scanf/printf/puts
请输入字符串:测试
UTF-8编码的字符串:

WriteConsoleA, ReadConsoleA
请输入字符串:测试
UTF-8编码的字符串:

D:\>

@driver1998

Even wide-char in/out is a bit weird.

#include <iostream>
#include <stdio.h>
#include <windows.h>

DWORD write_console(const wchar_t* str)
{
    DWORD charsWritten;
    WriteConsoleW(GetStdHandle(STD_OUTPUT_HANDLE), str, (DWORD)wcslen(str), &charsWritten, nullptr);
    return charsWritten;
}

DWORD read_console(wchar_t* buf, DWORD charsToRead)
{
    DWORD charsRead;
    ReadConsoleW(GetStdHandle(STD_INPUT_HANDLE), (LPVOID)buf, charsToRead, &charsRead, nullptr);
    return charsRead;
}

int main()
{
    std::wstring input;
    setlocale(LC_ALL, "zh_CN");
    wchar_t str[1024];

    std::wcout << L"C++ std::wcin, std::wcout" << std::endl;
    std::wcout << L"请输入字符串:";
    std::wcin >> input;
    std::wcout << L"字符串:" << input << std::endl << std::endl;

    puts("C wscanf/wprintf/_putws");
    wprintf(L"请输入字符串:");
    wscanf(L"%s", str);
    wprintf(L"字符串:");
    _putws(str);
    _putws(L"");

    memset(str, 0, 1024 * sizeof(wchar_t));
    write_console(L"WriteConsoleW, ReadConsoleW\n");
    write_console(L"请输入字符串:");
    read_console(str, 1024);
    write_console(L"字符串:");
    write_console(str);
    return 0;
}

The test works when I use codepage 936:

D:\>chcp
活动代码页: 936

D:\>cl.exe /utf-8 /EHsc test.cpp
用于 x64 的 Microsoft (R) C/C++ 优化编译器 19.36.32532 版
版权所有(C) Microsoft Corporation。保留所有权利。

test.cpp
Microsoft (R) Incremental Linker Version 14.36.32532.0
Copyright (C) Microsoft Corporation.  All rights reserved.

/out:test.exe
test.obj

D:\>test
C++ std::wcin, std::wcout
请输入字符串:测试
字符串:测试

C wscanf/wprintf/_putws
请输入字符串:测试
字符串:测试

WriteConsoleW, ReadConsoleW
请输入字符串:测试
字符串:测试

D:\>

But in codepage 65001, neither the C nor the C++ standard input/output functions can get my Chinese input back, even with setlocale(LC_ALL, "zh_CN");

Granted, the low-level ReadConsoleW and WriteConsoleW still work fine. But you would think the point of using wide-char is to ignore all this code-page nonsense, right?


D:\>chcp
Active code page: 65001

D:\>test
C++ std::wcin, std::wcout
请输入字符串:1
字符串:1

C wscanf/wprintf/_putws
请输入字符串:1
字符串:1

WriteConsoleW, ReadConsoleW
请输入字符串:1
字符串:1

D:\>test
C++ std::wcin, std::wcout
请输入字符串:测试
字符串:

C wscanf/wprintf/_putws
请输入字符串:测试
字符串:

WriteConsoleW, ReadConsoleW
请输入字符串:测试
字符串:测试

D:\>

Oh and without that setlocale call? I don't even have Chinese output then.


D:\>chcp
Active code page: 65001

D:\>cl.exe /utf-8 /EHsc test.cpp
用于 x64 的 Microsoft (R) C/C++ 优化编译器 19.36.32532 版
版权所有(C) Microsoft Corporation。保留所有权利。

test.cpp
Microsoft (R) Incremental Linker Version 14.36.32532.0
Copyright (C) Microsoft Corporation.  All rights reserved.

/out:test.exe
test.obj

D:\>test
C++ std::wcin, std::wcout
1
C wscanf/wprintf/_putws
1
1

WriteConsoleW, ReadConsoleW
请输入字符串:1
字符串:1

D:\>

@driver1998

This is all tested on Windows 11 22621.1702, and Windows Terminal 1.16.1026.

@o-sdn-o

o-sdn-o commented May 20, 2023

Neither std::cin/cout, scanf/printf nor even Windows ReadConsoleA/WriteConsoleA can get the Chinese characters input back

Update: this was fixed by #14745. At the time of writing, Unicode input in the Windows console had to be read in UTF-16 encoding using the wide-character functions; input in UTF-8 encoding was not implemented. Perhaps this is the main point of this issue, and it needs to be fixed.

The following code should work as expected after this issue is fixed:

#include <iostream>
#include <string>
#include "windows.h"

int main()
{
    UINT out_cp = GetConsoleOutputCP(); // To restore output code page at exit.
    UINT inp_cp = GetConsoleCP();       // To restore input  code page at exit.
    SetConsoleOutputCP(CP_UTF8); // Set console output code page to UTF-8 encoding.
    SetConsoleCP(CP_UTF8);       // Set console input  code page to UTF-8 encoding.
    std::cout << "Test: あああ🙂🙂🙂日本👌中文👍Кириллица" << std::endl; // Make sure you save your project file with 65001(UTF-8) encoding.
    std::string utf8;
    std::cout << "Enter text: ";
    std::cin >> utf8;
    std::cout << "UTF-8 text: " << utf8 << std::endl;
    SetConsoleOutputCP(out_cp); // Restore original system code pages.
    SetConsoleCP(inp_cp);       //
    return 0;
}

@o-sdn-o

o-sdn-o commented May 20, 2023

I am using a compiler that may only support C++98 (string does not even have a back() member), and I have no way to install VS. I want to know how the old standard supports Chinese?

@November20 UTF-8 works with C++98 without any problems, provided that UTF-8 encoded input is supported on the Windows Terminal side. UPDATE: this was fixed by #14745. At the time of writing it did not work for UTF-8 user input, so your program had to convert between UTF-16 and UTF-8 itself.

With the following fixes, your code works as expected:

#include <Windows.h>
#include <iostream>

int
main()
{
    SetConsoleOutputCP(CP_UTF8);
    SetConsoleCP(CP_UTF8);
    std::wstring input;
    //std::wcout << L"请输入中文字符串:"; // UTF-16 is not a byte oriented stream!
    std::cout << "请输入中文字符串:"; // UTF-8 output should work well.

    //std::getline(std::wcin, input); // Use ReadConsoleW instead.
    wchar_t buffer[1000];
    DWORD length = 1000;
    DWORD count;
    ReadConsoleW(GetStdHandle(STD_INPUT_HANDLE), buffer, length, &count, 0);
    input = std::wstring(buffer, count);

    int utf8Length = WideCharToMultiByte(CP_UTF8, 0, input.c_str(), -1,
        nullptr, 0, nullptr, nullptr);
    char* utf8Str = new char[utf8Length];
    WideCharToMultiByte(CP_UTF8, 0, input.c_str(), -1, utf8Str, utf8Length,
        nullptr, nullptr);
    std::cout << "UTF-8编码的字符串:" << utf8Str << std::endl;
    delete[] utf8Str;
    return 0;
}
PS D:\sdn\source\repos\ConsoleApplication11\x64\Debug> .\ConsoleApplication11.exe
请输入中文字符串:除法
UTF-8编码的字符串:除法

@o-sdn-o

o-sdn-o commented May 20, 2023

This does not yet work in the Store release of the terminal, but in a build from the main branch (since #14745) the following code works almost correctly: the cooked-read preview does not display surrogate pairs, but the input is still delivered correctly in UTF-8 encoding.

    std::cout << "Test: あああ🙂🙂🙂日本👌中文👍Кириллица" << std::endl; // Make sure you save your project file with 65001(UTF-8) encoding.
    std::string utf8;
    std::cout << "Enter text: ";
    std::cin >> utf8;
    std::cout << "UTF-8 text: " << utf8 << std::endl;
PowerShell 7.3.4
PS C:\Users\sdn> D:\sdn\source\repos\ConsoleApplication9\x64\Debug\ConsoleApplication9.exe
Test: あああ🙂🙂🙂日本👌中文👍Кириллица
Enter text: ああああ������
UTF-8 text: ああああ👌👌👌
PS C:\Users\sdn>

@November20
Author

November20 commented May 21, 2023

After switching back to GBK, I resumed studying C++ today, but garbled text appeared.

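/*
Key: family surname
Value: a vector of the family's children

The vector stores pair objects that record each child's
name and birthday.

Look up the names of all children of a family by its surname.
*/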
#include "cinclear.h"
#include <iostream>
#include <map>
#include <string>
#include <utility>
#include <vector>

using std::cin;
using std::cout;
using std::endl;
using std::make_pair;
using std::map;
using std::pair;
using std::string;
using std::vector;

int
main ()
{
    vector<pair<string, string> > haizi;
    map<string, vector<pair<string, string> > > hzsr;
    pair<string, string> mzsr;

    string xs;
    cout << "\n输入家族姓氏: " << endl;
    string mz;
    string sr;
    string pd;
    while (cin >> xs, !cin.eof ())
        {
            // Add the surname to the map's keys
            cinclear (cin);
            map<string, vector<pair<string, string> > >::iterator ret
                = hzsr.find (xs);
            if (ret != hzsr.end ())
                {
                    cout << "\n\t家族姓氏 " << xs << " 已存在!\n" << endl;
                    haizi = ret->second;
                }
            cout << "\n输入孩子的名字: " << endl;
            while (cin >> mz, !cin.eof ())
                {
                    cout << "\n输入孩子的生日: " << endl;
                    cinclear (cin);
                    cin >> sr;
                    cinclear (cin);
                    mzsr.first = mz;
                    mzsr.second = sr;
                    haizi.push_back (mzsr);
                    cout << "\n请确认是否继续添加 " << xs
                         << " 家族的孩子(Y/N):" << endl;
                    cin >> pd;
                    if (pd == "N")
                        break;
                    cout << "\n请输入新的孩子的名字: " << endl;
                }
            pair<map<string, vector<pair<string, string> > >::iterator, bool>
                cs = hzsr.insert (make_pair (xs, haizi));
            if (!cs.second)
                {
                    (cs.first)->second = haizi; // Update the data
                    cout << "\n\t提示: " << xs << " 家族已更新\n" << endl;
                }
            else
                {
                    cout << "\n\t提示:" << xs << " 家族已添加\n" << endl;
                }
            cinclear (cin);
            cout << "\n请确认是否继续添加新的家族(Y/N)" << endl;
            cin >> pd;
            if (pd == "N")
                break;
            cout << "\n-------------------------------\n\n请输入新的家族姓氏: "
                 << endl;
        }
    cinclear (cin);

    // --------------------------------------------------------------------------------
    cout << "\n\n---------------------------------------" << endl;
    cout << "\t----查询系统----\n" << endl;
    cout << "\n请输入家族姓氏" << endl;
    while (cin >> xs, !cin.eof ())
        {
            cinclear (cin);
            map<string, vector<pair<string, string> > >::iterator ret
                = hzsr.find (xs);
            if (ret != hzsr.end ())
                {
                    cout << xs << " 家族的孩子生日:\n" << endl;
                    vector<pair<string, string> >::iterator vit
                        = (ret->second).begin ();
                    while (vit != (ret->second).end ())
                        {
                            cout << "姓名: " << vit->first
                                 << "\t\t生日: " << vit->second << endl;
                            ++vit;
                        }
                }
            else
                cout << xs << " 家族没有记录" << endl;
            cout << "\n是否继续查询(Y/N):" << endl;
            cin >> pd;
            if (pd == "N")
                break;
            cout << "\n继续请输入家族姓氏" << endl;
        }

    return 0;
}

cinclear.cpp (clears cin's buffer):

#include "cinclear.h"
#include <cstdio>
#include <iostream>

using std::cin;
using std::istream;
using std::wcin;

void
cinclear (istream &dd)
{
    dd.ignore ();
    dd.clear ();
    dd.sync ();
    fflush (stdin);
    rewind (stdin);
    setbuf (stdin, NULL);
    return;
}

cinclear.h:

#ifndef CINCLEAR_H
#define CINCLEAR_H
#include <iostream>
void cinclear (std::istream &);
#endif

Output:

输入家族姓氏:
中文

输入孩子的名字:
中文

输入孩子的生日:
2000

请确认是否继续添加 中文 家族的孩子(Y/N):
N

        提示:中文 家族已添加


请确认是否继续添加新的家族(Y/N)
N


---------------------------------------
        ----查询系统----


请输入家族姓氏
中文
?? 家族没有记录

是否继续查询(Y/N):
N

@November20
Author

With GBK enabled in Git Bash, the program also runs correctly. The terminal should be friendlier to people who are just learning C++.

Administrator@DESKTOP-5HP0KEF MINGW64 /e/sc (master)
$ ./a.exe

输入家族姓氏:
中文

输入孩子的名字:
中文

输入孩子的生日:
2000

请确认是否继续添加 中文 家族的孩子(Y/N):
N

        提示:中文 家族已添加


请确认是否继续添加新的家族(Y/N)
N


---------------------------------------
        ----查询系统----


请输入家族姓氏
中文
中文 家族的孩子生日:

姓名: 中文              生日: 2000

是否继续查询(Y/N):
N

@November20 November20 reopened this May 21, 2023
@November20
Author

November20 commented May 21, 2023

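This problem is reproducible on my machine; recompiling produces the same problem, and garbled text can also show up when entering families with different surnames.
If this problem has already been fixed, please publish the fix to the Store soon; I can't wait.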

@November20
Author

November20 commented May 21, 2023

E:\sc>zj
unix2dos: converting file xiti10_23.cpp to DOS format...
unix2dos: converting file cinclear.cpp to DOS format...

E:\sc>type xiti10_23.cpp
#include "cinclear.h"
#include <iostream>
#include <map>
#include <string>
#include <vector>

using std::cin;
using std::cout;
using std::endl;
using std::map;
using std::string;
using std::vector;

int
main ()
{
    cout << "请输入排除单词:" << endl;
    string pc;
    vector<string> pcj;
    while (cin >> pc, !cin.eof ())
        {
            cinclear (cin);
            pcj.push_back (pc);
        }
    cinclear (cin);
    cout << "请输入单词:" << endl;
    bool pd = false;
    while (cin >> pc, !cin.eof ())
        {
            pd = false;
            cinclear (cin);
            for (vector<string>::iterator vt = pcj.begin (); vt != pcj.end ();
                 ++vt)
                if ((*vt) == pc)
                    {
                        pd = true;
                    }
            if (!pd)
                cout << pc << endl;
        }
    return 0;
}
/*
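Benefits of using a set: first, it removes duplicate words from the exclusion set; second, count or find can be used to
check whether a word is in the exclusion set, instead of comparing in a loop as with a vector.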
*/

E:\sc>chcp
活动代码页: 936

E:\sc>by
------------run----------------
请输入排车ゴ?
中文
^Z
请输入单词:
中文
??
^Z
------------over--------------- return: 0
请按任意键继续. . .

E:\sc>

Git Bash shows the same problem, but only the characters after "除" are garbled, and it does not affect the program's use.

Administrator@DESKTOP-5HP0KEF MINGW64 /e/sc (master)
$ ./a.exe
请输入排车ゴ▒:
哈哈

请输入单词:
哈哈
zhongwen
zhongwen
中文
中文

My vim configuration file:

colorscheme murphy
syntax on
set encoding=gbk
set fileencoding=gbk
set fileencodings=ucs-bom,utf-8,cp936,gb18030,big5,euc-jp,euc-kr,latin1


set number
"set guifont=IBM_Plex_Mono:h18
set guifont=HanaMinB:h20
set lines=24 columns=95
set fileformat=dos
set tabstop=4

@o-sdn-o

o-sdn-o commented May 21, 2023

Everything in your code is correct, except that you never state explicitly which encoding your program uses. Every console has a runtime state for its I/O encoding, and that encoding may not match the one your program uses. Therefore, when your program runs, it must explicitly configure the console for the encoding it uses.

You have two mutually exclusive encoding options to choose from:

  • Use the national code page GBK (936). You are then limited to the set of characters contained in this code page.
  • Use UTF-8, often referred to simply as "Unicode". Here you are not limited in the characters you can use, including hieroglyphs and emoji.

First.
Your program's source code files must be saved in either GBK (936) or UTF-8 (65001) encoding, matching the option you chose.

Second.
Non-Windows operating systems (Linux, macOS) have switched to UTF-8 globally, so this problem does not arise there. On Windows, however, everything still uses national code pages by default, so your program must configure the console to use the correct encoding. Note: the original console code page must be restored when the program exits.

Modified xiti10_23.cpp (added mandatory code block):

#include "cinclear.h"
#include <iostream>
#include <map>
#include <string>
#include <vector>

// Mandatory code block on windows.
#ifdef _WIN32
    #include "windows.h"
    namespace winapi_cp_state
    {
        static UINT ou_state = GetConsoleOutputCP(); // Save original system code pages.
        static UINT in_state = GetConsoleCP();       //
        static void set_page(UINT out, UINT in) { SetConsoleOutputCP(out); SetConsoleCP(in); }
        static void set_page() { set_page(ou_state, in_state); }

        // Uncomment to use UTF-8
        //static int _state = (set_page(CP_UTF8, CP_UTF8), ::atexit(set_page)); // Set to UTF-8 and always restore original system code pages at exit.
            
        // Uncomment in case of using national code page GBK(OEM-936).
        static int _state = (set_page(936, 936), ::atexit(set_page)); // Set to GBK(OEM-936) and always restore original system code pages at exit.
    }
#endif

using namespace std;

int
main ()
{
    cout << "请输入排除单词:" << endl;
    string pc;
    vector<string> pcj;
    while (cin >> pc, !cin.eof ())
        {
            cinclear (cin);
            pcj.push_back (pc);
        }
    cinclear (cin);
    cout << "请输入单词:" << endl;
    bool pd = false;
    while (cin >> pc, !cin.eof ())
        {
            pd = false;
            cinclear (cin);
            for (vector<string>::iterator vt = pcj.begin (); vt != pcj.end ();
                 ++vt)
                if ((*vt) == pc)
                    {
                        pd = true;
                    }
            if (!pd)
                cout << pc << endl;
        }
    return 0;
}
/*
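Benefits of using a set: first, it removes duplicate words from the exclusion set; second, count or find can be used to
check whether a word is in the exclusion set, instead of comparing in a loop as with a vector.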
*/

@o-sdn-o

o-sdn-o commented May 21, 2023

I filed a feature request to add the mandatory code block (specified in my previous comment here) to <iostream> by default on Windows. I think this will help solve a lot of problems with encodings.

@November20
Author

This feature request may be hard to get accepted. Microsoft avoids changes so that old programs keep running normally; as long as a large project runs normally, there is no need to take risks. The threshold for writing code for anything Microsoft-related is very high. Thank you very much for helping me.

@November20 November20 closed this as not planned Won't fix, can't repro, duplicate, stale May 21, 2023
@November20
Author

November20 commented May 23, 2023

#include <Windows.h>
#include <iostream>
#include <string>
#include <wchar.h>
using std::cout;
using std::endl;
using std::string;
using std::wcin;
using std::wistream;
using std::wstring;

void
co (std::string utf8Str)
{
    UINT out_cp = GetConsoleOutputCP ();
    UINT inp_cp = GetConsoleCP ();
    SetConsoleOutputCP (CP_UTF8);
    SetConsoleCP (CP_UTF8);

    std::cout << utf8Str << std::flush;

    SetConsoleOutputCP (out_cp);
    SetConsoleCP (inp_cp);
    return;
}

wistream &
ci (wistream &aa, string &ss)
{
    UINT out_cp = GetConsoleOutputCP ();
    UINT inp_cp = GetConsoleCP ();
    SetConsoleOutputCP (CP_UTF8);
    SetConsoleCP (CP_UTF8);

    std::wstring input;
    wchar_t buffer[1000];
    DWORD length = 1000;
    DWORD count;
    ReadConsoleW (GetStdHandle (STD_INPUT_HANDLE), buffer, length, &count, 0);
    input = std::wstring (buffer, count);

    int utf8Length = WideCharToMultiByte (CP_UTF8, 0, input.c_str (), -1,
                                          nullptr, 0, nullptr, nullptr);
    char *utf8Str = new char[utf8Length];
    WideCharToMultiByte (CP_UTF8, 0, input.c_str (), -1, utf8Str, utf8Length,
                         nullptr, nullptr);
    ss.assign (utf8Str, utf8Length - 1);
    delete[] utf8Str;

    SetConsoleOutputCP (out_cp);
    SetConsoleCP (inp_cp);

    return aa;
}

int
main ()
{
    co ("中文除法\n");
    string dd;
    while (ci (wcin, dd))
        co (dd);

    return 0;
}

The ReadConsoleW function cannot be terminated by EOF, which has hindered my debugging. I'm not very good at using ReadConsoleW. I originally planned to use wcin to process input, but the result was garbled. There are no examples for reference, and I am a beginner in C++. I need your help.

@November20 November20 reopened this May 23, 2023
@o-sdn-o

o-sdn-o commented May 23, 2023

The ReadConsoleW function cannot be terminated through EOF

You should catch the control characters yourself (^Z aka EOF, \n aka LF and \r aka CR):

#include <Windows.h>
#include <iostream>
#include <string>
#include <wchar.h>

void
co(std::string utf8Str)
{
    UINT out_cp = GetConsoleOutputCP();
    UINT inp_cp = GetConsoleCP();
    SetConsoleOutputCP(CP_UTF8);
    SetConsoleCP(CP_UTF8);

    std::cout << utf8Str << std::endl << std::flush;

    SetConsoleOutputCP(out_cp);
    SetConsoleCP(inp_cp);
    return;
}

bool
ci(std::string& ss)
{
    UINT out_cp = GetConsoleOutputCP();
    UINT inp_cp = GetConsoleCP();
    SetConsoleOutputCP(CP_UTF8);
    SetConsoleCP(CP_UTF8);

    std::wstring input;
    wchar_t buffer[1000];
    DWORD length = 1000;
    DWORD count;
    ReadConsoleW(GetStdHandle(STD_INPUT_HANDLE), buffer, length, &count, 0);
    input = std::wstring(buffer, count);
    // Pop all trailing '\r\n'
    while (input.size() && (input.back() == '\n' || input.back() == '\r'))
        input.pop_back();
    // EOF/^Z detection
    bool eof = input.empty() || input.back() == 26;
    if (!eof)
    {
        int utf8Length = WideCharToMultiByte(CP_UTF8, 0, input.c_str(), -1,
            nullptr, 0, nullptr, nullptr);
        char* utf8Str = new char[utf8Length];
        WideCharToMultiByte(CP_UTF8, 0, input.c_str(), -1, utf8Str, utf8Length,
            nullptr, nullptr);
        ss.assign(utf8Str, utf8Length > 0 ? utf8Length - 1 : 0); // Exclude the terminating NUL that is counted when cchWideChar is -1.
        delete[] utf8Str;
    }

    SetConsoleOutputCP(out_cp);
    SetConsoleCP(inp_cp);
    return !eof;
}

int
main()
{
    co("中文除法\n");
    std::string dd;
    while (ci(dd))
        co(dd);
    return 0;
}

@o-sdn-o

o-sdn-o commented May 23, 2023

@November20 It is better to create a new test project of your own on GitHub to discuss the nuances of console functions.

Please create a new Public repository in your GitHub profile and I'll help you deal with it there.

@November20
Author

I would be very happy to learn from you, but it may waste your time without any return for you. I have only just reached the chapter of my book that introduces the associative containers. I am only pursuing my own goals here, not helping with testing. I think I should finish reading the book first and then move on to Linux and GCC. When I come back to the Windows console later, I will be able to understand your code better.

@o-sdn-o

o-sdn-o commented May 23, 2023

Feel free to ask me, I'll be happy to advise.

5 participants