Problem with Chinese input from the console #15380
Translation of the original issue: When I write my Unicode program, I hard-code a C-style string "中文" in my code, and I can output it just fine with std::cout. But I can't input Chinese characters into a string with std::cin. I wonder what encoding the console uses that causes that. My answer: The default non-wide-char encoding in a Simplified Chinese environment is code page 936, aka GBK. You can check that with |
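To make the difference between the two encodings concrete, here is a minimal sketch that dumps the bytes of "中文" in UTF-8 and in GBK (code page 936). The helper name `hex_dump` and the two constants are made up for illustration; the byte values are written as explicit escapes so the listing does not depend on how the source file itself is saved.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: renders each byte of `bytes` as two uppercase hex
// digits, separated by single spaces.
std::string hex_dump(const std::string& bytes)
{
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    for (std::size_t i = 0; i < bytes.size(); ++i)
    {
        unsigned char b = static_cast<unsigned char>(bytes[i]);
        if (i != 0)
            out += ' ';
        out += digits[b >> 4];
        out += digits[b & 0x0F];
    }
    return out;
}

// The same two characters, "中文", in two different encodings.
const std::string zhongwen_utf8("\xE4\xB8\xAD\xE6\x96\x87"); // UTF-8 (code page 65001): 3 bytes per character
const std::string zhongwen_gbk("\xD6\xD0\xCE\xC4");          // GBK (code page 936): 2 bytes per character
```

Calling `hex_dump(zhongwen_utf8)` and `hex_dump(zhongwen_gbk)` yields two different byte sequences for the same text, which is why a program and a console that disagree about the code page show garbled characters.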
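With UTF-8 I cannot input Chinese; with GBK I cannot output Chinese: int E:\sc>a.exe |
This program cannot output the Chinese character "除" in the terminal. I am using a C++ STL library with ANSI encoding, which supports GBK, and the source file is also GBK-encoded.
My question now is: |
Switch to Unicode, then the output will be normal. |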
For correct input/output of any Unicode characters, regardless of the system locale (a-la chcp ...), the following conditions must be met:
Make sure your source code files are saved in UTF-8 encoding.
Test code for input/output of any Unicode characters:
#include <iostream>
//#include <io.h> //_setmode()
//#include <fcntl.h> //
#include <string>
#include "windows.h"
int main()
{
using namespace std;
auto out_cp = GetConsoleOutputCP(); // To restore output code page at exit.
auto inp_cp = GetConsoleCP(); // To restore input code page at exit.
SetConsoleOutputCP(CP_UTF8); // Set console output code page to UTF-8 encoding.
SetConsoleCP(CP_UTF8); // Set console input code page to UTF-8 encoding.
cout << "Test: あああ🙂🙂🙂日本👌中文👍Кириллица" << endl; // Make sure you save your project file with 65001(UTF-8) encoding.
// Update: Windows console UTF-8 input has been fixed in #14745.
auto utf8 = string{};
cout << "Enter text: ";
cin >> utf8;
cout << "UTF-8 text: " << utf8 << endl;
// Outdated.
//auto wide = wstring{};
//auto utf8 = string{};
//cout << "Enter text: ";
//// stdin should be configured in order to receive wchar_t (you can't receive in UTF-8 encoding on Windows yet)
//_setmode(_fileno(stdin), _O_U16TEXT);
//wcin >> wide;
//
//// Optional: stdout should be configured to output wide strings
//_setmode(_fileno(stdout), _O_U16TEXT);
//wcout << L"Wide text: " << wide << endl;
//_setmode(_fileno(stdout), _O_TEXT); // Restore to UTF-8.
//
//// or convert wide-string to UTF-8 string before output it
//utf8.resize(wide.size() * 3); // Resize utf8 buffer for the worst case.
//auto size = WideCharToMultiByte(CP_UTF8, 0, wide.data(), (DWORD)wide.size(), utf8.data(), (DWORD)utf8.size(), 0, 0);
//utf8.resize(size);
//cout << "UTF-8 text: " << utf8 << endl;
SetConsoleOutputCP(out_cp); // Restore original system code pages.
SetConsoleCP(inp_cp); //
return 0;
}
(Attached video: PowerShell.2023-05-19.15-26-44.mp4) |
That last post by @o-sdn-o was pretty comprehensive and better than anything I could have put together. @November20 does that work for you? |
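_O_U16TEXT: I am using a compiler that may only support C++98 (string does not even have a back() member), and I have no way to install VS. I would like to know how the old standard supports Chinese. When writing with GBK, only some Chinese characters are supported; printing certain common Chinese characters with cout produces garbled text. Why is that? WideCharToMultiByte is a wide-character to multibyte function. The official explanation is too brief: "Maps a UTF-16 (wide character) string to a new character string. The new character string is not necessarily from a multibyte character set." It only says the string is mapped, without discussing the details.
The output is empty.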
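Regarding what "maps a UTF-16 string to a new character string" involves when the target is UTF-8, the following is a portable sketch of such a mapping. It is not WideCharToMultiByte's implementation; the function name `utf16_to_utf8` is hypothetical, it needs C++11 for `std::u16string`, and replacing unpaired surrogates with U+FFFD is a choice made for this sketch.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: converts UTF-16 code units to UTF-8 bytes.
// Unpaired surrogates are replaced with U+FFFD (a choice made in this sketch).
std::string utf16_to_utf8(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i)
    {
        unsigned long cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) // high surrogate: needs a low surrogate next
        {
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF)
            {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i; // the pair consumed two code units
            }
            else
                cp = 0xFFFD;
        }
        else if (cp >= 0xDC00 && cp <= 0xDFFF) // low surrogate without a preceding high one
            cp = 0xFFFD;

        if (cp < 0x80) // 1 byte: 0xxxxxxx
            out += static_cast<char>(cp);
        else if (cp < 0x800) // 2 bytes: 110xxxxx 10xxxxxx
        {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else if (cp < 0x10000) // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

Most Chinese characters, including "中" and "文", take one UTF-16 code unit and three UTF-8 bytes, which is where the "worst case" factor of 3 in buffer sizing comes from; characters outside the Basic Multilingual Plane, such as emoji, take two UTF-16 code units and four UTF-8 bytes.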
|
C++98... Visual C++ 6? 😅 Try something more modern, maybe? |
Output in Git Bash:
Output in the terminal:
With the same compiler, Git Bash works even without wide characters; the crux of the problem may be the UTF-16 to UTF-8 conversion. |
The UTF-8 input does seem to be a bit odd:
#include <iostream>
#include <string>
#include <locale.h>
#include <stdio.h>
#include <string.h>
#include <windows.h>
DWORD write_console(const char* str)
{
DWORD charsWritten;
WriteConsoleA(GetStdHandle(STD_OUTPUT_HANDLE), str, (DWORD)strlen(str), &charsWritten, nullptr);
return charsWritten;
}
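DWORD read_console(char* buf, DWORD bufSize)
{
DWORD charsRead = 0;
// Reserve one byte so the result can be null-terminated for strlen() in write_console().
ReadConsoleA(GetStdHandle(STD_INPUT_HANDLE), (LPVOID)buf, bufSize - 1, &charsRead, nullptr);
buf[charsRead] = '\0';
return charsRead;
}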
int main()
{
setlocale(LC_ALL, "zh_CN.UTF8");
std::string input;
char str[1024];
std::cout << "C++ std::cin, std::cout" << std::endl;
std::cout << "请输入字符串:";
std::cin >> input;
std::cout << "UTF-8编码的字符串:" << input << std::endl << std::endl;
puts("C scanf/printf/puts");
printf("请输入字符串:");
scanf("%s", str);
printf("UTF-8编码的字符串:");
puts(str);
puts("");
write_console("WriteConsoleA, ReadConsoleA\n");
write_console("请输入字符串:");
read_console(str, 1024);
write_console("UTF-8编码的字符串:");
write_console(str);
write_console("\n");
return 0;
}
Neither std::cin/cout, scanf/printf, nor even the Windows ReadConsoleA/WriteConsoleA can read the Chinese characters back:
|
Even wide-char in/out is a bit weird.
#include <iostream>
#include <string>
#include <locale.h>
#include <stdio.h>
#include <string.h>
#include <windows.h>
DWORD write_console(const wchar_t* str)
{
DWORD charsWritten;
WriteConsoleW(GetStdHandle(STD_OUTPUT_HANDLE), str, (DWORD)wcslen(str), &charsWritten, nullptr);
return charsWritten;
}
DWORD read_console(wchar_t* buf, DWORD charsToRead)
{
DWORD charsRead;
ReadConsoleW(GetStdHandle(STD_INPUT_HANDLE), (LPVOID)buf, charsToRead, &charsRead, nullptr);
return charsRead;
}
int main()
{
std::wstring input;
setlocale(LC_ALL, "zh_CN");
wchar_t str[1024];
std::wcout << L"C++ std::wcin, std::wcout" << std::endl;
std::wcout << L"请输入字符串:";
std::wcin >> input;
std::wcout << L"字符串:" << input << std::endl << std::endl;
puts("C wscanf/wprintf/_putws");
wprintf(L"请输入字符串:");
wscanf(L"%s", str);
wprintf(L"字符串:");
_putws(str);
_putws(L"");
memset(str, 0, 1024 * sizeof(wchar_t));
write_console(L"WriteConsoleW, ReadConsoleW\n");
write_console(L"请输入字符串:");
read_console(str, 1024);
write_console(L"字符串:");
write_console(str);
return 0;
}
The test works when I use codepage 936:
But in codepage 65001, both the C and C++ standard input/output methods can't get my Chinese input back. Even with Granted the low-level ReadConsoleW and WriteConsoleW still work fine. But you would think the point of using wide-char is to ignore all this code-page nonsense, right?
Oh and without that
|
This is all tested on Windows 11 22621.1702, and Windows Terminal 1.16.1026. |
Update: Fixed by #14745. The following code should work as expected after this issue is fixed:
#include <iostream>
#include <string>
#include "windows.h"
int main()
{
UINT out_cp = GetConsoleOutputCP(); // To restore output code page at exit.
UINT inp_cp = GetConsoleCP(); // To restore input code page at exit.
SetConsoleOutputCP(CP_UTF8); // Set console output code page to UTF-8 encoding.
SetConsoleCP(CP_UTF8); // Set console input code page to UTF-8 encoding.
std::cout << "Test: あああ🙂🙂🙂日本👌中文👍Кириллица" << std::endl; // Make sure you save your project file with 65001(UTF-8) encoding.
std::string utf8;
std::cout << "Enter text: ";
std::cin >> utf8;
std::cout << "UTF-8 text: " << utf8 << std::endl;
SetConsoleOutputCP(out_cp); // Restore original system code pages.
SetConsoleCP(inp_cp); //
return 0;
} |
@November20 UTF-8 support will work with C++98 without any problems if UTF-8 encoded input support is implemented on the Windows Terminal side. UPDATE: It is fixed by #14745. With the following fixes, your code works as expected:
#include <Windows.h>
#include <string>
#include <iostream>
int
main()
{
SetConsoleOutputCP(CP_UTF8);
SetConsoleCP(CP_UTF8);
std::wstring input;
//std::wcout << L"请输入中文字符串:"; // UTF-16 is not a byte oriented stream!
std::cout << "请输入中文字符串:"; // UTF-8 output should work well.
//std::getline(std::wcin, input); // Use ReadConsoleW instead.
wchar_t buffer[1000];
DWORD length = 1000;
DWORD count;
ReadConsoleW(GetStdHandle(STD_INPUT_HANDLE), buffer, length, &count, 0);
input = std::wstring(buffer, count);
int utf8Length = WideCharToMultiByte(CP_UTF8, 0, input.c_str(), -1,
nullptr, 0, nullptr, nullptr);
char* utf8Str = new char[utf8Length];
WideCharToMultiByte(CP_UTF8, 0, input.c_str(), -1, utf8Str, utf8Length,
nullptr, nullptr);
std::cout << "UTF-8编码的字符串:" << utf8Str << std::endl;
delete[] utf8Str;
return 0;
}
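The reason UTF-8 can work even with a C++98 compiler is that `std::string` only stores bytes and does not care which encoding they represent. As a small illustration, here is a C++98-compatible sketch that counts the code points in a UTF-8 string. The function name `utf8_code_points` is hypothetical, and the sketch assumes its input is already valid UTF-8 (it does not validate).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts code points in a UTF-8 string by counting every
// byte that is NOT a continuation byte (continuation bytes look like 10xxxxxx).
// Assumes valid UTF-8 input; no validation is performed.
std::size_t utf8_code_points(const std::string& s)
{
    std::size_t n = 0;
    for (std::string::size_type i = 0; i < s.size(); ++i)
    {
        unsigned char b = static_cast<unsigned char>(s[i]);
        if ((b & 0xC0) != 0x80)
            ++n;
    }
    return n;
}
```

For the UTF-8 bytes of "中文", `size()` reports 6 while `utf8_code_points` reports 2, so any logic that needs "characters" rather than bytes has to decode the string explicitly.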
|
This does not yet work in the terminal from the store, but in the build from the
std::cout << "Test: あああ🙂🙂🙂日本👌中文👍Кириллица" << std::endl; // Make sure you save your project file with 65001(UTF-8) encoding.
std::string utf8;
std::cout << "Enter text: ";
std::cin >> utf8;
std::cout << "UTF-8 text: " << utf8 << std::endl;
|
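After switching back to GBK I resumed learning C++ today, but garbled text appeared.
/*
Key: family surname
Value: a vector object holding the names of the family's children
The vector stores objects of type pair, each recording a child's
name and birthday.
Look up the names of all children in a family by the family surname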
*/
#include "cinclear.h"
#include <iostream>
#include <map>
#include <string>
#include <utility>
#include <vector>
using std::cin;
using std::cout;
using std::endl;
using std::make_pair;
using std::map;
using std::pair;
using std::string;
using std::vector;
int
main ()
{
vector<pair<string, string> > haizi;
map<string, vector<pair<string, string> > > hzsr;
pair<string, string> mzsr;
string xs;
cout << "\n输入家族姓氏: " << endl;
string mz;
string sr;
string pd;
while (cin >> xs, !cin.eof ())
{
// Add the surname to the map's keys
cinclear (cin);
map<string, vector<pair<string, string> > >::iterator ret
= hzsr.find (xs);
if (ret != hzsr.end ())
{
cout << "\n\t家族姓氏 " << xs << " 已存在!\n" << endl;
haizi = ret->second;
}
cout << "\n输入孩子的名字: " << endl;
while (cin >> mz, !cin.eof ())
{
cout << "\n输入孩子的生日: " << endl;
cinclear (cin);
cin >> sr;
cinclear (cin);
mzsr.first = mz;
mzsr.second = sr;
haizi.push_back (mzsr);
cout << "\n请确认是否继续添加 " << xs
<< " 家族的孩子(Y/N):" << endl;
cin >> pd;
if (pd == "N")
break;
cout << "\n请输入新的孩子的名字: " << endl;
}
pair<map<string, vector<pair<string, string> > >::iterator, bool>
cs = hzsr.insert (make_pair (xs, haizi));
if (!cs.second)
{
(cs.first)->second = haizi; // Update the data
cout << "\n\t提示: " << xs << " 家族已更新\n" << endl;
}
else
{
cout << "\n\t提示:" << xs << " 家族已添加\n" << endl;
}
cinclear (cin);
cout << "\n请确认是否继续添加新的家族(Y/N)" << endl;
cin >> pd;
if (pd == "N")
break;
cout << "\n-------------------------------\n\n请输入新的家族姓氏: "
<< endl;
}
cinclear (cin);
// --------------------------------------------------------------------------------
cout << "\n\n---------------------------------------" << endl;
cout << "\t----查询系统----\n" << endl;
cout << "\n请输入家族姓氏" << endl;
while (cin >> xs, !cin.eof ())
{
cinclear (cin);
map<string, vector<pair<string, string> > >::iterator ret
= hzsr.find (xs);
if (ret != hzsr.end ())
{
cout << xs << " 家族的孩子生日:\n" << endl;
vector<pair<string, string> >::iterator vit
= (ret->second).begin ();
while (vit != (ret->second).end ())
{
cout << "姓名: " << vit->first
<< "\t\t生日: " << vit->second << endl;
++vit;
}
}
else
cout << xs << " 家族没有记录" << endl;
cout << "\n是否继续查询(Y/N):" << endl;
cin >> pd;
if (pd == "N")
break;
cout << "\n继续请输入家族姓氏" << endl;
}
return 0;
}
cinclear.cpp (clears the cin buffer):
#include "cinclear.h"
#include <cstdio>
#include <iostream>
using std::cin;
using std::istream;
using std::wcin;
void
cinclear (istream &dd)
{
dd.ignore ();
dd.clear ();
dd.sync ();
fflush (stdin);
rewind (stdin);
setbuf (stdin, NULL);
return;
}
cinclear.h:
Output:
|
With GBK enabled in Git Bash, the program also runs correctly. The terminal should be friendlier to people who are just starting to learn C++.
|
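This problem is reproducible on my computer; recompiling produces the same problem, and garbled text can also show up when entering families with different surnames. |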
E:\sc>zj
unix2dos: converting file xiti10_23.cpp to DOS format...
unix2dos: converting file cinclear.cpp to DOS format...
E:\sc>type xiti10_23.cpp
#include "cinclear.h"
#include <iostream>
#include <map>
#include <string>
#include <vector>
using std::cin;
using std::cout;
using std::endl;
using std::map;
using std::string;
using std::vector;
int
main ()
{
cout << "请输入排除单词:" << endl;
string pc;
vector<string> pcj;
while (cin >> pc, !cin.eof ())
{
cinclear (cin);
pcj.push_back (pc);
}
cinclear (cin);
cout << "请输入单词:" << endl;
bool pd = false;
while (cin >> pc, !cin.eof ())
{
pd = false;
cinclear (cin);
for (vector<string>::iterator vt = pcj.begin (); vt != pcj.end ();
++vt)
if ((*vt) == pc)
{
pd = true;
}
if (!pd)
cout << pc << endl;
}
return 0;
}
/*
使用set的好处:首先可以排除排除集中重复的单词;其次可以使用count或find运算来
检查单词是否出现在排除集中,而不是像vector用循环比较来完成.
*/
E:\sc>chcp
活动代码页: 936
E:\sc>by
------------run----------------
请输入排车ゴ?
中文
^Z
请输入单词:
中文
??
^Z
------------over--------------- return: 0
请按任意键继续. . .
E:\sc>
Git Bash shows the same problem, but only the characters after "除" are garbled, and it does not affect use of the program.
My vim configuration file:
|
Everything is correct in your code, except that you don't explicitly indicate what type of encoding your program uses. Every console in the wild has a runtime state for the I/O encoding type. The text stream encoding type in the console may not match the encoding type in your program. Therefore, when you run your program, you must explicitly configure the console according to the encoding type you are using. You have two mutually exclusive encoding options to choose from:
First. Second. Modified:
#include "cinclear.h"
#include <iostream>
#include <map>
#include <string>
#include <vector>
// Mandatory code block on windows.
#ifdef _WIN32
#include "windows.h"
namespace winapi_cp_state
{
static UINT ou_state = GetConsoleOutputCP(); // Save original system code pages.
static UINT in_state = GetConsoleCP(); //
static void set_page(UINT out, UINT in) { SetConsoleOutputCP(out); SetConsoleCP(in); }
static void set_page() { set_page(ou_state, in_state); }
// Uncomment to use UTF-8
//static int _state = (set_page(CP_UTF8, CP_UTF8), ::atexit(set_page)); // Set to UTF-8 and always restore original system code pages at exit.
// Uncomment in case of using national code page GBK(OEM-936).
static int _state = (set_page(936, 936), ::atexit(set_page)); // Set to GBK(OEM-936) and always restore original system code pages at exit.
}
#endif
using namespace std;
int
main ()
{
cout << "请输入排除单词:" << endl;
string pc;
vector<string> pcj;
while (cin >> pc, !cin.eof ())
{
cinclear (cin);
pcj.push_back (pc);
}
cinclear (cin);
cout << "请输入单词:" << endl;
bool pd = false;
while (cin >> pc, !cin.eof ())
{
pd = false;
cinclear (cin);
for (vector<string>::iterator vt = pcj.begin (); vt != pcj.end ();
++vt)
if ((*vt) == pc)
{
pd = true;
}
if (!pd)
cout << pc << endl;
}
return 0;
}
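/*
Benefits of using a set: first, it eliminates duplicate words in the exclusion set; second, count or find can be used to
check whether a word appears in the exclusion set, instead of a comparison loop as with a vector.
*/ |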
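The save, set, and restore steps for the console code pages can also be wrapped in a small scope guard, so the original code pages come back whenever the guard goes out of scope. This is only a sketch: the class name `ConsoleCodePageGuard` is hypothetical, and the non-Windows branch contains stand-in functions (not the real API) so the listing compiles elsewhere. Unlike an `atexit` handler, a guard declared in `main` does not run if the program leaves through `exit()`.

```cpp
#ifdef _WIN32
#include <windows.h>
#else
// Stand-ins, NOT the real Windows API: they only mimic the save/set/restore
// behaviour so that this sketch also compiles and runs off Windows.
typedef unsigned int UINT;
static UINT fake_out_cp = 936;
static UINT fake_in_cp = 936;
static UINT GetConsoleOutputCP() { return fake_out_cp; }
static UINT GetConsoleCP() { return fake_in_cp; }
static int SetConsoleOutputCP(UINT cp) { fake_out_cp = cp; return 1; }
static int SetConsoleCP(UINT cp) { fake_in_cp = cp; return 1; }
#endif

// Hypothetical RAII guard: switches both console code pages on construction
// and restores the previous values on destruction.
class ConsoleCodePageGuard
{
public:
    explicit ConsoleCodePageGuard(UINT cp)
        : saved_out_(GetConsoleOutputCP()), saved_in_(GetConsoleCP())
    {
        SetConsoleOutputCP(cp);
        SetConsoleCP(cp);
    }
    ~ConsoleCodePageGuard()
    {
        SetConsoleOutputCP(saved_out_);
        SetConsoleCP(saved_in_);
    }

private:
    // Non-copyable, declared the C++98 way (private and undefined).
    ConsoleCodePageGuard(const ConsoleCodePageGuard&);
    ConsoleCodePageGuard& operator=(const ConsoleCodePageGuard&);

    UINT saved_out_;
    UINT saved_in_;
};
```

A typical use would be to declare `ConsoleCodePageGuard guard(936);` (or `65001` for UTF-8) as the first statement of `main`, matching whichever encoding the source file and string literals actually use.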
I filed a feature request to add the mandatory code block (specified in my previous comment here) to the |
This feature request may be difficult to get accepted. Microsoft tends not to make changes that could stop old programs from running normally; as long as a large project runs normally, there is no need to take risks. The threshold for writing code for anything Microsoft-related is very high. Thank you very much for helping me. |
#include <Windows.h>
#include <iostream>
#include <string>
#include <wchar.h>
using std::cout;
using std::endl;
using std::string;
using std::wcin;
using std::wistream;
using std::wstring;
void
co (std::string utf8Str)
{
UINT out_cp = GetConsoleOutputCP ();
UINT inp_cp = GetConsoleCP ();
SetConsoleOutputCP (CP_UTF8);
SetConsoleCP (CP_UTF8);
std::cout << utf8Str << std::flush;
SetConsoleOutputCP (out_cp);
SetConsoleCP (inp_cp);
return;
}
wistream &
ci (wistream &aa, string &ss)
{
UINT out_cp = GetConsoleOutputCP ();
UINT inp_cp = GetConsoleCP ();
SetConsoleOutputCP (CP_UTF8);
SetConsoleCP (CP_UTF8);
std::wstring input;
wchar_t buffer[1000];
DWORD length = 1000;
DWORD count;
ReadConsoleW (GetStdHandle (STD_INPUT_HANDLE), buffer, length, &count, 0);
input = std::wstring (buffer, count);
int utf8Length = WideCharToMultiByte (CP_UTF8, 0, input.c_str (), -1,
nullptr, 0, nullptr, nullptr);
char *utf8Str = new char[utf8Length];
WideCharToMultiByte (CP_UTF8, 0, input.c_str (), -1, utf8Str, utf8Length,
nullptr, nullptr);
ss.assign (utf8Str, utf8Length - 1);
delete[] utf8Str;
SetConsoleOutputCP (out_cp);
SetConsoleCP (inp_cp);
return aa;
}
int
main ()
{
co ("中文除法\n");
string dd;
while (ci (wcin, dd))
co (dd);
return 0;
}
The ReadConsoleW function cannot be terminated through EOF, which has hindered my debugging. I'm not very good at using the ReadConsoleW function. I originally planned to use wcin to process input, but the result was garbled. There are no examples for reference, and I am a beginner in C++. I need your help. |
You should catch the control characters yourself (^Z aka EOF, \n aka LF and \r aka CR):
#include <Windows.h>
#include <iostream>
#include <string>
#include <wchar.h>
void
co(std::string utf8Str)
{
UINT out_cp = GetConsoleOutputCP();
UINT inp_cp = GetConsoleCP();
SetConsoleOutputCP(CP_UTF8);
SetConsoleCP(CP_UTF8);
std::cout << utf8Str << std::endl << std::flush;
SetConsoleOutputCP(out_cp);
SetConsoleCP(inp_cp);
return;
}
bool
ci(std::string& ss)
{
UINT out_cp = GetConsoleOutputCP();
UINT inp_cp = GetConsoleCP();
SetConsoleOutputCP(CP_UTF8);
SetConsoleCP(CP_UTF8);
std::wstring input;
wchar_t buffer[1000];
DWORD length = 1000;
DWORD count;
ReadConsoleW(GetStdHandle(STD_INPUT_HANDLE), buffer, length, &count, 0);
input = std::wstring(buffer, count);
// Pop all trailing '\r\n'
while (input.size() && (input.back() == '\n' || input.back() == '\r'))
input.pop_back();
// EOF/^Z detection
bool eof = input.empty() || input.back() == 26;
if (!eof)
{
int utf8Length = WideCharToMultiByte(CP_UTF8, 0, input.c_str(), -1,
nullptr, 0, nullptr, nullptr);
char* utf8Str = new char[utf8Length];
WideCharToMultiByte(CP_UTF8, 0, input.c_str(), -1, utf8Str, utf8Length,
nullptr, nullptr);
ss.assign(utf8Str, utf8Length - 1); // Exclude the terminating null that is counted when the input length is -1.
delete[] utf8Str;
}
SetConsoleOutputCP(out_cp);
SetConsoleCP(inp_cp);
return !eof;
}
int
main()
{
co("中文除法\n");
std::string dd;
while (ci(dd))
co(dd);
return 0;
} |
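Since back() and pop_back() on strings are C++11 additions and the compiler mentioned earlier in this thread lacks back(), the trimming and ^Z check can be written with only C++98 facilities. The sketch below is one way to do it; the function name `trim_line_and_check_eof` is hypothetical, and treating a line whose first character is Ctrl+Z (code 26) as end of input, while not treating an empty line as EOF, are choices made for this sketch.

```cpp
#include <string>

// Hypothetical helper: removes trailing '\r' and '\n' from `line` in place and
// returns true when the remaining text starts with Ctrl+Z (code 26), which
// this sketch treats as end of input. Uses only C++98 string members.
bool trim_line_and_check_eof(std::wstring& line)
{
    std::wstring::size_type end = line.size();
    while (end > 0 && (line[end - 1] == L'\n' || line[end - 1] == L'\r'))
        --end;
    line.erase(end); // drop everything from the first trailing CR/LF onward
    return !line.empty() && line[0] == 26;
}
```

The helper would be called on the text returned by ReadConsoleW before converting it to UTF-8, and the read loop would stop when it returns true.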
@November20 It is better to create a new test project of your own on GitHub to discuss the nuances of console functions. Please create a new Public repository in your GitHub profile and I'll help you deal with it there. |
I would be very happy to learn from you, but it may waste your time and you won't get anything in return. I have only just reached the section of the book that introduces associative containers. I am only working toward my own goals and not helping with testing. I think I should finish reading the book first and move on to Linux and GCC. When I learn the Windows console later, I will be able to understand your code better. |
Feel free to ask me, I'll be happy to advise. |
Windows Terminal version
No response
Windows build number
10.0.22621.1555
Other Software
No response
Steps to reproduce
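When writing my Unicode program, I hard-coded the C-style string "中文" and printed it to the console with std::cout without problems, for example std::cout << "中文" << std::endl;
But I cannot read Chinese into a string with std::cin, for example string s1;std::cin >> s1; I would like to know what encoding the Chinese text entered at the console actually uses, such that the program cannot read it.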
Expected Behavior
No response
Actual Behavior
Problem with Chinese input from the console