<filesystem>: sometimes path can not work with the string under chinese environment correctly #469

Open
creatorlxd opened this issue Jan 28, 2020 · 11 comments
Labels
bug Something isn't working filesystem C++17 filesystem

Comments

@creatorlxd

Describe the bug
When I call std::filesystem::path::generic_string, Visual Studio breaks into the debugger and reports that the std::string object cannot be converted to wide characters. I found that std::filesystem::_Check_convert_result returns zero at that point. I then tried calling setlocale(LC_ALL, "zh-CN.UTF-8"); to solve the problem. However, the problem occurs again, but in a different place: this time it happens when I try to construct a path from the string. It seems that the string still cannot be converted correctly.

STL version (git commit or Visual Studio version): latest VS2017

@StephanTLavavej
Member

Can you please provide a self-contained test case demonstrating the problem? We need to see the exact source code you're using.

@StephanTLavavej StephanTLavavej changed the title <filesystem>sometimes path can not work with the string under chinese environment correctly <filesystem>: sometimes path can not work with the string under chinese environment correctly Jan 28, 2020
@StephanTLavavej StephanTLavavej added bug Something isn't working info needed We need more info before working on this labels Jan 28, 2020
@BillyONeal
Member

Also note that I believe the CRT will not accept .UTF-8 as a valid locale setting, so I bet the most likely outcome is that this setlocale call is failing.
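
One way to check whether the call is failing is to test the return value, since setlocale returns a null pointer when the request is rejected and leaves the locale unchanged. A minimal sketch follows; the helper name try_set_locale is hypothetical and only for illustration:

```cpp
#include <clocale>
#include <optional>
#include <string>

// Hypothetical helper: returns the locale name the C runtime actually
// selected, or std::nullopt when the requested name was rejected.
std::optional<std::string> try_set_locale(const char* name) {
    const char* result = std::setlocale(LC_ALL, name);
    if (result == nullptr) {
        // The request failed and the previous locale is still active.
        return std::nullopt;
    }
    // Copy immediately: the returned buffer may be overwritten by later calls.
    return std::string(result);
}
```

Calling try_set_locale("zh-CN.UTF-8") and checking has_value() before printing would show whether the runtime accepted the name on a given machine.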

@BillyONeal
Member

(There is some form of beta UTF-8 support, but I think it needs to be opted into on a machine-by-machine basis in the settings pages.)

@creatorlxd
Author

Test Case

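#include <clocale>
#include <filesystem>
#include <iostream>

using namespace std;
using namespace std::filesystem;

int main()
{
	// setlocale returns a null pointer on failure; streaming a null char* is undefined behavior.
	auto re = setlocale(LC_ALL, "zh-CN.UTF-8");
	cout << (re ? re : "setlocale failed") << endl;
	// Narrow literal: its bytes depend on the compiler's execution character set.
	path p1("大象无形  虚幻引擎程序设计浅析_14181715");
	path p2(L"d:\\Document\\CppLanguage\\CppCon2017\\Lightning Talks and Lunch Sessions\\A C++20 Preview - operator˂=˃");
	// Converts the wide native string to a narrow string.
	auto str = p2.generic_string();
	return 0;
}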

On my machine, the setlocale call succeeds. In this test case, whether or not I call setlocale, one of the path objects triggers the error I described above.

@StephanTLavavej StephanTLavavej removed the info needed We need more info before working on this label Jan 30, 2020
@HenryAWE

Try this

using namespace std::filesystem;
path p1 = u8path(u8"你要的中文");

See here

@BillyONeal
Member

@HenryAWE Still looks correct to me:

image

@BillyONeal
Member


In this example I observe that the compiler has already converted the values to ????s before they ever get to fs::path. If I prevent the compiler from doing that by adding the "u8" prefix, as in:

path p1(u8"大象无形  虚幻引擎程序设计浅析_14181715");

it works fine.
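
The same idea applies to strings that are not literals: if the bytes are known to be UTF-8, the path can be built without going through the active code page. This is a rough sketch assuming C++17, where std::filesystem::u8path is available (it is deprecated in C++20). The helper names path_from_utf8 and utf8_length are hypothetical:

```cpp
#include <cstddef>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: build a path from bytes that are known to be UTF-8,
// independent of the active code page or the current locale.
fs::path path_from_utf8(const std::string& utf8_bytes) {
    // C++17 API; C++20 deprecates it in favor of char8_t-based constructors.
    return fs::u8path(utf8_bytes);
}

// Hypothetical helper: number of UTF-8 code units in the path, used here
// only to check that the bytes survive a round trip.
std::size_t utf8_length(const fs::path& p) {
    return p.u8string().size();
}
```

Spelling the bytes out with escapes, for example "\xE4\xBD\xA0\xE5\xA5\xBD", keeps the input independent of how the compiler decodes the source file.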

@creatorlxd
Author

But in fact, the error occurred when I got these strings from the filesystem: when my program finds files with these strings as their names, the error occurs. I don't write these strings in my code.
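
For names that come from the filesystem, one defensive pattern is to fall back to UTF-8 when the narrow conversion fails. The sketch below assumes the failure surfaces as a std::system_error (or a class derived from it), which has not been confirmed in this thread. The helper name generic_string_or_utf8 is hypothetical:

```cpp
#include <filesystem>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: try the narrow conversion first and fall back to
// UTF-8 bytes if the path cannot be represented in the narrow encoding.
std::string generic_string_or_utf8(const fs::path& p) {
    try {
        return p.generic_string();
    } catch (const std::system_error&) {
        // Assumption: conversion failures are reported via std::system_error.
        const auto utf8 = p.generic_u8string();
        // Copy code units so this compiles whether the result is
        // std::string (C++17) or std::u8string (C++20).
        return std::string(utf8.begin(), utf8.end());
    }
}
```

The caller cannot tell which encoding the result uses, so this suits logging or diagnostics better than passing the string back to narrow-string APIs.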

@BillyONeal
Member

@creatorlxd There's no guarantee that the strings on the filesystem are representable in the active code page, even if the active code page is UTF-8. Can you get an example of the raw bytes from .native() we can use as an example?

Thanks!
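
Something like the following could capture those raw code units as hex. It is only a sketch, and the helper name dump_native_units is hypothetical. It prints each element of .native() at its own width, so UTF-16 units appear as four hex digits on Windows and bytes as two hex digits elsewhere:

```cpp
#include <filesystem>
#include <iomanip>
#include <sstream>
#include <string>
#include <type_traits>

// Hypothetical helper: render every code unit of p.native() as
// space-separated, zero-padded uppercase hex.
std::string dump_native_units(const std::filesystem::path& p) {
    std::ostringstream out;
    out << std::hex << std::uppercase << std::setfill('0');
    bool first = true;
    for (auto unit : p.native()) {
        // Go through the unsigned type so bytes >= 0x80 are not sign-extended.
        using Unsigned = std::make_unsigned_t<decltype(unit)>;
        if (!first) {
            out << ' ';
        }
        first = false;
        out << std::setw(static_cast<int>(sizeof(unit) * 2))
            << static_cast<unsigned long>(static_cast<Unsigned>(unit));
    }
    return out.str();
}
```

Pasting that output into the issue avoids any re-encoding by the console or the clipboard.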

@creatorlxd
Author

@BillyONeal I'd like to give a more accurate example. However, after switching my IDE to VS2019, I can no longer reproduce this error. It seems that the string I get from the filesystem is different from the old one. When I use the old string, which was saved in a file, the program aborts, but after I deleted the file and generated the information again, the error disappeared. So I'm sorry that I cannot give you the example you want. However, the test case I gave does not work in VS2019 either.

@BillyONeal
Member

Maybe there was a bug here that got fixed? I do know that UTF-8 anything was a very late breaking feature when the last 2017 version shipped.
