Parse wchar_t args correctly #222
Comments
Can you preprocess the |
I cannot preprocess the "wmain" arguments into a vector of strings, because converting certain wchar_t symbols to a standard string loses data. I need users to be able to pass the path to the output or input file as arguments, and file names are often in the user's language (Chinese, Ukrainian, etc.). But if I try to convert the user input to a standard string and open the file, I get a filename like this: |
You could keep your arguments as The following works fine on Godbolt.
// clang-format off
#include <https://raw.githubusercontent.com/p-ranav/argparse/master/include/argparse/argparse.hpp>
// clang-format on
#include <codecvt>
#include <exception>
#include <iostream>
#include <locale>
#include <string>
#include <typeinfo>
#include <vector>
int main(int argc, char *argv[])
{
    std::wstring arg = L"Hello, 世界.txt";
    /// Convert to std::string (UTF-8).
    /// Note: std::wstring_convert and <codecvt> are deprecated since C++17.
    using convert_type = std::codecvt_utf8<wchar_t>;
    std::wstring_convert<convert_type, wchar_t> converter;
    std::string arg_as_string = converter.to_bytes(arg);
    /// Create argument parser
    argparse::ArgumentParser program("program_name");
    program.add_argument("filename");
    try
    {
        /// Parse vector of strings
        program.parse_args(std::vector<std::string>{"./program", arg_as_string});
    }
    catch (const std::exception &err)
    {
        std::cerr << "argparse failed with: " << typeid(err).name() << " " << err.what() << "\n";
        std::cerr << program;
        return 1;
    }
    auto input = program.get("filename");
    std::cout << input << "\n"; /// Hello, 世界.txt
} |
I don't know. Maybe it works on Godbolt because all locales and encodings are installed there, but on my PC this program prints |
@p-ranav Beat me to it. Yes, convert to UTF-8 for internal use and no code points should be mangled. @Theodikes When sending UTF-8 strings back out to the console or filesystem, you may need to convert to the user's locale. |
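A minimal sketch of the "convert to UTF-8 for internal use" step, written without `<codecvt>`. It assumes `wchar_t` holds UTF-16 code units (as on Windows) or UTF-32 code points (as on most Unix-like systems). The helper names `append_utf8`, `wide_to_utf8`, and `wide_args_to_utf8` are hypothetical and are not part of argparse.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Append one Unicode code point to `out`, encoded as UTF-8.
void append_utf8(std::string &out, char32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Convert a wide string to UTF-8. Surrogate pairs (UTF-16) are combined;
// unpaired surrogates and out-of-range values become U+FFFD.
std::string wide_to_utf8(const std::wstring &in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = static_cast<char32_t>(in[i]);
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()) {
            char32_t low = static_cast<char32_t>(in[i + 1]);
            if (low >= 0xDC00 && low <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
                ++i; // the low surrogate has been consumed
            }
        }
        if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF) {
            cp = 0xFFFD; // replacement character
        }
        append_utf8(out, cp);
    }
    return out;
}

// Convert wmain-style arguments to a vector of UTF-8 strings.
std::vector<std::string> wide_args_to_utf8(int argc, const wchar_t *const *argv) {
    std::vector<std::string> out;
    for (int i = 0; i < argc; ++i) {
        out.push_back(wide_to_utf8(argv[i]));
    }
    return out;
}
```

The resulting vector could then be handed to `program.parse_args(...)` in the same way as the `std::vector<std::string>` in the Godbolt example above.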
The problem is that users may process files whose names are not in their locale. For example, I often see Ukrainian file names, although I do not have that locale installed. In addition, how do you correctly determine each user's locale? For example, I use en-US Windows, but my native language is non-ASCII. Isn't it easier to add support for wstring and wmain, which would immediately eliminate all the locale problems? |
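One possible way to sidestep the locale question, sketched under assumptions: keep UTF-8 inside the program and convert back to a wide string only at the point where a file is opened, so the user's code page is never involved. The function name `utf8_to_wide` is hypothetical. The decoder does only basic validation (it does not reject overlong encodings) and replaces malformed input with U+FFFD.

```cpp
#include <cstddef>
#include <string>

// Decode UTF-8 into a wide string. Emits UTF-16 surrogate pairs when
// wchar_t is 2 bytes wide, and plain code points otherwise.
std::wstring utf8_to_wide(const std::string &in) {
    std::wstring out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp = 0xFFFD; // default: replacement character
        std::size_t len = 1;
        if (lead < 0x80) {
            cp = lead;
        } else if ((lead & 0xE0) == 0xC0) {
            cp = lead & 0x1F;
            len = 2;
        } else if ((lead & 0xF0) == 0xE0) {
            cp = lead & 0x0F;
            len = 3;
        } else if ((lead & 0xF8) == 0xF0) {
            cp = lead & 0x07;
            len = 4;
        }
        if (i + len > in.size()) {
            cp = 0xFFFD; // truncated sequence
            len = 1;
        } else {
            for (std::size_t k = 1; k < len; ++k) {
                unsigned char c = static_cast<unsigned char>(in[i + k]);
                if ((c & 0xC0) != 0x80) {
                    cp = 0xFFFD; // missing continuation byte
                    len = 1;
                    break;
                }
                cp = (cp << 6) | (c & 0x3F);
            }
        }
        if (cp > 0x10FFFF) {
            cp = 0xFFFD;
        }
        if (sizeof(wchar_t) == 2 && cp >= 0x10000) {
            cp -= 0x10000;
            out += static_cast<wchar_t>(0xD800 + (cp >> 10));
            out += static_cast<wchar_t>(0xDC00 + (cp & 0x3FF));
        } else {
            out += static_cast<wchar_t>(cp);
        }
        i += len;
    }
    return out;
}
```

On Windows, the wide result can be passed to a wide-character file API, or used to construct a `std::filesystem::path` (C++17) that is then given to `std::ifstream`. Whether that fits a particular program is a design choice this sketch does not settle.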
Welcome to the world of multi-language programming. I don't mean that sarcastically, cross language coding is full of challenges.
Only on Windows. |
Yes, I know it's not easy, but why deliberately complicate your life when you can just use wstring?
Yes, I know. But my program is a simple 300 KB exe file that only needs to work on Windows... |
Not everyone uses UTF-16 and
Have you considered forking |
I know. But I was not thinking of a change that would hurt cross-platform support. I was thinking of adding a separate function for wmain parsing, for example, so that in addition to |
I don't believe it is as simple as adding Would you be interested in templatizing Every class, function, and statement that uses I don't know if this is workable or would be acceptable to @p-ranav. |
No, it doesn't work. Error: class "ArgumentParser" may not have a template argument list |
Could you push your changes to a fork? I'd like to see what is not working. |
Have you made any changes to the code? If not, it won't work as things are today.
As @skrobinson says, we'd have to templatize everything on the string type first. Then, you could use I'm not sure yet if all of the string algorithms we use are directly applicable to multi-byte character strings or if there are any assumptions based on |
Looked through the entire source code, changed everything to |
Here comes the possibly non-trivial effort, as @p-ranav put it. We have |
Yes, I have already done this in another, simpler way (I think). I deleted
It handles any number correctly, including scientific notation, octal and hexadecimal. But then I spent more than four hours hunting for a bug: the program handled negative numeric arguments incorrectly. Now everything works. Maybe later I will add compatibility with the current version so that the library can work with both wmain and main, and add tests, but for now I have just made a working draft version (fork). Issue solved, thanks for the help. |
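For readers wondering how numeric arguments can be parsed directly from wide strings, here is a minimal sketch using the standard `std::wcstol` and `std::wcstod`. It is not necessarily what the fork does; the function names `parse_wide_integer` and `parse_wide_double` are hypothetical. Passing base 0 lets `std::wcstol` detect decimal, octal (leading `0`) and hexadecimal (leading `0x`) input, and a leading minus sign is accepted by both functions.

```cpp
#include <cerrno>
#include <cwchar>
#include <string>

// Parse a whole wide string as an integer. Returns false if the string is
// empty, has trailing characters, or the value is out of range for long.
bool parse_wide_integer(const std::wstring &s, long &out) {
    if (s.empty()) {
        return false;
    }
    wchar_t *end = nullptr;
    errno = 0;
    long value = std::wcstol(s.c_str(), &end, 0); // base 0: auto-detect
    if (errno == ERANGE || end != s.c_str() + s.size()) {
        return false;
    }
    out = value;
    return true;
}

// Parse a whole wide string as a floating-point number, including
// scientific notation such as "1e3".
bool parse_wide_double(const std::wstring &s, double &out) {
    if (s.empty()) {
        return false;
    }
    wchar_t *end = nullptr;
    errno = 0;
    double value = std::wcstod(s.c_str(), &end);
    if (errno == ERANGE || end != s.c_str() + s.size()) {
        return false;
    }
    out = value;
    return true;
}
```

A check like this can also help with the negative-number case: if a token that starts with `-` parses completely as a number, it can be treated as a value rather than as an option name.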
Since users of my program can input non-ascii-encoded strings, it should be possible to parse arguments like this:
vector<wstring> args = program.get<vector<wstring>>("--some-arg");
With standard const char** args this of course doesn't work. I tried to use int wmain(int argc, const wchar_t** argv), but program.parse_args(argc, argv) doesn't work with wchars.