[LLVM Support] encoding problem of ExecuteAndWait on windows #161679

@16bit-ykiko

I am developing a new tool based on LLVM. I use ExecuteAndWait to run a program, and an error occurs during execution:

int sys::ExecuteAndWait(StringRef Program, ArrayRef<StringRef> Args,
                        std::optional<ArrayRef<StringRef>> Env,
                        ArrayRef<std::optional<StringRef>> Redirects,
                        unsigned SecondsToWait, unsigned MemoryLimit,
                        std::string *ErrMsg, bool *ExecutionFailed,
                        std::optional<ProcessStatistics> *ProcStat,
                        BitVector *AffinityMask) {
  assert(Redirects.empty() || Redirects.size() == 3);
  ProcessInfo PI;
  if (Execute(PI, Program, Args, Env, Redirects, MemoryLimit, ErrMsg,
              AffinityMask, /*DetachProcess=*/false)) {
    if (ExecutionFailed)
      *ExecutionFailed = false;
    ProcessInfo Result = Wait(
        PI, SecondsToWait == 0 ? std::nullopt : std::optional(SecondsToWait),
        ErrMsg, ProcStat);
    return Result.ReturnCode;
  }
  if (ExecutionFailed)
    *ExecutionFailed = true;
  return -1;
}

I get the error message "Couldn't execute program 'clang++.exe': �������� (0x57)". This is clearly an encoding error, as my console is UTF-8 encoded. After reviewing the relevant code, I located the source of the problem:

bool MakeErrMsg(std::string *ErrMsg, const std::string &prefix) {
  if (!ErrMsg)
    return true;
  char *buffer = NULL;
  DWORD LastError = GetLastError();
  DWORD R = FormatMessageA(FORMAT_MESSAGE_ALLOCATE_BUFFER |
                               FORMAT_MESSAGE_FROM_SYSTEM |
                               FORMAT_MESSAGE_MAX_WIDTH_MASK,
                           NULL, LastError, 0, (LPSTR)&buffer, 1, NULL);
  if (R)
    *ErrMsg = prefix + ": " + buffer;
  else
    *ErrMsg = prefix + ": Unknown error";
  *ErrMsg += " (0x" + llvm::utohexstr(LastError) + ")";
  LocalFree(buffer);
  return R != 0;
}

This function uses the system's default ANSI codepage for formatting error messages. For my region (Simplified Chinese), this codepage is GBK, which results in encoding errors when the output is consumed by UTF-8 applications.
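To illustrate why a UTF-8 consumer chokes on this output, here is a minimal sketch of a UTF-8 well-formedness check. The helper `isValidUtf8` and the two sample constants are hypothetical names, not part of LLVM. The sample bytes assume the system text for 0x57 on a Simplified Chinese installation is 参数错误。, which I have not verified on the affected machine. It would be consistent with the eight replacement characters seen above, although the exact count depends on how the decoder substitutes invalid sequences.

```cpp
#include <cstddef>
#include <string>

// Assumed sample: "参数错误。" encoded as GBK (code page 936).
const std::string kGbkSample = "\xB2\xCE\xCA\xFD\xB4\xED\xCE\xF3\xA1\xA3";
// The same five characters encoded as UTF-8.
const std::string kUtf8Sample =
    "\xE5\x8F\x82\xE6\x95\xB0\xE9\x94\x99\xE8\xAF\xAF\xE3\x80\x82";

// Hypothetical helper: returns true if `s` is well-formed UTF-8.
// Rejects stray continuation bytes, truncated sequences, overlong
// encodings, surrogates, and code points above U+10FFFF.
bool isValidUtf8(const std::string &s) {
  static const unsigned kMinCodePoint[] = {0, 0, 0x80, 0x800, 0x10000};
  std::size_t i = 0;
  const std::size_t n = s.size();
  while (i < n) {
    const unsigned char lead = static_cast<unsigned char>(s[i]);
    std::size_t len = 0;
    unsigned cp = 0;
    if (lead < 0x80) {
      len = 1;
      cp = lead;
    } else if ((lead & 0xE0) == 0xC0) {
      len = 2;
      cp = lead & 0x1F;
    } else if ((lead & 0xF0) == 0xE0) {
      len = 3;
      cp = lead & 0x0F;
    } else if ((lead & 0xF8) == 0xF0) {
      len = 4;
      cp = lead & 0x07;
    } else {
      return false; // continuation byte or invalid byte in lead position
    }
    if (i + len > n)
      return false; // sequence is cut off by the end of the string
    for (std::size_t k = 1; k < len; ++k) {
      const unsigned char cont = static_cast<unsigned char>(s[i + k]);
      if ((cont & 0xC0) != 0x80)
        return false; // expected a 10xxxxxx continuation byte
      cp = (cp << 6) | (cont & 0x3F);
    }
    if (cp < kMinCodePoint[len])
      return false; // overlong encoding
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
      return false; // out of range or surrogate
    i += len;
  }
  return true;
}
```

With these samples, `isValidUtf8(kGbkSample)` returns false because the first byte, 0xB2, can only appear as a continuation byte in UTF-8, while `isValidUtf8(kUtf8Sample)` returns true.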

Ideally, I would like to retrieve the error message in UTF-8 encoding.

Considering that the prefix for the error message is already in English, should we consider forcing the returned system message to always be in English to avoid these encoding issues?

Alternatively, perhaps we could add a new parameter to allow users to request the error message in UTF-8 specifically?
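Both ideas could be sketched roughly as follows. This is not LLVM code: `formatSystemErrorUtf8` and its `preferEnglish` parameter are hypothetical names used only for illustration. On Windows it calls `FormatMessageW` and converts the UTF-16 result with `WideCharToMultiByte(CP_UTF8, ...)`; if English text is requested but not available, it falls back to the default language. The non-Windows branch exists only so the sketch compiles elsewhere.

```cpp
#include <string>
#include <system_error>

#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper (not an LLVM API): describe a system error code as a
// UTF-8 string, optionally asking Windows for US English text first.
std::string formatSystemErrorUtf8(unsigned long code, bool preferEnglish) {
#ifdef _WIN32
  const DWORD flags = FORMAT_MESSAGE_ALLOCATE_BUFFER |
                      FORMAT_MESSAGE_FROM_SYSTEM |
                      FORMAT_MESSAGE_IGNORE_INSERTS |
                      FORMAT_MESSAGE_MAX_WIDTH_MASK;
  wchar_t *wide = nullptr;
  DWORD len = 0;
  if (preferEnglish) {
    // May fail if English message resources are not installed.
    len = FormatMessageW(flags, nullptr, code,
                         MAKELANGID(LANG_ENGLISH, SUBLANG_ENGLISH_US),
                         reinterpret_cast<LPWSTR>(&wide), 0, nullptr);
  }
  if (len == 0) {
    // Language ID 0 lets the system pick the language.
    len = FormatMessageW(flags, nullptr, code, 0,
                         reinterpret_cast<LPWSTR>(&wide), 0, nullptr);
  }
  if (len == 0)
    return "Unknown error";
  // First call computes the UTF-8 size in bytes, second call converts.
  const int bytes = WideCharToMultiByte(CP_UTF8, 0, wide,
                                        static_cast<int>(len), nullptr, 0,
                                        nullptr, nullptr);
  std::string out(bytes > 0 ? static_cast<size_t>(bytes) : 0, '\0');
  if (bytes > 0)
    WideCharToMultiByte(CP_UTF8, 0, wide, static_cast<int>(len), &out[0],
                        bytes, nullptr, nullptr);
  LocalFree(wide);
  return out.empty() ? std::string("Unknown error") : out;
#else
  // Fallback so the sketch builds on other platforms; the language
  // preference has no effect here.
  (void)preferEnglish;
  return std::system_category().message(static_cast<int>(code));
#endif
}
```

A real patch would presumably reuse the UTF-16 to UTF-8 conversion helpers that LLVM's Windows support code already has rather than calling `WideCharToMultiByte` directly. Going through the wide API also means the result no longer depends on the ANSI code page, whichever language is chosen.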

My main motivation is to avoid writing platform-specific code in my own project. As a user of this library, I want to avoid explicitly checking for Windows (e.g., #ifdef _WIN32), including <Windows.h>, and handling the encoding conversion myself.
