Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

<cctype> abuse #2340

Open
riastradh opened this issue Apr 27, 2024 · 0 comments · May be fixed by #2341
Open

<cctype> abuse #2340

riastradh opened this issue Apr 27, 2024 · 0 comments · May be fixed by #2341

Comments

@riastradh
Copy link

Various parts of ledger pass values of type char, derived from arbitrary user input, into the standard C++ functions std::isspace, std::isalpha, &c. (which are just copies of the standard C <ctype.h> functions). For example:

while (len > 0 && std::isspace(line[len - 1])) // strip trailing whitespace

Here line[...] has type char.

The ctype functions take inputs of type int. But when the input to the functions is not representable by type unsigned char -- that is, when the value represented by the object of type char is negative -- and the value is not the value of the integer constant macro EOF, this is undefined behaviour:

The header <ctype.h> declares several functions useful for classifying and mapping characters. In all cases the argument is an int, the value of which shall be representable as an unsigned char or shall equal the value of the macro EOF. If the argument has any other value, the behavior is undefined.

ISO C11, Sec. 7.4 Character handling <ctype.h>

For example, given char x[2] = {0xa2,0}, which represents a NUL-terminated string of POUND SIGN (£) in the locale en_GB.ISO8859-1, on platforms where char is signed such as the usual amd64 ABI and EOF is -1, isspace(x[0]) is undefined behaviour because the value of x[0] is -94, which is neither the value of EOF nor representable by the type unsigned char. (It can be converted to unsigned char, giving the different number 162.)

The undefined behaviour may manifest as silent wrong answers, as in #2338, or as crashes if the virtual page preceding a ctype table is mapped unreadable. Even in cases where it is not undefined behaviour, the answer may be wrong. For example, given char x[2] = {0xff, 0}, which might represent a NUL-terminated string of LATIN SMALL LETTER Y WITH DIAERESIS (ÿ) in the locale fr_FR.ISO8859-1, isalpha(x[0]) will return the same result as isspace(EOF) on most systems (where EOF is -1), giving false when the correct answer is true.

The reason for this peculiar interface is that the ctype functions were intended to handle the results of C functions like fgetc, or C++ functions like std::istream::peek, which return values of type int, which can be either EOF (typically -1) or a value that is representable in unsigned char (0, 1, 2, ..., 255):

std::istream in = ...;
int ch;

while ((ch = in.peek()) != EOF) {
  if (std::isspace(ch)) ...
}

Using them to process elements of a char array or std::string generally requires casting the char to unsigned char first:

std::string s = ...;

if (std::isspace(s[i])) { ... } // almost always wrong
if (std::isspace(static_cast<unsigned char>(s[i]))) { ... } // guaranteed safe

Even in cases where an input of type char might be guaranteed, by the surrounding context, to lie in the range {0, 1, 2, ..., 127} instead of {-128, -127, ..., -1, 0, 1, 2, ..., 127}, it is (a) still safe to cast to unsigned char first, (b) easier to audit.


(Some uses of the ctype macros might be inappropriate anyway to process UTF-8-encoded input (or possibly other multibyte input, if ledger supports that). But, if that is an issue, it is a different issue that can be resolved separately. This issue is just about the abuse of ctype functions for inputs of type char.)

riastradh pushed a commit to riastradh/ledger that referenced this issue Apr 27, 2024
@riastradh riastradh linked a pull request Apr 27, 2024 that will close this issue
riastradh pushed a commit to riastradh/ledger that referenced this issue Apr 27, 2024
riastradh pushed a commit to riastradh/ledger that referenced this issue Apr 27, 2024
riastradh pushed a commit to riastradh/ledger that referenced this issue Apr 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant