Avoid ctype abuse. #2341

riastradh · 2024-04-27T20:11:20Z

afh

Thank you for this PR @riastradh, very much appreciated!
Would you be willing to also add tests for the changes? I think that tests would help explain the issue(s) with the code prior to this PR and avoid regressions in the future.

afh · 2024-04-28T18:53:35Z

src/amount.cc

@@ -1034,7 +1035,7 @@ bool amount_t::parse(std::istream& in, const parse_flags_t& flags)
    commodity_t::parse_symbol(in, symbol);

    if (! in.eof() && ((n = static_cast<char>(in.peek())) != '\n')) {
-      if (std::isspace(static_cast<char>(in.peek())))
+      if (std::isspace(in.peek()))


Can you provide more context please, as to why the static_cast should be removed here, please?

Suppose we're on a platform where char is signed and EOF is -1.

in.peek() returns an int, and the set of values that it can return is: {-1, 0, 1, 2, 3, 4, ..., 254, 255}. (That's 257 distinct possible values.) This set of values is always valid for std::isspace -- they're designed to be used together like this! -- so the cast is unnecessary.

What happens if we cast anyway?

Of that set, values in the subset {128, 129, 130, ..., 254, 255} are mapped by static_cast<char> to {-128, -127, -126, ..., -2, -1}.

Of that set, only -1 (coming from 255) is a valid input to std::isspace, because it coincides with the value of EOF, but that's probably not what is intended. (It might work by accident here, but if we were asking std::isalpha in an ISO-8859-1 locale, it should return true for 255 but false for EOF.)

All the other possible values 128 and over that in.peek() can return lead to passing values in {-128, -127, ..., -3, -2} to std::isspace, which is undefined behaviour.

So not only is the static_cast<char> here unnecessary, but it's actually harmful.

That's helpful, thanks! Do you know any platforms where EOF is -1? Once I again, I'd like to kindly ask for tests that catch ledger's current misbehaviour and ensures regressions to this fix are caught. Would you be willing to do that? If there are questions on how to write tests for ledger, I'm happy to help.

Do you know any platforms where EOF is -1?

I don't know any platforms where EOF isn't -1. In principle such platforms might exist and still conform to the C standard. But it's -1 on all the BSDs, on glibc, and on macOS.

afh · 2024-04-28T18:55:15Z

src/option.cc

+          *r++ = static_cast<char>(std::tolower(
+            static_cast<unsigned char>(*q)));


Is first casting to unsigned char and then casting to char intentional here? If yes, what are the implications?

It is intentional because the only change I made was to replace std::tolower(*q) by std::tolower(static_cast<unsigned char>(*q)); I preserved the context.

It may be that you can safely remove the static_cast<char>(...) around it. EOF is not a possible output of tolower here, because EOF is not a possible input coming via static_cast<unsigned char>(...), so there's no risk of losing information from a non-injective conversion of int to char. But it might trigger warnings about implicit signedness conversions.

I left it as is to make auditing this change easier: keeping the outer static_cast<char> can't make the code worse. If you want to remove it, that's probably fine; I just felt it would be better to defer to a separate change.

Thanks for taking the time to reply in detail, @riastradh. I agree with your reasoning.

riastradh · 2024-04-28T19:55:46Z

Thank you for this PR @riastradh, very much appreciated! Would you be willing to also add tests for the changes? I think that tests would help explain the issue(s) with the code prior to this PR and avoid regressions in the future.

It would be nice to make sure all of these cases are exercised, and normally for bug fixes like this I would prefer to add an xfail test in one commit and fix the bug in another commit. However, the symptom of the undefined behaviour is nondeterministic: you're reading out whatever random data happens to precede the ctype/tolower/toupper tables in memory. So it might sometimes return the right answers, and might sometimes return the wrong ones. Or it might crash. Or demons might fly out of your nose. But I don't think the ledger test harness has a way to verify nasal demons.

That said: the tests have been failing on NetBSD for a while, and I just never got around to tracking down the cause. It turns out that this change together with one other fix (for a GCC libstdc++ bug, https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114879, which breaks std::ios::sync_with_stdio(false) on NetBSD, https://gnats.NetBSD.org/58206) resolves all the test failures on my NetBSD system. I'm also considering committing changes to NetBSD to guarantee that ctype abuse triggers immediate SIGSEGV, rather than silent data corruption (https://gnats.NetBSD.org/58208).

So while the existing ledger tests might not exercise all possible ctype abuse (and there's a lot -- there are many calls to ctype functions in ledger), they do seem to exercise enough to raise an issue.

riastradh · 2024-08-06T22:26:04Z

Ping? Is there any chance of merging this?

The bug (#2340) is causing actual problems in my real-world use of ledger to maintain records with Chinese and Japanese names (#2338), so I have to carry around the patch and I would like to avoid carrying it around. The existing automatic tests already catch many instances of the problem on NetBSD.

jwiegley · 2024-08-07T00:23:55Z

I'd love to merge it, but there is a CI build failure yet to be addressed.

fix ledger#2338 fix ledger#2340

riastradh · 2024-08-07T00:32:39Z

I'd love to merge it, but there is a CI build failure yet to be addressed.

Thanks!

I checked the build failure at https://github.com/ledger/ledger/actions/runs/8862348541/job/24350665848?pr=2341 and all I found was:

Error: The log was not found. It may have been deleted based on retention settings.

The build appears to have taken 1s, so I don't think it actually got to any part of the build that this change affected -- more likely a transient failure in github actions.

So I updated the patch and pushed again, but it looks like it requires maintainer approval to rerun the build. Can you push that button?

riastradh · 2024-08-07T00:50:47Z

Thanks, looks like it was, as I guessed, a transient github actions failure -- all checks passed this time around.

riastradh force-pushed the riastradh-20240427-issue2340-ctypeabuse branch 3 times, most recently from beb7ca7 to 789d306 Compare April 27, 2024 20:18

afh self-assigned this Apr 28, 2024

afh reviewed Apr 28, 2024

View reviewed changes

riastradh mentioned this pull request Jun 28, 2024

Encoding error #2338

Closed

Avoid ctype abuse.

d87dee4

fix ledger#2338 fix ledger#2340

riastradh force-pushed the riastradh-20240427-issue2340-ctypeabuse branch from 789d306 to d87dee4 Compare August 7, 2024 00:28

jwiegley approved these changes Aug 7, 2024

View reviewed changes

jwiegley merged commit d9967c2 into ledger:master Aug 7, 2024
4 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Avoid ctype abuse. #2341

Avoid ctype abuse. #2341

riastradh commented Apr 27, 2024

afh left a comment

afh Apr 28, 2024

riastradh Apr 28, 2024 •

edited

Loading

afh Apr 28, 2024

riastradh Apr 28, 2024

afh Apr 28, 2024

riastradh Apr 28, 2024 •

edited

Loading

afh Apr 28, 2024

riastradh commented Apr 28, 2024

riastradh commented Aug 6, 2024

jwiegley commented Aug 7, 2024

riastradh commented Aug 7, 2024

riastradh commented Aug 7, 2024

		*r++ = static_cast<char>(std::tolower(
		static_cast<unsigned char>(*q)));

Avoid ctype abuse. #2341

Avoid ctype abuse. #2341

Conversation

riastradh commented Apr 27, 2024

afh left a comment

Choose a reason for hiding this comment

afh Apr 28, 2024

Choose a reason for hiding this comment

riastradh Apr 28, 2024 • edited Loading

Choose a reason for hiding this comment

afh Apr 28, 2024

Choose a reason for hiding this comment

riastradh Apr 28, 2024

Choose a reason for hiding this comment

afh Apr 28, 2024

Choose a reason for hiding this comment

riastradh Apr 28, 2024 • edited Loading

Choose a reason for hiding this comment

afh Apr 28, 2024

Choose a reason for hiding this comment

riastradh commented Apr 28, 2024

riastradh commented Aug 6, 2024

jwiegley commented Aug 7, 2024

riastradh commented Aug 7, 2024

riastradh commented Aug 7, 2024

riastradh Apr 28, 2024 •

edited

Loading

riastradh Apr 28, 2024 •

edited

Loading