New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix false positives of IS_*() macros for 8-bit ASCII characters #5844
Conversation
Fixes openssl#5778, openssl#5840 The various IS_*() macros did not work correctly for 8-bit ASCII characters with the high bit set, because the CVT(a) preprocessor macro and'ed the given ASCII value with 0x7F, effectively folding the high value range 128-255 over the low value range 0-127. As a consequence, some of the IS_*() erroneously returned TRUE. This commit fixes the issue by mapping CVT(a) to 127 for all values a >= 127. This works because the lookup tables CONF_type_default and CONF_type_win32 have all bits cleared at entry 127, whence IS_*(a) returns FALSE for all values outside the range 0-127. The IS_*() macros were also changed to return TRUE or FALSE (1 or 0) instead of a nonzero or zero value. Note that with this change, the macro IS_*(c,a) evaluates the 'a' argument twice, unless CHARSET_EBCDIC is defined. This prohibits the use of arguments with side effects like IS_*(c, *p++).
crypto/conf/conf_def.h
Outdated
/* | ||
* Attention: because the macro argument 'a' is evaluated twice in CVT(a), | ||
* it is not allowed pass 'a' arguments with side effects to IS_*(c,a) | ||
* like for example IS_*(c, *p++). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sorry, but I think that introducing this kind of fragility is the wrong way to go.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But note that this header file is used/included in only one internal source file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we only shoot our own kneecaps off... still.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're stuck with moving forward in pieces, I guess, because we had to remove the previous code.
How much of this overlaps with |
There's an open issue on this and ctype. I forget the number. There are some complications (one assumes ascii only?) but it could be worked around. |
I suppose another option is to make the table be 256 entries and fill the top 128 with 0 |
#5840 ? |
Glibc uses a table 192 long which goes from -128 to 255 to deal with signed chars quickly. |
This sounds like a good idea. But I'll won't be able to do it before this evening, because I'm traveliing today. |
I just had a look. It's not quite as simple as one might think... |
...to get rid of double argument evaluation in the CVT() macro.
Richard, I am aware that the macros IS_ALNUM*() do not work correctly for utf-8 characters (in particular german umlauts ;-) ), But according to @mattcaswell, it was not a requirement that these macros support utf-8 encodings in general (#5778 (comment)). So maybe it is only the exaggerated title of the commit message that disturbs you?
What I fixed was only the false positives caused by the fact that the range 0-255 was mapped two-to-one onto 0-127. So would you accept this pull request if I change the title of the commit message to something like the following?
|
Or would you prefer if I fix the IS_*() macros to work correctly for all single byte utf-8 characters? |
Sorry, my comment was about trying to use our ctype.h instead. That is a harder endeavor than one might think at first. This PR looks good to me at this point. However, I cannot be the one to approve it in this particular case. It has to be someone else. |
This will be the final commit message:
ping @mattcaswell |
Thanks for all your work on this!! As I removed the original code I think someone from the OMC who was not involved with that removal should approve this. Maybe one of @vdukhovni, @iamamoose or @ekasper could do this? |
I think that mask in CVT is required, i.e. CVT(a) ((unsigned char)(a) & 0xFF) - on some platforms char is not 8-bit. Everything that restrict those IS_* macros to work only on 7-bit ASCII characters is solution. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It isn't perfect and it would be good to see our ctype used - but this gets us back to where we should be from which we can improve things going forward.
@petrovr, thank you for pointing out a possible problem if CHAR_BIT > 8. Thinking about it for a while, I noticed that your suggestion to use the 0xFF mask reintroduces exactly the same problem which this pull request tries to fix: If you take a 'char' with more than 8 bits and mask it with 0xFF, then you might end up with a 7 bit ASCII char and false positives for IS_*() again. The only solution is a correct range check, which was not possible using preprocessor macros, without evaluating the macro argument multiple times. So solve this dilemma, I moved a lot of the implementation into a static function. For better comparison, I created a separate pull request, see #5903. |
Closed in favour of #5903. |
10x Matthias. |
Fixes #5778, #5840
The various
IS_*()
macros did not work correctly for 8-bit ASCII characters with the high bit set, because theCVT(a)
preprocessor macro and'ed the given ASCII value with 0x7F, effectively folding the high value range 128-255 over the low value range 0-127. As a consequence, some of theIS_*()
erroneously returned TRUE.This commit fixes the issue by mapping CVT(a) to 127 for all values a >= 127. This works because the lookup tables CONF_type_default and CONF_type_win32 have all bits cleared at entry 127, whence IS_*(a) returns FALSE for all values outside the range 0-127.
The IS_*() macros were also changed to return TRUE or FALSE (1 or 0) instead of a nonzero or zero value.
Note that with this change, the macro
IS_*(c,a)
evaluates thea
argument twice, unless CHARSET_EBCDIC is defined. This prohibits the use of arguments with side effects likeIS_*(c, *p++)
.