Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix false positives of IS_*() macros for 8-bit ASCII characters #5844

Closed
wants to merge 3 commits into from

Conversation

mspncp
Copy link
Contributor

@mspncp mspncp commented Apr 2, 2018

Fixes #5778, #5840

The various IS_*() macros did not work correctly for 8-bit ASCII characters with the high bit set, because the CVT(a) preprocessor macro and'ed the given ASCII value with 0x7F, effectively folding the high value range 128-255 over the low value range 0-127. As a consequence, some of the IS_*() erroneously returned TRUE.

This commit fixes the issue by mapping CVT(a) to 127 for all values a >= 127. This works because the lookup tables CONF_type_default and CONF_type_win32 have all bits cleared at entry 127, whence IS_*(a) returns FALSE for all values outside the range 0-127.

The IS_*() macros were also changed to return TRUE or FALSE (1 or 0) instead of a nonzero or zero value.

Note that with this change, the macro IS_*(c,a) evaluates the a argument twice, unless CHARSET_EBCDIC is defined. This prohibits the use of arguments with side effects like IS_*(c, *p++).

Fixes openssl#5778, openssl#5840

The various IS_*() macros did not work correctly for 8-bit ASCII
characters with the high bit set, because the CVT(a) preprocessor
macro and'ed the given ASCII value with 0x7F, effectively folding
the high value range 128-255 over the low value range 0-127.
As a consequence, some of the IS_*() erroneously returned TRUE.

This commit fixes the issue by mapping CVT(a) to 127 for all values
a >= 127. This works because the lookup tables CONF_type_default
and CONF_type_win32 have all bits cleared at entry 127, whence
IS_*(a) returns FALSE for all values outside the range 0-127.

The IS_*() macros were also changed to return TRUE or FALSE (1 or 0)
instead of a nonzero or zero value.

Note that with this change, the macro IS_*(c,a) evaluates the 'a'
argument twice, unless CHARSET_EBCDIC is defined. This prohibits the
use of arguments with side effects like IS_*(c, *p++).
@mspncp mspncp requested a review from mattcaswell April 2, 2018 21:59
@mspncp mspncp added this to the 1.1.1 milestone Apr 2, 2018
@mspncp mspncp added the branch: master Merge to master branch label Apr 2, 2018
/*
* Attention: because the macro argument 'a' is evaluated twice in CVT(a),
* it is not allowed pass 'a' arguments with side effects to IS_*(c,a)
* like for example IS_*(c, *p++).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry, but I think that introducing this kind of fragility is the wrong way to go.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But note that this header file is used/included in only one internal source file.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we only shoot our own kneecaps off... still.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're stuck with moving forward in pieces, I guess, because we had to remove the previous code.

@paulidale
Copy link
Contributor

How much of this overlaps withcrypto/ctype.h ? Could this functionality be wrapped up by the former?

@richsalz
Copy link
Contributor

richsalz commented Apr 3, 2018

There's an open issue on this and ctype. I forget the number. There are some complications (one assumes ascii only?) but it could be worked around.

@richsalz
Copy link
Contributor

richsalz commented Apr 3, 2018

I suppose another option is to make the table be 256 entries and fill the top 128 with 0

@paulidale
Copy link
Contributor

#5840 ? crypto/ctype.h knows about EBCDIC. It does limit itself to 7 bit characters once EBCDIC is dealt with.

@richsalz
Copy link
Contributor

richsalz commented Apr 3, 2018

No, #5672 (which I created!!) which refers to a comment in #5669.

@paulidale
Copy link
Contributor

Glibc uses a table 192 long which goes from -128 to 255 to deal with signed chars quickly.
I'm not sure this is necessary here. Checking the eighth bit is sufficient to catch both cases quickly.

@mspncp
Copy link
Contributor Author

mspncp commented Apr 3, 2018

I suppose another option is to make the table be 256 entries and fill the top 128 with 0

This sounds like a good idea. But I'll won't be able to do it before this evening, because I'm traveliing today.

@levitte
Copy link
Member

levitte commented Apr 3, 2018

I just had a look. It's not quite as simple as one might think...

...to get rid of double argument evaluation in the CVT() macro.
@mspncp
Copy link
Contributor Author

mspncp commented Apr 4, 2018

I just had a look. It's not quite as simple as one might think...

Richard, I am aware that the macros IS_ALNUM*() do not work correctly for utf-8 characters (in particular german umlauts ;-) ), But according to @mattcaswell, it was not a requirement that these macros support utf-8 encodings in general (#5778 (comment)). So maybe it is only the exaggerated title of the commit message that disturbs you?

Make the IS_*() macros work correctly for 8-bit ASCII characters

What I fixed was only the false positives caused by the fact that the range 0-255 was mapped two-to-one onto 0-127. So would you accept this pull request if I change the title of the commit message to something like the following?

Fix false positives of IS_*() macros for 8-bit ASCII characters

@mspncp
Copy link
Contributor Author

mspncp commented Apr 4, 2018

Or would you prefer if I fix the IS_*() macros to work correctly for all single byte utf-8 characters?

@levitte
Copy link
Member

levitte commented Apr 4, 2018

Sorry, my comment was about trying to use our ctype.h instead. That is a harder endeavor than one might think at first.

This PR looks good to me at this point. However, I cannot be the one to approve it in this particular case. It has to be someone else.

@mspncp
Copy link
Contributor Author

mspncp commented Apr 4, 2018

This will be the final commit message:

Fix false positives of IS_*() macros for 8-bit ASCII characters

Fixes #5778, #5840

The various IS_*() macros did not work correctly for 8-bit ASCII
characters with the high bit set, because the CVT(a) preprocessor
macro and'ed the given ASCII value with 0x7F, effectively folding
the high value range 128-255 over the low value range 0-127.
As a consequence, some of the IS_*() erroneously returned TRUE.

This commit fixes the issue by enlarging the arrays CONF_type_default
and CONF_type_win32 to 256 entries, filling the upper 128 entries
with zeroes. In other words, IS_*(c, a) returns FALSE for all
arguments 128 <= a < 256.

The IS_*() macros were also changed to return TRUE or FALSE (1 or 0)
instead of a nonzero or zero value.

ping @mattcaswell

@mattcaswell
Copy link
Member

ping @mattcaswell

Thanks for all your work on this!! As I removed the original code I think someone from the OMC who was not involved with that removal should approve this. Maybe one of @vdukhovni, @iamamoose or @ekasper could do this?

@mattcaswell
Copy link
Member

Or @kroeckx, or @t-j-h!

@mspncp mspncp changed the title Make the IS_*() macros work correctly for 8-bit ASCII characters Fix false positives of IS_*() macros for 8-bit ASCII characters Apr 4, 2018
@mspncp
Copy link
Contributor Author

mspncp commented Apr 4, 2018

I will also wait for @petrovr to confirm that this pull request indeed fixes #5840 .

@petrovr
Copy link

petrovr commented Apr 4, 2018

I think that mask in CVT is required, i.e. CVT(a) ((unsigned char)(a) & 0xFF) - on some platforms char is not 8-bit.
I'm not sure why TRUE or FALSE is preferred instead non-zero and zero.

Everything that restrict those IS_* macros to work only on 7-bit ASCII characters is solution.
In particular #5844 should resolve #5840 .

Copy link
Member

@t-j-h t-j-h left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It isn't perfect and it would be good to see our ctype used - but this gets us back to where we should be from which we can improve things going forward.

@mattcaswell mattcaswell added the approval: done This pull request has the required number of approvals label Apr 6, 2018
@mspncp
Copy link
Contributor Author

mspncp commented Apr 7, 2018

I think that mask in CVT is required, i.e. CVT(a) ((unsigned char)(a) & 0xFF) - on some platforms char is not 8-bit.

@petrovr, thank you for pointing out a possible problem if CHAR_BIT > 8. Thinking about it for a while, I noticed that your suggestion to use the 0xFF mask reintroduces exactly the same problem which this pull request tries to fix: If you take a 'char' with more than 8 bits and mask it with 0xFF, then you might end up with a 7 bit ASCII char and false positives for IS_*() again.

The only solution is a correct range check, which was not possible using preprocessor macros, without evaluating the macro argument multiple times. So solve this dilemma, I moved a lot of the implementation into a static function.

For better comparison, I created a separate pull request, see #5903.

@mspncp
Copy link
Contributor Author

mspncp commented Apr 8, 2018

Closed in favour of #5903.

@mspncp mspncp closed this Apr 8, 2018
@petrovr
Copy link

petrovr commented Apr 9, 2018

10x Matthias.
I did some test with static inline function but due to Easter holidays I was not able to share results.
So #5903 was the right solution.

@mspncp mspncp deleted the pr-fix-is-macros branch April 10, 2018 05:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approval: done This pull request has the required number of approvals branch: master Merge to master branch
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support 8-bit characters in conf_def.h
7 participants