Added utf8islower, utf8isupper, utf8lwr, utf8upr #29
@@ -150,7 +150,7 @@ utf8_nonnull utf8_pure utf8_weak void *utf8valid(const void *str);
 // Sets out_codepoint to the next utf8 codepoint in str, and returns the address
 // of the utf8 codepoint after the current one in str.
 utf8_nonnull utf8_weak void *utf8codepoint(const void *utf8_restrict str,
-                                           long *utf8_restrict out_codepoint);
+                                           int *utf8_restrict out_codepoint);
Your thoughts on this change? I see no reason to return a variable of a type that could potentially be 8 bytes while we only need 4!
So int is not guaranteed to be 32 bits in C (the standard only requires at least 16), whereas long is guaranteed to be at least 32 bits. long is normally the same size as int on the platforms I deal with (long long is normally the 64-bit type).
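The size argument above can be checked against `<limits.h>`. The following is only a sketch to illustrate the guarantee, not code from the PR; the names `MAX_CODEPOINT`, `int_holds_codepoints`, `long_holds_codepoints` and `store_codepoint` are made up for this example.

```c
#include <assert.h>
#include <limits.h>

/* U+10FFFF is the largest Unicode codepoint; it needs 21 bits. */
#define MAX_CODEPOINT 0x10FFFFL

/* Returns 1 when plain int can hold every codepoint on this platform.
   The C standard only guarantees INT_MAX >= 32767, so this may be 0. */
int int_holds_codepoints(void) { return INT_MAX >= MAX_CODEPOINT; }

/* Always 1: the C standard guarantees LONG_MAX >= 2147483647. */
int long_holds_codepoints(void) { return LONG_MAX >= MAX_CODEPOINT; }

/* Hypothetical helper: keeps a codepoint in a long after a range check. */
long store_codepoint(long cp) {
  assert(cp >= 0 && cp <= MAX_CODEPOINT);
  return cp;
}
```

On common 32-bit and 64-bit desktop targets both checks return 1; the int check is the one that can fail on a 16-bit int platform.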
I think all the ints need to become longs, but apart from that the code is great, thanks!
The closed issue #10 gave some justification for int -> long.
On which compilers is it 16 bit? There were other places where codepoints were passed as ints. On gcc 64 bit,
@angelog I had some notion to support VS11, which didn't have stdint, but we could do better than long, aye. I have code in utest.h that does;
We could bring that over and make them 32 instead of 64?
I like that idea! I've added it to the PR.
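For readers following along, one way such a fixed-width type could be set up is sketched below. This is not the utest.h code mentioned above, and not necessarily what was added to the PR; the names `cp32_t`, `cp32_size_check` and `make_cp32` are hypothetical. It assumes MSVC's built-in `__int32` type on that compiler and a working `<stdint.h>` everywhere else.

```c
#include <assert.h>

#if defined(_MSC_VER)
/* MSVC provides sized integer types as built-ins. */
typedef __int32 cp32_t;
#else
/* Assumption: every other compiler ships <stdint.h>. */
#include <stdint.h>
typedef int32_t cp32_t;
#endif

/* Compile-time check: the array size is negative (an error) unless
   cp32_t is exactly 4 bytes. */
typedef char cp32_size_check[(sizeof(cp32_t) == 4) ? 1 : -1];

/* Hypothetical helper: narrows a long to cp32_t after a range check. */
cp32_t make_cp32(long value) {
  assert(value >= 0 && value <= 0x10FFFFL);
  return (cp32_t)value;
}
```

The point of the typedef is that the codepoint type is 4 bytes on every supported compiler, which addresses both the "long may be 8 bytes" and the "int may be 16 bits" concerns raised in this thread.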
As discussed earlier in #25, I didn't mark these as "complete" in the readme because they really only cover the A-Za-z range. utf8islower is dumb and only returns 0 for every codepoint in the A-Z range.
The upside is that we can use these functions on any utf-8 string or codepoint and they will include any ascii characters.
Thoughts?
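To make the ASCII-only behaviour concrete, here is a rough sketch of what is described above, operating on single codepoints. It is not the PR's implementation: the `ascii_only_*` names are invented, and the `isupper`, `lwr` and `upr` variants are assumed to mirror the `islower` behaviour the description gives.

```c
#include <assert.h>

/* Returns 0 only for A-Z; every other codepoint counts as "lower". */
int ascii_only_islower(long cp) { return !(cp >= 'A' && cp <= 'Z'); }

/* Assumed mirror image: returns 0 only for a-z. */
int ascii_only_isupper(long cp) { return !(cp >= 'a' && cp <= 'z'); }

/* Maps A-Z to a-z and leaves every other codepoint untouched. */
long ascii_only_lwr(long cp) {
  return (cp >= 'A' && cp <= 'Z') ? cp + ('a' - 'A') : cp;
}

/* Maps a-z to A-Z and leaves every other codepoint untouched. */
long ascii_only_upr(long cp) {
  return (cp >= 'a' && cp <= 'z') ? cp - ('a' - 'A') : cp;
}

/* Small self-check showing the limitation outside ASCII. */
void ascii_only_selfcheck(void) {
  assert(ascii_only_islower('Q') == 0);
  /* U+00C9 (uppercase E with acute) is still reported as lower. */
  assert(ascii_only_islower(0x00C9L) != 0);
}
```

The self-check shows why these would not count as "complete": an uppercase letter outside A-Z is still reported as lowercase, and case conversion leaves it unchanged.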