Added utf8islower, utf8isupper, utf8lwr, utf8upr #29

Merged: 6 commits, Aug 23, 2016

Conversation

codecat
Contributor

@codecat codecat commented Aug 18, 2016

As discussed earlier in #25, I didn't mark these as "complete" in the readme because they really only cover the A-Za-z range. utf8islower is naive: it returns 0 only for codepoints in the A-Z range.

The upside is that we can use these functions on any UTF-8 string or codepoint and they will handle any ASCII characters it contains.

Thoughts?
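For readers following along, here is a minimal sketch of the ASCII-only behaviour described above. It is not the code from this PR: the names `sketch_islower` and `sketch_lwr` are hypothetical, and the use of `int32_t` for codepoints is an assumption. It relies on the fact that every byte of a multi-byte UTF-8 sequence is 0x80 or above, so changing only the bytes 'A'..'Z' cannot corrupt non-ASCII characters.

```c
#include <stdint.h>

/* Hypothetical helper, not the PR's code: returns 0 only for 'A'..'Z',
   so every other codepoint (including non-ASCII) is reported as lower. */
static int sketch_islower(int32_t cp) {
  return !(cp >= 'A' && cp <= 'Z');
}

/* Hypothetical helper: lower-cases ASCII letters in place. Bytes that
   belong to multi-byte UTF-8 sequences are all >= 0x80, so they never
   fall in the 'A'..'Z' range and are left untouched. */
static void sketch_lwr(char *str) {
  for (; *str != '\0'; ++str) {
    if (*str >= 'A' && *str <= 'Z') {
      *str = (char)(*str + ('a' - 'A'));
    }
  }
}
```

Calling `sketch_lwr` on a buffer holding "ÀBC" would leave the two bytes of "À" alone and lower-case only "BC", which is the "works on any UTF-8 string, but only for ASCII" trade-off mentioned in the comment.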

@@ -150,7 +150,7 @@ utf8_nonnull utf8_pure utf8_weak void *utf8valid(const void *str);
// Sets out_codepoint to the next utf8 codepoint in str, and returns the address
// of the utf8 codepoint after the current one in str.
utf8_nonnull utf8_weak void *utf8codepoint(const void *utf8_restrict str,
-                                            long *utf8_restrict out_codepoint);
+                                            int *utf8_restrict out_codepoint);
Contributor Author


Your thoughts on this change? I see no reason to return a variable of a type that could potentially be 8 bytes while we only need 4!

Owner


So int is not guaranteed to be 32 bits in C (it only has to be at least 16), whereas long is guaranteed to be at least 32 bits. On the platforms I deal with, long is normally the same size as int (with long long used for 64 bits).
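The guarantee being discussed can be checked with `<limits.h>`. The sketch below is only an illustration under stated assumptions: the helper names `int_holds_codepoints` and `long_bits` are hypothetical, and 0x10FFFF is used as the largest Unicode codepoint. The C standard only requires `INT_MAX` to be at least 32767, while `LONG_MAX` must be at least 2147483647.

```c
#include <limits.h>

/* long must be able to hold every codepoint on a conforming compiler;
   this check documents that assumption at compile time. */
#if LONG_MAX < 0x10FFFF
#error "long cannot hold every Unicode codepoint"
#endif

/* Hypothetical helper: 1 if int is wide enough for any codepoint.
   This is true on common 32-bit-int platforms but not guaranteed by C. */
static int int_holds_codepoints(void) {
  return INT_MAX >= 0x10FFFF;
}

/* Hypothetical helper: storage width of long in bits on this platform. */
static unsigned long_bits(void) {
  return (unsigned)(sizeof(long) * CHAR_BIT);
}
```

On a platform with 16-bit int, `int_holds_codepoints()` would return 0, which is the portability concern raised here; `long_bits()` may report 32 or 64 depending on the target.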

@sheredom
Owner

I think all the ints need to become longs, but apart from that the code is great, thanks!

@sheredom
Owner

The closed issue #10 gave some justification for int -> long.

@codecat
Contributor Author

codecat commented Aug 20, 2016

On which compilers is it 16 bit? There were other places where codepoints were passed as ints.

On gcc 64 bit, long is a 64 bit integer. How about using stdint.h so int32_t can be used instead?

@sheredom
Owner

@angelog I had some notion to support VS11 which didn't have stdint, but we could do better than long aye.

I have code in utest.h that does:

#if defined(_MSC_VER)
#define int64_t __int64
#define uint64_t unsigned __int64
#else
#include <stdint.h>
#endif

We could bring that over and make them 32 instead of 64?
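A 32-bit variant of that shim might look like the sketch below. This is an assumption about the shape of the change, not the code that was merged: the type name `sketch_int32_t` and the helper `sketch_max_codepoint` are hypothetical, and a `typedef` is used instead of a `#define` so the standard name `int32_t` is not redefined for code that includes the header.

```c
#if defined(_MSC_VER)
/* MSVC provides __int32 as a built-in 32-bit integer type. */
typedef __int32 sketch_int32_t;
#else
#include <stdint.h>
typedef int32_t sketch_int32_t;
#endif

/* Hypothetical helper: the largest Unicode codepoint fits in 32 bits. */
static sketch_int32_t sketch_max_codepoint(void) {
  return 0x10FFFF;
}
```

With a type like this, a codepoint parameter is exactly 4 bytes on both branches, which addresses the objection that long can be 8 bytes while still avoiding a dependency on `<stdint.h>` under MSVC.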

@codecat
Contributor Author

codecat commented Aug 22, 2016

I like that idea! I've added it to the PR.

@sheredom sheredom merged commit a754b6d into sheredom:master Aug 23, 2016
@codecat codecat deleted the lower-upper branch August 23, 2016 10:56