Added utf8islower, utf8isupper, utf8lwr, utf8upr #29

codecat · 2016-08-18T17:49:47Z

As discussed earlier in #25, I didn't mark these as "complete" in the readme because they really only cover the A-Za-z range. utf8islower is deliberately naive: it returns 0 only for codepoints in the A-Z range and nonzero for everything else.

The upside is that we can use these functions on any UTF-8 string or codepoint, and they will handle whatever ASCII letters it contains.

Thoughts?
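For readers following along, here is a minimal sketch of what "only covers the A-Za-z range" means for a single codepoint. It is not the code from this PR: the `sketch_*` names are hypothetical, the `long` codepoint type mirrors the `utf8codepoint` signature at the time, the `isupper` behaviour is assumed to mirror the `islower` behaviour described above, and an ASCII-compatible execution character set is assumed.

```c
/* Hypothetical ASCII-only helpers; names and bodies are illustrative only. */

/* Returns 0 only for 'A'..'Z'; every other codepoint counts as "lower". */
static int sketch_islower(long cp) {
  return !(cp >= 'A' && cp <= 'Z');
}

/* Assumed mirror image: returns 0 only for 'a'..'z'. */
static int sketch_isupper(long cp) {
  return !(cp >= 'a' && cp <= 'z');
}

/* Maps 'A'..'Z' to 'a'..'z' and leaves every other codepoint untouched. */
static long sketch_lwr(long cp) {
  return (cp >= 'A' && cp <= 'Z') ? cp + ('a' - 'A') : cp;
}

/* Maps 'a'..'z' to 'A'..'Z' and leaves every other codepoint untouched. */
static long sketch_upr(long cp) {
  return (cp >= 'a' && cp <= 'z') ? cp - ('a' - 'A') : cp;
}
```

With this shape, a non-ASCII codepoint such as U+00E9 passes through `sketch_lwr` and `sketch_upr` unchanged and is reported as both "lower" and "upper", which is why the functions were not marked complete.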


codecat · 2016-08-18T17:57:20Z

utf8.h

@@ -150,7 +150,7 @@ utf8_nonnull utf8_pure utf8_weak void *utf8valid(const void *str);
 // Sets out_codepoint to the next utf8 codepoint in str, and returns the address
 // of the utf8 codepoint after the current one in str.
 utf8_nonnull utf8_weak void *utf8codepoint(const void *utf8_restrict str,
-                                           long *utf8_restrict out_codepoint);
+                                           int *utf8_restrict out_codepoint);


Your thoughts on this change? I see no reason to return a variable of a type that could potentially be 8 bytes while we only need 4!

So int is not guaranteed to be 32 bits in C (the standard only requires at least 16), whereas long is guaranteed to be at least 32. Long will normally be the same size as int on the platforms I deal with (long long is normally the 64-bit type).
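The width guarantees under discussion can be checked against `<limits.h>`. The sketch below is only an illustration with hypothetical `sketch_*` names; it relies on the standard's minimums (`INT_MAX` at least 32767, `LONG_MAX` at least 2147483647) and on 0x10FFFF being the largest Unicode codepoint.

```c
#include <limits.h>

/* Largest Unicode codepoint; it needs 21 bits. */
#define SKETCH_MAX_CODEPOINT 0x10FFFFL

/* Nonzero only if int on this platform can hold every codepoint.
   Not guaranteed by the standard: a 16-bit int would return 0 here. */
static int sketch_int_holds_codepoints(void) {
  return INT_MAX >= SKETCH_MAX_CODEPOINT;
}

/* Always nonzero on a conforming implementation, since long has
   at least 32 bits. */
static int sketch_long_holds_codepoints(void) {
  return LONG_MAX >= SKETCH_MAX_CODEPOINT;
}

/* Actual width of long in bits; 32 on some platforms, 64 on others. */
static unsigned sketch_long_bits(void) {
  return (unsigned)(sizeof(long) * CHAR_BIT);
}
```

So `long` is the portable choice for "at least 32 bits", but nothing stops it from being 64 bits, which is the concern raised about the output parameter.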

sheredom · 2016-08-20T22:52:14Z

I think all the ints need to become longs, but apart from that the code is great thanks!

sheredom · 2016-08-20T22:52:49Z

The closed issue #10 gave some justification for int -> long.

codecat · 2016-08-20T22:57:42Z

On which compilers is it 16 bit? There were other places where codepoints were passed as ints.

On gcc 64 bit, long is a 64 bit integer. How about using stdint.h so int32_t can be used instead?

sheredom · 2016-08-22T08:11:36Z

@angelog I had some notion of supporting VS11, which didn't have stdint, but we could do better than long, aye.

I have code in utest.h that does:

#if defined(_MSC_VER)
#define int64_t __int64
#define uint64_t unsigned __int64
#else
#include <stdint.h>
#endif

We could bring that over and make them 32 instead of 64?
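A 32-bit variant of that idea might look roughly like the sketch below. This is an assumption about the shape, not the code that landed in the PR: `sketch_cp32_t` and `sketch_cp32_size` are hypothetical names, and a typedef is used here instead of a `#define` so the alias obeys normal scoping rules.

```c
#include <stddef.h>

#if defined(_MSC_VER)
/* MSVC provides the built-in sized type __int32. */
typedef __int32 sketch_cp32_t;
#else
/* Elsewhere, rely on the exact-width type from stdint.h. */
#include <stdint.h>
typedef int32_t sketch_cp32_t;
#endif

/* Size in bytes of the codepoint type; expected to be 4 on either branch. */
static size_t sketch_cp32_size(void) {
  return sizeof(sketch_cp32_t);
}
```

Either branch yields a 4-byte signed type, so the codepoint parameter has the same width regardless of how wide `int` or `long` happen to be on the target.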


codecat · 2016-08-22T18:37:17Z

I like that idea! I've added it to the PR.

codecat added 5 commits August 18, 2016 19:33

Added utf8lwr and utf8upr

bd7d201

Add utf8isupper and utf8islower

268e4de

Documentation and small tweak

f1a92df

Updated readme

bcedf34

Changed utf8codepoint return value to int instead of long to avoid 64…

a739d84

… bit integer issues

codecat reviewed Aug 18, 2016

Changed all codepoint types to int32_t (either explicitly defined or …

538c128

…from stdint.h)

sheredom merged commit a754b6d into sheredom:master Aug 23, 2016

codecat deleted the lower-upper branch August 23, 2016 10:56
