Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Couple of thoughts #14

Closed
mlabbe opened this issue May 27, 2016 · 3 comments · Fixed by #24
Closed

Couple of thoughts #14

mlabbe opened this issue May 27, 2016 · 3 comments · Fixed by #24

Comments

@mlabbe
Copy link

mlabbe commented May 27, 2016

Hey, nice library! I am looking for utf-8 C string parsing and this fits the bill. I had a couple of thoughts after reading the code.

  • While you have dutifully re implemented string.h functions, some of them are considered harmful. Ex: utf8ncpy may not append a null terminating character. If you don't consider it too sanctimonious, supplying non-harmful functions and making people opt-in to the riskier ones (with an ifdef?) might be helpful. I suspect so, since safety is a concern of yours since you return void*.

For instance, a safer utf8ncpy function that guarantees a null terminator (possibly truncating last char) and returns boolean whether the string was truncated or not can be helpful, if certainly, not conformant with anything in string.h. Also, only filling one NULL character in at the end, because zeroing after termination is a waste of cycles. I have been using such a workhorse function for years.

  • Consider using restrict where available. This will shrink code size and reduce waits for memory accesses. Again, a non-conforming change in some applications since applying it will require parameters to not overlap, but that's going to be the case in real world situations anyway.
  • I would be happy to trap strange input parameters by defining an assert macro before including your header. For instance (and sorry to pick on utf8cpy), if the max number of elements is zero, you could check that in an assert which is a NO-OP unless defined by the caller prior to header inclusion.

Here is a snippet that lets you portably apply the restrict keyword if you're interested:

#if defined(__GNUC__) || defined(__clang__)
    #ifdef __cplusplus    
        #define utf8_restrict __restrict
    #else
        #if __STDC_VERSION__ >= 199901L
            #define utf8_restrict restrict
        #endif
    #endif
#elif defined(_MSC_VER) && (_MSC_VER >= 1400) /* vs2005 */
    #define utf8_restrict __restrict
#else
   #define utf8_restrict
#define

@mlabbe
Copy link
Author

mlabbe commented May 27, 2016

Here is my safer strncpy, verbatim paste from my core lib. If you want to adapt or use any part of this, I release it under the public domain.

The FTG_ATTRIBUTES() thing is a macro that generates a compiler warning if the truncation bit is ignored. Since it is a potential security risk if a string is truncated, I force a check. This is appropriate for my code. (I wish I could conditionally avoid this warning if *src was a literal).

/* Fill up to max_copy characters in dst, including null.  Unlike strncpy(), a null
   terminating character is guaranteed to be appended, EVEN if it overwrites
   the last character in the string.

   Only appends a single NULL character instead of NULL filling the string.  The 
   trailing bytes are left uninitialized.

   No bytes are written if max_copy is 0, and FTG_ASSERT is thrown.

   \return 1 on truncation or max_copy==0, zero otherwise.
                                                                                   */
FTG_ATTRIBUTES(FTG_EXT_warn_unused_result) int
ftg_strncpy(char *ftg_restrict dst, const char *ftg_restrict src, size_t max_copy)
{
    size_t n;
    char *d;

    FTG_ASSERT(dst);
    FTG_ASSERT(src);
    FTG_ASSERT(max_copy > 0);

    if (max_copy == 0)
        return 1;

    n = max_copy;
    d = dst;
    while ( n > 0 && *src != '\0' )    
    {
        *d++ = *src++;
        --n;
    }

    /* Truncation case -
       terminate string and return true */
    if ( n == 0 )
    {
        dst[max_copy-1] = '\0';
        return 1;
    }

    /* No truncation.  Append a single NULL and return. */
    *d = '\0';
    return 0;
}

@sheredom
Copy link
Owner

I'll take your comments on restrict on board for sure - I'm not certain I like the assert myself personally (but I'm willing to spend some brain cycles on the thought before I commit one way or the other!)

Thanks for the input though - all comments are good comments 😄

@sheredom
Copy link
Owner

Just for clarity - I've decided against adding the assert, but I have done the restrict change. Thanks again for spending time reviewing my lib!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants