Skip to content

Conversation

@CarterLi
Copy link
Member

Note the AppleAccentColor part was not tested because the AppleAccentColor doesn't exist in the plist file

@CarterLi
Copy link
Member Author

image

Comment on lines 34 to 37
uint32_t bufSize = (uint32_t)strlen(str) + 1;
ffStrbufInitA(strbuf, FASTFETCH_STRBUF_DEFAULT_ALLOC > bufSize ? FASTFETCH_STRBUF_DEFAULT_ALLOC : bufSize);
memcpy(strbuf->chars, str, bufSize);
strbuf->length = bufSize - 1;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this really faster than an init and an append?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

@LinusDierheimer LinusDierheimer Sep 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, ffStrbufEnsureFree ensures that there is enough space to append the given length, including the null byte.

@CarterLi
Copy link
Member Author

Ok, I misunderstood.

However I'd say that the existing code is not optimised.

void ffStrbufAppendS(FFstrbuf* strbuf, const char* value)
{
    if(value == NULL)
        return;

    for(uint32_t i = 0; value[i] != '\0'; i++)
    {
        if(i % 16 == 0)
            ffStrbufEnsureFree(strbuf, 16);
        strbuf->chars[strbuf->length++] = value[i];
    }

    strbuf->chars[strbuf->length] = '\0';
}
  1. Reallocating in a loop is really not a good idea because memory allocating is slow. It's always better to pre-calculate the required buffer size.
  2. memcpy is always faster than hand coded plain loop ( if compiler failed to optimise the loop to a memcpy call ). A well coded memcpy ( such as the one provided in glibc ) uses SIMD instructions copies 16 ( with SSE2 ) or 32 ( with AVX ) bytes per loop, and can be about 10~20 times faster than hand coded plain loop which copies only 1 byte per loop. The larger the buffer is, the faster memcpy can be.

Quick benchmark: https://quick-bench.com/q/rv4D5N9zynZRRcWb2Td7y0BVneI

image

@LinusDierheimer
Copy link
Collaborator

Makes sense. Thanks for looking into this.

@LinusDierheimer LinusDierheimer merged commit 183c8d7 into fastfetch-cli:master Sep 20, 2022
@CarterLi
Copy link
Member Author

CarterLi commented Sep 20, 2022

ffStrbufAppendS can be further optimised by make it inline-able.

static inline void ffStrbufAppendS(FFstrbuf* strbuf, const char* value)
{
    if(value == NULL)
        return;

    ffStrbufAppendNS(strbuf, (uint32_t)strlen(value), value);
}

If the value is a string literal, its length can be known at compile time. In which case (uint32_t)strlen(value) can be completely optimised away ( it's doable because strlen is a builtin function ).

// Before compiling
ffStrbufAppendS(&strbuf, "123456789");

// After compiling
ffStrbufAppendNS(&strbuf, 9, "123456789");

@LinusDierheimer
Copy link
Collaborator

With the current implementation it might even make sense to force inline it.

@CarterLi
Copy link
Member Author

Quick test: https://gcc.godbolt.org/z/z3rrn9vfa

image

@LinusDierheimer
Copy link
Collaborator

I don't know enough much about that, but i don't think this applies to fastfetch, because the method implementation is in another compilation unit. Might not be a problem with IPO tho. If we want to make sure it is getting inlined, we'd have to implement it as inline method in the header.

@CarterLi
Copy link
Member Author

Yes. A inline-able function must be in the header. Just move them into FFstrbuf.h you are done.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants