Faster UTF8 strlen implementation (using DSP SIMD) #3479
furi/core/string.c
__uadd8(0xFFFFFFFF, vec);
result = __sel(zero, one);
if (result > 0) {
    if (result == 0x01000000) {
This branch of code can be removed. It doesn't give a good performance boost, but the code gets more complicated.
Looking at how the first loop always reads 4 bytes at a time - isn't it at risk of reading past the allocated memory bounds?
Hi! As far as I know, there is no memory protection on the Flipper. Knowledgeable people, please correct me if I'm wrong.
Cortex M4 has a memory protection unit and the stack area is protected, but those allocations are never on the stack - which means it'll be fine unless maybe the string lands at the very end of the mapped SRAM? Accessing memory within the 512MB SRAM bounds, past the 256KB the Flipper has, is likely going to result in a bus fault, but I have not verified that.
Thanks for the answer!
When working on another commit (to support Unicode), I noticed that some firmware functions sometimes read a little more memory than was allocated, so I thought it was safe.
Could you tell me some way to check this?
It might be, it's just something that raised a red flag for me without having actually verified this on the Flipper.
Looking at the memory map of Cortex M3 (M4 has an identical map), you could just try reading past the bottom 256KB region of SRAM: That said - take a look at size_t max_len = string_size(str->string) >> 2;
while (max_len > 0) ? It's just a single extra comparison per loop iteration, so it's unlikely to be a major hit, and it'll completely close this possible corner case.
In theory, I could check the address of the pointer. Would that be enough?
Probably, but if you're going to insert an additional check into the loop, IMO it'd be best to just add the length check as I described above.
Yes, thank you! I will try to delve into this question and add a similar check.
If I'm seeing it right, the bottom loop is already gathering the "remainder", so it might be enough to just change
The problem seems to be not in the lower loop (it will stop when it sees \0), but in the upper one. When reading a 4-byte vector, the instruction may try to read values outside the memory boundary. This situation, of course, seems purely theoretical to me, since it would most likely mean that the device has run out of memory.
My point is that the lower loop will already take care of the remainder of the string that is 3 characters or less, so the top loop can be changed from
Oh, I didn't understand at first :) I thought we were still talking about memory limitations. We should try to measure the performance of this version. Thanks!
It will be interesting to know what metrics you get - but the price of a single additional if check per loop iteration should be negligible, even if on your test case it's going to be noticeable due to you doing a million iterations.
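For reference, a minimal sketch of the loop shape discussed here, assuming the byte size of the string is known up front: the top loop runs exactly size >> 2 times, so it never reads a word that extends past the string, and the bottom loop handles the remaining 0 to 3 bytes. This is portable C, not the PR's code. The names utf8_strlen_bounded and count_lead_bytes are hypothetical, and a plain bit trick stands in for the __uadd8/__sel intrinsics. Unlike the thread's lower loop, which stops at \0, this remainder loop is bounded by the size.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: number of bytes in w that are NOT UTF-8
 * continuation bytes. A continuation byte is 10xxxxxx, i.e. bit 7
 * is set and bit 6 is clear. */
static uint32_t count_lead_bytes(uint32_t w) {
    /* Bit 7 of each byte becomes 1 only for continuation bytes. */
    uint32_t cont = (w & ~(w << 1)) & 0x80808080u;
    /* Move the flags to bit 0 of each byte and sum the four bytes. */
    uint32_t cont_count = ((cont >> 7) * 0x01010101u) >> 24;
    return 4u - cont_count;
}

/* Hypothetical function: counts code points in `size` bytes of
 * well-formed UTF-8. Never reads past str + size. */
size_t utf8_strlen_bounded(const char* str, size_t size) {
    size_t count = 0;
    size_t words = size >> 2; /* number of whole 4-byte words */
    const char* p = str;

    /* Top loop: bounded by the word count, not by a terminator. */
    while(words > 0) {
        uint32_t w;
        memcpy(&w, p, sizeof(w)); /* alignment-safe 4-byte load */
        count += count_lead_bytes(w);
        p += 4;
        words--;
    }

    /* Bottom loop: the remaining 0..3 bytes, one at a time. */
    for(size_t rest = size & 3u; rest > 0; rest--, p++) {
        if(((unsigned char)*p & 0xC0u) != 0x80u) count++;
    }
    return count;
}
```

Byte order does not matter here because the helper only counts flags; it never cares which byte of the word they came from.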
The current thread stack is located on the heap, and there will always be a probability that it is located near the string, but this is not the problem, because it is protected in "read-only mode". My concern is that you can hit core2-protected RAM.
Ah true, I misunderstood the purpose of
I think I need to rewrite this commit completely. I proceeded from the assumption that we do not know the binary length of the string. The rewritten version should be simpler.
I took a measurement, the speed did not change :)
If you don't care about the validity of UTF-8 codepoints, then IIRC utf-8 strlen is as easy as (pseudocode)
This ends up counting all non-continuation bytes, but won't work correctly if the UTF-8 codepoints are malformed.
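The counting rule described above could look roughly like this in C. It is only a sketch: the function name utf8_strlen_naive is hypothetical, and the input is assumed to be NUL-terminated, well-formed UTF-8. Continuation bytes have the form 10xxxxxx, so every byte that does not match that pattern starts a new code point.

```c
#include <stddef.h>

/* Hypothetical name. Counts code points by counting every byte that
 * is not a UTF-8 continuation byte (10xxxxxx). Does not validate the
 * input, so malformed sequences give a meaningless count. */
size_t utf8_strlen_naive(const char* str) {
    size_t count = 0;
    for(const unsigned char* p = (const unsigned char*)str; *p != '\0'; p++) {
        if((*p & 0xC0u) != 0x80u) count++;
    }
    return count;
}
```

For example, the 3-byte sequence E2 82 AC (the euro sign) has one lead byte and two continuation bytes, so it contributes 1 to the count.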
It seems
Undraft when ready.
Hi! I'm done!
I'm curious, what's the current test benchmark result? Is it still 6-ish seconds?
Oddly enough, yes. For some reason I thought it would be faster.
So if it's not faster, then why do we need it?
If I understood correctly, this implementation is not faster than the initial one that didn't check the binary length up-front and possibly accessed memory out of bounds. It is still much faster than the "naive" approach:
I very much doubt that we need it: we don't work with strings a lot, and when we do, they are quite small. At the same time, I do think that you've made quite an interesting thing, and you should try to contribute it to the ARM DSP lib.
That's a fair point, does CMSIS offer string manipulation? CMSIS-DSP doesn't seem to: https://www.keil.com/pack/doc/CMSIS/DSP/html/index.html
@CookiePLMonster hmm, that was the closest thing I was able to think of. Not sure where else it can be useful.
Does mlib itself have platform-specific optimizations? If so, maybe they'd be interested in that, which also means Flipper would eventually get this change back, just in mlib form.
Not really, but you may try asking the m-lib author, maybe he will be interested in it.
Hi! I understand your doubts! I want to clarify two things for myself. If I wrote unit tests for this function and explained in detail how it works in the comment, would the commit be accepted? I'm working in another commit on Unicode support in Flipper, and the API functions I'm rewriting have operations to get Unicode characters by index. Since, in my experience, these operations are always slower than binary operations, I thought I'd try to speed them up. So if there is hope that the commit can be accepted, I would like to extend the work on it.
There are 2 groups for PRs like this one (low level, cryptic, black magic):
All optimizations of that kind require unit tests with cross-testing between the optimized and un-optimized versions. The best option is to ping us in GH issues or on Discord before starting anything like that.
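As an illustration of what such cross-testing might look like, here is a self-contained sketch in portable C. It is not the firmware's unit-test framework: the names cross_check, len_reference and len_optimized are hypothetical, and len_optimized is a plain-C stand-in for the SIMD version. The harness feeds both implementations the same deterministic pseudo-random buffers at every length from 0 to 64, so all remainder sizes are covered, and it reports how many inputs disagree.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef size_t (*utf8_len_fn)(const char* str, size_t size);

/* Un-optimized reference: one byte at a time. */
size_t len_reference(const char* str, size_t size) {
    size_t count = 0;
    for(size_t i = 0; i < size; i++) {
        if(((unsigned char)str[i] & 0xC0u) != 0x80u) count++;
    }
    return count;
}

/* Stand-in for the optimized version: four bytes per iteration. */
size_t len_optimized(const char* str, size_t size) {
    size_t count = 0, i = 0;
    for(; i + 4 <= size; i += 4) {
        uint32_t w;
        memcpy(&w, str + i, sizeof(w));
        /* Flag continuation bytes (bit 7 set, bit 6 clear), then sum. */
        uint32_t cont = ((w & ~(w << 1)) & 0x80808080u) >> 7;
        count += 4u - ((cont * 0x01010101u) >> 24);
    }
    for(; i < size; i++) {
        if(((unsigned char)str[i] & 0xC0u) != 0x80u) count++;
    }
    return count;
}

/* Runs both functions on the same inputs and returns the number of
 * inputs on which they disagree (0 means they match everywhere).
 * The bytes are arbitrary, not valid UTF-8; that is fine because
 * both functions only classify bytes within `size`. */
size_t cross_check(utf8_len_fn ref, utf8_len_fn opt, size_t rounds) {
    char buf[64];
    uint32_t state = 12345u; /* fixed seed keeps failures reproducible */
    size_t mismatches = 0;
    for(size_t r = 0; r < rounds; r++) {
        for(size_t size = 0; size <= sizeof(buf); size++) {
            for(size_t i = 0; i < size; i++) {
                state = state * 1664525u + 1013904223u; /* LCG step */
                buf[i] = (char)(state >> 24);
            }
            if(ref(buf, size) != opt(buf, size)) mismatches++;
        }
    }
    return mismatches;
}
```

A unit test would then assert that cross_check(len_reference, len_optimized, rounds) returns 0 for some reasonable number of rounds.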