Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix rvv utf8 validation bug #381

Merged
merged 2 commits into from
Mar 25, 2024
Merged

fix rvv utf8 validation bug #381

merged 2 commits into from
Mar 25, 2024

Conversation

camel-cdr
Copy link
Contributor

Thanks to @OMaghiarIMGs comment, I looked over the rvv utf8 validation code again, and found a bug in it.
The ASCII fast path must make sure that the next three bytes are valid.
An easy way to do that is to check if they are also ASCII, this is what this patch does.
An alternative would be to subtract three from vl, but then you would need to avoid the possibility of such code turning into an infinite loop, when vl<=3. The first fix seemed more straight forward.

I also cleaned up the tail variable. It was used for three separate things, and two of those could've used a smaller value.

@lemire
Copy link
Member

lemire commented Mar 24, 2024

@camel-cdr If you wanted to demonstrate the bug, what is the simplest example you'd use?

(I would like to have a test that goes with this bug because bugs have a way of coming back.)

@camel-cdr
Copy link
Contributor Author

camel-cdr commented Mar 24, 2024

@lemire It initially got caught in convert_utf8_to_utf16le_tests, once I changed the tail value, it was finicky though and didn't occur with different VLEN.
The best way to do a more focused test is probably starting with an all ASCII input and corrupting a single random byte.
I'll add a test to this commit, but should it only be in the utf8 validation, or in all things that need to validate utf8?

@lemire
Copy link
Member

lemire commented Mar 24, 2024

I'll add a test to this commit, but should it only be in the utf8 validation, or in all things that need to validate utf8?

I am somewhat surprised that we did not catch this issue.

If you add just one test that would fail before we fix this bug, I will be quite happy. Of course, feel free to add a few more.

It is not difficult to add tests, so go go go!

But just enough to catch this one bug would be fantastic.

@camel-cdr
Copy link
Contributor Author

@lemire This inspired me to make the test more general and work for all fastpaths.
The new tests take together about 8 seconds to run in qemu, so it should increase the test time by much.
They find the bug after 500 from 10000 iterations.

@lemire
Copy link
Member

lemire commented Mar 24, 2024

@camel-cdr Let us see if our CI succeeds. We can always adjust later.

@lemire lemire merged commit 4ad4984 into simdutf:master Mar 25, 2024
45 checks passed
@lemire
Copy link
Member

lemire commented Mar 25, 2024

Merging. Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants