[test] Write exhaustive tests over the bjorn dfa and crockford utf8 decoders #1941

Closed · wants to merge 5 commits


PossiblyAShrub (Collaborator)

I've found a way to exhaustively test all the interesting cases for our UTF-8 decoder, which come to only ~9 million byte sequences. My reasoning is explained in the header of data_lang/utf8_test.cc.

These tests required some metaprogramming so that I could avoid a lot of manual bit math. They are generated from bit patterns like 1111 xxxx, which denotes the byte range 0xF0 to 0xFF. The generation is done by the data_lang/utf8_decoder_tests_gen.py script, which is invoked at build time.
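To illustrate the bit-pattern idea, here is a minimal Python sketch. It is not the actual utf8_decoder_tests_gen.py, and the helper names byte_range and sequences are hypothetical. It assumes each pattern is 8 bits written with 0/1 for fixed bits and x for free bits, and that free bits appear only in the low-order positions, so every pattern matches one contiguous byte range. A multi-byte sequence pattern is then the Cartesian product of its per-byte ranges.

```python
import itertools


def byte_range(pattern: str) -> range:
    """Expand an 8-bit pattern such as '1111 xxxx' into the bytes it matches.

    Assumption for this sketch: '0'/'1' are fixed bits, 'x' is a free bit,
    and free bits are only allowed at the low-order end of the byte.
    """
    bits = pattern.replace(" ", "")
    if len(bits) != 8:
        raise ValueError(f"expected 8 bits, got {pattern!r}")
    fixed = bits.rstrip("x")
    if "x" in fixed:
        raise ValueError(f"free bits must be trailing in this sketch: {pattern!r}")
    free = 8 - len(fixed)
    lo = (int(fixed, 2) << free) if fixed else 0
    return range(lo, lo + (1 << free))


def sequences(*patterns: str):
    """Yield every byte sequence whose i-th byte matches patterns[i]."""
    for combo in itertools.product(*(byte_range(p) for p in patterns)):
        yield bytes(combo)


if __name__ == "__main__":
    print(byte_range("1111 xxxx"))  # range(240, 256), i.e. 0xF0..0xFF
    print(sum(1 for _ in sequences("110x xxxx", "10xx xxxx")))  # 2048
```

The two-byte example yields 32 * 64 = 2048 sequences; longer patterns grow multiplicatively, which is why the choice of which patterns to enumerate matters for keeping the total near the ~9 million mentioned above.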

I've been running the tests like so:

```sh
ninja _bin/cxx-asan/data_lang/utf8_test
./_bin/cxx-asan/data_lang/utf8_test -t utf8_decoder
# or just ./_bin/cxx-asan/data_lang/utf8_test for all tests
```

They should be run in CI as well.

andychu (Contributor) commented Apr 27, 2024

OK, I poked at this a bit. I ran the opt build of utf8_test and it took 2 seconds, which is fast enough.

I merged master into the branch.

andychu (Contributor) commented Apr 27, 2024

Hm, one thing: I wonder if we can generate a C++ data file that different TEST() blocks read, rather than an .inc file containing TEST() blocks. I think this is cleaner for tests.

I'll have to understand the strategy a little more. I thought there were SOME cases that could be done by brute force, and then maybe one case that needs to be clever to avoid being too slow.
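The data-file suggestion could look roughly like the following hypothetical Python sketch: the generator emits only a C++ table of cases, and hand-written TEST() blocks in utf8_test.cc would iterate over it. The names CASES, byte_bounds, emit_table, Utf8Case and kUtf8Cases are made up for illustration and do not come from the repository. It assumes the same pattern notation as the PR description, with free (x) bits only at the low-order end of each byte.

```python
# Hypothetical generator: emits test *data* as a C++ table instead of TEST() blocks.
# All names here are illustrative assumptions, not taken from the Oils repo.

CASES = [
    # (case name, per-byte bit patterns)
    ("two_byte", ["110x xxxx", "10xx xxxx"]),
    ("four_byte", ["1111 0xxx", "10xx xxxx", "10xx xxxx", "10xx xxxx"]),
]


def byte_bounds(pattern):
    """Return inclusive (lo, hi) bounds for a pattern whose free bits are trailing."""
    bits = pattern.replace(" ", "")
    fixed = bits.rstrip("x")
    if len(bits) != 8 or "x" in fixed:
        raise ValueError(f"unsupported pattern: {pattern!r}")
    free = len(bits) - len(fixed)
    lo = (int(fixed, 2) << free) if fixed else 0
    return lo, lo + (1 << free) - 1


def emit_table(cases):
    """Render the cases as a C++ struct definition plus a static array."""
    lines = [
        "// Generated file: do not edit.",
        "struct Utf8Case {",
        "  const char* name;",
        "  int len;",
        "  unsigned char lo[4];",
        "  unsigned char hi[4];",
        "};",
        "",
        "static const Utf8Case kUtf8Cases[] = {",
    ]
    for name, patterns in cases:
        bounds = [byte_bounds(p) for p in patterns]
        lo = ", ".join(f"0x{a:02X}" for a, _ in bounds)
        hi = ", ".join(f"0x{b:02X}" for _, b in bounds)
        # Unused trailing lo/hi entries are zero-initialized by C++ aggregate init.
        lines.append(f'  {{"{name}", {len(patterns)}, {{{lo}}}, {{{hi}}}}},')
    lines.append("};")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(emit_table(CASES), end="")
```

With this layout the generated file is plain data, so the test logic stays in ordinary C++ where it is presumably easier to read and debug; the trade-off is that each TEST() must enumerate the byte ranges itself at run time.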
