[test] Write exhaustive tests over the bjorn dfa and crockford utf8 decoders #1941

PossiblyAShrub · 2024-04-11T22:55:11Z

I've found a way to exhaustively test all the interesting cases for our UTF-8 decoder, which comes to only ~9 million byte sequences. My reasoning is covered in the header comment of data_lang/utf8_test.cc.

These tests required some meta-programming so I could avoid lots of manual bit-math. They are generated from bit patterns like 1111 xxxx, which denotes the byte range 0xF0 to 0xFF. The generation is done by the data_lang/utf8_decoder_tests_gen.py script, which is invoked at build time.
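A minimal Python sketch of that kind of expansion, assuming (as in the 1111 xxxx example) that wildcard x bits are always the trailing bits of a byte, so each pattern denotes one contiguous range. The names pattern_range and expand_sequence are hypothetical and are not taken from utf8_decoder_tests_gen.py.

```python
import itertools
from typing import Iterator, List


def pattern_range(pattern: str) -> range:
    """Expand an 8-bit pattern such as "1111 xxxx" into the byte range it denotes.

    Hypothetical helper: assumes every 'x' wildcard is a trailing bit, so the
    pattern covers one contiguous range of byte values.
    """
    bits = pattern.replace(" ", "")
    if len(bits) != 8 or set(bits) - {"0", "1", "x"}:
        raise ValueError(f"not an 8-bit pattern: {pattern!r}")
    if "x" in bits.rstrip("x"):
        raise ValueError(f"wildcards must be trailing bits: {pattern!r}")
    low = int(bits.replace("x", "0"), 2)   # all wildcard bits cleared
    high = int(bits.replace("x", "1"), 2)  # all wildcard bits set
    return range(low, high + 1)


def expand_sequence(patterns: List[str]) -> Iterator[bytes]:
    """Yield every byte string whose i-th byte matches patterns[i]."""
    ranges = [pattern_range(p) for p in patterns]
    for combo in itertools.product(*ranges):
        yield bytes(combo)


# Example: a 3-byte lead byte followed by two continuation bytes.
three_byte = ["1110 xxxx", "10xx xxxx", "10xx xxxx"]
print(pattern_range("1111 xxxx"))                   # range(240, 256), i.e. 0xF0..0xFF
print(sum(1 for _ in expand_sequence(three_byte)))  # 16 * 64 * 64 = 65536
```

Composing per-byte patterns with a cartesian product keeps the bit-math in one place: each pattern is checked once, and the sequence generator never has to reason about individual bits.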

I've been running the tests like so:

ninja _bin/cxx-asan/data_lang/utf8_test
./_bin/cxx-asan/data_lang/utf8_test -t utf8_decoder
# or just ./_bin/cxx-asan/data_lang/utf8_test for all tests

They should be run in CI as well.


andychu · 2024-04-27T00:17:00Z

OK, I poked at this a bit. I ran the opt build of utf8_test and it took 2 seconds, which is fast enough.

I merged master into the branch.

andychu · 2024-04-27T00:19:55Z

Hm, one thing: I wonder if we could generate a C++ data file that different TEST() blocks read, rather than an .inc file containing the TEST() blocks themselves.

I think this is cleaner for tests.
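One way to read the data-file suggestion, as a sketch: the generator emits a plain C++ array of cases, and hand-written TEST() blocks loop over it. The Utf8Case struct, its field layout, and the -1 convention for rejected input are assumptions made up for illustration, not anything in the PR.

```python
from typing import List, Optional, Tuple

# Hypothetical C++ side that the generated table would be compiled against:
#   struct Utf8Case { const char* bytes; int len; int32_t expected; };
# A hand-written TEST() block would iterate over the array and check each row.


def emit_case_table(name: str, cases: List[Tuple[bytes, Optional[int]]]) -> str:
    """Render (input bytes, expected code point) pairs as a C++ array definition.

    len is stored explicitly because inputs may contain NUL bytes; an expected
    value of None (emitted as -1) marks input the decoder must reject.
    """
    lines = [f"static const Utf8Case {name}[] = {{"]
    for data, expected in cases:
        # Escape every byte as \xHH. Each escape is followed by another
        # backslash or the closing quote, so no hex digit gets absorbed.
        literal = '"' + "".join(f"\\x{b:02x}" for b in data) + '"'
        code = -1 if expected is None else expected
        lines.append(f"    {{{literal}, {len(data)}, {code}}},")
    lines.append("};")
    return "\n".join(lines)


# U+00E9 encoded correctly, and an overlong encoding of NUL that must be rejected.
print(emit_case_table("kTwoByteCases", [(b"\xc3\xa9", 0xE9), (b"\xc0\x80", None)]))
```

The generated file then contains only data, so the test logic lives in ordinary C++ that can be read and debugged without running the generator.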

I'll have to understand the strategy a little more. I thought there were SOME cases that could be done by brute force, and then maybe one case that needs to be clever to avoid being too slow.
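A back-of-the-envelope count supports that intuition. The snippet below only computes 256^n for each sequence length, plus the number of 4-byte strings that merely have the 4-byte bit shape (11110xxx followed by three 10xxxxxx bytes). It is an illustration, not a reproduction of the ~9 million figure from the PR description, whose derivation is in the utf8_test.cc header.

```python
from typing import Dict


def brute_force_sizes(max_len: int = 4) -> Dict[int, int]:
    """Number of distinct byte strings of each length from 1 to max_len."""
    return {n: 256 ** n for n in range(1, max_len + 1)}


sizes = brute_force_sizes()
for n, count in sizes.items():
    print(f"{n}-byte strings: {count:,}")

# Lengths 1..3 together are on the order of 10^7 strings.
print(f"1..3 bytes total: {sum(sizes[n] for n in (1, 2, 3)):,}")

# 4 bytes is the expensive case (about 4.3 billion). Restricting to strings
# with the 4-byte shape (lead 11110xxx = 8 values, then three 10xxxxxx bytes
# of 64 values each) shrinks it by a factor of 2048, but skips malformed shapes.
shaped_four_byte = 8 * 64 ** 3
print(f"4-byte shaped strings: {shaped_four_byte:,}")
```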

PossiblyAShrub and others added 5 commits April 11, 2024 16:38

[test] Write exhaustive tests over the bjorn dfa and crockford utf8 decoders (d92f8e4)

Make test gen script use bash over sh (db13299)

Remove <(...) subst in gen script (I guess CI bash doesn't support it?) (a5ac4a5)

CI doesn't support clang-format, don't fail the test gen script if it is missing (53b71a8)

Merge branch 'master' into exhaustive-utf8 (40dc3e9)
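The clang-format fix amounts to checking for the binary before using it. A Python sketch of that check follows; maybe_clang_format is a hypothetical helper, and the PR's actual change (which may live in the bash wrapper rather than in Python) is not shown here.

```python
import shutil
import subprocess
import sys


def maybe_clang_format(source: str) -> str:
    """Format generated C++ with clang-format if it is on PATH.

    Hypothetical helper: when the binary is missing (as on CI), warn and return
    the source unchanged instead of failing the build step.
    """
    exe = shutil.which("clang-format")
    if exe is None:
        print("clang-format not found; leaving generated code unformatted",
              file=sys.stderr)
        return source
    # With no file arguments, clang-format reads stdin and writes to stdout.
    result = subprocess.run([exe], input=source, capture_output=True,
                            text=True, check=True)
    return result.stdout
```

Formatting is cosmetic for generated code, so degrading to unformatted output is a reasonable trade-off compared with breaking the build.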

PossiblyAShrub closed this Apr 30, 2024
