This employs a number of tricks to make the inner loop faster:
1. **Elide bounds checks with `unsafe`**. This is the first use of
`unsafe` in regex. It is justified below with benchmarks, and comments
in the source make an argument for correctness.
2. Store metadata about states in the upper bits of a state pointer.
This reduces the amount of branching needed.
3. Create an inner inner loop that handles all transitions between
non-dead, non-match and non-start states (i.e., the majority of cases).
In particular, this lets us avoid having to check specifically whether
each state is a match state or not.
4. Start states are only treated specially if there is a prefix detected
that we should scan for. Otherwise, start states are no different than
any other state.
5. Move transitions out of `State` and into one giant transition table,
which should hopefully improve locality and make better use of the
cache.
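
Tricks 2, 3 and 5 can be illustrated with a minimal sketch. Everything named here (`StatePtr`, `STATE_MATCH`, `STATE_DEAD`, `STATE_MAX`, `ToyDfa`, and the exact bit positions) is a hypothetical stand-in for illustration, not the actual layout in the regex source. The sketch assumes flag bits sit above every ordinary state index, so one comparison against `STATE_MAX` tells the hot loop whether a state needs special handling. A start-state flag is omitted for brevity.

```rust
// Hypothetical names and bit layout, chosen for illustration only.
type StatePtr = u32;

/// Flag bits live above every ordinary state index.
const STATE_MATCH: StatePtr = 1 << 29;
const STATE_DEAD: StatePtr = 1 << 31;
/// Largest state pointer that carries no flag bits.
const STATE_MAX: StatePtr = STATE_MATCH - 1;

const ALPHABET_LEN: usize = 256;

/// A toy DFA for the anchored pattern `a+b`.
struct ToyDfa {
    /// One flat table shared by all states: row `s` starts at
    /// `s * ALPHABET_LEN`.
    table: Vec<StatePtr>,
}

impl ToyDfa {
    fn new() -> ToyDfa {
        // Two ordinary states: 0 = start, 1 = saw at least one `a`.
        // Every transition not set below leads to the dead state.
        let mut table = vec![STATE_DEAD; 2 * ALPHABET_LEN];
        table[b'a' as usize] = 1;
        table[ALPHABET_LEN + b'a' as usize] = 1;
        table[ALPHABET_LEN + b'b' as usize] = STATE_MATCH;
        ToyDfa { table }
    }

    /// Returns the end offset of a match anchored at the start, if any.
    fn find_end(&self, haystack: &[u8]) -> Option<usize> {
        let mut si: StatePtr = 0;
        let mut at = 0;
        // Hot loop: a single comparison rules out dead and match
        // states at once, since both have a bit set above STATE_MAX.
        while si <= STATE_MAX && at < haystack.len() {
            si = self.table[si as usize * ALPHABET_LEN + haystack[at] as usize];
            at += 1;
        }
        if si & STATE_MATCH != 0 {
            Some(at)
        } else {
            None
        }
    }
}
```

With this toy table, `ToyDfa::new().find_end(b"aaab")` returns `Some(4)` and `find_end(b"aaxb")` returns `None`.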

The use of `unsafe` is unfortunate, but it significantly reduces the
number of instructions executed in a search. When the DFA spends a lot
of time in the inner loop, eliding the bounds checks leads to better
performance. In most cases, the boost is worth about 5%, but in some
extreme cases (e.g., a match is the entirety of a large haystack), the
boost can be worth nearly 50%.
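
To make the trade-off concrete, here is a hedged sketch of what eliding a bounds check looks like. The function names and the 256-entry row width are assumptions for illustration, and the up-front validation stands in for whatever invariant the real DFA maintains by construction. The safe version pays for a bounds check on every table lookup; the unchecked version relies on an argument that the index is always in range.

```rust
const ROW_LEN: usize = 256;

/// Safe walk: every table lookup is bounds-checked.
fn walk_checked(table: &[u32], input: &[u8]) -> u32 {
    let mut si = 0u32;
    for &b in input {
        si = table[si as usize * ROW_LEN + b as usize];
    }
    si
}

/// Unchecked walk: the invariant is validated once, then the loop
/// skips the per-byte bounds check.
fn walk_unchecked(table: &[u32], input: &[u8]) -> u32 {
    // The table must be a non-empty whole number of rows, and every
    // entry must name a valid row.
    assert!(!table.is_empty() && table.len() % ROW_LEN == 0);
    let num_states = table.len() / ROW_LEN;
    assert!(table.iter().all(|&s| (s as usize) < num_states));

    let mut si = 0u32;
    for &b in input {
        // SAFETY: `si < num_states` (it is 0 or a validated table
        // entry) and `b < ROW_LEN`, so the index is less than
        // `num_states * ROW_LEN == table.len()`.
        si = unsafe { *table.get_unchecked(si as usize * ROW_LEN + b as usize) };
    }
    si
}
```

Both functions compute the same result; only the per-byte check differs. Validating on every call, as done here, would cost more than it saves on short inputs, so a real implementation would establish the invariant once when the table is built.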

Here is a comparison between code without `unsafe` and with `unsafe`:

```
$ cargo-benchcmp rust-safe rust-unsafe --threshold 3
name                                     rust-safe ns/iter     rust-unsafe ns/iter   diff ns/iter  diff %
misc::anchored_literal_long_match        29 (13,448 MB/s)      26 (15,000 MB/s)      -3            -10.34%
misc::anchored_literal_short_match       28 (928 MB/s)         26 (1,000 MB/s)       -2            -7.14%
misc::easy0_1MB                          49 (21,400,061 MB/s)  42 (24,966,738 MB/s)  -7            -14.29%
misc::easy1_1K                           79 (13,215 MB/s)      76 (13,736 MB/s)      -3            -3.80%
misc::easy1_32                           79 (658 MB/s)         76 (684 MB/s)         -3            -3.80%
misc::easy1_32K                          80 (409,850 MB/s)     76 (431,421 MB/s)     -4            -5.00%
misc::hard_1K                            104 (10,105 MB/s)     100 (10,510 MB/s)     -4            -3.85%
misc::match_class_unicode                595 (270 MB/s)        571 (281 MB/s)        -24           -4.03%
misc::medium_1MB                         51 (20,560,862 MB/s)  44 (23,831,909 MB/s)  -7            -13.73%
misc::no_exponential                     378 (264 MB/s)        361 (277 MB/s)        -17           -4.50%
misc::not_literal                        206 (247 MB/s)        196 (260 MB/s)        -10           -4.85%
misc::one_pass_long_prefix               116 (224 MB/s)        111 (234 MB/s)        -5            -4.31%
misc::one_pass_long_prefix_not           116 (224 MB/s)        108 (240 MB/s)        -8            -6.90%
misc::one_pass_short                     81 (209 MB/s)         76 (223 MB/s)         -5            -6.17%
misc::one_pass_short_not                 79 (215 MB/s)         75 (226 MB/s)         -4            -5.06%
misc::reallyhard_1K                      3,796 (276 MB/s)      3,629 (289 MB/s)      -167          -4.40%
misc::reallyhard_1MB                     3,765,536 (278 MB/s)  3,602,215 (291 MB/s)  -163,321      -4.34%
misc::reallyhard_32                      234 (252 MB/s)        222 (265 MB/s)        -12           -5.13%
misc::reallyhard_32K                     117,917 (278 MB/s)    112,604 (291 MB/s)    -5,313        -4.51%
misc::replace_all                        144                   137                   -7            -4.86%
sherlock::before_holmes                  2,163,856 (274 MB/s)  2,077,792 (286 MB/s)  -86,064       -3.98%
sherlock::everything_greedy              3,641,444 (163 MB/s)  2,578,502 (230 MB/s)  -1,062,942    -29.19%
sherlock::everything_greedy_nl           2,109,164 (282 MB/s)  1,080,933 (550 MB/s)  -1,028,231    -48.75%
sherlock::holmes_coword_watson           1,087,276 (547 MB/s)  1,037,918 (573 MB/s)  -49,358       -4.54%
sherlock::ing_suffix                     2,419,816 (245 MB/s)  2,308,945 (257 MB/s)  -110,871      -4.58%
sherlock::ing_suffix_limited_space       2,360,927 (251 MB/s)  2,259,791 (263 MB/s)  -101,136      -4.28%
sherlock::letters                        27,710,372 (21 MB/s)  25,348,374 (23 MB/s)  -2,361,998    -8.52%
sherlock::letters_lower                  26,888,541 (22 MB/s)  24,759,385 (24 MB/s)  -2,129,156    -7.92%
sherlock::letters_upper                  3,138,611 (189 MB/s)  2,989,327 (199 MB/s)  -149,284      -4.76%
sherlock::line_boundary_sherlock_holmes  2,132,889 (278 MB/s)  2,046,399 (290 MB/s)  -86,490       -4.06%
sherlock::name_alt1                      35,964 (16,542 MB/s)  37,164 (16,008 MB/s)  1,200         3.34%
sherlock::name_whitespace                88,768 (6,702 MB/s)   85,322 (6,972 MB/s)   -3,446        -3.88%
sherlock::quotes                         800,085 (743 MB/s)    769,792 (772 MB/s)    -30,293       -3.79%
sherlock::the_whitespace                 1,315,168 (452 MB/s)  1,238,173 (480 MB/s)  -76,995       -5.85%
sherlock::words                          11,230,278 (52 MB/s)  9,855,296 (60 MB/s)   -1,374,982    -12.24%
```