Optimize the DFA inner loop. #202

Merged
merged 1 commit into from
Apr 13, 2016

Commits on Apr 13, 2016

  1. Optimize the DFA inner loop.

    This employs a number of tricks to make the inner loop faster:
    
    1. **Elides bounds checks with `unsafe`**. This is the first use of
    `unsafe` in regex. It is justified below with benchmarks, and comments
    in the source make an argument for correctness.
    2. Store metadata about states in the upper bits of a state pointer.
    This reduces the amount of branching needed.
    3. Create an inner inner loop that handles all transitions between
    non-dead, non-match and non-start states (i.e., the majority of cases).
    In particular, this lets us avoid having to check specifically whether
    each state is a match state or not.
    4. Start states are only treated specially if there is a prefix detected
    that we should scan for. Otherwise, start states are no different than
    any other state.
    5. Move transitions out of `State` and into one giant transition table,
    which should hopefully improve locality and make better use of the
    cache.
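
    As a rough illustration of items 2, 3 and 5, here is a minimal sketch
    of a flat transition table whose entries carry flag bits, driven by a
    tight inner loop. Every name here (`StatePtr`, `STATE_DEAD`,
    `STATE_MATCH`, `STATE_MAX`, `Dfa`, `longest_match_end`, `example_dfa`)
    and the choice of the top two bits are assumptions made for the
    example, not necessarily what the code in this commit does.

    ```rust
    // Hypothetical sketch; names and layout are illustrative only.
    type StatePtr = u32;

    // Metadata lives in the top two bits of a state pointer.
    const STATE_DEAD: StatePtr = 1 << 31;
    const STATE_MATCH: StatePtr = 1 << 30;
    // Largest pointer with no flag bits set; also the mask that strips flags.
    const STATE_MAX: StatePtr = STATE_MATCH - 1;

    struct Dfa {
        // One flat table for all states: row `s` starts at index `s * 256`.
        table: Vec<StatePtr>,
    }

    impl Dfa {
        fn new(num_states: usize) -> Dfa {
            // Every transition is dead until `set` says otherwise.
            Dfa { table: vec![STATE_DEAD; num_states * 256] }
        }

        fn set(&mut self, from: StatePtr, byte: u8, to: StatePtr) {
            self.table[from as usize * 256 + byte as usize] = to;
        }

        // `s` must carry no flag bits; callers strip them first.
        fn next(&self, s: StatePtr, byte: u8) -> StatePtr {
            self.table[s as usize * 256 + byte as usize]
        }

        // End offset of the longest match anchored at the start of `haystack`.
        fn longest_match_end(&self, start: StatePtr, haystack: &[u8]) -> Option<usize> {
            let (mut s, mut i, mut last_match) = (start, 0, None);
            while i < haystack.len() {
                // Inner inner loop: a single comparison covers "not dead"
                // and "not match" at once, because flagged pointers are
                // always greater than STATE_MAX.
                while s <= STATE_MAX && i < haystack.len() {
                    s = self.next(s, haystack[i]);
                    i += 1;
                }
                // Special states are only inspected out here.
                if s & STATE_DEAD != 0 {
                    return last_match;
                }
                if s & STATE_MATCH != 0 {
                    last_match = Some(i);
                    s &= STATE_MAX; // strip the flag and keep scanning
                }
            }
            last_match
        }
    }

    // Anchored DFA for `a+b`: state 0 is the start, state 2 is the match state.
    fn example_dfa() -> Dfa {
        let mut dfa = Dfa::new(3);
        dfa.set(0, b'a', 1);
        dfa.set(1, b'a', 1);
        dfa.set(1, b'b', 2 | STATE_MATCH);
        dfa
    }
    ```

    With this layout, `example_dfa().longest_match_end(0, b"aabx")` walks
    the inner loop for the first three bytes, records a match ending at
    offset 3, and stops on the dead transition for `x`.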
    
    The use of `unsafe` is unfortunate, but it significantly reduces the
    number of instructions executed in a search. When the DFA spends a lot
    of time in the inner loop, eliding the bounds checks leads to better
    performance. In most cases, the boost is worth about 5%, but in some
    extreme cases (e.g., a match is the entirety of a large haystack), the
    boost can be worth nearly 50%.
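
    To make the trade-off concrete, here is a minimal sketch of the same
    table walk written with and without bounds checks. The `Table` type
    and its methods are hypothetical and exist only for this example; the
    point is that the unchecked version is sound only because every value
    that can reach the index expression is validated beforehand.

    ```rust
    // Hypothetical sketch; not the regex crate's actual types.
    pub struct Table {
        cells: Vec<u32>, // 256 entries per state, all initially state 0
        num_states: u32,
    }

    impl Table {
        pub fn new(num_states: u32) -> Table {
            let len = (num_states as usize)
                .checked_mul(256)
                .expect("transition table too large");
            Table { cells: vec![0; len], num_states }
        }

        // Every stored state is validated here, which is what the
        // unchecked walk below relies on.
        pub fn set(&mut self, from: u32, byte: u8, to: u32) {
            assert!(from < self.num_states && to < self.num_states);
            self.cells[from as usize * 256 + byte as usize] = to;
        }

        // Safe walk: one bounds check per input byte.
        pub fn run_checked(&self, start: u32, input: &[u8]) -> u32 {
            let mut s = start;
            for &b in input {
                s = self.cells[s as usize * 256 + b as usize];
            }
            s
        }

        // Same walk with the per-byte bounds check elided.
        pub fn run_unchecked(&self, start: u32, input: &[u8]) -> u32 {
            assert!(start < self.num_states); // checked once, outside the loop
            let mut s = start;
            for &b in input {
                // SAFETY: `s < num_states` holds for `start` (asserted above)
                // and for every cell (enforced by `new` and `set`), and
                // `b <= 255`, so the index is below `num_states * 256`,
                // which is `cells.len()`.
                s = unsafe { *self.cells.get_unchecked(s as usize * 256 + b as usize) };
            }
            s
        }
    }
    ```

    Both methods return the same state for any valid `start`; only the
    amount of work done per input byte differs.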
    
    Here is a comparison between code without `unsafe` and with `unsafe`:
    
    ```
    $ cargo-benchcmp rust-safe rust-unsafe --threshold 3
    name                                     rust-safe ns/iter     rust-unsafe ns/iter     diff ns/iter   diff %
    misc::anchored_literal_long_match        29 (13,448 MB/s)      26 (15,000 MB/s)                  -3  -10.34%
    misc::anchored_literal_short_match       28 (928 MB/s)         26 (1,000 MB/s)                   -2   -7.14%
    misc::easy0_1MB                          49 (21,400,061 MB/s)  42 (24,966,738 MB/s)              -7  -14.29%
    misc::easy1_1K                           79 (13,215 MB/s)      76 (13,736 MB/s)                  -3   -3.80%
    misc::easy1_32                           79 (658 MB/s)         76 (684 MB/s)                     -3   -3.80%
    misc::easy1_32K                          80 (409,850 MB/s)     76 (431,421 MB/s)                 -4   -5.00%
    misc::hard_1K                            104 (10,105 MB/s)     100 (10,510 MB/s)                 -4   -3.85%
    misc::match_class_unicode                595 (270 MB/s)        571 (281 MB/s)                   -24   -4.03%
    misc::medium_1MB                         51 (20,560,862 MB/s)  44 (23,831,909 MB/s)              -7  -13.73%
    misc::no_exponential                     378 (264 MB/s)        361 (277 MB/s)                   -17   -4.50%
    misc::not_literal                        206 (247 MB/s)        196 (260 MB/s)                   -10   -4.85%
    misc::one_pass_long_prefix               116 (224 MB/s)        111 (234 MB/s)                    -5   -4.31%
    misc::one_pass_long_prefix_not           116 (224 MB/s)        108 (240 MB/s)                    -8   -6.90%
    misc::one_pass_short                     81 (209 MB/s)         76 (223 MB/s)                     -5   -6.17%
    misc::one_pass_short_not                 79 (215 MB/s)         75 (226 MB/s)                     -4   -5.06%
    misc::reallyhard_1K                      3,796 (276 MB/s)      3,629 (289 MB/s)                -167   -4.40%
    misc::reallyhard_1MB                     3,765,536 (278 MB/s)  3,602,215 (291 MB/s)        -163,321   -4.34%
    misc::reallyhard_32                      234 (252 MB/s)        222 (265 MB/s)                   -12   -5.13%
    misc::reallyhard_32K                     117,917 (278 MB/s)    112,604 (291 MB/s)            -5,313   -4.51%
    misc::replace_all                        144                   137                               -7   -4.86%
    sherlock::before_holmes                  2,163,856 (274 MB/s)  2,077,792 (286 MB/s)         -86,064   -3.98%
    sherlock::everything_greedy              3,641,444 (163 MB/s)  2,578,502 (230 MB/s)      -1,062,942  -29.19%
    sherlock::everything_greedy_nl           2,109,164 (282 MB/s)  1,080,933 (550 MB/s)      -1,028,231  -48.75%
    sherlock::holmes_coword_watson           1,087,276 (547 MB/s)  1,037,918 (573 MB/s)         -49,358   -4.54%
    sherlock::ing_suffix                     2,419,816 (245 MB/s)  2,308,945 (257 MB/s)        -110,871   -4.58%
    sherlock::ing_suffix_limited_space       2,360,927 (251 MB/s)  2,259,791 (263 MB/s)        -101,136   -4.28%
    sherlock::letters                        27,710,372 (21 MB/s)  25,348,374 (23 MB/s)      -2,361,998   -8.52%
    sherlock::letters_lower                  26,888,541 (22 MB/s)  24,759,385 (24 MB/s)      -2,129,156   -7.92%
    sherlock::letters_upper                  3,138,611 (189 MB/s)  2,989,327 (199 MB/s)        -149,284   -4.76%
    sherlock::line_boundary_sherlock_holmes  2,132,889 (278 MB/s)  2,046,399 (290 MB/s)         -86,490   -4.06%
    sherlock::name_alt1                      35,964 (16,542 MB/s)  37,164 (16,008 MB/s)           1,200    3.34%
    sherlock::name_whitespace                88,768 (6,702 MB/s)   85,322 (6,972 MB/s)           -3,446   -3.88%
    sherlock::quotes                         800,085 (743 MB/s)    769,792 (772 MB/s)           -30,293   -3.79%
    sherlock::the_whitespace                 1,315,168 (452 MB/s)  1,238,173 (480 MB/s)         -76,995   -5.85%
    sherlock::words                          11,230,278 (52 MB/s)  9,855,296 (60 MB/s)       -1,374,982  -12.24%
    ```
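
    For reading the table: the `diff %` column is consistent with the
    relative change from the first measurement to the second, so negative
    values mean the `unsafe` build is faster. The helper below is a
    hypothetical illustration of that arithmetic, not code taken from
    `cargo-benchcmp`.

    ```rust
    /// Relative change from `old_ns` to `new_ns`, in percent.
    /// Negative results mean the new measurement is faster.
    fn diff_percent(old_ns: f64, new_ns: f64) -> f64 {
        (new_ns - old_ns) / old_ns * 100.0
    }
    ```

    For example, `misc::anchored_literal_long_match` goes from 29 to 26
    ns/iter, and -3 / 29 is roughly -10.34%.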
    BurntSushi committed Apr 13, 2016
    9fd5019