Improve StringShrinker algorithm #377

Merged Jul 15, 2018 (1 commit)

@ajalt ajalt commented Jul 13, 2018

The current string shrink algorithm generates candidates by dropping single characters from the end of the input. This doesn't produce the smallest case, since it never drops characters from the start of the string. It is also linear in the size of the input, so it can take a large number of tries to reach the result it does produce.

This PR changes the algorithm to bisect the input string from both directions, producing a minimal output in O(log n) tries.
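
For readers who want to see the bisection step in isolation, here is a minimal sketch of how the candidates for one step could be generated. The name `bisectCandidates` is hypothetical and this is not necessarily the code in the commit; it assumes each step offers the empty string, then the first half, then the second half.

```kotlin
// Hypothetical helper (an assumption, not the library's API):
// candidates offered for a single shrink step.
fun bisectCandidates(failure: String): List<String> = when (failure.length) {
    0 -> emptyList()   // nothing is smaller than the empty string
    1 -> listOf("")    // only the empty string is smaller
    else -> {
        // For odd lengths the first half keeps the extra character.
        val mid = (failure.length + 1) / 2
        listOf("", failure.take(mid), failure.drop(mid))
    }
}

fun main() {
    println(bisectCandidates("ab#cd")) // prints [, ab#, cd]
}
```

Offering the second half as well as the first is what lets a shrinker discard a prefix, which dropping characters from the end alone cannot do.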

As an example of the current behavior, the test:

`forAll { it: String -> !it.contains("#") }`

produces output like this:

```
Attempting to shrink failed arg DPn CPQ7hBAY&LP;7MxPtN^Oy\$Fz;M#EC\M2yEcNn18+B*w,1x,L;6&k<}QQxU+!) e|Gr+ tri7jw{
Shrink #1: <empty string> pass
Shrink #2: DPn CPQ7hBAY&LP;7MxPtN^Oy\$Fz;M#EC\M2yEcNn18+B*w,1x,L;6&k<}QQxU+!) e|Gr+ tri7 fail
Shrink #3: DPn CPQ7hBAY&LP;7MxPtN^Oy\$Fz;M#EC\M2yEcNn18+B*w,1x,L;6&k<}QQxU+!) e|Gr+ fail
Shrink #4: DPn CPQ7hBAY&LP;7MxPtN^Oy\$Fz;M#EC\M2yEcNn18+B*w,1x,L;6&k<}QQxU+!)  fail
Shrink #5: DPn CPQ7hBAY&LP;7MxPtN^Oy\$Fz;M#EC\M2yEcNn18+B*w,1x,L;6&k<}QQx fail
Shrink #6: DPn CPQ7hBAY&LP;7MxPtN^Oy\$Fz;M#EC\M2yEcNn18+B*w,1x,L;6&k fail
Shrink #7: DPn CPQ7hBAY&LP;7MxPtN^Oy\$Fz;M#EC\M2yEcNn18+B*w,1x, fail
Shrink #8: DPn CPQ7hBAY&LP;7MxPtN^Oy\$Fz;M#EC\M2yEcNn18+B* fail
Shrink #9: DPn CPQ7hBAY&LP;7MxPtN^Oy\$Fz;M#EC\M2yEcNn fail
Shrink #10: DPn CPQ7hBAY&LP;7MxPtN^Oy\$Fz;M#EC\M2 fail
Shrink #11: DPn CPQ7hBAY&LP;7MxPtN^Oy\$Fz;M# fail
Shrink #12: DPn CPQ7hBAY&LP;7MxPtN^Oy\$ pass
Shrink #13: DPn CPQ7hBAY&LP;7MxPtN^Oy\$F pass
Shrink #14: DPn CPQ7hBAY&LP;7MxPtN^Oy\$Fz pass
Shrink #15: DPn CPQ7hBAY&LP;7MxPtN^Oy\$Fz; pass
Shrink #16: DPn CPQ7hBAY&LP;7MxPtN^Oy\$Fz;M pass
Shrink #17: aaaaaDPn CPQ7hBAY&LP;7MxPtN^Oy\$ pass
Shrink #18: aaaaDPn CPQ7hBAY&LP;7MxPtN^Oy\$F pass
Shrink #19: aaaDPn CPQ7hBAY&LP;7MxPtN^Oy\$Fz pass
Shrink #20: aaDPn CPQ7hBAY&LP;7MxPtN^Oy\$Fz; pass
Shrink #21: aDPn CPQ7hBAY&LP;7MxPtN^Oy\$Fz;M pass
Shrink result => DPn CPQ7hBAY&LP;7MxPtN^Oy\$Fz;M#
```

That is 21 attempts to produce a 32-character string, 32x longer than the optimal `"#"`.
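
The limitation can be reproduced with a small sketch. `shrinkFromEnd` is a hypothetical reconstruction of the behavior described in this PR (drop one trailing character per try), not the original implementation, which the log suggests may drop several characters at a time.

```kotlin
// Hypothetical reconstruction, not the original code: drop one trailing
// character at a time for as long as the property still fails.
fun shrinkFromEnd(initial: String, property: (String) -> Boolean): Pair<String, Int> {
    var current = initial
    var tries = 0
    while (current.isNotEmpty()) {
        val candidate = current.dropLast(1)
        tries++
        if (property(candidate)) break // candidate passes: keep the last failing string
        current = candidate
    }
    return current to tries
}

fun main() {
    val (result, tries) = shrinkFromEnd("ab#cdefgh") { !it.contains("#") }
    println("\"$result\" after $tries tries") // prints: "ab#" after 7 tries
}
```

The prefix `ab` is never removed, and the number of tries grows with the number of characters after the `#`.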

The new algorithm produces output like this:

```
Attempting to shrink failed arg t ^j>t\o,x3?eb9#F'>g>vGQ-N}nkx
Shrink #1: <empty string> pass
Shrink #2: t ^j>t\o,x3?eb9 pass
Shrink #3: #F'>g>vGQ-N}nkx fail
Shrink #4: #F'>g>vG fail
Shrink #5: #F'> fail
Shrink #6: #F fail
Shrink #7: # fail
Shrink result => #
```

This takes only 7 tries and produces the correct output of `"#"`.
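
The trace above can be reproduced with a small driver loop. This is a sketch under stated assumptions, not kotest's implementation: `shrinkWith` and `halves` are hypothetical names, candidates that were already tested are skipped, and the first candidate that still fails becomes the next input.

```kotlin
// Hypothetical driver (an assumption, not kotest's code): repeatedly replace
// the failing input with the first untested candidate that still fails.
fun shrinkWith(
    initial: String,
    candidates: (String) -> List<String>,
    property: (String) -> Boolean
): Pair<String, Int> {
    var current = initial
    var tries = 0
    val tested = mutableSetOf<String>()
    while (true) {
        var smaller: String? = null
        for (candidate in candidates(current)) {
            if (!tested.add(candidate)) continue // already tried, skip it
            tries++
            if (!property(candidate)) { smaller = candidate; break }
        }
        if (smaller == null) return current to tries
        current = smaller
    }
}

fun main() {
    // Candidate rule: the empty string, then the first and second halves.
    // Only strictly shorter strings are kept, so the loop always terminates.
    val halves = { s: String ->
        val mid = (s.length + 1) / 2
        listOf("", s.take(mid), s.drop(mid)).filter { it.length < s.length }
    }
    val input = """t ^j>t\o,x3?eb9#F'>g>vGQ-N}nkx""" // sample input from this PR
    val (result, tries) = shrinkWith(input, halves) { !it.contains("#") }
    println("$result after $tries tries") // prints: # after 7 tries
}
```

With this sketch, a property that only fails when characters from both halves are present would stop shrinking early, since neither half fails on its own.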

@sksamuel commented

Brilliant.

@sksamuel sksamuel merged commit 477a293 into kotest:master Jul 15, 2018
@ajalt ajalt deleted the bisect-string-shrink branch July 15, 2018 16:17