Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Arb.string shrinking simplest character is always 'a' regardless of codepoint #2646

Closed
Tracked by #2678
myuwono opened this issue Nov 13, 2021 · 5 comments · Fixed by #2731
Closed
Tracked by #2678

Arb.string shrinking simplest character is always 'a' regardless of codepoint #2646

myuwono opened this issue Nov 13, 2021 · 5 comments · Fixed by #2731
Labels
bug 🐛 Issues that report a problem or error in the code. enhancement ✨ Suggestions for adding new features or improving existing ones. property-testing 📏 Related to the property testing mechanisms within the testing framework.
Milestone

Comments

@myuwono
Copy link
Contributor

myuwono commented Nov 13, 2021

The offending code: https://github.com/kotest/kotest/blob/master/kotest-property/src/commonMain/kotlin/io/kotest/property/arbitrary/strings.kt#L35

but I can't really think of any deterministic way to fix it, because the lowest codepoint isn't known upfront :(

we can collect the sampled codepoints in a mutable set after the fact but it will incur a pretty noticeable runtime cost.

test to replicate this:

test("string shrinking should honor codepoints") {
  val arbNumeric = Arb.of(('0'..'9').map { Codepoint(it.code) })
  val samples = Arb.string(1..5, arbNumeric).generate(RandomSource.seeded(1234L))
  samples.take(1).first().shrinks.children.value.map { it.value() }.forAll { value ->
    value.shouldContainOnlyDigits()
  }
}

fails with:

The following elements passed:
519
83
1983
5198

The following elements failed:
"a1983" => "a1983" should contain only digits
"5198a" => "5198a" should contain only digits
@myuwono myuwono added the bug 🐛 Issues that report a problem or error in the code. label Nov 13, 2021
@sksamuel
Copy link
Member

sksamuel commented Nov 14, 2021 via email

@sksamuel
Copy link
Member

sksamuel commented Nov 16, 2021

So I think the only way to solve this is to have:

interface Alphabet {
   val codepoints: List<Codepoint>
}

And the lowest can be the first element of that list.

(Alphabet could be replaced with CodepointSet or something).

@myuwono
Copy link
Contributor Author

myuwono commented Nov 17, 2021

sounds good, would that be something like Arb.string(1..20, Codepoints.alphaNumeric()) where that uses the new Codepoints or Alphabet interface? - would that be a breaking change? or should we plan through a deprecation cycle for Arb.string(...)?

@sksamuel
Copy link
Member

We could deprecate one or keep both. If we do this today or tomorrow it can make 5.0

@sksamuel sksamuel added enhancement ✨ Suggestions for adding new features or improving existing ones. property-testing 📏 Related to the property testing mechanisms within the testing framework. labels Nov 27, 2021
@sksamuel sksamuel added this to the 5.1 milestone Nov 27, 2021
@sksamuel sksamuel mentioned this issue Nov 27, 2021
16 tasks
@aSemy
Copy link
Contributor

aSemy commented Dec 10, 2021

Hi, I don't think the simplest codepoint has to be a specific codepoint for a given set of codepoints - one can be randomly selected when the shrinker is created.

/**
 * Shrinks a string. Shrunk variants will be shorter (reduced in length, until they are [minLength] in length),
 * and simplified (characters will be replaced by [simplestChar]).
 *
 * Note: by default [simplestChar] is the character "`a`". If shrinking an `Arb<String>` that does not include "`a`",
 * either provide a suitable alternative, or use the secondary constructor and provide a suitable [Codepoint] [Arb],
 * and a suitable character will be randomly selected.
 */
class StringShrinkerWithMin(
   private val minLength: Int = 0,
   private val simplestChar: Char = 'a',
) : Shrinker<String> {

   /**
    * @see StringShrinkerWithMin
    * @param[codepoints] Variants will be simplified using a random [Char] from this [Arb].
    * @param[rs] Selects an arbitrary 'simplestChar' from [codepoints] - must be seeded for repeatability
    */
   constructor(
      minLength: Int = 0,
      codepoints: Arb<Codepoint>,
      rs: RandomSource = RandomSource.seeded(minLength.toLong())
   ) : this(minLength, codepoints.next(rs).value.toChar())

// ...

I'm working on a PR :)

aSemy added a commit to aSemy/kotest that referenced this issue Dec 10, 2021
@sksamuel sksamuel modified the milestones: 5.1, 5.0.3 Dec 11, 2021
@sksamuel sksamuel removed this from the 5.0.3 milestone Dec 18, 2021
@sksamuel sksamuel added this to the 5.1 milestone Jan 16, 2022
sksamuel pushed a commit that referenced this issue Jan 16, 2022
)

* #2646 StringShrinkerWithMin generates samples based on source-Arb codepoints

* move stringArb out of the arbSample constructor, clarify comments, add clue

* oops, fix assertion

* update StringShrinkerWithMin to select a simplestChar from a pre-shrunk String, so it will be valid and deterministic

* return the variant-selection to match the original

* add javadoc, undo formatting

* actually, it's simpler with sequential 'if's, as it covered all combinations of is/isn't simplest/shortest better than the 'when' statement

* test StringShrinker's simplestCharSelector

(additionally, make the test names consistent and slightly rearrange StringShrinkerWithMin statements for clarity)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug 🐛 Issues that report a problem or error in the code. enhancement ✨ Suggestions for adding new features or improving existing ones. property-testing 📏 Related to the property testing mechanisms within the testing framework.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants