
Proposal: Add flag to mark test cases as optional to canonical schema #1492

Closed
SaschaMann opened this issue Mar 29, 2019 · 24 comments

@SaschaMann
Contributor

SaschaMann commented Mar 29, 2019

Proposal: Add a way to mark test cases as optional in the canonical data.
For example, with a flag "optional": true for optional cases, and null or no flag for cases that (almost) every track should implement. Perhaps with an additional comment or a reference to an issue/PR that explains when to implement it and when not to.
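
A minimal sketch of what such a flagged case could look like (borrowing the triangle case quoted further down; the key name is part of the proposal, not a settled schema):

 {
   "description": "sides may be floats",
   "property": "scalene",
   "input": {
     "sides": [0.5, 0.4, 0.6]
   },
   "expected": true,
   "optional": true
 }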

This allows automated test generators to handle these cases and leaves it up to the maintainers of each track whether they want to add them or not.

Motivation: There are test cases (e.g. #1487 or complex-numbers#L291-L300) that some tracks want to implement, but other tracks don't, for example because it would unnecessarily increase the complexity of the exercise. While every track can implement additional tests manually, once a certain number of tracks use such a case, it may be useful to add it to the canonical data. (full context that led to this issue being created)

@SaschaMann SaschaMann changed the title Add flag to mark test cases as optional to canonical schema Proposal: Add flag to mark test cases as optional to canonical schema Mar 29, 2019
@petertseng
Member

To help reviewers evaluate and decide on this proposal, I contribute three additional motivations as potential factors.

Floating-point tests in triangle:

 {
   "comments": [
     " Your track may choose to skip this test    ",
     " and deal only with integers if appropriate "
   ],
   "description": "sides may be floats",
   "property": "scalene",
   "input": {
     "sides": [0.5, 0.4, 0.6]
   },
   "expected": true
 }

Performance tests, such as pythagorean-triplet (#1347) and alphametics (#1024)

Invalid inputs: recall that we've had discussions in #902. Currently, unless they are essential to the exercise, we will not put them in canonical-data.json, but acknowledge that tracks may choose to add their own. There may be situations where multiple tracks would choose to add such cases, in which case it could be valuable to share them, and this could be a mechanism for doing so.

I'm interested to see how this proposal goes.

@pgaspar
Member

pgaspar commented Apr 1, 2019

A few questions:

  1. This would require a change to the automatic test generators on every track, right? Because otherwise optional tests will be generated as usual.
  2. What happens when an exercise has multiple tests that are optional for very different reasons? Is using "optional" as a category too broad? (Would we want to maintain a set of narrower categories?)
  3. On the Ruby track we've recently started working on ways for generators to exclude certain tests we don't want. We don't want them for various reasons, mostly related to the Track Anatomy project. Could this be an alternative approach to the same problem?

@NobbZ
Member

NobbZ commented Apr 1, 2019

  1. Yes. The assumption is that most test generators will just continue to work without any special-casing of the optional tests. We are aware, though, that some generators might fail badly when seeing JSON that is not formed as expected. In the chat we assumed that leaking optionals is the lesser of all discussed evils.
  2. In my opinion, just an "optional" flag is not really enough. Personally, I'd like it to be a string, briefly explaining the reason for introducing the case.
  3. Basically, it is the canonical form of exclusion. As it is now, you need to document somewhere why the test was removed in Ruby, and you also need a way to keep track of renamed tests in both that documentation and the generator's code. In Erlang we can currently filter on any property of the JSON object, but if that property changes, nothing is filtered anymore and the test is back again. If we had an optionality marker and a property in the JSON that is guaranteed to be immutable, this exclusion would be easy (see the sketch below).
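
A minimal sketch of what such canonical exclusion could look like in a generator, assuming cases are read from a flat "cases" list and the track keeps a set of markers it excludes (all names are illustrative, not a settled API):

import json

# Markers this track has decided not to implement (illustrative values).
EXCLUDED_MARKERS = {"floating-point-numbers", "big-integers"}

def included_cases(path):
    """Yield canonical test cases, skipping excluded optional ones."""
    with open(path) as f:
        data = json.load(f)
    for case in data["cases"]:
        # Unmarked cases are always included; marked cases are filtered
        # on a stable marker instead of on mutable comment text.
        if case.get("optional") not in EXCLUDED_MARKERS:
            yield case

Because case.get("optional") returns None for unmarked cases, a generator that ignores the property entirely keeps working unchanged.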

@sshine
Contributor

sshine commented Apr 1, 2019

@pgaspar wrote:

Is using "optional" as a category too broad?

Yes, and additionally: Aren't all tests technically optional?

For example, the Haskell track's tests for acronym deliberately preserve a test case that was removed from the canonical tests at some point. I'd assume there are similar cases of deliberately leaving out canonical tests, or having them commented out / skipped by individual language tracks at the track's own discretion.

(Would we want to maintain a set of narrower categories?)

We could maintain a set of categories for why tests may be conditionally excluded.

  • Could become an enum field in canonical-schema.json (see the sketch after this list).
  • I'm not sure this category is easy to fill out.
  • Would somewhat increase the workload for test generator maintainers; my impression is that some tracks have test generators, but they're not always maintained. (E.g. in my short period of actively maintaining the OCaml track, I never got around to using the test generators, as they weren't well documented and I wasn't used to the toolchain used to build them.)
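
For illustration, such an enum could be a property along these lines in canonical-schema.json (the category names are hypothetical):

 "optional": {
   "description": "Why a track might choose to exclude this test case",
   "type": "string",
   "enum": ["big-integers", "floating-point-numbers", "unicode", "performance"]
 }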

@NobbZ wrote:

I'd like it to be a string, briefly explaining the reason for introducing the case.

Like the following?

"comment": "This test uses big numbers."
"comment": "This test uses floating-point numbers."
  • This way we're not restricted to a narrow set of categories.
  • The canonical tests wouldn't deal with the question of optionality.
  • The problem of e.g. adding a test with a number that exceeds 32 or 64 bits, or a test that uses floating-point numbers when a track implements the exercise with integers, would still affect test generators for language tracks that don't have native/transparent support for such numbers, or don't intend to add special support for the particular exercise.

Since we have had no objections from makers of test generators yet, I would favor this solution for being

  1. least invasive wrt. changing the format for canonical data, making existing test generators technically support the format without changes, and
  2. least complex wrt. optionality: Let maintainers deal with excluding test cases, accept that some tests may not apply to all language tracks, and be more inclusive towards adding tests that only address some tracks and not others.

@sshine
Contributor

sshine commented Apr 1, 2019

@NobbZ wrote:

I'd like it to be a string, briefly explaining the reason for introducing the case.

Correction: How would this reason deviate from the test message?

For example, if it were the case of armstrong-numbers in #1487, wouldn't the test description be something like "Test against large number to ensure that a coercing library function does not lose precision"?

@SaschaMann
Contributor Author

In my opinion, just an "optional" flag is not really enough. Personally, I'd like it to be a string, briefly explaining the reason for introducing the case.

"optional": "Explanation for why this is an edge case" instead of "optional": true could work, too.

Correction: How would this reason deviate from the test message?

Imo, the test descriptions/names are more often an explanation for the student, while these comments/flags are purely for maintainers.

For example, if it were the case of armstrong-numbers in #1487, wouldn't the test description be something like "Test against large number to ensure that a coercing library function does not lose precision"?

That's not the only reason for the test, and it doesn't tell maintainers that, in a language where 64-bit integers aren't typically/always used, they might not want to use this case because it introduces an additional concept. There are also some examples above where the comment would be far too long for the description.

Why not just use a comment?
@ErikSchierboom brought up in the chat that one should not have to scan comments to find out if a test case should be omitted. The difference is that generators would be able to detect the kind of edge cases described above. For manual creation of test suites, there isn't really a difference.

@pgaspar
Member

pgaspar commented Apr 1, 2019

The difference is that generators would be able to detect the kind of edge cases described above.

Maybe it would be easier for me to understand this point with examples of how a generator could use this flag.

One example I can think of is showing a warning whenever a maintainer runs the generator and it sees an optional test. Wouldn't it become a bit repetitive to always see these warnings even after initially verifying that we do want the test? Is the assumption that generators would have a way to "resolve" these warnings somehow?

Sorry if these questions are a bit of a tangent. I'm only looking at this through what I know of the Ruby generators so it's very likely I'm missing something.

@coriolinus
Member

coriolinus commented Apr 1, 2019 via email

@SaschaMann
Contributor Author

This imposes an additional engineering burden on the people maintaining the generators. As one of them, I personally would prefer to avoid an optional flag. All tests are implicitly optional anyway, as things currently stand.

As @NobbZ said above:

Yes. The assumption is that most test generators will just continue to work without any special-casing of the optional tests. We are aware, though, that some generators might fail badly when seeing JSON that is not formed as expected. In the chat we assumed that leaking optionals is the lesser of all discussed evils.

Would this break the generators you're maintaining, @coriolinus?


One example I can think of is showing a warning whenever a maintainer runs the generator and it sees an optional test. Wouldn't it become a bit repetitive to always see these warnings even after initially verifying that we do want the test? Is the assumption that generators would have a way to "resolve" these warnings somehow?

This is up to the maintainer of the generator. I'm currently working on creating Julia generators and these are some of the options I'm considering:

  • Just show a warning
  • Check if the case has been implemented, e.g. by keeping track of implemented cases somewhere or by checking if the case appears in the section of additional/track-specific/manual tests; if not, show a warning (a rough sketch of this option follows below)
  • Also thinking of a more complex approach that would check whether the additional test breaks the example solution, because that can be an indicator that it introduces additional complexity (or that the example solution is wrong, but that'd be another issue)
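
A rough sketch of the second option, assuming the track records which optional cases it has consciously adopted (the registry and all names are hypothetical):

import json

# Descriptions of optional cases this track has consciously adopted.
ADOPTED_OPTIONAL = {"sides may be floats"}

def warn_on_unreviewed(path):
    """Warn about optional cases that have not been reviewed yet."""
    with open(path) as f:
        cases = json.load(f)["cases"]
    for case in cases:
        if "optional" in case and case["description"] not in ADOPTED_OPTIONAL:
            print(f'warning: optional case "{case["description"]}" '
                  f'({case["optional"]}) has not been adopted or excluded yet')

A registry like this would also address the repetitive-warning concern above: once a case is listed, the warning is "resolved" and no longer repeats on re-runs.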

Sorry if these questions are a bit of a tangent. I'm only looking at this through what I know of the Ruby generators so it's very likely I'm missing something.

No need to be sorry, it's very valuable to know how different generators would handle this.

@ErikSchierboom
Member

"optional": "Explanation for why this is an edge case" instead of "optional": true could work, too.

This is my preferred option. I'm hesitant to use comments for this, as I feel people should be able to freely change the comments without breaking any generators.

Applying this to @petertseng's example:

 { 
   "comments": [ 
     " Your track may choose to skip this test    ", 
     " and deal only with integers if appropriate " 
   ], 
   "description": "sides may be floats", 
   "property": "scalene", 
   "input": { 
     "sides": [0.5, 0.4, 0.6] 
   }, 
   "expected": true,
   "optional": "floating-point-numbers"
 }

Or something like that.

@sshine
Contributor

sshine commented Apr 12, 2019

"optional": "Explanation for why this is an edge case" instead of "optional": true could work, too.

I also like this idea.

#1503 provides another reason for optional inclusion, namely Unicode.

@ErikSchierboom
Member

Is there anyone against @petertseng's proposal?

@SleeplessByte
Member

I'm 100% in favour. I do suggest we hold off on actually adding flags until #1496's other suggestions have reached a similar consensus. It makes sense to me to (potentially) add a bunch of flags all at once instead of many over a long timespan, with regard to the strain on generator maintainers.

@petertseng
Member

Is there anyone against @petertseng's proposal?

Small point of order, sorry to interrupt...

I do not perceive myself as having made any proposal in this issue - to help people evaluate whether people are for or against, I think it would be helpful if you would point out the proposal that I must have accidentally made. (Or maybe this is a case of misattribution, in which case I think it would be most courteous to the proposer if the proposal were attributed to the right person)

@Stargator
Contributor

Sorry for chiming in late.

I'm opposed to this, as the Dart track generally tries to re-run our generator every so often, and it would be very repetitive to have to figure out whether we chose to include an optional test or not.

Additionally, this suggestion would require all the tracks that use generators to update their generator on top of any other existing work or improvements.

All just for the sake of a few tracks that chose to have similar tests added.

I would prefer a more empirically driven proposal, such as Exercism having a mechanism for looking at the implementations of every exercise across all the tracks. From that point, Exercism could compare the number of tests, the test names, and the expected results with the problem specification.

Once that happens and we see a better automated way of telling how many tracks have included tests that seem similar, then I think we could evaluate whether to add those specific tests to the specification.

I would much prefer that to adding another key to the specification that our generator would have to handle.

To summarize, I would prefer an automated way for Exercism to see what tests are implemented within an exercise across tracks and then determine whether to add it as a normal/default test case.

That seems better to me than having optional tests across all the specs that all the generators would have to be updated to address/ignore.

@kytrinyx
Member

kytrinyx commented May 2, 2019

It sounds to me like the choice we're trying to balance is this: Do we make the call about inclusion/exclusion at the problem-specifications level, or at the track level?

My understanding is that even if we mark some test cases as optional in the specification, then a track might want to include only some optional tests, and exclude some of the tests that are considered "core" to the exercise.

I think it would be worth considering adding a "metadata" or similar section to the spec for each individual test, which could have a unique identifier that doesn't change, a type field (e.g. "basic", "errors", "whatever", ...), and an optional comment field, which explains the motivation for having that test. We could update all existing specs automatically in one fell swoop to give them identifiers, setting the type to "basic" (or whatever we decide the term should be).

The assumption would then be that the generator would include everything by default, but could have a configuration file that maintainers can update to exclude based on a type or an identifier.
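
For example, such a per-track configuration file could look something like this (the field names are only illustrative):

 {
   "exclude": {
     "types": ["errors"],
     "identifiers": ["<unique-id-of-a-specific-case>"]
   }
 }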

Thoughts?

@ErikSchierboom
Member

TL;DR I still prefer the simple "optional": "floating-point-numbers" approach.

My understanding is that even if we mark some test cases as optional in the specification, then a track might want to include only some optional tests, and exclude some of the tests that are considered "core" to the exercise.

I think this is indeed correct, although excluding "core" tests is really, really uncommon, I think. Once a track starts skipping "core" tests, maybe that is an indication that the exercise is not particularly well suited to the track?

We could update all existing specs automatically in one fell swoop to give them identifiers, setting the type to "basic" (or whatever we decide the term should be).

I'm not completely sold on this. Having the field be present in all records is great for consistency and tooling (like excluding things etc.), but it seems like it is only really necessary for those corner cases in which some tracks might want to exclude a test case.

My preference would still be just a simple additional property:

"optional": "floating-point-numbers"

My reasoning for this is: it is easy to parse, simple to filter on, and can be applied judiciously to only those "problematic" test cases.
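
To illustrate the filtering: given a list cases of parsed case objects and a set EXCLUDED of marker strings a track opts out of (both names hypothetical), the whole exclusion can be a one-line predicate (a sketch, not a prescribed API):

wanted = [case for case in cases if case.get("optional") not in EXCLUDED]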

I'm opposed to this as the Dart track generally tries to re-run our generator every so often and it would be very repetitive to have to figure out whether we chose to do include an optional test or not.

Well, the amount of work this entails really depends on how often we add an "optional" test, right? I expect there to be relatively few, although this is just an educated guess, of course.

To summarize, I would prefer an automated way for Exercism to see what tests are implemented within an exercise across tracks and then determine whether to add it as a normal/default test case.

This would be nice to have. We currently do this a bit more informally in the PR/issue discussion.

@SaschaMann
Contributor Author

SaschaMann commented May 7, 2019

To summarize, I would prefer an automated way for Exercism to see what tests are implemented within an exercise across tracks and then determine whether to add it as a normal/default test case.

I also think this would be very nice to have but implementing that would likely take way more work than changing the generators (which is optional! A track can also choose to include all optional tests, which is usually the default when the additional key is ignored, or ignore them entirely). One would have to create parsers for test suites of more than 50 tracks and the tests may not have a direct reference to the canonical data.

@ErikSchierboom
Member

A track can also choose to include all optional tests, which is usually the default when the additional key is ignored, or ignore them entirely.

This is also my viewpoint.

@SaschaMann
Contributor Author

#1518 adds the most-liked implementation from above. I don't want to conclude or finalise the discussion in any way, but I've had some spare time to implement it and many seemed to like that one, so I've added the PR.

@sshine
Contributor

sshine commented Jun 26, 2019

In the interest of moving forward, I would like to ask @Stargator, who said that maintainers of test generators will have extra work in dealing with the "optional": "..." addition to the canonical schema:

  1. Doesn't the presence of a new dictionary key pose a problem only to those who do typed parsing of the schema? (E.g. using Haskell's aeson.) My assumption would be that maintainers of test generators can ignore this property to the extent that their generated tests work fine without it.

  2. It would in fact help the OCaml test generator if such a property were available, because then we could e.g. disable triangle's floating-point tests without fuzzy matching on test case comments, since the exercise is intentionally written with integers for that track. See "triangle: actively ignore cases with float input" ocaml#338.

All just for the sake of a few tracks that chose to have similar tests added.

You are right in saying that there are very few deviations (both in terms of loose comments in canonical data saying something is optional, and in terms of tracks that decide for themselves that something is optional or additional).

But I don't think you can use this count as a reasonable argument against this feature, because it's structural: We want tracks to conform, because this is easier to maintain, but we want tracks to be able to break out (e.g. let the Haskell track test that pangram is made lazy without having to express that in JSON).

An example of something you don't see is Unicode in track-specific tests. We want more Unicode tests. But we don't want to force Unicode on all tracks. Having "optional": "unicode" will enable meaningful tests like exercism/haskell#827 so mentors don't have to point out trivial mistakes made in a language that is Unicode by default.

I would prefer a more empirical driven proposal [...]

So would I.

[...] looking at the implementations of every exercise across all the tracks. From that point, exercism could compare the number of tests, the test names, and the expected results with the problem-specification.

I've done empirical tests of all track repos for other purposes, and it's a lot of work. In my latest attempt I started to write a jq tutorial with use-cases from Exercism, hoping that this would eventually become a cookbook for track maintainers. But it's stashed, because it's a lot of work.

So I also don't think that this is a reasonable argument to oppose the feature.

Once that happens and we see a better automated way of telling how many tracks have included tests that seem similar, then I think we could evaluate whether to add those specific tests to the specification.

To begin with, we don't even know how many tracks keep track of the canonical test version for each exercise in a systematic way. See for example exercism/javascript#628 for a track that's getting there. I'm willing to bet that the number is quite low.

So we either start fuzzy-matching across language syntaxes (using multiple historical phrasings for canonical data for tests where the wording changed altogether) or we push for a canonical version system for each language (which must subsequently be maintained), until we eventually get to a point where we can ask: How common are tests on each track?

It's a lot of work.

And until then, those who merge problem-specifications PRs can at best have a good intuition about which tests are not being added when they could be, because our current all-or-nothing policy prevents us from seeing what we're missing out on.

To summarize, I would prefer an automated way for Exercism to see what tests are implemented within an exercise across tracks and then determine whether to add it as a normal/default test case.

So would I.

But I don't think that the work involved in this alternative is realistic.

While adding this key is extremely low-effort and parallelizable.

I, for one, will start adding "optional": "unicode", "optional": "big-integer" and "optional": "floating-point" labels to a number of exercises for the Haskell and OCaml tracks.

Edit: Removed a practical note to the PR itself.

@ErikSchierboom
Member

But I don't think that the work involved in this alternative is realistic.

While adding this key is extremely low-effort and parallelizable.

I agree with both statements. It is a relatively minor change that will definitely have a positive impact on a number of tracks.

P.S. The C# and F# generators won't be affected by this change; they will have to opt in to use the feature.

@SleeplessByte
Member

To begin with, we don't even know how many tracks keep track of the canonical test version for each exercise in a systematic way. See for example exercism/javascript#628 for a track that's getting there. I'm willing to bet that the number is quite low.

I believe it was 6 or so the last time I checked. That's 6 out of ~50.

@SaschaMann
Contributor Author

Resolved by #1518 (unless I'm missing something)
