split test cases into specific levels of functionality? #497

Closed
ianstormtaylor opened this issue Dec 12, 2020 · 11 comments · Fixed by #819
Comments

ianstormtaylor (Contributor) commented Dec 12, 2020

Hey, thanks for this benchmark! It's a great idea.

I think the existing test cases aren't testing for the same behavior in each library, though. Depending on the library, it might support "guarding", "validating", "parsing", "transforming", etc. But if you compare the guarding of one library to the parsing of another, it's not a direct comparison.

For example, as far as I can tell right now toi is simply guarding a value and returning a boolean in its test case, whereas superstruct is parsing an input, cloning it, and returning the parsed value in its test case (instead of using the simpler is export).

It might make sense to add more specific levels of functionality to each test case. And some libraries simply won't allow certain functionality. For example (using superstruct because I'm the author so I know it best):

import { is, assert, coerce, object, ... } from 'superstruct'

const type = object({ ... })

class SuperstructCase extends Case {
  name = 'superstruct'

  // The `is` test is for simply returning a boolean indicating whether 
  // a value is valid. It doesn't need to create errors, parse values, etc.
  // This is the most basic and the minimum to be included.
  is(value) {
    return is(value, type)
  }

  // The `assert` test is for throwing a proper `Error` object, with a 
  // reason for why validation failed. It shouldn't be re-implemented
  // for libraries that only support `is` type guards because the errors
  // won't actually contain any information.
  assert(value) {
    assert(value, type)
  }

  // The `parse` test is for libraries that allow some parsing step on
  // the value to ensure that it is valid—for things like default values,
  // coercing inputs, etc. It should return the parsed value, and the
  // value *should not* equal the input (ie. cloned).
  parse(value) {
    return coerce(value, type)
  }
}

You'd be able to show a section of graphs for each level of functionality. Then people can make more informed decisions, e.g. a library might be much faster at is, but not support parse at all.

I think this would make the comparisons much more accurate.

moltar (Owner) commented Dec 18, 2020

@ianstormtaylor thank you for your input. I completely agree. This started small, comparing just a few libs, then it grew into this monster, and now we are comparing apples to oranges. I like your idea of having extra test cases.

ianstormtaylor (Contributor, Author) commented

@moltar 😄 glad to hear it! I think the graphs and stuff that you'll be able to show once it's split will be really cool. And I'm looking forward to digging into Superstruct's performance when it's apples-to-apples.

moltar (Owner) commented Mar 3, 2021

Let's think about what kind of test buckets we want, then. Any ideas?

ianstormtaylor (Contributor, Author) commented

I think is, assert and parse are good to start.
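
A minimal sketch of what a shared case shape for those three buckets could look like (the names are illustrative, not an actual API from this repo):

// Illustrative only: a possible shape for a three-level benchmark case.
interface LeveledCase {
  name: string;

  // Boolean type guard; every library should be able to implement this.
  is(value: unknown): boolean;

  // Throws a proper Error with a reason when validation fails; optional
  // for libraries that only expose boolean guards.
  assert?(value: unknown): void;

  // Returns a parsed/coerced copy of the input; optional for libraries
  // without a parsing step.
  parse?(value: unknown): unknown;
}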

hoeck (Collaborator) commented Apr 26, 2021

In addition to the three modes above I'd like to see a second dimension: strict vs non-strict validation of objects.

Strict:
only allowing known attributes in an object, reporting an error when unknown keys are present.

Non-strict:
similar to how TypeScript checks interface type compatibility: if all known attributes are present, everything is good.

Strict vs non-strict checking has performance implications, and I'd like to see how the different libraries tackle them.

Unfortunately there is a third mode in this dimension: allowing non-strict objects as input but returning a strict object with all unknown keys stripped.

As a side note, I think that non-strict runtype checking actually leads to a false sense of security, as you essentially allow arbitrary attributes on objects that are passed into your application and maybe even deeper into the database or other services. So, maybe only benchmark the strict and non-strict-input-strict-output cases?
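
To make the three modes concrete, here is a minimal sketch using zod as one example (zod is only an illustration here; other libraries expose similar switches):

import { z } from 'zod';

const shape = { id: z.number(), name: z.string() };
const input = { id: 1, name: 'a', extra: true };

// Strict: unknown keys are an error.
z.object(shape).strict().safeParse(input).success; // false

// Non-strict: unknown keys are allowed and kept, similar to TypeScript's
// structural interface compatibility.
z.object(shape).passthrough().parse(input); // { id: 1, name: 'a', extra: true }

// Non-strict input, strict output: unknown keys are silently stripped
// (zod's default behavior for z.object()).
z.object(shape).parse(input); // { id: 1, name: 'a' }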

I'd love to help once it is clear what we need and how to report it.

hoeck (Collaborator) commented Jun 28, 2021

Found some time over the weekend to try to solidify my ideas: hoeck@eef6d89

It basically introduces:

  1. Support for more than one benchmark case per library: https://github.com/hoeck/typescript-runtime-type-benchmarks/blob/eef6d89fc5d7d8ef249b47bd8a8f3474c66d5a55/cases/valita.ts
  2. Interactive visualization using gh-pages: https://hoeck.github.io/typescript-runtime-type-benchmarks/

Next steps I can think of are:

  • add a parsing benchmark case
  • update the tests to work with the multiple benchmark cases
  • add node-version and library filter to the visualization
  • decide what to do with the existing graphs in the readme: keep as is / use the new group graph / keep only a single graph and refer to the gh-pages UI for detailed info about node versions and outlier removal?

Any ideas, suggestions, feedback, code-complaints?

Should I move forward with this @moltar ?

moltar (Owner) commented Jun 29, 2021

@hoeck love the progress, especially the interactive visualization using gh-pages! I was dreaming of this, but didn't think I'd ever have time to get to it!

What I might have done differently for the additional benchmark cases is to have a single interface that every benchmark implements, rather than two tests per class.

We can then group and organize different classes of test cases into folders and perhaps run the benchmarking tool with options to point to different folders, or just loop through the folders programmatically and run all benchmarks.

Another option is to add a concept of tags or features to the interface, which can be just an array of strings, and then use that as part of the UI to select/unselect different options. Then we can organize the test cases however we want, e.g. group them by package in a dir, or put them all in one dir with different filenames, and use tags to organize them.
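
A rough sketch of what such a tagged interface could look like (the names here are made up for illustration, not a final design):

// Hypothetical shape for a shared benchmark-case interface.
interface BenchmarkCase {
  // Library name, e.g. 'superstruct' or 'valita'.
  name: string;

  // Free-form feature tags the UI can filter on, e.g. ['is', 'parse', 'strict'].
  tags: string[];

  // The single function that actually gets benchmarked.
  run(data: unknown): unknown;
}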

hoeck (Collaborator) commented Jun 29, 2021

Thanks for the feedback 😁!

I agree that grouping the cases can be improved. I was just not sure how much of the existing structure you'd like to keep, or why everything was written the way it is in the first place (e.g. using an abstract class over an interface).

Having a single interface and multiple files per package sounds like a good and common way to do it. I would like to keep the setup simple though, so no automatic reading and importing of directories. In my experience that leads to more complex builds than just having a few explicit import statements. Automatic importing could probably be added later once the folder structure is in place anyway.
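
So the entry point could stay as plain as something like this (the file names are only examples):

// cases/index.ts: explicit imports instead of scanning the directory.
import './superstruct';
import './valita';
import './zod';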

Tags sound like a nice add-on layer over choosing packages and benchmarks individually. I thought of this too, but would first like to get the basics off the ground.

I'll update my fork with said way of grouping benchmarks when I have some spare time (probably in about a week) and keep you updated.

moltar mentioned this issue Jul 7, 2021

hoeck (Collaborator) commented Aug 31, 2021

Hey there,

found some time to work on this further. @moltar could you please have a look at the latest commit at https://github.com/hoeck/typescript-runtime-type-benchmarks/tree/refactor-benchmarks?

I've simplified adding new benchmarks and libraries. It's now done via a typed register function. Benchmark cases now look like this:

import { register } from '../benchmarks';
import * as v from '@badrap/valita';

const dataType = v.object({
  number: v.number(),
  negNumber: v.number(),
  maxNumber: v.number(),
  string: v.string(),
  longString: v.string(),
  boolean: v.boolean(),
  deeplyNested: v.object({
    foo: v.string(),
    num: v.number(),
    bool: v.boolean(),
  }),
});

register('valita', 'validate', data => {
  return dataType.parse(data);
});

register('valita', 'validateStrict', data => {
  return dataType.parse(data, { mode: 'strict' });
});

The benchmark case implementation is just the single function passed to register. Grouping benchmarks into different files is trivial. You could even add them dynamically, but for now I am just using plain import statements.
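
For reference, a register helper along those lines could be as small as the following sketch (this is my reading of the idea, not the actual code in the branch):

// Sketch of a typed register function and the case store behind it.
type BenchmarkKind = 'validate' | 'validateStrict';

interface RegisteredCase {
  lib: string;
  kind: BenchmarkKind;
  fn: (data: unknown) => unknown;
}

const cases: RegisteredCase[] = [];

export function register(
  lib: string,
  kind: BenchmarkKind,
  fn: (data: unknown) => unknown
): void {
  cases.push({ lib, kind, fn });
}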

For the strict & non-strict benchmarks it's easier to keep them in one file, as the runtype definition is the same and only the check arguments differ for most libraries.

Benchmarks also have their own tests which are executed for each registered case.

If you give me a thumbs up, I'll continue to clean this up, migrate all existing cases, and create a mergeable PR.

moltar (Owner) commented Nov 1, 2021

@hoeck Can you please make a PR and I'll review it! Thanks! 😄

hoeck (Collaborator) commented Dec 30, 2021

Just FYI, I finally found some time and I am working on this right now. I am also really curious about the results 😁.
