
Transformation action now executes after failed schema check #436

Closed
ivands opened this issue Feb 12, 2024 · 41 comments
Labels: intended (The behavior is expected)

Comments

@ivands

ivands commented Feb 12, 2024

After upgrading from 0.20.1 to 0.28.1, the transformation action executes after a failed schema check.
This should never happen and is a bug.

import { parse, regex, string, transform } from 'valibot';

const schema = transform(
	string('Invalid string', [regex(/^[0-9]+ (milliseconds?|seconds?|minutes?|hours?|days?)/)]),
	() => {
		throw new Error('This error should not happen for invalid input')
	}
)

// This will throw my error instead of a valibot error.
parse(schema, 'INVALID')
@fabian-hiller
Owner

fabian-hiller commented Feb 12, 2024

It's actually not a bug. We changed this behavior in v0.22.0. But we also added a second parameter to the transformation function that you can use to check if there were any issues. Note that the input is still correctly typed: transform will only be executed if the input matches the data type of your schema.

@fabian-hiller fabian-hiller self-assigned this Feb 12, 2024
@fabian-hiller fabian-hiller added the "intended" (The behavior is expected) label Feb 12, 2024
@fabian-hiller
Owner

Feel free to share the exact code of your schema with me. This might help me understand your problem better and find an appropriate solution for you.

@ivands
Author

ivands commented Feb 12, 2024

My transformation code expects the validation to pass before executing.
Why would this behavior suddenly change?
The current behavior is super confusing, and it's not well documented.

const schema = transform(
	string('Invalid string', [regex(/^[0-9]+ (milliseconds?|seconds?|minutes?|hours?|days?)/)]),
	(value) => {
		return spacialParseFunctionThatNeedsTheRegexToMatch(value)
	}
)

@fabian-hiller
Owner

The reason for this change is that with the previous implementation, it was not possible to execute the pipeline of an object if any issues occurred before. This leads to problems, especially with form validation, because it is quite normal for some fields not to match the requirements when the form is submitted. With the previous implementation, it was not possible to show all issues at once. That's why we now distinguish between issues that affect the data type and issues that don't. This allows us to run the outer pipelines as soon as the types are correct, even if there are minor problems with specific fields. The disadvantage of this change is that we also need to run transform as soon as the data type is valid, otherwise the data type of the outer pipelines could be wrong. You can read more about this in issues #76 and #145.
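
For illustration, here is a minimal sketch of the form scenario described above (assuming the v0.2x object pipeline API; the field names are made up):

import { object, string, minLength, custom, safeParse } from 'valibot';

const FormSchema = object(
  {
    password: string([minLength(8)]),
    confirm: string(),
  },
  [
    // Both fields are strings, so the input counts as typed and this outer
    // pipeline still runs even though minLength produced an issue.
    custom((input) => input.password === input.confirm, 'Passwords must match'),
  ]
);

const result = safeParse(FormSchema, { password: 'abc', confirm: 'xyz' });
// result.issues now contains both the minLength issue and the custom issue.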

@fabian-hiller
Owner

fabian-hiller commented Feb 12, 2024

Does the following change to your schema work for you?

import * as v from 'valibot';

const Schema = v.transform(
  v.string('Invalid string', [
    v.regex(/^[0-9]+ (milliseconds?|seconds?|minutes?|hours?|days?)/),
  ]),
  (value, { issues }) => {
    if (!issues) {
      return spacialParseFunctionThatNeedsTheRegexToMatch(value);
    }
    return value; // or something else
  }
);

That's the same with a bit less code:

import * as v from 'valibot';

const Schema = v.transform(
  v.string('Invalid string', [
    v.regex(/^[0-9]+ (milliseconds?|seconds?|minutes?|hours?|days?)/),
  ]),
  (value, { issues }) => (issues ? value : spacialFunction(value))
);

Note that you can also outsource the logic into your custom function to further reduce the code:

import * as v from 'valibot';

const Schema = v.transform(
  v.string('Invalid string', [
    v.regex(/^[0-9]+ (milliseconds?|seconds?|minutes?|hours?|days?)/),
  ]),
  spacialFunction
);

@ivands
Author

ivands commented Feb 12, 2024

The disadvantage of this change is that we also need to run transform as soon as the data type is valid, otherwise the data type of the outer pipelines could be wrong.

The data type could still be wrong.
You can't assume the transform function only needs a valid data type.
The validations can be important.
And when they are, the transform function will need to return something else.

Good luck fixing the types for that issue XD

I need to think about this problem for a while.
I have to say, tho, having the transform function not execute for invalid types but execute for invalid validations is super confusing and will have people stumbling.

BTW, checking the issues list does solve the problem.

@ivands
Author

ivands commented Feb 12, 2024

That's why we now distinguish between issues that affect the data type and issues that don't.

I have a question about this.
Why distinguish at all?
What is wrong with always transforming the input even if the data type is wrong?

@ivands
Author

ivands commented Feb 12, 2024

This solution also breaks the output type.

import { transform, string, minLength } from 'valibot';
import type { Output } from 'valibot';

const Schema = transform(string([minLength(10)]), (value, { issues }) => {
  // We can't transform the data because we expected the validations to pass.
  if (issues) {
    return value;
  }

  return Number(value);
});

// The output type is now string | number instead of number
type SchemaOutput = Output<typeof Schema>;

@ivands
Author

ivands commented Feb 12, 2024

I don't see anything wrong with executing the pipeline even if an issue happened before.
However, I feel like transforming the data is where we should introduce a barrier,
so that the transform pipeline doesn't execute after an issue happened before.
Or maybe we need a new concept for this.
Like alwaysTransform vs safeTransform

But distinguishing between data type issues and validation issues makes no sense to me.
Both can impact the transformation.

@fabian-hiller
Owner

What is wrong with always transforming the input even if the data type is wrong?
I don't see anything wrong with executing the pipeline even if an issue happened before.

This would force us to type the input as unknown, which makes it useless because you have to validate it anyway before using it.

Strictly speaking, we do not distinguish between data type issues and other issues. Instead, we set .typed on the result of ._parse to true or false, depending on whether we know for sure that the expected data type was passed. And depending on that, we execute the pipeline and also transform.
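
For illustration, a rough sketch of the result shape described here (simplified; these are not the exact internal types):

type SchemaResult<TOutput> = {
  typed: boolean;      // true when the expected data type was definitely met
  output: TOutput;     // the current (possibly transformed) value
  issues?: unknown[];  // any issues collected so far
};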

This exchange is very important and I am grateful for your feedback.

@fabian-hiller
Owner

This solution also breaks the output type.

I would just return the number 0 or -1 in this case. Since there are issues, the output of the schema won't be important to you: you will probably only be showing the error messages to your users and won't care about the output.
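
For illustration, a sketch of that workaround applied to the example above (the sentinel -1 is an arbitrary choice):

import { transform, string, minLength } from 'valibot';
import type { Output } from 'valibot';

const Schema = transform(string([minLength(10)]), (value, { issues }) =>
  // Returning a sentinel number keeps the output type at `number`
  // instead of widening it to `string | number`.
  issues ? -1 : Number(value)
);

type SchemaOutput = Output<typeof Schema>; // number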

@ivands
Author

ivands commented Feb 12, 2024

I would just return the number 0 or -1 in this case. Since there are issues, the output of the schema won't be important to you: you will probably only be showing the error messages to your users and won't care about the output.

That would do the trick.
But it's super weird to have to return a value just to fix the typings.
It also kinda breaks down when you work with complex types.
Because you'll have to make a nullable version of your complex types.

const Schema = transform(object({...}), (value, { issues }) => {
  if (issues) {
    // Weird and unnecessary
    return createNullableVersionOfComplexObject()

    // Also suboptimal (unreachable here, shown as an alternative)
    return undefined as unknown as ComplexType
  }

  return transformStringToComplexObject(value);
});

This would force us to type the input as unknown, which makes it useless because you have to validate it anyway before using it.

Exactly, transforming before type validating actually makes no sense.
If you also agree that the pipeline validations can impact the transformation, then the logical conclusion would be to only execute the transform with valid input.

Ironically, typing the input as unknown might be less confusing now, because you're not getting a false sense of safety.

@fabian-hiller
Owner

It seems to me that we have three choices. We can try to make this behavior configurable; we can leave it as is, with weird behavior in special cases; or we can skip transformations if there are issues and mark the input as untyped even if the type is valid. The disadvantage of the third option is that we will no longer be able to run outer pipelines.

@fabian-hiller
Owner

One question. Do you change the data type in transform? Otherwise you could execute the transformation with toCustom directly inside the pipeline and set the abortPipeEarly configuration to true. This would fix your problem.
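
For illustration, a minimal sketch of that suggestion (assuming the v0.2x toCustom action and abortPipeEarly config, and a transformation that keeps the string type):

import { string, regex, toCustom, parse } from 'valibot';

// The transformation lives inside the pipeline, after the validation.
const Schema = string('Invalid string', [
  regex(/^[0-9]+ (milliseconds?|seconds?|minutes?|hours?|days?)/),
  toCustom((value) => value.trim()),
]);

// With abortPipeEarly, the pipeline stops at the first failed action,
// so toCustom never runs on input that failed the regex.
parse(Schema, '5 minutes', { abortPipeEarly: true });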

@ivands
Author

ivands commented Feb 13, 2024

It seems to me that we have three choices. We can try to make this behavior configurable; we can leave it as is, with weird behavior in special cases; or we can skip transformations if there are issues and mark the input as untyped even if the type is valid. The disadvantage of the third option is that we will no longer be able to run outer pipelines.

I don't know what your thoughts are on this.
But Valibot is a validation library.
I prefer my validation library not to have weird edge-case behavior.
I fear this current behavior is unexpected and will lead to security problems.
Option 2 ain't it.

I'm fine with option 1.
It could be configurable on a single transformation level.
Because some transformations are safe without validation & some might not be.
The same goes for custom validations.
But I would side with predictability as default (aka, you can't transform unsafe input).

Option 3 also seems fine.
Do we have examples of why running outer pipelines is important after a transformation?
It seems to me that the problems outlined in #76 and #145 don't need this feature.
I might be wrong tho.

One question. Do you change the data type in transform?

Yep.

@ivands
Author

ivands commented Feb 13, 2024

I was reading about the problem some more.
And I agree with what you said here.
#76 (comment)

Skipping the .transform but still performing .refine seems like a bug.
But shouldn't the solution be to skip both the refine and the transform?

With the previous implementation, it was not possible to show all issues at once.

To be honest, after realizing all the edge cases that arise from this feature, I think it might have been better not to support this. XD Keep it simple and predictable.
Do people need to know that their password & confirm password aren't equal when their password doesn't meet the min length of 8?
I don't think so.

@fabian-hiller
Owner

I fear this current behavior is unexpected and will lead to security problems.

In what case do you think this could lead to a security problem?

It could be configurable on a single transformation level. Because some transformations are safe without validation & some might not be.

I agree. The next step would be to figure out the API for it. The implementation should be simple.

But shouldn't the solution be to skip both the refine and the transform?

Could be, but then external pipelines cannot do their validation.

To be honest, after realizing all the edge cases that arise from this feature, I think it might have been better not to support this.

One of the biggest selling points of Valibot is its small and tree-shakeable bundle sizes. That's why a lot of developers use Valibot for form validation. I myself am the author of a form library called Modular Forms. That's why I prioritized it over the edge case for transform.

@ivands
Author

ivands commented Feb 13, 2024

In what case do you think this could lead to a security problem?

I don't have an example offhand, but unpredictable behavior is why bugs exist.
And this is a validation library, after all.

One of the biggest selling points of Valibot is its small and tree-shakeable bundle sizes. That's why a lot of developers use Valibot for form validation. I myself am the author of a form library called Modular Forms. That's why I prioritized it over the edge case for transform.

That makes a lot of sense.
I would just add that Valibot is also the best option for serverless.
We use it to do server-side validation.
I guess that's why I'm prioritizing predictability.

Let me think about it some more.
I'll have more time tomorrow or the day after.

@fabian-hiller
Owner

Thank you for your feedback on this. I look forward to improving Valibot with you!

@Demivan
Contributor

Demivan commented Feb 15, 2024

Hi, sorry for interrupting the dialog, but I have a few comments.
I agree with @ivands that the current behavior is really unintuitive. This is exactly the problem I wrote about in comments #76 (comment) and #76 (comment).
And after trying the new "typed" pipelines, I can say that they are not really useful for form validation. There are still a lot of "type" issues, because of undefined for fields that the user has not filled in yet.
@fabian-hiller, maybe we can set up a call, so I can show you the form schema and discuss pain points with the current valibot pipeline implementation.

@fabian-hiller
Owner

Yes, feel free to reach out on Discord (fabianhiller) or express your pain points and ideas here in the comments.

@Demivan
Contributor

Demivan commented Feb 15, 2024

Added you on Discord.
The biggest pain point is the issue with undefined. As most of the fields don't have any data when the form is initially displayed, they fail with a "type" issue and the pipelines are not executed.
The other issue is the one that @ivands experienced: transform and pipe are executed with invalid data, because they run regardless of whether other pipelines fail.

@ivands
Author

ivands commented Feb 15, 2024

I don't think we can solve the problem of showing every relevant error in a form with Valibot.
The only solution I can think of will require duplicate code.
Let's look at a plain JS example:

function assert_valid_form(input) {
	if (typeof input.password !== "string") {
		throw new Error('Password is required');
	}
	
	if (typeof input.confirm !== "string") {
		throw new Error('Confirm password is required');
	}

	if (input.password !== input.confirm) {
		throw new Error('Password should match');
	}
}

This assert function is fine but can't show all the relevant errors.
Let's see what happens when we add this ability.

function assert_password_type(input) {
	if (typeof input.password !== "string") {
		throw new Error('Password is required');
	}
}

function assert_confirm_type(input) {
	if (typeof input.confirm !== "string") {
		throw new Error('Confirm password is required');
	}
}

function assert_if_password_equals_confirm(input) {
	if (typeof input.password !== "string" || typeof input.confirm !== "string" || input.password !== input.confirm) {
		throw new Error('Passwords should match');
	}
}

I broke up every relevant error into its own function so that we can execute them in parallel and catch every error.
Notice we now have duplicate code going on.
The assert_if_password_equals_confirm function needs to check the types as well.
The reason is that some checks might require the input to be in some sort of structure or type.

This problem doesn't magically go away when using Valibot.
Validations could be coupled together.
And if you want different error messages for each case, you'll always end up with duplicate code.
Please correct me if I'm wrong on this.

Now let's see how we could write this in Valibot.

const schema1 = object({
	password: string('Invalid password'),
	confirm: string('Invalid confirm password'),
})

// We would need a function to rewrite all error messages inside the schema.
const schema2 = rewriteError(object({
	password: string(),
	confirm: string(),
}, [
	custom((input) => {
		return input.password === input.confirm
	})
]), 'Passwords should match')

// We would need to validate multiple schemas at once.
parse([ schema1, schema2 ], input)

I think we should revert the code to its original behavior.
And maybe add something like this.
This is just a quick concept tho.
Let me know if I made any mistakes along the way.

@fabian-hiller
Owner

The biggest pain point is the issue with undefined. As most of the fields don't have any data when the form is initially displayed, they fail with a "type" issue and the pipelines are not executed.

Can you explain why they are undefined? An undefined value should lead to an empty string with an <input /> element. Do you use it with any form library? What framework are you using?

The other issue is the one that @ivands experienced: transform and pipe are executed with invalid data, because they run regardless of whether other pipelines fail.

For pipes, you can set abortPipeEarly to true to control this behavior. Or do you need a more granular approach? For transform I am toying with the idea of not executing it if there are any issues. But first I need to think about the drawbacks.

@fabian-hiller
Copy link
Owner

@ivands the default value of an <input /> element should be an empty string. Therefore, this code should never fail the type check. Or am I wrong?

I don't think we can solve the problem of showing every relevant error in a form with Valibot.

That's probably true. With Valibot I try to find the best compromises for a very good overall experience.

@Demivan
Contributor

Demivan commented Feb 16, 2024

Can you explain why they are undefined? An undefined value should lead to an empty string with an <input /> element. Do you use it with any form library? What framework are you using?

I'm using my own custom form library, with Vue.js and Quasar components.
The simplest example of undefined input is a select list that has no option selected yet.

For pipes, you can set abortPipeEarly to true to control this behavior. Or do you need a more granular approach?

About abortPipeEarly - I don't want all pipes to abort, just some. It would be nice to mark some pipe as required so the next pipe will not run.

For transform I am toying with the idea of not executing it if there are any issues. But first I need to think about the drawbacks.

This will lead to incorrect types in parent pipes if a child transform is not executed. But executing transform without checking for valid input is not good either.

@fabian-hiller
Owner

fabian-hiller commented Feb 16, 2024

The simplest example of undefined input is a select list that has no option selected yet.

It should still be an empty string, but I understand the problem, e.g. when using picklist. Any ideas how Valibot could do better here? How do other schema libraries handle this?

I don't want all pipes to abort, just some. It would be nice to mark some pipe as required so the next pipe will not run.

Thanks for this feedback! Do you have any ideas on how we could design the API for use cases like this?

This will lead to incorrect types in parent pipes if a child transform is not executed. But executing transform without checking for valid input is not good either.

This is not necessarily true. We can change the implementation to run transform only if there is no issue; in the other case, we do not run the transformation and set .typed to false.
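
For illustration, a rough pseudocode sketch of that idea (simplified; not the actual valibot source):

// Simplified sketch of transform's parse step under the proposed behavior.
function parseTransform(schema, action, input, info) {
  const result = schema._parse(input, info);
  if (result.typed && !result.issues) {
    // Fully valid input: run the transformation.
    return { typed: true, output: action(result.output) };
  }
  // Any issue at all: skip the transformation and mark the result
  // as untyped so outer pipelines do not run either.
  return { ...result, typed: false };
}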

@Demivan
Contributor

Demivan commented Feb 16, 2024

Any ideas how Valibot could do better here? How do other schema libraries handle this?

Right now the issue is the "type" check. It allows some input validation to be bypassed and causes transforms and pipes to run on potentially invalid data. I don't think it gives Valibot enough benefit, and it causes confusion and inconsistency.

I like Zod's transform and refine solution. They are part of the same pipeline, and the output of one step is passed as input to the next. transform is not executed if there are any validation issues. refine is executed, but you can mark an issue as fatal and prevent execution of the next refines in the pipeline.

Do you have any ideas on how we could design the API for use cases like this?

It should be possible to mark issues as fatal: true. With this I don't think abortPipeEarly is even needed.

@fabian-hiller
Owner

Can you provide code examples of Zod's API and the fatal configuration for Valibot that you have in mind?

@Demivan
Contributor

Demivan commented Feb 17, 2024

Zod refine/transform API https://zod.dev/?id=relationship-to-refinements:

import { z } from 'zod';

const nameToGreeting = z
  .string()
  // Transform will not be called on non-strings. Matches valibot "type" check
  .transform((val) => val.toUpperCase())
  .refine((val) => val.length > 15)
  // Transform will not be called if length is <= 15
  .transform((val) => `Hello ${val}`)
  .refine((val) => val.indexOf("!") === -1);

Fatal issues https://zod.dev/?id=abort-early:

import { z } from 'zod';

const schema = z.number()
  .superRefine((val, ctx) => {
    if (val < 10) {
      ctx.addIssue({
        code: z.ZodIssueCode.custom,
        message: "should be >= 10",
        // Mark issue as fatal preventing next refine/pipe execution
        fatal: true,
      });

      // Need to return custom marker type too
      return z.NEVER;
    }
  })
  // Will not be executed if val < 10
  .refine(val => val !== 12);

@ivands
Author

ivands commented Feb 17, 2024

I have some questions about this approach.

  1. Will refine also not be called if there is a transformation in between in the pipeline?
  2. Are all type check validations fatal? (They probably should be)
  3. How would you express that some validations could be considered fatal and some aren't? For example, I might have a place where a maxLength should be considered fatal and some other place where it isn't.

@Demivan
Contributor

Demivan commented Feb 17, 2024

  1. Yes. Transform will interrupt the pipeline if it is not executed for some reason.
  2. Yes
  3. You would use a custom validation (refine) instead of maxLength. Native validations are not fatal because all of them can work with a proper type.
    In a real-world application, you would probably still use maxLength and then have another refine/transform that checks that the maxLength validation did not fail, if that is important for your application. You have access to a context with all issues and can check for it - same as in valibot #436 (comment)

@ivands
Author

ivands commented Feb 17, 2024

And how about outer pipelines?

Example:

object({
  field: transform(string(), v => v.length),
}, [
  // This pipeline should probably also not execute because of the transform.
  custom(v => {
    return v.field === 10
  })
])

@Demivan
Contributor

Demivan commented Feb 17, 2024

Yep. With zod that one will not execute if field is not a string or if any refine fails.

This leads to one of the downsides of zod that is not addressed by it. If any field fails, the object pipeline/transform is not executed, preventing a "compare two fields" refine from being executed even if those two fields are valid.

I have contributed partialRefine to zod but it was never merged: colinhacks/zod#2360

@ivands
Author

ivands commented Feb 17, 2024

This isn't a downside of zod tho.
Pipeline execution just can't continue.
The same problem exists in plain JS.
I feel like we are trying to solve something that fundamentally can't be solved (or only with weird edge cases).
Can anyone tell me why breaking down this problem into multiple schemas isn't desirable?
Because this "fatal" error flag doesn't do anything to solve the main issue; it's just a different way of telling Valibot to continue or terminate pipeline execution.

@Demivan
Contributor

Demivan commented Feb 17, 2024

The same problem exists in plain JS.

It only exists when using exceptions:

function validate_password_type(input, issues) {
	if (typeof input.password !== "string") {
		issues['password'].push('Password is required')
	}
}

function validate_confirm_type(input, issues) {
	if (typeof input.confirm !== "string") {
		issues['confirm'].push('Confirm password is required')
	}
}

function validate_if_password_equals_confirm(input, issues) {
	if (issues['password'].length !== 0 || issues['confirm'].length !== 0) {
		return
	}

	if (input.password !== input.confirm) {
		issues['confirm'].push('Passwords should match');
	}
}

function validate_schema(input) {
	const issues = {
		'password': [],
		'confirm': []
	}
	validate_password_type(input, issues)
	validate_confirm_type(input, issues)
	validate_if_password_equals_confirm(input, issues)
	return issues
}


const result = validate_schema({
	'password': 'a',
	'confirm': 'b'
})
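// result: { password: [], confirm: ['Passwords should match'] }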

Can anyone tell me why breaking down this problem into multiple schemas isn't desirable?

It is a nice idea. You don't even need to define the schema twice; you can use pick to select the properties that you want to check separately. But it would be nice to have a more convenient way of writing this for such a common use case.
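
For illustration, a rough sketch of that idea (assuming valibot's v0.2x pick and that, like object, it accepts a pipe as its last argument):

import { object, string, custom, pick, safeParse } from 'valibot';

const FullSchema = object({
  username: string(),
  password: string(),
  confirm: string(),
});

// Derive the cross-field schema from the base one instead of
// repeating the field definitions.
const MatchSchema = pick(FullSchema, ['password', 'confirm'], [
  custom((input) => input.password === input.confirm, 'Passwords should match'),
]);

const input = { username: 'jane', password: 'a', confirm: 'b' };

// Run both schemas and merge their issues to show all errors at once.
const results = [safeParse(FullSchema, input), safeParse(MatchSchema, input)];
const issues = results.flatMap((r) => r.issues ?? []);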

Because this "fatal" error flag doesn't do anything to solve the main issue; it's just a different way of telling Valibot to continue or terminate pipeline execution.

"fatal" flag would cause validations to terminate. But transforms should not be run anyway, I agree with you on this one.

The reason both valibot and zod try to run validations even if previous validations fail is to surface more issues. If you don't need that, you can use abortPipeEarly.

@ivands
Author

ivands commented Feb 17, 2024

It only exists when using exceptions:

I'm having a hard time trying to explain this with just text and some example code.
So, what I'm trying to say might not be that clear.
My point was that it's not possible to show all relevant errors without doing some checks twice.

function validate_password_type(input, issues) {
	if (typeof input.password !== "string") {
		issues['password'].push('Password is required')
	}
}

function validate_confirm_type(input, issues) {
	if (typeof input.confirm !== "string") {
		issues['confirm'].push('Confirm password is required')
	}
}

function validate_if_password_equals_confirm(input, issues) {
	// This part is the same as doing the type checks again.
	// Looking at the issues array to see if the type checks failed is another way of doing the same thing.
	if (issues['password'].length !== 0 || issues['confirm'].length !== 0) {
		return
	}
	
	// Trying to run this code without doing the type checks is what you're trying to do with Valibot.
	// But as you can see from this example, that's impossible.
	// You might think we can decide what kind of validations are FATAL and what kind aren't.
	// But I argue that's up to the developer writing the custom validation.
	// Hence, you might as well write a second schema.
	// Because the relation between different validations can't be defined in Valibot.
	if (input.password !== input.confirm) {
		issues['confirm'].push('Passwords should match');
	}
}

If you agree that transformations should only run when no issues surface, then that should also extend to outer pipelines, because they both suffer from the same problem.
Example:

object({
  field: string([ maxLength(5) ])
}, [
  // This pipeline shouldn't run because of the same problems with transform.
  // Users will have an expectation that the object schema passed.
  custom(v => v.field.length > 10)
])

I hope this makes sense.

@Demivan
Contributor

Demivan commented Feb 17, 2024

I agree with both of your examples. By default, it should be like you are saying: both transformations and outer pipelines should not run if any field is invalid.

This part is the same as doing the type checks again.

It is not. You are not actually doing the work twice. And validations could be more complex. And you are not writing the code twice.

Trying to run this code without doing the type checks is what you're trying to do with Valibot.

I'm not trying to do that. As you can see from my code example, I did not remove checks for issues.

My suggestion is to make this check explicit:

if (issues['password'].length !== 0 || issues['confirm'].length !== 0) {

const schema = object(
  {
    username: string(),
    password: string([length(100)]),
    confirm: string(),
  },
  [
    customPartial(
      // User explicitly saying that it is fine to run this validation with
      // just "password" and "confirm" valid
      ['password', 'confirm'],
      (input) => input.password === input.confirm
    ),
    // This will not run if any field is invalid
    custom(input => input.username.length > 5)
  ]
);

@ivands
Author

ivands commented Feb 17, 2024

My apologies then. I was confusing the current solution that Fabian introduced with your proposal, I guess.
I actually like your customPartial proposal the best.

customPartial(
  [
     // You're being explicit about the validations you're relying on.
     // (That's the part that was missing in the current solution.)
     // It could also support field paths
     ['deepObject', 'password'],
     ['deepObject', 'confirm'],
  ],
  (input) => input.deepObject.password === input.deepObject.confirm
)

And we could create a TS type that filters out everything except the relevant fields inside the input.

type Input = {
  deepObject: {
    password: string
    confirm: string
  }
}
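
For illustration, a hypothetical sketch of such a type for a single deep path (DeepPick is not a valibot export; deeper or multiple paths would need more machinery):

// Hypothetical helper: keep only the listed keys under one top-level key.
type DeepPick<T, K1 extends keyof T, K2 extends keyof T[K1]> = {
  [P in K1]: Pick<T[P], K2>;
};

type Input = DeepPick<
  { other: number; deepObject: { password: string; confirm: string; extra: boolean } },
  'deepObject',
  'password' | 'confirm'
>;
// => { deepObject: { password: string; confirm: string } }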

@fabian-hiller
Owner

I had a call with @Demivan and thought about the current behavior of transform. I agree, and I suspect that most users expect it to be executed only when the input has no issues. So I plan to change the implementation of transform. For now, I don't plan to make this behavior configurable, because I don't want to add configurations that aren't needed. But I am open to feedback on this and may add it later.

@fabian-hiller
Owner

v0.29.0 is available
