[Proposal] Add validation messages to Specs #83

kieraneglin · 2020-05-25T17:13:50Z

Context: #76

Problem statement

Norm provides fantastic validation utilities but the errors generated aren't end-user friendly, leading to a split in our validation tooling. In order to use Norm for validation across our stack we'd need the ability to have human-readable errors generated while supporting localization.

Design goals

Clarity
Supplied validation messages should be specific to a logical validation unit. A message should give context about a failure rather than serving as a catch-all statement.

Compatibility
Validation messages should be opt-in and should not break existing implementations.

Validation safety
Specs should fail fast so that potentially invalid data isn't fed to further specs for the same attribute.

Attempted approaches

Return tuples
A spec can now optionally return a tuple of {boolean(), String.t()} where boolean() is the result of the validation and String.t() is a validation message.

This approach created ugly specs (think spec(&({String.length(&1) > 0, "must not be blank}))), didn't work for built-ins like is_integer, and made validation message composition very difficult.

Per-spec messages
A spec can optionally accept a second argument of a validation message. This solves the shortcomings of the above approach, but fails on the design goal of clarity. To address this, I've added the ability to create a chained spec from a series of simpler specs. This is the approach I ended up going with.

Proposed approach

API

integer_spec = spec(is_integer, "must be an integer")
conform(1, integer_spec) # {:ok, 1}
conform("", integer_spec) # {:error, %{ ... message: "must be an integer" ...}}

safe_spec = [
  spec(is_binary, "must be a binary"),
  spec(&(String.length(&1) > 0), "can't be blank")
]
conform(1, safe_spec) # `1` would cause `String.length` to blow up, but since it fails fast this never happens
conform("", safe_spec) # {:error, %{ ... message: "can't be blank" ...}}

good_messages_spec = [
  spec(is_binary, "must be a string"),
  spec(&(String.length(&1) in 3..15, "must be between 3 and 15 characters"),
  spec(&(!String.contains(&1, "heck")), "can't contain any forbidden words")
]

# Untested
schema(%User{
  username: [
    spec(is_binary),
    spec(&(String.length(&1) in 3..15, "must be between 3 and 15 characters")
  ],
  email: spec(at_symbol_spec, "must contain a @"),
  age: spec(is_integer)
})

Upsides

Clear validation messages
Subjectively clear API
Supports localization
Mostly backward-compatible (see "downsides")
Specs fail fast, returning the first error

Downsides

Seemingly ambiguous syntax
Based on some pattern matching features of Norm as well as pattern matching with Elixir in general, you might write a spec like this:

list_spec = [
  spec(is_binary),
  spec(is_integer),
  spec(is_atom)
]

conform(["", 1, :wow], list_spec) # Seems that a pattern matching-like approach would work here.

This could be alleviated by using a specific method/type to disambiguate chained specs. Something like this:

list_spec = Spec.chain([ ... ]) # Returns %Spec.Chain{}

Chained specs are exclusively "and"
Norm supports and/or for composing specs. While these are supported within a single spec in the chain, the individual specs are treated with "and" logic. Example:

chained_spec = [
  spec(is_binary),
  spec(&(String.length(&1) > 0))
]
# is _mostly_ equivalent to:
spec(is_binary and &(String.length(&1) > 0))

This means you can't have or logic between elements on a chained spec.

In practice I've found this to be both logically clear and desirable, but I'm use there's a use case that would be negatively impacted by this.

Not 100% backward-compatible
Even if you don't use a message, the message key still exists in the returned spec struct with a value of nil. While this doesn't modify any logic or names within the spec struct, strict pattern matching checks could fail. This is compounded by the fact that Norm, being a validation library, allows people to confidently check for the exact structure of a validation response in their tests or even their application code.

I'm sure there's a path forward here, but it isn't addressed in this initial POC.

Wrapping up

This PR is meant to be both a low-effort code change and a discussion around different approaches and their pros + cons.

The code changes work as outlined, but some tests need to be updated (due to that backwards-compatibility issue) and many more tests need to be written if there's interest in this approach.

keathley · 2020-05-30T00:06:22Z

@kieraneglin This is an amazing writeup! Thanks so much for putting the time in. I've been digesting this some. I'll leave some more comments soon. But I wanted you to know that I'd seen this and I'm thinking through it.

baldwindavid · 2020-06-04T18:41:48Z

Thanks for thinking this through. A few notes:

I like the terseness of just using a list for a chain of specs. However, given that norm already matches on tuples and atoms, my first thought might have been that it was matching a list rather than being a special chain syntax. I guess the syntax just feels a little surprising to me and I might expect something more explicit like your mention of using chain. Maybe a name like specs, which builds on top of the already understood spec name.

I have been using contract throughout my codebase, but haven't really used norm elsewhere. Validations are one place I might consider it, but Ecto changesets mostly occupy that space for me. Do you in any way see this is a replacement or supplement to changesets? There are some cases where I might have a changeset and want to do a bit of further validation using specs already available to me via norm at the callsite, but continue to use the changeset after. Or perhaps, someone likes changesets, but wants to use norm in place of the validate_* functions provided by changesets.

I could see this working with your API by using a simple helper like the following:

def conform_changeset(changeset, attribute_name, spec) do
  case changeset
       |> apply_changes()
       |> conform(spec) do
    {:error, %{message: message}} -> Ecto.Changeset.add_error(attribute_name, message)
    _ -> changeset
  end
end

Using some of your examples, but with the specs syntax, it might be used something like...

good_messages_spec = specs([
  spec(is_binary, "must be a string"),
  spec(&(String.length(&1) in 3..15, "must be between 3 and 15 characters"),
  spec(&(!String.contains(&1, "heck")), "can't contain any forbidden words")
])

user
|> User.changeset(attrs)
|> conform_changeset(:username, schema(%User{username: good_messages_spec}))
# returns a changeset

Those are validations that I would probably have done via the changeset, but just trying to demonstrate usage.

One difference between this and changesets is that it fails fast. Other than for constraints, changesets will validate everything in one go. To match that behavior, perhaps specs (or chain or whatever) could take a second boolean argument that instructs whether to fail fast or not. That helper method could then be changed to support multiple errors...

def conform_changeset(changeset, attribute_name, spec) do
  case changeset
       |> apply_changes()
       |> conform(spec) do
    {:error, %{message: message}} ->
      Ecto.Changeset.add_error(attribute_name, message)

    {:error, errors} ->
      Enum.reduce(errors, changeset, fn %{message: message}, changeset ->
        Ecto.Changeset.add_error(attribute_name, message)
      end)

    _ ->
      changeset
  end
end

If failing fast was not the desire, it could be set on the spec...

good_messages_spec = specs(
  [
    spec(is_binary, "must be a string"),
    spec(&(String.length(&1) in 3..15, "must be between 3 and 15 characters"),
    spec(&(!String.contains(&1, "heck")), "can't contain any forbidden words")
  ],
  false # do not fail fast
)

In some ways, it might be purer to specify whether to fail fast as an option to conform, but adding options to that method might be a non-starter.

I understand if changesets are irrelevant to your intent here.

Either way, nice error messages would be a welcome addition to the library. Thanks!

baldwindavid · 2020-06-04T18:46:25Z

After posting the above, I noticed the "Context" link in the description above. haha
Based upon that, it appears that the intent is, in fact, to supplement changesets. That's great. I'm going to leave my message there as it notes some syntax considerations. Thanks again!

kieraneglin · 2020-06-04T19:19:59Z

@baldwindavid thank you for the input! With regards to changesets specifically, I have a few more thoughts.

To confirm your findings, this was initially planned to tie into changesets to allow unification of our validation logic via Norm (for others, direct link to my first comment).

To follow up on the failing-fast comments, I still think that failing fast in the context of a changeset is the preferable mode of operation. I also acknowledge that my thoughts on this are heavily influenced since there's a specific use case that I'm working toward. To remove ambiguity, when I say "fail fast" I mean that within the context of a individual chained spec - it would not affect the behavior of the rest of the changeset or even other specs within the schema. As a few contrived examples:

def changeset(record, params) do
  record
  |> cast(params, @allowed_params)
  |> validate_required([:username, :email])
  |> validate_schema(valid_record_schema)
  |> validate_format(:email, ~r/@/) # I would still expect this to run, independent of `valid_record_schema`'s fail-fast behaviour
end

In this case, the proposed new API + my basic changeset helper wouldn't alter the behavior of the subsequent validations.

Since failing fast is scoped to a single chained spec within a schema, other Norm schema validations would still run like so:

def changeset(record, params) do
  user_schema = schema(%{
    username: chained([
      spec(is_binary, "must be a string"),
      spec(&(String.length(&1) in 3..15, "must be between 3 and 15 characters")
    ]),
    email: spec(at_symbol_spec, "must contain a @"),
    first_name: spec(is_binary, "must be a string")
  })
  
  record
  |> cast(params, @allowed_params)
  |> validate_schema(user_schema)
end

changeset(record, %{ username: 1, email: "not-real", first_name: "Kieran" })
# would roughly return
# %Changeset{
#   valid?: false,
#   errors: [
#     { username: "must be a string" },
#     { email: "must contain a @" },
#   ]
# }

Edit: I forgot to account for what cast actually does so this example is a little wonky. Hopefully the point is still illustrated regardless.

The reason I'm partial to this behavior is that it gives the user enough context to start solving all present issues but without feeding potentially invalid data down the chained spec. It also prevents an overload of redundant errors ("Username must be present" + "Username must be a string" + "Username must be at least 3 characters" being shown to the user all at once isn't ideal).

I don't think it applies in this case, but as a backup there's nothing stopping you from having multiple validate_schema's in your changeset each checking different things.

I'd be interested to hear your thoughts on this!

baldwindavid · 2020-06-04T19:45:09Z

@kieraneglin - Thanks for the message. You're right that it is flexible enough that one can choose to break these into separate specs outside of chained if they really want to return multiple errors.

After further consideration, introducing an option for whether to fail fast or not probably just adds more mental overhead than it is worth, so it is probably best to just choose one or the other.

Even so, I would probably still lean toward not failing fast (returning multiple errors) because that is how Ecto changesets do it. Additionally, I actually prefer returning multiple errors for something like an invalid password. Either way is workable though.

kieraneglin · 2020-06-04T20:10:44Z

@baldwindavid I'm on mobile so this will be a pretty minimal example, but another thing worth mentioning is that this doesn't break the existing mechanism for chaining specs. It may not solve your case, but this is yet another option for chain composition:

username_spec = chained([
  spec(must_exist and is_binary and is_right_length, "must be correct"),
  spec(no_bad_words, "can't contain bad words")
])

Fast failure still happens but it allows you to group specs as needed.

baldwindavid · 2020-06-04T21:18:48Z

@kieraneglin Yep, I really like the addition of the validation message per spec. But the more I think about it, I'm really wondering if that could be its own PR decoupled from chaining. The concept of chained is kind of cool, but not sure it is 100% necessary to achieve the necessary validation functionality. It's something that one can easily add as a helper in their own codebase that fails fast or not depending upon their preference. I'm not against it being added, but don't think the whole thing falls apart without it.

keathley · 2020-06-05T14:35:32Z

I like the idea of having a way to compose specs together. In fact, I started adding an AllOf struct to support this a while ago: https://github.com/keathley/norm/blob/master/lib/norm/core/all_of.ex. It's not exposed by the API. But it should be relatively straightforward to get it tested an add it. AllOf should be able to work with any Conformable. So it's not just limited to specs. We support "or" type logic with OneOf. With AllOf we should be able to provide "and" logic. I believe that the semantics are equivalent between AllOf and chained but let me know if I'm missing something here.

Overall I like the idea of adding custom messages with spec. I need to play with it to see how it feels if there are deeply nested schemas and selections. But overall I like the approach.

fuelen · 2021-06-07T15:52:20Z

Hey, guys!
Any movements around this issue? Is there anything I can help with? Really interested in this feature.

keathley · 2021-06-22T14:05:00Z

Its in my stack to evaluate. Just have to pop enough items to get back down to it

Updated specs to support messages

bc750e8

kieraneglin mentioned this pull request May 25, 2020

[Suggestion] Support for validation messaging #76

Open

keathley mentioned this pull request Jun 9, 2020

Add all_of #87

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Proposal] Add validation messages to Specs #83

[Proposal] Add validation messages to Specs #83

kieraneglin commented May 25, 2020 •

edited

Loading

keathley commented May 30, 2020

baldwindavid commented Jun 4, 2020

baldwindavid commented Jun 4, 2020

kieraneglin commented Jun 4, 2020 •

edited

Loading

baldwindavid commented Jun 4, 2020

kieraneglin commented Jun 4, 2020

baldwindavid commented Jun 4, 2020

keathley commented Jun 5, 2020

fuelen commented Jun 7, 2021

keathley commented Jun 22, 2021

[Proposal] Add validation messages to Specs #83

Are you sure you want to change the base?

[Proposal] Add validation messages to Specs #83

Conversation

kieraneglin commented May 25, 2020 • edited Loading

Problem statement

Design goals

Attempted approaches

Proposed approach

API

Upsides

Downsides

Wrapping up

keathley commented May 30, 2020

baldwindavid commented Jun 4, 2020

baldwindavid commented Jun 4, 2020

kieraneglin commented Jun 4, 2020 • edited Loading

baldwindavid commented Jun 4, 2020

kieraneglin commented Jun 4, 2020

baldwindavid commented Jun 4, 2020

keathley commented Jun 5, 2020

fuelen commented Jun 7, 2021

keathley commented Jun 22, 2021

kieraneglin commented May 25, 2020 •

edited

Loading

kieraneglin commented Jun 4, 2020 •

edited

Loading