Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Proposal] Add validation messages to Specs #83

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

kieraneglin
Copy link

@kieraneglin kieraneglin commented May 25, 2020

Context: #76

Problem statement

Norm provides fantastic validation utilities but the errors generated aren't end-user friendly, leading to a split in our validation tooling. In order to use Norm for validation across our stack we'd need the ability to have human-readable errors generated while supporting localization.

Design goals

Clarity
Supplied validation messages should be specific to a logical validation unit. A message should give context about a failure rather than serving as a catch-all statement.

Compatibility
Validation messages should be opt-in and should not break existing implementations.

Validation safety
Specs should fail fast so that potentially invalid data isn't fed to further specs for the same attribute.

Attempted approaches

Return tuples
A spec can now optionally return a tuple of {boolean(), String.t()} where boolean() is the result of the validation and String.t() is a validation message.

This approach created ugly specs (think spec(&({String.length(&1) > 0, "must not be blank}))), didn't work for built-ins like is_integer, and made validation message composition very difficult.

Per-spec messages
A spec can optionally accept a second argument of a validation message. This solves the shortcomings of the above approach, but fails on the design goal of clarity. To address this, I've added the ability to create a chained spec from a series of simpler specs. This is the approach I ended up going with.

Proposed approach

API

integer_spec = spec(is_integer, "must be an integer")
conform(1, integer_spec) # {:ok, 1}
conform("", integer_spec) # {:error, %{ ... message: "must be an integer" ...}}

safe_spec = [
  spec(is_binary, "must be a binary"),
  spec(&(String.length(&1) > 0), "can't be blank")
]
conform(1, safe_spec) # `1` would cause `String.length` to blow up, but since it fails fast this never happens
conform("", safe_spec) # {:error, %{ ... message: "can't be blank" ...}}

good_messages_spec = [
  spec(is_binary, "must be a string"),
  spec(&(String.length(&1) in 3..15, "must be between 3 and 15 characters"),
  spec(&(!String.contains(&1, "heck")), "can't contain any forbidden words")
]

# Untested
schema(%User{
  username: [
    spec(is_binary),
    spec(&(String.length(&1) in 3..15, "must be between 3 and 15 characters")
  ],
  email: spec(at_symbol_spec, "must contain a @"),
  age: spec(is_integer)
})

Upsides

  • Clear validation messages
  • Subjectively clear API
  • Supports localization
  • Mostly backward-compatible (see "downsides")
  • Specs fail fast, returning the first error

Downsides

Seemingly ambiguous syntax
Based on some pattern matching features of Norm as well as pattern matching with Elixir in general, you might write a spec like this:

list_spec = [
  spec(is_binary),
  spec(is_integer),
  spec(is_atom)
]

conform(["", 1, :wow], list_spec) # Seems that a pattern matching-like approach would work here.

This could be alleviated by using a specific method/type to disambiguate chained specs. Something like this:

list_spec = Spec.chain([ ... ]) # Returns %Spec.Chain{}

Chained specs are exclusively "and"
Norm supports and/or for composing specs. While these are supported within a single spec in the chain, the individual specs are treated with "and" logic. Example:

chained_spec = [
  spec(is_binary),
  spec(&(String.length(&1) > 0))
]
# is _mostly_ equivalent to:
spec(is_binary and &(String.length(&1) > 0))

This means you can't have or logic between elements on a chained spec.

In practice I've found this to be both logically clear and desirable, but I'm use there's a use case that would be negatively impacted by this.

Not 100% backward-compatible
Even if you don't use a message, the message key still exists in the returned spec struct with a value of nil. While this doesn't modify any logic or names within the spec struct, strict pattern matching checks could fail. This is compounded by the fact that Norm, being a validation library, allows people to confidently check for the exact structure of a validation response in their tests or even their application code.

I'm sure there's a path forward here, but it isn't addressed in this initial POC.

Wrapping up

This PR is meant to be both a low-effort code change and a discussion around different approaches and their pros + cons.

The code changes work as outlined, but some tests need to be updated (due to that backwards-compatibility issue) and many more tests need to be written if there's interest in this approach.

@keathley
Copy link
Member

@kieraneglin This is an amazing writeup! Thanks so much for putting the time in. I've been digesting this some. I'll leave some more comments soon. But I wanted you to know that I'd seen this and I'm thinking through it.

@baldwindavid
Copy link

Thanks for thinking this through. A few notes:

I like the terseness of just using a list for a chain of specs. However, given that norm already matches on tuples and atoms, my first thought might have been that it was matching a list rather than being a special chain syntax. I guess the syntax just feels a little surprising to me and I might expect something more explicit like your mention of using chain. Maybe a name like specs, which builds on top of the already understood spec name.

I have been using contract throughout my codebase, but haven't really used norm elsewhere. Validations are one place I might consider it, but Ecto changesets mostly occupy that space for me. Do you in any way see this is a replacement or supplement to changesets? There are some cases where I might have a changeset and want to do a bit of further validation using specs already available to me via norm at the callsite, but continue to use the changeset after. Or perhaps, someone likes changesets, but wants to use norm in place of the validate_* functions provided by changesets.

I could see this working with your API by using a simple helper like the following:

def conform_changeset(changeset, attribute_name, spec) do
  case changeset
       |> apply_changes()
       |> conform(spec) do
    {:error, %{message: message}} -> Ecto.Changeset.add_error(attribute_name, message)
    _ -> changeset
  end
end

Using some of your examples, but with the specs syntax, it might be used something like...

good_messages_spec = specs([
  spec(is_binary, "must be a string"),
  spec(&(String.length(&1) in 3..15, "must be between 3 and 15 characters"),
  spec(&(!String.contains(&1, "heck")), "can't contain any forbidden words")
])

user
|> User.changeset(attrs)
|> conform_changeset(:username, schema(%User{username: good_messages_spec}))
# returns a changeset

Those are validations that I would probably have done via the changeset, but just trying to demonstrate usage.

One difference between this and changesets is that it fails fast. Other than for constraints, changesets will validate everything in one go. To match that behavior, perhaps specs (or chain or whatever) could take a second boolean argument that instructs whether to fail fast or not. That helper method could then be changed to support multiple errors...

def conform_changeset(changeset, attribute_name, spec) do
  case changeset
       |> apply_changes()
       |> conform(spec) do
    {:error, %{message: message}} ->
      Ecto.Changeset.add_error(attribute_name, message)

    {:error, errors} ->
      Enum.reduce(errors, changeset, fn %{message: message}, changeset ->
        Ecto.Changeset.add_error(attribute_name, message)
      end)

    _ ->
      changeset
  end
end

If failing fast was not the desire, it could be set on the spec...

good_messages_spec = specs(
  [
    spec(is_binary, "must be a string"),
    spec(&(String.length(&1) in 3..15, "must be between 3 and 15 characters"),
    spec(&(!String.contains(&1, "heck")), "can't contain any forbidden words")
  ],
  false # do not fail fast
)

In some ways, it might be purer to specify whether to fail fast as an option to conform, but adding options to that method might be a non-starter.

I understand if changesets are irrelevant to your intent here.

Either way, nice error messages would be a welcome addition to the library. Thanks!

@baldwindavid
Copy link

After posting the above, I noticed the "Context" link in the description above. haha
Based upon that, it appears that the intent is, in fact, to supplement changesets. That's great. I'm going to leave my message there as it notes some syntax considerations. Thanks again!

@kieraneglin
Copy link
Author

kieraneglin commented Jun 4, 2020

@baldwindavid thank you for the input! With regards to changesets specifically, I have a few more thoughts.

To confirm your findings, this was initially planned to tie into changesets to allow unification of our validation logic via Norm (for others, direct link to my first comment).

To follow up on the failing-fast comments, I still think that failing fast in the context of a changeset is the preferable mode of operation. I also acknowledge that my thoughts on this are heavily influenced since there's a specific use case that I'm working toward. To remove ambiguity, when I say "fail fast" I mean that within the context of a individual chained spec - it would not affect the behavior of the rest of the changeset or even other specs within the schema. As a few contrived examples:

def changeset(record, params) do
  record
  |> cast(params, @allowed_params)
  |> validate_required([:username, :email])
  |> validate_schema(valid_record_schema)
  |> validate_format(:email, ~r/@/) # I would still expect this to run, independent of `valid_record_schema`'s fail-fast behaviour
end

In this case, the proposed new API + my basic changeset helper wouldn't alter the behavior of the subsequent validations.

Since failing fast is scoped to a single chained spec within a schema, other Norm schema validations would still run like so:

def changeset(record, params) do
  user_schema = schema(%{
    username: chained([
      spec(is_binary, "must be a string"),
      spec(&(String.length(&1) in 3..15, "must be between 3 and 15 characters")
    ]),
    email: spec(at_symbol_spec, "must contain a @"),
    first_name: spec(is_binary, "must be a string")
  })
  
  record
  |> cast(params, @allowed_params)
  |> validate_schema(user_schema)
end

changeset(record, %{ username: 1, email: "not-real", first_name: "Kieran" })
# would roughly return
# %Changeset{
#   valid?: false,
#   errors: [
#     { username: "must be a string" },
#     { email: "must contain a @" },
#   ]
# }

Edit: I forgot to account for what cast actually does so this example is a little wonky. Hopefully the point is still illustrated regardless.

The reason I'm partial to this behavior is that it gives the user enough context to start solving all present issues but without feeding potentially invalid data down the chained spec. It also prevents an overload of redundant errors ("Username must be present" + "Username must be a string" + "Username must be at least 3 characters" being shown to the user all at once isn't ideal).

I don't think it applies in this case, but as a backup there's nothing stopping you from having multiple validate_schema's in your changeset each checking different things.

I'd be interested to hear your thoughts on this!

@baldwindavid
Copy link

@kieraneglin - Thanks for the message. You're right that it is flexible enough that one can choose to break these into separate specs outside of chained if they really want to return multiple errors.

After further consideration, introducing an option for whether to fail fast or not probably just adds more mental overhead than it is worth, so it is probably best to just choose one or the other.

Even so, I would probably still lean toward not failing fast (returning multiple errors) because that is how Ecto changesets do it. Additionally, I actually prefer returning multiple errors for something like an invalid password. Either way is workable though.

@kieraneglin
Copy link
Author

@baldwindavid I'm on mobile so this will be a pretty minimal example, but another thing worth mentioning is that this doesn't break the existing mechanism for chaining specs. It may not solve your case, but this is yet another option for chain composition:

username_spec = chained([
  spec(must_exist and is_binary and is_right_length, "must be correct"),
  spec(no_bad_words, "can't contain bad words")
])

Fast failure still happens but it allows you to group specs as needed.

@baldwindavid
Copy link

@kieraneglin Yep, I really like the addition of the validation message per spec. But the more I think about it, I'm really wondering if that could be its own PR decoupled from chaining. The concept of chained is kind of cool, but not sure it is 100% necessary to achieve the necessary validation functionality. It's something that one can easily add as a helper in their own codebase that fails fast or not depending upon their preference. I'm not against it being added, but don't think the whole thing falls apart without it.

@keathley
Copy link
Member

keathley commented Jun 5, 2020

I like the idea of having a way to compose specs together. In fact, I started adding an AllOf struct to support this a while ago: https://github.com/keathley/norm/blob/master/lib/norm/core/all_of.ex. It's not exposed by the API. But it should be relatively straightforward to get it tested an add it. AllOf should be able to work with any Conformable. So it's not just limited to specs. We support "or" type logic with OneOf. With AllOf we should be able to provide "and" logic. I believe that the semantics are equivalent between AllOf and chained but let me know if I'm missing something here.

Overall I like the idea of adding custom messages with spec. I need to play with it to see how it feels if there are deeply nested schemas and selections. But overall I like the approach.

@keathley keathley mentioned this pull request Jun 9, 2020
@fuelen
Copy link

fuelen commented Jun 7, 2021

Hey, guys!
Any movements around this issue? Is there anything I can help with? Really interested in this feature.

@keathley
Copy link
Member

Its in my stack to evaluate. Just have to pop enough items to get back down to it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants