Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add validation constraints to API struct tags #8116

Closed
jimmidyson opened this issue May 12, 2015 · 28 comments
Closed

Add validation constraints to API struct tags #8116

jimmidyson opened this issue May 12, 2015 · 28 comments
Labels
area/api Indicates an issue on api area. area/client-libraries area/swagger kind/feature Categorizes issue or PR as related to a new feature. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. priority/backlog Higher priority than priority/awaiting-more-evidence. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/contributor-experience Categorizes an issue or PR as relevant to SIG Contributor Experience.

Comments

@jimmidyson
Copy link
Member

Right now the validation constraints on API resources are in comments. Adding these into tags would make validation more generic in code, but also would allow easier generation & publication of a JSON schema for comsumption in other languages.

A bit of background, we produce a JSON schema & Java POJOs in fabric8 at https://github.com/fabric8io/origin-schema-generator/. Right now adding in validation constraints is a bit clunky. Adding validation constraints into tags would allow us to generate a cleaner JSON schema with clearer constraints.

@nikhiljindal
Copy link
Contributor

We have "omitempty" tags to indicate whether the field is required or optional.
What other tags do you have in mind?

cc @bgrant0607

@nikhiljindal nikhiljindal added kind/documentation Categorizes issue or PR as related to documentation. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. labels May 12, 2015
@bgrant0607
Copy link
Member

See also emicklei/go-restful#198

@bgrant0607 bgrant0607 added the priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. label May 12, 2015
@jimmidyson
Copy link
Member Author

Yeah we're using the omitempty stuff to drive the json schema for required, but I'm interested in adding in validation constraints to the json schema like pattern, min, max, etc. You can see our current generated schema at http://repo1.maven.org/maven2/io/fabric8/schemagenerator/kubernetes-model/0.0.27/kubernetes-model-0.0.27-schema.json including pattern validation such as:

name: {
  type: "string",
  description: "name of the container; must be a DNS_LABEL and unique within the pod; cannot be updated",
  maxLength: 63,
  pattern: "^[a-z0-9]([-a-z0-9]*[a-z0-9])?$"
}

for name as a DNS_LABEL (https://github.com/GoogleCloudPlatform/kubernetes/blob/master/pkg/api/v1beta3/types.go#L37-L40).

I'd like to be able to do the same for all properties that must conform to certain constraints. This has helped us to do some client side validation in web UIs, maven plugin, etc without having to develop specific validators for different Kubernetes clients in different languages, relying on JSON schema validation is much simpler.

I'd really like to have a JSON schema for the API published by Kubernetes to be honest. Any chance of that?

@bgrant0607
Copy link
Member

PRs welcome.

See also #4210 and #6487

@bgrant0607 bgrant0607 added area/api Indicates an issue on api area. area/client-libraries and removed kind/documentation Categorizes issue or PR as related to documentation. labels May 12, 2015
@jimmidyson
Copy link
Member Author

@bgrant0607 Would you be happy with a PR just for v1 API?

@bgrant0607
Copy link
Member

@jimmidyson

I'd want the properties to appear in our generated Swagger schema, so the support needs to be added to go-restful.

We also need to think about how to keep the information up to date and accurate. I think the only practical way to do that is to drive validation off the tags, in which case it would need to be implemented for all API versions. We're in the process of getting rid of v1beta1 and v1beta2, so that should reduce the scope. We'd need to change from validation of our internal rep. to version-specific validation, which we want to do (see #3084), but haven't done yet.

As an incremental step, we could just add the tags to the v1 API. However, we'd need to think about how to craft a presubmit check to ensure they didn't immediately become out of date, and we'd rip out the tags eventually if nobody completed the work to drive validation from them. Speaking from experience, we can't rely on code reviewers to keep the tags correct.

Since reflection is slow, we'd also probably need a validation code generator, as we're doing with conversion and defaulting.

@jimmidyson
Copy link
Member Author

@bgrant0607 Well that complicates stuff then :) But completely understandable. I'm thinking a go generator for JSON validation driven by tags sounds like a pretty useful tool anyway in general.

@bgrant0607 bgrant0607 added priority/backlog Higher priority than priority/awaiting-more-evidence. sig/contributor-experience Categorizes an issue or PR as relevant to SIG Contributor Experience. and removed priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. labels May 11, 2016
@mbohlool
Copy link
Contributor

My initial thoughts on this is to have one validation type for each validation in a validation package. For example a Range validation would be like this:

package validation

type Range struct {
  min, max int
}

func (r Range) Validate(val interface{}) bool {
  value, ok := val.(int)
  return ok && value >=r.min && value <=r.max
}

func (r Range) setModelProperty(prop *ModelProperty) {
  prop.Maximum = r.max
  prop.Minimum = r.min
  prop.Description = fmt.Sprintf("The value must be between %d and %d.", r.min, r.max)
} 

(I don't like ModelProperty in above code and would like to generalize it more, will do that in the actual PR)

then we can add a tag to any int property to use Range validator:

JobCount int `validator:"Range(0,100)"`

The validator code can use reflection (or generated code for performance) to find Range validator then create it with provided range values and do validation.

Any thoughts?

@nikhiljindal
Copy link
Contributor

cc @kubernetes/sig-api-machinery @whitlockjc

@lavalamp
Copy link
Member

@mbohlool Yeah that looks promising. I'd tweak it just a bit and include the type as part of the validator (we can generate them like we do pkg/util/sets if that gets tedious).

package validation

type IntRange struct {
  min, max int
}

func NewIntRange(min, max int) { ... }

func (r Range) Validate(val int) bool {
  return val >=r.min && val <=r.max
}

...

JobCount int `validator:"IntRange(0,100)"`

(Anyway, there should not be any plain ints in our api, they should all have a size, int32 or int64.)

How do you plan to express regexps? (I'm wondering on account of nested quotation that might be necessary)

I think we definitely need to generate, not use reflection, or @smarterclayton will not be happy.

What do you think about having a special comment format? It might require fewer levels of quotation.

@smarterclayton
Copy link
Contributor

Since this overlaps so much with swagger, we should definitely look at the "swagger generator from struct comments" work if possible.

@smarterclayton
Copy link
Contributor

And yes, no more reflection :)

@garethr
Copy link
Contributor

garethr commented Aug 10, 2017

As an example where this would be useful, I just posted to SIG API Machinery.

Service takes a spec.type value which the OpenAPI spec
describes as a string. However if you read the description
it's clear only certain values are valid:

https://github.com/garethr/kubernetes-json-schema/blob/master/master-standalone/service.json#L140

So in the spec it would be great if this was represented by an enum:

"enum": ["ExternalName", "ClusterIP", "NodePort", "LoadBalancer"]

As another example, several descriptions state that strings have to be
a DNS_LABEL. This has a formal regex in the docs, but this isn't
represented in the spec. It's not clear OpenAPI supports regex, which
JSON Schema does, so this second point might be an OpenAPI 2 issue
bubbling through. But this could be described in JSON Schema as:

"regex": "[a-z0-9]([-a-z0-9]*[a-z0-9])?"

@fejta-bot
Copy link

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 2, 2018
@bgrant0607
Copy link
Member

/remove-lifecycle stale
/lifecycle frozen

@k8s-ci-robot k8s-ci-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 22, 2018
@liggitt liggitt added the kind/feature Categorizes issue or PR as related to a new feature. label Jan 14, 2019
@matthyx
Copy link
Contributor

matthyx commented Feb 12, 2020

Hi, is this issue still relevant today?

@sco11morgan
Copy link

Yes still relevant. Getting validation constraints into the JSON schema will improve tooling that uses the schema. instrumenta/kubeval#18 (comment)

@lavalamp
Copy link
Member

Yes. We intend to use the serverside apply code to do general schema things like this, as we haven't had a schema-aware universally-run code location yet. So, this will become doable soon.

@apelisse

@matthyx
Copy link
Contributor

matthyx commented Feb 12, 2020

Ok, so @lavalamp can I help somehow here? How can we help tools like kubeval have a better validation?

@lavalamp
Copy link
Member

What is kubeval?

The proper way to consume schema data as a client is to read the cluster's OpenAPI spec. We will be adding more x-kubernetes-* tags in places where there aren't existing OpenAPI tags to express the concept.

@matthyx
Copy link
Contributor

matthyx commented Feb 12, 2020

kubeval is what @sco11morgan mentioned in his comment... I took it as an example of an OpenAPI consumer.

Alright, so as I understand it, sig/api-machinery is already on it and I don't have to jump in?

@lavalamp
Copy link
Member

Ah sorry, didn't see that-- one other thing that might help today, is server side dry-run, which is beta and going GA this release.

If you have cycles to spare and energy for this, the apply working group might be something to check out.

@apelisse
Copy link
Member

apelisse commented Feb 12, 2020

Client-side validation has benefits over "server-side dry-run". Ideally we'd like to reduce that gap as much as possible, but there are things that are very hard to do with dry-run, including: bootstrapping (CRs need the CRD, objects needs their namespace), and permissions (one needs write permission on the cluster to dry-run).

As Daniel mentioned, there are many things that we're trying to add to improve the OpenAPI with extra semantics that is not supported natively (immutability, unions, sets, associative lists). sets and associative lists are already pretty well defined and could be added to kubeval.

We're also planning on being able to move some of the validation to be declarative (mostly openapi native conditions) by having extra markers on internal-types (similar to what kubebuilder does), which would also improve the quality of the openapi validation. This requires that we move forward with server-side apply, and we're actively working on that and hopefully very close to reach that goal.

cc @mariantalla

@matthyx
Copy link
Contributor

matthyx commented Feb 28, 2020

hopefully very close to reach that goal.

ok, let me know if I can help anyhow 👍

@thockin
Copy link
Member

thockin commented Feb 14, 2024

@jpbetz @cici37 should we close this one or dup it to one of the CEL issues?

@jpbetz
Copy link
Contributor

jpbetz commented Feb 14, 2024

Looks like the ask here is to switch from using comment tags (// +listType=set) to go struct tags (Foos []string 'listType:"set"')?

I don't think we can fit much of what we have planned for declarative validation is go struct tags.

cc @alexzielenski

@apelisse
Copy link
Member

Go tags have the benefits that they can be read during runtime through reflection, while comments can not. But on the other hand, we don't want people to import the actual types everywhere and/or to depend on that anyway, so OpenAPI is a better way to carry the information. I think that ask is obsolete.

@thockin
Copy link
Member

thockin commented Feb 14, 2024

Ahh, I misunderstood. Go struct tags are syntactically horrible and not what we want humans writing/reviewing.

If we had a proper IDL, we could generate struct tags, getting the best of both worlds, maybe, but we don't.

Shall we close this, then?

@thockin thockin closed this as completed Feb 26, 2024
@thockin thockin closed this as not planned Won't fix, can't repro, duplicate, stale Feb 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/api Indicates an issue on api area. area/client-libraries area/swagger kind/feature Categorizes issue or PR as related to a new feature. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. priority/backlog Higher priority than priority/awaiting-more-evidence. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/contributor-experience Categorizes an issue or PR as relevant to SIG Contributor Experience.
Projects
None yet
Development

No branches or pull requests