
protoc-gen-validate (PGV)

This project is currently in alpha. The API should be considered unstable and is likely to change.

PGV is a protoc plugin to generate polyglot message validators. While protocol buffers effectively guarantee the types of structured data, they cannot enforce semantic rules for values. This plugin adds support to protoc-generated code to validate such constraints.

Developers import the PGV extension and annotate the messages and fields in their proto files with constraint rules:

syntax = "proto3";

package examplepb;

import "validate/validate.proto";

message Person {
  uint64 id    = 1 [(validate.rules).uint64.gt    = 999];

  string email = 2 [(validate.rules).string.email = true];

  string name  = 3 [(validate.rules).string = {
                      pattern:   "^[A-Za-z]+( [A-Za-z]+)*$",
                      max_bytes: 256,
                   }];

  Location home = 4 [(validate.rules).message.required = true];

  message Location {
    double lat = 1 [(validate.rules).double = { gte: -90,  lte: 90 }];
    double lng = 2 [(validate.rules).double = { gte: -180, lte: 180 }];
  }
}

Executing protoc with PGV and the target language's default plugin will create Validate methods on the generated types:

p := new(Person)

err := p.Validate() // err: Id must be greater than 999
p.Id = 1000

err = p.Validate() // err: Email must be a valid email address
p.Email = "example@lyft.com"

err = p.Validate() // err: Name must match pattern '^[A-Za-z]+( [A-Za-z]+)*$'
p.Name = "Protocol Buffer"

err = p.Validate() // err: Home is required
p.Home = &Person_Location{Lat: 37.7, Lng: 999}

err = p.Validate() // err: Home.Lng must be within [-180, 180]
p.Home.Lng = -122.4

err = p.Validate() // err: nil
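
For intuition, the following is a hand-written approximation of the kind of checks a generated Validate method performs. It is not PGV's actual output: the struct layout, the type names Person and Location, and the error text are simplified assumptions made for this sketch.

```go
package main

import (
	"errors"
	"fmt"
)

// Location and Person are simplified stand-ins for protoc-generated types.
type Location struct {
	Lat, Lng float64
}

type Person struct {
	Id   uint64
	Home *Location
}

// Validate mirrors a subset of the rules from the proto above:
// id > 999, home is required, and home.lng lies in [-180, 180].
func (p *Person) Validate() error {
	if p.Id <= 999 {
		return errors.New("Id must be greater than 999")
	}
	if p.Home == nil {
		return errors.New("Home is required")
	}
	if p.Home.Lng < -180 || p.Home.Lng > 180 {
		return errors.New("Home.Lng must be within [-180, 180]")
	}
	return nil
}

func main() {
	p := &Person{Id: 1000, Home: &Location{Lat: 37.7, Lng: 999}}
	fmt.Println(p.Validate()) // fails the lng rule
	p.Home.Lng = -122.4
	fmt.Println(p.Validate()) // passes
}
```

The first failing rule is returned, which matches the one-error-at-a-time flow shown in the example above.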

Usage

Installation

Installing PGV can currently only be done from source:

# fetches this repo into $GOPATH
go get -d github.com/lyft/protoc-gen-validate

# installs PGV into $GOPATH/bin
cd ${GOPATH}/src/github.com/lyft/protoc-gen-validate
make build

Dependencies

  • protoc compiler in $PATH
  • protoc-gen-validate in $PATH
  • official language-specific plugin for target language(s)
  • Only proto3 syntax is currently supported. proto2 syntax support is planned.

Parameters

  • lang: specify the target language to generate. Currently, the only supported options are:
    • go
    • gogo for gogo proto (experimental)
    • cc for c++ (partially implemented)
    • java

Support for python is planned.

Examples

Go

Go validation code should be generated into the same output path as the official plugin's output. For a proto file example.proto, the following command generates the corresponding validation code into ../generated/example.pb.validate.go:

protoc \
  -I . \
  -I ${GOPATH}/src \
  -I ${GOPATH}/src/github.com/lyft/protoc-gen-validate \
  --go_out=":../generated" \
  --validate_out="lang=go:../generated" \
  example.proto

All messages generated include the new Validate() error method. PGV requires no additional runtime dependencies from the existing generated code.

Gogo

There is experimental support for the gogo protobuf plugin for Go. Use the following command to generate gogo-compatible validation code:

protoc \
  -I . \
  -I ${GOPATH}/src \
  -I ${GOPATH}/src/github.com/lyft/protoc-gen-validate \
  --gogofast_out=":../generated" \
  --validate_out="lang=gogo:../generated" \
  example.proto

Gogo support has the following limitations:

  • only the gogofast plugin is supported and tested, meaning that fields should be properly annotated with gogoproto annotations;
  • gogoproto.nullable is supported on fields;
  • gogoproto.stdduration is supported on fields;
  • gogoproto.stdtime is supported on fields.

Java

Java generation is integrated with the existing protobuf toolchain for Java projects. For Maven projects, add the following to your pom.xml:

<dependencies>
    <dependency>
        <groupId>com.lyft.protoc-gen-validate</groupId>
        <artifactId>pgv-java-stub</artifactId>
        <version>${pgv.version}</version>
    </dependency>
</dependencies>

<build>
    <extensions>
        <extension>
            <groupId>kr.motd.maven</groupId>
            <artifactId>os-maven-plugin</artifactId>
            <version>1.4.1.Final</version>
        </extension>
    </extensions>
    <plugins>
        <plugin>
            <groupId>org.xolstice.maven.plugins</groupId>
            <artifactId>protobuf-maven-plugin</artifactId>
            <version>0.5.0</version>
            <configuration>
                <protocArtifact>com.google.protobuf:protoc:${protoc.version}:exe:${os.detected.classifier}</protocArtifact>
            </configuration>
            <executions>
                <execution>
                    <id>protoc-java-pgv</id>
                    <goals>
                        <goal>compile-custom</goal>
                    </goals>
                    <configuration>
                        <pluginParameter>lang=java</pluginParameter>
                        <pluginId>java-pgv</pluginId>
                        <pluginArtifact>com.lyft.protoc-gen-validate:pgv:${pgv.version}:exe:${os.detected.classifier}</pluginArtifact>
                    </configuration>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>

Gradle projects follow a similar pattern.

Constraint Rules

The provided constraints are modeled largely after those in JSON Schema. PGV rules can be mixed for the same field; before code generation, the plugin checks that the rules applied to a field do not contradict one another.

Check the constraint rule comparison matrix for language-specific constraint capabilities.

Numerics

All numeric types (float, double, int32, int64, uint32, uint64, sint32, sint64, fixed32, fixed64, sfixed32, sfixed64) share the same rules.

  • const: the field must be exactly the specified value.

    // x must equal 1.23 exactly
    float x = 1 [(validate.rules).float.const = 1.23];
  • lt/lte/gt/gte: these inequalities (<, <=, >, >=, respectively) allow for deriving ranges in which the field must reside.

    // x must be less than 10
    int32 x = 1 [(validate.rules).int32.lt = 10];
    
    // x must be greater than or equal to 20
    uint64 x = 1 [(validate.rules).uint64.gte = 20];
    
    // x must be in the range [30, 40)
    fixed32 x = 1 [(validate.rules).fixed32 = {gte:30, lt: 40}];

    Inverting the values of lt(e) and gt(e) is valid and creates an exclusive range.

    // x must be outside the range [30, 40)
    double x = 1 [(validate.rules).double = {lt:30, gte:40}];
  • in/not_in: these two rules permit specifying white/blacklists for the values of a field.

    // x must be either 1, 2, or 3
    uint32 x = 1 [(validate.rules).uint32 = {in: [1,2,3]}];
    
    // x cannot be 0 nor 0.99
    float x = 1 [(validate.rules).float = {not_in: [0, 0.99]}];
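
The difference between an ordinary range and an inverted (exclusive) range can be sketched in Go as follows. The function name inRange is hypothetical, and treating equal bounds as an inverted pair is an assumption of this sketch rather than documented PGV behavior.

```go
package main

import "fmt"

// inRange reports whether x satisfies a {gte, lt} rule pair.
// When gte < lt, the pair is an ordinary range: gte <= x < lt.
// When the bounds are inverted, the pair means "outside [lt, gte)":
// x < lt OR x >= gte. Equal bounds are not handled specially here.
func inRange(x, gte, lt float64) bool {
	if gte < lt {
		return x >= gte && x < lt
	}
	return x < lt || x >= gte
}

func main() {
	// {gte: 30, lt: 40}: x must be in [30, 40)
	fmt.Println(inRange(35, 30, 40), inRange(40, 30, 40))

	// {lt: 30, gte: 40}: x must be outside [30, 40)
	fmt.Println(inRange(35, 40, 30), inRange(29, 40, 30))
}
```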

Bools

  • const: the field must be exactly the specified value.

    // x must be set to true
    bool x = 1 [(validate.rules).bool.const = true];
    
    // x cannot be set to true
    bool x = 1 [(validate.rules).bool.const = false];

Strings

  • const: the field must be exactly the specified value.

    // x must be set to "foo"
    string x = 1 [(validate.rules).string.const = "foo"];
  • len/min_len/max_len: these rules constrain the number of characters (Unicode code points) in the field. Note that the number of characters may differ from the number of bytes in the string. The string is considered as-is and is not normalized.

    // x must be exactly 5 characters long
    string x = 1 [(validate.rules).string.len = 5];
    
    // x must be at least 3 characters long
    string x = 1 [(validate.rules).string.min_len = 3];
    
    // x must be between 5 and 10 characters, inclusive
    string x = 1 [(validate.rules).string = {min_len: 5, max_len: 10}];
  • min_bytes/max_bytes: these rules constrain the number of bytes in the field.

    // x must be at most 15 bytes long
    string x = 1 [(validate.rules).string.max_bytes = 15];
    
    // x must be between 128 and 1024 bytes long
    string x = 1 [(validate.rules).string = {min_bytes: 128, max_bytes: 1024}];
  • pattern: the field must match the specified RE2-compliant regular expression. The included expression should elide any delimiters (ie, /\d+/ should just be \d+).

    // x must be a non-empty, case-insensitive hexadecimal string
    string x = 1 [(validate.rules).string.pattern = "(?i)^[0-9a-f]+$"];
  • prefix/suffix/contains: the field must contain the specified substring in an optionally explicit location.

    // x must begin with "foo"
    string x = 1 [(validate.rules).string.prefix = "foo"];
    
    // x must end with "bar"
    string x = 1 [(validate.rules).string.suffix = "bar"];
    
    // x must contain "baz" anywhere inside it
    string x = 1 [(validate.rules).string.contains = "baz"];
    
    // x must begin with "fizz" and end with "buzz"
    string x = 1 [(validate.rules).string = {prefix: "fizz", suffix: "buzz"}];
    
    // x must end with ".proto" and be at most 64 characters
    string x = 1 [(validate.rules).string = {suffix: ".proto", max_len:64}];
  • in/not_in: these two rules permit specifying white/blacklists for the values of a field.

    // x must be either "foo", "bar", or "baz"
    string x = 1 [(validate.rules).string = {in: ["foo", "bar", "baz"]}];
    
    // x cannot be "fizz" nor "buzz"
    string x = 1 [(validate.rules).string = {not_in: ["fizz", "buzz"]}];
  • well-known formats: these rules provide advanced constraints for common string patterns. These constraints will typically be more permissive and performant than equivalent regular expression patterns, while providing more explanatory failure descriptions.

    // x must be a valid email address (via RFC 5322)
    string x = 1 [(validate.rules).string.email = true];
    
    // x must be a valid hostname (via RFC 1034)
    string x = 1 [(validate.rules).string.hostname = true];
    
    // x must be a valid IP address (either v4 or v6)
    string x = 1 [(validate.rules).string.ip = true];
    
    // x must be a valid IPv4 address
    // eg: "192.168.0.1"
    string x = 1 [(validate.rules).string.ipv4 = true];
    
    // x must be a valid IPv6 address
    // eg: "fe80::3"
    string x = 1 [(validate.rules).string.ipv6 = true];
    
    // x must be a valid absolute URI (via RFC 3986)
    string x = 1 [(validate.rules).string.uri = true];
    
    // x must be a valid URI reference (either absolute or relative)
    string x = 1 [(validate.rules).string.uri_ref = true];
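
To make the character/byte distinction and the pattern rule concrete, here is a small Go sketch using only the standard library. The helper names are hypothetical; Go's regexp package accepts RE2 syntax, so the pattern from the example above can be used unchanged.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
	"unicode/utf8"
)

// hexPattern is the case-insensitive hexadecimal pattern from the example above.
var hexPattern = regexp.MustCompile(`(?i)^[0-9a-f]+$`)

// charLen counts Unicode code points, the unit used by len/min_len/max_len.
func charLen(s string) int { return utf8.RuneCountInString(s) }

// byteLen counts bytes, the unit used by min_bytes/max_bytes.
func byteLen(s string) int { return len(s) }

// hasAffixes checks a combined prefix/suffix rule.
func hasAffixes(s, prefix, suffix string) bool {
	return strings.HasPrefix(s, prefix) && strings.HasSuffix(s, suffix)
}

func main() {
	s := "h\u00e9llo" // 5 code points, 6 bytes in UTF-8
	fmt.Println(charLen(s), byteLen(s))
	fmt.Println(hexPattern.MatchString("DEADbeef"))
	fmt.Println(hasAffixes("fizzbuzz", "fizz", "buzz"))
}
```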

Bytes

Literal values should be expressed with strings, using escaping where necessary.

  • const: the field must be exactly the specified value.

    // x must be set to "foo" ("\x66\x6f\x6f")
    bytes x = 1 [(validate.rules).bytes.const = "foo"];
    
    // x must be set to "\xf0\x90\x28\xbc"
    bytes x = 1 [(validate.rules).bytes.const = "\xf0\x90\x28\xbc"];
  • len/min_len/max_len: these rules constrain the number of bytes in the field.

    // x must be exactly 3 bytes
    bytes x = 1 [(validate.rules).bytes.len = 3];
    
    // x must be at least 3 bytes long
    bytes x = 1 [(validate.rules).bytes.min_len = 3];
    
    // x must be between 5 and 10 bytes, inclusive
    bytes x = 1 [(validate.rules).bytes = {min_len: 5, max_len: 10}];
  • pattern: the field must match the specified RE2-compliant regular expression. The included expression should elide any delimiters (ie, /\d+/ should just be \d+).

    // x must be a non-empty, ASCII byte sequence
    bytes x = 1 [(validate.rules).bytes.pattern = "^[\x00-\x7F]+$"];
  • prefix/suffix/contains: the field must contain the specified byte sequence in an optionally explicit location.

    // x must begin with "\x99"
    bytes x = 1 [(validate.rules).bytes.prefix = "\x99"];
    
    // x must end with "buz\x7a"
    bytes x = 1 [(validate.rules).bytes.suffix = "buz\x7a"];
    
    // x must contain "baz" anywhere inside it
    bytes x = 1 [(validate.rules).bytes.contains = "baz"];
  • in/not_in: these two rules permit specifying white/blacklists for the values of a field.

    // x must be either "foo", "bar", or "baz"
    bytes x = 1 [(validate.rules).bytes = {in: ["foo", "bar", "baz"]}];
    
    // x cannot be "fizz" nor "buzz"
    bytes x = 1 [(validate.rules).bytes = {not_in: ["fizz", "buzz"]}];
  • well-known formats: these rules provide advanced constraints for common patterns. These constraints will typically be more permissive and performant than equivalent regular expression patterns, while providing more explanatory failure descriptions.

    // x must be a valid IP address (either v4 or v6) in byte format
    bytes x = 1 [(validate.rules).bytes.ip = true];
    
    // x must be a valid IPv4 address in byte format
    // eg: "\xC0\xA8\x00\x01"
    bytes x = 1 [(validate.rules).bytes.ipv4 = true];
    
    // x must be a valid IPv6 address in byte format
    // eg: "\x20\x01\x0D\xB8\x85\xA3\x00\x00\x00\x00\x8A\x2E\x03\x70\x73\x34"
    bytes x = 1 [(validate.rules).bytes.ipv6 = true];
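
As a rough illustration of the byte-format IP rules, the Go sketch below assumes that "byte format" means the raw 4-byte (IPv4) or 16-byte (IPv6) address. The helper names are hypothetical and the length-only check is an assumption, not necessarily how PGV implements these rules.

```go
package main

import (
	"fmt"
	"net"
)

// isIPv4Bytes assumes an IPv4 address in byte format is exactly 4 bytes.
func isIPv4Bytes(b []byte) bool { return len(b) == net.IPv4len }

// isIPv6Bytes assumes an IPv6 address in byte format is exactly 16 bytes.
func isIPv6Bytes(b []byte) bool { return len(b) == net.IPv6len }

// isIPBytes accepts either form.
func isIPBytes(b []byte) bool { return isIPv4Bytes(b) || isIPv6Bytes(b) }

func main() {
	v4 := []byte("\xC0\xA8\x00\x01")
	fmt.Println(isIPv4Bytes(v4), net.IP(v4).String())

	v6 := []byte("\x20\x01\x0D\xB8\x85\xA3\x00\x00\x00\x00\x8A\x2E\x03\x70\x73\x34")
	fmt.Println(isIPv6Bytes(v6), net.IP(v6).String())
}
```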

Enums

All literal values should use the numeric (int32) value as defined in the enum descriptor.

The following examples use this State enum

enum State {
  INACTIVE = 0;
  PENDING  = 1;
  ACTIVE   = 2;
}
  • const: the field must be exactly the specified value.

    // x must be set to ACTIVE (2)
    State x = 1 [(validate.rules).enum.const = 2];
  • defined_only: the field must be one of the specified values in the enum descriptor.

    // x can only be INACTIVE, PENDING, or ACTIVE
    State x = 1 [(validate.rules).enum.defined_only = true];
  • in/not_in: these two rules permit specifying white/blacklists for the values of a field.

    // x must be either INACTIVE (0) or ACTIVE (2)
    State x = 1 [(validate.rules).enum = {in: [0,2]}];
    
    // x cannot be PENDING (1)
    State x = 1 [(validate.rules).enum = {not_in: [1]}];
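
The defined_only rule matters because proto3 enum fields can carry numeric values that are not declared in the enum. A minimal Go sketch of the check, with a hand-written State type and lookup table standing in for the generated code (both are assumptions of this sketch):

```go
package main

import "fmt"

// State is a stand-in for the generated enum type.
type State int32

const (
	StateInactive State = 0
	StatePending  State = 1
	StateActive   State = 2
)

// stateNames plays the role of the value-to-name table in the enum descriptor.
var stateNames = map[State]string{
	StateInactive: "INACTIVE",
	StatePending:  "PENDING",
	StateActive:   "ACTIVE",
}

// definedOnly reports whether s is one of the declared enum values.
func definedOnly(s State) bool {
	_, ok := stateNames[s]
	return ok
}

func main() {
	fmt.Println(definedOnly(StateActive)) // declared value
	fmt.Println(definedOnly(State(7)))    // undeclared numeric value
}
```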

Messages

If a field contains a message and the message has been generated with PGV, validation will be performed recursively. Messages not generated with PGV are skipped.

// if Person was generated with PGV and x is set,
// x's fields will be validated.
Person x = 1;
  • skip: this rule specifies that the validation rules of this field should not be evaluated.

    // The fields on Person x will not be validated.
    Person x = 1 [(validate.rules).message.skip = true];
  • required: this rule specifies that the field cannot be unset.

    // x cannot be unset
    Person x = 1 [(validate.rules).message.required = true];
    
    // x cannot be unset, but the validations on x will not be performed
    Person x = 1 [(validate.rules).message = {required: true, skip: true}];
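
One way to picture how required, skip, and recursive validation interact is the Go sketch below. It detects whether a nested message supports validation with an interface assertion. The helper validateField and the types Person and Legacy are hypothetical, and callers are assumed to pass an untyped nil for an unset field.

```go
package main

import (
	"errors"
	"fmt"
)

// validator is satisfied by any message that has a Validate method.
type validator interface {
	Validate() error
}

// Person stands in for a message generated with PGV.
type Person struct{ Name string }

func (p *Person) Validate() error {
	if p.Name == "" {
		return errors.New("Name is required")
	}
	return nil
}

// Legacy stands in for a message generated without PGV (no Validate method).
type Legacy struct{}

// validateField applies the message rules to one field value.
// msg must be an untyped nil when the field is unset.
func validateField(msg interface{}, required, skip bool) error {
	if msg == nil {
		if required {
			return errors.New("value is required")
		}
		return nil
	}
	if skip {
		return nil // the nested rules are deliberately not evaluated
	}
	if v, ok := msg.(validator); ok {
		return v.Validate() // recurse into the nested message
	}
	return nil // no Validate method: nothing to check
}

func main() {
	fmt.Println(validateField(nil, true, false))
	fmt.Println(validateField(&Person{}, true, false))
	fmt.Println(validateField(&Person{}, true, true))
	fmt.Println(validateField(&Legacy{}, true, false))
}
```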

Repeated

  • min_items/max_items: these rules control how many elements are contained in the field

    // x must contain at least 3 elements
    repeated int32 x = 1 [(validate.rules).repeated.min_items = 3];
    
    // x must contain between 5 and 10 Persons, inclusive
    repeated Person x = 1 [(validate.rules).repeated = {min_items: 5, max_items: 10}];
    
    // x must contain exactly 7 elements
    repeated double x = 1 [(validate.rules).repeated = {min_items: 7, max_items: 7}];
  • unique: this rule requires that all elements in the field must be unique. This rule does not support repeated messages.

    // x must contain unique int64 values
    repeated int64 x = 1 [(validate.rules).repeated.unique = true];
  • items: this rule specifies constraints that should be applied to each element in the field. Repeated message fields also have their validation rules applied unless skip is specified on this constraint.

    // x must contain positive float values
    repeated float x = 1 [(validate.rules).repeated.items.float.gt = 0];
    
    // x must contain Persons but don't validate them
    repeated Person x = 1 [(validate.rules).repeated.items.message.skip = true];
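
The item-count and uniqueness rules can be sketched in Go as below. The function validateRepeated is hypothetical, and treating a maxItems of 0 as "no upper bound" is an assumption of this sketch.

```go
package main

import "fmt"

// validateRepeated applies min_items, max_items, and unique to a repeated
// int64 field. A maxItems of 0 is treated here as "no upper bound".
func validateRepeated(xs []int64, minItems, maxItems int, unique bool) error {
	if len(xs) < minItems {
		return fmt.Errorf("must contain at least %d item(s)", minItems)
	}
	if maxItems > 0 && len(xs) > maxItems {
		return fmt.Errorf("must contain at most %d item(s)", maxItems)
	}
	if unique {
		// Track values already seen; a repeat violates the rule.
		seen := make(map[int64]struct{}, len(xs))
		for i, x := range xs {
			if _, dup := seen[x]; dup {
				return fmt.Errorf("item %d repeats value %d", i, x)
			}
			seen[x] = struct{}{}
		}
	}
	return nil
}

func main() {
	fmt.Println(validateRepeated([]int64{1, 2}, 3, 0, false))
	fmt.Println(validateRepeated([]int64{1, 2, 2}, 0, 0, true))
	fmt.Println(validateRepeated([]int64{1, 2, 3}, 3, 3, true))
}
```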

Maps

  • min_pairs/max_pairs: these rules control how many KV pairs are contained in this field

    // x must contain at least 3 KV pairs
    map<string, uint64> x = 1 [(validate.rules).map.min_pairs = 3];
    
    // x must contain between 5 and 10 KV pairs
    map<string, string> x = 1 [(validate.rules).map = {min_pairs: 5, max_pairs: 10}];
    
    // x must contain exactly 7 KV pairs
    map<string, Person> x = 1 [(validate.rules).map = {min_pairs: 7, max_pairs: 7}];
  • no_sparse: for map fields with message values, setting this rule to true disallows keys with unset values.

    // all values in x must be set
    map<uint64, Person> x = 1 [(validate.rules).map.no_sparse = true];
  • keys: this rule specifies constraints that are applied to the keys in the field.

    // x's keys must all be negative
    map<sint32, string> x = 1 [(validate.rules).map.keys.sint32.lt = 0];
  • values: this rule specifies constraints that are applied to each value in the field. Message values also have their validation rules applied unless skip is specified on this constraint.

    // x must contain strings of at least 3 characters
    map<string, string> x = 1 [(validate.rules).map.values.string.min_len = 3];
    
    // x must contain Persons but doesn't validate them
    map<string, Person> x = 1 [(validate.rules).map.values.message.skip = true];
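
The map rules amount to a loop over the entries. The Go sketch below illustrates no_sparse and the keys/values rules from the examples above; the helper names and the Person type are hypothetical.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// Person stands in for a generated message type.
type Person struct{ Name string }

// checkNoSparse rejects keys whose message value is unset (nil).
func checkNoSparse(m map[uint64]*Person) error {
	for k, v := range m {
		if v == nil {
			return fmt.Errorf("value for key %d must be set", k)
		}
	}
	return nil
}

// checkKeysValues applies keys.sint32.lt = 0 and values.string.min_len = 3.
func checkKeysValues(m map[int32]string) error {
	for k, v := range m {
		if k >= 0 {
			return fmt.Errorf("key %d must be less than 0", k)
		}
		if utf8.RuneCountInString(v) < 3 {
			return fmt.Errorf("value for key %d must be at least 3 characters", k)
		}
	}
	return nil
}

func main() {
	fmt.Println(checkNoSparse(map[uint64]*Person{1: nil}))
	fmt.Println(checkKeysValues(map[int32]string{-1: "abc"}))
	fmt.Println(checkKeysValues(map[int32]string{2: "abc"}))
}
```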

Well-Known Types (WKTs)

A set of WKTs is packaged with protoc and provides common message patterns useful in many domains.

Scalar Value Wrappers

In the proto3 syntax, there is no way of distinguishing between unset and the zero value of a scalar field. The value WKTs permit this differentiation by wrapping them in a message. PGV permits using the same scalar rules that the wrapper encapsulates.

// if it is set, x must be greater than 3
google.protobuf.Int32Value x = 1 [(validate.rules).int32.gt = 3];

Anys

  • required: this rule specifies that the field must be set

    // x cannot be unset
    google.protobuf.Any x = 1 [(validate.rules).any.required = true];
  • in/not_in: these two rules permit specifying white/blacklists for the type_url value in this field. Consider using a oneof union instead of in if possible.

    // x must not be the Duration or Timestamp WKT
    google.protobuf.Any x = 1 [(validate.rules).any = {not_in: [
        "type.googleapis.com/google.protobuf.Duration",
        "type.googleapis.com/google.protobuf.Timestamp"
      ]}];

Durations

  • required: this rule specifies that the field must be set

    // x cannot be unset
    google.protobuf.Duration x = 1 [(validate.rules).duration.required = true];
  • const: the field must be exactly the specified value.

    // x must equal 1.5s exactly
    google.protobuf.Duration x = 1 [(validate.rules).duration.const = {
        seconds: 1,
        nanos:   500000000
      }];
  • lt/lte/gt/gte: these inequalities (<, <=, >, >=, respectively) allow for deriving ranges in which the field must reside.

    // x must be less than 10s
    google.protobuf.Duration x = 1 [(validate.rules).duration.lt.seconds = 10];
    
    // x must be greater than or equal to 20ns
    google.protobuf.Duration x = 1 [(validate.rules).duration.gte.nanos = 20];
    
    // x must be in the range [0s, 1s)
    google.protobuf.Duration x = 1 [(validate.rules).duration = {
        gte: {},
        lt:  {seconds: 1}
      }];

    Inverting the values of lt(e) and gt(e) is valid and creates an exclusive range.

    // x must be outside the range [0s, 1s)
    google.protobuf.Duration x = 1 [(validate.rules).duration = {
        lt:  {},
        gte: {seconds: 1}
      }];
  • in/not_in: these two rules permit specifying white/blacklists for the values of a field.

    // x must be either 0s or 1s
    google.protobuf.Duration x = 1 [(validate.rules).duration = {in: [
        {},
        {seconds: 1}
      ]}];
    
    // x cannot be 20s nor 500ns
    google.protobuf.Duration x = 1 [(validate.rules).duration = {not_in: [
        {seconds: 20},
        {nanos: 500}
      ]}];
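
The seconds/nanos pairs in these rules map directly onto Go's time.Duration. The sketch below shows the conversion and the [0s, 1s) range check; the helper names toDuration and inRange are hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// toDuration converts a seconds/nanos pair into a time.Duration.
func toDuration(seconds int64, nanos int32) time.Duration {
	return time.Duration(seconds)*time.Second + time.Duration(nanos)*time.Nanosecond
}

// inRange checks gte <= d < lt, as in the [0s, 1s) example.
func inRange(d, gte, lt time.Duration) bool {
	return d >= gte && d < lt
}

func main() {
	fmt.Println(toDuration(1, 500000000)) // the const example: 1.5s

	zero, one := toDuration(0, 0), toDuration(1, 0)
	fmt.Println(inRange(toDuration(0, 20), zero, one))
	fmt.Println(inRange(one, zero, one))
}
```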

Timestamps

  • required: this rule specifies that the field must be set

    // x cannot be unset
    google.protobuf.Timestamp x = 1 [(validate.rules).timestamp.required = true];
  • const: the field must be exactly the specified value.

    // x must equal 2009/11/10T23:00:00.500Z exactly
    google.protobuf.Timestamp x = 1 [(validate.rules).timestamp.const = {
        seconds: 1257894000,
        nanos:   500000000
      }];
  • lt/lte/gt/gte: these inequalities (<, <=, >, >=, respectively) allow for deriving ranges in which the field must reside.

    // x must be less than the Unix Epoch
    google.protobuf.Timestamp x = 1 [(validate.rules).timestamp.lt.seconds = 0];
    
    // x must be greater than or equal to 2009/11/10T23:00:00Z
    google.protobuf.Timestamp x = 1 [(validate.rules).timestamp.gte.seconds = 1257894000];
    
    // x must be in the range [epoch, 2009/11/10T23:00:00Z)
    google.protobuf.Timestamp x = 1 [(validate.rules).timestamp = {
        gte: {},
        lt:  {seconds: 1257894000}
      }];

    Inverting the values of lt(e) and gt(e) is valid and creates an exclusive range.

    // x must be outside the range [epoch, 2009/11/10T23:00:00Z)
    google.protobuf.Timestamp x = 1 [(validate.rules).timestamp = {
        lt:  {},
        gte: {seconds: 1257894000}
      }];
  • lt_now/gt_now: these inequalities allow for ranges relative to the current time. These rules cannot be used with the absolute rules above.

    // x must be less than the current timestamp
    google.protobuf.Timestamp x = 1 [(validate.rules).timestamp.lt_now = true];
  • within: this rule specifies that the field's value should be within a duration of the current time. This rule can be used in conjunction with lt_now and gt_now to control those ranges.

    // x must be within ±1s of the current time
    google.protobuf.Timestamp x = 1 [(validate.rules).timestamp.within.seconds = 1];
    
    // x must be within the range (now, now+1h)
    google.protobuf.Timestamp x = 1 [(validate.rules).timestamp = {
        gt_now: true,
        within: {seconds: 3600}
      }];
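
The relative rules are easiest to reason about with the current time passed in explicitly. In the Go sketch below, withinNow and withinNowUnix are hypothetical helpers, and whether the window boundaries are inclusive is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"time"
)

// withinNow reports whether ts falls inside the window around now.
// Without gtNow the window is [now-within, now+within].
// With gtNow the window is (now, now+within].
func withinNow(ts, now time.Time, within time.Duration, gtNow bool) bool {
	d := ts.Sub(now)
	if gtNow {
		return d > 0 && d <= within
	}
	return d >= -within && d <= within
}

// withinNowUnix is a convenience wrapper that takes Unix seconds.
func withinNowUnix(ts, now, withinSeconds int64, gtNow bool) bool {
	return withinNow(time.Unix(ts, 0), time.Unix(now, 0),
		time.Duration(withinSeconds)*time.Second, gtNow)
}

func main() {
	now := time.Unix(1257894000, 0) // 2009-11-10T23:00:00Z

	// within 1h and gt_now: 30 minutes ahead passes, 1 minute behind fails.
	fmt.Println(withinNow(now.Add(30*time.Minute), now, time.Hour, true))
	fmt.Println(withinNow(now.Add(-time.Minute), now, time.Hour, true))

	// within 1s only: half a second behind passes.
	fmt.Println(withinNow(now.Add(-500*time.Millisecond), now, time.Second, false))
}
```

Passing now as a parameter, rather than calling time.Now inside the check, keeps the sketch deterministic and easy to test.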

Message-Global

  • disabled: All validation rules for the fields on a message can be nullified, including any message fields that support validation themselves.

    message Person {
      option (validate.disabled) = true;
    
      // x will not be required to be greater than 123
      uint64 x = 1 [(validate.rules).uint64.gt = 123];
    
      // y's fields will not be validated
      Person y = 2;
    }

OneOfs

  • required: require that one of the fields in a oneof must be set. By default, none or one of the unioned fields can be set. Enabling this rule disallows having all of them unset.

    oneof id {
      // either x, y, or z must be set.
      option (validate.required) = true;
    
      string x = 1;
      int32  y = 2;
      Person z = 3;
    }
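
Generated Go code typically represents a oneof as an interface-typed field, so a required oneof reduces to a nil check. The type and method names below are hypothetical stand-ins for generated code.

```go
package main

import (
	"errors"
	"fmt"
)

// isID is implemented by every member of the oneof.
type isID interface{ isIDMember() }

type IDX struct{ X string }
type IDY struct{ Y int32 }

func (*IDX) isIDMember() {}
func (*IDY) isIDMember() {}

// Msg holds the oneof as a single interface-typed field.
type Msg struct {
	Id isID
}

// Validate enforces (validate.required) = true on the oneof.
func (m *Msg) Validate() error {
	if m.Id == nil {
		return errors.New("Id is required")
	}
	return nil
}

func main() {
	fmt.Println((&Msg{}).Validate())               // no member set
	fmt.Println((&Msg{Id: &IDY{Y: 7}}).Validate()) // one member set
}
```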

Development

PGV is written in Go on top of the protoc-gen-star framework and compiles to a standalone binary.

Dependencies

All PGV dependencies are currently checked into the project. To test PGV, protoc must be installed, either from source, the provided releases, or a package manager. The official protoc plugin for the target language(s) should be installed as well.

Make Targets

  • make build: generates the constraints proto and compiles PGV into $GOPATH/bin

  • make lint: runs static-analysis rules against the PGV codebase, including golint, go vet, and gofmt -s

  • make tests: runs all tests with race detection and coverage percentage

  • make quick: runs all tests without the race detector or coverage percentage

  • make cover: runs all tests with race detection, generating a coverage report and opening it in a browser

  • make kitchensink: generates the proto files in /tests/kitchensink. This includes the officially generated code, as well as the validations.

  • make testcases: generates the proto files in /tests/harness/cases. These are used by the test harness to verify the validation rules generated for each language.

  • make harness: executes the test-cases against each language's test harness.

Run all tests under Bazel

Ensure that your PATH is set up to include protoc-gen-go and protoc, then:

bazel run //tests/harness/executor:executor

Docker

PGV comes with a Dockerfile for consistent development tooling and CI. The main entrypoint is make with quick as the default target. For proper behavior, this repo should be mounted as a volume at /go/src/github.com/lyft/protoc-gen-validate.

# build the image
docker build -t lyft/protoc-gen-validate .

# executes the default make target: quick
docker run --rm \
  -v $(pwd):/go/src/github.com/lyft/protoc-gen-validate \
  lyft/protoc-gen-validate

# executes the 'build' & 'generate-testdata' make targets
docker run --rm \
  -v $(pwd):/go/src/github.com/lyft/protoc-gen-validate \
  lyft/protoc-gen-validate \
  build generate-testdata