Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

jsonschema: Switch to map based encoding #76

Merged
merged 1 commit into from Jan 16, 2017

Conversation

smyrman
Copy link
Collaborator

@smyrman smyrman commented Jan 1, 2017

Provides a public interface for #35, and switches the internal encoding
logic to rely on maps, as propsed in #51. This makes the implementation
simpler and more straight forward.

Compared to the poposed solution 1 in #35 that would rely on passing an
io.Writer to an encoder method, this solution will more easily handle a
corner-case where a custom validator writes zero bytes to the io.Writer.

The reduction in complexity does not come for free though. This looks to
be 4x slower for very tiny schemas, and aproximately 2x slower for small
schemas. As jsonschema is used as API documentation, and can be easily
cached, this performance loss should still be quite acceptable.

@smyrman
Copy link
Collaborator Author

smyrman commented Jan 1, 2017

@rs, @yanfali. This provides the following public interface for custom validators to implement:

type Builder interface {
	// JSONSchema should return a map containing JSON Schema Draft 4 properties that can be set based on
	// FieldValidator data. Application specific properties can be added as well, but should not conflict with any
	// legal JSON Schema keys.
	JSONSchema() (map[string]interface{}, error)
}

This is, based on "Solution 2" from #35, even you preferred solution 1 there @rs. The internal logic is also switched to rely on maps.

As far as I can see, this solution has the following benefits over solution 1:

  • Simpler
  • Allows future support for users to encode JSON Schema into other formats than JSON, e.g. YAML as supported by Swagger.
  • Avoids a corner case for an io.Writer based solution to [jsonschema] Support custom Validators #35 where no bytes are written by a custom validator. See schema.go#L104

It has the following disadvantages compared to solution 1:

  • Slower
  • Allocates more memory

What do you think of this approach?

UPDATE: Renamed Formatter interface -> Builder based on review comments.
UPDATE: Added "Allocates more memory" as disadvantage.

Copy link
Contributor

@yanfali yanfali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I promised you a more complex encoding for benchmarking, I'll try and get to that this evening. It's definitely much cleaner, not sure how much slower it will be so it would be good to get some numbers.

// | | | "type": "number",
// | | | "minimum": 0E+00
// | | | "minimum": 0,
// | | | "type": "number"
// | | },
// | | "foobar": {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not a fan of this. Shouldn't it at least output "type": "object", I understand the spec may allow this but it feels inconsistent with rest of document.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is still doing that (at the end). It's just that the keys will now be ordered alphabetically, so I had to reorder the keys/values in this example.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the example. See comment below

const (
// schemaSizeHint is the maximum number of keys that can be added to a schema map based on schema.Schema.
schemaSizeHint = 7
// fieldSizeHint is the maximum number of keys that can be added to a schema map based on schema.Field
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this comment matches the way you use it in the JSONSchema methods. It's actually the minimum and then you add a local value. I think it would be clearer if you updated this comment and used constants within each encoder file so it was clearer what it was you were trying to optimize here.

I'm also not sure how much value we get out of the specifying this hint, as many of the fields we may add are optional. The default allocator size is probably sufficient.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was a hard comment to write, which is never a good sign I guess.

It's not the minimum, but the minimum maximum in a way. The minimum number of fields is 0. This number is the maximum number of properties that can be added directly from the Field struct, excluding the filed.Validator attribute, which is what I am trying to state...

I will see if there is a better way.

I'm also not sure how much value we get out of the specifying this hint, as many of the fields we may add are optional. The default allocator size is probably sufficient.

I will try without a size hints, and then rerun the benchmarks:-)

// | | | "type": "number",
// | | | "minimum": 0E+00
// | | | "minimum": 0,
// | | | "type": "number"
// | | },
// | | "foobar": {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really like this much. I know it's permitted by the spec but it feels like we should be stricter when we output. At the very least "type": "object".

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, it is still outputing the same, just ordered differently, and a different formating for the numbers (e.g. 0 instead of 0E+00).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry comment is about foobar example

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, foobar. That is what we should encode if the field.Validator is set to nil. We could change the example, but I think it's the correct think to write in this situation. If it's set to nil, we can't enforce it to be an object.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the example to include a new description @yanfali. Hopefully it's at least a bit more clear from the example what's going on now...

if err != nil {
return err
}
fm, err := formatter.JSONSchema()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/fm/fieldMap/
The terseness hurts the readability.

if err != nil {
return err
}
formatField(fm, field)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this feels a bit backward, adding the meta fields after having created the map but making JSONSchema responsible for allocating the memory is a direct consequence of this. I would suggest renaming formatField. Maybe encodeFields and encodeField? Also I suggest reversing the order of the the parameters; it's often a nicer API that passes in read-only parameters in first and mutable ones last.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we come up with a new name, we should probably change that consistently, e.g. also for formatSchema.

I am not sure encodeFields is better though, as it's no longer encoding data. I will try to think of something...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's often a nicer API that passes in read-only parameters in first and mutable ones last.

Not really. Usually it's the mutable one that is passed in first, both in C and in go. See for instance append, copy or any other example in the standard lib.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quite right, ignore me.

// encodeValidator writes JSON Schema keys and values based on v, without the outer curly braces, to w. Note
// that not all FieldValidator types are supported at the moment.
func encodeValidator(w io.Writer, v schema.FieldValidator) (err error) {
func validatorFormatter(v schema.FieldValidator) (f Formatter, err error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

early return would probably be slightly clearer. I'm wary of relying on initialized and named returned values for this use case as it may trip up people later.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will change it do use early returns.

if v == nil {
return nil
f = formatterFunc(nilFormatter)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return formatterFunc(nilFormatter), nil

}
switch t := v.(type) {
case Formatter:
f = t
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can leave out this use case if you just return early above.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really. The case above is returning a formatter when filed.Validator == nil.

This case is implementing #35 by allowing custom Validators to implement the Formatter interface.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also added a test for this case now btw.

case Formatter:
f = t
case *schema.Bool:
f = (*boolFormatter)(t)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return (*boolFormatter)(t), nil

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I will change all of them:-)

default:
return ErrNotImplemented
err = ErrNotImplemented
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return nil, ErrNotImplemented

@yanfali
Copy link
Contributor

yanfali commented Jan 1, 2017 via email

@yanfali
Copy link
Contributor

yanfali commented Jan 1, 2017 via email

smyrman added a commit to smyrman/rest-layer that referenced this pull request Jan 1, 2017
@smyrman
Copy link
Collaborator Author

smyrman commented Jan 1, 2017

I find setting null in json tends to be worthless. Much better to not set
the field at all and use undefined in JS or OK pattern in go.

True, but that is besides the point...

I still think for nil it's much better to set the type for consistency at
the receiver when emitting as it's explicit. Implicit behavior tends to get
people doing different things because of interpretation. I'm arguing set
the type to object for the nil case.

Maybe you misunderstand?

I meen if schema.Field.Validator is not set or equal to nil (in Go), we don't know which type the value may have. The way I read the code in the schema package, it may have any type. So it's pretty simple what the jsonschema package should do.

The behaviour as it stands is identical to what it was before this change.

The only viable alternative would be to raise an error if we don't want to support Validator set to nil in the rest-layer/schema package, but I personally don't see why we would want that.

@smyrman
Copy link
Collaborator Author

smyrman commented Jan 1, 2017

On another topic, the last change addresses most of the review comments.

I will do one more commit soon where I remove the size hints. and rerun the benchmarks. I will leave it as one commit for now and rebase later, in case you want to add/run one more benchmark @yanfali.

smyrman added a commit to smyrman/rest-layer that referenced this pull request Jan 1, 2017
smyrman added a commit to smyrman/rest-layer that referenced this pull request Jan 1, 2017
@smyrman
Copy link
Collaborator Author

smyrman commented Jan 1, 2017

Removing the size hint appears to be a good call @yanfali.

Compared to having the size hints in, we get the same number of allocations and a lower number of bytes allocated without size hints for the Student schema. As for the reported speedup, that's within the margin of error, but at least it leaves the code a bit simpler:

benchmark                                                    old ns/op     new ns/op     delta
BenchmarkEncoder/Schema=Student                              20355         20169         -0.91%

benchmark                                                    old allocs     new allocs     delta
BenchmarkEncoder/Schema=Student                              136            136            +0.00%

benchmark                                                    old bytes     new bytes     delta
BenchmarkEncoder/Schema=Student                              7033          6746          -4.08%
sindrero@sjokolade ../rest-layer/schema/encoding/jsonschema (git)-[remotes/smyrman/jsonschema-ma

@smyrman
Copy link
Collaborator Author

smyrman commented Jan 1, 2017

The only viable alternative would be to raise an error if we don't want to support Validator set to nil in the rest-layer/schema package, but I personally don't see why we would want that.

I will check what the schema package does today. If it does indeed raise an error, we should do the same in the jsonschema package for now at least.

UPDATE: Yep, nil Validators appears valid (just like nil Handlers or any other schema.Field value that you may or may not set).

Test code: https://gist.github.com/smyrman/ea15603ec2c4adcf727b4b651184692a

@yanfali
Copy link
Contributor

yanfali commented Jan 2, 2017

So if a field is defined as nil in schema we are to output a JSONSchema was empty object which is a stand-in for any type? I can sort of understand that interpretation but it doesn't help the consumer of that schema understand that this field member is used for which is effectively ignore this field?

This is the one major weakness of JSON as a specification, lack of comments. If we're going to do that, I would suggest we explicitly put a "description" that describes this behavior for this field. As a consumer of such an API I would be grateful for explicit documentation explaining this rather than leaving me to guess what it means. This wouldn't violate the JSONSchema spec and would be a reasonable compromise. I use JSONSchema as a way to communicate intent to other humans, not just machines. I also feel this better conveys the intent of leaving the Validator nil.

func (fw *fieldWriter) resetPropertiesCount() {
fw.propertiesCount = 0
}
for key, field := range fields {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/key/fieldName/

}
func addFields(m map[string]interface{}, fields schema.Fields) error {
props := make(map[string]interface{}, len(fields))
required := make([]string, 0, len(fields))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

required := []string{} should be fine.

ew.writeString(`"type": "string", "format": "date-time"`)
return ew.err
func (v timeBuilder) JSONSchema() (map[string]interface{}, error) {
m := make(map[string]interface{})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

avoid the allocation, package private var and then return it as is.

var timeJSONSchema map[string]interface{}
func init() {
    timeJSONSchema = make(map[string]interface{})
    timeJSONSchema["type"] = "string"
    timeJSONSchema["format"] = "data-time"
)

Then simply return timeJSONSchema, nil

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That won't quite work. If new fields are added later, they would be added to the global var. E.g. if one timeJSONSchema field is set to be required, all later instances would as well.

I can do what @rs suggests below instead, and do initialization on the allocation line.

ew.writeString(`"type": "array"`)
func (v arrayBuilder) JSONSchema() (map[string]interface{}, error) {
m := make(map[string]interface{})
m["type"] = "array"
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer:

m := map[string]interface{}{
    "type": "array",
}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, just lazy when I removed the sizeHint for the allocation. I will get it done.

_, ew.err = ew.w.Write([]byte(s))
}
// builderFunc is an adapter that allows pure functions to implement the Builder interface.
type builderFunc func() (map[string]interface{}, error)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't we expose this?

Copy link
Collaborator Author

@smyrman smyrman Jan 2, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently I don't see any way end-users could implement a jsonschema.Builder other than letting a FieldValidator implementation also implement this interface, so I don't see any way in which BuilderFunc could be used by end-users....

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PS! I am happy to expose it later, if we can find a real end-user use-case for it.

)

func (v objectBuilder) JSONSchema() (map[string]interface{}, error) {
m := make(map[string]interface{})
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, prefer map[string]interface{}{}. Only use make when you want to pre-allocate.

smyrman added a commit to smyrman/rest-layer that referenced this pull request Jan 2, 2017
@smyrman
Copy link
Collaborator Author

smyrman commented Jan 2, 2017

So if a field is defined as nil in schema we are to output a JSONSchema was empty object which is a stand-in for any type? I can sort of understand that interpretation but it doesn't help the consumer of that schema understand that this field member is used for which is effectively ignore this field?

Not necessarily an empty object, just an object without the "type" property. Field properties such as "required", "description" and "readOnly" could still appear in the object, if that's what's added to schema.Field.

UPDATE: To clarify: "if a field is defined as nil" -- I assumed you meant "if a field's Validator is defined as nil". The "filed" itself can't be set to nil, as schema.Fields is of type map[string]Field. It can of course hold a default empty value, and in that case, the JSON Schema object would indeed be empty, as was the case in the example test.

This is the one major weakness of JSON as a specification, lack of comments. If we're going to do that, I would suggest we explicitly put a "description" that describes this behavior for this field.

I understand it's confusing, but if we set a default description for some fields and not for others, that becomes an inconsistent and unexpected behaviour. Instead the one specifying the schema should add a fitting description to all of his fields. Hopefully it should be possible to understand that if the "type" property is not set in JSON, it means that any type can be passed in.

UPDATE: I have changed the example test to include such descriptions btw, which should make it more obvious what's going on.

UPDATE: -- Also, if we where to set a default description, which I don't think we should, that should happen in the schema package, not in the jsonschema package, so that other potential encodings of schema would hold the same description.

@smyrman
Copy link
Collaborator Author

smyrman commented Jan 2, 2017

I will do one more commit soon where I remove the size hints. and rerun the benchmarks. I will leave it as one commit for now and rebase later, in case you want to add/run one more benchmark @yanfali.

OK, rebased to one commit as I don't see it's going to be beneficial to do any more benchmarking specifically for the sizeHint removal...

@smyrman
Copy link
Collaborator Author

smyrman commented Jan 2, 2017

Updated the docs as well.

@rs
Copy link
Owner

rs commented Jan 2, 2017

I like this solution better finaly. GG.

I would rename the method of the Builder interface to BuildJSONSchema. It is better to recall the name of the interface in a single method interface.

Copy link
Contributor

@yanfali yanfali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking really good now. Thanks!

@smyrman
Copy link
Collaborator Author

smyrman commented Jan 3, 2017

Let's merge #78 first, and I will run the benchmarks.

@smyrman
Copy link
Collaborator Author

smyrman commented Jan 3, 2017

Benchmark results are here!

To summarize, we see that:

  • A tiny schema (one field) is 5x slower when the target buffer is pre-allocated, does 3x the number of allocations and allocates more than 8x the amount of memory in total.
  • A small schema is more than 2x slower, has almost 2.5x as many allocations, and allocates 5x the amount of memory in total.
  • A quite complex schema is just below 2x slower, does less than 2x the amount of allocations, and allocates slightly above 3x the amount of bytes.

Details:

benchmark                                                    old ns/op     new ns/op     delta
BenchmarkEncoder/Schema={Fields:{"b":Bool{}}}                1022          5203          +409.10%
BenchmarkEncoder/Schema={Fields:{"s":String{}}}              1005          5168          +414.23%
BenchmarkEncoder/Schema={Fields:{"s":String{MaxLen:42}}}     1285          5831          +353.77%
BenchmarkEncoder/Schema=Simple                               9128          20294         +122.33%
BenchmarkEncoder/Schema=Complex1                             34508         69803         +102.28%
BenchmarkEncoder/Schema=Complex2                             87383         165865        +89.81%

benchmark                                                    old allocs     new allocs     delta
BenchmarkEncoder/Schema={Fields:{"b":Bool{}}}                12             37             +208.33%
BenchmarkEncoder/Schema={Fields:{"s":String{}}}              12             37             +208.33%
BenchmarkEncoder/Schema={Fields:{"s":String{MaxLen:42}}}     14             41             +192.86%
BenchmarkEncoder/Schema=Simple                               55             133            +141.82%
BenchmarkEncoder/Schema=Complex1                             228            430            +88.60%
BenchmarkEncoder/Schema=Complex2                             545            1004           +84.22%

benchmark                                                    old bytes     new bytes     delta
BenchmarkEncoder/Schema={Fields:{"b":Bool{}}}                240           2001          +733.75%
BenchmarkEncoder/Schema={Fields:{"s":String{}}}              224           2001          +793.30%
BenchmarkEncoder/Schema={Fields:{"s":String{MaxLen:42}}}     248           2104          +748.39%
BenchmarkEncoder/Schema=Simple                               1200          6088          +407.33%
BenchmarkEncoder/Schema=Complex1                             6161          19268         +212.74%
BenchmarkEncoder/Schema=Complex2                             14235         44641         +213.60%

Command used to produce benchmarks:

$ go test -run=NONE -bench=. -benchmem -cpu 1

Comparison done using benchmp.

}
```

To easier extend a `FieldValidator` from the `schema` package, you can call `ValidatorBuilder` inside `BuildJSONSchem()`:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have now exposed ValidatorBuilder to be a public function. Let me know if we should do this differently.

@smyrman
Copy link
Collaborator Author

smyrman commented Jan 3, 2017

@rs, @yanfali: This is ready for merge if you are happy with it.

@smyrman
Copy link
Collaborator Author

smyrman commented Jan 3, 2017

@rs, @yanfali: This is ready for merge if you are happy with it.
Hold on -- need to update one docstring after searchreplacing JSONSchema() -> BuildJSONSchema()

@smyrman
Copy link
Collaborator Author

smyrman commented Jan 3, 2017

OK, ready.

@yanfali
Copy link
Contributor

yanfali commented Jan 3, 2017

Those numbers are pretty good for the realistic use case. I have no objection to merging.

Provides a public interface for rs#35, and switches the internal encoding
logic to rely on maps, as propsed in rs#51. This makes the implementation
simpler and more straight forward.

Compared to the poposed solution 1 in rs#35 that would rely on passing an
io.Writer to an encoder method, this solution will more easily handle a
corner-case where a custom validator writes zero bytes to the io.Writer.

The reduction in complexity does not come for free though. This looks to
be 5x slower for very tiny schemas, and aproximately 2x slower for more
complex schemas. As jsonschema is used as API documentation, and can be
easily cached, this performance loss should still be quite acceptable.
@smyrman
Copy link
Collaborator Author

smyrman commented Jan 15, 2017

rebased again due to new commits to master (no changes).

@rs, are you OK with merging this?

Or do you want to consider an alternate implementation?

@rs rs merged commit affb30a into rs:master Jan 16, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants