Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

codegen: make error info available when tuples process data that is too long. #99

Merged
merged 1 commit into from Oct 21, 2020

Conversation

warpfork
Copy link
Collaborator

Make error info available when tuples process data that is too long.

(Should fix #97 .)

This requires introducing an error-carrying NodeAssembler,
because the AssembleValue methods don't have the ability to return errors themselves.

AssembleValue methods have not needed to return errors before!
Most lists don't have any reason to error: out of our whole system,
it's only struct-with-tuple-representation that can have errors here,
due to tuples having length limits.
AssembleValue for maps doesn't have a similar challenge either,
because key invalidity can always be indicated by errors returned from
the key assembly process.

I'm not a big fan of this diff -- error carrying thunks like this are
ugly to write and they're also pretty ugly to use -- but I'm not sure
what would be better. ListAssembler.AssembleValue returning an error?
Turning ListAssembler into a two phase thing, e.g. with an Advance
method that fills some of the same role as AssembleKey does for maps,
and gives us a place to return errors?

…oo long.

This requires introducing an error-carrying NodeAssembler,
because the AssembleValue methods don't have the ability to return errors themselves.

AssembleValue methods have not needed to return errors before!
Most lists don't have any reason to error: out of our whole system,
it's only struct-with-tuple-representation that can have errors here,
due to tuples having length limits.
AssembleValue for maps doesn't have a similar challenge either,
because key invalidity can always be indicated by errors returned from
the key assembly process.

I'm not a big fan of this diff -- error carrying thunks like this are
ugly to write and they're also pretty ugly to use -- but I'm not sure
what would be better.  ListAssembler.AssembleValue returning an error?
Turning ListAssembler into a two phase thing, e.g. with an Advance
method that fills some of the same role as AssembleKey does for maps,
and gives us a place to return errors?
Copy link
Member

@willscott willscott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems preferable to the other two approaches you mention in terms of code complexity 👍

@mvdan
Copy link
Contributor

mvdan commented Oct 20, 2020

I agree that this solution feels a bit hacky, but I don't think it's terrible. The other options you suggest would make the interface more complex, and this is an edge case that the vast majority of users shouldn't run into anyway.

What about returning a nil assembler (like before), and we document that the caller should do a nil check? We can document that a nil can be returned in specific invalid cases like this one here. That doesn't work if we want to surface the ErrNoSuchField error, though.

Another option, perhaps a bit cleaner, would be to return a NodeAssembler which is entirely a no-op, and have the Finish call return the error. This could result in the caller doing more work as they would think the NodeAssembler is working normally, but the positive is that we'd report the error in a better place.

How about a mix of both:

assembler := x.AssembleValue()
if assembler == nil {
    // Something went wrong. Fetch the error via Finish.
    err := x.Finish()
    // handle error
}
// continue assembling

@warpfork
Copy link
Collaborator Author

Not a fan of nils in general. Getting a nil dereference panic is generally my least favorite error to debug. Even if the line numbers do it, it's... not a joy. Can't write documentation that people will find when they put that error message string into their search engine of choice unless you control what the error message text is.

(Returning nil from methods like MapIterator() is less bothersome, because the only cases where that can appear are those where one already should've switched on the Kind(). Although also, now that I think of this... if we update the Node interface to have many fewer error returns and panic instead, we should probably make the MapIterator and ListIterator methods panic in those cases too, rather than just return nil. But I digress.)

A no-op'ing NodeAssembler that holds onto the error is an interesting idea. I think the caller doing more work is going to be too big a cost in practice though: if we tried to hold onto that logic recursively, that could be a lot of extra work. (Also, it would be unclear what we'd do for yielding further child assemblers. Yield more no-op'ers recursively? Possible but not feeling good.)

@warpfork
Copy link
Collaborator Author

(Okay, this comment is going to be a no-op, because I talked myself back out of it by the end, but still, to make the reasoning available...)

You got me thinking about the exact position of the error a bit more. We should think about whether this gives us the right "line numbers" if we imagine this thing paired with a serial decoding function that can give line numbers and offsets in the stream when the

If we do what this diff currently does -- channeling the error through a dummy NodeAssembler and raising it on the next opportunity -- we will get the error roughly one step too late.

In other words, if we have a schema type Foo struct { a Int, b Int, c Int } representation tuple, and we parse json [1, 2, 3, 4]...

Hm.

Nope, on second thought: this is fine, because our internal reasoning typically elides separator tokens that are codec details anyway. The distinction between "the comma before the four" and "the four" is so minute we just don't regard it: none of our codecs pass up distinct event for "the wire format indicated an element coming", they just pass up the element, and get the child assembler and call its assign method in the same breath. So: this is not a concern: the code as is works, nevermind.

@warpfork
Copy link
Collaborator Author

Think I'm gonna give this a merge then; thanks for the additional review!

@warpfork warpfork merged commit e0aac3b into master Oct 21, 2020
@warpfork warpfork deleted the codegen-error-from-tuple-overshoot branch October 21, 2020 19:28
@aschmahmann aschmahmann mentioned this pull request Feb 18, 2021
73 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Panics possible when using dagcbor to unmarshal data that doesn't fit in in a schema
3 participants