Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expected error handling semantics? #6

Closed
zeux opened this issue Jan 23, 2019 · 10 comments · Fixed by #17
Closed

Expected error handling semantics? #6

zeux opened this issue Jan 23, 2019 · 10 comments · Fixed by #17
Assignees

Comments

@zeux
Copy link
Contributor

zeux commented Jan 23, 2019

I'd like to make sure that errors are handled consistently, with an eventual goal of fuzzing the parser to make sure it is memory safe.

When cgltf_parse returns cgltf_result_success, cgltf_free frees all the data correctly (hopefully? :)).

However, when cgltf_parse returns an error, several issues may occur:

  • The error can be returned too early, in which case it's unsafe to use cgltf_free since some pointers aren't even filled!
  • The error can be returned mid-parse, in which case pointers aren't fixed up so it's possibly unsafe (?) to use cgltf_free - not 100% sure about this, I haven't studied the internal structures in detail.
  • Failure to call cgltf_free after a failed parse may or may not result in a memory leak.

I'd like to clean that up but I'm not sure what the expectations are. I can see two possibilities:

  • cgltf_free assumes a fully constructed correctly parsed object; cgltf_parse must clean up after itself in event of any error (possibly by calling some internal function, or even cgltf_free itself, in strategic places)
  • cgltf_parse makes sure that the data structures are coherent wrt pointers that cgltf_free may access, cgltf_free is safe to call in all cases

What's the expected behavior here?

@jkuhlmann jkuhlmann self-assigned this Jan 23, 2019
@mosra
Copy link

mosra commented Jan 23, 2019

Possibly related question (I hope it's okay to post it here): how will cgltf be handling out-of-bounds accessors / buffer views? This was one of the major pain points with tinygltf, where a file with malformed accessor / buffer view offsets would still parse as if everything was okay, requiring the user to check it on every access instead. In my opinion it would make sense to do such checks in cgltf_parse directly, to avoid accidental unsafe memory accesses on user side when a broken file is encountered.

@pr8x
Copy link
Contributor

pr8x commented Jan 23, 2019

cgltf_free assumes a fully constructed correctly parsed object; cgltf_parse must clean up after itself in event of any error (possibly by calling some internal function, or even cgltf_free itself, in strategic places)

I think this is the best approach. We probably should introduce some sort of parser_cleanup that takes care of releasing the partially constructed parser state.
We should consider allocating the cgltf_data internally and return a nullptr in error case. That way we don't have to bother with invalid data passed to cgltf_free. This is how most libraries solved this problem. Downside is, obviously, additional dynamic allocations.

@jkuhlmann
Copy link
Owner

Yeah, I can certainly see that calling cgltf_free after an error can be problematic at the moment.

I kind of like the idea that you never get access to the cgltf_data instance if there was an error. I designed it the way it currently is because I was trying to avoid allocations. That didn't really work out too well though with all the variable-sized things that have to be stored.

Anyway, I think from an implementation point-of-view your second possibility would be best, @zeux . If it wasn't possible to call cgltf_free in case of parsing errors, it would be quite difficult to free the already allocated memory again. @toytonics 's proposed parser_cleanup would just be a more robust cgltf_free anyway.

So, my suggestion is that we should make sure all pointers are always initialized to NULL.
And then we tell the user to always call cgltf_free regardless of the return value of cgltf_parse.

Or am I missing something?

@mosra It would definitely be a good idea to catch this kind of error while fixing up the pointers. What kind of response would you expect in this case? NULL pointer? Return with a parsing error? Or something else?

@zeux
Copy link
Contributor Author

zeux commented Jan 23, 2019

Yeah I was planning to move to zero-initialized everything anyway. Having cgltf_free be safe to call always would be good. But just to be clear - will we force the user to do that, or will we do the cleanup ourselves? The current examples in the README suggest that cgltf_free shouldn't be called on error.

I'll think about the interface tradeoff a bit more.

For indices being in-bounds vs not, first step is to make sure that cgltf_data itself is well-formed; right now for example out-of-bounds indices in the input gltf will create out-of-bounds pointers. I plan to fix it by changing the relocation stage to fail in that case. Once we're fuzz-safe for the basic parser we can discuss other validation needs, maybe it would be good to file a separate issue describing the desired behavior.

@pr8x
Copy link
Contributor

pr8x commented Jan 23, 2019

I designed it the way it currently is because I was trying to avoid allocations. That didn't really work out too well though with all the variable-sized things that have to be stored.

In the end you're trading a lot usability and possibly UB against a small (if any; depends on the scenario) performance improvement. The allocation of cgltf_data usually happens once and strings and stuff need to be allocated anyway.

@zeux
Copy link
Contributor Author

zeux commented Jan 23, 2019

Additional thoughts on deeper validation:

There's many many conditions that could cause the user program that assumes a well-formed GLTF to cause memory safety issues. There are basic examples, like "number of items in all vertex streams in a primitive must coincide". There are examples like "all indices in the accessor point into a valid range as defined by vertex streams" that are more complex (see below). And then there's "scene nodes form a tree" - cycles in node tree would cause most programs to misbehave.

So I'd suggest implementing a separate function, cgltf_validate, that takes a parsed cgltf_data and performs deeper validation on it. This function will need access to buffer data to validate indices - we could pass an additional array of buffer pointers to this function so that it can query them. This should be separate from basic memory safety of the parser itself.

@zeux
Copy link
Contributor Author

zeux commented Jan 23, 2019

And yeah I like the idea of changing the interface to return a pointer to cgltf_data. That would make the interface more obvious - you’d only need to call free on a non-NULL pointer, and we wouldn’t return a non-NULL pointer for failed parse. It’s a breaking change but seems worth it.

@jkuhlmann
Copy link
Owner

jkuhlmann commented Jan 23, 2019

Ok, this all sounds good to me:

  • Change interface to allocate cgltf_data instead of the user providing it. I still think we'll probably call cgltf_free internally if something goes wrong anyway, right? Also, I think it should always be valid for the user to call cgltf_free even if a NULL pointer is passed in (just like regular free; and yes that would be different from what the README currently suggests).
  • Pointer fix-ups should make sure the index is valid.
  • A separate validation function for more heavy-weight validation is also great.

@zeux
Copy link
Contributor Author

zeux commented Jan 23, 2019

Yeah I intend to reuse cgltf_free internally and maintain parse data in a deallocatable state at all times. This sounds good; I’ll create a separate issue for thorough validation and work on the interface changes separately.

@mosra mosra mentioned this issue Jan 23, 2019
2 tasks
@mosra
Copy link

mosra commented Jan 23, 2019

@zeux just opened #14 with all the points from above :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants