Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Phase 3: encoding #20

Closed
springmeyer opened this issue Jun 30, 2017 · 7 comments
Closed

Phase 3: encoding #20

springmeyer opened this issue Jun 30, 2017 · 7 comments

Comments

@springmeyer
Copy link
Contributor

springmeyer commented Jun 30, 2017

This issue is to track overall discussion on the next phase of development for this library: encoding.

The previous phases were Phase 1 (enumerations) and Phase 2 (decoding) (landed in mapbox-gl-native in mapbox/mapbox-gl-native#9312 🎉 ).

The primary goals for the encoding implementation are:

  • High performance / minimal overhead / low to zero allocation
  • Headers are modular and able to be used in a zero cost way (you don't get any extra dependencies than absolutely required for a given header - e.g. no dependency on geometry.hpp or variant unless the header absolutely needs it)
  • API is low-level like protozero. Downstream developers will be able to easily abstract on top of it be be able to integrate into libraries like mapnik-vector-tile / tippecanoe.

Roles

  • @joto will lead development of design over next 3 weeks (July 3-21)
  • @flippmoke will support, test, advise on viability of API for integration downstream

/cc @mapbox/core-tech

@joto
Copy link
Contributor

joto commented Jul 6, 2017

Here are some notes/todo items:

Use cases

Common use cases that the library must support:

  • Reading VT, decoding all layers.
  • Reading VT, only decode some layers based on type and/or name.
  • When reading layer, only use some features based on tags, only decode geometry when needed
  • Copy parts of VT based on layer type/name and/or feature tags
  • Copy parts of a VT and add newly generated parts (layers, features)
  • Create VT from scratch
  • Work together with geometry.hpp lib. Should be able to create/read geometry.hpp geometries.

It would be good to have small test programs for all use cases. They can be used for development to make sure use cases are supported and the interface is easy to use and later are the basis for tests and docs.

Features

  • When writing features, some kind of rollback is needed if the user decides to abandon writing a feature. This is used for instance when it turns out a geometry is invalid.
  • Think about range checking for geometry values (fit in extent plus some extra? fit in 16 bit?). Avoid values that might overflow in our code or in user code.
  • Think about range checks in names and string tags to avoid overflows and excessive memory use.
  • Geometry pre-allocation (reserve()) based on count in geometry commands?
  • Error reporting. Add error messages in exceptions. Think about classification of exception classes.
  • Check iterator type traits for layer/feature iterators. Is this an input iterator or a forward iterator?

TODO

  • We need more test data. A collection of "typical" vector tiles with varying content would be useful. Small tiles, large tiles, just on layer, many layers, different tag value types, different geometries etc.

@mapsam
Copy link
Member

mapsam commented Jul 11, 2017

Thanks for writing this up @joto!

We need more test data

I'd be interested in helping out with this. I started working on mvt-fixtures (which is currently being used by this library) and can add more there. Also recognizing having fixtures in a separate repository might be a pain to maintain, happy to add some here.

@joto
Copy link
Contributor

joto commented Jul 13, 2017

@mapsam I think having the fixtures in an extra repository is okay. Yes, a bit harder to use here, but easier to use across projects.

Here are some things you can help with:

  • We could use a collection of vector tiles from different sources. Something like the one started for the benchmarks, but more diverse sources: Different zoom levels, different types (street map tiles, QA tiles, very simple tiles without tags, complex tiles with many tags and layers, etc.) Whatever people use vector tiles for. Doesn't have to be all copied in here, but some URLs could also help as a first step. The collection is a good basis to make sure different use cases are handled well in the lib. It is too easy to overlook use cases and assume things which might not be true for all vector tiles. Typical vector tiles will only have very few tag keys per layer for instance, but that is not true for QA tiles. So this is to make sure we cover all bases.
  • We need some systematic way of creating tiles with all (or as many as possible) legal and illegal corner cases (like no features in a layer, etc.) There is a good start in mvt-fixtures/fixtures/invalid, but I suspect there must be more.
  • We need tiles that are deliberately created with invalid data. Such as a length field in the data saying there are 20 billion points in this linestring even if there aren't. This is essential to test our library against attackss. This needs knowledge of the underlying data format. Best approach here: Write a program that creates a normal tile, then change a byte here or there and write it to disk. Keep the program as documentation and when you want to change the test examples later.
  • We should think about some way of structuring the test vector tiles and have some description of the tests not just in the README, but more structured format. For instance a json file for each test that has the information: Is this a valid tile according to spec version 1? version 2? Not valid but should be readable anyway? Not valid and must throw error? Recoverable error (ignore this feature but keep reading other features) or unrecoverable error (somebody is messing with us, stop reading from this tile)? I am sure there are more criteria here. It is important to think about the different categories of errors here, so the library can do the right thing.

@mapsam
Copy link
Member

mapsam commented Jul 19, 2017

@joto I like the idea of improving how to describe fixtures, such as a JSON object. Could we use https://github.com/yinqiwen/pbjson to create protobufs from these JSON objects?

@joto
Copy link
Contributor

joto commented Jul 24, 2017

@mapsam I was just thinking about using JSON as a way to encode some metadata about the tests not the test data itself. The test data can be created in many ways, for some it might be possible to use pbjson or something like it, for other we really need our own code where we can fiddle with the bits.

Test data should be easy to create and we also want to make sure that we don't introduce too many dependencies. After all we don't want to test pbjson or anything like it but our own code, so it might be better to stick with tried and tested libs like the official Google protocol buffers libraries to create the test data. And for the tests we don't need the performance that a C++ lib will provide, maybe the convenience of a Javascript or Python library is better here. But feel free to experiment!

@springmeyer
Copy link
Contributor Author

springmeyer commented Aug 1, 2017

We need tiles that are deliberately created with invalid data. Such as a length field in the data saying there are 20 billion points in this linestring even if there aren't. This is essential to test our library against attackss. This needs knowledge of the underlying data format. Best approach here: Write a program that creates a normal tile, then change a byte here or there and write it to disk. Keep the program as documentation and when you want to change the test examples later.

This is really well articulated and would be very very valuable. I agree that it is critical to have the changes to make the tiles invalid be targeted, specific, and documented in code (so you could return when you think of another invalid change).

I will add however that the random approach is also valuable in finding edge cases (fuzzing) things that would break with invalid input we can't think of and I noticed https://github.com/google/libprotobuf-mutator which could be used for that effort.

@springmeyer
Copy link
Contributor Author

Going to close this ticket as I think we are resolved on all the issues discussed above.

To recap:

The encoding implementation is done, thanks to @joto. We've decided it will live in a newly named repo that has no dependency on anything but protozero. See mapbox/vtzero#1.

The small, single-purpose/malicious fixture discussion has blossomed into mvt-fixtures 3.x. The suite is ready to use to add many more fixtures and we can track work there. See mapbox/mvt-fixtures#15 and mapbox/mvt-fixtures#21

The large/varied tile samples discussed above is not done, but would be easy to start working on if we had a ticket dedicated to the idea. So I've created mapbox/mvt-fixtures#22.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants