VectorTiles Part 2.1: The Reckoning (again) #1622

Merged
merged 95 commits into from Oct 5, 2016

4 participants
@fosskers
Contributor

fosskers commented Sep 6, 2016

TODO

  • Layer IO via Spark
    • Simplify API
    • Avro codecs
    • Fix serialization problems
  • Tests
    • Avro encoding
    • IO Demo

Motivation

Decoding and encoding individual tiles is fine, but more work is necessary to glue VectorTiles into the rest of GeoTrellis.

This builds off #1563 .

kevinzau and others added some commits Mar 17, 2016

refactored Vector Tile for (hopefully) easier use and encoding in the future, added better filter functionality, added more test data and test data generating examples
(vt) Moved protobuf-specific code to its own subpackage
- This is to make for easier addition of backends
(vt) Added traits for generic high-level VectorTile types
- These traits don't assume a backend, and should first be extended
  by classes like `ProtobufTile`, etc.
(vt) Suggestion for `ProtobufLayer` implementation
- Using lazy Streams here allows us to avoid strictly holding the Stream head,
  meaning no Features of a geometry type we don't care about will be parsed
  unless we ask.

- This is advantageous for queries as well. If you are looking for a Feature
  match on some metadata point, Features will only be parsed until you find
  what you're looking for.

- A potential disadvantage is that any intervening instances of the opposite
  Single/Multi variant of the type you're looking for will be fully parsed as
  well. The alternative is an ad-hoc reimplementation of laziness with
  internal mutable data structures.

  Consider the following scenarios, where P and MP are Polygon and
  MultiPolygon respectively:

  [P MP P MP P MP]  -- A list of alternating raw Polygon features.

  (1) The user wants to find a particular Polygon which, unknown to them,
      is the second one in the list. They have to parse the first P,
      do *something* to the first MP, parse the second P and match on it,
      then stop.

      With Streams, the original list now looks like: [ MP P MP ]
      With custom laziness, it looks like: [ MP MP P MP ]

      The custom laziness wins for speed here, since we were able to
      cancel the parsing of the first MP early, and the Streams
      approach fully parsed the first MP.

  (2) The user wants to perform another operation, this time across all
      Ps. Both approaches must thus map over the entire list.

      With Streams, the original list is now empty: []
      With custom laziness, it looks like: [ MP MP MP ]

      It's harder to tell who wins here: while the Streams had to waste
      time fully parsing each MP, the custom laziness had to reparse old
      MPs it had already looked through.

  (3) The user wants to perform another operation on all Ps. The streams
      can go ahead since everything has been parsed. The custom approach
      must reparse all the MPs to check for Ps, since it wouldn't know
      there weren't any left.

  (4) The user wants to perform an operation on MPs this time. The streams
      can go ahead since the MPs are already parsed. The custom approach
      has to reparse the MPs for the fourth time.

  My takeaway: the custom approach is better for one-off operations on a
  particular geometry type. The stream approach quickly overtakes the other
  if you plan multiple operations over the same geometries.
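The trade-off in the scenarios above can be made concrete with a toy parse counter. This sketch is hypothetical (none of these names come from GeoTrellis) and assumes a feature's geometry kind can be peeked cheaply without a full parse. It uses Scala 2.13's `LazyList`, the memoizing successor of the `Stream` discussed here, for the Stream approach, and a non-caching filter-then-parse pass for the custom approach.

```scala
// Toy model: counts full feature parses under two laziness strategies.
// All names are hypothetical; none come from GeoTrellis.
object LazyParsingDemo {
  // A still-encoded feature; we assume its kind can be peeked for free.
  final case class Raw(kind: String, id: Int)
  final case class Parsed(kind: String, id: Int)

  // [P MP P MP P MP]
  val raw: Vector[Raw] = Vector(
    Raw("P", 1), Raw("MP", 1), Raw("P", 2),
    Raw("MP", 2), Raw("P", 3), Raw("MP", 3))

  // Stream-style: each feature is fully parsed at most once, then cached.
  final class Memoized(raws: Seq[Raw]) {
    var parses = 0
    private def parse(r: Raw): Parsed = { parses += 1; Parsed(r.kind, r.id) }
    // LazyList (Stream in Scala 2.11) parses on demand and memoizes results.
    private lazy val features: LazyList[Parsed] = LazyList.from(raws).map(parse)
    def find(kind: String, id: Int): Option[Parsed] =
      features.find(f => f.kind == kind && f.id == id)
    def all(kind: String): List[Parsed] = features.filter(_.kind == kind).toList
  }

  // Custom-laziness style: skips non-matching kinds cheaply, fully parses
  // only matches, but caches nothing, so every pass starts over.
  final class Reparsing(raws: Seq[Raw]) {
    var parses = 0
    private def parse(r: Raw): Parsed = { parses += 1; Parsed(r.kind, r.id) }
    def find(kind: String, id: Int): Option[Parsed] =
      raws.iterator.filter(_.kind == kind).map(parse).find(_.id == id)
    def all(kind: String): List[Parsed] =
      raws.filter(_.kind == kind).map(parse).toList
  }

  // Runs scenarios (1) to (4) and returns the cumulative parse count after each.
  def run(): (List[Int], List[Int]) = {
    val m = new Memoized(raw)
    val r = new Reparsing(raw)
    // (1) find P #2, (2) all Ps, (3) all Ps again, (4) all MPs.
    def scenarios(find: (String, Int) => Any, all: String => Any, count: () => Int): List[Int] =
      List(
        { find("P", 2); count() },
        { all("P"); count() },
        { all("P"); count() },
        { all("MP"); count() }
      )
    (scenarios(m.find, m.all, () => m.parses), scenarios(r.find, r.all, () => r.parses))
  }
}
```

Under these assumptions `run()` returns `(List(3, 6, 6, 6), List(2, 5, 8, 11))`: the re-parsing approach wins the one-off lookup in scenario (1), and the memoized stream pulls ahead once the same features are revisited, matching the takeaway above.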
(vt) Make `ProtobufGeom` a multiparam typeclass
- This reduces redundant code, and takes into account how the `Seq[Command]`
  will actually be traversed (that is, fairly agnostically as to what true
  Geometry type lies beneath). This way, one doesn't have to "backtrack" when
  parsing Multi{Line,Polygon}s, or try any analytics to discover which true
  Geometry you have before you attempt actual parsing.
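To illustrate what a multiparameter typeclass over a (single, multi) geometry pair might look like, here is a self-contained sketch. Only the name `ProtobufGeom` is borrowed; the command encoding (plain integer deltas rather than the zigzag-encoded parameters of real vector tiles), the geometry types, and the `fromCommands`/`decode` signatures are simplified assumptions, not the GeoTrellis API. What it shows is that one pass over the commands yields either the single or the multi variant, with no need to guess beforehand which one lies beneath.

```scala
// Hypothetical sketch of a multiparameter geometry-decoding typeclass.
object ProtobufGeomDemo {
  type Pt = (Int, Int)

  // Simplified commands; each coordinate is a delta from the current cursor.
  sealed trait Command
  final case class MoveTo(deltas: Seq[Pt]) extends Command
  final case class LineTo(deltas: Seq[Pt]) extends Command
  case object ClosePath extends Command

  final case class Point(p: Pt)
  final case class MultiPoint(ps: Seq[Pt])
  final case class Line(ps: Seq[Pt])
  final case class MultiLine(ls: Seq[Line])

  // One typeclass over a (single, multi) pair of geometry types.
  trait ProtobufGeom[G1, G2] {
    def fromCommands(cmds: Seq[Command]): Either[G1, G2]
  }

  // Single traversal: resolves deltas to absolute points, starting a new
  // group at every MoveTo point. Assumes input starts with a MoveTo.
  private def groups(cmds: Seq[Command]): Vector[Vector[Pt]] = {
    var cursor: Pt = (0, 0)
    var acc = Vector.empty[Vector[Pt]]
    def step(d: Pt): Pt = { cursor = (cursor._1 + d._1, cursor._2 + d._2); cursor }
    cmds.foreach {
      case MoveTo(ds) => ds.foreach(d => acc = acc :+ Vector(step(d)))
      case LineTo(ds) => acc = acc.init :+ (acc.last ++ ds.map(step))
      case ClosePath  => ()
    }
    acc
  }

  implicit val points: ProtobufGeom[Point, MultiPoint] = new ProtobufGeom[Point, MultiPoint] {
    def fromCommands(cmds: Seq[Command]): Either[Point, MultiPoint] =
      groups(cmds).flatten match {
        case Vector(p) => Left(Point(p))
        case ps        => Right(MultiPoint(ps))
      }
  }

  implicit val lines: ProtobufGeom[Line, MultiLine] = new ProtobufGeom[Line, MultiLine] {
    def fromCommands(cmds: Seq[Command]): Either[Line, MultiLine] =
      groups(cmds).map(Line(_)) match {
        case Vector(l) => Left(l)
        case ls        => Right(MultiLine(ls))
      }
  }

  // The caller picks the geometry family; single vs. multi falls out of the parse.
  def decode[G1, G2](cmds: Seq[Command])(implicit ev: ProtobufGeom[G1, G2]): Either[G1, G2] =
    ev.fromCommands(cmds)
}
```

For example, `decode[Line, MultiLine]` on a `MoveTo`/`LineTo` sequence containing two `MoveTo`s returns a `Right(MultiLine(...))` without backtracking.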

fosskers added some commits Aug 30, 2016

(vt) Tiles should know their own `Extent`
- So the only thing we require during IO is that Extent. This keeps
  the API simple.
(vt) extent -> tileWidth
- Renamed to avoid confusion with GeoTrellis `Extent`.
(vt) Feature segregation should be lazy as well
- This shaves off a bit more time from the vanilla decoding process
(vt) Added more equality tests
- Something is wrong with the MultiPolygons...
(vt) Use a better Avro schema for VectorTiles
- Avoid borrowing the Tuple schema, which was causing problems.
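The "Tiles should know their own `Extent`", "extent -> tileWidth", and lazy feature-segregation commits above can be sketched together. Everything here is hypothetical: `Extent` is a stand-in for the GeoTrellis type, and `ToyVectorTile` simply carries its `Extent` plus its internal grid width (`tileWidth`; the Mapbox Vector Tile specification's default `extent` is 4096). Grouping features by kind is a `lazy val`, so it runs at most once and only when first requested. The `toMap` conversion assumes the vector-tile convention that grid y grows downward.

```scala
// Hypothetical sketch: a tile that knows its own Extent and segregates lazily.
object TileExtentDemo {
  // Stand-in for GeoTrellis' `Extent`.
  final case class Extent(xmin: Double, ymin: Double, xmax: Double, ymax: Double) {
    def width: Double = xmax - xmin
    def height: Double = ymax - ymin
  }

  // A raw feature tagged with its geometry kind, in tile-local grid units.
  final case class RawFeature(kind: String, coords: Seq[(Int, Int)])

  final class ToyVectorTile(raw: Seq[RawFeature], val extent: Extent, val tileWidth: Int) {
    var segregations = 0

    // Features are grouped by kind once, on first access, and then cached.
    private lazy val byKind: Map[String, Seq[RawFeature]] = {
      segregations += 1
      raw.groupBy(_.kind)
    }

    def points: Seq[RawFeature] = byKind.getOrElse("Point", Seq.empty)
    def lines: Seq[RawFeature] = byKind.getOrElse("LineString", Seq.empty)

    // Maps a tile-local grid coordinate into the tile's Extent.
    // Grid rows grow downward, so y is measured from ymax.
    def toMap(c: (Int, Int)): (Double, Double) = {
      val resX = extent.width / tileWidth
      val resY = extent.height / tileWidth
      (extent.xmin + c._1 * resX, extent.ymax - c._2 * resY)
    }
  }
}
```

Because the tile carries its own `Extent`, the only extra input an IO layer would need under this design is that `Extent`, which is the simplification the commit describes.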
@fosskers


Contributor

fosskers commented Sep 6, 2016

Rebased off of the latest additions to the original PR.

fosskers added some commits Sep 6, 2016

(vt) Fixed VT Avro schema
- It handles the embedded `Extent` properly now.
(vt) Avro encode `Array[Byte]` as `ByteBuffer`
- This avoids a Spark exception.
(vt) Remove unneeded `ByteArrayCodec`
- Since VectorTiles encode their bytes themselves to reduce complexity.
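Avro's generic representation of its `bytes` type is `java.nio.ByteBuffer`, which is why a raw `Array[Byte]` is wrapped before it goes into a record. Below is a minimal sketch of that round trip using only the JDK; the object and method names are hypothetical, and the specific Spark exception the commit avoided is not described here.

```scala
import java.nio.ByteBuffer

// Hypothetical helpers for moving tile bytes in and out of Avro's `bytes` type.
object ByteBufferCodecDemo {
  // Wrapping shares the underlying array; no copy is made.
  def encode(bytes: Array[Byte]): ByteBuffer = ByteBuffer.wrap(bytes)

  // Copies the remaining bytes out through a duplicate, so the caller's
  // buffer position is left untouched and no backing array is assumed.
  def decode(buf: ByteBuffer): Array[Byte] = {
    val dup = buf.duplicate()
    val out = new Array[Byte](dup.remaining())
    dup.get(out)
    out
  }
}
```

Reading through `duplicate()` rather than `array()` keeps the decoder correct for direct or sliced buffers, which a deserializer may hand back.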
@fosskers


Contributor

fosskers commented Sep 7, 2016

A demo of this functionality can be found here: https://github.com/fosskers/vectortile-io

@fosskers fosskers changed the title from [WIP] VectorTiles Part 2.1: The Reckoning (again) to VectorTiles Part 2.1: The Reckoning (again) Sep 7, 2016

@fosskers fosskers referenced this pull request Oct 5, 2016

Merged

VectorTiles Part 3: Return of the VectorTiles #1651

2 of 2 tasks complete

@lossyrob lossyrob merged commit c043b87 into locationtech:master Oct 5, 2016

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed

@lossyrob lossyrob added this to the 1.0 milestone Oct 18, 2016
