Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support querying the polylines more efficiently #26

Closed
RomanIakovlev opened this issue Dec 10, 2018 · 8 comments
Closed

Support querying the polylines more efficiently #26

RomanIakovlev opened this issue Dec 10, 2018 · 8 comments
Labels
enhancement New feature or request
Milestone

Comments

@RomanIakovlev
Copy link
Owner

The use case to query a single geographical position seems to work rather well, but when it comes to querying multiple positions, there is a room for improvement. Particularly, there is a use case to query the contiguous sequence of points (a polyline, e.g. a GPS trace). In that situation, it might make sense to skip repeated quad tree queries and try to first test the same geometry as the one which contained the previous point in the polyline.

@RomanIakovlev RomanIakovlev added this to the 2018g milestone Dec 10, 2018
@RomanIakovlev
Copy link
Owner Author

RomanIakovlev commented Dec 10, 2018

One interesting question here is how the API of such a querying method should look like. To keep the spirit of current singe-point querying method, the polyline querying method should probably accept the array of primitive doubles, which would contain a consecutive pairs of latitude and longitude, like {latitude1, longitude1, latitude2, longitude2, ..., latitudeN, longitudeN}.

But how the return type should look like is a question, particularly, in the light of need to support of overlapping timezones, as introduced in release 2018g of the source data. Suggestions are welcome.

@dustin-johnson
Copy link

I have taken the approach in TimeZoneMap of returning a TimeZone object in response to a location query. This object has a method to query the minimum distance from a location that could be traveled to exit the time zone.

By the way, I very much appreciate Timeshape. It heavily influenced the design of TimeZoneMap, and I'd be interested in any feedback you might have.

@RomanIakovlev
Copy link
Owner Author

Hi @dustin-johnson, thanks for your kind feedback! I appreciate the fact that Timeshape was an inspiration for someone. I've checked out your library. Looks like the main difference from Timeshape is the following (but please correct me if I'm wrong, since I might have missed something):

  1. You don't require java.time.* and java.util.Optional, to make your library run on Android.
  2. You return Stream<TimeZone> instead of Optional<ZoneId>, so multiple time zones can be returned. I plan to add support for this in the next release of Timeshape, but it's not there yet, because till version 2018g of source data there were no overlapping time zones.
  3. You return the geometry of the time zone's bounding polygon with the query result. I have opted not to do so to avoid exposing the actual geometry engine (ESRI) to the users, to keep the surface area of the library smaller. Right now the API of Timeshape is just Java, without any 3rd party dependencies.
  4. You use FlatBuffers instead of Protobuf. I guess, this comes with some performance improvement? I'd be curious to see the numbers.
  5. You don't use indexing, and instead search through all the known time zones upon the query. I don't have numbers to back me up on this, but I think this should come with a certain performance penalty, because quad-tree search should be orders of magnitude faster compared to test if point belongs to a polygon. It might or might not matter in practice, because the query might be fast enough anyway.

All in all, I think you did a good job with your library. In fact, I'd appreciate if those efforts could go into the Timeshape instead! :) I'm very much open to changes and improvements in Timeshape, to accommodate needs of its users on any platform, supporting as many use cases as possible without compromising main goal of the library. What do you think about contributing to Timeshape, instead of maintaining the TimeZoneMap? We could merge these two libraries, to have just a single one for Java to provide such features.

My main motivation for this suggestion is to unite efforts on a single goal, rather than doing the same work twice in different libs. I think, in Open Source community, it's usually the better way, because the total value for the community will be higher. I mean, it's better to have one great library providing certain feature, rather than two just good ones, isn't it? Of course, with multiple maintainers there will be some challenges, like the need to find compromises on technical decisions, or taking different priorities into considerations, but I think so far the technical differences between our libs are rather minor, and I think there should be no big problem of merging them, if we approach this with open minds.

Please give it some thought and let me know if you're interested, at least in general. Of course we'll have to have some more thorough technical discussion before committing to this, but it doesn't make sense to start such a discussion if you prefer to go alone.

@dustin-johnson
Copy link

Hi @RomanIakovlev, so sorry for the delayed response. I'm certainly open to the idea of merging the projects. While the code is quite different, the overall structure and concept is very heavily inspired from your work, and so it makes sense to marge this project with its origins.

I'm currently tweaking and exploring quite a bit with the design and features, so I wonder if it makes sense to talk more about the logistics of merging them (and if it still makes sense) when I'm closer to completion. I don't expect that to take more than a week or two. In the meantime, I'd be honored if you wanted to start merging the projects. It would give you an opportunity to dig into TimeZoneMap a little deeper and see how compatible they are in their nitty-gritty details.

Specifically responding to questions/musings:
3 - You're right that exposing the ESRI Polygons significantly increases the surface area of the API. I debated it quite a bit myself. In the end I decided that the library can be so much more useful than just a timezone finder if the polygons are exposed. I'm thinking of applications that want to render time zones on a map, etc.
4 - I'd hoped that there would be performance improvements, but I didn't see any dramatic difference. In the end I decided to keep flatbuffers for no really good reason and wouldn't be opposed to switching back to protobuff. Perhaps it would be worth trying to compare them more rigorously.
5 - It's definitely slower to search through the polygons manually instead of indexed in a quad-tree, but my testing showed it as not that much slower (perhaps 1/2 as fast as quad-tree). At first I thought this was significant, but now I'm really not so sure. I believe the primary path ought to be to load a subset of the map data, and I've optimized that path quite a lot. Now the time zone regions are stored in the tar with their bounding envelopes as their file names. This allows the forRegion() loader to entirely skip deserializing those regions that don't overlap. Also, now the time zone regions are loaded if they intersect with the desired bounding box, and the loaded regions are clipped to the bounding box. This greatly simplifies the user's mental model of the map system, as they needn't worry about providing a bounding box that is big enough to fit the needed time zone inside it, and it greatly reduces the amount of memory needed to hold the mapping data. In my testing, a 2 degree x 2 degree square map often has less than a thousand points in it but is still 100s of kilometers big. For these small maps, having to loop through the small amount of regions is very fast.

@RomanIakovlev
Copy link
Owner Author

Hi @dustin-johnson, it's great you like the idea of merging the libraries! I'll open a new issue to discuss and track the progress of this.

@RomanIakovlev RomanIakovlev added the enhancement New feature or request label Jan 9, 2019
@juarezr
Copy link
Contributor

juarezr commented Jan 10, 2019

3 - You're right that exposing the ESRI Polygons significantly increases the surface area of the API. I debated it quite a bit myself. In the end I decided that the library can be so much more useful than just a timezone finder if the polygons are exposed. I'm thinking of applications that want to render time zones on a map, etc.

Maybe timeshape could be split in three libraries:

  1. base library with unstable API not meant for being used in applications.
  2. low dependency/impedance wrapper library with current API with high stability garanties using 1 as implementation
  3. specialized wrapper library with polygon library dependency with low stability garanties using 1 as implementation

In this scheme releases of 2 and 3 must be tighly coupled with 1.

4 - I'd hoped that there would be performance improvements, but I didn't see any dramatic difference. In the end I decided to keep flatbuffers for no really good reason and wouldn't be opposed to switching back to protobuff. Perhaps it would be worth trying to compare them more rigorously.

Would be interesting to investigate the tradeoffs by benchmarking protobuf vs flatbuffers vs capnproto as in issue #29.

Protobuf is more space/memory efficient due to byte encoding while capnproto and flatbuffers tend to be more processing effective due to zero enconding and random access.

Maybe could be created different libraries with same API if the performance/memory tradeoffs turns out to be quite separated and isolated, leaving to the developer decide according the applications needs.

5 - It's definitely slower to search through the polygons manually instead of indexed in a quad-tree, but my testing showed it as not that much slower (perhaps 1/2 as fast as quad-tree). At first I thought this was significant, but now I'm really not so sure. I believe the primary path ought to be to load a subset of the map data, and I've optimized that path quite a lot. Now the time zone regions are stored in the tar with their bounding envelopes as their file names. This allows the forRegion() loader to entirely skip deserializing those regions that don't overlap. Also, now the time zone regions are loaded if they intersect with the desired bounding box, and the loaded regions are clipped to the bounding box. This greatly simplifies the user's mental model of the map system, as they needn't worry about providing a bounding box that is big enough to fit the needed time zone inside it, and it greatly reduces the amount of memory needed to hold the mapping data. In my testing, a 2 degree x 2 degree square map often has less than a thousand points in it but is still 100s of kilometers big. For these small maps, having to loop through the small amount of regions is very fast.

  1. For performance matters would be interesting to research if indexing polygons leads to better query performance than comparing through polygon matching. This would be harder because would be one more processing step decomposing the polygons in pieces using a algoritm like S3/H3 but using 64 bits cell-ids should be faster for searching. See my comment in issue Consider adding lazy initialization to TimeZoneEngine #31.

  2. As the talk in #28 there is a trick for getting the continents by using a regexp: All timezones start with continent name as America/Sao_Paulo or Asia/Kolkata. So a regexp like "America.+" or "Africa.+" should do the job. Together with the functionality, a good documentation with examples, for helping the users and reducing the questions asked.

@juarezr
Copy link
Contributor

juarezr commented Feb 19, 2019

Followup to my comment on issue #31:

  • The H3 geospatial indexing system could be used for improving the query performance of OSM shapes.
  • For building this one must change the hierarchical representation of shapes in a quadtree by a database indexes by 64 bits cell-ids.
  • The polygons must be decomposed in compact hexagonal cells and indexed by the pair cell_id/zone_id into a table ordered by cell_id.
  • Querying is done simply by transforming the lat,lng coordinate into a h3_cell_id and doing binary search in this table.
  • H3 cell resolution in compact hexagonal representation can be subdivided in sizes from 0.5 meters to 1107 km.
  • There is a Java library available for development.

@RomanIakovlev
Copy link
Owner Author

Moving discussion about Timeshape & TimeZoneMap collaboration to #50, closing this as it's about polyline query, which is released.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants