Skip to content

Remove redundant LineStrings in order to save memory#2795

Merged
abyrd merged 10 commits into
opentripplanner:dev-2.xfrom
entur:compact_geometries
Aug 16, 2019
Merged

Remove redundant LineStrings in order to save memory#2795
abyrd merged 10 commits into
opentripplanner:dev-2.xfrom
entur:compact_geometries

Conversation

@gmellemstrand
Copy link
Copy Markdown
Contributor

After doing some memory profiling of OTP, we found that a lot of space is taken up by LineStrings. This affects both graph size and the memory requirement to run OTP. It is especially important for very large graphs, since you may run into a limit to the file size supported by the Kryo serializer.

Note that this is based on #2794 , so only the last commit is relevant for this particular pull request.

The following changes have been made:

The geometries in SimpleTransfer and Raptor Transfer objects have been removed. In the first case they were saved as LineStrings, and in the second as Coordinate arrays. Since we already have a list of edges, it is possible to just combine these whenever we need the complete geometry. This is typically when building the itineraries, so performance is not so critical. The geometries of StreetEdges are delta compressed using CompactLineString, so they do not take up much space.

The geometries in TripPatterns have been removed. This information is duplicated in the hop geometries, and are not needed before creating the itineraries. In addition, the geometries in hop edges now use the CompactLineString, like with StreetEdges.

In total this cuts the size of the serialized graph in half when testing with the Oslo GTFS and Oslo OSM files. I have tested with maxTransferDistance of 2000, which means there were relatively many SimpleTransfers, which took up a lot of space.

This does break the BusRouteStreetMatcher, as it tries to replace the whole TripPattern geometry, so that needs to be looked into before merging.

@gmellemstrand gmellemstrand requested a review from a team July 25, 2019 12:54
@gmellemstrand gmellemstrand changed the title Remove reduntant LineStrings in order to save memory Remove redundant LineStrings in order to save memory Jul 25, 2019
@t2gran t2gran added the OTP2 label Jul 30, 2019
@abyrd
Copy link
Copy Markdown
Member

abyrd commented Jul 31, 2019

@gmellemstrand the PR diff shows over 100 files changed, can you confirm that this is built on top of #2794 (Use realtime data in TransitLayerMapper), and I should look at a diff relative to that PR to understand your changes?

@gmellemstrand
Copy link
Copy Markdown
Contributor Author

It is built on top of 664c2c4, but I see that there are a couple of commits added to #2794 since that time.

@abyrd
Copy link
Copy Markdown
Member

abyrd commented Aug 1, 2019

In that case let's aim to clean up and merge #2794, since Thomas has already provided a lot of comments on that one, then I'll start working on this one.

@gmellemstrand
Copy link
Copy Markdown
Contributor Author

This is now rebased on top of dev-2x. Some things we need to discuss:

  • CompactLineString takes a start coordinate, end coordinate and a LineString and produces a byte array. In order to uncompress you need to provide the start and end coordinates again. For StreetEdge the coordinates of the start and end vertices are used. For patternhops I used the coordinates of the start and end stops. In the cases where the hop geometry is offset from the stops themselves, you will see some extra lines running into the stops. Probably the best change here is to add methods in CompactLineString that do not take start/end coordinates.

  • Are the changes to BusRouteStreetMatcher acceptable? I have not used the module.

@gmellemstrand
Copy link
Copy Markdown
Contributor Author

After discussing with @abyrd made the following changes:

  • Added methods to the CompactLineString which do not require start/end coordinates. In order for the delta encoding to work, 0-coordinates are added at each end and then the original methods are called.
  • There were methods in CompactLineString for variable length byte array encoding that were only used in tests. I have tested using them, and the size of geometries are cut in half with no performance hit to speak of.

@gmellemstrand
Copy link
Copy Markdown
Contributor Author

I have made a new commit that should save about half of space used for elevation profiles. Elevation is sampled along each StreetEdge at regular intervals (10m), and then the resulting coordinates are efficiently encoded using DlugoszVarLenIntPacker.

A typical elevation profile can look like this:

0 = {Coordinate@4453} "(0.0, 239.16)"
1 = {Coordinate@4454} "(10.0, 238.81)"
2 = {Coordinate@4455} "(20.0, 238.45)"
3 = {Coordinate@4456} "(28.602, 238.11)"

The last x-coordinate is the length of the StreetEdge.

I have changed CompactElevationProfile to only save the y-values and then reconstruct the x-values when uncompacting. I have also moved the distanceBetweenSamplesM field from ElevationModule to CompactElevationProfile.

@abyrd
Copy link
Copy Markdown
Member

abyrd commented Aug 16, 2019

Summarizing our discussion today: this PR looks good to me.

One remaining issue is how to store and set the distance between elevation profile sample points. It was effectively almost a constant in the past - but it turns out Entur has made modifications to allow setting this higher to 25m (since the horizontal resolution of their elevation data is only 50m). So the value needs to be saved at graph build and restored when the graph is loaded. We looked at making the (un)compacter a concrete class, serializing it with the Graph, or just passing the horizontal spacing into the unpack method. But unfortunately in all places where this method is called, we don't have access to a context object that lets us read configuration from the graph. So the horizontal spacing would have to be passed through multiple frames, adding parameters to multiple methods. We decided to just retain the current static configuration approach, which is stylistically kind of ugly but will require only one simple and easy to understand new line upon graph load.

This does mean that we cannot in the general case have more than one graph loaded at once because configuration in the global (static) scope is affected by the contents of a particular graph. Eliminating multiple routers was already planned for OTP2, but we are now making moves that really cement this decision.

@gmellemstrand 's changes add new methods compactElevationProfileWithRegularSamples and uncompactElevationProfileWithRegularSamples, in addition to the existing methods. These contain some duplicate code, allowing profiles with regular samples, and the general case with potentially irregularly spaced points. In practice we never use irregularly spaced points, so to avoid maintaining extra code I suggest that the new methods completely replace the old ones, rather than supplementing them.

Finally there is an issue of methods named compack* vs. compact*. At first I thought this was a typo, but in fact the first functions return byte[]s with variable-width integer encoded sequences, while the latter return fixed-width int arrays. A single letter spelling variation does not seem like a good way to distinguish between methods with otherwise identical signatures. The first set of methods call the second set of methods, and we never use the non-varint methods anywhere else. Therefore I suggested we just inline the fixed-width methods into the variable-width ones, and use only the standard spelling compact*.

@abyrd
Copy link
Copy Markdown
Member

abyrd commented Aug 16, 2019

@gmellemstrand once you make the changes we discussed, I can review and potentially merge today.

Copy link
Copy Markdown
Member

@abyrd abyrd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working through all the details. Looks good.

@abyrd abyrd merged commit 671b95a into opentripplanner:dev-2.x Aug 16, 2019
@t2gran t2gran deleted the compact_geometries branch September 24, 2019 14:27
@abyrd abyrd added this to the 2.0 milestone Oct 13, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants