Replace Java serialization with Kryo #2681

abyrd · 2018-11-22T06:29:08Z

To be completed by pull request submitter:

issue: Replace Java serialization #2643
roadmap: This is not formally on the roadmap but (de)serialization speed has been a central concern for a few years.
tests: Tests have been added to ensure that the object graph remains identical after a round-trip through serialization and deserialization. See org.opentripplanner.routing.graph.GraphSerializationTest. This uses an object graph comparison tool in R5 that is intended specifically for checking that serialization works properly. The comparison framework is itself covered by pretty thorough tests.

To be completed by @opentripplanner/plc:

reviews and approvals by 2 members, ideally from different organizations
before merging: add a bullet point to the changelog file with description and link to the linked issue
after merging: update the relevant card on the roadmap

@csolem

This copies in a recursive object differ by @csolem from the Entur fork https://github.com/entur/OpenTripPlanner/tree/protostuff_poc. I began adding some tests to the differ, which show that it produces some false negatives (e.g. it doesn't see differences in map values).

The problem with the differ causing the test to fail was identified. It does not check whether the outermost objects to be compared are Collections or Maps.
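Here is a minimal sketch of that fix. It assumes a much simpler differ than the real one. `TopLevelDiffSketch` and its `diff` method are hypothetical and are not the R5 ObjectDiffer API. The sketch shows one thing: the outermost pair of objects goes through the same Map and Collection branches as nested pairs. A difference in a map value is therefore reported even when the maps themselves are the objects passed in.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Hypothetical, heavily simplified differ (not the R5 ObjectDiffer). */
public class TopLevelDiffSketch {

    /** Returns human-readable differences; an empty list means "no differences found". */
    static List<String> diff(Object a, Object b) {
        List<String> out = new ArrayList<>();
        // The outermost objects go through the same dispatch as nested ones.
        compare("root", a, b, out);
        return out;
    }

    private static void compare(String path, Object a, Object b, List<String> out) {
        if (a == null || b == null) {
            if (a != b) out.add(path + ": one reference was null but not the other");
            return;
        }
        if (a instanceof Map && b instanceof Map) {
            Map<?, ?> ma = (Map<?, ?>) a;
            Map<?, ?> mb = (Map<?, ?>) b;
            if (!ma.keySet().equals(mb.keySet())) {
                out.add(path + ": key sets differ");
                return;
            }
            // Recurse into the values, so differences in map values are not missed.
            for (Object key : ma.keySet()) {
                compare(path + "[" + key + "]", ma.get(key), mb.get(key), out);
            }
        } else if (a instanceof Collection && b instanceof Collection) {
            // Element-by-element in iteration order (only meaningful for ordered collections).
            Iterator<?> ia = ((Collection<?>) a).iterator();
            Iterator<?> ib = ((Collection<?>) b).iterator();
            int i = 0;
            while (ia.hasNext() && ib.hasNext()) {
                compare(path + "[" + i++ + "]", ia.next(), ib.next(), out);
            }
            if (ia.hasNext() || ib.hasNext()) out.add(path + ": sizes differ");
        } else if (!a.equals(b)) {
            out.add(path + ": " + a + " vs " + b);
        }
    }
}
```

Calling `diff` on two one-entry maps that hold 10 and 11 under the key `stops` reports a single difference, `root[stops]: 10 vs 11`. A real differ also has to handle unordered sets and Trove structures, which this sketch does not attempt.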

This enables reliable tests that serialization / deserialization perfectly restores a Graph. In some ways this simplifies the differ, but it also ensures that it performs a very deep recursive comparison of almost every object. Also added tests that can compare two references to the same object, which provide strong evidence that the differ actually works.

Also refactored Graph serialization tests a bit.

This pulls in a stable version of our testing framework for Kryo serialization.

This test is failing on Travis CI but not locally.

abyrd · 2018-11-22T08:13:10Z

Some tests are still failing, specifically GraphSerializationTest.testRoundTrip (fails assertion) and GraphServiceTest (errors, no graphs loaded). I will need to fix these before we can really test this branch.

This makes it similar to the registration used in R5 - all primitive hashmaps are set to use their Externalizable implementation, unless we have a specific serializer for them.
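"Externalizable implementation" means the class writes and reads its own compact layout through `java.io.Externalizable`, instead of letting the serializer walk its fields. The map below is a hypothetical toy stand-in for a Trove primitive map. The round trip uses Java's object streams only because they are in the standard library. As we understand it, Kryo's `ExternalizableSerializer` delegates to the same two methods.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.util.Arrays;

/** Hypothetical primitive int -> int map, used only to illustrate Externalizable. */
public class IntIntMapSketch implements Externalizable {
    private int[] keys = new int[0];
    private int[] values = new int[0];

    public IntIntMapSketch() { } // Externalizable requires a public no-arg constructor

    public void put(int key, int value) {
        for (int i = 0; i < keys.length; i++) {
            if (keys[i] == key) { values[i] = value; return; }
        }
        keys = Arrays.copyOf(keys, keys.length + 1);
        values = Arrays.copyOf(values, values.length + 1);
        keys[keys.length - 1] = key;
        values[values.length - 1] = value;
    }

    public int get(int key, int noEntryValue) {
        for (int i = 0; i < keys.length; i++) if (keys[i] == key) return values[i];
        return noEntryValue;
    }

    public int size() { return keys.length; }

    // The class chooses its own wire format: an entry count, then raw int pairs.
    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeInt(keys.length);
        for (int i = 0; i < keys.length; i++) {
            out.writeInt(keys[i]);
            out.writeInt(values[i]);
        }
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException {
        int n = in.readInt();
        keys = new int[n];
        values = new int[n];
        for (int i = 0; i < n; i++) {
            keys[i] = in.readInt();
            values[i] = in.readInt();
        }
    }

    /** Round trip through Java's object streams, which call the two methods above. */
    static IntIntMapSketch roundTrip(IntIntMapSketch map) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bytes)) {
                oos.writeObject(map);
            }
            try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (IntIntMapSketch) ois.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```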

Apparently this was a necessary step for Java serialization, but is no longer relevant for Kryo serialization.
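The wrapper is not neutral, as this standard-library illustration shows. `ObjectOutputStream` writes a 4-byte Java serialization stream header (magic `0xACED`, version 5) as soon as it is constructed. If the wrapped stream were handed to Kryo, those bytes would sit in front of whatever Kryo writes. We are assuming here that the test did pass the wrapped stream to Kryo. The class name `StreamHeaderSketch` is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;

public class StreamHeaderSketch {
    /** Returns the bytes an ObjectOutputStream emits before any object is written. */
    static byte[] headerBytes() {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(baos)) {
            oos.flush(); // nothing written by us: only the constructor's header is present
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return baos.toByteArray();
    }

    public static void main(String[] args) {
        byte[] header = headerBytes();
        StringBuilder hex = new StringBuilder();
        for (byte b : header) hex.append(String.format(" %02X", b));
        // Prints "4 bytes: AC ED 00 05" (STREAM_MAGIC followed by STREAM_VERSION).
        System.out.println(header.length + " bytes:" + hex);
    }
}
```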

abyrd · 2018-11-23T07:30:12Z

This is bizarre. The serialization round trip test fails on Travis CI (which uses Oracle Java 1.8_151) but I can't get it to fail locally even when running on exactly that JVM version. It builds with no errors and passes all tests, in both IntelliJ and under command line Maven, on both Oracle Java 1.8_151 and 1.8_192 (the most recent).

I can, however, reproduce test failures on an EC2 instance running Amazon Linux 2, Maven 3.6.0, and OpenJDK 1.8.0_191. I get three mismatches: "Primitive Long value mismatch: 1542957457381 vs 1542957457382" in testEmptyGraphs, and "No-entry value differs between two maps. One reference was null but not the other." in testRoundTrip.

There may be good reasons for this: minute differences nested deep within internal details of the OTP object graph, or unhandled edge cases in the comparisons.

The cached timezone in the graph is transient and lazy-initialized. Previous tests will sometimes cause a timezone to be cached. This seems to depend on non-deterministic test execution order.
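Here is a minimal sketch of the failure mode and of the fix in the commit "Clear cached time zone in Graph before checking serialization". `CachedTimeZoneSketch` is a hypothetical stand-in for Graph. Java serialization stands in for Kryo only so the example needs no dependencies; to our knowledge, Kryo's default field serializer also skips transient fields. Whether the original has a cached value depends on which tests ran before, while the deserialized copy never has one. Clearing the cache before comparing removes that order dependence.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.TimeZone;

/** Hypothetical stand-in for Graph: only the lazily cached time zone is modelled. */
public class CachedTimeZoneSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    final String timeZoneId;             // persistent state, survives the round trip
    private transient TimeZone timeZone; // derived cache, never written out

    CachedTimeZoneSketch(String timeZoneId) {
        this.timeZoneId = timeZoneId;
    }

    /** Lazily initializes the cache; calling this is a side effect that earlier tests may trigger. */
    TimeZone getTimeZone() {
        if (timeZone == null) {
            timeZone = TimeZone.getTimeZone(timeZoneId);
        }
        return timeZone;
    }

    boolean hasCachedTimeZone() {
        return timeZone != null;
    }

    /** Drops the cache so that only persistent state takes part in a comparison. */
    void clearCachedTimeZone() {
        timeZone = null;
    }

    /** Serializes and deserializes a copy; transient fields come back as null. */
    static CachedTimeZoneSketch roundTrip(CachedTimeZoneSketch graph) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(graph);
            }
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (CachedTimeZoneSketch) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```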

abyrd · 2018-11-23T11:57:07Z

With latest commits to OTP and to the ObjectDiffer code in R5, build is now passing on both Travis CI and the EC2 + OpenJDK combination.

NL graph on the EC2 machine, 30GB memory: 24 minutes to build.
Writing graph: 1 minute 36 seconds
Graph size on disk: 1.5GB
Loading graph: 1 minute 11 seconds + 25 seconds indexing

Comparing to OTP master on the same machine: 23 minutes to build.
Writing graph: 1 minute 45 seconds
Graph size on disk: 2.1GB
Loading graph: 4 minutes 57 seconds + 25 seconds indexing

So this PR provides:

29% improvement in graph size
76% improvement in load time
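Both percentages can be checked against the figures above: 1 min 11 s is 71 s and 4 min 57 s is 297 s. The load-time figure compares deserialization only and leaves out the 25 seconds of indexing that both branches share. The last line shows the result if indexing is counted as part of loading:

```latex
\begin{aligned}
\text{graph size:} &\quad 1 - \frac{1.5\ \text{GB}}{2.1\ \text{GB}} \approx 0.286 \approx 29\% \\
\text{load time:} &\quad 1 - \frac{71\ \text{s}}{297\ \text{s}} \approx 0.761 \approx 76\% \\
\text{load + indexing:} &\quad 1 - \frac{96\ \text{s}}{322\ \text{s}} \approx 0.702 \approx 70\%
\end{aligned}
```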

abyrd · 2018-11-23T12:25:45Z

@t2gran @gmellemstrand this is now ready for review and testing. It seems to work well in my testing.

gmellemstrand · 2018-11-29T14:57:36Z

We have tested this and all our tests pass. We are currently running it in production.

optionsome · 2018-11-29T15:03:21Z

I quickly tried to merge this into our fork, but some tests were failing and I didn't have time to look into it then. We will most likely take a closer look in the near future. I don't yet know whether all the issues are specific to our fork, but at least all the tests passed when I ran this branch on my computer.

abyrd · 2018-11-29T15:52:02Z

Thanks for the feedback @optionsome. If your fork adds any new classes to the graph, it's possible that they are not serialized correctly, or that they are just not compared correctly in the test. You might need to register a new serializer or add exceptions to the object tree comparator. If you get stuck at any point I may be able to provide guidance on how to get the comparator or serializer working with your data structures.

abyrd · 2018-11-29T15:58:59Z

@irees @fpurcell if you have a moment to spin up this PR, some corroborating test results would be helpful. Basically there's nothing special to do: just build a graph, load it, and make sure the routing is identical to previous versions. So in theory this could just mean running a batch of automated tests.

fpurcell · 2018-11-30T04:08:45Z

@irees @fpurcell if you have a moment to spin up this PR, some corroborating test results would be helpful...

Hey Andrew.

I built a graph and ran the TriMet tests against your PR, and all tests passed: http://maps7.trimet.org/p/otp_report.html.

So you got my 1/2 thumb up. Here are the artifacts from that build: http://maps7.trimet.org/p/

Take care,
Frank

abyrd · 2018-11-30T05:45:01Z

Thanks @fpurcell. I was hoping for a full thumbs up so the PR would be approved but if you aren't ready to give it we can wait on more input from others.

I see that your build report begins with a reassuring "All tests are PASSING" in green, but there are seven lines that say "failed" in red. Are those failures normal?

fpurcell · 2018-11-30T06:24:25Z

Hey Andrew.

Sorry to confuse with my poor attempt at humor. I was thinking that Ian and I each building a graph and running tests would together add up to one full thumbs up.

But that said, I can't do much more at the moment. I didn't look at the code changes; I just tested. I do know that all tests passed, and that's a big confidence boost.

Take care,
Frank

abyrd · 2018-11-30T06:52:38Z

@fpurcell what I meant is that following our new process, we need two official Github PR approvals (via the green review changes button at the top right) before it will let us merge. So if you consider your tests a success, please give it an official approval.

I also just wanted to make sure we were sure everything actually passed, because your report page says "fail" on it. But I'll take your word for it :)

pailakka · 2018-12-04T15:25:02Z

I tested this with our fork (same setup as @optionsome). It looks like for some reason the ObjectDiffer throws an exception when comparing TreeMaps:

java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Integer

	at java.lang.Integer.compareTo(Integer.java:52)
	at java.util.TreeMap.getEntry(TreeMap.java:352)
	at java.util.TreeMap.get(TreeMap.java:278)
	at com.conveyal.r5.diff.StandardMapWrapper.get(StandardMapWrapper.java:40)
	at com.conveyal.r5.diff.ObjectDiffer.compareMaps(ObjectDiffer.java:262)
	at com.conveyal.r5.diff.ObjectDiffer.compareTwoObjects(ObjectDiffer.java:182)
        ...

Additionally, there are comparison differences in the roundTrip test, especially with LuceneIndex. I tried reverting our changes relative to upstream master, but no joy. I'll try to find time to dig deeper.

fpurcell

Passes the TriMet test suite.

abyrd · 2018-12-12T05:52:25Z

@pailakka looking at the lines cited in your stack trace, I see that the problem lies in the code that checks whether missing entries are represented the same way in both maps. This is only particularly important for Trove primitive maps, but I implemented it in a way that checks every kind of map, using an unlikely integer as a key. Because of type erasure in Java and the fact that the collections predate generics, the methods on maps will generally accept any object as a key, even if the map is keyed on some specific other class. This worked fine for all the Map implementations I tried, but it does not work for TreeMaps, because they use a Comparator to establish the ordering of the keys, so all keys must be mutually comparable.
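The failure can be reproduced with the standard library alone. This assumes the differ probes each map with an Integer key; `MissingKeyProbeSketch` and the particular probe value are hypothetical. `Map.get` accepts any `Object`, so the call compiles for a map keyed on String. HashMap just hashes the probe and reports `null`. A TreeMap with natural ordering casts the probe to `Comparable` and calls `Integer.compareTo` on a String key, which is the frame at the top of the stack trace. An empty TreeMap would not throw, since it has no key to compare against.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class MissingKeyProbeSketch {
    /** Hypothetical probe standing in for the "unlikely integer" key. */
    static final Object PROBE_KEY = Integer.valueOf(-987654321);

    /** Asks the map for a key it almost certainly does not contain. */
    static Object probeMissing(Map<?, ?> map) {
        return map.get(PROBE_KEY); // Map.get(Object) compiles for any key type
    }

    public static void main(String[] args) {
        Map<String, String> hash = new HashMap<>();
        hash.put("stop", "A");
        System.out.println(probeMissing(hash)); // prints "null"

        Map<String, String> tree = new TreeMap<>();
        tree.put("stop", "A");
        try {
            probeMissing(tree);
        } catch (ClassCastException e) {
            // TreeMap compares the Integer probe against the String key "stop".
            System.out.println("TreeMap rejected the probe: " + e.getClass().getSimpleName());
        }
    }
}
```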

I suppose I should switch to an alternate implementation in which the map wrappers include methods that fetch the representation of a missing value from their wrapped maps. Indeed I started doing that at one point but thought it more robust to use this more general technique, which would be impervious to bugs or omissions in the map wrapper classes.
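Here is a sketch of that alternative. It assumes a hypothetical wrapper interface with a `noEntryValue()` method; the names below are illustrative, not the R5 classes. Each wrapper reports how its map represents a missing entry, so the differ never calls `get()` with a key of the wrong type. The trade-off is the one described above: correctness now depends on every wrapper implementing the method correctly.

```java
import java.util.Map;
import java.util.Objects;

/** Hypothetical wrapper contract: report how the wrapped map represents "absent". */
interface MapWrapperSketch {
    Object noEntryValue();
}

/** Standard java.util maps signal a missing entry with null. */
final class StandardMapWrapperSketch implements MapWrapperSketch {
    private final Map<?, ?> map; // wrapped map; get/keySet delegation omitted in this sketch

    StandardMapWrapperSketch(Map<?, ?> map) { this.map = map; }

    @Override public Object noEntryValue() { return null; }
}

/** Stand-in for a primitive map that returns a configurable sentinel, as Trove maps do. */
final class PrimitiveIntMapWrapperSketch implements MapWrapperSketch {
    private final int noEntry;

    PrimitiveIntMapWrapperSketch(int noEntry) { this.noEntry = noEntry; }

    @Override public Object noEntryValue() { return noEntry; } // boxed to Integer
}

public class NoEntryComparisonSketch {
    /** True if both maps represent a missing entry the same way; no keys are probed. */
    static boolean sameNoEntryValue(MapWrapperSketch a, MapWrapperSketch b) {
        return Objects.equals(a.noEntryValue(), b.noEntryValue());
    }
}
```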

I'll make a ticket for this problem in the R5 library. In the meantime, you can just exclude TreeMaps from comparisons in your tests, either by excluding the entire class or by excluding TreeMap fields by name.
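Here is a sketch of the two kinds of exclusion. It assumes a deliberately simplified comparison that only looks at the fields declared directly on an object and compares their values with `equals`, with no recursion. `ExclusionRulesSketch`, `ignoreClass`, `ignoreField` and `SampleIndex` are hypothetical names, not the R5 ObjectDiffer API. Check the real differ's own configuration methods in its source.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

/** Hypothetical exclusion rules for a shallow, field-by-field comparison. */
public class ExclusionRulesSketch {
    private final Set<Class<?>> ignoredClasses = new HashSet<>();
    private final Set<String> ignoredFieldNames = new HashSet<>();

    /** Skip any field whose current value is an instance of exactly this class. */
    ExclusionRulesSketch ignoreClass(Class<?> type) {
        ignoredClasses.add(type);
        return this;
    }

    /** Skip any field with this name, whatever its type. */
    ExclusionRulesSketch ignoreField(String name) {
        ignoredFieldNames.add(name);
        return this;
    }

    /** Names of non-static, non-excluded fields declared on a's class whose values differ. */
    List<String> differingFields(Object a, Object b) {
        if (a.getClass() != b.getClass()) {
            throw new IllegalArgumentException("objects must have the same class");
        }
        List<String> differing = new ArrayList<>();
        for (Field field : a.getClass().getDeclaredFields()) {
            if (Modifier.isStatic(field.getModifiers()) || ignoredFieldNames.contains(field.getName())) {
                continue; // excluded by field name (or static)
            }
            field.setAccessible(true);
            try {
                Object va = field.get(a);
                Object vb = field.get(b);
                boolean excludedByClass = (va != null && ignoredClasses.contains(va.getClass()))
                        || (vb != null && ignoredClasses.contains(vb.getClass()));
                if (!excludedByClass && !Objects.equals(va, vb)) {
                    differing.add(field.getName());
                }
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
        return differing;
    }

    /** Hypothetical graph-like object with a plain field and a TreeMap-valued field. */
    static final class SampleIndex {
        final String name;
        final Map<String, Integer> sortedStops;

        SampleIndex(String name, Map<String, Integer> sortedStops) {
            this.name = name;
            this.sortedStops = sortedStops;
        }
    }
}
```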

The LuceneIndex was an experimental, unsupported feature, and is only lazy-initialized after the geocoder API endpoint is hit. It is not designed to be serialized and reloaded - in fact much of its data is saved on disk in separate files. You can also exclude that class or field from the graph comparison, though I don't even understand how the Lucene index is getting initialized before you perform the graph comparison. It should still be null. That index is inside Graph.index which is marked transient so will not be automatically restored upon graph load.

abyrd · 2018-12-18T15:20:37Z

@pailakka please see OTP PR #2700 which should fix your TreeMap problem.

abyrd added 12 commits August 30, 2018 14:15

Kryo deserialization, #2643

c4275dd

Register some additional serializers to allow full round-trip

25833f9

Merge branch 'master' into kryo-serialization

a53edb0

revise object differ and serialization tests

98af5d2

Renaming identifiers, documentation. #2643.

e03c221

Break tests into smaller units, add documentation. #2643

296dcd4

Removed ObjectDiffer (now importing it from R5)

e0171fa

remove LoadLevel and custom street index services

9f5f4d9

compare two loaded copies of the same graph for good measure

bb8e75c

fix(build): Use release version of R5

e8e7c75

abyrd requested a review from a team as a code owner November 22, 2018 06:29

abyrd added the Improvement label Nov 22, 2018

abyrd assigned t2gran and gmellemstrand Nov 22, 2018

abyrd added 2 commits November 22, 2018 14:31

Remove unused dependency

5f9b9de

Print differences before assertion that there are none

fcd8fe0

abyrd added the WIP DO NOT MERGE label Nov 22, 2018

abyrd added 2 commits November 22, 2018 16:43

Update kryo serializer registration

fab1232

Do not wrap BAOS in ObjectOutputStream

e28af2c

abyrd added 4 commits November 23, 2018 17:05

Ignore graph build time when checking serialization

7939acc

Clear cached time zone in Graph before checking serialization

616f722

Use patched r5 ObjectDiffer from snapshot

1f33b03

Use release version of r5 with fixed object differ

320cdc3

exclude ThreadPoolExecutor from graph comparison

4996b6d

abyrd removed the WIP DO NOT MERGE label Nov 29, 2018

gmellemstrand approved these changes Nov 29, 2018

abyrd requested review from fpurcell and removed request for fpurcell November 29, 2018 15:56

fpurcell approved these changes Dec 11, 2018

abyrd mentioned this pull request Dec 12, 2018

Object comparison fails on TreeMaps conveyal/r5#482

Closed

Merge branch 'master' into kryo-serialization

63ce942

abyrd merged commit 71a0b08 into master Dec 12, 2018

abyrd deleted the kryo-serialization branch December 12, 2018 07:25

This was referenced Dec 17, 2018

Object comparison fails on TreeMaps conveyal/kryo-tools#1

Closed

Remove R5 and gtfs-lib dependencies, using kryo-tools dependency #2700

Closed

hbruch mentioned this pull request Feb 29, 2020

Turn restrictions not respected due to (de)serialization issues #2991

Closed

abyrd commented Nov 22, 2018 • edited by fpurcell
