-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace Java serialization with Kryo #2681
Conversation
This copies in a recusrive object differ from @csolem from Entur fork https://github.com/entur/OpenTripPlanner/tree/protostuff_poc I began adding some tests to the differ which show that it produces some false negatives (e.g. doesn't see differences in map values)
The problem with the differ causing the test to fail was identified. It does not check whether the outermost objects to be compared are Collections or Maps.
This enables reliable tests that serialization / deserialization perfectly restores a Graph. In some ways this simplifies the differ, but it also ensures that it performs a very deep recursive comparison of almost every object. Also added tests that can compare two references to the same object, which provide strong evidence that the differ actually works.
Also refactored Graph serialization tests a bit.
This pulls in a stable version of our testing framework for Kryo serialization.
This test is failing on Travis CI but not locally.
Some tests are still failing, specifically GraphSerializationTest.testRoundTrip (fails assertion) and GraphServiceTest (errors, no graphs loaded). I will need to fix these before we can really test this branch. |
This makes it similar to the registration used in R5 - all primitive hashmaps are set to use their Externalizable implementation, unless we have a specific serializer for them.
Apparently this was a necessary step for Java serialization, but is no longer relevant for Kryo serialization.
This is bizarre. The serialization round trip test fails on Travis CI (which uses Oracle Java 1.8_151) but I can't get it to fail locally even when running on exactly that JVM version. It builds with no errors and passes all tests, in both IntelliJ and under command line Maven, on both Oracle Java 1.8_151 and 1.8_192 (the most recent). I can however reproduce test failures on an EC2 instance running Amazon Linux 2, Maven 3.6.0, and OpenJDK 1.8.0_191. I get three mismatches: There may be good reasons for this - minute differences nested deep within internal details the OTP object graph, or unhandled edge cases in the comparisons. |
The cached timezone in the graph is transient and lazy-initialized. Previous tests will sometimes cause a timezone to be cached. This seems to depend on non-deterministic test execution order.
With latest commits to OTP and to the ObjectDiffer code in R5, build is now passing on both Travis CI and the EC2 + OpenJDK combination. NL graph on the EC2 machine, 30GB memory: 24 minutes to build. Comparing to OTP master on the same machine: 23 minutes to build. So this PR provides:
|
@t2gran @gmellemstrand this is now ready for review and testing. It seems to work well in my testing. |
We have tested this and all our tests pass. We are currently running it in production. |
Quickly tried to merge this into our fork but some tests were failing and didn't have time to look at it at that time. We will most likely take a better look at it in the near future and I don't yet know if all the issues are just specific to our fork but at least all the tests passed when I ran this branch on my computer. |
Thanks for the feedback @optionsome. If your fork adds any new classes to the graph, it's possible that they are not serialized correctly, or that they are just not compared correctly in the test. You might need to register a new serializer or add exceptions to the object tree comparator. If you get stuck at any point I may be able to provide guidance on how to get the comparator or serializer working with your data structures. |
@irees @fpurcell if you have a moment to spin up this PR, some corroborating test results would be helpful. Basically there's nothing special to do, just build a graph, load it, and make sure the routing is identical to previous versions. So theoretically could just mean running a batch of automated tests. |
Hey Andrew. I built a graph and ran the TriMet tests against your PR, and all tests passed: http://maps7.trimet.org/p/otp_report.html. So you got my 1/2 thumb up. Here are the artifacts from that build: http://maps7.trimet.org/p/ Take care, |
Thanks @fpurcell. I was hoping for a full thumbs up so the PR would be approved but if you aren't ready to give it we can wait on more input from others. I see that your build report begins with a reassuring "All tests are PASSING" in green, but there are seven lines that say "failed" in red. Are those failures normal? |
Hey Andrew. Sorry to confuse with my poor attempt at humor. I was thinking that Ian + myself building 2 graphs and running tests were equal to a 1 together. But that said, I can't do much more at the moment. Didn't look at the code changes I just tested. I do know, all tests passed..that's a big confidence boost. Take care, |
@fpurcell what I meant is that following our new process, we need two official Github PR approvals (via the green review changes button at the top right) before it will let us merge. So if you consider your tests a success, please give it an official approval. I also just wanted to make sure we were sure everything actually passed, because your report page says "fail" on it. But I'll take your word for it :) |
I tested this with out fork (same setup as @optionsome). It looks like that for some reason tje ObjectDiffer throws exception when comparing TreeMaps
additionally there is comparison differences especially with LuceneIndex in roundTrip test. I tried to revert our changes compared to upstream master but no joy. I'll try to find time to dig deeper |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Passes the TriMet test suite.
@pailakka looking at the lines cited in your stack trace, I see that the problem lies in the code that checks whether missing entries are represented the same way in both maps. This is only particularly important for Trove primitive maps, but I implemented it in a way that checks every kind of map, using an unlikely integer as a key. Because of type erasure in Java and the fact that the collections predate generics, the methods on maps will generally accept any object as a key even if they are keyed on some specific other class. This worked fine for all the Map implementations I tried, but it does not work for TreeMaps because they use a Comparator to establish the ordering of they keys, so all keys must be mutually comparable. I suppose I should switch to an alternate implementation in which the map wrappers include methods that fetch the representation of a missing value from their wrapped maps. Indeed I started doing that at one point but thought it more robust to use this more general technique, which would be impervious to bugs or omissions in the map wrapper classes. I'll make a ticket for this problem in the R5 library. In the mean time you can just exclude TreeMaps from comparisons in your tests, either by excluding the entire class or excluding TreeMap fields by name. The LuceneIndex was an experimental, unsupported feature, and is only lazy-initialized after the geocoder API endpoint is hit. It is not designed to be serialized and reloaded - in fact much of its data is saved on disk in separate files. You can also exclude that class or field from the graph comparison, though I don't even understand how the Lucene index is getting initialized before you perform the graph comparison. It should still be null. That index is inside Graph.index which is marked transient so will not be automatically restored upon graph load. |
To be completed by pull request submitter:
To be completed by @opentripplanner/plc: