New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Graph building hangs on 'Intersecting unconnected areas' on latest Master #1605
Comments
It is not expected to be slow, this is probably just unexpected inefficiency. Some of these geometric algorithms (including the visibility graph algorithm) can break down with certain inputs. This code was added recently and seems to need some revision. With tomorrow's holiday, @laurentg is out of the office but we can probably improve it later in the week. |
It's not expected to be so slow, but I'm a bit afraid of the performance for large graph as the performance may not be O(n). To minimize memory usage we only store P+R areas in the spatial index, and then query for all ways the P+R areas intersecting them, thus one spatial index query per edge. Spatial index query may not be O(1) depending on the spatial index implementation. If it's a problem you can safely disable L335 in OSMDatabase |
See #1617 on performance: my fears are overstated. |
Fix #1605: shared nodes between areas causing hang.
This change is included in bugfix tag 0.12.1. @mattwigway reports that a New York graph build has been stuck overnight. |
Which extract is @mattwigway using, and which branch? I re-tested here with the whole of new-york state (http://download.geofabrik.de/north-america/us/new-york-latest.osm.pbf from a few weeks ago) with the fix on my branch and it builds fine. |
He is using tag 0.12.1. I assume that the OSM data he's using is similar to that NY extract... but it might have come out of Vanilla Extract. I hope we're not feeding OTP invalid input due to a bug or lacking feature in Vex. @mattwigway can you confirm? |
Just to make sure I re-tested with the latest version from download.geofabrik.de (from a few minutes ago) and it builds fine. For the record intersecting takes 4.6 seconds on my laptop (total build time is 5.2 minutes). I tested on branch "unconnected-P+R", but everything in it is in master (https://github.com/laurentg/OpenTripPlanner-plannerstack/compare/unconnected-P%2BR) |
@mattwigway has traced this to a specific way in Hackettstown, New Jersey: http://www.openstreetmap.org/way/305082958 So it should be reproducible with a New Jersey OSM extract. The road already shares a node with its park and ride lot, but both the road and the lot exist in two perfectly overlaid copies. Therefore each copy of the road intersects its twin's copy of the park and ride lot without sharing a node, and the intersection perfectly coincides with an existing node. Presumably a zero-length split occurs at the end of a line segment, and this continues recursively. While this is a pretty crazy situation, it seems that the same problem could arise with less unusual inputs. A road would just have to pass precisely over a node in a parking lot, or the road could have a node in exactly the same location as one in the parking lot without sharing it. Matt is implementing an epsilon-distance check which will reuse an existing node if the split node is too close to the end of a line segment. |
It doesn't need an NJ OSM extract, a tiny slice will do: http://api.openstreetmap.org/api/0.6/map?bbox=-74.80558,40.84353,-74.80136,40.84541 |
So, I've just pushed initial work on this to the coincident_intersections branch. It definitely needs an automated test. The way it works is that if one of the nodes in the way is close enough to the ring, it will just add that node to the ring directly, rather than creating a new node. If that node is already in the ring, it won't add it again (this is because we see every node in a way twice; once at the end of a segment and once at the start of the next). However, suppose a ring crosses a node in a way twice: that node probably should be added twice, but won't be. I don't see a good way to solve this. However, I think it's not a problem following discussion with @abyrd, because we believe the park-and-ride lot is condensed to a single node which is then connected to all of the access points (nodes that the P&R lot shares with other ways). If that is the case, it's better that we don't add duplicate nodes. @laurentg can you confirm that that is that case? |
Excellent, thanks for the fix. Otherwise another fix may be to simply remove |
Sorry, my mistake: the fix above is replacing If we are OK to drop the functionality of handling segment intersecting twice, that's maybe a simpler solution. For corner cases where we have one segment intersecting two area segments, there is always the possibility of correcting OSM data (and anyway it will find a path with a single intersection). Having a backtracking loop is always a bit dangerous, as we can see here. It's difficult to mathematically prove it won't loop. |
@mattwigway FYI if you want to add automated test you can expand the existing unit-test |
Hmmm, yes that is a much simpler solution. I'm torn; I like the idea of On Fri, Nov 21, 2014 at 5:34 AM, Laurent Grégoire notifications@github.com
|
@mattwigway noticed that the duplicate NJ lot appears only once in the output. I believe this is due to org.opentripplanner.graph_builder.impl.osm.OpenStreetMapGraphBuilderImpl.Handler#groupAreas and We also see in org.opentripplanner.graph_builder.impl.osm.OpenStreetMapGraphBuilderImpl.Handler#buildParkAndRideAreasForGroup: // Place the P+R at the center of the envelope
ParkAndRideVertex parkAndRideVertex = new ParkAndRideVertex(graph, "P+R" + osmId,
"P+R_" + osmId, (envelope.getMinX() + envelope.getMaxX()) / 2,
(envelope.getMinY() + envelope.getMaxY()) / 2, creativeName); This reduces the adjacent or overlapping group of areas to a single point, which represents the park and ride. Matt mentioned that this was not obvious to him at first when working on the problem, and I also had to go look it up in the code, so I think this behavior should be documented at least in the Javadoc for the function that contains it. Finally, we realized that there can be a node in either the road or the lot (or both) which lies perfectly at the intersection, so all cases need to be considered. I'm also fine with Laurent's solution since it seems to give decent results with less complexity. Matt is going to implement one or the other. @mattwigway if you do decide to revise your approach, I notice that there is some repeated code: if (checkIntersectionDistance(p, nA, epsilon)) { }
else if (checkIntersectionDistance(p, nB, epsilon)) { } Especially if this block expands, perhaps this could be handled with a formulation like: for (Node n : new Node[] {nA, nB}) {} |
I personally use |
I think I'm going to go with the simpler solution proposed by @laurentg, because this quickly turns into spaghetti code and I'm not convinced there is not another corner case. |
This does mean we get two P&R's in the output for the Hackettstown example, because things are no longer getting merged (because, since we're not inserting existing nodes, they don't share nodes anymore). |
Oh hmm. That simple solution won't work with the Hackettstown example, because the order things come out of the spatial index is undefined, so sometimes on P&R gets all of the links, sometimes each gets some, etc. |
That's true, but in that particular case why not fixing the P+R in OSM directly? AFAIU the OSM mapping is wrong here. Trying to fix all incorrect OSM mapping automatically in OTP is complex, and is probably best handled by correcting the source data directly if there are only a few cases. The P+R code already print a warning if the P+R does not have at least one entrance and one exit for both pedestrian and car, so inaccessible P+R are flagged anyway. |
I agree in principle, but regardless of data quality OTP should perform in a deterministic manner (even if that deterministic manner is to fail). There is a simple way to handle all of this. Instead of reconstructing the way segment as we are looping over it, we just save all of the ring segments that the way crosses as we go through the loop, and then reconstruct once the loop is done. A way segment can cross a ring segment exactly once, since they're both line segments, so we will only ever add one point per ring segment. I'm not sure what will happen if there is an edge in the P&R that is coincident with an edge in the way, but can find out (that edge in the way will intersect the ring segments at the start and the end as well as the coincident one, and the intersection with the coincident one is undefined). This is true even in spherical geometry, because a line segment between antipodal points is undefined, and any other line segment must be shorter than 180 degrees. Any two great circles cross exactly twice, 180 degrees apart. (We're not using spherical geometry anyhow, but I'm a geographer :). |
If you loop first and intersect after, you still have to cover the case where a split segment intersect again another segment (either from the same ring or from another ring). Each split remove 2 segments and create 4 new ones. The problem is that there are many scenario: one segment way intersecting many ring segments, multiple segment way intersecting a single ring segment, etc... I definitively think we should focus on something simple that work 99% of the time and provide a good default on the remaining case, even if someone has to tweak OSM data to get decent results. Handling a single way segment intersecting multiple P+R segments is not very common. |
@laurentg I was thinking that you would just split the way n times for the n segments you have. So if a way segment intersects 5 ring segments, you create 5 virtual nodes, and then advance the loop far enough to skip all of the inserted virtual nodes. |
@mattwigway That could work indeed. In that case you need to order all the intersections for the current way segment (based on their distance to the base point), eliminate the duplicates, and insert new split segments in order. That's a bit more complicated than the actual code, as you need to store all potential intersections first (with the corresponding ring segment), and handle potential merge. |
@laurentg when you eliminate the duplicates, I'm not sure what you mean; there should not be duplicate ring segments coming out of the spatial index should there? |
This fix is too simple and is nondeterministic with certain input. This reverts commit 82d277f.
I've implemented this (here) but I'm still having trouble with nondeterminism; sometimes I get two park and rides out and sometimes only one, which is confusing me. |
@mattwigway keep in mind that this OSM data is completely wrong and very unusual. We want to produce something that works for routing if possible (be tolerant of sketchy input) but it's impossible to make OTP automatically clean all possible OSM or GTFS input. The most important thing is to detect and log this odd situation where the intersections of two line segments perfectly coincides with an existing vertex in one of the two ways. That way it can be examined and fixed in OSM (it is almost guaranteed to be a mistake when this happens). I'm not even sure it's important for OTP to do something deterministic with such bad input data. And also consider that any determinism you obtain might be somewhat illusory -- we're doing geometric operations on float coordinates that have been rounded and converted a few times... |
I believe I have actually solved this now, and implemented tests for the corner cases. However, I want to look at it again Monday morning to ensure that everything still makes sense. I went back to the original snapping strategy, and kept the control flow flat (no for statements to avoid duplicated code) as it makes the code easier to follow. The amount of duplicated code remains minimal. |
@mattwigway When I said "remove duplicates" I meant merging intersections which are close to each other. I was not talking about duplicates coming from the spatial index, there should be none indeed. But to be honest, I agree with Andrew, in that particular case the OSM mapping is plainly wrong and would take 5 minutes to fix in the source. I'm not sure we should complexify the code for handling such corner cases. The important thing is to detect and print possibly wrong mapping and unsupported cases (for example checking if two P+R are very close to each other). There aren't much: so far we had 1 in the whole of Netherlands (Sloterdijk P+R), 0 in NY state and probably only 1 in NJ (this case). |
I've encapsulated the broken Hackettstown OSM into the repo for an automated test, but I went ahead and fixed the issue in the OSM master database: http://www.openstreetmap.org/changeset/27000499 . |
Been trying to build the Dutch graph with the latest Master today, it hung for more then 40 minutes on "intersecting unconnected areas", while normally a graph build takes around 25 minutes on the same machine.
After reading through the last commits and checking-out different commits, the issue seems to be introduced with commit 9856f2f: "Fix #1562 - handle intersecting unconnected P+R."
I'm using the pbf from: http://download.geofabrik.de/europe/netherlands-latest.osm.pbf
Is this expected to take so long? Should I just keep it running? Or does it point to something going wrong?
The text was updated successfully, but these errors were encountered: