New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Graph building hangs on 'Intersecting unconnected areas' on latest Master #1605

Closed
clinct opened this Issue Nov 10, 2014 · 30 comments

Comments

Projects
None yet
4 participants
@clinct
Collaborator

clinct commented Nov 10, 2014

Been trying to build the Dutch graph with the latest Master today, it hung for more then 40 minutes on "intersecting unconnected areas", while normally a graph build takes around 25 minutes on the same machine.

After reading through the last commits and checking-out different commits, the issue seems to be introduced with commit 9856f2f: "Fix #1562 - handle intersecting unconnected P+R."

I'm using the pbf from: http://download.geofabrik.de/europe/netherlands-latest.osm.pbf

Is this expected to take so long? Should I just keep it running? Or does it point to something going wrong?

@abyrd

This comment has been minimized.

Member

abyrd commented Nov 10, 2014

It is not expected to be slow, this is probably just unexpected inefficiency. Some of these geometric algorithms (including the visibility graph algorithm) can break down with certain inputs. This code was added recently and seems to need some revision. With tomorrow's holiday, @laurentg is out of the office but we can probably improve it later in the week.

@laurentg

This comment has been minimized.

Member

laurentg commented Nov 11, 2014

It's not expected to be so slow, but I'm a bit afraid of the performance for large graph as the performance may not be O(n). To minimize memory usage we only store P+R areas in the spatial index, and then query for all ways the P+R areas intersecting them, thus one spatial index query per edge. Spatial index query may not be O(1) depending on the spatial index implementation.

If it's a problem you can safely disable L335 in OSMDatabase processUnconnectedAreas(); waiting for a proper fix/workaround.

@laurentg

This comment has been minimized.

Member

laurentg commented Nov 15, 2014

See #1617 on performance: my fears are overstated.

@abyrd abyrd closed this in #1617 Nov 17, 2014

abyrd added a commit that referenced this issue Nov 17, 2014

Merge pull request #1617 from laurentg/unconnected-P+R
Fix #1605: shared nodes between areas causing hang.

@abyrd abyrd reopened this Nov 20, 2014

@abyrd

This comment has been minimized.

Member

abyrd commented Nov 20, 2014

This change is included in bugfix tag 0.12.1. @mattwigway reports that a New York graph build has been stuck overnight.

@laurentg

This comment has been minimized.

Member

laurentg commented Nov 20, 2014

Which extract is @mattwigway using, and which branch? I re-tested here with the whole of new-york state (http://download.geofabrik.de/north-america/us/new-york-latest.osm.pbf from a few weeks ago) with the fix on my branch and it builds fine.

@abyrd

This comment has been minimized.

Member

abyrd commented Nov 20, 2014

He is using tag 0.12.1. I assume that the OSM data he's using is similar to that NY extract... but it might have come out of Vanilla Extract. I hope we're not feeding OTP invalid input due to a bug or lacking feature in Vex. @mattwigway can you confirm?

@laurentg

This comment has been minimized.

Member

laurentg commented Nov 20, 2014

Just to make sure I re-tested with the latest version from download.geofabrik.de (from a few minutes ago) and it builds fine. For the record intersecting takes 4.6 seconds on my laptop (total build time is 5.2 minutes). I tested on branch "unconnected-P+R", but everything in it is in master (https://github.com/laurentg/OpenTripPlanner-plannerstack/compare/unconnected-P%2BR)

@abyrd

This comment has been minimized.

Member

abyrd commented Nov 20, 2014

@mattwigway has traced this to a specific way in Hackettstown, New Jersey: http://www.openstreetmap.org/way/305082958

So it should be reproducible with a New Jersey OSM extract.

The road already shares a node with its park and ride lot, but both the road and the lot exist in two perfectly overlaid copies. Therefore each copy of the road intersects its twin's copy of the park and ride lot without sharing a node, and the intersection perfectly coincides with an existing node. Presumably a zero-length split occurs at the end of a line segment, and this continues recursively.

While this is a pretty crazy situation, it seems that the same problem could arise with less unusual inputs. A road would just have to pass precisely over a node in a parking lot, or the road could have a node in exactly the same location as one in the parking lot without sharing it.

Matt is implementing an epsilon-distance check which will reuse an existing node if the split node is too close to the end of a line segment.

@mattwigway

This comment has been minimized.

Member

mattwigway commented Nov 20, 2014

It doesn't need an NJ OSM extract, a tiny slice will do: http://api.openstreetmap.org/api/0.6/map?bbox=-74.80558,40.84353,-74.80136,40.84541

@mattwigway

This comment has been minimized.

Member

mattwigway commented Nov 21, 2014

So, I've just pushed initial work on this to the coincident_intersections branch. It definitely needs an automated test.

The way it works is that if one of the nodes in the way is close enough to the ring, it will just add that node to the ring directly, rather than creating a new node. If that node is already in the ring, it won't add it again (this is because we see every node in a way twice; once at the end of a segment and once at the start of the next). However, suppose a ring crosses a node in a way twice: that node probably should be added twice, but won't be. I don't see a good way to solve this. However, I think it's not a problem following discussion with @abyrd, because we believe the park-and-ride lot is condensed to a single node which is then connected to all of the access points (nodes that the P&R lot shares with other ways). If that is the case, it's better that we don't add duplicate nodes. @laurentg can you confirm that that is that case?

@laurentg

This comment has been minimized.

Member

laurentg commented Nov 21, 2014

Excellent, thanks for the fix.

Otherwise another fix may be to simply remove i-- in L481, and dropping the functionnality of handling segment intersecting with two areas (or two segments of the same area). I think this is a rare case and even if we miss one intersection on the two that either may be ignored or fixed directly in OSM.

@laurentg

This comment has been minimized.

Member

laurentg commented Nov 21, 2014

Sorry, my mistake: the fix above is replacing i-- by i++ in L481, not simply removing it, as you also need to skip the newly created segment, which will be at index i+1 (just tested with your NJ extract, it works fine).

If we are OK to drop the functionality of handling segment intersecting twice, that's maybe a simpler solution. For corner cases where we have one segment intersecting two area segments, there is always the possibility of correcting OSM data (and anyway it will find a path with a single intersection).

Having a backtracking loop is always a bit dangerous, as we can see here. It's difficult to mathematically prove it won't loop.

@laurentg

This comment has been minimized.

Member

laurentg commented Nov 21, 2014

@mattwigway FYI if you want to add automated test you can expand the existing unit-test TestUnconnectedAreas.

@mattwigway

This comment has been minimized.

Member

mattwigway commented Nov 21, 2014

Hmmm, yes that is a much simpler solution. I'm torn; I like the idea of
supporting corner cases whenever possible, but I see that it's making the
code more complicated here. @abyrd how do you feel about this?

On Fri, Nov 21, 2014 at 5:34 AM, Laurent Grégoire notifications@github.com
wrote:

@mattwigway https://github.com/mattwigway FYI if you want to add
automated test you can expand the existing unit-test TestUnconnectedAreas.


Reply to this email directly or view it on GitHub
#1605 (comment)
.

@abyrd

This comment has been minimized.

Member

abyrd commented Nov 21, 2014

@mattwigway noticed that the duplicate NJ lot appears only once in the output. I believe this is due to org.opentripplanner.graph_builder.impl.osm.OpenStreetMapGraphBuilderImpl.Handler#groupAreas and
org.opentripplanner.graph_builder.impl.osm.AreaGroup#groupAreas which combine any areas that share nodes and have the same level. These methods really need Javadoc, as does the AreaGroup class.

We also see in org.opentripplanner.graph_builder.impl.osm.OpenStreetMapGraphBuilderImpl.Handler#buildParkAndRideAreasForGroup:

 // Place the P+R at the center of the envelope
            ParkAndRideVertex parkAndRideVertex = new ParkAndRideVertex(graph, "P+R" + osmId,
                    "P+R_" + osmId, (envelope.getMinX() + envelope.getMaxX()) / 2,
                    (envelope.getMinY() + envelope.getMaxY()) / 2, creativeName);

This reduces the adjacent or overlapping group of areas to a single point, which represents the park and ride. Matt mentioned that this was not obvious to him at first when working on the problem, and I also had to go look it up in the code, so I think this behavior should be documented at least in the Javadoc for the function that contains it.

Finally, we realized that there can be a node in either the road or the lot (or both) which lies perfectly at the intersection, so all cases need to be considered.

I'm also fine with Laurent's solution since it seems to give decent results with less complexity. Matt is going to implement one or the other.

@mattwigway if you do decide to revise your approach, I notice that there is some repeated code:

if (checkIntersectionDistance(p, nA, epsilon)) { }
else if (checkIntersectionDistance(p, nB, epsilon)) { }

Especially if this block expands, perhaps this could be handled with a formulation like:

for (Node n : new Node[] {nA, nB}) {}

@abyrd abyrd assigned mattwigway and unassigned laurentg Nov 21, 2014

@laurentg

This comment has been minimized.

Member

laurentg commented Nov 21, 2014

I personally use for (T t : Arrays.asList(t1, t2...)) { } for such loops. And with Java 8 lambdas there may be even cleaner solutions.

@mattwigway

This comment has been minimized.

Member

mattwigway commented Nov 21, 2014

I think I'm going to go with the simpler solution proposed by @laurentg, because this quickly turns into spaghetti code and I'm not convinced there is not another corner case.

@mattwigway

This comment has been minimized.

Member

mattwigway commented Nov 21, 2014

This does mean we get two P&R's in the output for the Hackettstown example, because things are no longer getting merged (because, since we're not inserting existing nodes, they don't share nodes anymore).

@mattwigway

This comment has been minimized.

Member

mattwigway commented Nov 21, 2014

Oh hmm. That simple solution won't work with the Hackettstown example, because the order things come out of the spatial index is undefined, so sometimes on P&R gets all of the links, sometimes each gets some, etc.

@laurentg

This comment has been minimized.

Member

laurentg commented Nov 21, 2014

That's true, but in that particular case why not fixing the P+R in OSM directly? AFAIU the OSM mapping is wrong here. Trying to fix all incorrect OSM mapping automatically in OTP is complex, and is probably best handled by correcting the source data directly if there are only a few cases. The P+R code already print a warning if the P+R does not have at least one entrance and one exit for both pedestrian and car, so inaccessible P+R are flagged anyway.

@mattwigway

This comment has been minimized.

Member

mattwigway commented Nov 21, 2014

I agree in principle, but regardless of data quality OTP should perform in a deterministic manner (even if that deterministic manner is to fail).

There is a simple way to handle all of this. Instead of reconstructing the way segment as we are looping over it, we just save all of the ring segments that the way crosses as we go through the loop, and then reconstruct once the loop is done. A way segment can cross a ring segment exactly once, since they're both line segments, so we will only ever add one point per ring segment. I'm not sure what will happen if there is an edge in the P&R that is coincident with an edge in the way, but can find out (that edge in the way will intersect the ring segments at the start and the end as well as the coincident one, and the intersection with the coincident one is undefined).

This is true even in spherical geometry, because a line segment between antipodal points is undefined, and any other line segment must be shorter than 180 degrees. Any two great circles cross exactly twice, 180 degrees apart. (We're not using spherical geometry anyhow, but I'm a geographer :).

@laurentg

This comment has been minimized.

Member

laurentg commented Nov 21, 2014

If you loop first and intersect after, you still have to cover the case where a split segment intersect again another segment (either from the same ring or from another ring). Each split remove 2 segments and create 4 new ones. The problem is that there are many scenario: one segment way intersecting many ring segments, multiple segment way intersecting a single ring segment, etc...

I definitively think we should focus on something simple that work 99% of the time and provide a good default on the remaining case, even if someone has to tweak OSM data to get decent results. Handling a single way segment intersecting multiple P+R segments is not very common.

@mattwigway

This comment has been minimized.

Member

mattwigway commented Nov 21, 2014

@laurentg I was thinking that you would just split the way n times for the n segments you have. So if a way segment intersects 5 ring segments, you create 5 virtual nodes, and then advance the loop far enough to skip all of the inserted virtual nodes.

@laurentg

This comment has been minimized.

Member

laurentg commented Nov 21, 2014

@mattwigway That could work indeed. In that case you need to order all the intersections for the current way segment (based on their distance to the base point), eliminate the duplicates, and insert new split segments in order. That's a bit more complicated than the actual code, as you need to store all potential intersections first (with the corresponding ring segment), and handle potential merge.

@mattwigway

This comment has been minimized.

Member

mattwigway commented Nov 21, 2014

@laurentg when you eliminate the duplicates, I'm not sure what you mean; there should not be duplicate ring segments coming out of the spatial index should there?

mattwigway added a commit that referenced this issue Nov 21, 2014

mattwigway added a commit that referenced this issue Nov 21, 2014

mattwigway added a commit that referenced this issue Nov 21, 2014

Revert "simpler fix for #1605, thanks @laurentg for the suggestion."
This fix is too simple and is nondeterministic with certain input.

This reverts commit 82d277f.
@mattwigway

This comment has been minimized.

Member

mattwigway commented Nov 21, 2014

I've implemented this (here) but I'm still having trouble with nondeterminism; sometimes I get two park and rides out and sometimes only one, which is confusing me.

@abyrd

This comment has been minimized.

Member

abyrd commented Nov 21, 2014

@mattwigway keep in mind that this OSM data is completely wrong and very unusual. We want to produce something that works for routing if possible (be tolerant of sketchy input) but it's impossible to make OTP automatically clean all possible OSM or GTFS input.

The most important thing is to detect and log this odd situation where the intersections of two line segments perfectly coincides with an existing vertex in one of the two ways. That way it can be examined and fixed in OSM (it is almost guaranteed to be a mistake when this happens).

I'm not even sure it's important for OTP to do something deterministic with such bad input data. And also consider that any determinism you obtain might be somewhat illusory -- we're doing geometric operations on float coordinates that have been rounded and converted a few times...

@mattwigway

This comment has been minimized.

Member

mattwigway commented Nov 21, 2014

I believe I have actually solved this now, and implemented tests for the corner cases. However, I want to look at it again Monday morning to ensure that everything still makes sense. I went back to the original snapping strategy, and kept the control flow flat (no for statements to avoid duplicated code) as it makes the code easier to follow. The amount of duplicated code remains minimal.

@laurentg

This comment has been minimized.

Member

laurentg commented Nov 22, 2014

@mattwigway When I said "remove duplicates" I meant merging intersections which are close to each other. I was not talking about duplicates coming from the spatial index, there should be none indeed.

But to be honest, I agree with Andrew, in that particular case the OSM mapping is plainly wrong and would take 5 minutes to fix in the source. I'm not sure we should complexify the code for handling such corner cases. The important thing is to detect and print possibly wrong mapping and unsupported cases (for example checking if two P+R are very close to each other). There aren't much: so far we had 1 in the whole of Netherlands (Sloterdijk P+R), 0 in NY state and probably only 1 in NJ (this case).

@mattwigway

This comment has been minimized.

Member

mattwigway commented Nov 24, 2014

I've encapsulated the broken Hackettstown OSM into the repo for an automated test, but I went ahead and fixed the issue in the OSM master database: http://www.openstreetmap.org/changeset/27000499 .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment