-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create edge flags per edge instead of per way #2457
Conversation
…ficial tags for each edge
I got a few more numbers:
The artificial way tags are custom areas/countries mostly, but also date parsing and estimated distance. We see that adding more encoded values increases the import time by about 9-12% when we calculate everything per way (170s vs. ~155s) and around 12% when we calculate everything per segment (180s vs 159s). Calculating everything per segment results in only around 6% slow down even when all encoded values are used. If less encoded values are used the slow down is even less pronounced. Calculating the artificial way tags per segment is also only around 6% slower than not calculating them at all (this is for the default config with country lookups, but without any other custom areas!). |
Some results when using the planet file (car only, no elevation data)
The difference was 8 minutes or 9%. @karussell (or anyone else) WDYT? I feel like this speed up is not worth the complication. Separating |
GHPoint estimatedCenter = null; | ||
// todonow: we have to remove the tags again, because there can be multiple edges per way, but what if this was | ||
// not an artificial tag? | ||
way.removeTag("estimated_distance"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe we start using a gh:
prefix?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is a bit ugly. The cleanest thing we can do is probably using a fresh copy of the ReaderWay
for each edge? And yes prefixing the tags to distinguish them from standard OSM tags also seems useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we even "abuse" tags for this? Wouldn't it be better to have a clear distinction between OSM tags and our own metadata? e.g way.setMetadata()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point :D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But modifying the way
seems wrong even when we use a prefix or way.setMetadata()
, because the artificial tags are only valid for the current edge and the way object will be used for multiple edges. Should we store the artificial 'tags' in a separate map and pass this to handleWayTags
? So for the EncodingManager
, TagParser
s and FlagEncoder
we would have something like:
public IntsRef handleWayTags(ReaderWay way, Map<String, Object> edgeTags, AcceptWay acceptWay, IntsRef relationFlags)
?! Or should we use the gh:
prefix and copy the way object for each edge?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I consider functions with more than 4 arguments to be code smell, to be honest. But I'm not sure if we should move all of them into a single wrapper class though. We could still make a distinction between "raw" data(ReaderWay) and processed data. Without having touched relation handling in GH: are the relationFlags associated with a single way or is this some shared datastructure?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Relation flags just represent the condensed information belonging to all the relations associated to a way. The only reason we use the IntsRef/flags for this information is saving memory during import. You can think of them as data that was already extracted from the source OSM data. If we used a wrapper class, why would you prefer separating the ReaderWay from the others?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really a strong preference. I was just pointing out that we could do this if we want to clearly separate between raw OSM data and the results of GH parsing/processing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also wonder if we could postpone the signature change (even if it was the right thing to do). We will have to change the signature for #2465 and also the node tag handling might change again. Currently the only artificial/edge tag we use that actually does occur in OSM data (according to taginfo) is country
and we already ignore/overwrite this in current master.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, let's come back to this later.
My assumption here was that if properly mapped this would work for all cases :) but keeping this is a bit dirty. I would say let's make it simpler and optimize speed later if really required and major benefits (>30%) |
Apart from mapping errors yes. But the real problem are custom areas.
I agree. |
// Estimate length of ways containing a route tag e.g. for ferry speed calculation | ||
double firstLat = first.getLat(), firstLon = first.getLon(); | ||
double lastLat = last.getLat(), lastLon = last.getLon(); | ||
double firstLat = pointList.getLat(0), firstLon = pointList.getLon(0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sadly, because we calculate this estimated center we may choose another country (see https://discuss.graphhopper.com/t/country-of-way-is-wrong-on-road-near-border-with-curvature/6908). Wouldn't be a better idea to pick up a middle point (or close to is) as:
double middleLat = pointList.getLat(pointList.size() / 2), middleLon = pointList.getLon(pointList.size() / 2);
This way we make sure the selected point in on the edge/way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I opened #2467 for this
So far we create edge flags (
EncodingManager/FlagEncoder#handleWayTags
) for each OSM way, even though a way might be split into multiple edges. This works, because the flags are the same for each edge the way is split into. However, once we consider elevation (like #2456 ) or node tags (like traffic signals for example) this will no longer work. We already haveEncodingManager/AbstractFlagEncoder#applyWayTags
for this reason, but this is only used for the flag encoders, not the encoded values so far. We also determine the custom areas and countries per way and not per edge. This works as long as ways stop at country borders, but for smaller custom areas this can already break. Imagine a longer way that crosses multiple custom areas but has to be associated with just one. Alsoestimated_distance
is rather imprecise when it is calculated per way, but applied for each edge.GraphHopper does not really use the concept of an OSM way, except during import of course. So the only reason we calculate the flags per way not per edge is performance. Running the import for Germany with this PR (where the edge flags are calculated for each edge) was around 5-10% slower than master. This can of course depend on which encoded values are calculated and how many custom areas there are. However, in case there are very many custom areas I would also assume the higher precision is wanted or even necessary anyway.
We could start separating encoded values that need to be calculated per edge from those that are calculated per way, or add something like
TagParser#applyWayTags
(like here: 92dae9e), but considering the small speed up to me it seems like this might not really be worth it. In case the custom area index lookup was the bottleneck there is still room to speed up the area lookup (use rasterization).I'll try to check the performance of this for a planet-wide import with many encoded values next.