Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create edge flags per edge instead of per way #2457

Merged
merged 16 commits into from
Nov 18, 2021

Conversation

easbar
Copy link
Member

@easbar easbar commented Nov 3, 2021

So far we create edge flags (EncodingManager/FlagEncoder#handleWayTags) for each OSM way, even though a way might be split into multiple edges. This works, because the flags are the same for each edge the way is split into. However, once we consider elevation (like #2456 ) or node tags (like traffic signals for example) this will no longer work. We already have EncodingManager/AbstractFlagEncoder#applyWayTags for this reason, but this is only used for the flag encoders, not the encoded values so far. We also determine the custom areas and countries per way and not per edge. This works as long as ways stop at country borders, but for smaller custom areas this can already break. Imagine a longer way that crosses multiple custom areas but has to be associated with just one. Also estimated_distance is rather imprecise when it is calculated per way, but applied for each edge.

GraphHopper does not really use the concept of an OSM way, except during import of course. So the only reason we calculate the flags per way not per edge is performance. Running the import for Germany with this PR (where the edge flags are calculated for each edge) was around 5-10% slower than master. This can of course depend on which encoded values are calculated and how many custom areas there are. However, in case there are very many custom areas I would also assume the higher precision is wanted or even necessary anyway.

We could start separating encoded values that need to be calculated per edge from those that are calculated per way, or add something like TagParser#applyWayTags (like here: 92dae9e), but considering the small speed up to me it seems like this might not really be worth it. In case the custom area index lookup was the bottleneck there is still room to speed up the area lookup (use rasterization).

I'll try to check the performance of this for a planet-wide import with many encoded values next.

@easbar
Copy link
Member Author

easbar commented Nov 6, 2021

I got a few more numbers:

map encoded_values edge flags artificial tags import time
Germany default per way per way 151-157s
Germany all per way per way 170s
Germany default per segment per segment 159s
Germany all per segment per segment 180s
Germany default per segment disabled 150s
Germany all per segment disabled 170s

encoded_values=all means I disabled all encoded values listed in config-example.yml (except mtb_rating because it was broken (#2458)). encoded_values=default means the graph.encoded_values setting was empty (road_class,road_class_link,road_environment,max_speed,roundabout,road_access).

The artificial way tags are custom areas/countries mostly, but also date parsing and estimated distance.

We see that adding more encoded values increases the import time by about 9-12% when we calculate everything per way (170s vs. ~155s) and around 12% when we calculate everything per segment (180s vs 159s). Calculating everything per segment results in only around 6% slow down even when all encoded values are used. If less encoded values are used the slow down is even less pronounced.

Calculating the artificial way tags per segment is also only around 6% slower than not calculating them at all (this is for the default config with country lookups, but without any other custom areas!).

@easbar
Copy link
Member Author

easbar commented Nov 8, 2021

Some results when using the planet file (car only, no elevation data)

map encoded_values edge flags artificial tags import time
planet default per way per way 5243s (1h27min)
planet default per segment per segment 5724s (1h35min)

The difference was 8 minutes or 9%. @karussell (or anyone else) WDYT? I feel like this speed up is not worth the complication. Separating handleWayTags (=edge flags per way) and applyWayTags (=edge flags per segment) might be ok, but assigning custom areas based on the center of the OSM way is ugly (even though it probably works for countries in most cases). Another possibility might be splitting ways at custom area borders, but that seems more complicated to me and maybe it does not even perform any better.

GHPoint estimatedCenter = null;
// todonow: we have to remove the tags again, because there can be multiple edges per way, but what if this was
// not an artificial tag?
way.removeTag("estimated_distance");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we start using a gh: prefix?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is a bit ugly. The cleanest thing we can do is probably using a fresh copy of the ReaderWay for each edge? And yes prefixing the tags to distinguish them from standard OSM tags also seems useful.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we even "abuse" tags for this? Wouldn't it be better to have a clear distinction between OSM tags and our own metadata? e.g way.setMetadata()

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point :D

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But modifying the way seems wrong even when we use a prefix or way.setMetadata(), because the artificial tags are only valid for the current edge and the way object will be used for multiple edges. Should we store the artificial 'tags' in a separate map and pass this to handleWayTags? So for the EncodingManager, TagParsers and FlagEncoder we would have something like:

public IntsRef handleWayTags(ReaderWay way, Map<String, Object> edgeTags, AcceptWay acceptWay, IntsRef relationFlags)

?! Or should we use the gh: prefix and copy the way object for each edge?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I consider functions with more than 4 arguments to be code smell, to be honest. But I'm not sure if we should move all of them into a single wrapper class though. We could still make a distinction between "raw" data(ReaderWay) and processed data. Without having touched relation handling in GH: are the relationFlags associated with a single way or is this some shared datastructure?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Relation flags just represent the condensed information belonging to all the relations associated to a way. The only reason we use the IntsRef/flags for this information is saving memory during import. You can think of them as data that was already extracted from the source OSM data. If we used a wrapper class, why would you prefer separating the ReaderWay from the others?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really a strong preference. I was just pointing out that we could do this if we want to clearly separate between raw OSM data and the results of GH parsing/processing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also wonder if we could postpone the signature change (even if it was the right thing to do). We will have to change the signature for #2465 and also the node tag handling might change again. Currently the only artificial/edge tag we use that actually does occur in OSM data (according to taginfo) is country and we already ignore/overwrite this in current master.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, let's come back to this later.

@karussell
Copy link
Member

karussell commented Nov 9, 2021

(even though it probably works for countries in most cases).

My assumption here was that if properly mapped this would work for all cases :) but keeping this is a bit dirty.

I would say let's make it simpler and optimize speed later if really required and major benefits (>30%)

@easbar
Copy link
Member Author

easbar commented Nov 10, 2021

My assumption here was that if properly mapped this would work for all cases :)

Apart from mapping errors yes. But the real problem are custom areas.

I would say let's make it simpler and optimize speed later if really required and major benefits (>30%)

I agree.

// Estimate length of ways containing a route tag e.g. for ferry speed calculation
double firstLat = first.getLat(), firstLon = first.getLon();
double lastLat = last.getLat(), lastLon = last.getLon();
double firstLat = pointList.getLat(0), firstLon = pointList.getLon(0);
Copy link
Contributor

@cosmin-petrescu cosmin-petrescu Nov 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sadly, because we calculate this estimated center we may choose another country (see https://discuss.graphhopper.com/t/country-of-way-is-wrong-on-road-near-border-with-curvature/6908). Wouldn't be a better idea to pick up a middle point (or close to is) as:

double middleLat = pointList.getLat(pointList.size() / 2), middleLon = pointList.getLon(pointList.size() / 2);

This way we make sure the selected point in on the edge/way.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I opened #2467 for this

@easbar easbar marked this pull request as ready for review November 17, 2021 11:39
@easbar easbar merged commit 69b412d into master Nov 18, 2021
@easbar easbar deleted the artificial_tags_per_segment branch November 18, 2021 18:17
easbar added a commit that referenced this pull request Nov 18, 2021
@easbar easbar added this to the 5.0 milestone Nov 18, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants