Order placenames by population and substitute NE usage on low zoom level... #1461

Merged
merged 2 commits into from
Sep 16, 2015


sommerluk
Collaborator

Order placenames by population and substitute NE usage on low zoom levels.

Resolves #50
Resolves #255
Resolves #482
Resolves #1083
Resolves #1505

It tries to order cities and towns by population. This would improve the rendering with regard to which label is suppressed in case of a collision.

It replaces the Natural Earth populated-places labels on low zoom levels (3–4) with OSM data, selecting places based on population=*. It uses the same font color at zoom levels 3–4 as on the higher zoom levels.

It tries to guarantee that once a label has been rendered at a specific zoom level, it is not suppressed at higher zoom levels by other labels that start rendering only at those higher zoom levels (as far as possible).

It tries to preserve more or less the current placenames density.

@matkoniecz
Contributor

Can you provide before/after comparison?

@imagico
Collaborator

imagico commented Mar 23, 2015

Regarding ordering by population - this might be a good idea but would need some examples.

Regarding replacing the NE data with a high population cutoff - bad idea. I think this has been discussed before, importance rating of places does not work well based on population alone, you would need to take into account the surrounding, a huge city next to even bigger cities should not be rendered (think of the US east coast for example) while a medium size city with nothing comparable around for hundreds of kilometers should (think of Anchorage, Alaska or Murmansk, Russia).

@sommerluk
Collaborator Author

Here is a comparison for z=3.

screen3

(But I don’t have a complete planet in the database. Only place=city here, with the country names missing.)

In general, for z3 and z4, where “Natural Earth” is substituted, this means fewer city labels in Europe and North America and more city labels in Asia and Africa. The threshold can be tweaked…

On the higher zoom levels it mostly changes which label is suppressed in case of a collision.

WHEN (population ~ '^[0-9]+$') THEN CAST(population AS integer)
WHEN (place = 'city') THEN 100000
ELSE 10000
END AS population,
Collaborator

Maybe use INT_population here or something like that, otherwise it's very easy to get confused.

@kocio-pl
Collaborator

On low zoom levels I'd like to see countries/states and capitals, no matter how many people live there. Population is one of the most important hints for rendering when we have conflicting name labels (see #337).

@sommerluk
Collaborator Author

@imagico

a huge city next to even bigger cities should not be rendered (think of the US east coast for example) while a medium size city with nothing comparable around for hundreds of kilometers should

Maybe we can get closer to the desired behavior by raising text-min-distance and at the same time render places with lower populations earlier. So, we get more collisions, and these collisions can be solved e.g. based on the population=* tag.

However, I think that Natural Earth does a quite bad job regarding your criteria. It displays a lot of labels in a very small continent called “Europe”, and it displays fewer labels in a much bigger continent called “Africa”. So, I would say we don’t lose anything.

@imagico

In #337 you have proposed to use a score value:

For example you could use the population as score and multiply it with a factor of for example three in case it is a national capital. This would mean a larger city is only rendered with preference if its population is more than three times that of the capital.

Currently, we treat not only national capitals (“capital=yes”), but also regional capitals of a given level (“capital=4”) in a special way. So, maybe we could do this:

national capital → population × 3
regional capital → population × 2
all others → population × 1

What do you think?

Currently, we also have various layers, in this order:

↗ Natural Earth for z3-z4
↗ a layer for the capitals
↗ a layer for the rest of the cities and towns (with different font sizes for cities and towns at most zoom levels)

If we adopted the score-based solution, could we even use only one layer (with different font sizes based on the score)?
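A minimal sketch of the weighting proposed above, written in Python rather than in the style's SQL. The function name `place_score` is hypothetical, and the multipliers are only the ones suggested in this comment, not tuned values:

```python
from typing import Optional


def place_score(population: int, capital: Optional[str]) -> int:
    """Weight the population by capital status (hypothetical helper)."""
    if capital == "yes":   # national capital
        return population * 3
    if capital == "4":     # regional capital of the given admin level
        return population * 2
    return population      # all other places


# A national capital with 400 000 inhabitants outranks a city with 1 000 000,
# because 400 000 * 3 > 1 000 000 * 1.
print(place_score(400_000, "yes") > place_score(1_000_000, None))
```

With these factors, a non-capital city is preferred over a national capital only if its population is more than three times larger.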

@pnorman
Collaborator

pnorman commented Mar 24, 2015

However, I think that Natural Earth does a quite bad job regarding your criteria. It displays a lot of labels in a very small continent called “Europe”, and it displays fewer labels in a much bigger continent called “Africa”. So, I would say we don’t lose anything.

That is a result of the cutoffs we have set for rendering. It would be fairly trivial to display more cities, and in a sensible rendering order.

@imagico
Collaborator

imagico commented Mar 24, 2015

However, I think that Natural Earth does a quite bad job regarding your criteria.

Indeed - this is one of the reasons it should be replaced. My favorite example is:

Miami: population 415000, shown
Havana: population 2.1 million, capital of Cuba, not shown

But i would be against using this as an excuse to take the easy way and choose an alternative approach nearly as bad while better methods exist.

If you can use Mapnik's collision detection to do the necessary proximity processing that would be fine with me - i am not sure if this is workable though. Because you probably want to show some labels even if they are close together (like New York and Washington DC) but do not want the same density of labels globally. On the other hand at z=3/4 rendering performance is not an issue so you could reasonably use even an expensive SQL query to do the necessary processing.

Currently, we treat not only national capitals (“capital=yes”), but also regional capitals of a given level (“capital=4”) in a special way. So, maybe we could do this:

national capital → population × 3
regional capital → population × 2
all others → population × 1

Whatever formula you use - it will require some tuning. This is best done with fast interactive feedback, i.e. not with the rendered map.

The more important question you would need to consider is how you treat populated places with no population tag. You could use the minimum number of a settlement of this class of course (i.e. city -> 100k)

And yes, if you render the low zoom levels based on a score calculated directly from attributes you could use the same layer for all zoom levels. It might be somewhat inefficient w.r.t. rendering performance though.

@sommerluk
Collaborator Author

You could use the minimum number of a settlement of this class of course (i.e. city -> 100k)

Yes. The current code of this PR uses 100 000 as default for place=city and 10 000 as default for place=town. The default is used when either there is no value or if the value is not valid (contains non-digit characters).

[I suppose I should catch special cases of wrong tagging like population=123456789123456789123456789 in the regular expression. The current regular expression considers them as valid, but they are too big for the integer datatype, and this leads to an SQL error.]
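The fallback logic and the overflow guard described above can be sketched as follows. This is an illustration in Python, not the PR's SQL; the helper name `effective_population` and the 9-digit limit are assumptions (any value of up to 9 digits fits a signed 32-bit integer, whose maximum is 2147483647):

```python
import re
from typing import Optional

# Assumption: accept at most 9 digits so the value always fits a 32-bit integer.
_VALID_POPULATION = re.compile(r"[0-9]{1,9}")

# Defaults stated in this comment: 100 000 for place=city, 10 000 otherwise.
_CITY_DEFAULT = 100_000
_OTHER_DEFAULT = 10_000


def effective_population(place: str, population: Optional[str]) -> int:
    """Return the tagged population, or a per-class default if it is unusable."""
    if population is not None and _VALID_POPULATION.fullmatch(population):
        return int(population)
    return _CITY_DEFAULT if place == "city" else _OTHER_DEFAULT


print(effective_population("city", "250000"))                       # valid tag
print(effective_population("town", "ca. 5000"))                     # non-digit characters
print(effective_population("city", "123456789123456789123456789"))  # too long
```

Limiting the digit count in the pattern rejects the oversized value before any cast happens, so it falls back to the default instead of raising an error.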

@sommerluk
Collaborator Author

This is best done with fast interactive feedback, i.e. not with the rendered map.

Do you have something specific in mind?

@pnorman
Collaborator

pnorman commented Mar 24, 2015

Indeed - this is one of the reasons it should be replaced. My favorite example is:

Miami: population 415000, shown
Havana: population 2.1 million, capital of Cuba, not shown

What class are the two cities in NE?

@imagico
Collaborator

imagico commented Mar 24, 2015

@sommerluk - you can do that in QGIS for example.

@pnorman - SCALERANK is 1 for Miami and 2 for Havana.

@matthijsmelissen
Collaborator

INSTALL.md would also need to be updated.

@pnorman
Collaborator

pnorman commented Mar 26, 2015

Can you do a before/after out of the same database?

@dieterdreist

On 24.03.2015 at 20:44, imagico notifications@github.com wrote:

You could use the minimum number of a settlement of this class of course (i.e. city -> 100k)

You would need regional minimum numbers though; in China, a place with 100k is likely not considered a city.

@sommerluk
Collaborator Author

@math1985

INSTALL.md would also need to be updated.

And also get-shapefiles.sh

@sommerluk
Collaborator Author

@imagico

Maybe we can get closer to the desired behavior by raising text-min-distance and at the same time render places with lower populations earlier. So, we get more collisions, and these collisions can be solved e.g. based on the population=* tag.

If you can use Mapnik's collision detection to do the necessary proximity processing that would be fine with me - i am not sure if this is workable though. Because you probably want to show some labels even if they are close together (like New York and Washington DC) but do not want the same density of labels globally.

I’ve tried it. I had raised text-min-distance. But it doesn’t work well. Imagine the following situation:

Big city

Medium city
Small city

At low zoom levels, “Big city” will hide “Medium city” because it’s too close to “Big city”. However, “Small city” is displayed. At higher zoom levels, the distance between “Big city” and “Medium city” will be enough to display both, but the side effect is that “Small city” can’t be rendered anymore because it’s too close to the now-visible “Medium city”. Result: “Small city” is e.g. visible at z=4, not visible at z5–6, and visible again starting with z=7. I suppose that there will always be some of these inconsistent cases. But when I tested it, there were many more of them when text-min-distance was raised.
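The effect can be reproduced with a toy model. The sketch below is an assumption-laden simplification: positions are one-dimensional, the coordinates and the minimum distance are made up, and the greedy placement only approximates collision handling (it is not Mapnik's actual algorithm). In this toy setup the gap lasts for a single zoom level rather than two:

```python
from typing import List, Tuple

# (name, position at the base zoom, population); all values are invented.
PLACES: List[Tuple[str, float, int]] = [
    ("Big city", 0.0, 1_000_000),
    ("Medium city", 3.0, 300_000),
    ("Small city", 5.0, 50_000),
]


def visible_labels(zoom: int, min_distance: float, base_zoom: int = 4) -> List[str]:
    """Greedily place labels, largest population first."""
    scale = 2 ** (zoom - base_zoom)  # on-screen distances double per zoom level
    placed: List[Tuple[str, float]] = []
    for name, position, _population in sorted(PLACES, key=lambda p: -p[2]):
        x = position * scale
        # Keep the label only if it is far enough from every placed label.
        if all(abs(x - other) >= min_distance for _, other in placed):
            placed.append((name, x))
    return [name for name, _ in placed]


for z in (4, 5, 6):
    print(z, visible_labels(z, min_distance=4.5))
# 4 ['Big city', 'Small city']
# 5 ['Big city', 'Medium city']
# 6 ['Big city', 'Medium city', 'Small city']
```

“Small city” appears, disappears and reappears as the zoom increases, which is the inconsistency described above.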

@sommerluk
Collaborator Author

I’ve updated the code. It now uses a score, as proposed by @imagico.

@pnorman

Here are some screenshots of the updated code (left: current behaviour / right: behaviour of this PR). They are made out of a limited database with only the city nodes.

zoom3

zoom4africa

zoom4asia

zoom4europe

zoom5africa

zoom5europe

zoom6africa

zoom6europe

@sommerluk
Collaborator Author

Maybe the value should be tweaked, so that we have more labels at z6 (and maybe also z5).

Should there be a third font size for placenames-medium (current PR code: only 2 font sizes: high-score and low-score)?

@matthijsmelissen
Collaborator

Current results look good to me, as far as I'm concerned this can be merged.

@imagico
Collaborator

imagico commented Apr 27, 2015

Just wanted to mention that this change will likely increase the tendency to tag for the renderer with place classification and population numbers. This is already an issue right now (see for example the number of obviously bogus points in http://taginfo.openstreetmap.org/tags/place=city#map).

A possible way to slightly counteract this would be to create an upper limit for population of place=town so pushing a small place significantly would require manipulating both population and place classification.

@dieterdreist

2015-04-27 12:33 GMT+02:00 imagico notifications@github.com:

A possible way to slightly counteract this would be to create an upper
limit for population of place=town so pushing a small place significantly
would require manipulating both population and place classification.

This should not be done (or would have to be implemented in a more complex way), because population and place importance / classes are not the same worldwide; e.g. China or India tend to have more population in less important settlements compared to Europe.

@sommerluk
Collaborator Author

We could change the ORDER BY statement so that the first criterion is place=city/place=town and the second criterion is the score that we have calculated. So place=city would always have a higher priority than place=town. Within each group, the elements would be ordered by score. Advantage: this approach would avoid a hard-coded threshold value and should be comprehensible for mappers (they get what they expect). (Minor) inconsistency: a city of 110 000 people without capital=* would have a higher priority than a town of 90 000 people with capital=*…

@imagico
Collaborator

imagico commented May 1, 2015

This would avoid prioritizing a town with high population number over a city with lower one but the town would still have a high score and would be shown early if there are no competing cities.

And i disagree with @dieterdreist - place classification is not an importance rating, the wiki bases it primarily on the function of the place for the people around it and this is very similar globally for places with a certain number of inhabitants. The population ranges of places with the various tags in different regions seem more a matter of local tagging habits than of actual differences in reality. So i think limiting the population of place=town to something like 200k in score calculation does seem appropriate.
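As a rough illustration of that cap (a sketch only: the helper name `population_for_score` is hypothetical, and 200 000 is simply the “something like 200k” figure from this comment):

```python
# Assumed cap, taken from the "something like 200k" suggestion.
TOWN_POPULATION_CAP = 200_000


def population_for_score(place: str, population: int) -> int:
    """Clamp the population of place=town before it enters the score."""
    if place == "town":
        return min(population, TOWN_POPULATION_CAP)
    return population


# An inflated population on a town no longer helps beyond the cap;
# pushing it further would also require retagging it as place=city.
print(population_for_score("town", 5_000_000))
print(population_for_score("city", 5_000_000))
```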

@pnorman
Collaborator

pnorman commented May 2, 2015

We should be rendering city before town, even if the city has a lower population. Capital might change that, but that's a cartographic decision which I've seen done both ways

@sommerluk
Collaborator Author

This PR now renders cities always earlier than towns.

Exception: Towns that are capitals of a country may be treated like cities if their score is high enough.

Within each group, the elements are ordered by score.
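The resulting order can be sketched like this. It is an illustration in Python, not the PR's ORDER BY clause; the `Place` type, the sample places and the threshold `CAPITAL_TOWN_MIN_SCORE` are all hypothetical:

```python
from typing import List, NamedTuple, Optional

# Hypothetical threshold above which a national-capital town is grouped with cities.
CAPITAL_TOWN_MIN_SCORE = 100_000


class Place(NamedTuple):
    name: str
    place: str              # "city" or "town"
    capital: Optional[str]  # value of capital=*, if any
    score: int


def group(p: Place) -> int:
    """0 = rendered first (cities and qualifying capital towns), 1 = the rest."""
    if p.place == "city":
        return 0
    if p.capital == "yes" and p.score >= CAPITAL_TOWN_MIN_SCORE:
        return 0
    return 1


def render_order(places: List[Place]) -> List[Place]:
    # First criterion: the group; second criterion: descending score.
    return sorted(places, key=lambda p: (group(p), -p.score))


sample = [
    Place("Town B", "town", None, 500_000),
    Place("City A", "city", None, 110_000),
    Place("Capital town C", "town", "yes", 150_000),
]
print([p.name for p in render_order(sample)])
# ['Capital town C', 'City A', 'Town B']
```

Even though “Town B” has the highest score, it is ordered after every place in the first group.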

@pnorman
Collaborator

pnorman commented May 14, 2015

I did a proper comparison for Europe, with full data loaded so there are the normal label collisions.

I was mainly interested in how it stacked up against Natural Earth. I'll do some more for higher zooms where it's already using OSM data

z3 (two screenshots): 4 vs 11 cities

z4 (two screenshots): 3 vs 6 cities

@matkoniecz
Contributor

What is causing the difference in status of United Kingdom label on z3 and label for Poland on z4? It seems that this PR should affect only display of cities (and features colliding with city labels).

@sommerluk force-pushed the lowzoomcities10 branch 2 times, most recently from ed12e1e to 22b3ede (September 11, 2015 21:37)
@sommerluk
Collaborator Author

Sorry for the late response; my hard disk broke and I could not do any further testing.

About the issue with New York/Brookshaven: I have no idea why this happens. Maybe a data issue? I’ve tried to reproduce it with my cities-and-town-only database, and also with new-york-latest.osm.pbf (from last week) from Geofabrik. With both, I could not reproduce this issue.

I’ve now tuned z6 to be more similar to the current rendering at z6.

@sommerluk
Collaborator Author

Solves #1083

@matkoniecz
Contributor

About the issue with New York/Brookshaven: I have no idea why this happens. Maybe a data issue? I’ve tried to reproduce it with my cities-and-town-only database, and also with new-york-latest.osm.pbf (from last week) from Geofabrik. With both, I could not reproduce this issue.

Maybe it is a data issue that is now fixed.

@HolgerJeromin
Contributor

@sommerluk The "fixes #xy" references must be in the commit message or in your first comment/description. Please add my issues, too :-)

@Teester

Teester commented Sep 12, 2015

I've applied this patch to my local dataset (Ireland). It is an area where few towns have populations above 10000, so with this patch as it stands, many towns with no population data are superseding towns with population data, by virtue of being assigned a population of 10000 by the patch. So, major towns with population data are not appearing, in favour of smaller towns which have no population data.

Changing the assigned population to 1000 seems to resolve the issue, for Ireland at least. I have not noticed any problems with the cities displayed in my dataset. Is there information available on the average town population in OSM, or on the proportion of towns with a population tag?

@matkoniecz
Contributor

@Teester Thanks for testing!

Setting it low and encouraging mappers to tag the population for at least the major towns is probably better than setting it high and expecting that every single small place=town will be tagged with population data.

@kocio-pl
Collaborator

+1

@sommerluk
Collaborator Author

@HolgerJeromin Thanks for the hint!

@sommerluk
Collaborator Author

The fallback score for towns without population key is now 1000. Thanks for testing!

@matkoniecz
Contributor

100 000 for cities is also too high - there are plenty of place=city nodes with a lower population - see http://overpass-turbo.eu/s/bqR

@imagico
Collaborator

imagico commented Sep 14, 2015

Selecting a low fallback population number will probably not be much encouragement for tagging correct populations but will likely significantly push tagging for the renderer with the place classes (if a town does not show up it is much more straightforward to tag it place=city than to find out the population number).

The most encouraging approach for the mapper for correct tagging overall is quite surely interpreting the tags exactly as they are supposed to be used.

The most important problem of the approach here is that it does not display smaller places even if they are far away from other places. Like:

http://www.openstreetmap.org/node/244081999
http://www.openstreetmap.org/node/31219396
http://www.openstreetmap.org/node/29341070

The first two are correctly tagged as city, the latter is tagged for the renderer. They could all well show up at z5 although they do not. In the first case tagging the correct population will probably help, in the other two probably not.

Apart from that i still think major cities should have priority over subnational admin entities.

@sommerluk
Collaborator Author

100 000 for cities is also too high

I’m not so sure about this. For towns, http://wiki.openstreetmap.org/wiki/Key:place says: “… often with a population of 10,000 people and good range of local facilities… In areas of low population, towns may have significantly lower populations.” This is a good reason to use a fallback score below 10 000.

However, for cities, the very same wiki page says: “Should normally have a population of at least 100,000 people and be larger than nearby towns.” I know that the place value is not just population-dependent, but following the wiki a place=city with a population lower than 100 000 should be an exception.

@pnorman
Collaborator

pnorman commented Sep 15, 2015

I’m not so sure about this. For towns, http://wiki.openstreetmap.org/wiki/Key:place says: “… often with a population of 10,000 people and good range of local facilities… In areas of low population, towns may have significantly lower populations.” This is a good reason to use a fallback score below 10 000.

However, for cities, the very same wiki page says: “Should normally have a population of at least 100,000 people and be larger than nearby towns.” I know that the place value is not just population-dependent, but following the wiki a place=city with a population lower than 100 000 should be an exception.

The 10%, median, and 90% population values for place=city are 30k, 131k, and 800k. For town they are 2.4k, 12k, and 45k.

@matthijsmelissen matthijsmelissen merged commit 596611a into gravitystorm:master Sep 16, 2015
@matthijsmelissen
Collaborator

Thanks a lot for the PR @sommerluk! And sorry for the slow review process.

I think this is a significant improvement in terms of code, so I'm happy to merge it. We can always fine-tune later.

By the way, it seems you are sorting by category both in the stylesheet (with attachments) and in the SQL statement. Wouldn't sorting in only one place be sufficient?

@sommerluk
Collaborator Author

Thanks @math1985 for merging.

you are sorting by category both in the stylesheet (with attachments) and in the SQL

Good catch! PR at #1843 …

@yopaseopor

Sorry, but #1083 still occurs at zoom level 9: http://c.tile.openstreetmap.org/9/258/191.png (the first city I mentioned does not appear). The other levels are fixed.

@pnorman
Collaborator

pnorman commented Sep 20, 2015

This commit is not in the version used on tile.openstreetmap.org as it's not in a released version yet.
