
Generalized woodland polygons #4110

Closed
mboeringa opened this issue Apr 4, 2020 · 9 comments

Comments

@mboeringa

mboeringa commented Apr 4, 2020

Hi all,

As most of you probably know from my sporadic contributions in the threads here, I have been working on a personal project creating my own ArcGIS-based OpenStreetMap renderer.

As part of that work, I recently developed a novel geoprocessing workflow for generating generalized woodland polygons from OpenStreetMap data for small-scale display at low to mid zoom scales (roughly Z0-Z11). With the very fast pace at which woodland / forest polygons currently seem to be added (it appears almost 1.5M new woodland polygons have been added in recent months, out of a current total of almost 12M records), this is becoming ever more useful.

Contrary to most attempts to generalize or generate woodland polygons for small-scale display, which are usually raster or imagery based, this process flow is 100% vector based and uses OpenStreetMap data directly, starting from osm2pgsql-imported woodland polygons. The process is a hybrid solution that involves both PostGIS functions and ArcGIS-based geoprocessing tools, and it generates seven progressively more generalized PostGIS layers that I can subsequently convert to shapefile if needed (I only use the PostGIS layers in my own work). Note that this is a complex workflow; it isn't the typical crude "minimum area" filter, far from it! It is a multi-stage generalizing, filtering and geoprocessing workflow that also involves a buffering & dissolve step to stitch small polygons together into larger structures and prevent them from being lost. The buffering is geodesic, meaning results are consistent across latitudes. For optimal cartography, the data is smoothed in a final step as well.
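
For readers who want to experiment with something similar: in PostGIS, one common way to approximate latitude-consistent (geodesic) buffering is to buffer on the geography type, which takes its distance in metres rather than degrees. This is only a minimal sketch under that assumption, using osm2pgsql's default table and column names; it is not the actual ArcGIS-based workflow described above.

```sql
-- Minimal illustration of latitude-consistent buffering in PostGIS.
-- Assumes osm2pgsql's default schema (planet_osm_polygon, column "way");
-- this is NOT the author's actual workflow, just a sketch of the idea.
SELECT osm_id,
       -- casting to geography makes the 250 m buffer distance metric,
       -- so the buffer width does not shrink or grow with latitude
       ST_Buffer(way::geography, 250)::geometry AS way_buffered
FROM   planet_osm_polygon
WHERE  landuse = 'forest'
   OR  "natural" = 'wood';
```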

I think the results of this process flow are quite unique and I have been thinking of sharing this part of my work on my ArcGIS Renderer for the benefit of the OpenStreetMap community.

The data could be used for rendering woodland polygons all the way down to Z0. Yes, I am not joking, Z0! See the attached screenshot of a 1:296M "Web Mercator scale" rendering of OpenStreetMap woodland polygons. At that scale, there are currently only 1,091 highly generalized polygons that need to be rendered, so more than 11.9M complex woodland polygons were reduced to just over a thousand polygons for global-scale rendering by this process.

PROs:

  • Ability to show woodland all the way down to Z0 due to sharply reduced polygon numbers.
  • Could potentially lead to a significant boost in rendering speed at low to mid zoom scales, as far fewer, and far less complex, polygons need to be rendered.
  • Likely necessary for vector tiles with client-side rendering.
  • Less random "drop out" of polygons, as often witnessed in vector tiles due to crude "minimum area" filtering.

CONs:

  • Any generalization is a compromise, and no generalization is "perfect". Pixel peeping and asking "why didn't it include that forest patch in my back yard?" makes little sense for, e.g., a generalized layer designed for 1:1M scale. One will have to accept that the rendering result looks significantly different from today's at the scales where these layers are used, as the generalized data does not give pixel-level landcover rendering.
  • Since inner holes below a threshold minimum area are dropped, the data is not really suitable to be rendered in combination with other landuses. This is especially so for the less accurate / more generalized mid to low zoom "Woodland - Small Scale 5-7" layers. I personally use these layers only in combination with water features, glaciers and things like deserts: natural features that are largely mutually exclusive with woodland, so no typical agricultural landuse.
  • As an extension of the above, it is best to always render water features on top of these layers, as you generally want to keep larger lakes visible even if the generalized woodland polygon happened to drop the lake interior; the process flow does not do magic to prevent this (it is already highly complex, and adding such tests would be a major undertaking). That said, the generalization settings are such that you would probably not notice it much, because only inner holes below a reasonable, scale-based threshold are dropped. In addition, it is really rare for a truly large lake to be encompassed entirely by a single generalized woodland polygon, so this issue is actually a very minor one.
  • I am not able to share the process flow as an open source script. It is fully and tightly integrated with a custom ArcGIS ModelBuilder/Python toolbox that I developed for the rendering in ArcGIS, and as such not suitable for sharing. Besides that, the toolbox it is part of has many functions totally unrelated to this work. So I will only be sharing the end result. I don't know if that is a no-go for some here, but that is the actual situation.

TEST DATA SETS:
For now, for anyone who is interested in trying out these layers, I have made a set of the "Woodland - Small Scale 3-7" data sets available as zipped shapefiles on a Dropbox account, see the links. I have not included the 1 & 2 layers, as these are too big as shapefiles (> 1 GB) and less useful, since they were designed for the higher zoom scale ranges (+/- 1:50k to 1:500k, Z12-13), where rendering the full data isn't really problematic. I am quite sure I will therefore also not make these highest-zoom generalized layers available, as there isn't much sense to it (I do use them myself for my personal renderer).

Note that the currently available shapefiles are in WGS 1984, based on the osm2pgsql import settings I used. I could make them available in Web Mercator if preferred; that is only a minor change in the processing workflow.

The processing involves a final subdividing / dicing step as well, limiting geometries to a maximum of 5,000 vertices.
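
For anyone wanting to reproduce this kind of dicing on their own data: PostGIS's ST_Subdivide splits a geometry until no part exceeds a given vertex count, and ST_Transform handles the optional reprojection to Web Mercator mentioned above. A minimal sketch with hypothetical table and column names, not the actual workflow:

```sql
-- Hypothetical sketch of the dicing step using PostGIS.
-- Table and column names are invented for illustration.
CREATE TABLE woodland_small_scale_5_diced AS
SELECT ST_Subdivide(geom, 5000) AS geom   -- split until <= 5000 vertices per part
FROM   woodland_small_scale_5;

-- Optional: reproject the diced polygons from WGS 84 to Web Mercator.
-- SELECT ST_Transform(geom, 3857) FROM woodland_small_scale_5_diced;
```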

REVIEWING THE DATA SETS:
Please be fair and only review the data sets at the scales for which they were designed; see the listing further below. Yes, I know, if you zoom in to 1:50k on a dataset designed for 1:2M to 1:5M (the "Woodland - Small Scale 5" dataset), you will see apparently strange results in places. This is unavoidable and a side effect of generalizing for small scale based on highly detailed OSM data. If a good cartographer were to manually draw woodland polygons for a 1:5M map and you compared them to 1:50k data, obviously strange and incomprehensible results or artefacts would be visible as well. I see the results at best as a "representation" of reality, not reality itself. Note that any artefacts will be significantly reduced for the higher zoom 3 & 4 layers, which more closely match "reality", whatever that is... I may be able to make some minor improvements to the algorithm in the future as well, but can't guarantee that.

I also recommend setting a 0.4 outline on the symbology of the polygons. Due to the specific process flow, and also the need for an intermediate subdividing step with a 100k vertex limit to drastically reduce subsequent processing times (otherwise processing would extend into weeks!), tiny gaps may occasionally be visible. Setting a small outline in the same color as the polygon fill will mostly hide such issues. Additionally, be aware that, again due to the process flow, the data may occasionally show tiny overlaps between polygons; the data is not 100% topologically correct in that respect. This happens only very rarely, though.

Zip file sizes:

woodland_small_scale_3: 588 MB
woodland_small_scale_4: 138 MB
woodland_small_scale_5: 30 MB
woodland_small_scale_6: 9 MB
woodland_small_scale_7: <1 MB

NOTE:
The Dropbox links to downloadable data of the generalized woodland that were present here have been removed due to a lack of interest in the data.

FUTURE:
Note that this data is only made available for preliminary testing. Do not rely on it for production! I cannot guarantee that these data sets will stay available; what I will do depends on the response here. I think it would be best if the data was ultimately made available through the FOSSGIS-run "https://osmdata.openstreetmap.de/" data server managed by Jochen Topf and Imagico, which also serves pre-generalized data for oceans, land and the Antarctic ice sheet, but I haven't contacted them about this yet.

The processing for this, even when limited to "Woodland - Small Scale 3-7", is actually quite heavy. Although I have managed to make most of the process flow multi-core enabled, it still requires a few days to generate these data sets on my currently limited 4-core / 8-thread test server (how I wish I had one of those new AMD 64-core processors ;-)). For any kind of future availability, this means a bi-weekly or monthly update would be most realistic.

STATS:
Number of polygons:

All woodland polygons: 11,928,938

(Woodland - Small Scale 1: 2,883,276): See remarks above, likely not becoming available.

(Woodland - Small Scale 2: 1,151,113): See remarks above, likely not becoming available.

Woodland - Small Scale 3: 468,685

Woodland - Small Scale 4: 126,411

Woodland - Small Scale 5: 25,818

Woodland - Small Scale 6: 6,786

Woodland - Small Scale 7: 1,091

Appropriate zoom scales:

("Woodland - Small Scale 1": 1:50k-1:100k / Z13):: See remarks above.

("Woodland - Small Scale 2": 1:100k-1:250k / Z12): See remarks above.

"Woodland - Small Scale 3": 1:250k-1:500k / Z11

"Woodland - Small Scale 4": 1:500k-1:2M / Z9-10

"Woodland - Small Scale 5": 1:2M-1:5M / Z7-8

"Woodland - Small Scale 6": 1:5M-1:25M / Z5-6

"Woodland - Small Scale 7": 1:25M-1:500M / Z0-4

SAMPLE screenshots:
[six sample screenshots omitted]

Marco Boeringa
The Netherlands

@imagico
Collaborator

imagico commented Apr 4, 2020

Some reasons why this is not going to be a suitable suggestion for this style:

  • Non-open-source processes are incompatible with the OSMF FOSS policy and would therefore make a style relying on them unsuitable for deployment on OSMF computing infrastructure. Suggesting such a style for featuring on osm.org would be possible, but non-open-source processes are a counterargument there as well.
  • Heavy and visible geometry processing is incompatible with the goal of this style to be intuitively understandable for mappers as far as the relationship between data and the appearance on the map is concerned.
  • For the same reason we try to limit the volume of data used that is not minutely (or at least daily) updated from the OSM database.

@jeisenbe
Collaborator

jeisenbe commented Apr 4, 2020

I think it would be best if the data was ultimately made available through the FOSSGIS-run "https://osmdata.openstreetmap.de/" data server managed by Jochen Topf and Imagico, which also serves pre-generalized data for oceans, land and the Antarctic ice sheet

I agree that's worth offering, especially for woodland_small_scale_5/6/7 - I can see how that might be useful for other map styles at low zoom levels, where you won't usually update the tiles more than once a month.

@pnorman
Collaborator

pnorman commented Apr 4, 2020

I'd be okay with going for generalized forest layers, but we're not capable of running ArcGIS scripts.

@mboeringa
Author

@pnorman

Due to the restrictions I already outlined, my idea was to only make the end product of the generalization available as ready-to-use shapefiles, so there is no need for anyone to run ArcGIS scripts. Just like today's land, ocean and Antarctic ice sheet polygons have been made available as ready-to-use data on the FOSSGIS server, and people do not have to run those complex and time-consuming processes themselves.

By the way, one reason I have been pondering making this data available is also the current display performance issues experienced on www.openstreetmap.org.

I know this is largely a caching server issue, as e.g. the JOSM editor currently opens with a call for caching server capacity, but I have been wondering whether mid to low zoom rendering would be helped by such generalized data as well, and whether this could have an impact on the overall rendering ecosystem's performance. Is the actual rendering currently a concern with the highly detailed original woodland polygons? Would there be any benefit to using generalized data? @pnorman, can you shed some light on this?

@imagico, I share and understand your concerns related to FOSSGIS and open source, and I realize the current proposal does not fit the bill. The main reason I suggested the OpenStreetMap Data website as a possible distribution channel is that the data's purposes are largely the same, and that it would provide a one-stop solution for those seeking out such generalized data.

@mboeringa
Author

mboeringa commented Apr 7, 2020

Since there appears to be no more desire to discuss this further, I am closing this issue for now.

@mboeringa
Author

I have decided to remove the generalized data from my Dropbox account due to the lack of interest.

If anyone would still like to review the data, leave a message here.

@tomass

tomass commented Apr 12, 2020

It would be interesting to read about the steps you used for some specific scale. Something like:

  1. Group/cluster polygons closer than 100 meters.
  2. Buffer +50m, -100m, +50m. Join type=X (as you used Arc, you've probably used Polygon amalgamation functions here?)
  3. Remove polygons smaller than 1km2.
  4. SimplifyDP/VW with tolerance 50.

Have you tried the snap-to-grid method? What about topological dissolving?

But maybe this is not the right place to discuss cartography? Maybe the forum could be the place for it?

@mboeringa
Author

mboeringa commented May 27, 2020

It would be interesting to read about the steps you used for some specific scale. Something like:

1. Group/cluster polygons closer than 100 meters.

2. Buffer +50m, -100m, +50m. Join type=X (as you used Arc, you've probably used Polygon amalgamation functions here?)

3. Remove polygons smaller than 1km2.

4. SimplifyDP/VW with tolerance 50.

Have you tried the snap-to-grid method? What about topological dissolving?

But maybe this is not the right place to discuss cartography? Maybe the forum could be the place for it?

@tomass

Sorry for the scandalously late response to your question. I was in the middle of further development relating to my renderer, which I really wanted to finish first.

As you undoubtedly realize from your own experiences with generalization and the advanced algorithms you developed and posted about on the OpenStreetMap Diaries, tackling the woodland problem is not easy. It requires a complex multi-stage filtering and generalization process.

Roughly, the process goes like this:

  • Select anything tagged as woodland; I use landuse=forest and natural=wood, as most people do.
  • Filter out any really tiny features. You do not want to amalgamate every tiny patch of "landuse=forest" tagged in someone's backyard in a city environment during the buffering stage. The area filtering value to use for this step is actually pretty critical. For example, here in the Netherlands, many large forests have been cut up into very small patches (only one to a few hectares). Set this first-stage filter too large, and you will end up with large gaps in the forest cover in a country like the Netherlands. An additional problem in the Netherlands is that these small patches are often unconnected; they are separated by a small gap representing a path / track. This is why the buffer & dissolve stage is so critical as well: just filtering out anything below X m2 would again lead to large losses of forest cover in a country like the Netherlands if no buffering stage is employed (see the rough PostGIS sketch after this list).

[screenshot]

  • First stage generalization: a very mild generalization using a very small tolerance to weed out just the most densely digitized features.

  • Selection of "long / ragged / thin features" and putting them "on hold". These features will not participate in the buffering & dissolve stage. What do I mean by "long / ragged / thin features"? If you look at the areas below, one from Russia, the other from Cameroon, you will notice some really thin forest patches. In Russia, these are artificial strips of forest between agricultural land; in Cameroon, natural vegetation in valleys in mildly undulating land.
    Buffering and dissolving these features causes issues, as the final generalization step also selectively removes inner holes. If you were to include these features in the buffering and dissolving stage, you could end up with a vast over-representation of forest cover once inner hole elimination kicks in. How to smartly remove these "long / ragged / thin features" is actually quite a cartographic challenge. I use a combination of several calculated geometric properties to distinguish them. Just one example of such a property is what I call the "convex hull ratio", that is, the ratio between the surface area of the polygon and that of its convex hull. If this is a low value (e.g. 0.15), then you likely have such a "long / ragged / thin" feature. However, as said, this is not the only property I use. Finding the right metrics and tuning them for each zoom scale has been quite a challenge and critical for the results.

https://www.openstreetmap.org/#map=13/44.6415/40.9261
[screenshot]

https://www.openstreetmap.org/#map=12/6.0122/11.0811
[screenshot]

  • Buffering stage: a buffer step to amalgamate small patches of forest into larger ones, to prevent losing vast areas of forest once minimum area filtering kicks in. Contrary to what you assumed, I do not buffer outwards and inwards in multiple stages; I only buffer outwards. The final smoothing step already has the property of collapsing polygons inwards slightly, and IMO removes the need to buffer inwards again. Note that the buffering, to reduce vertex complexity and subsequent processing times, uses a very "coarse" setting for buffer accuracy. This stage is done in ArcGIS, and ArcGIS's buffer tool has a setting for this; PostGIS has such a setting as well.

  • Dissolving of the buffers to create single large forest patches. Note that this will generate humongous polygons. I actually ran into processing difficulties because of this in terms of processing times for subsequent generalization steps (e.g. one polygon turned out to have 1.7M vertices, and just this single record took > 10 hours to process). I introduced an intermediate subdividing step to combat this, but it must be done with care and well balanced, as it causes issues with the final smoothing step. Also, much of the processing flow is multi-threaded using Python multi-threading options, so it can take advantage of a proper multi-core database server to reduce the total processing times.

  • Merge back in the "long / ragged / thin features" that were put "on hold" in one of the previous stages.

  • Second-stage generalization, minimum area filtering & inner hole elimination. This will also eliminate again most of the "long / ragged / thin" features that were added back in in the previous step, unless they are above the set limits for minimum area filtering and thus big enough to be a significant feature that must be maintained and will be clearly visible in the resulting maps.

  • Final smoothing step. Smoothing actually re-introduces vertices and slightly "undoes" the generalization, but I found it critical for pleasing, high-quality cartographic results. I use a single smoothing iteration, which suffices and minimizes the added vertex complexity.

  • Final subdividing step to ensure no polygon has more than 5,000 vertices.
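
To make a few of these stages a bit more concrete, below is a rough PostGIS-flavoured sketch of the initial area filter, the "convex hull ratio" metric, the buffer & dissolve step and small inner hole removal (referenced from the first bullet above). All table names, column names and thresholds are invented for illustration; the real workflow is a hybrid ArcGIS/PostGIS process and considerably more involved.

```sql
-- Illustrative sketch only: names and thresholds are hypothetical, and this
-- is not the author's actual ArcGIS/PostGIS workflow.

-- 1. Drop really tiny patches before buffering (area measured in m²
--    by casting to geography, so the threshold is latitude-independent).
CREATE TABLE wood_filtered AS
SELECT osm_id, way
FROM   planet_osm_polygon
WHERE  (landuse = 'forest' OR "natural" = 'wood')
  AND  ST_Area(way::geography) > 10000;            -- e.g. keep patches > 1 ha

-- 2. Put "long / ragged / thin" features on hold, here using only the
--    polygon-area-to-convex-hull-area ratio as an example metric.
CREATE TABLE wood_thin AS
SELECT osm_id, way
FROM   wood_filtered
WHERE  ST_Area(way) / NULLIF(ST_Area(ST_ConvexHull(way)), 0) < 0.15;

-- 3. Buffer the remaining polygons outwards and dissolve them, so nearby
--    patches are amalgamated into larger structures; ST_Dump splits the
--    dissolved multipolygon back into single polygons.
CREATE TABLE wood_dissolved AS
SELECT (ST_Dump(ST_Union(ST_Buffer(way::geography, 100)::geometry))).geom AS way
FROM   wood_filtered
WHERE  osm_id NOT IN (SELECT osm_id FROM wood_thin);

-- 4. Remove small inner holes: rebuild each polygon from its exterior ring,
--    keeping only the interior rings (holes) above a minimum area.
CREATE TABLE wood_no_small_holes AS
SELECT ST_MakePolygon(
         ST_ExteriorRing(way),
         ARRAY(SELECT ST_ExteriorRing((r).geom)
               FROM   ST_DumpRings(way) AS r
               WHERE  (r).path[1] > 0                         -- interior rings only
                 AND  ST_Area((r).geom::geography) > 500000)  -- keep holes > 0.5 km²
       ) AS way
FROM   wood_dissolved;
```

In the workflow described above, the "on hold" thin features would be merged back in before the second-stage filtering, and the generalization, smoothing and final subdividing steps would follow.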

Note that all of this is just a rough outline. Fine-tuning all the parameters, also for the different zoom scales (as shown, I use seven different generalized datasets for different scales), was quite a challenge. Finding the "sweet spot" is not that easy, and the margin for tuning these parameters, at least if you are critical of the cartographic end result, is actually quite small.

Another really challenging thing was to actually create valid output. Most GISs are very strict about the validity of geometries, and just running ST_MakeValid was not enough. I have tackled it now, but this was tough.
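
For anyone fighting similar validity problems: a common PostGIS pattern is to combine ST_MakeValid with ST_CollectionExtract to keep only the polygonal parts of the repair output, with a zero-width buffer as a fallback for stubborn geometries. This is a generic sketch of that pattern with hypothetical table and column names, not the author's actual fix.

```sql
-- Generic validity-repair pattern, not the author's actual solution.
-- ST_CollectionExtract(..., 3) keeps only the polygon parts that
-- ST_MakeValid may return inside a GEOMETRYCOLLECTION.
UPDATE woodland_small_scale_5
SET    geom = ST_CollectionExtract(ST_MakeValid(geom), 3)
WHERE  NOT ST_IsValid(geom);

-- Fallback: a zero-width buffer often cleans up remaining
-- self-intersections, at the cost of slightly altering the shape.
UPDATE woodland_small_scale_5
SET    geom = ST_Buffer(geom, 0)
WHERE  NOT ST_IsValid(geom);
```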

@tomass

tomass commented May 27, 2020

@mboeringa Thank you! Very interesting, and a lot of food for thought. It is very difficult to find such detailed and, most importantly, practical steps for generalisation.
