
Postgresql provider - Use ST_RemoveRepeatedPoints instead of ST_SnapToGrid #2410

Merged (1 commit) on Dec 3, 2015


mdouchin
Contributor

When using a PostGIS 2.2 instance, we can now use the ST_RemoveRepeatedPoints function instead of the ST_SnapToGrid function, as described by Paul Ramsey: http://blog.cartodb.com/smaller-faster/
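A minimal sketch of the version gate this implies, assuming the server's PostGIS library version is available as a string such as "2.2.0" (for example from `postgis_lib_version()`). The helper names are hypothetical and this is not the QGIS provider's actual code:

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: parse "MAJOR.MINOR[.PATCH...]" into two integers.
// Returns false if the string does not start with two dot-separated numbers.
bool parsePostgisVersion(const std::string& version, int& major, int& minor) {
    return std::sscanf(version.c_str(), "%d.%d", &major, &minor) == 2;
}

// Hypothetical gate: per the discussion, the ST_RemoveRepeatedPoints-based
// simplification is only used against PostGIS 2.2 or later.
bool canUseRemoveRepeatedPoints(const std::string& version) {
    int major = 0;
    int minor = 0;
    if (!parsePostgisVersion(version, major, minor)) {
        return false;  // unknown version: keep using ST_SnapToGrid
    }
    return major > 2 || (major == 2 && minor >= 2);
}
```

For example, `canUseRemoveRepeatedPoints("2.1.8")` is false, `canUseRemoveRepeatedPoints("2.2.0")` is true, and an unparsable string falls back to ST_SnapToGrid.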

@ahuarte47
Contributor

+1, great!

But can the ST_RemoveRepeatedPoints method break the topology between adjacent polygons?
It may be necessary to use this function only for lines.

@m-kuhn
Member

m-kuhn commented Nov 2, 2015

Does the topology also matter when using sub-pixel optimization?

@mdouchin
Contributor Author

mdouchin commented Nov 2, 2015

@m-kuhn I do not think so. I think simplified data is only used for rendering. Whenever you need to modify (or snap to) a feature, I recall previous discussions about using the unsimplified geometry for data interactions. Perhaps someone can confirm this.

This leads to another question. I think ST_RemoveRepeatedPoints is equivalent to ST_SnapToGrid when used for sub-pixel (or pixel) optimization, but I also think it will render better with a higher tolerance (such as 5 pixels), so it would be great to have.

@m-kuhn
Member

m-kuhn commented Nov 2, 2015

Yes, it's only used for rendering (in core, plugins may use it differently).

I think it may introduce some offsets between polygons that are supposed to be adjacent. But this only applies if:

  • The tolerance is above 1 pixel
  • The polygons are actually adjacent

In other circumstances, ST_Simplify combined with ST_RemoveRepeatedPoints would give great performance with no trade-off. Maybe topological simplification should be an option (with a hint in the GUI that with a tolerance <= 1 pixel there is no real reason to use it)?
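To make the adjacency concern concrete, here is a conceptual sketch (not PostGIS's implementation) of both operations applied to an edge shared by two polygons. When both rings have the same orientation, they traverse the shared edge in opposite directions. Snapping to a grid depends only on each vertex's own coordinates, so both copies of the edge end up identical; a greedy "drop vertices within tolerance of the last kept one" pass depends on traversal order, so the two copies can keep different vertices and leave slivers or gaps. The single-pass algorithms and the sample edge are assumptions made for illustration:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Pt { double x, y; };
bool operator==(const Pt& a, const Pt& b) { return a.x == b.x && a.y == b.y; }

// Conceptual ST_SnapToGrid: round every coordinate to the nearest multiple of
// `size`, then drop consecutive duplicates created by the snapping.
std::vector<Pt> snapToGrid(const std::vector<Pt>& line, double size) {
    assert(size > 0.0);
    std::vector<Pt> out;
    for (const Pt& p : line) {
        const Pt s{std::round(p.x / size) * size, std::round(p.y / size) * size};
        if (out.empty() || !(out.back() == s)) out.push_back(s);
    }
    return out;
}

// Conceptual ST_RemoveRepeatedPoints with a tolerance: keep a vertex only if it
// is farther than `tol` from the last kept vertex; always keep the final vertex.
std::vector<Pt> removeRepeatedPoints(const std::vector<Pt>& line, double tol) {
    std::vector<Pt> out;
    for (std::size_t i = 0; i < line.size(); ++i) {
        const Pt& p = line[i];
        const bool isLast = (i + 1 == line.size());
        if (out.empty() || isLast ||
            std::hypot(p.x - out.back().x, p.y - out.back().y) > tol) {
            out.push_back(p);
        }
    }
    return out;
}

// The neighbouring polygon walks the shared edge in the opposite direction.
std::vector<Pt> reversedCopy(const std::vector<Pt>& v) {
    return std::vector<Pt>(v.rbegin(), v.rend());
}
```

With the zig-zag edge (0,0), (0.6,0.3), (1.2,0), (1.8,0.3), (2.4,0), (3,0) and a tolerance of 1, snapping gives (0,0), (1,0), (2,0), (3,0) in both directions, while the greedy pass keeps (1.2,0) and (2.4,0) going forward but (1.8,0.3) and (0.6,0.3) going backward.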

@ahuarte47
Copy link
Contributor

Hi, there is an old pull request with comments about this.

#1131

In short, it noted that the PostGIS ST_Simplify function could cause empty pixels. I guess this bug has been fixed since.

@palmerj detected the issue:

8b53001#commitcomment-5176067

@mdouchin mdouchin force-pushed the postgis_2_2_server_simplification branch from 1fe72af to 21a2a49 November 12, 2015 09:39
@mdouchin
Contributor Author

I have just force-pushed an update to my branch with two goals:

  • make the code more readable: I think plain if / else is sometimes better than cascading ternary operators ("? :")
  • fix the issue mentioned by @jef-n.

I also added a message to the QGIS log showing which PostGIS method is used; I will remove it once enough checking has been done.
I need people to test it with QGIS and PostGIS 2.2 and report back.

@m-kuhn
Member

m-kuhn commented Nov 17, 2015

Any objections to merging this?

@mdouchin
Contributor Author

It needs some testing first. I have not yet tested it with PostGIS 2.2 (my laptop runs Ubuntu and PostGIS 2.2 cannot be installed easily; I need to use Docker to get a test database).
We also need to check how it affects performance in various cases.

@m-kuhn
Member

m-kuhn commented Nov 17, 2015

What did you test?

@mdouchin
Contributor Author

@jef-n I will fix this.

This pull request is here for discussion and testing by devs. I will be happy to make any needed modifications, and to close it if our tests show no real performance gain.
I have just set up a local PostGIS 2.2 instance via Vagrant, so I will report speed tests and any issues encountered here.

@m-kuhn
Member

m-kuhn commented Nov 24, 2015

@mdouchin any feedback on the speed difference?

@mdouchin
Contributor Author

@m-kuhn I have just tested it, and the results are good (e.g. 10 s for ST_SnapToGrid and 6 s for ST_RemoveRepeatedPoints).
I will provide sample data and more information (total number of nodes, query duration, etc.)

@mdouchin mdouchin force-pushed the postgis_2_2_server_simplification branch from 21a2a49 to 976d7db November 27, 2015 13:19
@mdouchin
Contributor Author

Hi all!
I have just force-pushed an update to my PR.
Here are some quick test results from QGIS master with this PR and PostGIS 2.2.

They look really promising, so I would ask for some volunteers to test this PR too, and report here.
Thanks in advance

Test data

A polygon layer from Corine Land Cover data (39726 adjacent polygons).
I got this data from the French Ministry's WFS server:

wget "http://clc.developpement-durable.gouv.fr/geoserver/wfs?SERVICE=WFS&VERSION=1.0.0&REQUEST=GetFeature&TYPENAME=clc:CLC12&SRSNAME=EPSG:2154&BBOX=595429.63676435151137412,6135281.90846782270818949,975338.32696532714180648,6429916.60834944248199463" -O /tmp/clc.gml

Then I imported this data via DB Manager into a PostGIS 2.2 database and used this layer as the test layer.

Intro

When working with adjacent polygons, if the tolerance is above 1px, the ST_RemoveRepeatedPoints method creates some tiny (but visible) holes, so I just use ST_SnapToGrid when the tolerance is > 1px.
I could also use ST_RemoveRepeatedPoints for any tolerance if the geometry type is not Polygon, but I am not sure yet how to achieve this. If you think it is useful, please tell me.
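The rule above can be sketched as a small selection function. This is a hedged illustration with hypothetical names, written with plain if / else; the option to keep ST_RemoveRepeatedPoints for non-polygon layers at any tolerance is the idea floated above, not something this PR implements:

```cpp
#include <cassert>

enum class GeometryKind { Point, Line, Polygon };
enum class SimplifyMethod { SnapToGrid, RemoveRepeatedPoints };

// Hypothetical selection rule:
// - tolerance <= 1 pixel: ST_RemoveRepeatedPoints;
// - tolerance > 1 pixel: ST_SnapToGrid, because ST_RemoveRepeatedPoints opens
//   small holes between adjacent polygons;
// - optional (not in the PR): keep ST_RemoveRepeatedPoints at any tolerance
//   for layers that are not polygons.
SimplifyMethod chooseMethod(double tolerancePixels, GeometryKind kind,
                            bool removeRepeatedForNonPolygons = false) {
    assert(tolerancePixels > 0.0);
    if (tolerancePixels <= 1.0) {
        return SimplifyMethod::RemoveRepeatedPoints;
    }
    if (removeRepeatedForNonPolygons && kind != GeometryKind::Polygon) {
        return SimplifyMethod::RemoveRepeatedPoints;
    }
    return SimplifyMethod::SnapToGrid;
}
```

For example, a 5 px tolerance on a polygon layer selects ST_SnapToGrid, while a 1 px tolerance selects ST_RemoveRepeatedPoints whatever the geometry type.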

The following results were obtained with a 1px tolerance and server-side simplification.

Results

No simplification

  • Query:
SELECT st_asbinary("geom",'NDR'),"id" 
FROM "public"."clc" 
WHERE "geom" && st_makeenvelope(393043.59441092703491449,6114766.10379389766603708,1091598.51655095303431153,6589884.32520610373467207,2154)

  • EXPLAIN ANALYZE:
    • Planning time: 0.144 ms
    • Execution time: 55.537 ms
  • Number of vertices: 3 763 284
  • Example of PostgreSQL log (first 10 lines):
2015-11-26 13:35:58 GMT [1088-39] postgres@qgis LOG:  duration: 29.335 ms  statement: FETCH FORWARD 2000 FROM qgis_2
2015-11-26 13:35:59 GMT [1088-40] postgres@qgis LOG:  duration: 28.397 ms  statement: FETCH FORWARD 2000 FROM qgis_2
2015-11-26 13:35:59 GMT [1088-41] postgres@qgis LOG:  duration: 23.732 ms  statement: FETCH FORWARD 2000 FROM qgis_2
2015-11-26 13:35:59 GMT [1088-42] postgres@qgis LOG:  duration: 31.173 ms  statement: FETCH FORWARD 2000 FROM qgis_2
2015-11-26 13:35:59 GMT [1088-43] postgres@qgis LOG:  duration: 32.296 ms  statement: FETCH FORWARD 2000 FROM qgis_2
2015-11-26 13:35:59 GMT [1088-44] postgres@qgis LOG:  duration: 23.584 ms  statement: FETCH FORWARD 2000 FROM qgis_2
2015-11-26 13:35:59 GMT [1088-45] postgres@qgis LOG:  duration: 39.870 ms  statement: FETCH FORWARD 2000 FROM qgis_2
2015-11-26 13:35:59 GMT [1088-46] postgres@qgis LOG:  duration: 38.210 ms  statement: FETCH FORWARD 2000 FROM qgis_2
2015-11-26 13:35:59 GMT [1088-47] postgres@qgis LOG:  duration: 40.120 ms  statement: FETCH FORWARD 2000 FROM qgis_2
2015-11-26 13:35:59 GMT [1088-48] postgres@qgis LOG:  duration: 42.670 ms  statement: FETCH FORWARD 2000 FROM qgis_2

ST_SnapToGrid

  • Query:
SELECT st_asbinary(st_snaptogrid("geom",403.497),'NDR'),"id" 
FROM "public"."clc" 
WHERE "geom" && st_makeenvelope(393043.59441092703491449,6114766.10379389766603708,1091598.51655095303431153,6589884.32520610373467207,2154)

  • EXPLAIN ANALYZE:
    • Planning time: 0.191 ms
    • Execution time: 198.299 ms
  • Number of vertices: 1 230 492
  • Example of PostgreSQL log:
2015-11-26 13:38:42 GMT [1099-25] postgres@qgis LOG:  duration: 23.430 ms  statement: FETCH FORWARD 2000 FROM qgis_2
2015-11-26 13:38:42 GMT [1099-26] postgres@qgis LOG:  duration: 21.235 ms  statement: FETCH FORWARD 2000 FROM qgis_2
2015-11-26 13:38:42 GMT [1099-27] postgres@qgis LOG:  duration: 24.964 ms  statement: FETCH FORWARD 2000 FROM qgis_2
2015-11-26 13:38:42 GMT [1099-28] postgres@qgis LOG:  duration: 22.809 ms  statement: FETCH FORWARD 2000 FROM qgis_2
2015-11-26 13:38:42 GMT [1099-29] postgres@qgis LOG:  duration: 24.110 ms  statement: FETCH FORWARD 2000 FROM qgis_2
2015-11-26 13:38:43 GMT [1099-30] postgres@qgis LOG:  duration: 20.175 ms  statement: FETCH FORWARD 2000 FROM qgis_2
2015-11-26 13:38:43 GMT [1099-31] postgres@qgis LOG:  duration: 34.276 ms  statement: FETCH FORWARD 2000 FROM qgis_2
2015-11-26 13:38:43 GMT [1099-32] postgres@qgis LOG:  duration: 23.406 ms  statement: FETCH FORWARD 2000 FROM qgis_2
2015-11-26 13:38:43 GMT [1099-33] postgres@qgis LOG:  duration: 23.196 ms  statement: FETCH FORWARD 2000 FROM qgis_2
2015-11-26 13:38:43 GMT [1099-34] postgres@qgis LOG:  duration: 19.968 ms  statement: FETCH FORWARD 2000 FROM qgis_2

ST_RemoveRepeatedPoints

  • Query:
SELECT st_asbinary(st_removerepeatedpoints("geom",407.618),'NDR'),"id" 
FROM "public"."clc" 
WHERE "geom" && st_makeenvelope(393043.59441092703491449,6112340.26313023269176483,1091598.51655095303431153,6592310.16586976870894432,2154)

  • EXPLAIN ANALYZE:
    • Planning time: 0.149 ms
    • Execution time: 117.905 ms
  • Number of vertices: 689 300
  • Example of PostgreSQL log:
2015-11-26 13:35:37 GMT [1088-15] postgres@qgis LOG:  duration: 13.760 ms  statement: FETCH FORWARD 2000 FROM qgis_1
2015-11-26 13:35:37 GMT [1088-16] postgres@qgis LOG:  duration: 13.244 ms  statement: FETCH FORWARD 2000 FROM qgis_1
2015-11-26 13:35:37 GMT [1088-17] postgres@qgis LOG:  duration: 11.229 ms  statement: FETCH FORWARD 2000 FROM qgis_1
2015-11-26 13:35:37 GMT [1088-18] postgres@qgis LOG:  duration: 14.678 ms  statement: FETCH FORWARD 2000 FROM qgis_1
2015-11-26 13:35:38 GMT [1088-19] postgres@qgis LOG:  duration: 13.486 ms  statement: FETCH FORWARD 2000 FROM qgis_1
2015-11-26 13:35:38 GMT [1088-20] postgres@qgis LOG:  duration: 10.475 ms  statement: FETCH FORWARD 2000 FROM qgis_1
2015-11-26 13:35:38 GMT [1088-21] postgres@qgis LOG:  duration: 13.571 ms  statement: FETCH FORWARD 2000 FROM qgis_1
2015-11-26 13:35:38 GMT [1088-22] postgres@qgis LOG:  duration: 13.639 ms  statement: FETCH FORWARD 2000 FROM qgis_1
2015-11-26 13:35:38 GMT [1088-23] postgres@qgis LOG:  duration: 14.216 ms  statement: FETCH FORWARD 2000 FROM qgis_1
2015-11-26 13:35:38 GMT [1088-24] postgres@qgis LOG:  duration: 13.632 ms  statement: FETCH FORWARD 2000 FROM qgis_1

Quick comparison

Summing only the fetch times of the first ten log lines for each case gives (times in ms):

Line  Raw data  ST_SnapToGrid  ST_RemoveRepeatedPoints
1     29.335    23.430         13.760
2     28.397    21.235         13.244
3     23.732    24.964         11.229
4     31.173    22.809         14.678
5     32.296    24.110         13.486
6     23.584    20.175         10.475
7     39.870    34.276         13.571
8     38.210    23.406         13.639
9     40.120    23.196         14.216
10    42.670    19.968         13.632
Sum   329.387   237.569        131.930
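The sums can be re-checked from the log lines above with a few lines of code. The relative saving it computes is derived from these ten samples only and is not a benchmark:

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// FETCH FORWARD 2000 durations (ms) from the first ten lines of each log above.
const std::vector<double> kRaw{29.335, 28.397, 23.732, 31.173, 32.296,
                               23.584, 39.870, 38.210, 40.120, 42.670};
const std::vector<double> kSnapToGrid{23.430, 21.235, 24.964, 22.809, 24.110,
                                      20.175, 34.276, 23.406, 23.196, 19.968};
const std::vector<double> kRemoveRepeated{13.760, 13.244, 11.229, 14.678, 13.486,
                                          10.475, 13.571, 13.639, 14.216, 13.632};

// Total fetch time in milliseconds.
double totalMs(const std::vector<double>& durations) {
    assert(!durations.empty());
    return std::accumulate(durations.begin(), durations.end(), 0.0);
}

// Fraction of fetch time saved relative to the unsimplified query.
double savingVsRaw(const std::vector<double>& durations) {
    return 1.0 - totalMs(durations) / totalMs(kRaw);
}
```

Against the raw-data total of 329.387 ms, this gives a saving of roughly 28% for ST_SnapToGrid and roughly 60% for ST_RemoveRepeatedPoints over these ten fetches.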

@m-kuhn
Member

m-kuhn commented Nov 27, 2015

Interesting, good work.

If I read Paul Ramsey's article correctly, ST_RemoveRepeatedPoints is used as a pre-filtering step and ST_Simplify is still applied on top. Does this make sense here?

@mdouchin
Contributor Author

@m-kuhn I have not yet figured out what ST_Simplify would add here. If anyone knows...

I think the main improvement for QGIS would be to use TWKB (Tiny WKB) instead of WKB to reduce the data size. But this is beyond my QGIS knowledge at the moment...

@mdouchin
Contributor Author

It seems the Travis CI build failed. Is there a Travis "guru" who can help me find out what is going on here?

I will also soon test ST_Simplify( "geomSimplifiedByRemoveRepeatedPoints", tolerance, true ) to compare the results.

@m-kuhn
Member

m-kuhn commented Nov 28, 2015

The sip bindings for the new methods are missing. No big issue.

@mdouchin
Contributor Author

OK @m-kuhn, I will add them then. Thanks.

@nyalldawson
Collaborator

> @m-kuhn I have not yet figured out what ST_Simplify would add here. If anyone knows...

Yep - imagine a section of a geometry that consists of lots of little details but appears as a straight line when zoomed out sufficiently. ST_RemoveRepeatedPoints will remove vertices that are <= 1px from each other, but (at worst) this section of the geometry will still include a vertex for each pixel along the line. Calling ST_Simplify on the geometry with a suitable tolerance will strip out all these additional vertices and leave just a start and end vertex for that section. It would be a big improvement for certain geometries.
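PostGIS documents ST_Simplify as a Douglas-Peucker simplification. The minimal Douglas-Peucker sketch below (not PostGIS's code) shows the effect described above on a hypothetical wiggly-but-straight section with one vertex every 1.5 px: every consecutive pair is more than 1 px apart, so removing points within 1 px of each other drops none of them, whereas Douglas-Peucker with a 1 px tolerance keeps only the two end vertices:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Pt { double x, y; };

// Distance from p to the closest point of segment a-b.
double segmentDistance(Pt p, Pt a, Pt b) {
    const double dx = b.x - a.x, dy = b.y - a.y;
    const double len2 = dx * dx + dy * dy;
    double t = len2 > 0.0 ? ((p.x - a.x) * dx + (p.y - a.y) * dy) / len2 : 0.0;
    t = std::fmax(0.0, std::fmin(1.0, t));
    return std::hypot(p.x - (a.x + t * dx), p.y - (a.y + t * dy));
}

// Keep the vertex farthest from segment first-last if it deviates more than
// tol, then recurse on both halves.
void dpRecurse(const std::vector<Pt>& in, std::size_t first, std::size_t last,
               double tol, std::vector<bool>& keep) {
    double maxDist = 0.0;
    std::size_t index = first;
    for (std::size_t i = first + 1; i < last; ++i) {
        const double d = segmentDistance(in[i], in[first], in[last]);
        if (d > maxDist) { maxDist = d; index = i; }
    }
    if (maxDist > tol) {
        keep[index] = true;
        dpRecurse(in, first, index, tol, keep);
        dpRecurse(in, index, last, tol, keep);
    }
}

std::vector<Pt> douglasPeucker(const std::vector<Pt>& in, double tol) {
    if (in.size() < 3) return in;
    std::vector<bool> keep(in.size(), false);
    keep[0] = true;
    keep[in.size() - 1] = true;
    dpRecurse(in, 0, in.size() - 1, tol, keep);
    std::vector<Pt> out;
    for (std::size_t i = 0; i < in.size(); ++i)
        if (keep[i]) out.push_back(in[i]);
    return out;
}

// Smallest distance between consecutive vertices.
double minConsecutiveSpacing(const std::vector<Pt>& line) {
    assert(line.size() >= 2);
    double best = std::hypot(line[1].x - line[0].x, line[1].y - line[0].y);
    for (std::size_t i = 2; i < line.size(); ++i)
        best = std::fmin(best, std::hypot(line[i].x - line[i - 1].x,
                                          line[i].y - line[i - 1].y));
    return best;
}

// Hypothetical test geometry: x advances 1.5 px per vertex, y wiggles by 0.4 px.
std::vector<Pt> wigglyLine(int n) {
    std::vector<Pt> pts;
    for (int i = 0; i < n; ++i) pts.push_back({1.5 * i, (i % 2) * 0.4});
    return pts;
}
```

With `wigglyLine(21)`, the minimum spacing is about 1.55 px, and `douglasPeucker(line, 1.0)` returns only (0, 0) and (30, 0).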

Note that you'll need to use the extra parameter to preserve collapsed geometries, described by Paul here: http://blog.cartodb.com/smaller-faster/.
I can't find this mentioned anywhere in the PostGIS docs, but it seems you can add a third boolean argument to st_simplify to preserve these. E.g.:

select st_simplify(st_geomfromtext('Polygon((1 1, 1 2, 2 2, 2 1, 1 1 ))')::geometry(Polygon),10)

returns null, but

select st_simplify(st_geomfromtext('Polygon((1 1, 1 2, 2 2, 2 1, 1 1 ))')::geometry(Polygon),10,true)

returns the input polygon untouched. Not sure why this parameter is undocumented (or why passing false to it also preserves collapsed geometries). @strk?

/** Sets the simplification threshold of the vector layer managed */
void setThreshold( float threshold ) { mThreshold = threshold; }
/** Gets the simplification threshold of the vector layer managed */
inline float threshold() const { return mThreshold; }
Collaborator

It's not immediately clear what the difference between threshold and tolerance is here. This could get quite confusing in future if threshold is used in a different way by another provider. Can you expand these doc strings to describe exactly what the difference is?

(also, why float here?)

@nyalldawson
Collaborator

@mdouchin this sounds great... can't wait for this to land! Fantastic work :)

@strk
Contributor

strk commented Nov 30, 2015 via email

@mdouchin
Contributor Author

mdouchin commented Dec 1, 2015

Thanks all for your review.
I have tested post-simplification using ST_Simplify and its new third parameter, which gives something like

st_simplify( st_removerepeatedpoints( geom, 200 ) , 200, true )

It further reduces the size of the fetched geometries, but creates some very small but visible artifacts (holes) in the sample data (adjacent polygons). They are visible when the layer symbology is a simple symbol whose border and fill have the same color.

So far I have only tested adjacent polygons (not lines), with an ST_Simplify tolerance equal to the one used in ST_RemoveRepeatedPoints. I will try another geometry type and a smaller ST_Simplify tolerance.

@mdouchin mdouchin force-pushed the postgis_2_2_server_simplification branch from 976d7db to 706e651 December 1, 2015 13:32
@mdouchin
Contributor Author

mdouchin commented Dec 1, 2015

Hi all. Thanks for your review!

New version pushed, now applying ST_Simplify on top of the ST_RemoveRepeatedPoints pre-filter, which decreases the number of vertices to download. For ST_Simplify, I chose a tolerance smaller than the one used for pre-filtering with ST_RemoveRepeatedPoints, just to be on the safe side.

Please test and report if anything must be changed

I also added the new methods to the SIP bindings, but Travis is still complaining (though the only errors I see are about DB Manager).

@mdouchin
Contributor Author

mdouchin commented Dec 1, 2015

Some quick results, using the same data and query as above, showing how the number of vertices decreases. This should also improve performance, because less data has to be downloaded from PostGIS.

Number of vertices per query:
  • "geom": 3 763 284
  • st_snaptogrid( "geom", 407.618 ): 1 220 892
  • st_removerepeatedpoints( "geom", 407.618 ): 689 300
  • st_simplify( st_removerepeatedpoints( "geom", 407.618 ), 356.665, true ): 303 292
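A hedged sketch of how the final column expression could be assembled. The thread does not say how the smaller ST_Simplify tolerance is derived; in the list above it is about 0.875 times the ST_RemoveRepeatedPoints tolerance (356.665 / 407.618), so this sketch simply takes the factor as a parameter. Function names are hypothetical:

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: format a tolerance with three decimals, as in the queries above.
std::string fmt3(double v) {
    char buf[64];
    std::snprintf(buf, sizeof buf, "%.3f", v);
    return buf;
}

// Pre-filter with ST_RemoveRepeatedPoints, then apply ST_Simplify with a smaller
// tolerance and a third argument of true (preserve collapsed geometries, as
// discussed above). `simplifyFactor` is an assumption, not the PR's constant.
std::string simplifiedGeometryExpr(const std::string& column, double tolerance,
                                   double simplifyFactor) {
    assert(tolerance > 0.0 && simplifyFactor > 0.0 && simplifyFactor <= 1.0);
    return "st_simplify( st_removerepeatedpoints( \"" + column + "\", " +
           fmt3(tolerance) + " ), " + fmt3(tolerance * simplifyFactor) + ", true )";
}
```

For example, `simplifiedGeometryExpr("geom", 407.618, 0.5)` yields `st_simplify( st_removerepeatedpoints( "geom", 407.618 ), 203.809, true )`.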

@mdouchin
Contributor Author

mdouchin commented Dec 3, 2015

@m-kuhn @ahuarte47 @jef-n @nyalldawson @strk Any objection to merging it as is, to let more people test it, and modifying the behavior afterwards if necessary? I have had no answer about it on the qgis-dev mailing list. Which core dev is in charge of this part of the code? Please assign her/him (I have no GitHub rights to add tags or assignees to issues in the QGIS repository).

m-kuhn added a commit that referenced this pull request Dec 3, 2015
Postgresql provider - Use ST_RemoveRepeatedPoints instead of ST_SnapToGrid
@m-kuhn m-kuhn merged commit 3d5f33a into qgis:master Dec 3, 2015
@m-kuhn
Member

m-kuhn commented Dec 3, 2015

Thank you @mdouchin !

@mdouchin
Contributor Author

mdouchin commented Dec 3, 2015

Thanks to all reviewers, and for accepting this PR

@ahuarte47
Contributor

Thank you @mdouchin ! great work!

@mdouchin mdouchin deleted the postgis_2_2_server_simplification branch December 3, 2015 16:35
5 participants