feat(geom,shard): ensure default simplify tolerance of 0.0001
#115
d2af7c0 to 3f0cbc1
|
Hi there, I wanted to see how much this PR helps at the performance level. I'm a bit surprised by the result and I think I missed something; could you run some tests with your tools too? Based on #106 (comment)

Database sizes
SELECT COUNT(*) FROM shard;
We save almost 28% disk space by removing 0.000149504% of shards xD

Number of shards by geometries
SELECT AVG(count), MIN(count), stats_median(count), stats_p95(count), stats_p99(count), MAX(count) FROM (SELECT COUNT(*) AS count FROM shard GROUP BY source, id);
Almost the same here

SELECT AVG(count), MIN(count), stats_median(count), stats_p95(count), stats_p99(count), MAX(count) FROM (SELECT COUNT(*) AS count FROM shard GROUP BY source, id HAVING COUNT(*) > 1);
Almost the same here

Number of points by shards
SELECT AVG(count), MIN(count), stats_median(count), stats_p95(count), stats_p99(count), MAX(count) FROM (SELECT ST_NPoints(geom) AS count FROM shard);
Almost the same here

Gatling Configuration
Since I'm using a French dataset, I will query only in French regions, this is the environments

Specs

Results
I don't understand why the results on the simplified polygons are that bad, with a max at 56s 😱. I checked all the stats (CPU/RAM/Network/IO) and I see only one inconsistency: the IO read speed is between 2.3MiB/s and 8.2MiB/s with simplify. I don't know why this is happening, maybe an issue with indexes? Or my server?

Viewer differences
Slight differences. |
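The SQL queries above reduce per-geometry shard counts to avg/min/median/p95/p99/max. For anyone wanting to compare the two databases outside SQL, the same summary can be sketched in JavaScript. This is only an illustration: `percentile` and `summarize` are hypothetical helpers, the nearest-rank method is an assumption (the `stats_*` SQL functions may interpolate differently), and the sample counts are made up.

```javascript
// Nearest-rank percentile over an array of numbers.
// Illustrative only; the stats_* SQL functions may interpolate differently.
function percentile(values, p) {
  if (values.length === 0) return undefined
  const sorted = [...values].sort((a, b) => a - b)
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length))
  return sorted[rank - 1]
}

// Same columns as the SQL queries: avg, min, median, p95, p99, max.
function summarize(counts) {
  const sum = counts.reduce((total, n) => total + n, 0)
  return {
    avg: sum / counts.length,
    min: Math.min(...counts),
    median: percentile(counts, 50),
    p95: percentile(counts, 95),
    p99: percentile(counts, 99),
    max: Math.max(...counts),
  }
}

// Made-up shard counts, purely to show the shape of the output.
const sampleSummary = summarize([1, 1, 1, 2, 2, 3, 5, 8, 40, 120])
console.log(sampleSummary)
```

Running the same summary over both databases side by side makes it easier to see whether the distributions really are "almost the same" or whether only the tail changed.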
|
@Joxit yeah that's super weird, I can't think of any reason why simplified polygons would be slower 😆
Anyway, to sanity check I ran the benchmark again. It's fairly basic in that it only tests a single URL, maybe that's the difference? Can you try this method?

git log -1
commit 5d6193c551a68b937cbcb2bbaef289d2fe1ba65b (HEAD -> master, origin/master, origin/HEAD)
Author: Peter Johnson <insomnia@rcpt.at>
Date: Fri Sep 12 11:29:36 2025 +0200
improvement(demo): set default simplification to 0

data is the New Zealand extract of WOF (I was looking for something small and fairly representative of real-world data, rather than the ZCTA dataset):
.rw-r--r--@ 426Mi peter 10 Sep 10:52 /data/wof/sqlite/whosonfirst-data-admin-nz-latest.db
322529f0dc8189e950626e5176c34b70304350a7 /data/wof/sqlite/whosonfirst-data-admin-nz-latest.db

generate two databases:
sqlite3 /data/wof/sqlite/whosonfirst-data-admin-nz-latest.db 'SELECT json_extract(body, "$") FROM geojson' | node bin/spatial.js --db=nz.0.0001.spatial.db import whosonfirst --tweak_module_geometry_simplify=0.0001 --tweak_module_shard_simplify=0.0001
sqlite3 /data/wof/sqlite/whosonfirst-data-admin-nz-latest.db 'SELECT json_extract(body, "$") FROM geojson' | node bin/spatial.js --db=nz.0.0.spatial.db import whosonfirst --tweak_module_geometry_simplify=0.0 --tweak_module_shard_simplify=0.0

/code/pel/spatial master = *6 ?21 ❯ l nz.0.*
.rw-r--r--@ 425Mi peter 15 Sep 11:19 -I nz.0.0.spatial.db
.rw-r--r--@ 104Mi peter 15 Sep 11:18 -I nz.0.0001.spatial.db

run server
node bin/spatial.js server --db=nz.0.0.spatial.db
or
node bin/spatial.js server --db=nz.0.0001.spatial.db

k6 testing
cat load.js
import http from 'k6/http'
const baseurl = 'http://localhost:3000/query/pip/_view/pelias'
export default function () {
const lon = '174.77607'
const lat = '-41.28655'
http.get(`${baseurl}/${lon}/${lat}`)
}

k6 run --vus 20 --iterations 10000 load.js

results show significant improvement from simplification (0.0 on the left and 0.0001 on the right)
19,20c19,20
< http_req_duration.......................................................: avg=6.34ms min=1.23ms med=5.59ms max=55.37ms p(90)=10.52ms p(95)=12.11ms
< { expected_response:true }............................................: avg=6.34ms min=1.23ms med=5.59ms max=55.37ms p(90)=10.52ms p(95)=12.11ms
---
> http_req_duration.......................................................: avg=3.29ms min=641µs med=2.33ms max=37.26ms p(90)=6.74ms p(95)=8.94ms
> { expected_response:true }............................................: avg=3.29ms min=641µs med=2.33ms max=37.26ms p(90)=6.74ms p(95)=8.94ms
22c22
< http_reqs...............................................................: 10000 3121.513659/s
---
> http_reqs...............................................................: 10000 5972.54897/s
25,26c25,26
< iteration_duration......................................................: avg=6.38ms min=1.25ms med=5.64ms max=55.41ms p(90)=10.58ms p(95)=12.16ms
< iterations..............................................................: 10000 3121.513659/s
---
> iteration_duration......................................................: avg=3.33ms min=669.37µs med=2.36ms max=37.29ms p(90)=6.82ms p(95)=9.03ms
> iterations..............................................................: 10000 5972.54897/s
31,32c31,32
< data_received...........................................................: 11 MB 3.4 MB/s
< data_sent...............................................................: 1.1 MB 350 kB/s
---
> data_received...........................................................: 11 MB 6.5 MB/s
> data_sent...............................................................: 1.1 MB 669 kB/s

maybe my method is flawed? can you try to reproduce this? |
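Since `load.js` only ever requests one coordinate, a variant that spreads requests over a bounding box might sit closer to a region-wide test. A minimal sketch, assuming a rough bounding box for mainland New Zealand (the numbers are approximate and not from this thread); `randomBetween` and `randomPipUrl` are hypothetical helpers, and the URL shape is copied from `load.js`.

```javascript
// Hypothetical helper: build point-in-polygon URLs for random coordinates.
// The bounding box is a rough assumption covering mainland New Zealand.
const baseurl = 'http://localhost:3000/query/pip/_view/pelias'
const bbox = { minLon: 166.0, minLat: -47.5, maxLon: 179.0, maxLat: -34.0 }

// Uniform random number in [min, max); `random` is injectable for testing.
function randomBetween(min, max, random = Math.random) {
  return min + random() * (max - min)
}

// Returns a URL in the same /{lon}/{lat} form used by load.js.
function randomPipUrl(box = bbox, random = Math.random) {
  const lon = randomBetween(box.minLon, box.maxLon, random).toFixed(5)
  const lat = randomBetween(box.minLat, box.maxLat, random).toFixed(5)
  return `${baseurl}/${lon}/${lat}`
}

console.log(randomPipUrl())
```

Inside the k6 script, `http.get(randomPipUrl())` would take the place of the fixed URL. Many random points in such a box fall in the ocean and match nothing, so the latency numbers would not be directly comparable to the single-URL run.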
|
Can you please check you're using the In the past this Pelias view (which is backwards-compatible with |
|
Worth mentioning I actually changed the default from |
|
These changes to the shard counts are far more significant than what you posted. Of course this will depend heavily on how over-detailed the geometries were in the first place, but I'm very surprised by your comment:
spatialite -silent nz.0.0.spatial.db 'SELECT COUNT(*) FROM shard'
124346
spatialite -silent nz.0.0001.spatial.db 'SELECT COUNT(*) FROM shard'
25212 |
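To put the New Zealand figures from this thread side by side, the relative changes can be worked out with a few lines of arithmetic. All inputs are copied from the comments above (shard counts, database file sizes in MiB, k6 request rate and average latency); the `reduction` helper and the object layout are only illustrative.

```javascript
// Figures copied from this thread (0.0 tolerance vs 0.0001 tolerance).
const before = { shards: 124346, sizeMiB: 425, reqPerSec: 3121.513659, avgMs: 6.34 }
const after = { shards: 25212, sizeMiB: 104, reqPerSec: 5972.54897, avgMs: 3.29 }

// Percentage reduction when going from a to b.
const reduction = (a, b) => ((a - b) / a) * 100

const summary = {
  shardReductionPct: reduction(before.shards, after.shards),
  sizeReductionPct: reduction(before.sizeMiB, after.sizeMiB),
  latencyReductionPct: reduction(before.avgMs, after.avgMs),
  throughputRatio: after.reqPerSec / before.reqPerSec,
}

for (const [name, value] of Object.entries(summary)) {
  console.log(name, value.toFixed(1))
}
```

For this dataset that works out to roughly 80% fewer shards, a file about 75% smaller (in line with the "up to 75%" in the PR description), and a little under twice the request rate on the single-URL test.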
HA HA HA HA HA HA, thanks, I messed up this part 🤣 the
I managed to increase the number of users to 90k. When I increase the number of users further, results are inconsistent, so I think this is the breaking point. The overall stats are now logical, the simplified version is a bit faster, here are the new results:
So everything looks fine now 😄 In France, WOF data is already simplified, maybe that's why I don't have a big shard difference? |
|
@Joxit agh funny, if you get a chance could you please benchmark this PR? I don't like that the In order for it to work the database needs to have a Docker image of that PR branch: https://hub.docker.com/layers/pelias/spatial/pip-pelias-summary-2025-09-12-6089d97b00eb2b9694a382a0fe20d760dec7adea/images/sha256-5ae942e1c54dbfe41f663597660976bb6d5a08effed5453a91867f25390f1bff |
|
Okay, I will try your PR tomorrow! I will try to publish a benchmark with 90k users and try a higher one next.
Note to self: don't forget to update the images at build time 😂 |

this PR sets the default Douglas-Peucker simplification tolerance for both geometry and shard tables to 0.0001
in practice this value is so small it is visually indistinguishable, but has the benefit of reducing the database size by up to 75% 😱
see: https://www.seabre.com/simplify-geometry/
cc/ @Joxit I think this may be preferable to updating the shard complexity, or maybe we do both 🤔