@missinglink missinglink commented Sep 10, 2025

this PR sets the default Douglas-Peucker simplification tolerance for both the geometry and shard tables to 0.0001

in practice this value is so small that the simplified shapes are visually indistinguishable from the originals, but it has the benefit of reducing the database size by up to 75% 😱

see: https://www.seabre.com/simplify-geometry/

cc/ @Joxit I think this may be preferable to updating the shard complexity, or maybe we do both 🤔
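For context, a minimal sketch of what a 0.0001° Ramer-Douglas-Peucker tolerance does to a polygon, using @turf/turf purely as an illustration — this is not the module the PR configures, and the sample polygon is made up:

```js
// Illustration only (not the project's tweak_module_geometry_simplify implementation):
// Ramer-Douglas-Peucker simplification at a 0.0001-degree tolerance via @turf/turf.
import { simplify } from '@turf/turf'

const feature = {
  type: 'Feature',
  properties: {},
  geometry: {
    type: 'Polygon',
    coordinates: [[
      [174.0, -41.0], [174.0005, -41.00005], [174.001, -41.0],
      [174.001, -41.001], [174.0, -41.001], [174.0, -41.0]
    ]]
  }
}

// tolerance is in coordinate units (degrees here); 0.0001° is roughly 11m at the equator
const simplified = simplify(feature, { tolerance: 0.0001, highQuality: false })

// the vertex that deviates by only ~0.00005° from its neighbours is dropped
console.log(feature.geometry.coordinates[0].length, '->', simplified.geometry.coordinates[0].length)
```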

@missinglink missinglink force-pushed the light-geometry-simplification branch from d2af7c0 to 3f0cbc1 on September 10, 2025 05:17
@missinglink missinglink merged commit 55c3660 into master Sep 10, 2025
1 check passed
@missinglink missinglink deleted the light-geometry-simplification branch September 10, 2025 05:45

Joxit commented Sep 12, 2025

Hi there, I wanted to see how interesting this PR is at the performance level. I'm a bit surprised by the results; I think I missed something. Could you run some tests with your tools too?

Based on #106 (comment)

Database sizes

SELECT COUNT(*) from shard;
| simplify | size | shards |
| --- | --- | --- |
| 0.0 | 6079M | 722390 |
| 0.0001 | 4366M | 722282 |

We save almost 28% disk space by removing only ~0.015% of the shards xD

Number of shards per geometry

SELECT AVG(count), MIN(count), stats_median(count), stats_p95(count), stats_p99(count), MAX(count) FROM (SELECT COUNT(*) AS count FROM shard GROUP BY source, id);
| simplify | avg | min | 50th | 95th | 99th | max |
| --- | --- | --- | --- | --- | --- | --- |
| 0.0 | 1.72975341573561 | 1 | 1.0 | 4.0 | 12.0 | 2194 |
| 0.0001 | 1.72949481114682 | 1 | 1.0 | 4.0 | 12.0 | 2188 |

Almost the same here

SELECT AVG(count), MIN(count), stats_median(count), stats_p95(count), stats_p99(count), MAX(count) FROM (SELECT COUNT(*) AS count FROM shard GROUP BY source, id HAVING COUNT(*) > 1);
| simplify | avg | min | 50th | 95th | 99th | max |
| --- | --- | --- | --- | --- | --- | --- |
| 0.0 | 4.70975752264096 | 2 | 3.0 | 12.0 | 35.0 | 2194 |
| 0.0001 | 4.7093007682661 | 2 | 3.0 | 12.0 | 35.0 | 2188 |

Almost the same here

Number of points per shard

SELECT AVG(count), MIN(count), stats_median(count), stats_p95(count), stats_p99(count), MAX(count) FROM (SELECT ST_NPoints(geom) AS count FROM shard);
| simplify | avg | min | 50th | 95th | 99th | max |
| --- | --- | --- | --- | --- | --- | --- |
| 0.0 | 86.1006976840765 | 4 | 87.0 | 181.0 | 195.0 | 199 |
| 0.0001 | 86.0987412118812 | 4 | 87.0 | 181.0 | 195.0 | 199 |

Almost the same here

Gatling Configuration

Since I'm using a French dataset, I query only within French regions. These are the environment variables: USERS_COUNT=100000, USERS_RAMP_TIME=60, SERVER_URL=http://$SPATIAL_IP/query/pip/_view/pelias, and this is the CSV I used (a sketch of how random points could be sampled from these bounding boxes follows the CSV):

Region,LatMin,LatMax,LngMin,LngMax
AUVERGNE-RHONE-ALPES,44.1154,46.804,2.0629,7.1859
BOURGOGNE-FRANCHE-COMTE,46.1559,48.4001,2.8452,7.1435
BRETAGNE,47.278,48.9008,-5.1413,-1.0158
CENTRE-VAL DE LOIRE,46.3471,48.9411,0.053,3.1286
CORSE,41.3336,43.0277,8.5347,9.56
GRAND EST,47.4202,50.1692,3.3833,8.2333
HAUTS-DE-FRANCE,48.8372,51.089,1.3797,4.2557
ILE-DE-FRANCE,48.1205,49.2413,1.4465,3.5587
NORMANDIE,48.1799,50.0722,-1.9485,1.8027
NOUVELLE-AQUITAINE,42.7775,47.1758,-1.7909,2.6116
OCCITANIE,42.3331,45.0467,-0.3272,4.8456
PAYS DE LA LOIRE,46.2664,48.568,-2.6245,0.9167
PROVENCE-ALPES-COTE D'AZUR,42.9818,45.1268,4.2303,7.7188
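
For illustration only (the benchmark above used Gatling, not this script), a k6-style sketch that picks a random point inside one of the bounding boxes above on each request could look like this; only two regions from the CSV are included to keep it short:

```js
// Sketch only: not the Gatling simulation used above. Samples a random point
// inside one of the CSV bounding boxes on every request.
import http from 'k6/http'

const regions = [
  // name, latMin, latMax, lngMin, lngMax (subset of the CSV above)
  { name: 'BRETAGNE', latMin: 47.278, latMax: 48.9008, lngMin: -5.1413, lngMax: -1.0158 },
  { name: 'CORSE', latMin: 41.3336, latMax: 43.0277, lngMin: 8.5347, lngMax: 9.56 }
]

const baseurl = __ENV.SERVER_URL || 'http://localhost:3000/query/pip/_view/pelias'

export default function () {
  const r = regions[Math.floor(Math.random() * regions.length)]
  const lat = r.latMin + Math.random() * (r.latMax - r.latMin)
  const lon = r.lngMin + Math.random() * (r.lngMax - r.lngMin)
  http.get(`${baseurl}/${lon}/${lat}`)
}
```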

Specs

OS: Debian 12.12
CPU: 4 threads (2.3 GHz)
RAM: 15 GB
Arch: linux/amd64
Docker Version: 28.3.2
Container: `pelias/spatial:master-2025-09-10-55c3660793cbc328aeaa98f7966b5572f6174f51`

Results

| simplify | KO | % KO | Cnt/s | Min | 50th | 75th | 95th | 99th | Max | Mean | Std Dev |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0.0 | 0 | 0% | 1,639.34 | 10 | 12 | 13 | 18 | 27 | 108 | 13 | 4 |
| 0.0001 | 0 | 0% | 1,587.3 | 12 | 312 | 426 | 856 | 45,264 | 56,841 | 1,429 | 6,542 |

(KO = failed requests; response times in ms)

I don't understand why the results on the simplified polygons are that bad, with a max at 56s 😱. I checked all the stats (CPU/RAM/Network/IO) and I see only one inconsistency: the IO read speed is between 2.3 MiB/s and 8.2 MiB/s with simplify 0.0, but between 350 MiB/s and 500 MiB/s with simplify 0.0001.

I don't know why this is happening. Maybe an issue with indexes? Or my server?

Viewer differences

GET /explore/pip#16/48.788510/2.310809

Slight differences.

(screenshot: simplify-diff)


missinglink commented Sep 15, 2025

@Joxit yeah that's super weird, I can't think of any reason why simplified polygons would be slower 😆

Anyway, as a sanity check I performed the benchmark again. It's fairly basic in that it only tests a single URL, maybe that's the difference? Can you try this method?

git log -1
commit 5d6193c551a68b937cbcb2bbaef289d2fe1ba65b (HEAD -> master, origin/master, origin/HEAD)
Author: Peter Johnson <insomnia@rcpt.at>
Date:   Fri Sep 12 11:29:36 2025 +0200

    improvement(demo): set default simplification to 0

data is the New Zealand extract of WOF (I was looking for something small and fairly representative of real-world data, rather than the ZCTA dataset):

.rw-r--r--@ 426Mi peter 10 Sep 10:52  /data/wof/sqlite/whosonfirst-data-admin-nz-latest.db
322529f0dc8189e950626e5176c34b70304350a7  /data/wof/sqlite/whosonfirst-data-admin-nz-latest.db

generate two databases:

sqlite3 /data/wof/sqlite/whosonfirst-data-admin-nz-latest.db 'SELECT json_extract(body, "$") FROM geojson' | node bin/spatial.js --db=nz.0.0001.spatial.db import whosonfirst --tweak_module_geometry_simplify=0.0001 --tweak_module_shard_simplify=0.0001
sqlite3 /data/wof/sqlite/whosonfirst-data-admin-nz-latest.db 'SELECT json_extract(body, "$") FROM geojson' | node bin/spatial.js --db=nz.0.0.spatial.db import whosonfirst --tweak_module_geometry_simplify=0.0 --tweak_module_shard_simplify=0.0
❯ l nz.0.*
.rw-r--r--@ 425Mi peter 15 Sep 11:19 -I  nz.0.0.spatial.db
.rw-r--r--@ 104Mi peter 15 Sep 11:18 -I  nz.0.0001.spatial.db

run server

node bin/spatial.js server --db=nz.0.0.spatial.db

or

node bin/spatial.js server --db=nz.0.0001.spatial.db

k6 testing

cat load.js
import http from 'k6/http'
// query a single fixed point (central Wellington, NZ) against the Pelias view
const baseurl = 'http://localhost:3000/query/pip/_view/pelias'

export default function () {
  const lon = '174.77607'
  const lat = '-41.28655'
  http.get(`${baseurl}/${lon}/${lat}`)
}
k6 run --vus 20 --iterations 10000 load.js

results show a significant improvement from simplification (the < lines are 0.0 and the > lines are 0.0001)

19,20c19,20
<     http_req_duration.......................................................: avg=6.34ms min=1.23ms med=5.59ms max=55.37ms p(90)=10.52ms p(95)=12.11ms
<       { expected_response:true }............................................: avg=6.34ms min=1.23ms med=5.59ms max=55.37ms p(90)=10.52ms p(95)=12.11ms
---
>     http_req_duration.......................................................: avg=3.29ms min=641µs    med=2.33ms max=37.26ms p(90)=6.74ms p(95)=8.94ms
>       { expected_response:true }............................................: avg=3.29ms min=641µs    med=2.33ms max=37.26ms p(90)=6.74ms p(95)=8.94ms
22c22
<     http_reqs...............................................................: 10000  3121.513659/s
---
>     http_reqs...............................................................: 10000  5972.54897/s
25,26c25,26
<     iteration_duration......................................................: avg=6.38ms min=1.25ms med=5.64ms max=55.41ms p(90)=10.58ms p(95)=12.16ms
<     iterations..............................................................: 10000  3121.513659/s
---
>     iteration_duration......................................................: avg=3.33ms min=669.37µs med=2.36ms max=37.29ms p(90)=6.82ms p(95)=9.03ms
>     iterations..............................................................: 10000  5972.54897/s
31,32c31,32
<     data_received...........................................................: 11 MB  3.4 MB/s
<     data_sent...............................................................: 1.1 MB 350 kB/s
---
>     data_received...........................................................: 11 MB  6.5 MB/s
>     data_sent...............................................................: 1.1 MB 669 kB/s

maybe my method is flawed? can you try to reproduce this?

@missinglink

Can you please check that you're using the /query/pip/_view/pelias endpoint in both cases?

In the past this Pelias view (which is backwards-compatible with wof-admin-lookup) was much slower; I'm working on getting it closer to the performance of /query/pip.
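
As a sanity guard for future runs, one option is to keep the path fixed in the load script and switch only the target server between runs. This is just a sketch; the two ports for the 0.0 and 0.0001 builds are assumptions:

```js
// Sketch: same endpoint path for every run, only the target server changes via -e, e.g.
//   k6 run -e SERVER_URL=http://localhost:3000 load.js   # 0.0 database (assumed port)
//   k6 run -e SERVER_URL=http://localhost:3001 load.js   # 0.0001 database (assumed port)
import http from 'k6/http'

const baseurl = `${__ENV.SERVER_URL}/query/pip/_view/pelias`

export default function () {
  http.get(`${baseurl}/174.77607/-41.28655`)
}
```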

@missinglink

Worth mentioning: I actually changed the default from 0.0001 to 0.00003. That probably shouldn't affect this, but just so you're aware.

@missinglink

These changes to the shard counts are far more significant than what you posted. Of course this will depend heavily on how over-detailed the geometries were in the first place, but I'm very surprised by your comment:

> We save almost 28% disk space by removing only ~0.015% of the shards xD

spatialite -silent nz.0.0.spatial.db 'SELECT COUNT(*) FROM shard'
124346

spatialite -silent nz.0.0001.spatial.db 'SELECT COUNT(*) FROM shard'
25212


Joxit commented Sep 15, 2025

> Can you please check that you're using the /query/pip/_view/pelias endpoint in both cases?
>
> In the past this Pelias view (which is backwards-compatible with wof-admin-lookup) was much slower; I'm working on getting it closer to the performance of /query/pip.

HA HA HA HA HA HA, thanks, I messed up this part 🤣 The /query/pip/_view/pelias endpoint was used only on the simplified version, and the other endpoint on the non-simplified one. Sorry, false alarm 😅

I managed to increase the number of users to 90k; when I push the user count higher, the results become inconsistent, so I think that's the breaking point.

The overall stats now make sense: the simplified version is a bit faster. Here are the new results:

| simplify | KO | % KO | Cnt/s | Min | 50th | 75th | 95th | 99th | Max | Mean | Std Dev |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0.0 | 0 | 0% | 1,475.41 | 11 | 18 | 23 | 41 | 59 | 923 | 22 | 27 |
| 0.0001 | 0 | 0% | 1,475.41 | 10 | 17 | 21 | 31 | 43 | 265 | 19 | 8 |

(KO = failed requests; response times in ms)

So everything looks fine now 😄

In France, WOF data is already simplified; maybe that's why I don't see a big shard difference?
The simplified database saves only 108 shards, and 95% of the shards have 181 points or fewer 🤷


missinglink commented Sep 16, 2025

@Joxit agh funny, if you get a chance could you please benchmark this PR?

I don't like that the /query/pip/_view/pelias endpoint is so slow and inconsistent in your testing; that PR should hopefully reduce those nasty P99 scores as well as lower latencies across the board.

In order for it to work, the database needs to have a summary table; if you don't have it you can regenerate the database using the latest code or generate it manually.
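
As a quick way to check for that table before benchmarking, a sketch using better-sqlite3 might look like this; the table name 'summary' and the use of better-sqlite3 here are assumptions, not taken from the project code:

```js
// Sketch: check whether the spatial database already has the summary table.
// The table name 'summary' is an assumption based on the comment above.
const Database = require('better-sqlite3')

const db = new Database('nz.0.0001.spatial.db', { readonly: true })
const row = db
  .prepare("SELECT name FROM sqlite_master WHERE type = 'table' AND name = ?")
  .get('summary')

console.log(row ? 'summary table present' : 'summary table missing, regenerate the database')
db.close()
```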

Docker image of that PR branch: https://hub.docker.com/layers/pelias/spatial/pip-pelias-summary-2025-09-12-6089d97b00eb2b9694a382a0fe20d760dec7adea/images/sha256-5ae942e1c54dbfe41f663597660976bb6d5a08effed5453a91867f25390f1bff


Joxit commented Sep 16, 2025

Okay, I will try your PR tomorrow!

I will try to publish a benchmark with 90k users and then try a higher count.

Note to self: don't forget to update the images at build time 😂
