# Weighted median #176

Closed
wants to merge 83 commits into
from

## Conversation

Projects
None yet
4 participants
Member

### Komzpa commented Dec 22, 2017 • edited

 Todo: test with all negative weights; `Enhanced:` documentation entry;

### dbaston reviewed Dec 24, 2017

 { uint32_t i;

#### dbaston Dec 24, 2017

Member

PostGIS point array size is limited to 32-bit. (I'm not sure why all counters were changed to `size_t` ?)

#### dbaston Dec 24, 2017

Member

Add comment to clarify that `m` ordinate is taken to be a weight

#### Komzpa Dec 24, 2017

Member

Accessing 32-bit variables in 64-bit environments is slower in my experience, especially in loops. To get that right across architectures maps.me uses internal convention to use `size_t` for counters.

Documentation change should be there, at least `Enhanced:` entry.

 } else { hit = LW_TRUE;

#### dbaston Dec 24, 2017

Member

Can this statement have an effect?

#### Komzpa Dec 26, 2017

Member

I think there is a possible case where we trip to a point where negative inverse weighted distances balance out positive ones, although total weight is larger than zero.

 for (i = 0; i < npoints; i++) { distances[i] = distance3d_pt_pt(curr, &points[i]); distances[i] = distance3d_pt_pt((POINT3D*)curr, (POINT3D*)&points[i]) * points[i].m;

#### dbaston Dec 24, 2017

Member

I'd suggest using a weighted distance function that accepts two `POINT4D` pointers, instead of casting to `POINT3D` every time this is called.

#### Komzpa Dec 24, 2017

Member

I'd even prefer to hand-inline the logic as it's a single line :)
Tricky part is to get naming right: the weight is stored just in one of the points, and if you pass two `POINT4D`'s you're passing two of them.

#### Komzpa Dec 26, 2017

Member

Another question: will it work with squared distance instead? that'll save an `sqrt`.

 { if (!FP_IS_ZERO(points[n].m)) { n++;

Member

#### dbaston Dec 24, 2017

Member

What happens when weights are negative?

#### Komzpa Dec 24, 2017

Member

A point with negative weight will cancel out a point with positive weight. When total sum of weights is negative, the point will go to infinity. Need to add a check for that.

 @@ -173,11 +198,17 @@ lwmpoint_median(const LWMPOINT* g, double tol, uint32_t max_iter, char fail_if_n median = init_guess(points, npoints); if (FP_IS_ZERO(median.m))

#### dbaston Dec 24, 2017

Member

Under what condition does our initial guess have a zero m value, and why do we want to return empty in this case?

#### Komzpa Dec 27, 2017 • edited

Member

Initial guess can have zero m value if sum of negative values is equal to sum of positive values.

Let's visualize sum of distances for all cases:
larger than zero weight,
`create table median_0_debug as (select ST_SetSRID( ST_MakePoint(x, y), 3857) as geom, 5*sqrt(x^2+y^2) - sqrt((x-10)^2+y^2) from generate_series(-100, 100) x, generate_series(-100,100) y);`

• we have a nice minimum.

Weight of zero,
`create table median_0_debug as (select ST_SetSRID( ST_MakePoint(x, y), 3857) as geom, sqrt(x^2+y^2) - sqrt((x-10)^2+y^2) from generate_series(-100, 100) x, generate_series(-100,100) y);`

• minimum is actually a line in this case. If we add more points, it shows it may be possible to find median:
`create table median_0_debug as (select ST_SetSRID( ST_MakePoint(x, y), 3857) as geom, sqrt(x^2+y^2) - sqrt((x-10)^2+y^2) + sqrt((x+10)^2+y^2) - sqrt(x^2+(y-10)^2) from generate_series(-100, 100) x, generate_series(-100,100) y);`

Unfortunately, this causes division by zero in centroid code now. A common solution to issues in median now seems to be "fail if asked to fail, empty othewise".

Less than zero:
`create table median_0_debug as (select ST_SetSRID( ST_MakePoint(x, y), 3857) as geom, -1* sqrt(x^2+y^2) - sqrt((x-10)^2+y^2) from generate_series(-100, 100) x, generate_series(-100,100) y);`

• we have a peak in center, and minimum is in infinity somewhere. It's basically inverse of positive case. We really don't want to iterate to infinity, and unlikely want `POINT(Infinity, Infinity)` in database.
 lwfree(points); return lwpoint_construct_empty(g->srid, 0, 0); } distances = lwalloc(npoints * sizeof(double)); for (i = 0; i < max_iter && delta > tol; i++)

#### dbaston Dec 24, 2017

Member

Does weighting impact the significance of `tol` ?

#### Komzpa Dec 26, 2017

Member

It shouldn't - steps are normalized by weighted distance, so any linear scaling of weights should cancel itself. Or is there some other case you're thinking of?

 uint32_t i; POINT3D next = { 0, 0, 0 }; size_t i; POINT4D next = { 0, 0, 0, 1 };

#### dbaston Dec 24, 2017

Member

Does `next` need to be `POINT4D` ? Is the `m` ordinate ever accessed?

 POINT3D* points = lwmpoint_extract_points_3d(g, &npoints); POINT3D median; POINT4D* points = lwmpoint_extract_points_4d(g, &npoints); POINT4D median;

#### dbaston Dec 24, 2017

Member

Does this need to be 4D? The returned values are either 2D or 3D.

Member

# Codecov Report

Merging #176 into svn-trunk will decrease coverage by `<.01%`.
The diff coverage is `91.02%`.

```@@              Coverage Diff              @@
##           svn-trunk     #176      +/-   ##
=============================================
- Coverage      79.22%   79.22%   -0.01%
=============================================
Files            202      202
Lines          63011    63020       +9
=============================================
+ Hits           49921    49927       +6
- Misses         13090    13093       +3```
Impacted Files Coverage Δ
liblwgeom/cunit/cu_algorithm.c `98.98% <77.77%> (-0.13%)` ⬇️
liblwgeom/lwgeom_median.c `97.14% <95%> (-2.86%)` ⬇️
liblwgeom/lwin_wkt.c `81% <0%> (+0.26%)` ⬆️

`Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`

### robe2 approved these changes Dec 26, 2017

Looks fine to me. Only minor issue I have is the inclusion of formatting changes to Angle docs, but I'm not bothered enough about that to disapprove and I know those were probably introduced by editorconfig

Member

### dbaston commented Dec 26, 2017

 Seems reasonable, but I'd like to review and test before this is committed to trunk.

Member

### Komzpa commented Dec 28, 2017

 @dbaston I've done some more research and found this: http://onlinelibrary.wiley.com/doi/10.1111/j.1467-9787.1992.tb00200.x/abstract Basically Weiszfeld algorithm may find just a local optimum in case of negative weights, as problem is not convex anymore. It's going to be one closest to the centroid, though. Fermat-Weber problem is constrained by positive only weights. There are different ways I see to resolve: add warning to documentation that in case of negative weights it may be just local optimum. fail immediately on negative weights input. we can fail not always, but only on fail_if_not_converged is true - local optimum is a kind of non-converged global optimum :) replace Weiszfeld algorithm with another tweaked version, proposed say in this paper: https://www.researchgate.net/publication/265359176_The_Weber_Problem_On_The_Plane_With_Some_Negative_Weights
Member

### dbaston commented Dec 28, 2017

 I think the simplest thing would be to fail on negative weights. Do you have a particular use case for negative weights?
Member

### dbaston commented Dec 29, 2017

 By "fail on negative weights," I mean remove support for them entirely. It seems like they add a lot of code complexity. Is there a use case that benefits from the way they're partially handled in this PR?

### Komzpa added some commits Dec 29, 2017

``` Fail convergence if multipoint weight is negative. ```
``` 3bc9409 ```
``` whitespace fix ```
``` 4e848e4 ```
``` Update cu_algorithm.c ```
``` fecf6c4 ```
``` Let the weights be non-negative ```
``` c966972 ```
``` Update cu_algorithm.c ```
``` 6bb235b ```
``` Update lwgeom_median.c ```
``` 727e8b2 ```
``` Update cu_algorithm.c ```
``` b48024e ```
``` Update lwgeom_median.c ```
``` 0fa3a4c ```
``` Update lwgeom_median.c ```
``` 980034c ```
``` Update lwgeom_median.c ```
``` 91bfa22 ```
``` Update lwgeom_median.c ```
``` 75785c6 ```
``` Update lwgeom_median.c ```
``` 4abe248 ```

### Komzpa added some commits Jan 4, 2018

``` never free, never me ```
``` 4d67045 ```
``` Update lwgeom_median.c ```
``` eb0cc84 ```
``` Update lwgeom_median.c ```
``` 4fdfb99 ```
``` Update lwgeom_median.c ```
``` 5fd7903 ```
``` Update .travis.yml ```
``` ed2de41 ```
``` Update .travis.yml ```
``` 4c05a31 ```
``` Update lwgeom_median.c ```
``` af09540 ```
``` gcc7.2 warning fix ```
``` 84d2df9 ```
``` trailing spaces ```
``` 909d9e9 ```
``` Update lwgeom_median.c ```
``` bbbe584 ```
``` Update lwgeom_median.c ```
``` 633e89b ```
``` gcc7.2 warning ```
``` 9e3751b ```
``` Update lwgeom_median.c ```
``` 709b8ab ```
``` Update cu_algorithm.c ```
``` 2db2f2c ```
``` Update lwgeom_median.c ```
``` 420e691 ```
``` Update lwgeom_median.c ```
``` be12615 ```
``` Update cu_algorithm.c ```
``` d16ae69 ```
Member

### Komzpa commented Jan 4, 2018

 @dbaston please take another look. it's now fixed.
Member

### dbaston commented Jan 4, 2018

 @Komzpa thanks. Do you a know a way to get GitHub to compare this PR with the repository state before the weighted changed were committed?
Member

### Komzpa commented Jan 4, 2018

 @dbaston set up one for you by trimming top of svn-trunk history. It's a bit crowded but has all median as single diff. https://github.com/Komzpa/postgis/pull/1/files

### Komzpa added some commits Jan 5, 2018

``` review fixes ```
``` 0faff83 ```
``` Update lwgeom_median.c ```
``` a26f9d8 ```
``` Update lwgeom_median.c ```
``` 94b08b1 ```
``` Update lwgeom_median.c ```
``` e67abdd ```
``` Update lwgeom_median.c ```
``` b9d7f32 ```

Closed

Closed