New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Weighted median #176

Closed
wants to merge 83 commits into
base: svn-trunk
from

Conversation

Projects
None yet
4 participants
@Komzpa
Member

Komzpa commented Dec 22, 2017

Todo:

  • test with all negative weights;
  • Enhanced: documentation entry;
{
uint32_t i;

This comment has been minimized.

@dbaston

dbaston Dec 24, 2017

Member

PostGIS point array size is limited to 32-bit. (I'm not sure why all counters were changed to size_t ?)

This comment has been minimized.

@dbaston

dbaston Dec 24, 2017

Member

Add comment to clarify that m ordinate is taken to be a weight

This comment has been minimized.

@Komzpa

Komzpa Dec 24, 2017

Member

Accessing 32-bit variables in 64-bit environments is slower in my experience, especially in loops. To get that right across architectures maps.me uses internal convention to use size_t for counters.

Documentation change should be there, at least Enhanced: entry.

}
else
{
hit = LW_TRUE;

This comment has been minimized.

@dbaston

dbaston Dec 24, 2017

Member

Can this statement have an effect?

This comment has been minimized.

@Komzpa

Komzpa Dec 26, 2017

Member

I think there is a possible case where we trip to a point where negative inverse weighted distances balance out positive ones, although total weight is larger than zero.

for (i = 0; i < npoints; i++)
{
distances[i] = distance3d_pt_pt(curr, &points[i]);
distances[i] = distance3d_pt_pt((POINT3D*)curr, (POINT3D*)&points[i]) * points[i].m;

This comment has been minimized.

@dbaston

dbaston Dec 24, 2017

Member

I'd suggest using a weighted distance function that accepts two POINT4D pointers, instead of casting to POINT3D every time this is called.

This comment has been minimized.

@Komzpa

Komzpa Dec 24, 2017

Member

I'd even prefer to hand-inline the logic as it's a single line :)
Tricky part is to get naming right: the weight is stored just in one of the points, and if you pass two POINT4D's you're passing two of them.

This comment has been minimized.

@Komzpa

Komzpa Dec 26, 2017

Member

Another question: will it work with squared distance instead? that'll save an sqrt.

{
if (!FP_IS_ZERO(points[n].m))
{
n++;

This comment has been minimized.

@dbaston

dbaston Dec 24, 2017

Member

Add comment that points with zero weight are ignored

This comment has been minimized.

@dbaston

dbaston Dec 24, 2017

Member

What happens when weights are negative?

This comment has been minimized.

@Komzpa

Komzpa Dec 24, 2017

Member

A point with negative weight will cancel out a point with positive weight. When total sum of weights is negative, the point will go to infinity. Need to add a check for that.

@@ -173,11 +198,17 @@ lwmpoint_median(const LWMPOINT* g, double tol, uint32_t max_iter, char fail_if_n
median = init_guess(points, npoints);
if (FP_IS_ZERO(median.m))

This comment has been minimized.

@dbaston

dbaston Dec 24, 2017

Member

Under what condition does our initial guess have a zero m value, and why do we want to return empty in this case?

This comment has been minimized.

@Komzpa

Komzpa Dec 27, 2017

Member

Initial guess can have zero m value if sum of negative values is equal to sum of positive values.

Let's visualize sum of distances for all cases:
larger than zero weight,
create table median_0_debug as (select ST_SetSRID( ST_MakePoint(x, y), 3857) as geom, 5*sqrt(x^2+y^2) - sqrt((x-10)^2+y^2) from generate_series(-100, 100) x, generate_series(-100,100) y);
image

  • we have a nice minimum.

Weight of zero,
create table median_0_debug as (select ST_SetSRID( ST_MakePoint(x, y), 3857) as geom, sqrt(x^2+y^2) - sqrt((x-10)^2+y^2) from generate_series(-100, 100) x, generate_series(-100,100) y);
image

  • minimum is actually a line in this case. If we add more points, it shows it may be possible to find median:
    create table median_0_debug as (select ST_SetSRID( ST_MakePoint(x, y), 3857) as geom, sqrt(x^2+y^2) - sqrt((x-10)^2+y^2) + sqrt((x+10)^2+y^2) - sqrt(x^2+(y-10)^2) from generate_series(-100, 100) x, generate_series(-100,100) y);
    image

Unfortunately, this causes division by zero in centroid code now. A common solution to issues in median now seems to be "fail if asked to fail, empty othewise".

Less than zero:
create table median_0_debug as (select ST_SetSRID( ST_MakePoint(x, y), 3857) as geom, -1* sqrt(x^2+y^2) - sqrt((x-10)^2+y^2) from generate_series(-100, 100) x, generate_series(-100,100) y);
image

  • we have a peak in center, and minimum is in infinity somewhere. It's basically inverse of positive case. We really don't want to iterate to infinity, and unlikely want POINT(Infinity, Infinity) in database.
lwfree(points);
return lwpoint_construct_empty(g->srid, 0, 0);
}
distances = lwalloc(npoints * sizeof(double));
for (i = 0; i < max_iter && delta > tol; i++)

This comment has been minimized.

@dbaston

dbaston Dec 24, 2017

Member

Does weighting impact the significance of tol ?

This comment has been minimized.

@Komzpa

Komzpa Dec 26, 2017

Member

It shouldn't - steps are normalized by weighted distance, so any linear scaling of weights should cancel itself. Or is there some other case you're thinking of?

uint32_t i;
POINT3D next = { 0, 0, 0 };
size_t i;
POINT4D next = { 0, 0, 0, 1 };

This comment has been minimized.

@dbaston

dbaston Dec 24, 2017

Member

Does next need to be POINT4D ? Is the m ordinate ever accessed?

POINT3D* points = lwmpoint_extract_points_3d(g, &npoints);
POINT3D median;
POINT4D* points = lwmpoint_extract_points_4d(g, &npoints);
POINT4D median;

This comment has been minimized.

@dbaston

dbaston Dec 24, 2017

Member

Does this need to be 4D? The returned values are either 2D or 3D.

This comment has been minimized.

@strk

strk Jan 4, 2018

Member

Ansi C does not allow declaring variables after code (my compiler complains about this)

@codecov

This comment has been minimized.

codecov bot commented Dec 24, 2017

Codecov Report

Merging #176 into svn-trunk will decrease coverage by <.01%.
The diff coverage is 91.02%.

Impacted file tree graph

@@              Coverage Diff              @@
##           svn-trunk     #176      +/-   ##
=============================================
- Coverage      79.22%   79.22%   -0.01%     
=============================================
  Files            202      202              
  Lines          63011    63020       +9     
=============================================
+ Hits           49921    49927       +6     
- Misses         13090    13093       +3
Impacted Files Coverage Δ
liblwgeom/cunit/cu_algorithm.c 98.98% <77.77%> (-0.13%) ⬇️
liblwgeom/lwgeom_median.c 97.14% <95%> (-2.86%) ⬇️
liblwgeom/lwin_wkt.c 81% <0%> (+0.26%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c638b40...b9d7f32. Read the comment docs.

@Komzpa Komzpa requested a review from robe2 Dec 25, 2017

@Komzpa Komzpa closed this Dec 25, 2017

@Komzpa Komzpa reopened this Dec 25, 2017

@robe2

robe2 approved these changes Dec 26, 2017

Looks fine to me. Only minor issue I have is the inclusion of formatting changes to Angle docs, but I'm not bothered enough about that to disapprove and I know those were probably introduced by editorconfig

@strk strk closed this in fb1cf34 Dec 26, 2017

@dbaston

This comment has been minimized.

Member

dbaston commented Dec 26, 2017

Seems reasonable, but I'd like to review and test before this is committed to trunk.

@Komzpa Komzpa reopened this Dec 27, 2017

@Komzpa

This comment has been minimized.

Member

Komzpa commented Dec 28, 2017

@dbaston
I've done some more research and found this:
http://onlinelibrary.wiley.com/doi/10.1111/j.1467-9787.1992.tb00200.x/abstract

Basically Weiszfeld algorithm may find just a local optimum in case of negative weights, as problem is not convex anymore. It's going to be one closest to the centroid, though.

Fermat-Weber problem is constrained by positive only weights.

There are different ways I see to resolve:

@dbaston

This comment has been minimized.

Member

dbaston commented Dec 28, 2017

I think the simplest thing would be to fail on negative weights. Do you have a particular use case for negative weights?

@dbaston

This comment has been minimized.

Member

dbaston commented Dec 29, 2017

By "fail on negative weights," I mean remove support for them entirely. It seems like they add a lot of code complexity. Is there a use case that benefits from the way they're partially handled in this PR?

Komzpa added some commits Jan 4, 2018

@Komzpa

This comment has been minimized.

Member

Komzpa commented Jan 4, 2018

@dbaston please take another look. it's now fixed.

@dbaston

This comment has been minimized.

Member

dbaston commented Jan 4, 2018

@Komzpa thanks. Do you a know a way to get GitHub to compare this PR with the repository state before the weighted changed were committed?

@Komzpa

This comment has been minimized.

Member

Komzpa commented Jan 4, 2018

@dbaston set up one for you by trimming top of svn-trunk history.
It's a bit crowded but has all median as single diff.
https://github.com/Komzpa/postgis/pull/1/files

Komzpa added some commits Jan 5, 2018

@dbaston

dbaston approved these changes Jan 5, 2018

@strk strk closed this in 93d44c4 Jan 5, 2018

@Algunenano Algunenano referenced this pull request Jan 19, 2018

Closed

Fix bug in iterate_4d #196

@Algunenano Algunenano referenced this pull request Feb 5, 2018

Closed

Fix broken median tests #205

@Komzpa Komzpa deleted the Komzpa:weighted_median branch Oct 25, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment