Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve scale handling for PrecisionModel #956

Merged
merged 18 commits into from
Sep 20, 2023

Conversation

dr-jts
Copy link
Contributor

@dr-jts dr-jts commented Sep 8, 2023

Problem

GEOS_setPrecision accepts a parameter specifying the grid size of the desired precision. The most common usage is to reduce the number of decimal places in a geometry (e.g. to 3 decimal places). In this case, the grid size is a negative power of 10 (e.g. 0.001). However, this can result in inexact rounding of ordinate values.

For example, this geosop command reproduces this PostGIS issue and shows how reducing the precision of a geometry using grid size 0.001 produces unwanted "extra decimals" (note that in geosop the reducePrecision operation parameter is negative to indicate the value is a grid size):

bin/geosop -a "LINESTRING (657035.913 6475590.114,657075.57 6475500)" reducePrecision N-0.001

LINESTRING (657035.9130000001 6475590.114, 657075.5700000001 6475500)

The same issue can happen when using a scale factor that is a negative power of 10:

bin/geosop -a "POINT (674169.89 198051.619820510769063)" reducePrecision .00001
LINESTRING (700000 199999.99999999997, 700000 1400000)

Cause

There are two causes of this problem:

  • computing the scale factor as the reciprocal of the grid size may be inexact for some values (i.e. the reciprocal of a fractional power of 10 may not be an exact integral value)
  • the arithmetic used to scale values is subject to rounding issues when non-integral values are used:
return util::round(val / gridSize) * gridSize;

Because negative powers of 10 cannot be represented exactly in binary FP numbers this can cause the resulting "rounded" ordinate values to be inexact.

In contrast, if an integral scale factor is used directly, the rounding arithmetic is:

return util::round(val * scale) / scale;

This produces the desired result:

bin/geosop -a "LINESTRING (657035.913 6475590.114,657075.57 6475500)" reducePrecision 1000   

LINESTRING (657035.913 6475590.114, 657075.57 6475500)

It appears that in FP arithmetic multiplying by an integer value is more accurate than dividing by an (approximately equal) fractional value

Proposed Fix

This PR proposes a fix which has no impact on the GEOS API or semantics (and hence on downstream usage).

The fix does two things:

  • It detects when the PrecisionModel scale factor OR the equivalent grid size is "essentially integral" (i.e. very close to an integer). If one of these is the case, then the exact integer value of grid size or scale factor is used.
  • The arithmetic to adjust ordinate values uses the larger of the scale factor and grid size. This ensures that an integral grid size or scale factor is used if present (which is almost always the case). This produces a more exact scaled value (thanks to the magic voodoo of FP arithmetic).

Almost all (or all?) uses of GEOS_setPrecision use either grid sizes equivalent to integral scale factors OR grid sizes which are obviously NOT integral (e.g. say 0.5), so the heuristic of deciding "close" is fine for practical purposes.

In the very rare cases where both grid size and scale factor are fractional, the original values are used for scaling. Since they are fractional already, the output won't be exact in any case.

As a nice side effect, this fix allows dropping support for negative scale factors, which was introduced solely to accommodate the use of grid size rather than scale factor. (This PR will be extended to delete this code if that is agreed to).

@dr-jts
Copy link
Contributor Author

dr-jts commented Sep 9, 2023

I wonder if the inverse problem may occur as well: an integral grid size (e.g. 1000) would produce undesired "slop" when converted to a reciprocal scale factor?

This might require building a fuzzer to test a lot of values to see if this occurs. It's possible that the guard digits in FP arithmetic will handle this problem.

@robe2 robe2 requested a review from strk September 9, 2023 17:59
capi/geos_ts_c.cpp Outdated Show resolved Hide resolved
Copy link
Member

@pramsey pramsey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks OK to me. How do we feel about the behaviour when gridSize is > 0 (1, 10, 100, etc)?

@dr-jts
Copy link
Contributor Author

dr-jts commented Sep 11, 2023

How do we feel about the behaviour when gridSize is > 0 (1, 10, 100, etc)?

It seems to work. But perhaps should only do the integral test when the gridSize is < 1.

If the gridSize > 1, I haven't seen this produce rounding issues. Perhaps that's because the scaling is cutting off digits before the decimal point, so the FP arithmetic works better?

@strk
Copy link
Member

strk commented Sep 12, 2023

Seems ok to me, now I wonder how results compare with ST_SnapToGrid

@dr-jts
Copy link
Contributor Author

dr-jts commented Sep 12, 2023

I wonder how results compare with ST_SnapToGrid

ST_SnapToGrid has the same issue - see PG-5425 and PG-3929. That PostGIS code could benefit from a similar fix.

I will add that data as a test case.

Update

Indeed, this is exactly the same behaviour, and the new code will fix it.
(Note that using a negative value forces geosop reducePrecision to treat the value as a gridSize rather than a scale factor, which ends up using the scaling arithmetic that produces rounding errors).

bin/geosop -a "LINESTRING(674169.89 198051.38,674197.7 198065.55,674200.36 198052.38)" reducePrecision N-0.001  
-->  
LINESTRING (674169.89 198051.38, 674197.7000000001 198065.55000000002, 674200.36 198052.38)

bin/geosop -a "LINESTRING(674169.89 198051.38,674197.7 198065.55,674200.36 198052.38)" reducePrecision 1000
-->
LINESTRING (674169.89 198051.38, 674197.7 198065.55, 674200.36 198052.38)

@dr-jts dr-jts force-pushed the fix-setprecision-scale-factor branch from c5c0733 to 3d19930 Compare September 12, 2023 18:41
@mwtoews
Copy link
Contributor

mwtoews commented Sep 12, 2023

ST_SnapToGrid has the same issue - see PG-5425 and PG-3929. That PostGIS code could benefit from a similar fix.

Also recently reported on postgis-users.
As @pramsey noted a workaround for PostGIS is to use ST_AsText(g1, maxdecimaldigits = 3).
A similar workaround for GEOS is to use GEOSWKTWriter_setRoundingPrecision() / writer->setRoundingPrecision()

@mwtoews
Copy link
Contributor

mwtoews commented Sep 12, 2023

Another quirk that could be worth fixing here or in a separate issue is 0.001 or N-1000:

$ ./bin/geosop -a "LINESTRING (657035.913 6475590.114,657075.57 6475500)" reducePrecision 0.001
LINESTRING Z EMPTY

where the expected result should have been:

LINESTRING (657000 6476000, 657000 6476000)

And what should happen with N-100000000? LINESTRING EMPTY? or LINESTRING (0 0, 0 0)?

@dr-jts
Copy link
Contributor Author

dr-jts commented Sep 12, 2023

Also recently reported on postgis-users.

Good find. There are lots of examples of this occurring in ST_SnapToGrid! All solved by this fix.

@dr-jts
Copy link
Contributor Author

dr-jts commented Sep 12, 2023

Another quirk that could be worth fixing here or in a separate issue

@mwtoews This is a different problem, so should be a separate issue. (Is there already an issue for this?)

Technically zero-length linestrings are invalid, but it seems good to bend the rules in this case.

what should happen with N-100000000? LINESTRING EMPTY? or LINESTRING (0 0, 0 0)?

Probably the latter - so as to not lose information.

@dr-jts
Copy link
Contributor Author

dr-jts commented Sep 13, 2023

How do we feel about the behaviour when gridSize is > 0 (1, 10, 100, etc)?

I have found a case where a large gridsize (100,000) causes rounding error when using the reciprocal scale factor.

Using the gridsize 100,000 directly works:

bin/geosop -a "LINESTRING(674169.89 198051.619820510769063, 674197.71234 1448065.55674200)" -f wkt reducePrecision N-100000
LINESTRING (700000 200000, 700000 1400000)

Using the reciprocal scale factor of 0.00001 causes rounding error:

bin/geosop -a "LINESTRING(674169.89 198051.619820510769063, 674197.71234 1448065.55674200)" -f wkt reducePrecision .00001
LINESTRING (700000 199999.99999999997, 700000 1400000)

To resolve this, gridsizes > 1 need to be computed using the more stable arithmetic formula:

return util::round(val / gridSize) * gridSize;

Also, it's best to move the integral rounding logic into PrecisionModel, so it is widely available.

capi/geos_ts_c.cpp Outdated Show resolved Hide resolved
@dr-jts dr-jts force-pushed the fix-setprecision-scale-factor branch from 88909c1 to 1bede1e Compare September 20, 2023 19:47
@dr-jts dr-jts merged commit a423241 into libgeos:main Sep 20, 2023
27 checks passed
@dr-jts dr-jts deleted the fix-setprecision-scale-factor branch September 20, 2023 22:11
@dr-jts dr-jts changed the title Improve scale handling for GEOS_setPrecision Improve scale handling for PrecisionModel Sep 20, 2023
dr-jts added a commit that referenced this pull request Sep 22, 2023
@dr-jts
Copy link
Contributor Author

dr-jts commented Dec 6, 2023

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants