Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bugfix: prevent integer overflow in EllipseModel #5179

Merged
merged 4 commits into from
Jan 21, 2021

Conversation

mark-boer
Copy link
Contributor

Description

This is a bugfix to issue #5042. It makes sure that the EllipseModel and CircleModel do not cause integer overflows

Checklist

I have a question, while trying to implement a unit test I ran into another bug. The test is still implemented, but is marked with a pytest.mark.skip. How would you like to handle this?

Also, does the code need to pass any kind of static analysis? Like black or flake8?

For reviewers

  • Check that the PR title is short, concise, and will make sense 1 year
    later.
  • Check that new functions are imported in corresponding __init__.py.
  • Check that new features, API changes, and deprecations are mentioned in
    doc/release/release_dev.rst.

@pep8speaks
Copy link

pep8speaks commented Jan 10, 2021

Hello @mark-boer! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-01-21 02:12:32 UTC

@mark-boer
Copy link
Contributor Author

Found the reason the test I wrote is failing, the input data chosen, leads to a ridiculously small eigenvalue, which in turn led to significant floating point errors in the matrix inverse. It is probably a solution to first subtract the centroid of the input data and then add this centroid to the center of the Circle. Also I found that the algebraic method used, is not actually a proper least square fit. I'll create a new issue.

@mark-boer
Copy link
Contributor Author

The behaviour could be improved by using something like this:

A = np.append(xy * 2, np.ones((xy.shape[0], 1)), axis=1)
f = np.sum(xy ** 2, axis=1)
C, _, _, _ = np.linalg.lstsq(A, f, rcond=None)
center = C[0:2]
distances = scipy.spatial.minkowski_distance(center,xy)
r = np.sqrt(np.mean(distances ** 2))

See e.g. sphere fit i wrote a year ago, this contains 2 fits, one fast algebraic one and one iterative one:
https://github.com/mark-boer/geomfitty/blob/4f24071f524302a95144dbd91bc84fa33c5dc26b/geomfitty/fit3d.py#L42

@stefanv
Copy link
Member

stefanv commented Jan 17, 2021

Thanks for the fix, @mark-boer. I did a quick check and it looks like vstack does type promotion already. Could you explain the rationale behind the fix (casting the data to float)?

Before merging, we should remove the failing test and address that as part of your other issue.

@mark-boer
Copy link
Contributor Author

The vstack on which line do you mean? The EllipseModel overflows in the following line:

S1 = D1.T @ D1

For more info on what goes wrong see issue #5042.

@jni
Copy link
Member

jni commented Jan 19, 2021

@stefanv I think the point of the cast might be to keep things at the same float bit depth for both D1 and D2? (ie both float32 or both float64, depending on input dtype).

@jni
Copy link
Member

jni commented Jan 19, 2021

I agree with @stefanv that ideally the skipped test should instead get added in a separate PR that fixes the issue, but I'm not that fussed if we instead just merge this as-is and remove the skip marker in a follow-up PR.

@scikit-image/core any others up for review?

Thank you @mark-boer!

@mark-boer
Copy link
Contributor Author

@jni, I changed the input data so it cast to a float of at least 32 bits. But if the user chooses to call the function with a float64, it is not cast down to a float32. If that doesn't make sense, we can also just forcibly cast it to a float64.

I will remove the failing test this evening and create a new PR with the fix to this failing test.

@jni
Copy link
Member

jni commented Jan 19, 2021

I changed the input data so it cast to a float of at least 32 bits. But if the user chooses to call the function with a float64, it is not cast down to a float32. If that doesn't make sense, we can also just forcibly cast it to a float64.

Makes perfect sense, actually! I think the fact that dtype was set for the second vstack and not the first confused @stefanv upon reading the code. Took me a while as well after reading his comment. Perhaps adding a dtype= to the vstack calls, even if redundant, will make the code clearer? Or maybe just add a comment to the effect of what you just wrote?

@mark-boer
Copy link
Contributor Author

mark-boer commented Jan 19, 2021

The dtype, is actually an argument of the np.ones, even though this is not really necessary , I thought that saved an additional cast.

edit: I will add a comment to the code 😉 that explains the why

x = data[:, 0]
y = data[:, 1]

# Quadratic part of design matrix [eqn. 15] from [1]
D1 = np.vstack([x ** 2, x * y, y ** 2]).T
# Linear part of design matrix [eqn. 16] from [1]
D2 = np.vstack([x, y, np.ones(len(x))]).T
D2 = np.vstack([x, y, np.ones(len(x), dtype=float_type)]).T
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about

Suggested change
D2 = np.vstack([x, y, np.ones(len(x), dtype=float_type)]).T
D2 = np.vstack([x, y, np.ones_like(x)]).T

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's even better :-)

Copy link
Member

@stefanv stefanv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this is very readable now @mark-boer. I left some minor suggestions, but they're mostly stylistic.

skimage/measure/tests/test_fit.py Outdated Show resolved Hide resolved
skimage/measure/tests/test_fit.py Outdated Show resolved Hide resolved
skimage/measure/tests/test_fit.py Outdated Show resolved Hide resolved
mark-boer and others added 2 commits January 21, 2021 01:02
Co-authored-by: Stefan van der Walt <sjvdwalt@gmail.com>
Co-authored-by: Stefan van der Walt <sjvdwalt@gmail.com>
Copy link
Member

@jni jni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just gonna approve this some more for good measure. =P

@stefanv stefanv merged commit 9b05cac into scikit-image:master Jan 21, 2021
@stefanv
Copy link
Member

stefanv commented Jan 21, 2021

Thank you very much, @mark-boer

rfezzani added a commit that referenced this pull request Jan 22, 2021
* resize using scipy.ndimage.zoom

* Clip negative values in output of RGB to HED conversion. Fixes #5164

* Fixed stain separation tests.

- Round trips start with stains, such that the roundtrip does not require
negative values.
- Using a blue+red+orange stain combination, the test previously used
a 2-stain combination that made it impossible to do a correct roundtrip.

* Improve IHC stain separation example.

* BENCH: add benchmarks for resize and rescale

* move mode conversion code to _shared.utils.py

use grid-constant and grid-wrap for SciPy >= 1.6.0

closes gh-5064

* separate grid mode conversion utility

* Update doc/examples/color_exposure/plot_ihc_color_separation.py

Improvement to the IHC example.

Co-authored-by: Riadh Fezzani <rfezzani@gmail.com>

* add TODO for removing legacy code paths

* avoid pep8 warning in benchmark_interpolation.py

* keep same ValueError on unrecognized modes

* test_rank.py: fetch reference data once into a dictionary of arrays (#5175)

This should avoid stochastic test failures that have been occuring on GitHub Actions

* Add normalized mutual information metric (#5158)

* Add normalized mutual information metric

* Add tests for NMI

* Add NMI to metrics init

* Fix error message for incorrect target shape

* Add 3D NMI test

* properly add nmi to metrics init

* _true, _test -> 0, 1

* ravel -> reshape(-1)

* Apply suggestions from @mkcor's code review

Co-authored-by: Marianne Corvellec <marianne.corvellec@ens-lyon.org>

* Fix warning and exception messages

Co-authored-by: Marianne Corvellec <marianne.corvellec@ens-lyon.org>

* segment Other section of TODO by date

based on NEP-29 24 month window for dropping upstream package version

* Fix bug in optical flow code and examples

mode 'nearest' is a scipy.ndimage mode, but warp only accepts the numpy.pad equivalent of 'edge'

* fix additional cases of warp with 'nearest'->'edge'

* DOC: Fixup to linspace in example (#5181)

Right now the data is not sampled every half degree. Also, -pi/2 to pi/2.

* Minor fixups to Hough line transform code and examples (#5182)

* Correction to linspace call in hough_line default

`theta = np.pi / 2 - np.arange(180) / 180.0 * np.pi` done correctly below.

* BUG: Fixup to image orientation in example

* DOC: Tiny typo

* Changed range to conform to documentation

The original was [pi/2, -pi/2), not [-pi/2, pi/2). Updated wording in docs.

* Fixed x-bounds of hough image in example

* Fixed long line

* Fixes to tests

Number of lines in checkerboard, longer line in peak ordering assertion, updated angle

* Accidentally modified the wrong value.

* Updated example to include three lines

As requested in review

* Fixed off-by one error in pixel bins in hough line transform (#5183)

* Added 1/2 pixel bounds to extent of displayed images (#5184)

* fast, non-Cython implementation for correlate_sparse (#5171)

* Replace correlate_sparse Cython version with a simple slicing-based variant in _mean_std

This slicing variant has two benefits:
1.) profiled 30-40% faster on 2d and 3d cases I tested with
2.) can immediately be used by other array backends such as CuPy

* remove intermediate padded_sq variable to reduce memory footprint

* Update skimage/filters/_sparse.py

remove duplicate declarations

Co-authored-by: Riadh Fezzani <rfezzani@gmail.com>

* move _to_np_mode and _to_ndimage_mode to _shared.utils.py

* add comment about corner_index needing to be present

* raise RuntimeError on missing corner index

* update code style based on reviewer comments

Co-authored-by: Riadh Fezzani <rfezzani@gmail.com>

* Add release step on github to RELEASE.txt (See #5185) (#5187)

Co-authored-by: Juan Nunez-Iglesias <juan.nunez-iglesias@monash.edu>

* Prevent integer overflow in EllipseModel (#5179)

* Change dtype of CircleModel and EllipseModel to float, to avoid integer overflows

* remove failing test and add code review notes

* Apply suggestions from code review

Co-authored-by: Stefan van der Walt <sjvdwalt@gmail.com>

* Remove unused pytest import

Co-authored-by: Stefan van der Walt <sjvdwalt@gmail.com>

Co-authored-by: Stefan van der Walt <sjvdwalt@gmail.com>
Co-authored-by: Juan Nunez-Iglesias <juan.nunez-iglesias@monash.edu>

* Add saturation parameter to color.label2rgb (#5156)

* Remove reference to opencv in threshold_local documentation (#5191)

Co-authored-by: Marianne Corvellec <marianne.corvellec@ens-lyon.org>
Co-authored-by: Riadh <r.fezzani@vitadx.com>

* resize using scipy.ndimage.zoom

* BENCH: add benchmarks for resize and rescale

* move mode conversion code to _shared.utils.py

use grid-constant and grid-wrap for SciPy >= 1.6.0

closes gh-5064

* separate grid mode conversion utility

* add TODO for removing legacy code paths

* avoid pep8 warning in benchmark_interpolation.py

* keep same ValueError on unrecognized modes

* segment Other section of TODO by date

based on NEP-29 24 month window for dropping upstream package version

* Fix bug in optical flow code and examples

mode 'nearest' is a scipy.ndimage mode, but warp only accepts the numpy.pad equivalent of 'edge'

* fix additional cases of warp with 'nearest'->'edge'

* Apply suggestions from code review

Co-authored-by: Riadh Fezzani <rfezzani@gmail.com>

Co-authored-by: Cris Luengo <cris.l.luengo@gmail.com>
Co-authored-by: Cris Luengo <crisluengo@users.noreply.github.com>
Co-authored-by: Riadh Fezzani <rfezzani@gmail.com>
Co-authored-by: Alexandre de Siqueira <alex.desiqueira@igdore.org>
Co-authored-by: Juan Nunez-Iglesias <juan.nunez-iglesias@monash.edu>
Co-authored-by: Marianne Corvellec <marianne.corvellec@ens-lyon.org>
Co-authored-by: Joseph Fox-Rabinovitz <madphysicist@users.noreply.github.com>
Co-authored-by: François Boulogne <fboulogne@sciunto.org>
Co-authored-by: Mark Boer <m.h.boer.2@gmail.com>
Co-authored-by: Stefan van der Walt <sjvdwalt@gmail.com>
Co-authored-by: Carlos Andrés Álvarez Restrepo <charlie_cha@outlook.com>
Co-authored-by: Riadh <r.fezzani@vitadx.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants