optimisation of grayscale morphology operators #598
Conversation
next i'm changing the …
What about this? Similar performance. I'm a fan of:

```rust
let elements = image
    .enumerate_pixels()
    .filter(|(_, _, &p)| p[0] != 0)
    .map(|(x, y, _)| (x as i16 - center_x as i16, y as i16 - center_y as i16))
    .collect();
Self { elements }
```
thanks a ton ! it is just as fast, if not slightly faster (maybe 5-10 microseconds faster on average, but the measurements overlapped)
i am not super knowledgeable on all the …
i think i probably should look into the other constructors (they all make masks of the same size)
The new release is scheduled for May 14th, if anything. I think you will make it in time.
pretty happy with this. going to sleep. Tomorrow i'll work on making the disk mask code clearer, and then i'll move on to …
@cospectrum if you don't mind, i would really appreciate your take on my last commit, i kinda struggled with making …
initial state of benches :
it is, but it's 3-4 times slower
what i did there was: instead of just checking whether each pixel is inside or outside the disk, i computed the edges of the disk and then filled it in. it is messier, but much faster. i would like to make it less messy without necessarily sacrificing speed
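For readers following along, a minimal sketch of the "compute the edges, then fill" idea described above (the function name and exact formula are illustrative assumptions, not the PR's actual code): for each row of the disk, the half-width follows from the circle equation, so a whole run of points can be emitted without testing every pixel individually.

```rust
// Hypothetical sketch, not the PR's implementation: for each row y of the
// disk, find the largest x with x^2 + y^2 <= r^2 once, then emit the whole
// horizontal run, instead of checking the inequality for every pixel.
fn disk_points(radius: i16) -> Vec<(i16, i16)> {
    let r2 = (radius as i32) * (radius as i32);
    let mut elements = Vec::new();
    for y in -radius..=radius {
        // half-width of the disk on this row, from the circle equation
        let half_width = ((r2 - (y as i32) * (y as i32)) as f64).sqrt() as i16;
        for x in -half_width..=half_width {
            elements.push((x, y));
        }
    }
    elements
}
```

This also produces the points row by row, which matches the row-major ordering discussed later in the thread.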
src/morphology.rs (Outdated)

```rust
let range = -(radius as i16)..=(radius as i16);
Self {
    elements: range
let radius_squared = (radius as u32) * (radius as u32);
```
Is there a reason why `clippy::cast_lossless` is an allowed lint? It's quite confusing when reading a lot of casts from `u8` to bigger types like `u16` and `u32` when such a cast should be lossless. It makes it harder to spot lossy casts when checking correctness.
i know very little about clippy, i have no idea. i just run clippy and do what it tells me, because it has always made sense so far. if i should change anything please tell me. in this specific function, i cast the `i16`s as `u32`s to prevent potential overflow from the squaring
Apologies, my question was more about this project's clippy settings and not your PR (it just happens to be the first PR I've noticed it on). If you're interested, though, you can read about the lint here: https://rust-lang.github.io/rust-clippy/master/index.html#cast_lossless. In this scenario I would recommend using `u32::from(radius)` or `.into()` methods over explicit casts. Ideally, clippy would enforce this for us, which was my wider point, although that's not related to this particular PR.
thanks, that makes a lot of sense, i've tried to use as many `from`s and `into()`s as possible.

my next commit will take a little while because i've been working on the optimization of `erode` and `dilate` too.
> Is there a reason why `clippy::cast_lossless` is an allowed lint? It's quite confusing when reading a lot of casts from `u8` to bigger types like `u16` and `u32` when such a cast should be lossless. It makes it harder to spot lossy casts when checking correctness.
I think the reasons are again the same as in the case of unused imports. Someone wrote code a long time ago that violates the rules, and it will take some time to rewrite it. Personally, I would remove all `allow`s (in lib.rs), and where the rules still need to be violated, let it be localized to a function or module.
If that is the case, I 100% agree with your approach. I can draft a PR to make that change, but it would cause quite a lot of code churn and might create merge conflicts with some in-flight PRs. Perhaps after the next release, if the PRs die down a bit, we can give it a shot.
the new commit is faster by about 20-30% for big images, but it is utterly unreadable. i will rework it to make it even faster and significantly more readable. also, i did not know about the minimum supported version, i will rewrite the offending code
You can write simple reference implementations (in …
new benches for …
ok, so it all seems good, just 2 things i was thinking about : …
then it should all be ready to merge
i did the refactor. if anyone has a better function name idea, i'm all ears; otherwise, i think this could be good to merge. it's not user-facing, so it's not a big deal
Excellent work!
@ripytide this was not part of my intended code, i added it due to a suggestion by @cospectrum, and i think it was a mistake on my part.

the code that i want to make guarantees that for all …

this shall be true for all …
maybe i could add tests that check the conditions are always met ?
Ah okay, that makes sense. Since there are tests already, I would expect them to fail if any of the constructors stopped abiding by the correct ordering, so you probably don't need to write explicit tests for checking the ordering, as that is just an implementation detail.
well then it should be good as-is. i could add a comment above the initializations, something like:

```rust
// direct initialization stands as guarantee that all conditions mentioned
// by the doc are fulfilled for the struct and each of its parts
Self { elements }
```
Why did you remove …
@cospectrum because it didn't do anything and was confusing. each constructor guarantees the struct it returns is valid in its own way.
replacing …
i think i have an alternative solution that may help documentation, to make clearer what's happening:

```rust
/// new_unchecked creates a new mask without doing any form of data validation
///
/// # Safety
///
/// this method is not user-facing and marked `unsafe` because it may lead
/// to invalid state if called with the wrong arguments.
///
/// By calling new_unchecked, you guarantee that :
/// - all the integer values in `elements` will be strictly between -512 and 512
/// - the maximum L_inf distance between any 2 points in `elements` is 512
/// - all Points in `elements` are sorted in reverse lexicographic order, line by line ((-1,-1),(0,-1),(1,-1),(-1,0),(0,0),...)
/// - no point appears twice in `elements`
unsafe fn new_unchecked(elements: Vec<Point<i16>>) -> Self {
    Self { elements }
}
```
none of it is user-facing, so it's not a huge deal, but it probably would be helpful for maintainability to have the danger laid out by the …
None of your constructors do real validation.
It does 1 check. One is better than nothing. And it may have more in the future.
That's why `debug_assert` exists
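A hedged sketch of the `debug_assert!` idea: the invariants from the doc comment discussed above can be checked in debug builds at no release-mode cost. The function name is illustrative, and the checks mirror two of the listed guarantees (bounds and line-by-line ordering), not the PR's actual code.

```rust
// Illustrative only: validate invariants in debug builds, free in release.
fn new_checked_in_debug(elements: Vec<(i16, i16)>) -> Vec<(i16, i16)> {
    // all coordinates strictly between -512 and 512
    debug_assert!(elements
        .iter()
        .all(|&(x, y)| x.abs() < 512 && y.abs() < 512));
    // sorted line by line, i.e. strictly increasing (y, x) keys,
    // which also rules out duplicate points
    debug_assert!(elements
        .windows(2)
        .all(|w| (w[0].1, w[0].0) < (w[1].1, w[1].0)));
    elements
}
```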
so you would prefer a … i'm just trying to make the code as good as possible, following the advice and criticisms i'm given; this is all work in progress.
Thanks @Morgane55440 ! It'll be interesting to see if we can copy the `chunks` approach in other functions for decent speed-ups and less unsafe code.
This PR has been made following the issue #597 : optimization of grayscale morphology operators.

it aims to work on the internal implementation of the `Mask` and associated functions to optimize the execution time, as well as add parallel versions of some public functions using rayon.

in the first commit that i have made, i have added benches for the `Mask` creation methods and have reworked the implementation of the `from_image` methods, which allowed a 4x speed increase (200 ms to 50 ms) for a large image (200 by 200 pixels). i would expect the improvement to be less significant for smaller images, although i have not tested it.

it was brought to my attention that i was accessing image pixels in col-major order. That was due to a misunderstanding of the image layout on my part. In light of this, i have reworked the implementation of the `Mask` to facilitate and incentivize access in row-major order by storing the positions row by row. i have also been advised to change `(i16, i16)` to `Point` for readability, which i will do in a further commit.

Finally, through my testing, i tried to use `unsafe_get_pixel` to optimize my `get_pixel` calls. While it was very efficient, i found that accessing the buffer directly and using the `chunks` method was faster by around 17% (~65 to ~50 microseconds). i tested it several times, and the slowest `chunks` time was faster than the fastest `unsafe_get_pixel` by 10 ms.

i believe that it might be interesting to look at for other uses of `unsafe_get_pixel` which traverse the image line by line, as it potentially could be the case that directly traversing a `[u8]` might allow the compiler to do better optimizations than when these accesses are obfuscated through `unsafe_get_pixel`.
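A minimal sketch of the row-major buffer traversal described above (the function and what it computes are made-up examples, not the PR's code): borrow the raw `[u8]` buffer of a single-channel 8-bit image and walk it row by row with `chunks_exact`, so the compiler sees plain slice iteration instead of per-pixel accessor calls.

```rust
// Illustrative only: count non-zero pixels per row by slicing the raw
// grayscale buffer into rows of `width` bytes, no per-pixel get_pixel calls.
// With the image crate, `buffer` could come from GrayImage::as_raw().
fn count_nonzero_per_row(buffer: &[u8], width: usize) -> Vec<usize> {
    buffer
        .chunks_exact(width) // one chunk per image row, in row-major order
        .map(|row| row.iter().filter(|&&p| p != 0).count())
        .collect()
}
```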