Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

optimisation of grayscale morphology operators #598

Merged

Conversation

Morgane55440
Copy link
Contributor

@Morgane55440 Morgane55440 commented May 8, 2024

This PR has been made following the issue #597 : optimization of grayscale morphology operators

it aims to work on the internal implementation of the Mask and associated functions to optimize the execution time, as well as add parallel versions of some public functions using rayon.

in the first commit that i have made, i have added bench for the Mask creation methods and have reworked the implementation of the from_image methods, which allowed a 4x speed increase (200 ms to 50ms) for a large image (200 by 200 pixels). i would expect the improvement to be less significant for smaller images, although i have not tested it.

it was brought to my attention that i was accessing image pixels in col-major order. That was due to a misunderstanding of the image layout from my part. In light of this, i have reworked the implementation of the Mask to facilitate and incentivize access in row-major order by storing the positions row by row.

i have also been advised to change (i16, i16) to Point for readability, which i will do in a further commit.

Finally, through my testing, i tried to use unsafe_get_pixel to optimize my get_pixel calls. While it was very efficient, i found that accessing the buffer directly and using the chunks method was faster by around 17% (~65 to ~50microsecond ). i tested it several times, and the slowest chunks time was faster than the fastest unsafe_get_pixel by 10 ms.

i believe that it might be interesting to look at for other uses of unsafe_get_pixel which traverse the image line by line, as it potentially could be the case that directly traversing a [u8] might allow the compiler to do better optimizations than when these accesses are obfuscated through unsafe_get_pixel

@Morgane55440
Copy link
Contributor Author

next i'm changing the (i16, i16) to Point, benching grayscale_erode and grayscale_dilate, and testing different implementations

@cospectrum
Copy link
Contributor

cospectrum commented May 8, 2024

What about this? Similar performance. I'm a fan of image methods, the devs have already come up with effective access to pixels

let elements = image
            .enumerate_pixels()
            .filter(|(_, _, &p)| p[0] != 0)
            .map(|(x, y, _)| (x as i16 - center_x as i16, y as i16 - center_y as i16))
            .collect();
Self { elements }

@Morgane55440
Copy link
Contributor Author

Morgane55440 commented May 8, 2024

let elements = image
            .enumerate_pixels()
            .filter(|(_, _, &p)| p[0] != 0)
            .map(|(x, y, _)| (x as i16 - center_x as i16, y as i16 - center_y as i16))
            .collect();
Self { elements }

thanks a ton ! it is just as fast, if not slightly faster (seems maybe 5-10 microsecond faster on average, but there was overlap)

@Morgane55440
Copy link
Contributor Author

Morgane55440 commented May 8, 2024

i am not super knowledgeable on all the image methods, but i'll look into them more

@Morgane55440
Copy link
Contributor Author

i think i probably should look into the other constructors

running 4 tests
test morphology::benches::bench_grayscale_diamond_mask                                ... bench:     185,236 ns/iter (+/- 19,262)
test morphology::benches::bench_grayscale_disk_mask                                   ... bench:     112,540 ns/iter (+/- 36,400)
test morphology::benches::bench_grayscale_mask_from_image                             ... bench:      47,545 ns/iter (+/- 1,532)
test morphology::benches::bench_grayscale_square_mask                                 ... bench:      43,016 ns/iter (+/- 1,629)

(they all make masks of the same size)

@cospectrum
Copy link
Contributor

The new release is scheduled for May 14th, if anything. I think you will make it in time

src/morphology.rs Outdated Show resolved Hide resolved
@Morgane55440
Copy link
Contributor Author

running 4 tests
test morphology::benches::bench_grayscale_diamond_mask                                ... bench:      17,375 ns/iter (+/- 1,134)
test morphology::benches::bench_grayscale_disk_mask                                   ... bench:      33,289 ns/iter (+/- 3,166)
test morphology::benches::bench_grayscale_mask_from_image                             ... bench:      50,766 ns/iter (+/- 2,226)
test morphology::benches::bench_grayscale_square_mask                                 ... bench:      40,882 ns/iter (+/- 2,205)

pretty happy with this. going to sleep. Tomorrow i'll work on making the disk mask code clearer, and then i'll move on to erode and dilate

@Morgane55440
Copy link
Contributor Author

@cospectrum if you don't mind i would really appreciate your take on my last commit, i kinda struggled with making disk readable

@Morgane55440
Copy link
Contributor Author

Morgane55440 commented May 9, 2024

initial state of benches :

test morphology::benches::bench_grayscale_op_erode_big_image_diamond                  ... bench:  87,424,590 ns/iter (+/- 3,543,468)
test morphology::benches::bench_grayscale_op_erode_big_image_point                    ... bench:   2,013,255 ns/iter (+/- 65,107)
test morphology::benches::bench_grayscale_op_erode_medium_image_diamond               ... bench:   3,414,165 ns/iter (+/- 91,476)
test morphology::benches::bench_grayscale_op_erode_medium_image_large_square          ... bench: 133,544,210 ns/iter (+/- 30,206,688)
test morphology::benches::bench_grayscale_op_erode_medium_image_point                 ... bench:      81,142 ns/iter (+/- 7,243)
test morphology::benches::bench_grayscale_op_erode_small_image_diamond                ... bench:     242,079 ns/iter (+/- 58,845)
test morphology::benches::bench_grayscale_op_erode_small_image_large_square           ... bench:   6,978,410 ns/iter (+/- 267,406)
test morphology::benches::bench_grayscale_op_erode_small_image_point                  ... bench:       5,356 ns/iter (+/- 636)
test morphology::benches::bench_grayscale_op_dilate_big_image_diamond                 ... bench:  87,889,700 ns/iter (+/- 24,124,426)
test morphology::benches::bench_grayscale_op_dilate_big_image_point                   ... bench:   2,287,300 ns/iter (+/- 64,361)
test morphology::benches::bench_grayscale_op_dilate_medium_image_diamond              ... bench:   3,316,540 ns/iter (+/- 50,421)
test morphology::benches::bench_grayscale_op_dilate_medium_image_large_square         ... bench: 134,237,150 ns/iter (+/- 29,765,530)
test morphology::benches::bench_grayscale_op_dilate_medium_image_point                ... bench:      99,185 ns/iter (+/- 10,606)
test morphology::benches::bench_grayscale_op_dilate_small_image_diamond               ... bench:     213,802 ns/iter (+/- 5,383)
test morphology::benches::bench_grayscale_op_dilate_small_image_large_square          ... bench:   7,203,760 ns/iter (+/- 2,410,854)
test morphology::benches::bench_grayscale_op_dilate_small_image_point                 ... bench:       6,268 ns/iter (+/- 1,050)

@cospectrum
Copy link
Contributor

@cospectrum if you don't mind i would really appreciate your take on my last commit, i kinda struggled with making disk readable

The master version is much more beautiful.

@Morgane55440
Copy link
Contributor Author

@cospectrum if you don't mind i would really appreciate your take on my last commit, i kinda struggled with making disk readable

The master version is much more beautiful.

it is, but it's 3-4 times slower

@Morgane55440
Copy link
Contributor Author

what i did there was, instead of just checking if each pixel is inside or outside the disk, i computed the edges of the disk, and then filled it in.

it is messier, but much faster. i would like to maybe make it less messy without necessarily sacrificing speed

let range = -(radius as i16)..=(radius as i16);
Self {
elements: range
let radius_squared = (radius as u32) * (radius as u32);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason why clippy::cast_lossless is an allowed lint? It's quite confusing when reading a lot of casts from u8 to bigger types like u16 and u32 when such a cast should be lossless. It makes it harder to spot lossy casts when checking correctness.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i know very little about clippy, i have no idea. i just run clippy and do what it tells me because i has always made sense so far. if i should change anything please tell me. in this specific function, i cast the i16s as u32s to prevent potential overflow from the squaring

Copy link
Contributor

@ripytide ripytide May 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies my question was more about this projects clippy settings and not your PR (it just happens to be the first PR I've noticed it on), if you're interested though you can read about the lint here: https://rust-lang.github.io/rust-clippy/master/index.html#cast_lossless. In this scenario I would recommend using u32::from(radius) or using x.into() methods over explicit casts. Ideally, clippy would enforce this for us however which was my wider point although that's not related to this particular PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, that makes a lot of sense, i've tried to use as many froms and into()s as possible.

my next commit will take a little while because i've been working on the optimization of erode and dilate too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason why clippy::cast_lossless is an allowed lint? It's quite confusing when reading a lot of casts from u8 to bigger types like u16 and u32 when such a cast should be lossless. It makes it harder to spot lossy casts when checking correctness.

I think the reasons are again the same as in the case of unused imports. Someone wrote code a long time ago that violates the rules, and it will take some time to rewrite it. Personally, I would remove all allows (in lib.rs), and where the rules still need to be violated, let it be localized to a function or module.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If that is the case, I 100% agree with your approach. I can draft a PR to make that change but it would be quite a lot of code churn it might cause merge conflicts with some in-flight PRs. Perhaps after the next release if the PRs die down a bit we can give it a shot then.

@Morgane55440
Copy link
Contributor Author

Morgane55440 commented May 9, 2024

the new commit is faster by about 20-30% for big images, but it is utterly unreadable, i will rework it to make it even faster and significantly more readable

also, i did not know about the minimum supported version, will rewrite the offending code

@cospectrum
Copy link
Contributor

cospectrum commented May 10, 2024

You can write simple reference implementations (in mod tests) and test against them. They will also be useful in the future.

src/morphology.rs Outdated Show resolved Hide resolved
src/morphology.rs Outdated Show resolved Hide resolved
src/morphology.rs Outdated Show resolved Hide resolved
@Morgane55440
Copy link
Contributor Author

new benches for erode() and dilate() :

test morphology::benches::bench_grayscale_op_erode_big_image_diamond                  ... bench:   1,906,060 ns/iter (+/- 57,854)
test morphology::benches::bench_grayscale_op_erode_big_image_point                    ... bench:      71,312 ns/iter (+/- 28,722)
test morphology::benches::bench_grayscale_op_erode_medium_image_diamond               ... bench:     146,737 ns/iter (+/- 4,326)
test morphology::benches::bench_grayscale_op_erode_medium_image_large_square          ... bench:   8,119,980 ns/iter (+/- 1,010,798)
test morphology::benches::bench_grayscale_op_erode_medium_image_point                 ... bench:       4,376 ns/iter (+/- 3,639)
test morphology::benches::bench_grayscale_op_erode_small_image_diamond                ... bench:      42,703 ns/iter (+/- 3,684)
test morphology::benches::bench_grayscale_op_erode_small_image_large_square           ... bench:   1,232,370 ns/iter (+/- 361,772)
test morphology::benches::bench_grayscale_op_erode_small_image_point                  ... bench:       1,036 ns/iter (+/- 83)
test morphology::benches::bench_grayscale_op_dilate_big_image_diamond                 ... bench:   1,837,840 ns/iter (+/- 30,849)
test morphology::benches::bench_grayscale_op_dilate_big_image_point                   ... bench:      64,265 ns/iter (+/- 1,396)
test morphology::benches::bench_grayscale_op_dilate_medium_image_diamond              ... bench:     158,506 ns/iter (+/- 11,497)
test morphology::benches::bench_grayscale_op_dilate_medium_image_large_square         ... bench:   8,156,330 ns/iter (+/- 2,925,125)
test morphology::benches::bench_grayscale_op_dilate_medium_image_point                ... bench:       4,266 ns/iter (+/- 115)
test morphology::benches::bench_grayscale_op_dilate_small_image_diamond               ... bench:      41,219 ns/iter (+/- 777)
test morphology::benches::bench_grayscale_op_dilate_small_image_large_square          ... bench:   1,202,150 ns/iter (+/- 300,326)
test morphology::benches::bench_grayscale_op_dilate_small_image_point                 ... bench:       1,025 ns/iter (+/- 18)

@Morgane55440
Copy link
Contributor Author

Morgane55440 commented May 12, 2024

ok, so it all seems good, just 2 things i was thinking about :

  • first, small and simple : i think it would be good to refactor the erode and dilate code into a single function, something like
    per_pixel_mask_based_reduce(image: &GrayImage, mask: &Mask, neutral : u8, operator : Fn (u8,u8) -> u8)i'd like a second opinion to be sure, especially on the name
  • second, less simple, maybe not super relevant anymore : i mentioned adding parallel versions of the user-facing functions in the beginning of the PR. that wasn't the focus at first, but i think it could easily be completely finished by tomorrow. of course, it might make more sense to leave that for another PR and therefore another release

@Morgane55440
Copy link
Contributor Author

then it should all be ready to merge

@Morgane55440
Copy link
Contributor Author

Morgane55440 commented May 13, 2024

i did the refactor, if anyone has a better function name idea, i'm all ears, otherwise, i think this could be good to merge

it's not user-facing, so it's not a big deal

@Morgane55440 Morgane55440 marked this pull request as ready for review May 13, 2024 12:00
Copy link
Contributor

@ripytide ripytide left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent work!

src/morphology.rs Outdated Show resolved Hide resolved
src/morphology.rs Outdated Show resolved Hide resolved
src/morphology.rs Show resolved Hide resolved
src/morphology.rs Outdated Show resolved Hide resolved
src/morphology.rs Outdated Show resolved Hide resolved
src/morphology.rs Show resolved Hide resolved
Morgane55440 and others added 2 commits May 13, 2024 16:24
Co-authored-by: ripytide <62516857+ripytide@users.noreply.github.com>
Co-authored-by: ripytide <62516857+ripytide@users.noreply.github.com>
@Morgane55440
Copy link
Contributor Author

Morgane55440 commented May 13, 2024

@ripytide
all of the comments remaining are about Mask::new().

this was not part of my intended code, i added it due to a suggestion by @cospectrum , and i think it was a mistake on my part

You can add private new and call it instead of Self { elements } everywhere

fn new(elements: Vec<Point<i16>>) -> Self {
    assert!(elements.len() <= (511 * 511) as usize);
    Self { elements } 
}

the code that i want to make guarantees that for all Masks :
all the integer values will be strictly between -512 and 512

  • all tuples will be sorted in lexicographic order
  • all tuples will be sorted in reverse lexicographic order, line by line ((-1,-1),(0,-1),(1,-1),(-1,0),(0,0),...)
  • no tuple shall appear twice

this shall be true for all Masks, and is garanteed by the current implementation of all the user-facing constructor.
i think the new should just be removed. the creation of custiom maks is already possible through from_image

@Morgane55440
Copy link
Contributor Author

maybe i could add tests that check the conditions are always met ?

@ripytide
Copy link
Contributor

Ah okay, that makes sense. Since there are tests already I would expect them to fail if any of the constructors stopped abiding by the correct ordering so you probably don't need to write explicit tests for checking the ordering as that is just an implementation detail.

@Morgane55440
Copy link
Contributor Author

well then it should be good as-is

i could add a comment above the initializations, something like :

 // direct initialization stands as guarantee that all conditions mentioned by the doc are fulfilled for the struct and each of it's parts
Self { elements }

@Morgane55440
Copy link
Contributor Author

thanks a ton for typos @ripytide . i fixed the others it showed me, including some in distance_transform because i wrote them, they seemed relevant and there are no other PR on the subject atm

@cospectrum
Copy link
Contributor

Why did you remove new?

@Morgane55440
Copy link
Contributor Author

@cospectrum because it didn't do anything and was confusing.

each constructor guarantees the struct it returns is valid in it's own way.
the existence of the new method implied that it was validating the data.
it wasn't, and making it truly validate the data would require it to iterate through and check the whole vector, which would slow down all the constructors just to double check something that was already assured.

@Morgane55440
Copy link
Contributor Author

replacing Mystruct { my_data } by Mystruct::new(my_data ) is great, except if you're already in a constructor which is doing it's own validation

@Morgane55440
Copy link
Contributor Author

i think i have an alternative solution that may help documentation, to make clearer what's happening:

    /// new_unchecked creates a new mask without doing any form of data validation
    /// 
    /// # Safety
    /// 
    /// this method is not user-facing and marked `unsafe` because it may lead 
    /// to invalid state if called with the wrong arguments.
    /// 
    /// By calling new_unchecked, you guarantee that :
    /// - all the integer values in `elements` will be strictly between -512 and 512
    /// - the maximum L_inf distance between any 2 points in `elements` is 512
    /// - all Points in `elements` are sorted in reverse lexicographic order, line by line ((-1,-1),(0,-1),(1,-1),(-1,0),(0,0),...)
    /// - no point appears twice in `elements`
    unsafe fn new_unchecked(elements: Vec<Point<i16>>) -> Self {
        Self { elements }
    }

@Morgane55440
Copy link
Contributor Author

none of it is user-facing, so it's not a huge deal, but it probably would be helpful for maintainability to have the danger laid out by the unsafe block and all the documentation stored in the same place

@cospectrum
Copy link
Contributor

except if you're already in a constructor which is doing it's own validation

None of your constructors do real validation.
Comments prove nothing, asserts do

the existence of the new method implied that it was validating the data. It wasn't

It does 1 check. One is better than nothing. And it may have more in the future.

and making it truly validate the data would require it to iterate through and check the whole vector

That's why debug_assert exists

@Morgane55440
Copy link
Contributor Author

so you would prefer a new method with debug_asserts ? i can make it if you think it's the best way to guarantee safety.

i'm just trying to make code as good as possible, following the advice and criticisms i'm given, this is all work in progress.

@theotherphil theotherphil merged commit 03f6f5c into image-rs:master May 14, 2024
15 checks passed
@theotherphil
Copy link
Contributor

Thanks @Morgane55440 ! It’ll be interesting to see if we can copy the chunks approach in other functions for decent speed ups and less unsafe code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants