optimisation of grayscale morphology operators #598
Conversation
next i'm changing the …
What about this? Similar performance. I'm a fan of:

```rust
let elements = image
    .enumerate_pixels()
    .filter(|(_, _, &p)| p[0] != 0)
    .map(|(x, y, _)| (x as i16 - center_x as i16, y as i16 - center_y as i16))
    .collect();
Self { elements }
```
thanks a ton ! it is just as fast, if not slightly faster (maybe 5-10 microseconds faster on average, but the measurements overlapped)
i am not super knowledgeable on all the …
i think i probably should look into the other constructors (they all make masks of the same size)
The new release is scheduled for May 14th, if anything. I think you will make it in time.
pretty happy with this. going to sleep. Tomorrow i'll work on making the disk mask code clearer, and then i'll move on to …
@cospectrum if you don't mind, i would really appreciate your take on my last commit, i kinda struggled with making …
initial state of benches :
it is, but it's 3-4 times slower
what i did there was: instead of just checking whether each pixel is inside or outside the disk, i computed the edges of the disk and then filled it in. it is messier, but much faster. i would like to make it less messy without necessarily sacrificing speed
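For readers following along, a minimal sketch of the "compute the edges, then fill" idea described above (the function name and exact formula are illustrative assumptions, not the PR's actual code): for each row of the disk, the half-width follows from the circle equation, so a whole run of points can be emitted without testing every pixel individually.

```rust
// Hypothetical sketch, not the PR's implementation: for each row y of the
// disk, find the largest x with x^2 + y^2 <= r^2 once, then emit the whole
// horizontal run, instead of checking the inequality for every pixel.
fn disk_points(radius: i16) -> Vec<(i16, i16)> {
    let r2 = (radius as i32) * (radius as i32);
    let mut elements = Vec::new();
    for y in -radius..=radius {
        // half-width of the disk on this row, from the circle equation
        let half_width = ((r2 - (y as i32) * (y as i32)) as f64).sqrt() as i16;
        for x in -half_width..=half_width {
            elements.push((x, y));
        }
    }
    elements
}
```

This also produces the points row by row, which matches the row-major ordering discussed later in the thread.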
src/morphology.rs (Outdated)

```rust
let range = -(radius as i16)..=(radius as i16);
Self {
    elements: range
let radius_squared = (radius as u32) * (radius as u32);
```
Is there a reason why `clippy::cast_lossless` is an allowed lint? It's quite confusing when reading a lot of casts from `u8` to bigger types like `u16` and `u32` when such a cast should be lossless. It makes it harder to spot lossy casts when checking correctness.
i know very little about clippy, i have no idea. i just run clippy and do what it tells me, because it has always made sense so far. if i should change anything please tell me. in this specific function, i cast the `i16`s as `u32`s to prevent potential overflow from the squaring
Apologies, my question was more about this project's clippy settings and not your PR (it just happens to be the first PR I've noticed it on). If you're interested, though, you can read about the lint here: https://rust-lang.github.io/rust-clippy/master/index.html#cast_lossless. In this scenario I would recommend using `u32::from(radius)` or `.into()` methods over explicit casts. Ideally, clippy would enforce this for us, which was my wider point, although that's not related to this particular PR.
thanks, that makes a lot of sense, i've tried to use as many `from`s and `into()`s as possible.

my next commit will take a little while because i've been working on the optimization of `erode` and `dilate` too.
> Is there a reason why `clippy::cast_lossless` is an allowed lint? It's quite confusing when reading a lot of casts from `u8` to bigger types like `u16` and `u32` when such a cast should be lossless. It makes it harder to spot lossy casts when checking correctness.
I think the reasons are again the same as in the case of unused imports. Someone wrote code a long time ago that violates the rules, and it will take some time to rewrite it. Personally, I would remove all `allow`s (in lib.rs), and where the rules still need to be violated, let it be localized to a function or module.
If that is the case, I 100% agree with your approach. I can draft a PR to make that change, but it would cause quite a lot of code churn and might create merge conflicts with some in-flight PRs. Perhaps after the next release, if the PRs die down a bit, we can give it a shot.
the new commit is faster by about 20-30% for big images, but it is utterly unreadable. i will rework it to make it even faster and significantly more readable. also, i did not know about the minimum supported version, i will rewrite the offending code
You can write simple reference implementations (in …
new benches for …
ok, so it all seems good, just 2 things i was thinking about : …
then it should all be ready to merge
i did the refactor. if anyone has a better function name idea, i'm all ears; otherwise, i think this could be good to merge. it's not user-facing, so it's not a big deal
Excellent work!
@ripytide this was not part of my intended code, i added it due to a suggestion by @cospectrum, and i think it was a mistake on my part.

the code that i want to make guarantees that for all …

this shall be true for all …
maybe i could add tests that check the conditions are always met ?
Ah okay, that makes sense. Since there are tests already, I would expect them to fail if any of the constructors stopped abiding by the correct ordering, so you probably don't need to write explicit tests for checking the ordering, as that is just an implementation detail.
well then it should be good as-is. i could add a comment above the initializations, something like:

```rust
// direct initialization stands as guarantee that all conditions mentioned
// by the doc are fulfilled for the struct and each of its parts
Self { elements }
```
Why did you remove …
@cospectrum because it didn't do anything and was confusing. each constructor guarantees the struct it returns is valid in its own way.
replacing …
i think i have an alternative solution that may help documentation, to make clearer what's happening:

```rust
/// new_unchecked creates a new mask without doing any form of data validation
///
/// # Safety
///
/// this method is not user-facing and marked `unsafe` because it may lead
/// to invalid state if called with the wrong arguments.
///
/// By calling new_unchecked, you guarantee that :
/// - all the integer values in `elements` will be strictly between -512 and 512
/// - the maximum L_inf distance between any 2 points in `elements` is 512
/// - all Points in `elements` are sorted in reverse lexicographic order, line by line ((-1,-1),(0,-1),(1,-1),(-1,0),(0,0),...)
/// - no point appears twice in `elements`
unsafe fn new_unchecked(elements: Vec<Point<i16>>) -> Self {
    Self { elements }
}
```
none of it is user-facing, so it's not a huge deal, but it probably would be helpful for maintainability to have the danger laid out by the …
None of your constructors do real validation.
It does 1 check. One is better than nothing. And it may have more in the future.
That's why `debug_assert` exists
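A hedged sketch of the `debug_assert!` idea: the invariants from the doc comment discussed above can be checked in debug builds at no release-mode cost. The function name is illustrative, and the checks mirror two of the listed guarantees (bounds and line-by-line ordering), not the PR's actual code.

```rust
// Illustrative only: validate invariants in debug builds, free in release.
fn new_checked_in_debug(elements: Vec<(i16, i16)>) -> Vec<(i16, i16)> {
    // all coordinates strictly between -512 and 512
    debug_assert!(elements
        .iter()
        .all(|&(x, y)| x.abs() < 512 && y.abs() < 512));
    // sorted line by line, i.e. strictly increasing (y, x) keys,
    // which also rules out duplicate points
    debug_assert!(elements
        .windows(2)
        .all(|w| (w[0].1, w[0].0) < (w[1].1, w[1].0)));
    elements
}
```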
so you would prefer a … i'm just trying to make the code as good as possible, following the advice and criticisms i'm given; this is all work in progress.
Thanks @Morgane55440 ! It'll be interesting to see if we can copy the `chunks` approach in other functions for decent speed-ups and less unsafe code.
This PR has been made following the issue #597 : optimization of grayscale morphology operators.

it aims to work on the internal implementation of the `Mask` and associated functions to optimize the execution time, as well as add parallel versions of some public functions using rayon.

in the first commit that i have made, i have added benches for the `Mask` creation methods and have reworked the implementation of the `from_image` methods, which allowed a 4x speed increase (200 ms to 50 ms) for a large image (200 by 200 pixels). i would expect the improvement to be less significant for smaller images, although i have not tested it.

it was brought to my attention that i was accessing image pixels in col-major order. That was due to a misunderstanding of the image layout on my part. In light of this, i have reworked the implementation of the `Mask` to facilitate and incentivize access in row-major order by storing the positions row by row. i have also been advised to change `(i16, i16)` to `Point` for readability, which i will do in a further commit.

Finally, through my testing, i tried to use `unsafe_get_pixel` to optimize my `get_pixel` calls. While it was very efficient, i found that accessing the buffer directly and using the `chunks` method was faster by around 17% (~65 to ~50 microseconds). i tested it several times, and the slowest `chunks` time was faster than the fastest `unsafe_get_pixel` by 10 ms.

i believe that it might be interesting to look at for other uses of `unsafe_get_pixel` which traverse the image line by line, as it potentially could be the case that directly traversing a `[u8]` might allow the compiler to do better optimizations than when these accesses are obfuscated through `unsafe_get_pixel`.
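A minimal sketch of the row-major buffer traversal described above (the function and what it computes are made-up examples, not the PR's code): borrow the raw `[u8]` buffer of a single-channel 8-bit image and walk it row by row with `chunks_exact`, so the compiler sees plain slice iteration instead of per-pixel accessor calls.

```rust
// Illustrative only: count non-zero pixels per row by slicing the raw
// grayscale buffer into rows of `width` bytes, no per-pixel get_pixel calls.
// With the image crate, `buffer` could come from GrayImage::as_raw().
fn count_nonzero_per_row(buffer: &[u8], width: usize) -> Vec<usize> {
    buffer
        .chunks_exact(width) // one chunk per image row, in row-major order
        .map(|row| row.iter().filter(|&&p| p != 0).count())
        .collect()
}
```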