
RFC: Rethink GenericImage #1118

Open
fintelia opened this issue Jan 24, 2020 · 20 comments


@fintelia
Contributor

Right now, most of the image algorithms are implemented for the GenericImage trait. I see a couple problems with this design:

  1. Without being able to assume a contiguous memory layout, it is hard to get good cache performance
  2. It makes doing dynamic dispatch for something like DynamicImage rather awkward, requiring a match statement to statically dispatch for each possible color type.
  3. Writing generic code frequently requires adding methods to the Pixel trait (like invert or to_rgba). This isn't viable for downstream users who can't add methods themselves.
  4. We rely heavily on inlining/optimization to counter the overhead of doing a function call per pixel in many algorithms.
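The dispatch awkwardness in point 2 can be seen in a minimal stand-alone sketch. The enum here is a simplified stand-in for the real DynamicImage (which has many more variants); the operation is a toy invert:

```rust
// Simplified stand-in for DynamicImage: every operation needs a match
// arm per color type just to reach a statically typed implementation.
enum DynamicImage {
    Luma8(Vec<u8>),
    Rgb8(Vec<u8>),
    Rgba8(Vec<u8>),
}

fn invert_samples(buf: &mut [u8]) {
    for s in buf {
        *s = 255 - *s;
    }
}

impl DynamicImage {
    fn invert(&mut self) {
        // One arm per color type, even though the bodies are identical:
        match self {
            DynamicImage::Luma8(buf) => invert_samples(buf),
            DynamicImage::Rgb8(buf) => invert_samples(buf),
            DynamicImage::Rgba8(buf) => invert_samples(buf),
        }
    }
}

fn main() {
    let mut img = DynamicImage::Rgb8(vec![0, 128, 255]);
    img.invert();
    match img {
        DynamicImage::Rgb8(buf) => assert_eq!(buf, vec![255, 127, 0]),
        _ => unreachable!(),
    }
    println!("ok");
}
```

With one match per method per algorithm, the boilerplate multiplies quickly, which is what motivates moving the dispatch behind a single trait.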

I propose we instead implement the various algorithms directly within a single trait that both statically and dynamically typed images can implement. We'll lose the ability for downstream users to add new pixel types (though that capability was dubious in practice since we really don't let users actually describe the meaning of pixel types), but should end up with a design that is both cleaner and hopefully more correct.

/// Encodes valid pairs of color type + space. `ColorSpace` can be 
/// crate-private to start, so we don't have to decide how it works 
/// right away.
pub struct PixelType(ColorType, ColorSpace);
impl PixelType {
    pub fn linear(ty: ColorType) -> ImageResult<Self> { ... }
    pub fn color_type(&self) -> ColorType { self.0 }
}

pub trait ContiguousImage {
    fn dimensions(&self) -> (u32, u32);
    fn pixel_type(&self) -> PixelType;
    fn data(&self) -> &[u64]; // u64 lets us transmute to other primitive types
    fn data_mut(&mut self) -> &mut [u64];

    // Provided methods...
    fn as_bytes(&self) -> &[u8] { bytemuck::cast_slice(self.data()) }
    fn as_bytes_mut(&mut self) -> &mut [u8] { bytemuck::cast_slice_mut(self.data_mut()) }
    fn convert_to(&self, new: PixelType) -> DynamicImage { ... }
    fn resize(&self, ...) -> DynamicImage { ... }
    fn blur(&mut self, radius: f32) { ... }
    fn rotate(&mut self, angle: f32) { ... }
    fn horizontal_convolution(&mut self, filter: &[f32]) { ... }
    fn vertical_convolution(&mut self, filter: &[f32]) { ... }
    fn filter3x3(&mut self, kernel: &[f32]) { ... }
    ...
}

struct DynamicImage {
    dimensions: (u32, u32),
    pixel_type: PixelType,
    data: Vec<u64>,
}
impl ContiguousImage for DynamicImage { ... }
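The key property of this design is that algorithms are written once as provided methods and work for any implementor. A minimal self-contained model (using `u8` samples instead of `u64` and omitting `bytemuck` to keep it runnable; all names follow the sketch above but are simplified):

```rust
// Minimal model of the proposal: required methods expose the layout,
// provided methods implement algorithms once for all implementors.
#[derive(Clone, Copy, PartialEq, Debug)]
enum ColorType { L8, Rgb8 }

trait ContiguousImage {
    fn dimensions(&self) -> (u32, u32);
    fn color_type(&self) -> ColorType;
    fn data(&self) -> &[u8];
    fn data_mut(&mut self) -> &mut [u8];

    // Provided method: written once, works for every implementor,
    // whether dynamically or statically typed.
    fn invert(&mut self) {
        for s in self.data_mut() {
            *s = 255 - *s;
        }
    }
}

struct DynamicImage {
    dimensions: (u32, u32),
    color_type: ColorType,
    data: Vec<u8>,
}

impl ContiguousImage for DynamicImage {
    fn dimensions(&self) -> (u32, u32) { self.dimensions }
    fn color_type(&self) -> ColorType { self.color_type }
    fn data(&self) -> &[u8] { &self.data }
    fn data_mut(&mut self) -> &mut [u8] { &mut self.data }
}

fn main() {
    let mut img = DynamicImage {
        dimensions: (1, 1),
        color_type: ColorType::Rgb8,
        data: vec![0, 128, 255],
    };
    img.invert();
    assert_eq!(img.data(), &[255u8, 127, 0][..]);
    println!("ok");
}
```

Note that `invert` never mentions a pixel type parameter: the dynamic image calls it directly, with no per-color-type match at the call site.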
@fintelia fintelia changed the title RFC: Rethink Pixel Types RFC: Rethink GenericImage Jan 24, 2020
@HeroicKatora
Member

HeroicKatora commented Jan 24, 2020

Should we make a project for planning for 0.24? Have you tried the GitHub feature already?

@fintelia fintelia added this to RFC in Version 0.24 Jan 24, 2020
@fintelia
Contributor Author

I haven't used it before, but it seems worth a try.

@birktj
Member

birktj commented Jan 30, 2020

I would like to propose something that looks more like the Iterator trait where most functions in the Image trait have signatures like fn something(self, ...) -> Something<Self>. This allows for things like incremental decoding.
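One possible shape for such adapter-style signatures, by analogy with `Iterator::map` returning `Map<Self, F>` (a minimal sketch; all names are hypothetical, and the `Blur` adapter is a pass-through placeholder rather than a real blur):

```rust
// Hypothetical adapter-style image API: methods consume self and
// return a wrapper, so stages can be chained and driven incrementally.
trait Image: Sized {
    fn next_row(&mut self) -> Option<Vec<u8>>;

    fn blur(self, radius: f32) -> Blur<Self> {
        Blur { inner: self, radius }
    }
}

struct Blur<I> {
    inner: I,
    radius: f32,
}

impl<I: Image> Image for Blur<I> {
    fn next_row(&mut self) -> Option<Vec<u8>> {
        // A real implementation would buffer neighboring rows and
        // convolve; this placeholder just forwards to show the shape.
        self.inner.next_row()
    }
}

struct RowSource {
    rows: std::vec::IntoIter<Vec<u8>>,
}

impl Image for RowSource {
    fn next_row(&mut self) -> Option<Vec<u8>> {
        self.rows.next()
    }
}

fn main() {
    let src = RowSource { rows: vec![vec![1u8, 2], vec![3, 4]].into_iter() };
    let mut blurred = src.blur(1.0);
    assert_eq!(blurred.next_row(), Some(vec![1, 2]));
    println!("ok");
}
```

The open question (raised below) is what the `next()`-equivalent should look like, i.e. what unit of data flows between incremental stages; rows are just one candidate.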

@fintelia
Contributor Author

fintelia commented Jan 30, 2020

Could you sketch out a bit more of what that might look like? In particular, what would be the type signature of the Iterator::next() function equivalent on Something<Self> that would move data between incremental stages?

@HeroicKatora
Member

HeroicKatora commented Feb 9, 2020

(*Note: moved from a discussion on an alternative container that should implement the trait)

I'm no longer sure if the trait is the correct public interface. We've already had plenty of examples in imageproc where a specialized method for ImageBuffer showed better performance, but it is currently not possible for an external crate to be generic over GenericImage (or this new ContiguousImage) and at the same time have a specialized method that extends the trait.

My main question is why we are relying on a trait at all. All C libraries for image processing do without one, and so do numpy and Pillow. We have it at the moment because SubImage could not feasibly be implemented otherwise, but the types differ only because ImageBuffer owns the image and SubImage does not. So the radical solution would be to have no owning type bound to a specific Pixel, and to move all processing to a common borrowing type (not unlike flat::ViewMut). If we ensure it is open and generic enough (as slices are in the standard library), then there might be no need for a trait.

@fintelia
Contributor Author

fintelia commented Feb 9, 2020

it is currently not possible for an external crate to be generic over GenericImage (or this new ContiguousImage) and at the same time have a specialized method that extends the trait.

One of the key differences between the two traits is that ContiguousImage defines the exact memory layout of the image data. As a result, it makes specialization based on layout irrelevant. It is true that algorithms would potentially need an extra switch based on color type that could be avoided in the static typing scenario, but the API is such that it could be done once per image (instead of, say, once per pixel), which means the overhead should be imperceptible.

The net result is that a downstream developer (or even the imageproc crate) could write their own ContiguousImageExt trait with additional image processing algorithms, implemented either in terms of the provided ones or by matching on the color type and operating on the backing buffer directly, without paying any performance penalty for doing so.
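Such an extension trait could be sketched as follows. This is a minimal self-contained model: the stub `ContiguousImage` trait, the `threshold` operation, and all type names are hypothetical stand-ins for the proposed design:

```rust
// Minimal stand-in for the proposed ContiguousImage trait.
#[derive(Clone, Copy)]
enum ColorType { L8, Rgb8 }

trait ContiguousImage {
    fn color_type(&self) -> ColorType;
    fn data_mut(&mut self) -> &mut [u8];
}

// Downstream crates can add algorithms via a blanket extension trait:
// match on the color type once per call, then operate on the raw buffer.
trait ContiguousImageExt: ContiguousImage {
    fn threshold(&mut self, cutoff: u8) {
        match self.color_type() {
            // The color-type switch happens once per image,
            // not once per pixel.
            ColorType::L8 | ColorType::Rgb8 => {
                for s in self.data_mut() {
                    *s = if *s >= cutoff { 255 } else { 0 };
                }
            }
        }
    }
}
impl<T: ContiguousImage> ContiguousImageExt for T {}

struct Img {
    color_type: ColorType,
    data: Vec<u8>,
}

impl ContiguousImage for Img {
    fn color_type(&self) -> ColorType { self.color_type }
    fn data_mut(&mut self) -> &mut [u8] { &mut self.data }
}

fn main() {
    let mut img = Img { color_type: ColorType::L8, data: vec![10, 200] };
    img.threshold(128);
    assert_eq!(img.data, vec![0, 255]);
    println!("ok");
}
```

The blanket impl is what makes the extension free for implementors: any type implementing the base trait gets `threshold` without extra code.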

My main question is why we are relying on a trait at all. All C libraries for image processing do without one, and so do numpy and Pillow. We have it at the moment because SubImage could not feasibly be implemented otherwise, but the types differ only because ImageBuffer owns the image and SubImage does not.

The only reason to have a trait would be so that dynamically and statically typed images (i.e. DynamicImage and ImageBuffer) could share the same methods. But the core of the proposal is really that all image algorithms should be implemented for dynamically typed images with a well-defined layout. This is a property that the Python libraries, and I believe some/most C libraries, do have. Using a trait to accomplish it is secondary.

So the radical solution would be to have no owning type bound to a specific Pixel, and to move all processing to a common borrowing type (not unlike flat::ViewMut). If we ensure it is open and generic enough (as slices are in the standard library), then there might be no need for a trait.

I don't think that would actually make much of a difference. Wrapping the values of the required methods from ContiguousImage into XxxView and then implementing AsRef/AsMut would be essentially equivalent to implementing ContiguousImage directly.

@HeroicKatora
Member

One of the key differences between the two traits is that ContiguousImage defines the exact memory layout of the image data.

Imposing specific invariants on some data is what a type is for, not a trait. Unless we are talking about making an unsafe trait, the memory layout imposed is not guaranteed to any consumer. This is basically why pixel_mut failed, and I want to avoid having a similar situation all over again while providing such methods and vocabulary to downstream crates.

Furthermore, #1084 and planar images are (imo) evidence that we cannot feasibly work with a single memory layout. Adding multiple traits doesn't work, since that would require specialization to make use of them, but adding different types for these different memory layouts does work, and allows well-defined extension traits as well.

The net result is that a downstream developer (or even the imageproc crate) could write their own ContiguousImageExt trait with additional image processing algorithms, implemented either in terms of the provided ones or by matching on the color type and operating on the backing buffer directly, without paying any performance penalty for doing so.

The same is true if these are types and downstream crates add a trait that adds a couple of methods to the types.

I don't think that would actually make much of a difference. Wrapping the values of the required methods from ContiguousImage into XxxView and then implementing AsRef/AsMut would be essentially equivalent to implementing ContiguousImage directly.

Indeed, that was the point I was trying to make. If presented with two largely equivalent solutions, the simpler one should be chosen.

@fintelia
Contributor Author

fintelia commented Feb 9, 2020

I think it is OK not to have first-class support for weird memory layouts. Just providing conversion methods should be enough for most use cases (including for the author of #1084). Though I'm not sure I follow your point about needing an unsafe trait to guarantee memory layout: the buffer returned by ContiguousImage::data() will be assumed to be a packed array of pixels by downstream consumers, and the user will just get garbled output/panics if the method is implemented wrong, but never actual UB.
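This safe-but-wrong failure mode can be illustrated with a minimal stand-in (the stub trait and the `sample` helper are hypothetical): an implementation whose buffer disagrees with its declared dimensions yields a missing value or a panic, never undefined behavior, because all access goes through bounds-checked slices.

```rust
// Minimal stand-in for the proposed trait.
trait ContiguousImage {
    fn dimensions(&self) -> (u32, u32);
    fn data(&self) -> &[u8];
}

// A buggy implementor: claims 2x2 pixels but only stores 3 samples.
struct Buggy;

impl ContiguousImage for Buggy {
    fn dimensions(&self) -> (u32, u32) { (2, 2) }
    fn data(&self) -> &[u8] { &[1, 2, 3] }
}

// A generic consumer. Because indexing is bounds-checked, an
// inconsistent implementation can only produce garbage or a missing
// value, never UB.
fn sample(img: &impl ContiguousImage, x: u32, y: u32) -> Option<u8> {
    let (w, _) = img.dimensions();
    img.data().get((y * w + x) as usize).copied()
}

fn main() {
    assert_eq!(sample(&Buggy, 0, 0), Some(1));
    // The declared dimensions promise a sample at (1, 1), but the
    // bounds check catches the lie:
    assert_eq!(sample(&Buggy, 1, 1), None);
    println!("ok");
}
```

This mirrors the Hash/Eq situation mentioned below: a bad impl breaks behavior, not memory safety, as long as no unsafe code trusts the trait's invariants.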

I agree with you about aiming for simplicity, but as I think through the type-based version I'm less sure it is actually simpler. In the trait version it is just two types and a trait that provides methods on them, but I'm a bit fuzzy on what the other would look like. Could you sketch out exactly how the various derefs and the View/ViewMut types would look?

@HeroicKatora
Member

The following is wrong with wanting to provide a safe trait that tries to enforce a particular layout of image data:

    // No guarantee that this corresponds to `ColorType` or the dimensions!
    // Need to recheck on every access in a generic implementation using the
    // trait. We know for a fact from imageproc that the compiler will not always
    // elide these checks.
    fn data_mut(&mut self) -> &mut [u64];
    // An implementation could overwrite this!
    // Same problem as `Deref` and `AsRef` and `Error::type_id` on which you
    // can not unsafely rely.
    fn as_bytes(&self) -> &[u8] { bytemuck::cast_slice(self.data()) }
    // Either `DynamicImage` is very open or it is hard to give a performant 
    // implementation of this. What about in-place conversion? It would introduce
    // a new `Sized` bound or is not always implementable.
    fn convert_to(&self, new: PixelType) -> DynamicImage { ... }
    // Why exactly `DynamicImage`? And if we have one specific choice here then
    // what is the benefit over mutable inherent methods on `DynamicImage`?
    fn resize(&self, ...) -> DynamicImage { ... }

Could you sketch out exactly how the various derefs and View / ViewMut types would look like?

We already have ViewMut and View, in the flat module. I'm mostly arguing for promoting them to the main underlying abstraction on which algorithms are implemented. That requires some cleanup, surely, but they are already basically the common denominator between ImageBuffer and DynamicImage and the most primitive type to implement GenericImage.

The trait could be similar to AsRef but require returning ViewMut<'_, T> instead of &'_ T. The name is bikeshedding, see below for some comments on it.

trait AsPixels<P: Pixel> {
    fn as_pixels(&mut self) -> ViewMut<&'_ mut [P::Subpixel], P>;
}

This solves the problem with SubImage as well. It no longer needs to be generic over the underlying image but can instead concretely rely on an internal ViewMut, similar to ChunksMut or iterator types that are concrete to their container. The implementation of impl AsPixels for SubImage would return the mutable view itself, and basically forgets about the reference frame and position within the outer image, while sub_image(..).sub_image(..) propagates that frame information. (Sidenote: having a concrete underlying borrow might also finally allow us to split an image into two simultaneous subimages, although making this sound requires some work. But it is concrete enough to be provable at all without requiring an unsafe trait.)

The remaining question is whether ViewMut is too concrete in that it requires samples to be contiguous, or not concrete enough in that it allows both column-major and row-major order and doesn't require width and pitch to agree (see the existing NormalForm). But here concrete types would help as well. If we add a ColumnMajor<'_, T> then we can simply forward all of its methods to ViewMut by 'decaying', while allowing the 'upgrade' with a method returning an Option.
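The decay/upgrade relationship could look roughly like this (a minimal sketch with hypothetical names and fields; the real `ViewMut` in the flat module carries more layout information):

```rust
use std::ops::{Deref, DerefMut};

// Minimal stand-in for a general mutable view.
struct ViewMut<'a> {
    samples: &'a mut [u8],
    width: usize,
    height: usize,
    column_major: bool,
}

// A layout-refined wrapper: "decays" to ViewMut via Deref, so every
// general method keeps working on it without forwarding boilerplate.
struct ColumnMajor<'a>(ViewMut<'a>);

impl<'a> Deref for ColumnMajor<'a> {
    type Target = ViewMut<'a>;
    fn deref(&self) -> &Self::Target { &self.0 }
}

impl<'a> DerefMut for ColumnMajor<'a> {
    fn deref_mut(&mut self) -> &mut Self::Target { &mut self.0 }
}

impl<'a> ViewMut<'a> {
    // The "upgrade" is fallible: it checks the layout invariant once.
    fn upgrade(self) -> Option<ColumnMajor<'a>> {
        if self.column_major { Some(ColumnMajor(self)) } else { None }
    }
}

fn main() {
    let mut buf = [0u8; 6];
    let view = ViewMut { samples: &mut buf, width: 2, height: 3, column_major: true };
    let col = view.upgrade().expect("layout check passed");
    // General methods/fields remain accessible through Deref:
    assert_eq!(col.width * col.height, col.samples.len());
    println!("ok");
}
```

The invariant is checked once at the upgrade boundary, so code holding a `ColumnMajor` never needs to re-verify the layout.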

Also a short remark on PixelType: can we avoid mixing the color interpretation with the in-memory layout of pixel and channel? Note that this greatly increases the complexity of a potential trait, as it affects all implementors, whereas introducing it on particular, single types or adding new types is easy.

@fintelia
Contributor Author

fintelia commented Feb 9, 2020

Ah, you were thinking of "enforcing layout" much more strongly than I was. I was just thinking that if someone chooses to implement the trait in a way that returns bogus/inconsistent values for the methods, then we get panics/breakage (but not UB). This is true of other traits too, like Hash/Eq which the documentation points out can result in weird behavior if implemented wrong. Though you make a good point that it would be nice to validate the fields once and then be able to have unsafe code rely on them being in a consistent state.

We already have ViewMut and View, in the flat module. I'm mostly arguing for promoting them to the main underlying abstraction on which algorithms are implemented.

The problem with using ViewMut directly is that it is parameterized on Pixel, which would prevent DynamicImage from implementing AsPixels.

The trait could be similar to AsRef but require returning ViewMut<'_, T> instead of &'_ T. The name is bikeshedding, see below for some comments on it.

Wouldn't this design require a method call on every use of AsPixels?

let mut img: DynamicImage = load_image_somehow();
img.as_pixels().blur(...)?;
img.as_pixels().rotate(...)?;

@fintelia
Contributor Author

fintelia commented Feb 9, 2020

Also a short remark on PixelType, can we avoid mixing the color interpretation with in-memory layout of pixel and channel? And then note that this greatly increases the complexity of a potential trait as it affects all implementors but introducing it on particular, single types or adding new types is easy.

I'd argue that declaring a single in-memory layout for the core of the crate drastically reduces complexity. But I agree that should be a separate discussion from defining PixelType.

@HeroicKatora
Member

HeroicKatora commented Feb 9, 2020

Wouldn't this design require a method call on every use of AsPixels?

It would be like the relationship of Extend and IntoIterator, a trait to normalize the representation to ViewMut and the impl for ViewMut would be identity. Call it once, and then do multiple calls on the normalized view.
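The normalize-once pattern in code (a minimal sketch; the stand-in `ViewMut` drops the pixel type parameter, and all names are hypothetical):

```rust
// Minimal stand-in for a normalized mutable view.
struct ViewMut<'a>(&'a mut [u8]);

trait AsPixels {
    fn as_pixels(&mut self) -> ViewMut<'_>;
}

// Identity impl: normalizing an already-normalized view is free,
// analogous to IntoIterator for an Iterator.
impl<'a> AsPixels for ViewMut<'a> {
    fn as_pixels(&mut self) -> ViewMut<'_> {
        ViewMut(&mut *self.0)
    }
}

impl<'a> ViewMut<'a> {
    fn invert(&mut self) {
        for s in self.0.iter_mut() { *s = 255 - *s; }
    }
    fn clear(&mut self) {
        for s in self.0.iter_mut() { *s = 0; }
    }
}

// An owning type normalizes by borrowing its buffer.
struct Owned(Vec<u8>);

impl AsPixels for Owned {
    fn as_pixels(&mut self) -> ViewMut<'_> {
        ViewMut(&mut self.0)
    }
}

fn main() {
    let mut img = Owned(vec![1, 2, 3]);
    // Call as_pixels once, then do multiple operations on the view:
    let mut view = img.as_pixels();
    view.invert();
    view.clear();
    assert_eq!(img.0, vec![0, 0, 0]);
    println!("ok");
}
```

So the per-use method call fintelia asks about above reduces to one normalization per batch of operations, not one per operation.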

The problem with using ViewMut directly is that it is parameterized on Pixel, which would prevent DynamicImage from implementing AsPixels.

Also already partially solved on a concrete type. The current DynamicImage has as_flat_samples_u8 and as_flat_samples_u16 both of which return an Option of the particular view. The same could be done with a type parameter of P: Pixel due to its associated ColorType. We could make this a trait as well but as some algorithms are only applicable to some pixel types the AsPixels trait could suffice until shown otherwise.

@fintelia
Contributor Author

fintelia commented Feb 9, 2020

The problem with using ViewMut directly is that it is parameterized on Pixel, which would prevent DynamicImage from implementing AsPixels.

Also already partially solved on a concrete type. The current DynamicImage has as_flat_samples_u8 and as_flat_samples_u16 both of which return an Option of the particular view. The same could be done with a type parameter of P: Pixel due to its associated ColorType.

This would be very unpleasant for writing any downstream generic code. Essentially any operation on a dynamic image would be a match with separate branches for each possibility. This is a current annoyance of the crate, and something I'd like to resolve by just making the color/pixel type a field instead of a type parameter in most places.

Wouldn't this design require a method call on every use of AsPixels?

It would be like the relationship of Extend and IntoIterator, a trait to normalize the representation to ViewMut and the impl for ViewMut would be identity. Call it once, and then do multiple calls on the normalized view.

What about something like this then?

trait ContiguousImage {
    // Single methods which ensure invariants are met. If you have a DynamicView/DynamicViewMut,
    // then these are no-ops; otherwise they are cheap checks that the layout is right.
    fn as_view(&self) -> DynamicView;
    fn as_view_mut(&mut self) -> DynamicViewMut;

    // Provided methods that can be called directly on anything that implements this trait.
    // Internally they all call as_view / as_view_mut and then perform the operation on that.
    fn blur(&mut self, radius: f32) { ... }
    fn rotate(&mut self, angle: f32) { ... }
}

@HeroicKatora
Member

HeroicKatora commented Feb 9, 2020

This would be very unpleasant for writing any downstream generic code. Essentially any operation on a dynamic image would be a match with separate branches for each possibility. This is a current annoyance of the crate, and something I'd like to resolve by just making the color/pixel type a field instead of a type parameter in most places.

Any performant code will need slices of concrete types; there are no cache-efficient alternatives. The current usability crisis comes from the fact that there are not enough algorithmic traits for even basic operations, since even those traits cannot work efficiently on GenericImage, and implementation effort is much higher as well. If DynamicImage had the methods for views I mentioned above, and it were clear that this is the type for which to implement algorithms, then there would be no problem adding a blur trait that matches on the color type. This doesn't make downstream algorithm crates much easier, but it removes the strict typing need for the end user. In theory, we could even give a trait for Pixel-oblivious algorithms and thus internalize the matching fully. This adds only ~20 lines of overhead up-front and ~9 lines of overhead per method:

trait PixelOblivious {
    fn apply<P: Pixel>(&mut self, view: ViewMut<P>);
}

struct Blur(f32);

impl PixelOblivious for Blur {
    fn apply<P: Pixel>(&mut self, view: ViewMut<P>) {
        imageops::blur(view, self.0);
    }
}

impl DynamicImage {
    fn apply(&mut self, mut alg: impl PixelOblivious) {
        match self.color_type {
            ColorType::Rgb8 => alg.apply(self.as_view::<Rgb<u8>>().unwrap()),
            // ... one arm per supported color type
        }
    }

    pub fn blur(&mut self, amount: f32) {
        self.apply(Blur(amount));
    }
}

I'm not opposed to having blur and rotate etc. on DynamicImage, but I do not see the need for a trait to achieve this ergonomically.

@HeroicKatora
Member

HeroicKatora commented Feb 9, 2020

What about something like this then? [..]

This makes ContiguousImage quite like AsDynamicViewMut, and that might complement the basic implementation on DynamicViewMut quite well. The only problem might be that we should be very careful about which traits are public (note that PixelOblivious above would not be) to ensure that even changes to the layout beneath DynamicImage do not cause compatibility issues.

@fintelia
Contributor Author

fintelia commented Feb 9, 2020

I'm not opposed to having blur and rotate etc. on DynamicImage, but I do not see the need for a trait to achieve this ergonomically.

If DynamicImage was the only type in the crate that supported image algorithms this would be fine with me. But right now you can run image algorithms on anything that implements GenericImage, and DynamicImages get left out. My proposal is inspired by thinking "what if we had a similar trait that worked with statically and dynamically typed images". Adding boilerplate code to DynamicImage for each possible algorithm doesn't really solve this, because downstream users would have to write the same boilerplate (without the help of PixelOblivious).

This makes ContiguousImage quite like AsDynamicViewMut and that might complement the basic implementation on DynamicViewMut quite well.

But why even bother with implementations in other places? AsDynamicViewMut (or whatever we end up calling it) could just be the only place things are implemented, and the main types could each implement it. DynamicViewMut in particular should have no trouble implementing the trait.

@HeroicKatora
Member

HeroicKatora commented Feb 9, 2020

But why even bother with implementations in other places?

It feels very much as if std added a trait to implement a method on slices...

Implementing the exact same thing on DynamicImage as on all ViewMut commits it to a particular layout and representation, or at least requires it to reorganize its data into one. I do see the next DynamicImage as the truly extensible image type, and committing to its internals now runs counter to that (e.g. do we want to have planar data? Heterogeneous samples? Pixels of non-integer byte sizes? Should DynamicImage support Bayer?). Maybe there is room for an intermediate: an untyped borrowed view into a compressed pixel matrix, which is exactly one ViewMut<P> and consequently can expose the trait and method I sketched above.

But why even bother with implementations in other places? AsDynamicViewMut (or whatever we end up calling it) could just be the only place things are implemented, and the main types could each implement it. DynamicViewMut in particular should have no trouble implementing the trait.

Adding it to one specific struct is powerful, as Rust does not have higher-ranked traits, and it also creates a unified vocabulary. I see this as similar to how a dynamic amount of borrowed data should almost always be passed via a slice, and not as &Vec<T> or some impl Index<SliceIndex<_>>. I would like to have ViewMut available through Deref on applicable structs, but alas, there are no user-defined unsized pointers.

This option also avoids binary bloat from monomorphization.

@fintelia
Contributor Author

Implementing the exact same thing on DynamicImage as on all ViewMut commits it to a particular layout and representation or at least requires it to reorganize its data to one.

We have to make these decisions at some point. In particular it isn't really possible to write generic code without this. In fact, some of the existing algorithm implementations are problematic precisely because they assume facts about layout that aren't actually guaranteed.

I see this as similar to how a dynamic amount of borrowed data should almost always be passed via a slice, and not as &Vec<T> or some impl Index<SliceIndex<_>>. I would like to have ViewMut available through Deref on applicable structs, but alas, there are no user-defined unsized pointers.

The lack of a Deref implementation is the big issue. Nobody would tolerate it if you had to do my_vec.as_slice()[4] to index into vectors.

@HeroicKatora
Member

HeroicKatora commented Feb 10, 2020

We have to make these decisions at some point. In particular it isn't really possible to write generic code without this. In fact, some of the existing algorithm implementations are problematic precisely because they assume facts about layout that aren't actually guaranteed.

True, and in light of Deref this leads me directly to the conclusion: there should be owned types that guarantee stronger layouts. ImageBuffer is a current one that is too strong and opinionated, but an equivalent with a dynamic pixel type would be preferable. 👍

Maybe I was looking at this issue of traits backwards. It is not quite that we should have no traits; in fact, operations such as blur, copy_within, etc. should be available on as many types as possible. But from the practical standpoint of coherence and specialization rules, it looks to never be preferable to have generic impls. This holds for internal traits as well as downstream traits. And it is somewhat true for operations that would be implemented as part of GenericImage.

impl<T: GenericImage> Blur for T {
    // This is too inflexible to get the best performance in all cases.
    // * It hinders downstream impls that must not intersect `GenericImage`. 
    //   This is already a HUGE issue in `imageproc`.
    // * Internal performance suffers as we don't have specialization.
}

But if we only look at concrete impls, then the GenericImage trait is practically never used, as inherent methods always provide an alternative.

@jerry73204

I missed this thread and just posted #1159. I think it's relevant to this topic.
