APNG Encode #255

Closed
wants to merge 21 commits into from

Conversation

Rimpampa
Contributor

@Rimpampa Rimpampa commented Nov 22, 2020

As @HeroicKatora suggested, I made a PR directly here; it is supposed to continue the work done on image-png#99.

As I said over there, this PR takes the existing code and adds APNG encoding functionality while maintaining backwards compatibility, but it is not supposed to be merged in its current state. I'd like to hear suggestions from others on how to make it better.

APNG related features:

  • encode frames
  • set a separate default image
  • set different frame rate for each frame
  • encode frames that don't cover the whole image area
  • set the default dispose and blend operation
  • set the dispose and blend operation for each frame
  • set the filter type for each write

@Rimpampa Rimpampa changed the title from Apng to APNG Encode on Nov 22, 2020
Member

@HeroicKatora HeroicKatora left a comment


A few concerns, none of which need to be blockers except for write_separate_default_image. I'd rather see this land at some point than delay it further. However, if you agree with the problems/improvements, then it would be good to see them addressed.

src/encoder.rs Outdated
Comment on lines 295 to 314
match self.info {
    Info {
        animation_control: Some(_),
        frame_control:
            Some(FrameControl {
                sequence_number, ..
            }),
        ..
    } => {
        self.check_animation()?;
        self.write_fcTL()?;
        if sequence_number == 0 && !self.separate_default_image {
            self.write_chunk(chunk::IDAT, &chunk)?;
        } else {
            self.write_fdAT(&chunk)?;
        }
        self.info.animation_control.as_mut().unwrap().num_frames -= 1;
    }
    _ => self.write_chunk(chunk::IDAT, &chunk)?,
}
Member

I don't particularly like this pattern for a few reasons. Firstly the matching should be outside the loop so as to not suggest that the path could change while writing the image.

Secondly, changing animation_control is a clever way to avoid another tracking variable but it's not expected. What if we were to add a getter? Then the info could be incorrect after you've written a few chunks.

And lastly, there should be a note on what happens if we write more images than the animation_control allows. The connection between check_animation and avoiding the panic from wrapping in the subtraction is rather subtle. I expect we could have a lenient mode at some point that allows this, so the control flow should be more obvious. If we don't subtract then there is also no problem.
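A minimal sketch of what that restructuring could look like, assuming the helpers from the diff above (check_animation, write_fcTL, write_fdAT); this illustrates the suggestion, it is not code from the PR:

    // Decide the write path once, before looping over the compressed chunks.
    let use_idat = match self.info {
        Info {
            animation_control: Some(_),
            frame_control: Some(FrameControl { sequence_number, .. }),
            ..
        } => {
            self.check_animation()?;
            self.write_fcTL()?;
            // The first frame doubles as the still image unless a separate default image was requested.
            Some(sequence_number == 0 && !self.separate_default_image)
        }
        _ => None,
    };

    for chunk in zlib_encoded.chunks(MAX_CHUNK_LEN as usize) {
        match use_idat {
            None | Some(true) => self.write_chunk(chunk::IDAT, chunk)?,
            Some(false) => self.write_fdAT(chunk)?,
        }
    }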

src/encoder.rs Outdated
@@ -234,7 +290,33 @@ impl<W: Write> Writer<W> {
/// Writes the image data.
pub fn write_image_data(&mut self, data: &[u8]) -> Result<()> {
    const MAX_CHUNK_LEN: u32 = (1u32 << 31) - 1;
    let zlib_encoded = self.deflate_image_data(data)?;
    for chunk in zlib_encoded.chunks(MAX_CHUNK_LEN as usize) {
Member

MAX_CHUNK_LEN is too much if we're writing fdAT data since there are 4 bytes of sequence number added. This can also be addressed by matching first, and then looping.
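A hedged sketch of that adjustment (the writing_fdat flag is hypothetical and would come from the match suggested above):

    // Reserve 4 bytes for the sequence number that fdAT prepends to its payload,
    // so the total chunk data never exceeds the PNG chunk length limit.
    const MAX_CHUNK_LEN: usize = (1usize << 31) - 1;
    let max_payload = if writing_fdat { MAX_CHUNK_LEN - 4 } else { MAX_CHUNK_LEN };
    for chunk in zlib_encoded.chunks(max_payload) {
        // emit one IDAT or fdAT chunk for `chunk` here
    }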

src/encoder.rs Outdated
/// Sets the default frame rate of the animation
///
/// It will fail if `set_animation_info` isn't called before
pub fn set_frame_rate(&mut self, delay_num: u16, delay_den: u16) -> Result<()> {
Member

Suggested change
-pub fn set_frame_rate(&mut self, delay_num: u16, delay_den: u16) -> Result<()> {
+pub fn set_frame_delay(&mut self, delay_num: u16, delay_den: u16) -> Result<()> {

It should also make it more obvious that this only affects future frames.

src/encoder.rs Outdated
/// If `num_plays` is set to 0 it will repeat infinitely
///
/// It will fail if `frames` is 0
pub fn set_animation_info(&mut self, frames: u32, num_plays: u32) -> Result<()> {
Member

This parameter should just be an AnimationControl.

Member

Also what happens if you call this after having written the first frame?

Contributor Author

I don't get the second comment, how could you call it after having written a frame first? Once you call write_header you can't access the Encoder anymore

src/encoder.rs Outdated
@@ -305,6 +384,93 @@ impl<W: Write> Writer<W> {
pub fn into_stream_writer_with_size(self, size: usize) -> StreamWriter<'static, W> {
    StreamWriter::new(ChunkOutput::Owned(self), size)
}

pub fn write_separate_default_image(&mut self, data: &[u8]) -> Result<()> {
Member

This seems odd. When transitioning something to animated PNG I would expect the opposite, that I need one or two more calls to set up the animation. I would however not expect that the still image, as which this chunk is interpreted when it is written, needs a separate method. Consider two scenarios:

  • The image is generated by a separate method whose parameter is Writer<W>.
  • An image is modified in a pipeline.

In both cases it's awkward that the reader has one method for all images, but on the writer side write_image_data only covers the animation frames when the image is animated.

Contributor Author

(You meant write_separate_default_image at the end right?)

This could be solved by moving the method to the Encoder creation, with something like a set_separate_default_image(data: &[u8]) that would panic/error if the animation parameters are not set (or would set the default parameters for an animation).

Member

Another way would be to have a mandatory additional step where the FrameControl data needs to be initialized, separately from the call that initialized the animation control. Then treat every call to write_image_data as writing the default image as long as no frame control was set. That's very close to the way the chunks are organized in the image and avoids complicating the constructor.
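A hedged sketch of that ordering; set_frame_control is a hypothetical setter used only to illustrate the flow:

    let mut writer = encoder.write_header()?;

    // No frame control has been set yet, so this write produces the default (still) image.
    writer.write_image_data(&default_image)?;

    // Initializing the frame control switches subsequent writes to fcTL + fdAT chunks.
    writer.set_frame_control(frame_control)?; // `frame_control`: an already-built FrameControl
    writer.write_image_data(&first_frame)?;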

src/encoder.rs Outdated
@@ -276,7 +355,7 @@ impl<W: Write> Writer<W> {
/// This borrows the writer. This preserves it which allows manually
/// appending additional chunks after the image data has been written
pub fn stream_writer(&mut self) -> StreamWriter<W> {
-    self.stream_writer_with_size(DEFAULT_BUFFER_LENGTH)
+    self.stream_writer_with_size(Self::DEFAULT_BUFFER_LENGTH)
Member

What about stream writer, actually? Should it error or return a StreamWriter for an animated image chunk? Not that it's critical because currently it simply allows one to write multiple IDAT images anyways. 😅

src/encoder.rs Outdated
/// Sets the default frame rate of the animation
///
/// It will fail if `set_animation_info` isn't called before
pub fn set_frame_rate(&mut self, delay_num: u16, delay_den: u16) -> Result<()> {
Member

Additionally, why would it be possible to set this aspect specifically but not other parts of FrameControl? Special casing frame rate seems appropriate for a format-agnostic interface but not for a mid-level interface as png has.

Contributor Author

Added some check-boxes to the first comment, I forgot about those when I wrote that

@Rimpampa
Contributor Author

So I've thought a bit about your comments @HeroicKatora and this is what I came up with:

About single images

Currently the ChunkWriter is kind of useless, as its only purpose is to buffer the data coming from the StreamWriter and add an IDAT chunk on every write; that could be done by using a BufWriter on the base writer itself and by writing the chunk directly on the StreamWriter. Another thing is that it doesn't even really need the Writer, as it only needs its Info to work. Also, you can specify a buffer smaller than the in_line size, which wouldn't make much sense as there are two additional buffers of that length on the StreamWriter.

Having said that, I think we can get rid of the StreamWriter and ChunkWriter objects at the same time by putting the prev and curr buffers plus a counter for the data written (or the lines written) into the Writer. Then the write_image_data method will just continue to append the data and write IDAT chunks once those are filled, without allocating the two buffers each time, until the complete image gets written.
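A minimal sketch of what such a merged Writer might hold; the field names are assumptions for illustration only:

    // The scanline buffers and a progress counter live in the Writer itself, so
    // incremental writes keep filling the current image without reallocating per call.
    pub struct Writer<W: std::io::Write> {
        w: W,
        prev: Vec<u8>,   // previous scanline, needed by the filters
        curr: Vec<u8>,   // scanline currently being filled
        written: usize,  // bytes (or lines) of the image written so far
    }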

Note: Right now it is possible to write additional image data as long as it fills the image completely (it can be used to hide an image in another), so this solution will solve this "problem" too.

If backwards compatibility is required, StreamWriter could just become a wrapper around Writer that implements io::Write (or maybe it could be implemented directly in Writer)


Regarding animations

I was thinking about making a new struct that would implement Iterator:

The way I intended it to work is that it would return an AnimationEncoder for each frame, which could be used to set the properties of that frame, and which could then be turned into an AnimationWriter to write the frame data. Not sure if this would work, but if both structs have a mutable reference to the base iterator then it would be impossible (with safe code) to have two AnimationWriters for the same image, so that on each iteration the writer has to be dropped in order to be able to encode the next frame.

The way I imagined all of this working is like this:

// For single images
{
    let mut encoder = Encoder::new(100, 100);
    // set encoder options
    let mut writer = encoder.write_header().unwrap();

    writer.write_image_data(&data).unwrap(); // First part
    // ...
    writer.write_image_data(&data).unwrap(); // Last part
    writer.finish().unwrap(); // Because it can silently fail while dropping
}
// For animations
{
    let mut encoder = AnimationEncoder::new(100, 100, 20); // 20 frames
    // set encoder options
    let mut frames = encoder.write_header(&default_image).unwrap(); // or write_animation_header() if only Encoder is used

    for frame_enc in frames {
        // set frame options
        let writer = frame_enc.write_header(); // or something along those lines

        writer.write_image_data(&data).unwrap(); // First part
        // ...
        writer.write_image_data(&data).unwrap(); // Last part
        writer.finish().unwrap(); // Because it can silently fail while dropping
    }
}

With that approach it might be better to call the two types of writers ImageWriter and AnimationWriter.

Obviously it's still very generic and some more thinking should be done in order to avoid code duplication; I'm not an expert, so I don't expect this to be completely right.


While I'm here, I've seen that the write_chunk method is public and the field in ChunkType is also public, so anything can be written between the image data. That makes sense, since if you have a proprietary extension you want to be able to write it, but I feel like this is not the best option (mine probably isn't either):

If we turn ChunkType into a trait that has a const TYPE: [u8; 4] (and maybe implements Deref<Target = [u8; 4]>), then we would be able to make custom constructors for each chunk, for example:

struct IDATChunk;

impl ChunkType for IDATChunk {
    const TYPE: [u8;4] = *b"IDAT";
}

// The above part wouldn't change much the current state/usage

impl IDATChunk { // or maybe impl io::Write
    pub fn write<W: io::Write>(
        writer: &mut W,
        filter: FilterType,
        bpp: BytesPerPixel,
        curr: &[u8],
        prev: &[u8],
    ) {
        // ...
    }
}

Then we could avoid code duplication between the types I mentioned above (Writer/AnimationWriter), and proprietary extensions could still be made by implementing ChunkType and maybe by exposing a method like get_raw_writer (again, it's still too general to get a perfect idea of how things fit together).


That's it. I should give it another thought, but I want to make progress and can't spend much time on this during the week, so I'm going to throw it out there and read your opinions.

@HeroicKatora
Member

HeroicKatora commented Dec 6, 2020

I'm sorry that it took a while to answer but I wanted to be thorough. However, this PR becomes more complicated with each commit, at least in my perception (the chunk encoding logic is duplicated, and so is the {Animation,Image}Encoder interface with configuration for gamma, chromaticities), and I want to respond more quickly to ensure that it doesn't become unreviewable and over-engineered. So, this may be slightly unfinished but at least it's a direction.

If backwards compatibility is required, StreamWriter could just become a wrapper around Writer that implements io::Write (or maybe it could be implemented directly in Writer)

It isn't. Currently the plan is for 0.17 to have all breaking changes necessary to improve the API for full APNG support while keeping the rough structure we currently have. And the 0.16 interface is more or less finished, the master changes go into 0.17.


While I'm here, I've seen that the write_chunk method is public and the field in ChunkType is also public so anything could be written between the image data. That is right as if you have a propetary extension you want to be able to write it, but I feel like this is not the best option: (mine probably isn't too)

The png crate should have low-level accessors; I'd even like to give control over the chunk checksums to the user at some point, such that broken images can be replicated accurately (or to do steganography, or to disable checksums entirely because they don't have a very useful purpose in this age). The interface is however poor: it should allow assembling the chunk from multiple slices, which would avoid some allocations, and it should make writing incorrect data (i.e. chunks too long, unknown chunk types, chunk types in incorrect order) more explicit. I'd keep the primitive though, in some capacity. An explicit call to a wrapper such as get_raw_writer might be a middle ground worth looking at. In an independent PR.


Having said that I think we can get rid of the StreamWriter and ChunkWriter objects at the same time by putting in the Writer the prev and curr buffers + a counter for the data written (or the lines written). Then the write_image_data method will just continue to append the data and write IDAT chunks once those are filled, without allocating the two buffers each time, until the complete image gets written.

Point well taken. I do agree that the exposed ChunkWriter shouldn't be the same owning primitive that is used internally, and that it should borrow from the main writer in the same way as StreamWriter. The only negative aspect is that the borrow restricts the downstream interface. There should be a way to stop writing a chunk, drop the borrow, but later continue writing at the exact same state. Maybe these two interfaces would help:

  • StreamWriter::pause(self) -> PartialChunkData
  • Writer::continue_chunk(&mut self, _: PartialChunkData) -> StreamWriter<'_>

Another good idea for another PR :)
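A rough sketch of how those two calls could fit together; PartialChunkData and both methods are only proposed here, none of this exists in the crate yet:

    let stream = writer.stream_writer();
    // ... write part of the image through `stream` ...
    let partial: PartialChunkData = stream.pause(); // the borrow on `writer` ends here

    // `writer` is usable again; the partially written chunk state is kept in `partial`.

    let stream = writer.continue_chunk(partial); // resume at the exact same state
    // ... write the rest of the image ...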

I'm much more critical of merging the two types however. The image line state is necessary only for writing image data, and the chunk length restriction is on the length of compressed data, which we can't necessarily predict from the raw sample buffers alone, as it also depends on the prediction functions etc. Using fewer buffer copies could work though.


I was thinking about making a new struct that would implement Iterator: [..]
Like that it might be better to call the two types of writers ImageWriter and AnimationWriter

I don't like the idea of additional owning types; there are already too many. It really complicates many designs downstream to have two incompatible types. An example of this is image::ImageBuffer (and the inflexibility of DynamicImage). It works, but isn't ergonomic. The basic idea of having a new, borrowing adaptor for each frame as a sort of temporary type state sounds great. Here's a diff which actually could make it work, instead of implementing Iterator:

-    for frame_enc in frames {
+    while let Some(frame_enc) = frames.next_animated() {

…where fn next_animated(&mut self) -> Option<FrameWriter<'_>> is a method on that type that borrows self mutably. This solves the soundness problem; it isn't really necessary to write two frames at once. The interesting thing about this would be that the main writer could be a single type. In the non-animated case we would simply always return None from that method. Plus it lends itself to a workflow where the default image is not part of the animation.

// set encoder options
let mut encoder = Encoder::new(100, 100)
    .animated(20); // Or don't call it, for static PNG.
let mut frames = encoder.write_header()?;

// This will never be part of the animation.
// If you don't call it on animated PNG, the first animated frame will write IDAT chunks.
frames.write_image_data(&default_image)?;

// Or don't loop, for static PNG.
while let Some(frame_enc) = frames.next_animated() {
    // set frame options
    let writer = frame_enc.write_header(); // or something along those lines

    writer.write_image_data(&data)?; // First part
    // ...
    writer.write_image_data(&data)?; // Last part
    writer.finish()?; // Because it can silently fail while dropping
}

If you don't loop then you've naturally written a non-animated PNG. If you don't call the default write_image_data then all frames are part of the animation.

@Rimpampa
Contributor Author

Rimpampa commented Dec 6, 2020

I agree with you that this is starting to look over-engineered; I was going to write a little review of my code addressing the code duplication problem, but you beat me to it.

Regarding the merging of the {Animation,Image}{Writer,Encoder}, I think it can easily be done. I didn't think about it before because I felt that having two distinct types would have been better, but considering the code duplication problem and the downstream API conflicts, it really might be the best option. The only thing that I would change from your example is setting the separate default image data flag with a method, so that it could work like now, where the first frame is actually the default image data and the following frames are considered proper frames.

I'd like to have more details on the Stream::pause and get_raw_writer part: as I understood it, you want this PartialChunkData to permit access to the internal writer so that raw data can be written, but should the writer flush first, or should the internal state remain untouched?

Also, could you take a look at the chunk new.rs file? I was experimenting with the chunk data type.

I'm going to start working on merging the writers/encoders. Thank you for the useful advice.

Member

@HeroicKatora HeroicKatora left a comment


I'd like to have more details on the Stream::pause and get_raw_writer part: as I understood it, you want this PartialChunkData to permit access to the internal writer so that raw data can be written, but should the writer flush first, or should the internal state remain untouched?

The current StreamWriter and ChunkWriter have two usage problems. Firstly, all data of a chunk must be available upfront, in a single contiguous slice. That often requires an allocation—have a look at the many vec![] calls in chunk writing implementations.

Secondly, they contain logic when dropped, and there is no way to avoid that code and continue later. Since they borrow the Encoder, this can run into very problematic borrow issues where dropping would be a logic bug, but the borrow must be (temporarily) ended. That can only be solved by leaking the Writer so that its Drop is never called. That's certainly not optimal and loses some state. The idea of ::pause would be to allow that in a controlled manner where the data is retained and can be continued/borrowed again. As far as state is concerned, there are already plenty of ways to directly write to the stream. The correctness of the produced png is hardly an invariant we do (and need to) uphold if you corrupt your own output stream. I wouldn't see the problem in merely advising to be careful.

@Rimpampa
Contributor Author

Rimpampa commented Dec 8, 2020

Caught up with the latest commit on master and merged the Writers/Encoders; now it works like your example, but with some differences:

For encoding an image

    let mut enc = Encoder::new(file, width, height);
    // enc.set...
    let mut ctrl = enc.write_header().unwrap();
    let mut wrt = ctrl.write_frame_header().unwrap();
    wrt.write_image_data(&data).unwrap();
    // ... Nth write
    wrt.finish().unwrap();
    ctrl.finish().unwrap();

For encoding an animation

    let mut enc = Encoder::new(file, width, height);
    // enc.set...
    enc.animated(frames, repeat).unwrap();
    let mut ctrl = enc.write_header().unwrap();
    while ctrl.remaining() > 0 {
        // ctrl.set...
        let mut wrt = ctrl.write_frame_header().unwrap();
        wrt.write_image_data(&data).unwrap();
        // ... Nth write
        wrt.finish().unwrap();
    }
    ctrl.finish().unwrap();

For encoding with a separate default image

    let mut enc = Encoder::new(file, width, height);
    // enc.set...
    enc.animated(frames, repeat).unwrap();
    enc.with_separate_default_image().unwrap();
    let mut ctrl = enc.write_header().unwrap();

    // this could go in the while loop if the data is already lined up
    {
        let mut def_img_wrt = ctrl.write_frame_header().unwrap();
        def_img_wrt.write_image_data(&def_img_data).unwrap();
        def_img_wrt.finish().unwrap();
    }

    while ctrl.remaining() > 0 {
        // ctrl.set...
        let mut wrt = ctrl.write_frame_header().unwrap();
        wrt.write_image_data(&data).unwrap();
        // ... Nth write
        wrt.finish().unwrap();
    }
    ctrl.finish().unwrap();

Regarding the ChunkWriter, I've been trying things out a bit, and came to the conclusion that we cannot go any further with its implementation as long as we can't access the inner writer in the ZlibEncoder; that's because we need to be able to write uncompressed IDAT and fdAT chunks in the middle of the compression stream. Having access to the inner writer would make it possible to have the pause method you were talking about. If you agree with me I'm going to open a PR on the deflate-rs crate to add that method. BTW I haven't studied the compression algorithm, so tell me if I'm missing something that would make this wrong/unfeasible.

Right now I'm using a BufWriter with an inner ChunkWriter that adds the chunk data on each write (by the BufWriter) and can switch between writing an IDAT chunk and an fdAT chunk with a flag. One thing that I have noticed, in fact, is that even with the method I talked about above, the ChunkWriter would need a wrapper because it still needs the additional chunk data (like the sequence number) and couldn't be generalized to every chunk without having a specific type for each one (which would still need a wrapper enum to allow swapping between IDAT and fdAT).

@HeroicKatora
Member

HeroicKatora commented Dec 8, 2020

A note: You're still doing many, many superfluous edits left and right (changing set_palette, changing return types from an explicit name to Self, etc., leaving a lot of commented-out code). It detracts from the goal of APNG encoding. Please consider that reviewing code takes time, especially if you're introducing a new idea. Ensure that reviewers can focus on that idea instead of having to spend attention on anything else. Unless you didn't mean for the comment to mean: ready for review.

@Rimpampa
Contributor Author

Rimpampa commented Dec 8, 2020

Sorry for making your job more difficult; it's my first time contributing to an open source project and I still have a lot to learn. I'm going to remove anything that doesn't have to do with APNG encoding (like the time chunk that I forgot to remove after trying things out with the "chunk new" file). Also, I thought that using Self instead of the explicit name was good practice (not referring to the errors in the tests, but to return types and constructors). Though, with my comment I was more interested in the ZlibEncoder part.

@fintelia
Contributor

fintelia commented Dec 8, 2020

The main challenge right now is that the pull request adds/modifies over 1300 lines. The general advice is to break pull requests down as much as possible since even when changes might be an improvement on their own (like improving formatting/naming/consistency) they can become overwhelming when trying to make sure a complex PR is correct.

@HeroicKatora
Member

It's my first time contributing to an open source project and I have still a lot to learn.

Welcome then 🙂

On teaching how to contribute to open source, there have already been two good demonstrations of why to keep PRs small and self-contained. The first was the error rework, which you had to adjust for. Chances are, other changes are in progress in parallel to your changes, and every line or idea is a potential point of conflict. Minimizing those means quicker progress.
The second instance, now, is the CI failure. Doing a ton of changes only to discover that the first is incompatible with some requirement feels bad and is very unfortunate, as it requires extra work picking out those changes that are unaffected. Again, small changes lower the chances of that happening.

@Rimpampa
Contributor Author

Rimpampa commented Dec 8, 2020

Did some cleaning of changes that were not needed; hopefully this makes your work easier. Let me know what to do for the ZlibEncoder problem/enhancement, and thank you for being kind to me. I already understand my errors and I'll try not to repeat them in the future. I've learned (and I'm going to learn) a lot from this experience.

@HeroicKatora
Member

Please also remove the Zlib stuff, especially since you don't know how/if it works out. I've opened #267 as a dedicated tracking issue as it is currently merely a draft.

@Rimpampa
Contributor Author

Rimpampa commented Dec 8, 2020

It works. When I said I didn't understand the compression algorithm I wasn't referring to the usage of the library but to the actual implementation, because I don't know if exposing the inner writer invalidates the compression stream. The ZlibWriter just encapsulates what was previously done inside write_image_data (and previously in deflate_image_data) but allows loading scanlines across multiple writes.

I'm not sure if you noticed but StreamWriter right now is just a wrapper around Writer, and that's because there isn't really much of a difference between those two. You don't get much of an advantage by having the complete image data all at once instead of randomly sized chunks as you'd still have to filter each scanline individually, compress the filtered data and write it in a PNG chunk that might split the compressed stream at an arbitrary point (the maximum chunk size).

Having said that, if you think this should not be done, taking those changes back is easy.

@HeroicKatora
Member

HeroicKatora commented Dec 8, 2020

Compared to Writer, the StreamWriter allows multiple incremental writes while ensuring¹ that those IDAT chunks are not interspersed with any other chunks, as required by the specification. Any interface on Writer directly would not give such convenience as it offers too many low-level methods. Reducing the interfaces (and thus adding new invariants) is the essence of the common newtype concept, and it was not unnoticed or surprising even though no additional fields have been added. In addition, having two lifetimes on a struct is a bit of a design smell when they should be effectively based on each other. It's not more powerful than the smaller of those lifetimes and suggests that the ownership isn't clear.

¹not 100% ensuring if you have a shared writer but very much encourages it.

@Rimpampa
Contributor Author

Rimpampa commented Dec 9, 2020

The need for a double lifetime comes from the fact that currently the Writer borrows the inner writer of the Controller (as we discussed previously). We can't put the into_stream method inside the Controller because that would make the animation control really difficult on the StreamWriter side. The only thing we could do is make the Writer take full ownership of the Controller; that would solve our problems but would make the usage a bit strange, as to continue with the animation you'd have to get the controller back from the finish method of the Writer.

It would look like this:

pub struct Writer<W: Write> {
    // ...
    ctrl: Option<Controller<W>>,
}

impl<W: Write> Writer<W> {
    // ...
    pub fn finish(&mut self) -> Result<Controller<W>> { /* ... */ }
}

@HeroicKatora
Member

HeroicKatora commented Dec 9, 2020

That does not reflect my understanding of what needs to own the stream. I'm trying to pick out your conceptual changes of ownership, but there are quite a lot and some are unclear to me.

The current concepts are:

  • Encoder: Set up the operation, gather all input which we need for the IHDR etc. Then do the write call, transforming it into a Writer.
  • Writer: For the raw chunk-level operations for all chunks after the header, with one utility method of writing several consecutive IDAT chunks.
  • StreamWriter: It's currently for writing a single image's chunks incrementally, ensuring through type interface and the borrow that they are not interspersed with other chunks as the spec forbids this.

Can you make a similarly clear differentiation for each struct in the PR and justify the difference instead of purely an addition? If not, then don't change them. As far as I can follow the changes:

  • Writer corresponds in your PR to Controller, with fewer methods and no more low-level chunk writes?
  • StreamWriter is now Writer in your PR? I honestly can't tell. The backwards compatibility justification for its conversion to a StreamWriter is brittle at best, because you've changed the type definition of Writer itself, hence there will be no compatibility anyways. I'd also be interested why you want to differentiate between your versions of Writer and StreamWriter. What guarantees or ergonomics does the distinction afford?

Please focus on the last point in particular, since that is the source of your two-lifetime borrows. There's a substantial amount of code expended on that complexity: the Option<io::BufWriter<ChunkWriter<W>>> for taking the stream from behind a borrowed struct, a lot of handling for then calling methods after the stream was taken, and configuring and tracking different aspects of the writing in three different structs.

@Rimpampa
Contributor Author

Rimpampa commented Dec 9, 2020

Ok so, my idea was to remove the Writer and use StreamWriter as the only type of writer, but now I don't think this is a good idea anymore (forgive me). The current StreamWriter is just an abomination I came up with to fill the gaps and to make all tests work.

For the understanding of the current mechanism: all the structs have the same functions as before; what is new is the Controller, which stands in the middle between the Encoder and the Writer. It simply counts the number of images you write and writes the "image header" each time you ask for a new one; this header is either nothing, if it's an image or the separate default image of the animation, or the fdAT chunk for the animation frames. The current Controller spits out Writers that borrow its inner writer, so that it's not possible to have two writers on the same controller at the same time. That, though, makes the StreamWriter unable to own the writer (because it "doesn't own itself"). The only way to make it work is by having the Writer take ownership of the Controller for as long as it lives, so that it can be taken back later (with the finish method) and used to write other images.

On the Option<W> part, it's the only way to get rid of the need for code execution in the destructor; it's even used in the deflate-rs crate for the same reason.

I'm currently bringing back the StreamWriter and making the Writer own the Controller; it's not going to take that long, and then you'll probably understand better what is going on. (I understand I made some really poor decisions.)

@Rimpampa
Contributor Author

I completely forgot to post a comment, sorry for taking so long. I think it's ready for a review; I did a test yesterday and the encoding of animations was working.

@Rimpampa
Contributor Author

Rimpampa commented Jan 8, 2021

@HeroicKatora I was wondering, is there any progress going on? I don't want to sound annoying, just asking.

Member

@HeroicKatora HeroicKatora left a comment


That still seems too complicated. I had a question that was not yet answered:

Can you make a similarly clear differentiation [in purpose] for each struct in the PR and justify the difference instead of purely an addition [of methods to existing structs]?

On style itself, this article is a good primer on writing code that will be maintained for probably a good while.

For proceeding I would encourage you to split out the various encode methods into a separate PR—or move those changes to the front of the commit series by rebasing. They are unquestionably useful in their implementation and rather straightforward. Then start from a blank canvas and unchanged structs, and try to get the same APNG working with minimal interface work. A system is clean not when there is nothing more to add, but when there is nothing more to take away.

I'm sorry for the delay between the implementation and that feedback (I took the holidays for something completely different), but I'm convinced it will be worth it to work on the interface.

src/common.rs Outdated
if let Some(t) = &self.trns {
    chunk::encode_chunk(w, chunk::tRNS, t)?;
}
self.time.map_or(Ok(()), |v| v.encode(w))?;
Member

map_or is clever but not nice to read. There's a reason for the surrounding code not to use it, and for clippy to lint against similar patterns.

Comment on lines +477 to +486
pub struct Controller<W: Write> {
    w: Option<W>,
    data: ChunkData,
    // here the FrameControl is used for customized data
    info: Info,
    total: u32,
    written: u32,
    filter: FilterType,
    adaptive: AdaptiveFilterType,
}
Member

Writer was a better name for this. The library writes png, the container format, not only PNG pixmaps. I'd much rather see the specialized structs named ImageWriter, FrameWriter or something of that sort.

adaptive: AdaptiveFilterType,
}

impl<W: Write> Controller<W> {
Member

Why was write_chunk removed from this struct and is now only available when we can write a frame? That is a clear regression in functionality.

self.total - self.written
}

pub fn next_image(mut self) -> Result<Writer<W>> {
Member

Moving makes for an awkward design (see the matching on Option<W> later) because Rust's type system is linear. The concept of type states was rejected very early in the language design and for good reason. It requires every consumer to effectively implement the state machine in their own types as enum variants or something.

The idea of enforcing order through the type system might be 'clever' but is hardly usable in practice, it should be an implementation detail and there is no shame in enforcing it through dynamic checks. Unless some data structure invariant depends on it, but there is no such requirement. Since the library can not write all valid PNGs there must be an escape hatch, even if that enables the user to write 'invalid' PNGs—that might even be a feature.

if let Some(c) = &self.info.source_chromaticities {
let enc = Self::chromaticities_to_be_bytes(&c);
write_chunk(&mut self.w, chunk::cHRM, &enc)?;
pub fn finish(&mut self) -> Result<W> {
Member

This, on the other hand, should take self by value. The specification clearly states that it is required, and there are already escape hatches for both variants of not adhering to it: writing no end can be done by forgetting the writer, writing an additional end by manually calling write_chunk. Instead, this method should have an ergonomic interface where the common usage has the right effect.

You might simply hide the implementation taking &mut _ in a private method.

self.write_chunk(chunk::IDAT, &chunk)?;
}

pub fn finish(&mut self) -> Result<Controller<W>> {
Member

Same as before, take by value in the public interface.

self.write_chunk(chunk::IDAT, &chunk)?;
}

pub fn finish(&mut self) -> Result<Controller<W>> {
Member

On another point against returning the Controller by value, an error will simply drop the controller including the stream and associated data. You'd have to move the controller into the error but please don't do this. If you instead borrowed from it then this can be avoided entirely.

Comment on lines +985 to +988
.unwrap()
.next_image()
.unwrap()
.write_image_data(&buf)
Member

This makes it pretty clear that the changes are an ergonomic loss. A method call without parameters, between two unwraps/errors is not what you want the common, simple usage to look like.

Comment on lines +207 to +210
    self.source_gamma.map_or(Ok(()), |v| v.encode(w))?;
    self.source_chromaticities.map_or(Ok(()), |v| v.encode(w))?;
}
self.animation_control.map_or(Ok(()), |v| v.encode(w))?;
Member

map_or is clever, but it is not particularly readable and not necessary. clippy will tell you to use if let instead. I'm puzzled why that was changed in the first place.

Comment on lines +234 to +235
filter: <_>::default(),
adaptive: <_>::default(),
Member

The type annotations make this a lot more readable, if one wants to understand what is happening. Another unnecessarily clever change. You seem to want to try out stuff, but changing working, existing code is not the proper outlet for this.

@Rimpampa
Contributor Author

I understand all the points about the cleverness of the code I wrote, and please don't get me wrong, I didn't write it that way because I don't care about those things, it's just what I'm used to. The article you linked was helpful. I've been encountering many of those aspects since I started working as a programmer, because the codebase I'm working on is really full of those clever lines and I have to go back and forth every time to understand what is going on; talking with my colleagues started to make me think more about this aspect of writing code.


I will open the pull requests for the encode methods as you suggested as soon as I can. Should I implement them in the encoder.rs file like I did here or directly in common.rs? Should it be a single PR or one for each struct?


Regarding the other questions, I feel like many of them have already been answered in my previous comment. I think there was a misunderstanding in that situation, as the reaction to the comment and the fact that no other comment was submitted in the following days made me think that you were OK with what I wrote, and that's why I started working on it.

Regardless, I'm going to explain it again:

First of all, the only two additions are the Controller struct and the ChunkData enum; all the others (Writer, Encoder, StreamWriter, ChunkOutput, ChunkWriter) have been modified just to match the other code and to allow the new functionalities.

I feel like the Controller name is kind of confusing you, as I've seen you often comparing it to the Writer, but that's not what it is. I cannot think of a better name for it, but hopefully this will help.

The Controller has 3 functions:

  • select the parameters for the image that is going to be written next (this includes the filter, the blend op, the frame delay, etc.);
  • write an image "header" depending on if it's an image or an animation and other things (separate default image, first frame, etc.);
  • provide a writer that allows the image data to be written;

Plus one that will be added, as you suggested: writing chunks in between frames directly (through Controller::write_chunk).

This is where the new ChunkData comes in: it stores the state of the image/animation so that neither the Writer nor the StreamWriter has to know what it is writing; they just call encode and it takes care of writing IDAT or fdAT chunks and incrementing the sequence number. BTW I'd like some more explanation of the comment about the state machine, as I didn't quite get that.
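A hedged sketch of the idea behind ChunkData; the PR's actual encode method writes the chunk directly, while this illustrative prepare helper only returns the chunk type and payload:

    enum ChunkData {
        Image,                          // still/default image data goes into IDAT
        Frame { sequence_number: u32 }, // animation frame data goes into fdAT
    }

    impl ChunkData {
        /// Returns the chunk type and the payload to frame for `data`.
        fn prepare(&mut self, data: &[u8]) -> ([u8; 4], Vec<u8>) {
            match self {
                ChunkData::Image => (*b"IDAT", data.to_vec()),
                ChunkData::Frame { sequence_number } => {
                    let mut buf = Vec::with_capacity(4 + data.len());
                    buf.extend_from_slice(&sequence_number.to_be_bytes());
                    buf.extend_from_slice(data);
                    *sequence_number += 1;
                    (*b"fdAT", buf)
                }
            }
        }
    }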

The Writer and StreamWriter didn't see much of a change other than the use of an Option<_<W>>. This allows moving the inner writer without having to drop the whole struct, and that is needed to allow the Writer and StreamWriter to own the controller. As an example I pointed to the deflate-rs crate, which does the same thing for the same reason. And yes, the public interface can easily be modified so that it consumes the Writer.

Now, the StreamWriter is supposed to be able to either own the inner writer or borrow it exclusively, but in your comment you suggested removing the Option around the inner writer so that it could be borrowed; by doing so it would be impossible to have a StreamWriter that owns the inner writer. You are right that it would allow the controller to not be dropped when an error occurs during a finish call, but how would you recover from a probably corrupted PNG stream during the encoding? Wouldn't you just have to start all over again? Currently there is no error recovery system, so I don't think it's important, but let me know about that.

On the note about not being scared of making dynamic checks in order to favour simplicity, I'll try to come up with an implementation that uses only Writer and StreamWriter. When I did it the first time I found it confusing; now that I understand the library a bit better I might find a better way.

Let me know what you think of everything; also, if you can give me other tips or anything, it will be highly appreciated.

@HeroicKatora
Member

HeroicKatora commented Jan 12, 2021

The Controller has 3 functions: [..]

This is pretty much what I had meant, and described, the current Writer to provide, except the parameter selection was never added. It's a strict addition, so no new type is necessary to justify this. The other two points apply directly: (2) by write_image and (3) by the borrow/conversion to StreamWriter. Plus the additional functionality of writing arbitrary chunks independent of an image, because the PNG specification does not place inherent order constraints on chunks, except for:

  • IHDR (shall be first; enforced by write_header)
  • IEND (shall be last; enforced by drop)
  • the standard chunks, not consistently enforced; doing all of them via the type system leads to the problem addressed in the following section on the state machine.

On owning vs. non-owning, I'm very confused by your statement that an Option is required, mostly for the fact that you have kept ChunkOutput. That enum models a solution for precisely this problem: the ability to either own or borrow a controller and its contained writer W. And before your PR the StreamWriter already does this (stream_writer(&mut) vs. into_stream_writer). If that is no longer clear to you, then cleaning up and rebasing could give the breather to take another look at the current types, unconfounded by the lifetime troubles of the first approach.

On the state machine: Rust's type system makes it so that you can use parameters zero or one time but you can not arbitrarily copy them (this property is called affine). When writing methods that take their parameter by value, your code places a restriction on the caller: they can not use that object again, and they must have owned it. So, the methods taking self do encode a state machine through the type system. (Clarification: I think this is an anti-pattern):

Encoder --[write_header]--> Controller --[next_image]--> Writer --[into_stream_writer]--> StreamWriter

By doing this, any caller that wants to call next_image twice must own a controller, or encode in its own struct the state for 'this controller is currently wrapped in a Writer', even if they never intend to do a partial write of an image. In other words, they must Option-wrap the Controller. Confusingly, the methods to get the controller back (Writer::finish) do not take self by value, so you still remain with a Writer afterwards. Now that's plainly inconsistent. And no caller can rely on ever getting a Controller back, at least as far as the type system is concerned.

All the Option wrapping troubles come from a similar fact: by choosing the signature finish(&mut self) -> Result<W>, you promise to be able to produce a value of type W, and the only way to do so is to take one from the Controller, but you must keep a valid controller. Hence, ZlibEncoder and my comment propose not to choose such a signature.

Now you, proposing that dropping the Controller in an error case is fine, ask:

But how would you recover from a probably corrupted PNG stream during the encoding

However, consider that most errors do not signal a corrupted PNG stream. Except for IO errors, most are checked before any write is performed. Your own added checks for APNG are of that sort. Mostly, all produced chunks will be intact and complete. Recovery would be possible; libpng et al. place great value on error recovery, even when it is not perfect or even when it means dropping a chunk. (The situation could be improved in a few places.)

@Rimpampa
Contributor Author

You did not understand my statement on owning/borrowing because you were not considering the Controller, but this doesn't matter anymore: now that I clearly understand your point on the state machine and the Rust type system, it's obvious that it will have to be removed.

BTW, when I wrote "And yes the public interface can easily be modified so that it consumes the Writer", I meant that the finish method could have the signature you asked for, because it doesn't really matter on the implementation side; I probably should have been more explicit.

Right now the main things I'm not sure of are how to handle animations on the StreamWriter side and how to decide when the parameters (blend op, frame delay, ...) can be changed. On the Writer side it is pretty straightforward: set the parameters, write the data, go on with the next frame; but on the StreamWriter there are more write calls for the same frame. Solutions that come to my mind are:

  • keep track of the amount of data written and write fcTL when a new frame is added, and allow changing parameters only when the data written for the new frame is 0;
  • add a method like next_frame that takes and writes an fcTL with the parameters for that frame, without considering the amount of data already written (as it does right now);

Related to the last point: why does the Writer side check that the data you provide fills the entire image while allowing you to write more data after that, but on the other hand the StreamWriter does nothing?
Also, why check that the palette is present in write_image_data if the color type is Indexed, as if one of the two could change while the image is being written, while also not performing any check on the StreamWriter?

What is confusing me here is the line between the need to check the correctness of what is going to be written and blaming whoever is using the library for using it wrong.

If you could answer those questions and the other ones I asked in the previous comment about what to do with the encode methods, I could start working on it tomorrow.

@HeroicKatora
Member

If you could answer those questions and the other ones I asked in the previous comment about what to do with the encode methods, I could start working on it tomorrow.

One PR for all the encoding methods would be fine. The impls can live where each struct is defined, but that's not a strong preference. I'll get back to the rest of the response later.

@HeroicKatora
Member

Side note on terminology: we're both discussing current semantics and wanted semantics. For obvious reasons, using the type names of this PR is confusing for those cases. Try to avoid renames such that semantics collide with current structs of the same name in the rework. Or we discuss the renames as a dedicated preceding PR. If you want to do this, see: renaming in gif and its current name design. In any case I'll use the terminology of this PR for now so that we can also refer to any additional types.


Right now the main thing I'm not sure of is how to handle animations [..]

Then don't. Have each just write at most a single frame, just like it is now. The main functionality of not requiring a full allocation is still there even when one needs to call the Controller::next_image method multiple times.

Right now the main thing I'm not sure of is how to [..] decide when the parameters(blend op, frame delay, ...) can be changed.

It should be clear that changing a parameter has an effect on its next usage, not immediately or retroactively. That is, there is little reason to stop it from being changed or to require it to be changed only once. So I would add setters for frame parameters to Controller, and setters for parameters relating to image data (well, only FilterType) also to Writer/StreamWriter. Then the former are used for fcTL chunks (blend, dispose, etc.) and the latter can be changed when different scanlines should prefer different filters.
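A hedged sketch of that split, reusing the write_frame_header name from the earlier examples; the setter names are assumptions about where the API could go, not the crate's current interface:

    // Frame-level parameters live on the controller and apply to the next fcTL chunk.
    ctrl.set_blend_op(BlendOp::Over);
    ctrl.set_dispose_op(DisposeOp::Background);
    ctrl.set_frame_delay(1, 30); // delay_num / delay_den for the next frame

    // The filter can be changed per write and only affects the following scanlines.
    let mut wrt = ctrl.write_frame_header()?;
    wrt.set_filter(FilterType::Paeth);
    wrt.write_image_data(&scanlines)?;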

What is confusing me here is the line between the need of checking the correctness of what it's going to be written and blaming who is using the library for using it wrong.

An interesting little point here regarding Writer vs. StreamWriter: it's not necessary to have them as separate types. In fact, their methods may be merged. All that is necessary is to make the methods that write full frames return an error if any partial write through the stream methods has been performed. The only 'risk' involved for the caller with StreamWriter is that they might forget to write part of the frame, which corrupts the PNG. But neither the merged nor the split types address this risk differently with regard to their finish call or their implicit Drop. In both cases a partial frame, i.e. having partial pixel data or having no pixel data, is incorrect according to the PNG specification but not fatally incorrect for our own program state. It's hard to say if we should even impose a hard requirement on image correctness here as long as the chunk framing is proper, which it is when IO itself succeeds. Then a lenient decoder might just skip or zero-fill bad frames; the rest of the image is still intact.
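A small sketch of the dynamic check such a merge would rely on; the field and error variant here are assumptions, not the crate's types:

    // A full-frame write refuses to run once partial streaming writes have started
    // for the current frame.
    pub fn write_image_data(&mut self, data: &[u8]) -> Result<()> {
        if self.partial_bytes_written > 0 {
            return Err(EncodingError::FrameAlreadyStarted); // hypothetical error variant
        }
        // ... filter, compress and emit the whole frame as before ...
        Ok(())
    }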

I briefly touched on this above. I don't think it is necessary to enforce total correctness of the PNG, except where we can make a specific guarantee and the caller explicitly requests us to do so. Accidentally making something impossible is worse than incorrectness on wrong usage. That's also the reason why the write_chunk method is exposed in the first place. After all, it allows arbitrary incorrect chunks to be inserted at arbitrary places. But that is clear to the caller.

The write_image_data is meant as a shortcut, relative to StreamWriter. As for the check on indexed, there was an idea floating around of having the palette write be delayed (not automatically done in write_header), hence the duplicate check. That didn't materialize, so it's probably in a weird spot. It wouldn't hurt to check it for the StreamWriter construction, too.

@Rimpampa
Contributor Author

Rimpampa commented Feb 4, 2021

This is taking very long; these weeks have been quite busy for me, but I've gotten some work done, and this weekend I will finish it. If that's okay with you, I'll open a new pull request where I start again from the current master (as you suggested) in order to clean up the messy commit history this one has.

@Rimpampa
Contributor Author

After another week I did it, even though there was no answer. I opened a new PR and I'm closing this one.

@Rimpampa Rimpampa closed this Feb 12, 2021