PNG with zero sized IDAT is loading successfully as black image #1075
Comments
The bytes are definitely not undefined and zeroed, as the image buffer is created as a zeroed |
Would it not then be better to start with a vector of uninitialized values (once things like this are fixed)?
Not really, it's very risky and the benefits are minimal. Allocating a large zeroed vector is only marginally slower, and sometimes not slower at all, than allocating an uninitialized vector.
To add details, current computers can zero memory at roughly 50 GiB per second, so the total time to zero a vector is not going to be very large. Plus, at the end of the process the contents of the vector will be in cache, so subsequent accesses while decoding will be faster, which can further mitigate the overhead.
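To put a number on that estimate, here is a rough back-of-the-envelope sketch. The image size (1920x1080 at 3 bytes per pixel) and the helper name `zeroing_time_us` are assumptions chosen for illustration; the 50 GiB/s figure is simply the one quoted above, not a measurement.

```rust
/// Estimated time in microseconds to write `bytes` bytes at `gib_per_sec` GiB/s.
/// Hypothetical helper, only for this estimate.
fn zeroing_time_us(bytes: u64, gib_per_sec: f64) -> f64 {
    let bytes_per_sec = gib_per_sec * 1024.0 * 1024.0 * 1024.0;
    bytes as f64 / bytes_per_sec * 1e6
}

fn main() {
    // Assumption: a 1920x1080 image with 3 bytes per pixel (8-bit RGB).
    let bytes: u64 = 1920 * 1080 * 3;
    let us = zeroing_time_us(bytes, 50.0);
    println!("{} bytes at 50 GiB/s: about {:.0} microseconds", bytes, us);
}
```

Under these assumptions the zeroing cost is on the order of a tenth of a millisecond per image; at a lower sustained bandwidth it scales up proportionally.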
@HeroicKatora So let's see:

    // cargo-deps: rand = "0.7.3"
    extern crate rand;

    use rand::Rng;
    use std::alloc::{alloc, alloc_zeroed, dealloc, Layout};
    use std::time::Instant;

    fn main() {
        // Run the same loop once with plain `alloc` and once with `alloc_zeroed`.
        for alloc_func in &[alloc, alloc_zeroed] {
            let mut rng = rand::thread_rng();
            // Accumulate the pointers so the allocations are not optimized away.
            let mut useit: usize = 0;
            let start = Instant::now();
            for _ in 0..1_000_000 {
                // Random size between 1 MB and 10 MB, alignment 1.
                let layout =
                    Layout::from_size_align(rng.gen_range(1_000_000, 10_000_000), 1).unwrap();
                let ptr = unsafe { alloc_func(layout) };
                useit = useit.wrapping_add(ptr as usize);
                unsafe { dealloc(ptr, layout) };
            }
            println!("{} seconds (sum {:X})", start.elapsed().as_secs_f32(), useit);
        }
    }

Example output on my machine:
Allocations themselves are not that slow, but allocating large zeroed memory is much slower than the allocation alone: in this test, a few thousand times slower. I used allocations from 1 to 10 MB, which is about the range for typical color images with resolutions from about 640x480 to 1920x1080. I understand that using excess unsafe blocks or handing the user a vector of MaybeUninit may not be the best thing. But this can be accomplished without unsafe blocks, for example by constructing the vector with with_capacity and pushing pixels into it (or maybe by using a size hint with collect).

@fintelia Zeroing memory is nothing special. It is as fast as filling it with any other value. In the worst case, writing the decompressed values is as fast as writing zeros, so in the worst case this could mean a 50% performance penalty. If the buffer fits into L1 or L2 cache then this loss may be smaller, but not because we first wrote zeros into the cache: writing just the decompressed data without first writing zeros will still be faster. And if a processor has 512 kB of L2 cache per core, then it cannot fit even a 640x480 color image, and L3 cache is not much faster than memory. The best thing would be to run a benchmark to see how big this overhead actually is.
@misos1 Marginally slower in the total of reading the image, of course. We are talking about trading som The much more effective change would be to build on the current system to avoid the allocation by allowing reading into a pre-allocated buffer. |
@HeroicKatora Yes, this makes sense. But 50 GB/s is probably more like multi-core performance; I would expect about half of that for single-core performance on high-end systems. I do not think this could be compensated for by the zero fill acting as a prefetch, for which there are dedicated prefetch instructions. But you are right that it is better to fill it with zeros than to use uninitialized values. But why not to use |
@HeroicKatora It would be like safe "uninitialized" values. If it is currently not possible to push individual pixels into the vector, then its length can be raised row by row, with each row filled first with zeros and then with decoded image data. A row of an image is much smaller and can fit into L1 cache, so such zeroing would be really negligible.
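The row-by-row idea could look roughly like the sketch below. It uses only safe code: the vector is created with `Vec::with_capacity`, and each iteration extends it by one zeroed row that is then overwritten. `decode_row` and `decode_image` are hypothetical stand-ins invented for this example, not functions of the image or png crates.

```rust
/// Hypothetical stand-in for a real decoder: fills one row with "decoded" bytes.
/// The values are never zero, so any leftover zero fill would be visible.
fn decode_row(row_index: usize, out: &mut [u8]) {
    for (x, byte) in out.iter_mut().enumerate() {
        *byte = ((row_index + x) % 251) as u8 + 1;
    }
}

/// Grows the buffer one row at a time instead of zeroing the whole image up front.
fn decode_image(width: usize, height: usize, bytes_per_pixel: usize) -> Vec<u8> {
    let row_len = width * bytes_per_pixel;
    // Reserve the full size once so that growing never reallocates.
    let mut buf: Vec<u8> = Vec::with_capacity(row_len * height);
    for row in 0..height {
        let start = buf.len();
        // Zero only the new row, which is small enough to stay in cache...
        buf.resize(start + row_len, 0);
        // ...and immediately overwrite it with decoded data.
        decode_row(row, &mut buf[start..]);
    }
    buf
}

fn main() {
    let img = decode_image(640, 480, 3);
    println!("decoded {} bytes", img.len());
}
```

Whether this is actually faster than one large zeroed allocation would still need to be measured; the sketch only shows that the approach does not require unsafe code.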
@misos1 Because it would commit to |
@HeroicKatora Did you mean |
I'm interested in how you think |
@HeroicKatora Seems I just misunderstood something. You probably meant container type not element type. |
No, I did mean the element type. The decoder will dynamically find the channel format which might turn out to be several channels of type
|
Ok I see now. Currently are all images loaded into right |
The benchmark posted above exhibits really weird effects because it doesn't actually touch the memory that was allocated. On my machine for instance, increasing the allocation size 100x or 1000x actually makes the zeroed case several orders of magnitude faster, and the two functions end up performing the same. Not to mention that the allocation pattern is so simple that the allocator only ever returns a half dozen or so distinct pointers |
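One way to make such a benchmark touch the memory is sketched below: after allocating, it writes one byte into every page before freeing the block. The 4096-byte page size, the 8 MiB block size, the iteration count, and the helper name `alloc_and_touch` are assumptions for illustration; results will depend heavily on the allocator and operating system.

```rust
use std::alloc::{alloc, alloc_zeroed, dealloc, Layout};
use std::time::{Duration, Instant};

type AllocFn = unsafe fn(Layout) -> *mut u8;

const PAGE: usize = 4096; // assumed page size

/// Allocates `size` bytes (must be non-zero) with `alloc_func`, writes one byte
/// per page, then frees the block. Returns the elapsed time and the page count.
fn alloc_and_touch(alloc_func: AllocFn, size: usize) -> (Duration, usize) {
    let layout = Layout::from_size_align(size, 1).expect("valid layout");
    let start = Instant::now();
    let mut pages = 0;
    unsafe {
        let ptr = alloc_func(layout);
        assert!(!ptr.is_null(), "allocation failed");
        let mut offset = 0;
        while offset < size {
            // Volatile write so the compiler cannot remove the store.
            ptr.add(offset).write_volatile(1);
            pages += 1;
            offset += PAGE;
        }
        dealloc(ptr, layout);
    }
    (start.elapsed(), pages)
}

fn main() {
    let size = 8 * 1024 * 1024;
    let funcs = [
        ("alloc", alloc as AllocFn),
        ("alloc_zeroed", alloc_zeroed as AllocFn),
    ];
    for (name, func) in funcs {
        let mut total = Duration::from_secs(0);
        for _ in 0..100 {
            total += alloc_and_touch(func, size).0;
        }
        println!("{}: {:?}", name, total);
    }
}
```

Note that the measured time here includes the deallocation, as in the original benchmark.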
Try to change second parameter of |
And I want to remark that using |
I am on kernel 4.15:
So it somehow has already-zeroed pages prepared?
Or it will lazily zero those pages. Maybe that means that it is actually slower if one tries to fill them very rapidly. It also means that there is no concrete "better", and profiling is much more useful than speculating on the exact effects of eliding some initialization.
This is the magic of |
Yeah, Linux does lazy allocation for calls to mmap. It just records that the allocation happened but doesn't do any work or actually map the page until the application tries to access it. When/if the application accesses the memory it gets a page fault, at which point the kernel zeroes a page of memory and maps it at the given address (this is why performance is always the same: Linux requires that the page be zeroed, so there is no extra work for alloc_zeroed vs plain alloc).
|
Specifically |
Yes right now I found it. I did not check before I wrote that comment. I was wondering why is code with |
Expected
Program should panic at image::load_from_memory(bad_png).unwrap(). Loading invalid image should fail. If there is no IDAT or size of decompressed data is too small to fill whole image then loading fails but not with zero sized IDAT.

Actual behaviour
Program runs successfully. image::load_from_memory(bad_png) returns Ok. And resulting image is black. Raw bytes are all zero and maybe actually undefined.

Reproduction steps