New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
A few blitting changes for better BB8 performance #816
Conversation
I quickly tested this on a Kobo in 32bpp & 8bpp mode (surprise! :D), haven't managed to break anything yet. Might be worth checking the emulator and/or 16bpp, though ;). I still have no idea why nightmode is broken with the C blitter on Kobo, while it appears to work on Android & SDL. |
NOTE: That re-order was probably unwise as far as making the diff readable goes. Here are the actual changes (i.e., diff --git a/blitbuffer.c b/blitbuffer.c
index 7e84a34..6e51a6a 100644
--- a/blitbuffer.c
+++ b/blitbuffer.c
@@ -850,6 +850,68 @@ void BB_alpha_blit_from(BlitBuffer *dst, BlitBuffer *src,
}
o_y += 1;
}
+ } else if (dbb_type == TYPE_BB8 && sbb_type == TYPE_BB8A) {
+ Color8A *srcptr;
+ Color8 *dstptr;
+ o_y = offs_y;
+ for (d_y = dest_y; d_y < dest_y + h; d_y++) {
+ o_x = offs_x;
+ for (d_x = dest_x; d_x < dest_x + w; d_x++) {
+ BB_GET_PIXEL(dst, dbb_rotation, Color8, d_x, d_y, &dstptr);
+ BB_GET_PIXEL(src, sbb_rotation, Color8A, o_x, o_y, &srcptr);
+ alpha = srcptr->alpha;
+ ainv = 0xFF - alpha;
+ dstptr->a = dstptr->a * ainv + srcptr->a * alpha;
+ o_x += 1;
+ }
+ o_y += 1;
+ }
+ } else if (dbb_type == TYPE_BB8 && sbb_type == TYPE_BBRGB16) {
+ ColorRGB16 *srcptr;
+ Color8 *dstptr;
+ o_y = offs_y;
+ for (d_y = dest_y; d_y < dest_y + h; d_y++) {
+ o_x = offs_x;
+ for (d_x = dest_x; d_x < dest_x + w; d_x++) {
+ BB_GET_PIXEL(dst, dbb_rotation, Color8, d_x, d_y, &dstptr);
+ BB_GET_PIXEL(src, sbb_rotation, ColorRGB16, o_x, o_y, &srcptr);
+ dstptr->a = ColorRGB16_To_A(srcptr->v);
+ o_x += 1;
+ }
+ o_y += 1;
+ }
+ } else if (dbb_type == TYPE_BB8 && sbb_type == TYPE_BBRGB24) {
+ ColorRGB24 *srcptr;
+ Color8 *dstptr;
+ o_y = offs_y;
+ for (d_y = dest_y; d_y < dest_y + h; d_y++) {
+ o_x = offs_x;
+ for (d_x = dest_x; d_x < dest_x + w; d_x++) {
+ BB_GET_PIXEL(dst, dbb_rotation, Color8, d_x, d_y, &dstptr);
+ BB_GET_PIXEL(src, sbb_rotation, ColorRGB24, o_x, o_y, &srcptr);
+ dstptr->a = RGB_To_A(srcptr->r, srcptr->g, srcptr->b);
+ o_x += 1;
+ }
+ o_y += 1;
+ }
+ } else if (dbb_type == TYPE_BB8 && sbb_type == TYPE_BBRGB32) {
+ ColorRGB32 *srcptr;
+ Color8 *dstptr;
+ uint8_t srca;
+ o_y = offs_y;
+ for (d_y = dest_y; d_y < dest_y + h; d_y++) {
+ o_x = offs_x;
+ for (d_x = dest_x; d_x < dest_x + w; d_x++) {
+ BB_GET_PIXEL(dst, dbb_rotation, Color8, d_x, d_y, &dstptr);
+ BB_GET_PIXEL(src, sbb_rotation, ColorRGB32, o_x, o_y, &srcptr);
+ alpha = srcptr->alpha;
+ ainv = 0xFF - alpha;
+ srca = RGB_To_A(srcptr->r, srcptr->g, srcptr->b);
+ dstptr->a = DIV_255(dstptr->a * ainv + srca * alpha);
+ o_x += 1;
+ }
+ o_y += 1;
+ }
} else if (dbb_type == TYPE_BB8A && sbb_type == TYPE_BB8A) {
Color8A *dstptr, *srcptr;
o_y = offs_y;
@@ -860,7 +922,7 @@ void BB_alpha_blit_from(BlitBuffer *dst, BlitBuffer *src,
BB_GET_PIXEL(src, sbb_rotation, Color8A, o_x, o_y, &srcptr);
alpha = srcptr->alpha;
ainv = 0xFF - alpha;
- dstptr->a = dstptr->a * ainv + srcptr->a * alpha;
+ dstptr->a = DIV_255(dstptr->a * ainv + srcptr->a * alpha);
o_x += 1;
}
o_y += 1;
@@ -882,13 +944,18 @@ void BB_alpha_blit_from(BlitBuffer *dst, BlitBuffer *src,
} else if (dbb_type == TYPE_BBRGB16 && sbb_type == TYPE_BB8A) {
Color8A *srcptr;
ColorRGB16 *dstptr;
+ uint8_t dsta, bdsta;
o_y = offs_y;
for (d_y = dest_y; d_y < dest_y + h; d_y++) {
o_x = offs_x;
for (d_x = dest_x; d_x < dest_x + w; d_x++) {
BB_GET_PIXEL(dst, dbb_rotation, ColorRGB16, d_x, d_y, &dstptr);
BB_GET_PIXEL(src, sbb_rotation, Color8A, o_x, o_y, &srcptr);
- dstptr->v = RGB_To_RGB16(srcptr->a, srcptr->a, srcptr->a);
+ alpha = srcptr->alpha;
+ ainv = 0xFF - alpha;
+ dsta = ColorRGB16_To_A(dstptr->v);
+ bdsta = DIV_255(dsta * ainv + srcptr->a * alpha);
+ dstptr->v = RGB_To_RGB16(bdsta, bdsta, bdsta);
o_x += 1;
}
o_y += 1;
@@ -966,6 +1033,7 @@ void BB_alpha_blit_from(BlitBuffer *dst, BlitBuffer *src,
dstptr->r = DIV_255(dstptr->r * ainv + srcptr->r * alpha);
dstptr->g = DIV_255(dstptr->g * ainv + srcptr->g * alpha);
dstptr->b = DIV_255(dstptr->b * ainv + srcptr->b * alpha);
+ //dstptr->alpha = dstptr->alpha;
o_x += 1;
}
o_y += 1;
@@ -982,7 +1050,7 @@ void BB_alpha_blit_from(BlitBuffer *dst, BlitBuffer *src,
dstptr->r = srcptr->r;
dstptr->g = srcptr->g;
dstptr->b = srcptr->b;
- dstptr->alpha = 0xFF;
+ //dstptr->alpha = dstptr->alpha;
o_x += 1;
}
o_y += 1;
@@ -999,7 +1067,7 @@ void BB_alpha_blit_from(BlitBuffer *dst, BlitBuffer *src,
dstptr->r = srcptr->a;
dstptr->g = srcptr->a;
dstptr->b = srcptr->a;
- dstptr->alpha = 0xFF;
+ //dstptr->alpha = dstptr->alpha;
o_x += 1;
}
o_y += 1;
@@ -1007,16 +1075,21 @@ void BB_alpha_blit_from(BlitBuffer *dst, BlitBuffer *src,
} else if (dbb_type == TYPE_BBRGB32 && sbb_type == TYPE_BB8A) {
ColorRGB32 *dstptr;
Color8A *srcptr;
+ uint8_t dsta, bdsta;
o_y = offs_y;
for (d_y = dest_y; d_y < dest_y + h; d_y++) {
o_x = offs_x;
for (d_x = dest_x; d_x < dest_x + w; d_x++) {
BB_GET_PIXEL(dst, dbb_rotation, ColorRGB32, d_x, d_y, &dstptr);
BB_GET_PIXEL(src, sbb_rotation, Color8A, o_x, o_y, &srcptr);
- dstptr->r = srcptr->a;
- dstptr->g = srcptr->a;
- dstptr->b = srcptr->a;
- dstptr->alpha = srcptr->alpha; // if bad result, try: 0xFF - srcptr->alpha
+ alpha = srcptr->alpha;
+ ainv = 0xFF - alpha;
+ dsta = RGB_To_A(dstptr->r, dstptr->g, dstptr->b);
+ bdsta = DIV_255(dsta * ainv + srcptr->a * alpha);
+ dstptr->r = bdsta;
+ dstptr->g = bdsta;
+ dstptr->b = bdsta;
+ //dstptr->alpha = dstptr->alpha;
o_x += 1;
}
o_y += 1; |
Trusting you on this :) (the comments in mupdf.lua sounds like they are mine, but I don't remember any of that :|) |
BB8 is Kindle, anything else? (Kobo is/was BB8 ?! I thought it was BB16 on old FW, BB32 on newest) |
Yep, those were yours, but the MuPDF API was making it impossible to do some of this when you originally implemented it, IIRC ;). I'm working on making BB8 usable on Kobo ;). But, yeah, historically, it's been Kindle >= 5 & most PB. Earlier devices were using BB4. |
I'm not quite sure MuPDF actually knows how to handle Y8A (i.e., BB8A), but on the other hand, I don't think anything uses this anywhere either ;). |
Hmm, managed to segfault somewhere somewhat reproducibly @ 8bpp, might want to hold off on that until I identify the culprit, but it's image related, so I smell a rat ;). Might take a few days, as I'm going to be busy in the next few days ;). EDIT: I'd forgotten to push the latest commit to my device. Might not be it, but that's something, at least :D. |
(That's from mupdf.scaleBlitBuffer from BBRGB32) Yaaaaaaaaaay. -_-" It's of course semi-random, and not a hard-crash on a specific image. |
I smell a debug build in my near future... |
Oh, joy, it's crashing in inline asm -_-". |
Still crashes with the pure C version. -_-". Completely unrelated to this PR, AFAICT, though... |
It's potentially crashy, and impossible to debug. c.f., https://bugs.ghostscript.com/show_bug.cgi?id=698879 koreader#816 Also, honor debug builds in MuPDF (as it enforces CFLAGS).
Okay, reproduced with a vanilla nightly @ 32bpp, so, not my bug \o/. I'm thinking of eventually moving to a real image processing library for scaling (something like zimg). |
ffi/png.lua
Outdated
local width = ffi.new("int[1]") | ||
local height = ffi.new("int[1]") | ||
local ptr = ffi.new("unsigned char*[1]") | ||
local err = lodepng.lodepng_decode32_file(ptr, width, height, filename) | ||
local err = lodepng.lodepng_decode_file(ptr, width, height, filename, fmt, 8) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's that 8 at the end? (It sounds suspiciously low compared to 32 for color screens, at a quick glance without looking up the API/headers. :-P)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's the per channel bitdepth ;).
(i.e., bpp = ncomp * 8).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because we're definitely not doing HDR ;).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aww, 10 bit ftw! :-P
And it's because of the setAllocated call in pic/openPNGDocument. Commenting it out fixes the crash, which I can indeed fairly quickly reproduce by alternating between "opening" a PNG, and "View full size cover" on that same PNG. |
Which is problematic, since that memory is indeed allocated by lodepng, and does indeed need to be free'd ;). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lol, it's always something like that :-D
…ffer Now that we can tell MuPDF the exact stride and whether an alpha channel is present, this *should* behave. We still have a few BB types that needs the temporary copy, mostly because they're terrible to begin with, so we can't blame mupdf for not supporting 'em ;). (4bpp, because it's, err, fun to address...) (RGB565, because HAHAHAHAHAHAHAHAHAH).
<Insert Star Wars joke here>
anything. Mainly with a BB8A source. Also stopped 32bpp from overwriting dst->alpha, as I don't think the Lua code does it?
That's obviously a NOP, so, don't ;p.
At least that made me check the code, and confirm, that, yes, it enforces an alpha channel if w or h are floats. https://github.com/ArtifexSoftware/mupdf/blob/f31336616ac256a54a567c76528a0b3e7ee4e55c/source/fitz/draw-scale-simple.c#L1580-L1582
number of circumstances.
No idea why it only crashes in the emulator, though.
This reverts commit f5f47a0.
Grmbl, ffi-cdecl...
Well, the most obtuse bug ever goes to ffi-cdecl, for fucking up a size_t typedef, leading to fun & interesting memory corruption on platforms where it's a long uint. (My guess is the palettesize one in LodePNGState). |
Which means I'm now going through every ffi header, because that's not the first time I've seen it screwing with typedefs... |
About:
It shouldn't crash anywhere... It looks to me there are more risks with the other bb like |
@poire-z : Yeah, that was definitely a false alarm because of the size_t induced memory corruption ;). (Which also neatly explains why I had no issues on ARM, because size_t == uint there, and that's what they were defined as in the ffi stuff). In this specific case, if I understood this correctly, that wasn't really a memleak, the gc would eventually have taken care of it, right? |
Yeah, the BBs that host the raw image data from the decoder are definitely left to the GC to handle, AFAICT (and they're all flagged allocated, either explicitly, or via new, so provided someone or the gc takes care of the bb, the cdata goes with it). |
Looks like so. But then, why did I took care of not missing a call to bb:free() in so many places ? :| Just to have them done early at a deterministic moment to reclaim that memory quickly ? I thought it was required - or was it just the GC to slow to take care of it. A bit confused :)
On this, I'm even more confused. I'd say it depends: on where it come from, on the kind of lua thingy (userdata vs lightuserdata if I remember) used to store it... |
I'm definitely with you on the "deterministic" approach ;). Plus, I just find it easier to keep track of when reading the code. (Either you're in C land and you do explicit memory management, or you're not and you don't. The maybe yes, maybe not feeling you get with a GC makes me queasy and never quite sure of what's happening ^^). |
Okay, this should (finally!) be ready to go :). |
@@ -249,7 +249,7 @@ function framebuffer:clear(w, h) | |||
-- and regions of memory outside of the visible screen *may* be used by other things in the system, | |||
-- for performance reasons, and we do not want to screw them over ;). | |||
-- NOTE: This should be equivalent to line_length * yres | |||
C.memset(self.data, 0xFF, w * (self.fb_bpp / 8) * h) | |||
ffi.fill(self.data, w * (self.fb_bpp / 8) * h, 0xFF) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was wondering about this before.
https://luajit.org/ext_ffi_api.html
Performance notice: ffi.fill() may be used as a faster (inlinable) replacement for the C library function memset(dst, c, len). Please note the different order of arguments!
local fh = io.open(filename, "r") | ||
assert(fh, "couldn't open file") | ||
local fh = io.open(filename, "rb") | ||
assert(fh, "couldn't open JPG file") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are all pcalled in front, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, picdocument's pcalling that (which initially threw me for a loop, as errors get thrown into the void) ;).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tend to think of pcalls without catching the error as a code smell, also see #827
* Leave bitness-dependent types alone in FFI stuff koreader/koreader-base#825 @NiLuJe Required for koreader#4629. * A few blitting changes for better BB8 performance koreader/koreader-base#816 @NiLuJe
* Leave bitness-dependent types alone in FFI stuff koreader/koreader-base#825 @NiLuJe Required for #4629. * A few blitting changes for better BB8 performance koreader/koreader-base#816 @NiLuJe
…4631) * Leave bitness-dependent types alone in FFI stuff koreader/koreader-base#825 @NiLuJe Required for koreader#4629. * A few blitting changes for better BB8 performance koreader/koreader-base#816 @NiLuJe
Update mupdf.scaleBlitBuffer to avoid internal bb copies/conversions as much as possible (i.e., handle BB8 & BBRGB24 natively).
Update the C blitter to fix BB_alpha_blit_from to: