Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A few blitting changes for better BB8 performance #816

Merged
merged 51 commits into from Feb 21, 2019

Conversation

NiLuJe
Copy link
Member

@NiLuJe NiLuJe commented Feb 14, 2019

  • Update mupdf.scaleBlitBuffer to avoid internal bb copies/conversions as much as possible (i.e., handle BB8 & BBRGB24 natively).

  • Update the C blitter to fix BB_alpha_blit_from to:

    • Actually correctly alpha-blend with some source formats
    • Fully handle BB8 as a destination format

@NiLuJe
Copy link
Member Author

NiLuJe commented Feb 14, 2019

I quickly tested this on a Kobo in 32bpp & 8bpp mode (surprise! :D), haven't managed to break anything yet.

Might be worth checking the emulator and/or 16bpp, though ;).

I still have no idea why nightmode is broken with the C blitter on Kobo, while it appears to work on Android & SDL.
Not that I care about it, but, still, mysterious :D.

@NiLuJe
Copy link
Member Author

NiLuJe commented Feb 14, 2019

NOTE: That re-order was probably unwise as far as making the diff readable goes.

Here are the actual changes (i.e., git diff dddeead2661035fa6d880272c204535ae648cf95)

diff --git a/blitbuffer.c b/blitbuffer.c
index 7e84a34..6e51a6a 100644
--- a/blitbuffer.c
+++ b/blitbuffer.c
@@ -850,6 +850,68 @@ void BB_alpha_blit_from(BlitBuffer *dst, BlitBuffer *src,
             }
             o_y += 1;
         }
+    } else if (dbb_type == TYPE_BB8 && sbb_type == TYPE_BB8A) {
+        Color8A *srcptr;
+        Color8 *dstptr;
+        o_y = offs_y;
+        for (d_y = dest_y; d_y < dest_y + h; d_y++) {
+            o_x = offs_x;
+            for (d_x = dest_x; d_x < dest_x + w; d_x++) {
+                BB_GET_PIXEL(dst, dbb_rotation, Color8, d_x, d_y, &dstptr);
+                BB_GET_PIXEL(src, sbb_rotation, Color8A, o_x, o_y, &srcptr);
+                alpha = srcptr->alpha;
+                ainv = 0xFF - alpha;
+                dstptr->a = dstptr->a * ainv + srcptr->a * alpha;
+                o_x += 1;
+            }
+            o_y += 1;
+        }
+    } else if (dbb_type == TYPE_BB8 && sbb_type == TYPE_BBRGB16) {
+        ColorRGB16 *srcptr;
+        Color8 *dstptr;
+        o_y = offs_y;
+        for (d_y = dest_y; d_y < dest_y + h; d_y++) {
+            o_x = offs_x;
+            for (d_x = dest_x; d_x < dest_x + w; d_x++) {
+                BB_GET_PIXEL(dst, dbb_rotation, Color8, d_x, d_y, &dstptr);
+                BB_GET_PIXEL(src, sbb_rotation, ColorRGB16, o_x, o_y, &srcptr);
+                dstptr->a = ColorRGB16_To_A(srcptr->v);
+                o_x += 1;
+            }
+            o_y += 1;
+        }
+    } else if (dbb_type == TYPE_BB8 && sbb_type == TYPE_BBRGB24) {
+        ColorRGB24 *srcptr;
+        Color8 *dstptr;
+        o_y = offs_y;
+        for (d_y = dest_y; d_y < dest_y + h; d_y++) {
+            o_x = offs_x;
+            for (d_x = dest_x; d_x < dest_x + w; d_x++) {
+                BB_GET_PIXEL(dst, dbb_rotation, Color8, d_x, d_y, &dstptr);
+                BB_GET_PIXEL(src, sbb_rotation, ColorRGB24, o_x, o_y, &srcptr);
+                dstptr->a = RGB_To_A(srcptr->r, srcptr->g, srcptr->b);
+                o_x += 1;
+            }
+            o_y += 1;
+        }
+    } else if (dbb_type == TYPE_BB8 && sbb_type == TYPE_BBRGB32) {
+        ColorRGB32 *srcptr;
+        Color8 *dstptr;
+        uint8_t srca;
+        o_y = offs_y;
+        for (d_y = dest_y; d_y < dest_y + h; d_y++) {
+            o_x = offs_x;
+            for (d_x = dest_x; d_x < dest_x + w; d_x++) {
+                BB_GET_PIXEL(dst, dbb_rotation, Color8, d_x, d_y, &dstptr);
+                BB_GET_PIXEL(src, sbb_rotation, ColorRGB32, o_x, o_y, &srcptr);
+                alpha = srcptr->alpha;
+                ainv = 0xFF - alpha;
+                srca = RGB_To_A(srcptr->r, srcptr->g, srcptr->b);
+                dstptr->a = DIV_255(dstptr->a * ainv + srca * alpha);
+                o_x += 1;
+            }
+            o_y += 1;
+        }
     } else if (dbb_type == TYPE_BB8A && sbb_type == TYPE_BB8A) {
         Color8A *dstptr, *srcptr;
         o_y = offs_y;
@@ -860,7 +922,7 @@ void BB_alpha_blit_from(BlitBuffer *dst, BlitBuffer *src,
                 BB_GET_PIXEL(src, sbb_rotation, Color8A, o_x, o_y, &srcptr);
                 alpha = srcptr->alpha;
                 ainv = 0xFF - alpha;
-                dstptr->a = dstptr->a * ainv + srcptr->a * alpha;
+                dstptr->a = DIV_255(dstptr->a * ainv + srcptr->a * alpha);
                 o_x += 1;
             }
             o_y += 1;
@@ -882,13 +944,18 @@ void BB_alpha_blit_from(BlitBuffer *dst, BlitBuffer *src,
     } else if (dbb_type == TYPE_BBRGB16 && sbb_type == TYPE_BB8A) {
         Color8A *srcptr;
         ColorRGB16 *dstptr;
+        uint8_t dsta, bdsta;
         o_y = offs_y;
         for (d_y = dest_y; d_y < dest_y + h; d_y++) {
             o_x = offs_x;
             for (d_x = dest_x; d_x < dest_x + w; d_x++) {
                 BB_GET_PIXEL(dst, dbb_rotation, ColorRGB16, d_x, d_y, &dstptr);
                 BB_GET_PIXEL(src, sbb_rotation, Color8A, o_x, o_y, &srcptr);
-                dstptr->v = RGB_To_RGB16(srcptr->a, srcptr->a, srcptr->a);
+                alpha = srcptr->alpha;
+                ainv = 0xFF - alpha;
+                dsta = ColorRGB16_To_A(dstptr->v);
+                bdsta = DIV_255(dsta * ainv + srcptr->a * alpha);
+                dstptr->v = RGB_To_RGB16(bdsta, bdsta, bdsta);
                 o_x += 1;
             }
             o_y += 1;
@@ -966,6 +1033,7 @@ void BB_alpha_blit_from(BlitBuffer *dst, BlitBuffer *src,
                 dstptr->r = DIV_255(dstptr->r * ainv + srcptr->r * alpha);
                 dstptr->g = DIV_255(dstptr->g * ainv + srcptr->g * alpha);
                 dstptr->b = DIV_255(dstptr->b * ainv + srcptr->b * alpha);
+                //dstptr->alpha = dstptr->alpha;
                 o_x += 1;
             }
             o_y += 1;
@@ -982,7 +1050,7 @@ void BB_alpha_blit_from(BlitBuffer *dst, BlitBuffer *src,
                 dstptr->r = srcptr->r;
                 dstptr->g = srcptr->g;
                 dstptr->b = srcptr->b;
-                dstptr->alpha = 0xFF;
+                //dstptr->alpha = dstptr->alpha;
                 o_x += 1;
             }
             o_y += 1;
@@ -999,7 +1067,7 @@ void BB_alpha_blit_from(BlitBuffer *dst, BlitBuffer *src,
                 dstptr->r = srcptr->a;
                 dstptr->g = srcptr->a;
                 dstptr->b = srcptr->a;
-                dstptr->alpha = 0xFF;
+                //dstptr->alpha = dstptr->alpha;
                 o_x += 1;
             }
             o_y += 1;
@@ -1007,16 +1075,21 @@ void BB_alpha_blit_from(BlitBuffer *dst, BlitBuffer *src,
     } else if (dbb_type == TYPE_BBRGB32 && sbb_type == TYPE_BB8A) {
         ColorRGB32 *dstptr;
         Color8A *srcptr;
+        uint8_t dsta, bdsta;
         o_y = offs_y;
         for (d_y = dest_y; d_y < dest_y + h; d_y++) {
             o_x = offs_x;
             for (d_x = dest_x; d_x < dest_x + w; d_x++) {
                 BB_GET_PIXEL(dst, dbb_rotation, ColorRGB32, d_x, d_y, &dstptr);
                 BB_GET_PIXEL(src, sbb_rotation, Color8A, o_x, o_y, &srcptr);
-                dstptr->r = srcptr->a;
-                dstptr->g = srcptr->a;
-                dstptr->b = srcptr->a;
-                dstptr->alpha = srcptr->alpha; // if bad result, try: 0xFF - srcptr->alpha
+                alpha = srcptr->alpha;
+                ainv = 0xFF - alpha;
+                dsta = RGB_To_A(dstptr->r, dstptr->g, dstptr->b);
+                bdsta = DIV_255(dsta * ainv + srcptr->a * alpha);
+                dstptr->r = bdsta;
+                dstptr->g = bdsta;
+                dstptr->b = bdsta;
+                //dstptr->alpha = dstptr->alpha;
                 o_x += 1;
             }
             o_y += 1;

@poire-z
Copy link
Contributor

poire-z commented Feb 15, 2019

Trusting you on this :) (the comments in mupdf.lua sounds like they are mine, but I don't remember any of that :|)

@poire-z
Copy link
Contributor

poire-z commented Feb 15, 2019

BB8 is Kindle, anything else? (Kobo is/was BB8 ?! I thought it was BB16 on old FW, BB32 on newest)

@NiLuJe
Copy link
Member Author

NiLuJe commented Feb 15, 2019

Yep, those were yours, but the MuPDF API was making it impossible to do some of this when you originally implemented it, IIRC ;).

I'm working on making BB8 usable on Kobo ;).

But, yeah, historically, it's been Kindle >= 5 & most PB. Earlier devices were using BB4.

@NiLuJe
Copy link
Member Author

NiLuJe commented Feb 15, 2019

I'm not quite sure MuPDF actually knows how to handle Y8A (i.e., BB8A), but on the other hand, I don't think anything uses this anywhere either ;).

@NiLuJe
Copy link
Member Author

NiLuJe commented Feb 15, 2019

Hmm, managed to segfault somewhere somewhat reproducibly @ 8bpp, might want to hold off on that until I identify the culprit, but it's image related, so I smell a rat ;).

Might take a few days, as I'm going to be busy in the next few days ;).

EDIT: I'd forgotten to push the latest commit to my device. Might not be it, but that's something, at least :D.

@NiLuJe
Copy link
Member Author

NiLuJe commented Feb 15, 2019

(gdb) bt full
#0  0x2ba7fd28 in ?? () from libs/libmupdf.so
No symbol table info available.
#1  0x2ba8113a in fz_scale_pixmap_cached () from libs/libmupdf.so
No symbol table info available.
#2  0x2d36d530 in ?? ()
No symbol table info available.
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

(That's from mupdf.scaleBlitBuffer from BBRGB32)

Yaaaaaaaaaay. -_-"

It's of course semi-random, and not a hard-crash on a specific image.
I can reproduce it, somewhat, but it's random-ish, and involves opening a bunch of PicViewer, ImageViewer & CRe in a somewhat specific order... And it of course crashes when re-opening the one I first opened, quite successfully, at that. >_<"

@NiLuJe
Copy link
Member Author

NiLuJe commented Feb 15, 2019

I smell a debug build in my near future...

@NiLuJe
Copy link
Member Author

NiLuJe commented Feb 16, 2019

Oh, joy, it's crashing in inline asm -_-".

@NiLuJe
Copy link
Member Author

NiLuJe commented Feb 16, 2019

Still crashes with the pure C version. -_-".

Completely unrelated to this PR, AFAICT, though...

NiLuJe added a commit to NiLuJe/koreader-base that referenced this pull request Feb 16, 2019
It's potentially crashy, and impossible to debug.
c.f., https://bugs.ghostscript.com/show_bug.cgi?id=698879
      koreader#816

Also, honor debug builds in MuPDF (as it enforces CFLAGS).
@NiLuJe
Copy link
Member Author

NiLuJe commented Feb 18, 2019

Okay, reproduced with a vanilla nightly @ 32bpp, so, not my bug \o/.

I'm thinking of eventually moving to a real image processing library for scaling (something like zimg).

ffi/png.lua Outdated
local width = ffi.new("int[1]")
local height = ffi.new("int[1]")
local ptr = ffi.new("unsigned char*[1]")
local err = lodepng.lodepng_decode32_file(ptr, width, height, filename)
local err = lodepng.lodepng_decode_file(ptr, width, height, filename, fmt, 8)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's that 8 at the end? (It sounds suspiciously low compared to 32 for color screens, at a quick glance without looking up the API/headers. :-P)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's the per channel bitdepth ;).

(i.e., bpp = ncomp * 8).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because we're definitely not doing HDR ;).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aww, 10 bit ftw! :-P

@NiLuJe
Copy link
Member Author

NiLuJe commented Feb 18, 2019

And it's because of the setAllocated call in pic/openPNGDocument.

Commenting it out fixes the crash, which I can indeed fairly quickly reproduce by alternating between "opening" a PNG, and "View full size cover" on that same PNG.

@NiLuJe
Copy link
Member Author

NiLuJe commented Feb 18, 2019

Which is problematic, since that memory is indeed allocated by lodepng, and does indeed need to be free'd ;).

Copy link
Member

@Frenzie Frenzie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lol, it's always something like that :-D

…ffer

Now that we can tell MuPDF the exact stride and whether an alpha channel
is present, this *should* behave.

We still have a few BB types that needs the temporary copy, mostly
because they're terrible to begin with, so we can't blame mupdf for not
supporting 'em ;).
(4bpp, because it's, err, fun to address...)
(RGB565, because HAHAHAHAHAHAHAHAHAH).
<Insert Star Wars joke here>
anything.

Mainly with a BB8A source.

Also stopped 32bpp from overwriting dst->alpha, as I don't think the Lua
code does it?
That's obviously a NOP, so, don't ;p.
At least that made me check the code, and confirm, that, yes, it
enforces an alpha channel if w or h are floats.
https://github.com/ArtifexSoftware/mupdf/blob/f31336616ac256a54a567c76528a0b3e7ee4e55c/source/fitz/draw-scale-simple.c#L1580-L1582
@NiLuJe
Copy link
Member Author

NiLuJe commented Feb 20, 2019

Well, the most obtuse bug ever goes to ffi-cdecl, for fucking up a size_t typedef, leading to fun & interesting memory corruption on platforms where it's a long uint.

(My guess is the palettesize one in LodePNGState).

@NiLuJe
Copy link
Member Author

NiLuJe commented Feb 20, 2019

Which means I'm now going through every ffi header, because that's not the first time I've seen it screwing with typedefs...

@poire-z
Copy link
Contributor

poire-z commented Feb 20, 2019

About:

Noooope, that's a double free. No idea why it only crashes in the emulator, though.

It shouldn't crash anywhere...
It looks to me that: scaled_bb = self.image_bb:scale(...) calls BB_mt.__index:scale() which does scaled_bb = BB.new(new_width, new_height, self:getType()). And BB.new(), when no dataptr provided does C.malloc() it, and bb:setAllocated(1), so when BB_mt.__index:free() or gc() is called, it knows it can C.free() it.
BB_mt.__index:free() resets that allocated flag, so one should be able to call it multiple times without risk.

It looks to me there are more risks with the other bb like doc.image_bb = BB.new(re.width, re.height, bbtype, re.data)

@NiLuJe
Copy link
Member Author

NiLuJe commented Feb 20, 2019

@poire-z : Yeah, that was definitely a false alarm because of the size_t induced memory corruption ;).

(Which also neatly explains why I had no issues on ARM, because size_t == uint there, and that's what they were defined as in the ffi stuff).


In this specific case, if I understood this correctly, that wasn't really a memleak, the gc would eventually have taken care of it, right?

@NiLuJe
Copy link
Member Author

NiLuJe commented Feb 20, 2019

Yeah, the BBs that host the raw image data from the decoder are definitely left to the GC to handle, AFAICT (and they're all flagged allocated, either explicitly, or via new, so provided someone or the gc takes care of the bb, the cdata goes with it).

@poire-z
Copy link
Contributor

poire-z commented Feb 20, 2019

In this specific case, if I understood this correctly, that wasn't really a memleak, the gc would eventually have taken care of it, right?

Looks like so. But then, why did I took care of not missing a call to bb:free() in so many places ? :| Just to have them done early at a deterministic moment to reclaim that memory quickly ? I thought it was required - or was it just the GC to slow to take care of it. A bit confused :)

Yeah, the BBs that host the raw image data from the decoder are definitely left to the GC to handle

On this, I'm even more confused. I'd say it depends: on where it come from, on the kind of lua thingy (userdata vs lightuserdata if I remember) used to store it...
See koreader/koreader#3905 (comment) were things changed with a MuPDF bump.

@NiLuJe
Copy link
Member Author

NiLuJe commented Feb 20, 2019

I'm definitely with you on the "deterministic" approach ;). Plus, I just find it easier to keep track of when reading the code.

(Either you're in C land and you do explicit memory management, or you're not and you don't. The maybe yes, maybe not feeling you get with a GC makes me queasy and never quite sure of what's happening ^^).

@NiLuJe
Copy link
Member Author

NiLuJe commented Feb 21, 2019

Okay, this should (finally!) be ready to go :).

@@ -249,7 +249,7 @@ function framebuffer:clear(w, h)
-- and regions of memory outside of the visible screen *may* be used by other things in the system,
-- for performance reasons, and we do not want to screw them over ;).
-- NOTE: This should be equivalent to line_length * yres
C.memset(self.data, 0xFF, w * (self.fb_bpp / 8) * h)
ffi.fill(self.data, w * (self.fb_bpp / 8) * h, 0xFF)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering about this before.

https://luajit.org/ext_ffi_api.html

Performance notice: ffi.fill() may be used as a faster (inlinable) replacement for the C library function memset(dst, c, len). Please note the different order of arguments!

local fh = io.open(filename, "r")
assert(fh, "couldn't open file")
local fh = io.open(filename, "rb")
assert(fh, "couldn't open JPG file")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are all pcalled in front, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, picdocument's pcalling that (which initially threw me for a loop, as errors get thrown into the void) ;).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tend to think of pcalls without catching the error as a code smell, also see #827

@Frenzie Frenzie merged commit 7233305 into koreader:master Feb 21, 2019
Frenzie added a commit to Frenzie/koreader that referenced this pull request Feb 21, 2019
* Leave bitness-dependent types alone in FFI stuff koreader/koreader-base#825 @NiLuJe
  Required for koreader#4629.
* A few blitting changes for better BB8 performance koreader/koreader-base#816 @NiLuJe
Frenzie added a commit to koreader/koreader that referenced this pull request Feb 21, 2019
* Leave bitness-dependent types alone in FFI stuff koreader/koreader-base#825 @NiLuJe
  Required for #4629.
* A few blitting changes for better BB8 performance koreader/koreader-base#816 @NiLuJe
mwoz123 pushed a commit to mwoz123/koreader that referenced this pull request Mar 29, 2020
…4631)

* Leave bitness-dependent types alone in FFI stuff koreader/koreader-base#825 @NiLuJe
  Required for koreader#4629.
* A few blitting changes for better BB8 performance koreader/koreader-base#816 @NiLuJe
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants