Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Slightly less crappy screenshot performance #1206

Merged
merged 15 commits into from
Oct 5, 2020
Merged

Conversation

NiLuJe
Copy link
Member

@NiLuJe NiLuJe commented Oct 4, 2020

By using our existing blitting methods if possible, and outputting in a pixel-format as close to the input one as possible.

(LodePNG itself appears to have been smart enough to squash grayscale-as-RGBA32 to grayscale all on its own in the encoding stage, so, no space savings on the encoded files).

re: koreader/koreader#6736


This change is Reviewable

Defer the actual copy to our usual blit methods
And do that in a pixel format closest to the input format
@NiLuJe

This comment has been minimized.

@NiLuJe
Copy link
Member Author

NiLuJe commented Oct 4, 2020

Also squashed the crap fallback to RGB24 instead of RGBA32. We drop the alpha anyway, and we don't gain anything by pixels being pointer-aligned here, since we're doing byte by byte load/stores anyway.

@NiLuJe
Copy link
Member Author

NiLuJe commented Oct 4, 2020

Thankfully, no-one should be using said crap fallback anyway, so BB8 & BB32 fbs (AKA. the sane ones) should enjoy significantly less crappy performance.

Copy link
Member

@Frenzie Frenzie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine by me, saves a pass?

@NiLuJe
Copy link
Member Author

NiLuJe commented Oct 4, 2020

Also benefits from the blitting fast-paths if applicable (i.e., type matches, and source is not rotated), and uses the C blitter if enabled.

@NiLuJe
Copy link
Member Author

NiLuJe commented Oct 4, 2020

Actually unbroke it in !BB8 (wrong stride :s), and made it zero-copy if possible (which, granted, will be rare. Basically the same cases as the block copy blit, e.g., an unrotated Forma).

Turned out to be massively slower (like, three times), for... reasons?
Reading the fb memory is slow?
Bad cache interactions with LodePNG's read patterns?

Who knows.
@NiLuJe
Copy link
Member Author

NiLuJe commented Oct 4, 2020

And testing the zero-copy thingy on a Forma revealed that it was hilariously slower, because CPUs are weird, I guess?

So I scrapped it ;).

@poire-z
Copy link
Contributor

poire-z commented Oct 5, 2020

(Might add the proper ffiutil.strcoll stuff in this PR ? :) Just mentionning it because you're familiar with ffi and I don't want to have to do it, so ... :)

@NiLuJe
Copy link
Member Author

NiLuJe commented Oct 5, 2020

Sure, I can do that ;).

@NiLuJe NiLuJe merged commit 5cc289e into koreader:master Oct 5, 2020
NiLuJe added a commit to koreader/koreader that referenced this pull request Oct 5, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants