New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(draw-sw): add simple Helium acceleration #5596
Conversation
Should be able to work together with arm2d. @GorgonMeducer please help review and test it! :) |
@PeterBee97 ALL LOOKS GOOD! (I haven't verified them on hardware yet. I will do it soon.) Thank you. Do you consider adding other Helium acceleration? For example, for transform, rotate screen etc. By the way, would you please provide a macro option to only use arm-2d acceleration without the ASM version you added? (There is an option only to use your helium assembly version without arm-2d). It is hard for me to compare the performance differences... |
I'll look into transforms later.
I've moved arm2d header inclusion out of lv_blend_helium.h since arm2d already has its individual config LV_USE_DRAW_ARM2D_SYNC, so now the two accelerator can be toggled independently. Is that OK? @GorgonMeducer |
Amazing! However it's not entirely clear how to select between pure ASM and Arm2D. I guess you need to set |
That's what I previously thought, but since arm2d cannot override all of asm functions right now enabling LV_DRAW_SW_ASM_HELIUM means asm functions cannot be shutdown entirely (which is bad for Gabriel's test). Having the two options decoupled means we can have arm2d, or asm, or both (where asm is the backup). Or is there better idea to let arm2d work by itself? Need some help here. @kisvegabor |
The original solution is the best one. I only need a macro option for testing purposes only. So please add an internal option but not a one used in Once you finished this part. I will send a PR to update cmsis-pack and add a small change to
No need to select. When There is no need to allow ordinary users to disable the native helium assembly but keep the arm-2d part. I only need an option to disable the native helium assembly code for debugging purposes. I am eager to see this PR getting merged. |
Similar to Neon, this patch added Helium acceleration to all blending functions with minimal code. Signed-off-by: Peter Bee <bijunda1@xiaomi.com>
Signed-off-by: Peter Bee <bijunda1@xiaomi.com>
fix crash when resolution is larger than default 800*480 Signed-off-by: Peter Bee <bijunda1@xiaomi.com>
Added macro |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clear, looks good to me. We need to document these somewhere though.
I'll also do some tests today.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've just found something.
#include LV_DRAW_SW_HELIUM_CUSTOM_INCLUDE | ||
#endif | ||
|
||
#define USE_NATIVE_ASSEMBLY 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should come from a compiler define.
Please also change the name to LV_USE_NATIVE_ASSEMBLY
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, moved to lv_conf.
Signed-off-by: Peter Bee <bijunda1@xiaomi.com>
Looks good for me. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At this point I'm not sure #define LV_USE_DRAW_SW_ASM LV_DRAW_SW_ASM_...
is still required. As ASM implementation can overlap probably we should enable them differently.
Let's improve it in an other PR later.
Description of the feature or fix
Similar to Neon, this patch added Helium acceleration to all blending functions with minimal code. Still, the assembly code will produce slightly different result (normally 1bit error) from C rendering results due to difference in implementation, but visually similar to the eye:
The argb8888 dst is also supported with parallel division by integer newton iteration method. Thanks to tail predication and scatter-gathering functions, the code is conciser than Neon version and should also be more efficient. However, speed is not yet tested as the development was done on QEMU.
Notes
lv_conf_template.h
run lv_conf_internal_gen.py and update Kconfig.scripts/code-format.py
(astyle needs to be installed) and follow the Code Conventions