perf(vg_lite): add asynchronous rendering support #5398

FASTSHIFT · 2024-01-19T09:54:55Z

Help us review this PR! Anyone can approve it or request changes.

Description of the feature or fix

Add a VG-Lite flush commit trigger threshold. The GPU will try to batch this many draw tasks, so there is no need to enable an independent rendering thread.

Checkpoints

Update the Documentation if needed.
Add Examples if relevant.
Add Tests if applicable.
If you added new options to lv_conf_template.h run lv_conf_internal_gen.py and update Kconfig.
Run scripts/code-format.py (astyle needs to be installed) and follow the Code Conventions

src/draw/vg_lite/lv_draw_vg_lite.c

kisvegabor · 2024-01-19T23:15:30Z

src/draw/vg_lite/lv_draw_vg_lite.c

-#endif
-
-    LV_VG_LITE_CHECK_ERROR(vg_lite_finish());
+    lv_vg_lite_flush(draw_unit);


What if flush_count == 3 and there are no more draw tasks? What will trigger the flushing of these 3 commands?

kisvegabor · 2024-01-19T23:16:34Z

lv_conf_template.h

@@ -176,6 +176,9 @@
 /* Enable VG-Lite assert. */
 #define LV_VG_LITE_USE_ASSERT 0

+/* VG-Lite flush commit trigger threshold */


Suggested change

/* VG-Lite flush commit trigger threshold */

/* VG-Lite flush commit trigger threshold. VG-Lite will try to batch this many draw tasks */

How can we determine the ideal value? Can it be 256 too?

How large is the command buffer?

How can we determine the ideal value? Can it be 256 too?

8 is an empirical value.
We plan to implement this logic in the underlying driver in the future: check whether the GPU is idle inside vg_lite_flush. If the GPU is idle, the rendering is submitted immediately. If the GPU is busy, the rendering task is queued until the next check.
In LVGL, we would then simply call vg_lite_flush.

How large is the command buffer?

Currently we configure the cmd buffer to be 64K x2. These two buffers are ping-pong buffers. When the GPU is reading one of the buffers, the user can write to the other buffer and swap at the appropriate time. The cmd buf length occupied by different drawing commands is different, ranging from tens to hundreds of bytes.
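
To make the ping-pong arrangement concrete, here is a minimal sketch of two 64K command buffers where the CPU fills one while the GPU reads the other. The type and function names (`cmd_pingpong_t`, `cmd_pingpong_write`, `cmd_pingpong_swap`) are hypothetical and only model the idea; the real driver manages its buffers internally, and the sketch assumes the GPU has already finished with a buffer before it is reused.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CMD_BUF_SIZE (64u * 1024u) /* 64K per buffer, as stated above */

/* Hypothetical model of a ping-pong command buffer pair. */
typedef struct {
    uint8_t buf[2][CMD_BUF_SIZE];
    size_t  used[2];   /* bytes written to each buffer */
    int     write_idx; /* buffer the CPU is currently filling */
} cmd_pingpong_t;

static void cmd_pingpong_init(cmd_pingpong_t *pp)
{
    memset(pp, 0, sizeof(*pp));
}

/* Append one command. Returns 0 on success, or -1 when the current
 * buffer has no room left and the caller has to swap first. */
static int cmd_pingpong_write(cmd_pingpong_t *pp, const void *cmd, size_t len)
{
    size_t *used = &pp->used[pp->write_idx];
    if (len > CMD_BUF_SIZE - *used) return -1;
    memcpy(&pp->buf[pp->write_idx][*used], cmd, len);
    *used += len;
    return 0;
}

/* Hand the filled buffer to the GPU and continue writing into the other one.
 * Returns the index of the buffer the GPU should now read.
 * Assumption: the GPU is no longer reading the buffer we switch to. */
static int cmd_pingpong_swap(cmd_pingpong_t *pp)
{
    assert(pp->write_idx == 0 || pp->write_idx == 1);
    int gpu_idx = pp->write_idx;
    pp->write_idx ^= 1;
    pp->used[pp->write_idx] = 0;
    return gpu_idx;
}
```

With commands of about 100 bytes each, one 64K buffer in this model holds 655 commands before a swap is needed.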

FASTSHIFT · 2024-01-20T04:09:05Z

src/draw/vg_lite/lv_draw_vg_lite.c

@@ -157,6 +152,7 @@ static int32_t draw_dispatch(lv_draw_unit_t * draw_unit, lv_layer_t * layer)

    /* Return 0 is no selection, some tasks can be supported by other units. */
    if(!t || t->preferred_draw_unit_id != VG_LITE_DRAW_UNIT_ID) {
+        lv_vg_lite_finish(draw_unit);
        return -1;
    }


@kisvegabor
If there are no more drawing tasks, it will wait for the GPU drawing to complete and clear flush_count.

Signed-off-by: pengyiqiang <pengyiqiang@xiaomi.com> Signed-off-by: FASTSHIFT <vifextech@foxmail.com>

kisvegabor

I'll merge it after the release.

FASTSHIFT · 2024-01-22T13:32:23Z

I'll merge it after the release.

In fact, merging now would also work; it does not affect mainline functionality :)

nicusorcitu · 2024-01-24T13:56:52Z

src/draw/vg_lite/lv_draw_vg_lite.c

@@ -157,6 +152,7 @@ static int32_t draw_dispatch(lv_draw_unit_t * draw_unit, lv_layer_t * layer)

    /* Return 0 is no selection, some tasks can be supported by other units. */
    if(!t || t->preferred_draw_unit_id != VG_LITE_DRAW_UNIT_ID) {
+        lv_vg_lite_finish(draw_unit);


Why wait for GPU completion here?

When all the drawing tasks are finished, execution reaches this point (usually just before the result is sent out), and you can wait here for the GPU to finish any work that has not been rendered yet.

Execution can also reach this point when there are tasks that VG-Lite does not support and only the CPU can handle. In that case you do not have to wait for GPU completion.

We implement essentially full GPU rendering, so the CPU basically does not need to take part in the rendering work.

If there is a single thread, then it does not really matter whether everything can be drawn by VG-Lite.
The main LVGL thread will call the VG-Lite dispatcher. If the task is supported, it will be passed to VG-Lite. Otherwise, the CPU dispatcher will be called and the CPU will take care of it.
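
The fallback order described here can be sketched as follows. This is only an illustration under stated assumptions: `draw_task_t`, `gpu_dispatch`, `cpu_dispatch` and `dispatch_task` are hypothetical names, not LVGL's actual dispatcher API, and "supported by the GPU" is reduced to a single flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { UNIT_NONE = 0, UNIT_GPU, UNIT_CPU } unit_id_t;

/* Hypothetical draw task: only records whether the GPU unit can draw it
 * and which unit ended up drawing it. */
typedef struct {
    bool      gpu_supported;
    unit_id_t drawn_by;
} draw_task_t;

/* GPU dispatcher: takes the task only if it is supported, otherwise returns -1. */
static int gpu_dispatch(draw_task_t *t)
{
    if (!t->gpu_supported) return -1;
    t->drawn_by = UNIT_GPU;
    return 1;
}

/* CPU dispatcher: software rendering can handle any task in this model. */
static int cpu_dispatch(draw_task_t *t)
{
    t->drawn_by = UNIT_CPU;
    return 1;
}

/* Single-threaded dispatch: try the GPU unit first, fall back to the CPU. */
static unit_id_t dispatch_task(draw_task_t *t)
{
    assert(t != NULL);
    if (gpu_dispatch(t) < 0) cpu_dispatch(t);
    return t->drawn_by;
}
```

In this model a task rejected by the GPU unit is drawn by the CPU right away, which is why waiting for the GPU at that point is not strictly required for the CPU task itself.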

I will try to run your implementation on my side and see how it behaves.

nicusorcitu · 2024-01-24T14:06:31Z

src/draw/vg_lite/lv_draw_vg_lite.c

-#endif
-
-    LV_VG_LITE_CHECK_ERROR(vg_lite_finish());
+    lv_vg_lite_flush(draw_unit);


Calling vg_lite_flush() after each task does not necessarily mean that commands get batched. Instead, each vg_lite_flush() forces the other command buffer to take over. As the GPU has only 2 command buffers, if a third vg_lite_flush() comes too fast, while the first vg_lite_flush() is not yet complete (i.e. command buffer 1 is not complete), then the third vg_lite_flush() will act as a vg_lite_finish().

This mechanism you implemented does not work as you intend to.
What is the improvement with this new code?

Remember that vg_lite_flush() will switch the two GPU command buffers.

This function is a wrapper around vg_lite_flush.
In lv_vg_lite_flush I added a counter. Once lv_vg_lite_flush has been called more than a certain number of times, vg_lite_flush is called to tell the GPU to start rendering, so that the GPU can process several small tasks together to improve efficiency.

The next time vg_lite_flush is called, usually the GPU has already processed it and can immediately start a new rendering.
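
A minimal sketch of the counting wrapper described above, written as a self-contained model rather than the code from this PR. `draw_unit_model_t`, `batched_flush` and `batched_finish` are hypothetical names; `FLUSH_MAX_COUNT` stands in for LV_VG_LITE_FLUSH_MAX_COUNT with the value 8 mentioned earlier, the exact threshold comparison is an assumption, and the driver calls are replaced by counters so the example runs without a GPU.

```c
#include <assert.h>
#include <stdint.h>

#define FLUSH_MAX_COUNT 8 /* stand-in for LV_VG_LITE_FLUSH_MAX_COUNT */

/* Hypothetical model of the draw unit state. */
typedef struct {
    uint32_t flush_count;  /* draw tasks recorded since the last submit */
    uint32_t gpu_flushes;  /* how often vg_lite_flush() would have been called */
    uint32_t gpu_finishes; /* how often vg_lite_finish() would have been called */
} draw_unit_model_t;

/* Called after every draw task: only submits once enough tasks are batched. */
static void batched_flush(draw_unit_model_t *u)
{
    assert(u->flush_count < FLUSH_MAX_COUNT);
    u->flush_count++;
    if (u->flush_count < FLUSH_MAX_COUNT) return;
    u->gpu_flushes++; /* real code would call vg_lite_flush() here */
    u->flush_count = 0;
}

/* Called when no more tasks are available: always waits for the GPU,
 * because an earlier submit may still be rendering even if nothing is pending. */
static void batched_finish(draw_unit_model_t *u)
{
    u->gpu_finishes++; /* real code would call vg_lite_finish() here */
    u->flush_count = 0;
}
```

This also covers the earlier question about 3 leftover commands: they are never submitted by the counter alone, so the dispatcher has to call the finish function once it runs out of tasks.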

I understand now.

But you always force a batch of LV_VG_LITE_FLUSH_MAX_COUNT tasks, even the first time!

In my tests (running the v9 benchmark) I found that the GPU stays busy for about 4-5 tasks only when it has to fill the screen area, which is a big task. Otherwise the GPU completes any task right away.

So the GPU is idle most of the time and you do not have to batch any more commands. This is why I suggest the solution I implemented here:

lvgl/src/draw/nxp/vglite/lv_vglite_utils.c

Line 80 in 3199231

void vglite_run(void)

In this way I make sure the GPU is kept as busy as possible. If the GPU is idle, it makes no sense to queue even more commands.

What if we implement this logic inside vg_lite_flush? Let the driver determine the GPU status by itself. If the GPU is busy, it will join the queue. If it is idle, it will start rendering immediately.

That would be a change in API behavior. The documentation says it "submits the command to the GPU without waiting for completion". You cannot just return, because another vg_lite_flush() might never come to actually submit the commands to the GPU.

There are other applications besides LVGL on top of VG-Lite, and in some cases customers use the VG-Lite API directly to design their application. I would not ask for such a change.

Hmm...that makes sense. May I ask if you added the VG_LITE_GPU_IDLE_STATE enumeration in vglite_run yourself? Or is it already implemented in the SDK?

We asked VSI to implement this API. You should see it as well in upcoming driver releases. I use version 4.0.58.

vg_lite_get_parameter(VG_LITE_GPU_IDLE_STATE, 1, (vg_lite_pointer)&gpu_idle);

Great! I'll switch to the same plan.

I would also keep the VG-Lite thread implementation, with the option to run it single-threaded (like v8).

lvgl/src/draw/nxp/vglite/lv_draw_vglite.c

Line 113 in 152dc0b

#if LV_USE_OS

And check all the places where #if LV_USE_OS is used in this file. Simply setting LV_USE_OS to LV_OS_NONE will make it run single-threaded (as you have it now).
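
The compile-time switch can be illustrated with a small sketch. The macro definitions below are stand-ins so the example compiles on its own (in a real project they come from the LVGL configuration), and `task_model_t`, `execute_drawing`, `render_thread_step` and `dispatch` are hypothetical names rather than the functions in lv_draw_vglite.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the LVGL configuration; a real build gets these from lv_conf.h. */
#ifndef LV_OS_NONE
#define LV_OS_NONE 0
#endif
#ifndef LV_USE_OS
#define LV_USE_OS LV_OS_NONE
#endif

/* Hypothetical task model: records how the task was handled. */
typedef struct {
    bool executed;          /* drawing was carried out */
    bool queued_for_thread; /* task was handed to a render thread */
} task_model_t;

static void execute_drawing(task_model_t *t)
{
    t->executed = true;
}

#if LV_USE_OS
/* With an OS, a dedicated render thread would pick up the task and draw it. */
static void render_thread_step(task_model_t *t)
{
    if (t->queued_for_thread) execute_drawing(t);
}
#endif

static void dispatch(task_model_t *t)
{
    assert(t != NULL);
#if LV_USE_OS
    t->queued_for_thread = true; /* real code would signal the render thread */
#else
    execute_drawing(t);          /* single-threaded: draw in the caller's context */
#endif
}
```

With LV_USE_OS left at LV_OS_NONE, `dispatch` draws directly in the caller's context, which matches the single-threaded (v8-like) behavior.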

VELAPLATFO-22913 Signed-off-by: pengyiqiang <pengyiqiang@xiaomi.com> Signed-off-by: FASTSHIFT <vifextech@foxmail.com> Co-authored-by: pengyiqiang <pengyiqiang@xiaomi.com>

FASTSHIFT marked this pull request as draft January 19, 2024 09:55

FASTSHIFT changed the title ~~Feat vg lite thread render~~ feat(vg_lite): add thread render support Jan 19, 2024

W-Mai reviewed Jan 19, 2024

FASTSHIFT changed the title ~~feat(vg_lite): add thread render support~~ perf(vg_lite): add asynchronous rendering support Jan 19, 2024

FASTSHIFT force-pushed the feat_vg_lite_thread_render branch 6 times, most recently from 75f2a53 to 2cce05f Compare January 19, 2024 11:04

kisvegabor reviewed Jan 19, 2024

FASTSHIFT commented Jan 20, 2024

perf(vg_lite): add asynchronous rendering support

fe44a9e

Signed-off-by: pengyiqiang <pengyiqiang@xiaomi.com> Signed-off-by: FASTSHIFT <vifextech@foxmail.com>

FASTSHIFT force-pushed the feat_vg_lite_thread_render branch from 2cce05f to fe44a9e Compare January 20, 2024 04:34

FASTSHIFT marked this pull request as ready for review January 22, 2024 05:51

FASTSHIFT requested a review from kisvegabor January 22, 2024 07:05

kisvegabor approved these changes Jan 22, 2024

FASTSHIFT merged commit b125d1b into lvgl:master Jan 23, 2024
16 checks passed

FASTSHIFT deleted the feat_vg_lite_thread_render branch January 23, 2024 04:24

FASTSHIFT mentioned this pull request Jan 24, 2024

How to make full use GPU command list #5329

Closed

nicusorcitu reviewed Jan 24, 2024

FASTSHIFT mentioned this pull request Feb 2, 2024

feat(vg_lite): add gpu idle flush #5571

Merged
