Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf(vg_lite): add asynchronous rendering support #5398

Merged
merged 1 commit into from
Jan 23, 2024

Conversation

FASTSHIFT
Copy link
Collaborator

@FASTSHIFT FASTSHIFT commented Jan 19, 2024

Help us review this PR! Anyone can approve it or request changes.

Description of the feature or fix

Add VG-Lite flush commit trigger threshold. GPU will try to batch these many draw tasks. No need to enable independent rendering threads.

Checkpoints

@FASTSHIFT FASTSHIFT marked this pull request as draft January 19, 2024 09:55
@FASTSHIFT FASTSHIFT changed the title Feat vg lite thread render feat(vg_lite): add thread render support Jan 19, 2024
@FASTSHIFT FASTSHIFT changed the title feat(vg_lite): add thread render support perf(vg_lite): add asynchronous rendering support Jan 19, 2024
@FASTSHIFT FASTSHIFT force-pushed the feat_vg_lite_thread_render branch 6 times, most recently from 75f2a53 to 2cce05f Compare January 19, 2024 11:04
#endif

LV_VG_LITE_CHECK_ERROR(vg_lite_finish());
lv_vg_lite_flush(draw_unit);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if flush_count == 3 and there are no more draw tasks. What will trigger the flushing of these 3 comamnds?

@@ -176,6 +176,9 @@
/* Enable VG-Lite assert. */
#define LV_VG_LITE_USE_ASSERT 0

/* VG-Lite flush commit trigger threshold */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/* VG-Lite flush commit trigger threshold */
/* VG-Lite flush commit trigger threshold. VGlLite will try to batch these many draw tasks */
  • How can we determine the ideal value? Can it be 256 too?
  • How large the command buffer is?

Copy link
Collaborator Author

@FASTSHIFT FASTSHIFT Jan 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can we determine the ideal value? Can it be 256 too?

8 is an experience value.
We plan to implement this set of logic in the underlying driver in the future. Check whether the GPU is in idle state in vg_lite_flush: if the GPU is in idle, the rendering will be submitted immediately. If the GPU is busy, the rendering task will be queued to wait for the next time examine.
In LVGL, just simply call vg_lite_flush.

How large the command buffer is?

Currently we configure the cmd buffer to be 64K x2. These two buffers are ping-pong buffers. When the GPU is reading one of the buffers, the user can write to the other buffer and swap at the appropriate time. The cmd buf length occupied by different drawing commands is different, ranging from tens to hundreds of bytes.

@@ -157,6 +152,7 @@ static int32_t draw_dispatch(lv_draw_unit_t * draw_unit, lv_layer_t * layer)

/* Return 0 is no selection, some tasks can be supported by other units. */
if(!t || t->preferred_draw_unit_id != VG_LITE_DRAW_UNIT_ID) {
lv_vg_lite_finish(draw_unit);
return -1;
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kisvegabor
If there are no more drawing tasks, it will wait for the GPU drawing to complete and clear flush_count.

Signed-off-by: pengyiqiang <pengyiqiang@xiaomi.com>
Signed-off-by: FASTSHIFT <vifextech@foxmail.com>
@FASTSHIFT FASTSHIFT marked this pull request as ready for review January 22, 2024 05:51
Copy link
Member

@kisvegabor kisvegabor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll merge it after the release.

@FASTSHIFT
Copy link
Collaborator Author

I'll merge it after the release.

In fact, merge now also works, does not affect the main line function :)

@FASTSHIFT FASTSHIFT merged commit b125d1b into lvgl:master Jan 23, 2024
16 checks passed
@FASTSHIFT FASTSHIFT deleted the feat_vg_lite_thread_render branch January 23, 2024 04:24
@@ -157,6 +152,7 @@ static int32_t draw_dispatch(lv_draw_unit_t * draw_unit, lv_layer_t * layer)

/* Return 0 is no selection, some tasks can be supported by other units. */
if(!t || t->preferred_draw_unit_id != VG_LITE_DRAW_UNIT_ID) {
lv_vg_lite_finish(draw_unit);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why to wait for GPU completion here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When all the drawing tasks are finished, it will go here (usually about to be sent), and you can wait here for the GPU to finish the unrendered work.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can go here also if there are tasks which are not supported by VGlite but only by the CPU. So you do not have to wait for GPU completion.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We basically implement full GPU rendering, and the CPU basically does not need to participate in the rendering work.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is a single thread, then it does not really matter if everything can be drawn by vglite.
The main lvgl thread will call the vglite dispatcher. If the task is supported, will be passed to vglite. Else, the CPU dispatcher will get called and CPU will take care of it.

I will try to run your implementation on my side and see how it behaves.

#endif

LV_VG_LITE_CHECK_ERROR(vg_lite_finish());
lv_vg_lite_flush(draw_unit);
Copy link
Contributor

@nicusorcitu nicusorcitu Jan 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling vg_lite_flush() after each task does not necessary mean that command gets batched. Instead, after each vg_lite_flush() you will force the other command buffer to take over. As GPU have only 2 command buffers, if a 3th vg_lite_flush() comes too fast while the first vg_lite_flush() is not yet complete - e.g. the command buffer 1 is not complete - then the 3'th vg_lite_flush() will act as a vg_lite_finish().

This mechanism you implemented does not work as you intend to.
What is the improvement with this new code?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remember that vg_lite_flush() will switch the two GPU command buffers.

Copy link
Collaborator Author

@FASTSHIFT FASTSHIFT Jan 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is a double encapsulation of vg_lite_flush.
In lv_vg_lite_flush I added the count. When lv_vg_lite_flush is called more than a certain number of times, vg_lite_flush is called to tell the GPU to start rendering, so that the GPU can process several small tasks together to improve efficiency.

The next time vg_lite_flush is called, usually the GPU has already processed it and can immediately start a new rendering.

Copy link
Contributor

@nicusorcitu nicusorcitu Jan 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand now.

But you batch by force a number of LV_VG_LITE_FLUSH_MAX_COUNT tasks! Even the first time.

In my tests (running the benchmark v9) I found that GPU gets busy for a couple of 4-5 tasks only when it has to deal with filling the screen area - big task. Otherwise the GPU completes any task right away.

So the GPU gets IDLE most of the time and you do not have to batch anymore commands. This is why I suggest the solution I implemented here:

void vglite_run(void)

In this way I am making sure the GPU is busy as much as possible. If the GPU is idle then makes no sense to queue even more commands.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we implement this logic inside vg_lite_flush? Let the driver determine the GPU status by itself. If the GPU is busy, it will join the queue. If it is idle, it will start rendering immediately.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is an API change behavior. In documentation it says it "submits the command to the GPU without waiting for completion". You can not just return because it might not be come another vg_lite_flush() to really submit it to the GPU.

There are other applications beside lvgl on top of vglite or in some cases customers can use vglite api directly to design the application. I would not ask for a change.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm...that makes sense. May I ask if you added the VG_LITE_GPU_IDLE_STATE enumeration in vglite_run yourself? Or is it already implemented in the SDK?

Copy link
Contributor

@nicusorcitu nicusorcitu Jan 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We asked for the API implementation to the VSI. You should see it as well in the next driver releases. I use 4.0.58 version.

vg_lite_get_parameter(VG_LITE_GPU_IDLE_STATE, 1, (vg_lite_pointer)&gpu_idle);

Copy link
Collaborator Author

@FASTSHIFT FASTSHIFT Jan 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! I'll switch to the same plan..

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also let the VGLite thread implementation with the possibility to run it single thread (v8 like).


And check all the places where the #if LV_USE_OS is used in this file. Simply setting the LV_USE_OS to LV_OS_NONE will make it run single thread (as you have it now).

lion2tomato pushed a commit to lion2tomato/lvgl that referenced this pull request Mar 19, 2024
VELAPLATFO-22913

Signed-off-by: pengyiqiang <pengyiqiang@xiaomi.com>
Signed-off-by: FASTSHIFT <vifextech@foxmail.com>
Co-authored-by: pengyiqiang <pengyiqiang@xiaomi.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants