perf: Add OpenMP support for widget rendering (#118) #189
- Modified src/Makefile.am and added compiler option.
- Modified src/display.c and added preprocessor directives for OpenMP.
- To check whether multithreading is possible, the string is output to standard output. However, I will remove it before merging.
The OpenMP feature should be optional, so that the user can turn it on or off, for example:
# OpenMP is enabled by default
./configure
# Disable OpenMP
./configure --with-openmp=no
@@ -30,6 +30,7 @@
 #include <time.h>
 #include <stdio.h>
 #include <stdlib.h>
+#include <omp.h>
I think we need a test program to compare the performance before and after OpenMP is enabled.
I came up with a suitable test:
- Set window size to 1920x1080
- Create 21384 widgets, they're all 10x10 in size
- Render 1000 frames
- For each frame, set the background color of each widget randomly
This is an optional task; if you are interested, you can try it.
src/Makefile.am
Outdated
@@ -1,5 +1,5 @@
 AUTOMAKE_OPTIONS=foreign
-AM_CFLAGS = -I$(abs_top_srcdir)/include $(CODE_COVERAGE_CFLAGS)
+AM_CFLAGS = -I$(abs_top_srcdir)/include $(CODE_COVERAGE_CFLAGS) -fopenmp
I think it should be an option for the configure script. You can refer to this:
Lines 62 to 75 in 460354c
# pthread
want_thread=no
thread_name=pthread
AC_ARG_WITH(pthread, AC_HELP_STRING([--with-pthread], [use pthread (default)]))
if test "x$with_pthread" != "xno"; then
  AX_PTHREAD([
    want_thread=yes
    CC="$PTHREAD_CC"
    CFLAGS="$CFLAGS $PTHREAD_CFLAGS"
    PACKAGE_LIBS="$PACKAGE_LIBS $PTHREAD_LIBS"
    AC_DEFINE_UNQUOTED([LCUI_THREAD_PTHREAD], 1, [Define to 1 if you are using pthread support.])
  ], [AC_MSG_ERROR([The support could not be configured for the POSIX thread programming interface.])])
fi
Thank you for your review and suggestions for solutions!
@d4yvector This task is simple, I'll take the time to do it.
I pushed a test program into your branch, you can run it to check rendering performance.
test/test_render.c
Outdated
Widget_Append(root, box);
Widget_Append(root, status);

#ifdef WITH_WINDOW
If you want to see the actual rendering effect, you can define the WITH_WINDOW macro.
The current test program is not suitable for testing OpenMP performance: when there is only one render area of 1920x1080, only one thread is working. I'll make some changes to split it into several small areas, for example, splitting the 1920x1080 area into four areas:
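The four-way split described above can be sketched in a few lines of C. This is only an illustration: `Rect` and `split_into_quadrants` are hypothetical names, not LCUI's `LCUI_Rect` or any function in this PR, and the 2x2 layout is an assumption based on the comment.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a rectangle type; not LCUI's LCUI_Rect. */
typedef struct {
	int x, y, width, height;
} Rect;

/* Split `area` into a 2x2 grid, written to out[0..3] in row-major order.
 * The right column and bottom row absorb the remainder when the width
 * or height is odd, so the four parts always cover the whole area. */
static void split_into_quadrants(Rect area, Rect out[4])
{
	int half_w = area.width / 2;
	int half_h = area.height / 2;
	int i;

	assert(out != NULL);
	for (i = 0; i < 4; ++i) {
		int col = i % 2;
		int row = i / 2;

		out[i].x = area.x + col * half_w;
		out[i].y = area.y + row * half_h;
		out[i].width = col ? area.width - half_w : half_w;
		out[i].height = row ? area.height - half_h : half_h;
	}
}
```

For a 1920x1080 area this yields four 960x540 rectangles, each of which could then be handed to a separate thread.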
Hi, @lc-soft
I modified
Thank you for your test program.
I will wait for your changes.
src/display.c
Outdated
for (LinkedList_Each(rn, &rects)) {
#ifdef USE_OPENMP
#pragma omp task firstprivate(rn)
printf("thread_num: %d\n", omp_get_thread_num());
printf() takes a long time to call; you should remove it.
I removed printf() in d5cf404.
src/display.c
Outdated
@@ -27,9 +27,14 @@
 * POSSIBILITY OF SUCH DAMAGE.
 */

#include <LCUI/config.h>
The config.h file is for internal use; you should use #include "config.h" instead.
I changed the include path of config.h in d5cf404.
Compiling with vs2017 will output an error:
error C3001: 'taskwait' : expected an OpenMP directive name
Maybe you should refer to this: https://stackoverflow.com/questions/23545930/openmp-tasks-in-visual-studio
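For context on the error above: the `task` and `taskwait` directives were introduced in OpenMP 3.0, while the OpenMP support in Visual Studio 2017 targets OpenMP 2.0. One possible way to keep both compilers working is to check the `_OPENMP` version macro and fall back to a plain loop. The sketch below is an assumption about how that could look, not the fix adopted in this PR; `HAVE_OMP_TASKS` and `fill_squares` are hypothetical names.

```c
#include <assert.h>

/* _OPENMP expands to the yyyymm date of the supported specification:
 * 200203 is OpenMP 2.0 for C/C++, 200805 is OpenMP 3.0 (which added tasks). */
#if defined(_OPENMP) && _OPENMP >= 200805
#define HAVE_OMP_TASKS 1
#else
#define HAVE_OMP_TASKS 0
#endif

/* Hypothetical example: writes i * i into out[i] for 0 <= i < n.
 * Uses one task per element when tasks are available, and a plain
 * sequential loop otherwise (no OpenMP, or OpenMP 2.0 only). */
static void fill_squares(int *out, int n)
{
	int i;

	assert(n >= 0);
#if HAVE_OMP_TASKS
#pragma omp parallel
#pragma omp single
	{
		for (i = 0; i < n; ++i) {
			/* firstprivate gives each task its own copy of i */
#pragma omp task firstprivate(i)
			out[i] = i * i;
		}
#pragma omp taskwait
	}
#else
	for (i = 0; i < n; ++i) {
		out[i] = i * i;
	}
#endif
}
```

When the file is compiled without an OpenMP flag, `_OPENMP` is undefined and the sequential branch is used, so the result is the same either way.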
src/display.c
Outdated
#endif
/* Repaint dirty rectangles of surface */
for (count = 0, i = 0; i < 4; ++i) {
count += LCUIDisplay_RenderSurfaceEx(record, &rects_group[i]);
@d4yvector the changes have been pushed, check here.
First performance test results
Run the test three times.
OpenMP disabled:
OpenMP enabled:
The effect is not good.
I have a question about the for loop to be parallelized. Lines 186 to 198 in 05fff83
I think the characteristics of loops that are effective for parallelization are:
However, the number of loops is 4 in this loop, so case1 is not satisfied. Similarly for the loop pointed out in #118, I think the performance test ( |
t = clock();
for (i = 0; i < 120; ++i) {
UpdateFrame(box);
LCUIWidget_Update();
@d4yvector
I found that the performance of LCUIWidget_Update() is very low.
...
UpdateFrame(box);
LCUIWidget_Update();
+ continue;
LCUIDisplay_Update();
...
After adding the continue statement, the program took 11.48 seconds to run.
rendered 120 frames in 11.48s, rendering speed is 10.45 fps
Please wait for me to optimize it.
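The figure quoted above follows from dividing the frame count by the elapsed time: 120 / 11.48 ≈ 10.45 fps. A minimal sketch of that calculation, assuming the test samples `clock()` before and after the loop as the diff suggests; `frames_per_second` and `elapsed_seconds` are hypothetical helper names, not functions from the test program.

```c
#include <assert.h>
#include <time.h>

/* Elapsed seconds between two clock() samples. */
static double elapsed_seconds(clock_t start, clock_t end)
{
	return (double)(end - start) / CLOCKS_PER_SEC;
}

/* Frames per second for `frames` frames rendered in `seconds` seconds. */
static double frames_per_second(int frames, double seconds)
{
	assert(seconds > 0.0);
	return frames / seconds;
}
```

One caveat that may be worth checking when comparing OpenMP builds: on POSIX systems `clock()` reports processor time for the whole process, so work spread across several threads does not necessarily show up as a smaller number. A wall-clock timer may give a clearer picture.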
Performance test results
I updated the test code:
OpenMP disabled:
OpenMP enabled:
It looks the same; maybe there are still problems to be solved.
rectArray = (LCUI_Rect **)malloc(sizeof(LCUI_Rect*) * rects.length);
i = 0;
for (LinkedList_Each(node, &rects)) {
Earlier I was curious how OpenMP would traverse the nodes in the linked list. Do your changes mean that the array is more suitable for OpenMP?
Yes, it is. The loop form of the LinkedList_Each() macro does not follow the "Canonical Loop Form" defined in OpenMP.
I rewrote the loop into the "Canonical Loop Form" by converting the rects linked list to an array.
This change addresses the error in the previous review (#189 (review)) and is not related to performance.
Could you try compiling with vs2017?
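To illustrate the list-to-array conversion in isolation, here is a minimal, self-contained sketch. It does not use LCUI's `LinkedList`; `Node` and `sum_list_parallel` are hypothetical, and summing integers stands in for rendering each rectangle. The point is only that the second loop has an integer index with a fixed bound and increment, which is the shape `#pragma omp parallel for` expects.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical minimal list node; LCUI's LinkedList has its own layout. */
typedef struct Node {
	int value;
	struct Node *next;
} Node;

/* Sum the values of a list of `length` nodes.
 * Returns 0 for an empty list or if the temporary array cannot be allocated. */
static long sum_list_parallel(Node *head, int length)
{
	Node **nodes;
	Node *node;
	long total = 0;
	int i = 0;

	assert(length >= 0);
	if (length == 0) {
		return 0;
	}
	nodes = malloc(sizeof(Node *) * length);
	if (!nodes) {
		return 0;
	}
	/* Pointer-chasing loop: not in canonical form, so it stays sequential. */
	for (node = head; node && i < length; node = node->next) {
		nodes[i++] = node;
	}
	length = i;
	/* Canonical form: integer index, loop-invariant bound, unit increment. */
#pragma omp parallel for reduction(+ : total)
	for (i = 0; i < length; ++i) {
		total += nodes[i]->value;
	}
	free(nodes);
	return total;
}
```

Without an OpenMP flag the pragma is ignored and the loop simply runs sequentially, so the function returns the same result in both builds.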
I added some debug message output, OpenMP does work.
rects: 4
rect: (0,0,800,450), thread: 0
rect: (800,0,800,450), thread: 1
rect: (0,450,800,450), thread: 2
rect: (800,450,800,450), thread: 3
count: 3609
rects: 4
rect: (800,0,800,450), thread: 1
rect: (0,0,800,450), thread: 0
rect: (0,450,800,450), thread: 2
rect: (800,450,800,450), thread: 3
count: 3609
rects: 4
rect: (0,0,800,450), thread: 0
rect: (800,450,800,450), thread: 3
rect: (800,0,800,450), thread: 1
rect: (0,450,800,450), thread: 2
count: 3609
rects: 4
rect: (0,0,800,450), thread: 0
rect: (800,450,800,450), thread: 3
rect: (800,0,800,450), thread: 1
rect: (0,450,800,450), thread: 2
count: 3609
Maybe some code in the renderer is affecting performance, I will debug the performance of the renderer.
Thank you.
To check if the parallelized loop is working as expected, I displayed the thread number, loop number, and time taken for one loop in each thread.
I think the parallelized loop is working as expected, judging from the thread number and loop number. However, as the loop number increases, the time required for one loop increases. Therefore, execution time may be uneven depending on the drawing area each thread is responsible for.
However, it was found that even if the loop number increased, the time taken for one loop was not biased.
@d4yvector Do malloc() and free() make threads wait?
static size_t LCUIDisplay_RenderSurface(SurfaceRecord record)
{
size_t count = 0;
int i;
Move int i to the top, and change it to size_t i = 0
LinkedList rects;
LinkedListNode *node;
LCUI_BOOL can_render;
LCUI_Rect **rectArray;
Rename rectArray to rect_array
LCUI_PaintContext paint;
LCUI_SysEventRec ev;
LinkedList rects;
LinkedListNode *node;
Move LinkedList to the bottom, like this:
LCUI_BOOL can_render;
LCUI_Rect **rectArray;
LinkedList rects;
LinkedListNode *node;
SurfaceRecord_DumpRects(record, &rects);

rectArray = (LCUI_Rect **)malloc(sizeof(LCUI_Rect*) * rects.length);
i = 0;
Remove this line, because the initial value of the `i` variable is already 0.
- i = 0;
Although it is different from the performance improvement of If this problem is not solved, it will not be possible to improve performance by parallelizing the requested in #118. |
* perf: Add OpenMP support for widget rendering (#118)
  - Modified src/Makefile.am and added compiler option.
  - Modified src/display.c and added preprocessor directives for OpenMP.
  - To check whether multithreading is possible, the string is output to standard output. However, I will remove it before merging.
* test: add rendering performance test
* build: add OpenMP configure option
* refactor: change include path of `config.h` and remove printf()
* perf: split the dirty rectangles into four parts for rendering
* fix(linux): missing surface size access method
* refactor(display): update dirty rectangle collection method
* test: update test_render.c
* build: add vs project file for test render
* fix(display): Convert `rects` list to array and follow a "Canonical Loop Form" defined in OpenMP
* refactor: Change where variable i is initialized
* fix: Widget_GenerateHash() not work
* test: improved widget update performance

Co-authored-by: Liu <lc-soft@live.cn>
Purpose
Add OpenMP support for widget rendering (#118) .
Changes
Screenshots
When OpenMP is not supported (before the change)
When OpenMP is supported (after the change)
I printed thread_num to confirm that multiple threads were used. If there are no problems with the review, I will remove the thread_num output.
To reproduce
$ make
$ make test
In make test, there is only one rectangle in rects, so the execution time was slightly longer due to overhead. I measured it with time make test.