perf: Add OpenMP support for widget rendering (#118) #189

hyugabokko · 2019-12-11T13:47:02Z

Purpose

Add OpenMP support for widget rendering (#118) .

Changes

Modified src/Makefile.am and added compiler option.
Modified src/display.c and added preprocessor directives for OpenMP.
To check whether multithreading is possible, the string is output to standard output. However, I will remove it before merging.

Screenshots

When OpenMP is not supported(change before)
When OpenMP is supported(change after)

I printed thread_num to confirm that multiple threads were used.
If there are no problems with the review, I will remove the thread_num output.

To reproduce

$ make
$ make test

In make test, there is only one rectangle in rects, the execution time was slightly longer due to overhead.
I measured it with time make test.

IssueHunt Summary

Referenced issues

This pull request has been submitted to:

#118: Add OpenMP support for widget rendering

IssueHunt has been backed by the following sponsors. Become a sponsor

- Modified src/Makefile.am and added compiler option. - Modified src/display.c and added preprocessor directives for OpenMP. - To check whether multithreading is possible, the string is output to standard output. However, I will remove it before merging.

lc-soft

The openMP feature should give the user the option to turn it on or off, for example:

# OpenMP is enabled by default
./configure

# Disable OpenMP
./configure --with-openmp=no

lc-soft · 2019-12-11T14:10:05Z

src/display.c

@@ -30,6 +30,7 @@
 #include <time.h>
 #include <stdio.h>
 #include <stdlib.h>
+#include <omp.h>


I think we need a test program to compare the performance before and after OpenMP is enabled.

I came up with a suitable test:

Set window size to 1920x1080

Create 21384 widgets, they're all 10x10 in size

Render 1000 frames

For each frame, set the background color of each widget randomly

This is an optional task, if you are interested in it, you can try do it.

lc-soft · 2019-12-11T14:17:30Z

src/Makefile.am

@@ -1,5 +1,5 @@
 AUTOMAKE_OPTIONS=foreign
-AM_CFLAGS = -I$(abs_top_srcdir)/include $(CODE_COVERAGE_CFLAGS)
+AM_CFLAGS = -I$(abs_top_srcdir)/include $(CODE_COVERAGE_CFLAGS) -fopenmp


I think it should be an option for the configure script, you can refer to this:

LCUI/configure.ac

Lines 62 to 75 in 460354c

# pthread

want_thread=no

thread_name=pthread

AC_ARG_WITH(pthread, AC_HELP_STRING([--with-pthread], [use pthread (default)]))

if test "x$with_pthread" != "xno"; then

AX_PTHREAD([

want_thread=yes

CC="$PTHREAD_CC"

CFLAGS="$CFLAGS $PTHREAD_CFLAGS"

PACKAGE_LIBS="$PACKAGE_LIBS $PTHREAD_LIBS"

AC_DEFINE_UNQUOTED([LCUI_THREAD_PTHREAD], 1, [Define to 1 if you are using pthread support.])

], [AC_MSG_ERROR([The support could not be configured for the POSIX thread programming interface.])])

fi

hyugabokko · 2019-12-11T16:04:48Z

Thank you for your review and suggestions for solutions!
I am a beginner in OSS development, so it would be very helpful if you could make such a review!

Enabling or disabling OpenMP with the configure option
I have an outlook on modifying. Please wait for a little while longer.
Compare the performance before and after OpenMP is enabled
I think it will take quite a while to implement.
It would be helpful if you submit a new issue so that anyone other than me could implement it.

lc-soft · 2019-12-12T02:45:43Z

I think it will take quite a while to implement.
It would be helpful if you submit a new issue so that anyone other than me could implement it.

@d4yvector This task is simple, I'll take the time to do it.

lc-soft · 2019-12-12T07:55:00Z

I pushed a test program into your branch, you can run it to check rendering performance.

hyugabokko@d42a4e8

lc-soft · 2019-12-12T07:59:57Z

test/test_render.c

+	Widget_Append(root, box);
+	Widget_Append(root, status);
+
+#ifdef WITH_WINDOW


If you want to experience the actual rendering effect, you can define the WITH_WINDOW macro

lc-soft · 2019-12-13T08:36:51Z

The current test program is not suitable for testing OpenMP performance, when there is only one render area of 1920x1080 size, only one thread is working.

I 'll make some changes to split it into several small areas, for example, split 1920x1080 area into four areas: (0, 0, 960, 540), (960, 0, 960, 540), (0, 540, 960, 540), (960, 540, 960, 540), and assign them to each thread for rendering.

hyugabokko · 2019-12-13T09:29:19Z

Hi, @lc-soft

The openMP feature should give the user the option to turn it on or off, for example:
# OpenMP is enabled by default
./configure

# Disable OpenMP
./configure --with-openmp=no

I modified configure.ac to add OpenMP configure option (b82315b).

I pushed a test program into your branch, you can run it to check rendering performance.

d4yvector@d42a4e8

Thank you for your test program.

I 'll make some changes to split it into several small areas, for example, split 1920x1080 area into four areas: (0, 0, 960, 540), (960, 0, 960, 540), (0, 540, 960, 540), (960, 540, 960, 540), and assign them to each thread for rendering.

I will wait for your changes.

lc-soft · 2019-12-13T10:51:33Z

src/display.c

 		for (LinkedList_Each(rn, &rects)) {
+#ifdef USE_OPENMP
+#pragma omp task firstprivate(rn)
+			printf("thread_num: %d\n", omp_get_thread_num());


printf() takes a long time to call, you should remove it

I removed printf() in d5cf404.

lc-soft · 2019-12-13T10:53:10Z

src/display.c

@@ -27,9 +27,14 @@
 * POSSIBILITY OF SUCH DAMAGE.
 */

+#include <LCUI/config.h>


config.h file is for internal use, you should use #include "config.h"

I changed include path of config.h in d5cf404.

lc-soft

Compiling with vs2017 will output an error:

error C3001: 'taskwait' : expected an OpenMP directive name

Maybe you should refer to this: https://stackoverflow.com/questions/23545930/openmp-tasks-in-visual-studio

lc-soft · 2019-12-14T14:33:32Z

src/display.c

+#endif
+	/* Repaint dirty rectangles of surface */
+	for (count = 0, i = 0; i < 4; ++i) {
+		count += LCUIDisplay_RenderSurfaceEx(record, &rects_group[i]);


@d4yvector the changes have been pushed, check here.

lc-soft · 2019-12-16T04:49:29Z

First performance test results

Run the test three times

OpenMP disabled:

rendered 120 frames in 17.10s, rendering speed is 7.02 fps
rendered 120 frames in 17.73s, rendering speed is 6.77 fps
rendered 120 frames in 17.20s, rendering speed is 6.98 fps

OpenMP enabled:

rendered 120 frames in 23.50s, rendering speed is 5.11 fps
rendered 120 frames in 24.32s, rendering speed is 4.93 fps
rendered 120 frames in 25.01s, rendering speed is 4.80 fps

The effect is not good

hyugabokko · 2019-12-17T07:13:50Z

I have a question about a for loops to be parallelized.

LCUI/src/display.c

Lines 186 to 198 in 05fff83

    
           	LinkedList_Init(&rects_group[0]); 
        
           	LinkedList_Init(&rects_group[1]); 
        
           	LinkedList_Init(&rects_group[2]); 
        
           	LinkedList_Init(&rects_group[3]); 
        
           	SurfaceRecord_DumpRects(record, rects_group); 
        
           #ifdef USE_OPENMP 
        
           #pragma omp parallel 
        
           #pragma omp single 
        
           #endif 
        
           	/* Repaint dirty rectangles of surface */ 
        
           	for (count = 0, i = 0; i < 4; ++i) { 
        
           		count += LCUIDisplay_RenderSurfaceEx(record, &rects_group[i]); 
        
           	}

I think the characteristics of loops that are effective for parallelization are:

case1. Huge number of loops
case2. Although the number of loops is small, one loop takes a lot of time

However, the number of loops is 4 in this loop, so case1 is not satisfied.
Also, it does not take much time for one loop because rects_group[x].length is all 1, so case2 is not satisfied.

Similarly for the loop pointed out in #118, I think the performance test (test/test_render.c) can't measure the effect of parallelizing these loops.

lc-soft · 2019-12-17T08:22:56Z

test/test_render.c

+	t = clock();
+	for (i = 0; i < 120; ++i) {
+		UpdateFrame(box);
+		LCUIWidget_Update();


@d4yvector

I found that the performance of LCUiWidget_Update() is very low.

... UpdateFrame(box); LCUIWidget_Update(); + continue; LCUIDisplay_Update(); ...

After adding the continue statement, the program took 11.48 seconds to run.

rendered 120 frames in 11.48s, rendering speed is 10.45 fps

Please wait for me to optimize it.

lc-soft · 2019-12-18T02:14:26Z

Performance test results

I updated the test code:

Number of widgets reduced to 3600
Resize window to 1600x900

OpenMP disabled:

running rendering performance test
rendered 120 frames in 6.47s, rendering speed is 18.55 fps

OpenMP enabled:

running rendering performance test
rendered 120 frames in 6.48s, rendering speed is 18.52 fps

It looks the same, maybe there are still problems to be solved.

…oop Form" defined in OpenMP

lc-soft · 2019-12-18T06:20:03Z

src/display.c

+
+	rectArray = (LCUI_Rect **)malloc(sizeof(LCUI_Rect*) * rects.length);
+	i = 0;
+	for (LinkedList_Each(node, &rects)) {


Earlier I was curious how OpenMP would traverse the nodes in the linked list. Do your changes mean that the array is more suitable for OpenMP?

Yes, it is.
LinkedList_Each() macro's loop form does not follow "Canonical Loop Form" defined in OpenMP.
I rewritten the loop form to "Canonical Loop Form" by converting the rects linked list to an array.
This is a change to address the error in the previous review (#189 (review)) and is not related to performance.
Would an you try compiling with vs2017?

I added some debug message output, OpenMP does work.

rects: 4 rect: (0,0,800,450), thread: 0 rect: (800,0,800,450), thread: 1 rect: (0,450,800,450), thread: 2 rect: (800,450,800,450), thread: 3 count: 3609 rects: 4 rect: (800,0,800,450), thread: 1 rect: (0,0,800,450), thread: 0 rect: (0,450,800,450), thread: 2 rect: (800,450,800,450), thread: 3 count: 3609 rects: 4 rect: (0,0,800,450), thread: 0 rect: (800,450,800,450), thread: 3 rect: (800,0,800,450), thread: 1 rect: (0,450,800,450), thread: 2 count: 3609 rects: 4 rect: (0,0,800,450), thread: 0 rect: (800,450,800,450), thread: 3 rect: (800,0,800,450), thread: 1 rect: (0,450,800,450), thread: 2 count: 3609

Maybe some code in the renderer is affecting performance, I will debug the performance of the renderer.

hyugabokko · 2019-12-18T07:31:09Z

To check if the parallelized loop is working as expected, I displayed the thread number, loop number, and time taken for one loop in each thread.

$ ./test/test_render 
running rendering performance test
thread_num: 0, i: 0, time: 10
thread_num: 2, i: 2, time: 18
thread_num: 3, i: 3, time: 26
thread_num: 1, i: 1, time: 34

thread_num: 0, i: 0, time: 8
thread_num: 2, i: 2, time: 19
thread_num: 3, i: 3, time: 27
thread_num: 1, i: 1, time: 35

thread_num: 2, i: 2, time: 5
thread_num: 1, i: 1, time: 24
thread_num: 3, i: 3, time: 31
thread_num: 0, i: 0, time: 36

I think parallelized loop is working as expected from the thread number and loop number.

However, as the loop number increases, the time required for one loop increases.
The variable for time measurement is not shared by each thread, so this is not the accumulated time in all loops but the time in one loop.

Therefore, execution time may be uneven depending on the drawing area each thread is responsible for.
So, the same display was made with parallelization disabled.

$ ./test/test_render 
running rendering performance test
thread_num: 0, i: 0, time: 9
thread_num: 0, i: 1, time: 8
thread_num: 0, i: 2, time: 7
thread_num: 0, i: 3, time: 6

thread_num: 0, i: 0, time: 8
thread_num: 0, i: 1, time: 8
thread_num: 0, i: 2, time: 8
thread_num: 0, i: 3, time: 6

thread_num: 0, i: 0, time: 7
thread_num: 0, i: 1, time: 9
thread_num: 0, i: 2, time: 5
thread_num: 0, i: 3, time: 6

However, it was found that even if the loop number increased, the time taken for one loop was not biased.
One possible cause of this result is waiting in the next thread until the previous thread terminates.
Do you have any idea what causes the waiting time?

lc-soft · 2019-12-18T08:33:59Z

@d4yvector Does malloc() and free() make threads wait?

lc-soft · 2019-12-18T13:01:53Z

LCUIWidget_Update() has low performance.

lc-soft · 2019-12-18T13:02:50Z

src/display.c

+static size_t LCUIDisplay_RenderSurface(SurfaceRecord record)
+{
+	size_t count = 0;
+	int i;


Move int i to top, and change to size_t i = 0

lc-soft · 2019-12-18T13:04:05Z

src/display.c

+	LinkedList rects;
+	LinkedListNode *node;
+	LCUI_BOOL can_render;
+	LCUI_Rect **rectArray;


Rename rectArray to rect_array

lc-soft · 2019-12-18T13:04:46Z

src/display.c

+	LCUI_PaintContext paint;
+	LCUI_SysEventRec ev;
+	LinkedList rects;
+	LinkedListNode *node;


Move LinkedList to bottom, like this:

LCUI_BOOL can_render; LCUI_Rect **rectArray; LinkedList rects; LinkedListNode *node;

lc-soft · 2019-12-18T13:11:59Z

src/display.c

+	SurfaceRecord_DumpRects(record, &rects);
+
+	rectArray = (LCUI_Rect **)malloc(sizeof(LCUI_Rect*) * rects.length);
+	i = 0;


Remove this line, because the initial value of the `i' variable is already 0.

- i = 0;

test/test_render.c

hyugabokko · 2019-12-20T07:52:23Z

Although it is different from the performance improvement of LCUIWidget_Update(),
I found that the cause of waiting for other threads in OpenMP parallelization is in Surface_BeginPaint().
I guess that the actual function executed is X11Surface_BeginPaint() in the case of Ubuntu.
I think that the cause of waiting other threads is that LCUIMutex_Lock() in X11Surface_BeginPaint() locks the surface.

If this problem is not solved, it will not be possible to improve performance by parallelizing the requested in #118.
If you split a surface into multiples, such as rects, you might be able to assign separate surfaces to different threads to avoid locking (though I don't know how to achieve this).
Do you have any other ideas?

* perf: Add OpenMP support for widget rendering (#118) - Modified src/Makefile.am and added compiler option. - Modified src/display.c and added preprocessor directives for OpenMP. - To check whether multithreading is possible, the string is output to standard output. However, I will remove it before merging. * test: add rendering performance test * build: add OpenMP configure option * refactor: change include path of `config.h` and remove printf() * perf: split the dirty rectangles into four parts for rendering * fix(linux): missing surface size access method * refactor(display): update dirty rectangle collection method * test: update test_render.c * build: add vs project file for test render * fix(display): Convert `rects` list to array and follow a "Canonical Loop Form" defined in OpenMP * refactor: Change where variable i is initialized * fix: Widget_GenerateHash() not work * test: improved widget update performance Co-authored-by: Liu <lc-soft@live.cn>

lc-soft requested changes Dec 11, 2019

View reviewed changes

test: add rendering performance test

d42a4e8

lc-soft reviewed Dec 12, 2019

View reviewed changes

build: add OpenMP configure option

b82315b

lc-soft requested changes Dec 13, 2019

View reviewed changes

refactor: change include path of config.h and remove printf()

d5cf404

hyugabokko marked this pull request as ready for review December 13, 2019 11:29

lc-soft requested changes Dec 14, 2019

View reviewed changes

perf: split the dirty rectangles into four parts for rendering

05fff83

lc-soft reviewed Dec 14, 2019

View reviewed changes

fix(linux): missing surface size access method

1703ad5

lc-soft reviewed Dec 17, 2019

View reviewed changes

lc-soft added 3 commits December 17, 2019 23:22

refactor(display): update dirty rectangle collection method

3da7d38

test: update test_render.c

e4fad38

build: add vs project file for test render

f73b79f

hyugabokko added 2 commits December 18, 2019 14:57

fix(display): Convert rects list to array and follow a "Canonical L…

6c2d002

…oop Form" defined in OpenMP

refactor: Change where variable i is initialized

2269867

lc-soft reviewed Dec 18, 2019

View reviewed changes

lc-soft requested changes Dec 18, 2019

View reviewed changes

lc-soft added 2 commits December 18, 2019 21:55

fix: Widget_GenerateHash() not work

3b92c3f

test: improved widget update performance

4591ce1

lc-soft reviewed Dec 18, 2019

View reviewed changes

test/test_render.c Show resolved Hide resolved

lc-soft merged commit f08bb7e into lc-soft:develop Dec 20, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf: Add OpenMP support for widget rendering (#118) #189

perf: Add OpenMP support for widget rendering (#118) #189

hyugabokko commented Dec 11, 2019 •

edited by issuehunt-oss bot

Loading

lc-soft left a comment

lc-soft Dec 11, 2019

lc-soft Dec 11, 2019

lc-soft Dec 11, 2019

hyugabokko commented Dec 11, 2019

lc-soft commented Dec 12, 2019

lc-soft commented Dec 12, 2019

lc-soft Dec 12, 2019

lc-soft commented Dec 13, 2019 •

edited

Loading

hyugabokko commented Dec 13, 2019 •

edited

Loading

lc-soft Dec 13, 2019

hyugabokko Dec 13, 2019

lc-soft Dec 13, 2019 •

edited

Loading

hyugabokko Dec 13, 2019

lc-soft left a comment •

edited

Loading

lc-soft Dec 14, 2019

lc-soft commented Dec 16, 2019

hyugabokko commented Dec 17, 2019

lc-soft Dec 17, 2019

lc-soft commented Dec 18, 2019 •

edited

Loading

lc-soft Dec 18, 2019

hyugabokko Dec 18, 2019

lc-soft Dec 18, 2019

hyugabokko Dec 18, 2019

hyugabokko commented Dec 18, 2019

lc-soft commented Dec 18, 2019

lc-soft commented Dec 18, 2019

lc-soft Dec 18, 2019 •

edited

Loading

lc-soft Dec 18, 2019

lc-soft Dec 18, 2019 •

edited

Loading

lc-soft Dec 18, 2019

hyugabokko commented Dec 20, 2019

	# pthread
	want_thread=no
	thread_name=pthread
	AC_ARG_WITH(pthread, AC_HELP_STRING([--with-pthread], [use pthread (default)]))
	if test "x$with_pthread" != "xno"; then
	AX_PTHREAD([
	want_thread=yes
	CC="$PTHREAD_CC"
	CFLAGS="$CFLAGS $PTHREAD_CFLAGS"
	PACKAGE_LIBS="$PACKAGE_LIBS $PTHREAD_LIBS"
	AC_DEFINE_UNQUOTED([LCUI_THREAD_PTHREAD], 1, [Define to 1 if you are using pthread support.])
	], [AC_MSG_ERROR([The support could not be configured for the POSIX thread programming interface.])])
	fi

perf: Add OpenMP support for widget rendering (#118) #189

perf: Add OpenMP support for widget rendering (#118) #189

Conversation

hyugabokko commented Dec 11, 2019 • edited by issuehunt-oss bot Loading

Purpose

Changes

Screenshots

To reproduce

Referenced issues

lc-soft left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

hyugabokko commented Dec 11, 2019

lc-soft commented Dec 12, 2019

lc-soft commented Dec 12, 2019

Choose a reason for hiding this comment

lc-soft commented Dec 13, 2019 • edited Loading

hyugabokko commented Dec 13, 2019 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

lc-soft Dec 13, 2019 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

lc-soft left a comment • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

lc-soft commented Dec 16, 2019

First performance test results

hyugabokko commented Dec 17, 2019

Choose a reason for hiding this comment

lc-soft commented Dec 18, 2019 • edited Loading

Performance test results

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

hyugabokko commented Dec 18, 2019

lc-soft commented Dec 18, 2019

lc-soft commented Dec 18, 2019

lc-soft Dec 18, 2019 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

lc-soft Dec 18, 2019 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

hyugabokko commented Dec 20, 2019

hyugabokko commented Dec 11, 2019 •

edited by issuehunt-oss bot

Loading

lc-soft commented Dec 13, 2019 •

edited

Loading

hyugabokko commented Dec 13, 2019 •

edited

Loading

lc-soft Dec 13, 2019 •

edited

Loading

lc-soft left a comment •

edited

Loading

lc-soft commented Dec 18, 2019 •

edited

Loading

lc-soft Dec 18, 2019 •

edited

Loading

lc-soft Dec 18, 2019 •

edited

Loading