HTTPS clone URL
Subversion checkout URL
master release-0_0_11 release-0_0_12 release-0_0_13 release-0_1_final release-0_1 release-0_2 release-0_3 release-0_4 release-0_5rc1 release-0_5 release-0_6rc1 release-0_6_win32 release-0_6 release-0_7rc1 release-0_8_0 release-0_8_1 release-0_9_0 release-0_9_1 release-0_9_2rc1 release-0_10_0 release-0_11_0 release-1_0_0 release-1_0_1 release-1_0_2 release-1_1_0 release-1_2_0 release-1_3_0 release-1_4_0 release-1_5_0 release-1_6_0 release-1_7 release-1_7_0 release-1_7_1 release-1_7_2 release-1_7_3 release-1_7_4 release-1_8 release-2_0 release-2_4 release-2_6 release-2_8 release-2_10 release-2_12
Nothing to show
Nothing to show
Fetching latest commit...
Cannot retrieve the latest commit at this time.
|Failed to load latest commit information.|
Time measurement ------------------ For usable benchmarking we need a precise, reliable and repeatable time measurement. It seems to be easy? We are on computer right? Unfortunetly I found it almost impossible! Hopefully I am totaly wrong. Several "kind of time" exist: real (wall clock), user CPU time, system CPU time. What I believe we have to use is task user + system CPU time. There are varions commands/functions which can be used to measure time, e.g.: 1) time command (real, user, sys time): time COMMAND 2) getrusage function (user, sys time): #include <sys/time.h> #include <sys/resource.h> struct rusage usage; getrusage( RUSAGE_SELF, &start); // measured code getrusage( RUSAGE_SELF, &end); double user_elapsed = end.ru_utime.tv_sec + end.ru_utime.tv_usec/1000000. - start.ru_utime.tv_sec - start.ru_utime.tv_usec/1000000.; double sys_elapsed = end.ru_stime.tv_sec + end.rs_utime.tv_usec/1000000. - start.ru_stime.tv_sec -start.ru_stime.tv_usec/1000000.; 3) times function (user, sys time): #include <sys/times.h> struct tms start, end; times(&start); // measured code times(&end); double user_elapsed = ((double)(end.tms_utime - start.tms_utime))/sysconf(_SC_CLK_TCK); double sys_elapsed = ((double)(end.tms_stime - start.tms_stime))/sysconf(_SC_CLK_TCK); 4) clock function (user + sys time ?): #include <time.h> clock_t start = clock(); // measured code clock_t end = clock(); double elapsed = ((double) (end - start)) / CLOCKS_PER_SEC; 5) clock_gettime functions (user + sys time ?): #include <time.h> struct timespec start, end; clock_gettime( CLOCK_PROCESS_CPUTIME_ID, &start ); // measured code clock_gettime( CLOCK_PROCESS_CPUTIME_ID, &end ); double elapsed = end.tv_sec + end.tv_nsec / 1000000000. - start.tv_sec - start.tv_nsec / 1000000000.; 6) QTime class (QElapsedTimer Qt >= 4.7) (real time): QTime time; time.start(); // measured code double elapsed = time.elapsed() / 1000.; I tried all of them and all are giving the same mostly useless values. If the same piece of code is measured more times, the results differ. It seems that all those functions read user time from the same place, kernel task_struct.utime. The problem is, if I understand correctly, how the task_struct.utime is updated. Whenever timer interrupt comes (every 1/HZ) to scheduler, it calls update_process_times->account_process_tick->irqtime_account_process_tick->account_user_time and increases task_struct.utime by cputime_one_jiffy (1/HZ). See kernel/sched.c and kernel/timer.c. It means, that utime is not increased by pure time when the the task code is running, but by fixed interval which includes some task switching overhead??? In a simple test, I could observe 30% increas of utime if another application was running at the same time. Unfortunately I don't see anything better than user time + sys time, running test with highest priority and avoiding other use of computer when tests are running, e.g.: sudo chrt -f 99 COMMAND To be sure that the measured values are correct, it is necessary to run more cycles and check some standard deviation or so. There is also high level benchmark support available in QTestLib, it is possible to use QBENCHMARK macro + QTEST_MAIN to create easily test executable. Such test may be run with various options, some notes on modes/options: -tickcounter - reads rdtsc register (on Linux), thus it counts real time, result is not constant -callgrind - reruns the command with callgrind, number of 'instr. loads' is constant for constant number of iterations, number of instructions per iterarion decreases with number of iterarions (for small numbers of iterations) for a simple function the number of instractions of the second iteration may be 40% of the first one - cache, prediction??? Callgrind is realy very slow. I am not sure what 'instr. loads' exactly means and if it can be somehow converted to time, but I don't believe so. AFAIK each instruction need a different number of cycles and it may be different even for the same instruction because of CPU cache, then there are instruction pipelines etc. Build options ------------- CMAKE_BUILD_TYPE should be RelWithDebInfo so that it compiles with optimisations but also adds debug information so that it can be profiled with callgrind and visualized with kcachegrind.