[Core] Optimize typedefs.h functions #91664

Nazarwadim · 2024-05-07T15:26:27Z

Optimized typedefs functions. Their performance improved 1.5 - 8 times.

AThousandShips · 2024-05-07T16:52:02Z

Can you please provide some details on what and how this was tested, under what conditions and what parts improved how

Nazarwadim · 2024-05-07T17:01:48Z

Benchmark function I used. Compiler gcc13.2.

int arr[10000];
static void Bench(benchmark::State& state) {
  for (auto _ : state) {
	for(int i = 0; i < 10000; i++) {
		arr[i] = func(arr[i]);
	}
    
  }
}
BENCHMARK(Bench);

86x difference in performance is achieved for nearest_shift function.

For next/prev_pow2 it is 1.5 - 1.8.

AThousandShips · 2024-05-07T17:07:06Z

What is arr initialized with? You're calling the method with the array but your code doesn't show how it's initialized

Also at least some of the implementations aren't correct, see the windows run, so would be good to ensure all work correctly before benchmarking it

Also would need benchmarking on all platforms where the code has changed, so we can tell it hasn't worsened

Nazarwadim · 2024-05-07T20:00:51Z

What is arr initialized with? You're calling the method with the array but your code doesn't show how it's initialized

Oh. Made new benchmarks where I initialized rand values.
Made benchmarks on Windows, Linux and another Linux, but an old device from 2011. MSVC compiler was used on Windows.

It turns out that before next/prev_power_of_2 was 1.5 (Linux) - 3.5 (Windows) times faster. For everything else, my changes are better.
For next_power_of_2_64 1.5 times on all devices.
nearest_shift 10 times old linux, 5 times linux, 8 times Windows.
get_shift_from_power_of_2 (made random data that is a power of two) 7 times linux 10 times old, 7 times Windows.

AThousandShips · 2024-05-07T20:07:04Z

Please provide the full benchmarking code to replicate

Nazarwadim · 2024-05-07T20:30:50Z

Please provide the full benchmarking code to replicate

Edit: Oops, I made a mistake in the next_power_of_2_64 benchmark. Updated file:
benchmark_typedefs.zip

arkology · 2024-05-08T05:05:39Z

If you get different results with uninitialised values (and we don't know what values they are, depends on compiler and you code - assuming they are zeros because are global) and with randomly initialised values, 10k test pattern may be not enough to measure before-after difference for different inputs for before-after.
Could you measure difference with the same pattern of input values? And it may be useful (or not) to increase pattern size.

Nazarwadim · 2024-05-08T07:50:22Z

If you get different results with uninitialised values

It was just due to this check.

if (p_number == 0) {
	return 0;
}

And it may be useful (or not) to increase pattern size.

Increased to 1 000 000.

Benchmark next_power of 2.
Time difference before = 488768[ns]
Time difference after = 920565[ns]
Benchmark next_power_of_2_64.
Time difference before = 1489775[ns]
Time difference after = 923259[ns]
Benchmark previous_power_of_2.
Time difference before = 401549[ns]
Time difference after = 671888[ns]
Benchmark nearest_shift.
Time difference before = 6685049[ns]
Time difference after = 682140[ns]
Benchmark get_shift_from_power_of_2.
Time difference before = 21778665[ns]
Time difference after = 525608[ns]

arkology · 2024-05-08T18:16:04Z

Please provide the full benchmarking code to replicate

Edit: Oops, I made a mistake in the next_power_of_2_64 benchmark. Updated file: benchmark_typedefs.zip

Ran the provided code - I'm sure I might have done something wrong, but I got the following results:

If results are actually valid, standard deviation for next_power_of_2_64 and previous_power_of_2 is enormous, so tests might be invalid. If so, samples amount should actually be increased and the same input data should be used to test same before/after functions.

Nazarwadim · 2024-05-08T20:20:27Z

I think you had a system calls or something in the next_power_of_2_64 and previous_power_of_2 methods in some tests, which slowed down the testing. Therefore, it is necessary to use more samples. I have a const test_count variable there, you can change it by 1,000,000.

Nazarwadim · 2024-05-08T20:25:14Z

the same input data should be used to test same before/after functions

I don't think so, since these functions are independent of the data they are given.

AThousandShips · 2024-05-08T20:55:57Z

I don't think so, since these functions are independent of the data they are given.

Have you confirmed this by the compiled code and the definitions of the operations? Just because they lack explicit branch statements doesn't mean they're data independent

arkology · 2024-05-08T20:56:50Z

the same input data should be used to test same before/after functions

I don't think so, since these functions are independent of the data they are given.

As I know, in general using different data for comparison tests is bad practice (because, for example, it could be used to get fake results (of course it's not our case)). Maybe it doesn't matter in this specific situation.
I think that functions are still dependent of input data, at least "before" functions with for loops and return inside loops. So you may get different iterations amount depending on input number.

Nazarwadim requested a review from a team as a code owner May 7, 2024 15:26

AThousandShips added enhancement needs testing performance labels May 7, 2024

AThousandShips added this to the 4.x milestone May 7, 2024

akien-mga added the topic:core label May 7, 2024

Nazarwadim force-pushed the typedefs_optimize branch 2 times, most recently from 16023fa to 575aa87 Compare May 25, 2024 21:12

Optimize typedefs.h functions

2da5b91

Nazarwadim force-pushed the typedefs_optimize branch from 575aa87 to 2da5b91 Compare May 25, 2024 21:22

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Core] Optimize typedefs.h functions #91664

[Core] Optimize typedefs.h functions #91664

Nazarwadim commented May 7, 2024 •

edited

Loading

AThousandShips commented May 7, 2024

Nazarwadim commented May 7, 2024 •

edited

Loading

AThousandShips commented May 7, 2024

Nazarwadim commented May 7, 2024

AThousandShips commented May 7, 2024

Nazarwadim commented May 7, 2024 •

edited

Loading

arkology commented May 8, 2024 •

edited

Loading

Nazarwadim commented May 8, 2024 •

edited

Loading

arkology commented May 8, 2024

Nazarwadim commented May 8, 2024

Nazarwadim commented May 8, 2024

AThousandShips commented May 8, 2024

arkology commented May 8, 2024 •

edited

Loading

[Core] Optimize typedefs.h functions #91664

Are you sure you want to change the base?

[Core] Optimize typedefs.h functions #91664

Conversation

Nazarwadim commented May 7, 2024 • edited Loading

AThousandShips commented May 7, 2024

Nazarwadim commented May 7, 2024 • edited Loading

AThousandShips commented May 7, 2024

Nazarwadim commented May 7, 2024

AThousandShips commented May 7, 2024

Nazarwadim commented May 7, 2024 • edited Loading

arkology commented May 8, 2024 • edited Loading

Nazarwadim commented May 8, 2024 • edited Loading

arkology commented May 8, 2024

Nazarwadim commented May 8, 2024

Nazarwadim commented May 8, 2024

AThousandShips commented May 8, 2024

arkology commented May 8, 2024 • edited Loading

Nazarwadim commented May 7, 2024 •

edited

Loading

Nazarwadim commented May 7, 2024 •

edited

Loading

Nazarwadim commented May 7, 2024 •

edited

Loading

arkology commented May 8, 2024 •

edited

Loading

Nazarwadim commented May 8, 2024 •

edited

Loading

arkology commented May 8, 2024 •

edited

Loading