Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Core] Optimize typedefs.h functions #91664

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

Nazarwadim
Copy link
Contributor

@Nazarwadim Nazarwadim commented May 7, 2024

Optimized typedefs functions. Their performance improved 1.5 - 8 times.

@AThousandShips
Copy link
Member

Can you please provide some details on what and how this was tested, under what conditions and what parts improved how

@Nazarwadim
Copy link
Contributor Author

Nazarwadim commented May 7, 2024

Benchmark function I used. Compiler gcc13.2.

int arr[10000];
static void Bench(benchmark::State& state) {
  for (auto _ : state) {
	for(int i = 0; i < 10000; i++) {
		arr[i] = func(arr[i]);
	}
    
  }
}
BENCHMARK(Bench);

86x difference in performance is achieved for nearest_shift function.
image
For next/prev_pow2 it is 1.5 - 1.8.

@AThousandShips
Copy link
Member

What is arr initialized with? You're calling the method with the array but your code doesn't show how it's initialized

Also at least some of the implementations aren't correct, see the windows run, so would be good to ensure all work correctly before benchmarking it

Also would need benchmarking on all platforms where the code has changed, so we can tell it hasn't worsened

@Nazarwadim
Copy link
Contributor Author

What is arr initialized with? You're calling the method with the array but your code doesn't show how it's initialized

Oh. Made new benchmarks where I initialized rand values.
Made benchmarks on Windows, Linux and another Linux, but an old device from 2011. MSVC compiler was used on Windows.

It turns out that before next/prev_power_of_2 was 1.5 (Linux) - 3.5 (Windows) times faster. For everything else, my changes are better.
For next_power_of_2_64 1.5 times on all devices.
nearest_shift 10 times old linux, 5 times linux, 8 times Windows.
get_shift_from_power_of_2 (made random data that is a power of two) 7 times linux 10 times old, 7 times Windows.

@AThousandShips
Copy link
Member

Please provide the full benchmarking code to replicate

@Nazarwadim
Copy link
Contributor Author

Nazarwadim commented May 7, 2024

Please provide the full benchmarking code to replicate

Edit: Oops, I made a mistake in the next_power_of_2_64 benchmark. Updated file:
benchmark_typedefs.zip

@arkology
Copy link
Contributor

arkology commented May 8, 2024

If you get different results with uninitialised values (and we don't know what values they are, depends on compiler and you code - assuming they are zeros because are global) and with randomly initialised values, 10k test pattern may be not enough to measure before-after difference for different inputs for before-after.
Could you measure difference with the same pattern of input values? And it may be useful (or not) to increase pattern size.

@Nazarwadim
Copy link
Contributor Author

Nazarwadim commented May 8, 2024

If you get different results with uninitialised values

It was just due to this check.

if (p_number == 0) {
	return 0;
}

And it may be useful (or not) to increase pattern size.

Increased to 1 000 000.

Benchmark next_power of 2.
Time difference before = 488768[ns]
Time difference after = 920565[ns]
Benchmark next_power_of_2_64.
Time difference before = 1489775[ns]
Time difference after = 923259[ns]
Benchmark previous_power_of_2.
Time difference before = 401549[ns]
Time difference after = 671888[ns]
Benchmark nearest_shift.
Time difference before = 6685049[ns]
Time difference after = 682140[ns]
Benchmark get_shift_from_power_of_2.
Time difference before = 21778665[ns]
Time difference after = 525608[ns]

@arkology
Copy link
Contributor

arkology commented May 8, 2024

Please provide the full benchmarking code to replicate

Edit: Oops, I made a mistake in the next_power_of_2_64 benchmark. Updated file: benchmark_typedefs.zip

Ran the provided code - I'm sure I might have done something wrong, but I got the following results:
image

If results are actually valid, standard deviation for next_power_of_2_64 and previous_power_of_2 is enormous, so tests might be invalid. If so, samples amount should actually be increased and the same input data should be used to test same before/after functions.

@Nazarwadim
Copy link
Contributor Author

I think you had a system calls or something in the next_power_of_2_64 and previous_power_of_2 methods in some tests, which slowed down the testing. Therefore, it is necessary to use more samples. I have a const test_count variable there, you can change it by 1,000,000.

@Nazarwadim
Copy link
Contributor Author

the same input data should be used to test same before/after functions

I don't think so, since these functions are independent of the data they are given.

@AThousandShips
Copy link
Member

I don't think so, since these functions are independent of the data they are given.

Have you confirmed this by the compiled code and the definitions of the operations? Just because they lack explicit branch statements doesn't mean they're data independent

@arkology
Copy link
Contributor

arkology commented May 8, 2024

the same input data should be used to test same before/after functions

I don't think so, since these functions are independent of the data they are given.

As I know, in general using different data for comparison tests is bad practice (because, for example, it could be used to get fake results (of course it's not our case)). Maybe it doesn't matter in this specific situation.
I think that functions are still dependent of input data, at least "before" functions with for loops and return inside loops. So you may get different iterations amount depending on input number.

@Nazarwadim Nazarwadim force-pushed the typedefs_optimize branch 2 times, most recently from 16023fa to 575aa87 Compare May 25, 2024 21:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants