Add String::replace_char(s) methods for performance and convenience #92475

Open
wants to merge 6 commits into
base: master

AThousandShips
Member

@AThousandShips AThousandShips commented May 28, 2024

I will try to do some performance comparisons soon. These methods essentially replace in place, avoiding any COW (copy-on-write) or other manipulation, and they also avoid creating any temporaries.

The multi-character one is optional, but I added it for completeness for the few cases where multiple characters are replaced at once.
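As a rough illustration of the two methods described above, here is a minimal standalone sketch. It uses `std::u32string` as a stand-in for Godot's `String`, and the names `replace_char_sketch` and `replace_chars_sketch` are hypothetical; this is not the code from the PR, only the general single-pass idea.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical stand-in for a single-character replace: the output buffer is
// allocated once and filled in a single pass, with no intermediate strings.
std::u32string replace_char_sketch(const std::u32string &src, char32_t key, char32_t with) {
	std::u32string out(src.size(), U'\0');
	for (std::size_t i = 0; i < src.size(); ++i) {
		out[i] = (src[i] == key) ? with : src[i];
	}
	return out;
}

// Hypothetical stand-in for the multi-character variant: any character that
// appears in `keys` is replaced by `with`, using a linear lookup per character.
std::u32string replace_chars_sketch(const std::u32string &src, std::u32string_view keys, char32_t with) {
	std::u32string out(src.size(), U'\0');
	for (std::size_t i = 0; i < src.size(); ++i) {
		const bool is_key = keys.find(src[i]) != std::u32string_view::npos;
		out[i] = is_key ? with : src[i];
	}
	return out;
}
```

For example, `replace_chars_sketch(U"a:b/c", U":/", U'_')` replaces both separators in one pass instead of chaining two string replacements.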

These can be exposed to scripting if desired, but working with characters is generally a bit more complicated in scripting, so I haven't added that yet.

Kept as separate commits for ease of editing until approval

See #92433 (comment)

A few more complex cases, like String::validate_filename, OS::get_safe_dir_name, and get_csharp_project_name, could also be replaced using this by passing a vector of characters instead, but for now I am keeping this to the simple cases (mostly; see AnimationLibrary::validate_library_name for an exception).

See also:

@AThousandShips AThousandShips added this to the 4.x milestone May 28, 2024
@AThousandShips AThousandShips changed the title Add String::replace_char(s) methods for performance Add String::replace_char(s) methods for performance and convenience May 28, 2024
@AThousandShips AThousandShips force-pushed the string_replace_char branch 2 times, most recently from 989058a to d029832 Compare July 18, 2024 14:28
@AThousandShips AThousandShips force-pushed the string_replace_char branch 2 times, most recently from 611a214 to 6a49079 Compare August 19, 2024 13:09
@AThousandShips AThousandShips force-pushed the string_replace_char branch 2 times, most recently from 92cb3e7 to 4b78ad7 Compare August 28, 2024 12:45
@AThousandShips AThousandShips marked this pull request as ready for review August 28, 2024 12:55
@AThousandShips AThousandShips requested review from a team as code owners August 28, 2024 12:55
const char32_t *old_ptr = ptr();

while (*old_ptr) {
if (p_keys.has(*old_ptr)) {
Member Author

The keys could be copied and sorted here, and then looked up with bsearch instead, but I'm not sure it would be worth it in general. I'm not expecting large arrays here, so the difference between O(n) and O(log n) lookups will be marginal compared to the cost of copying and sorting.
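To make the trade-off in this comment concrete, here is a minimal standalone sketch of both lookup strategies. It uses `std::vector<char32_t>` and `std::u32string` as stand-ins for Godot's containers, and `has_key_linear` and `replace_chars_sorted` are hypothetical names, not functions from the PR.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Linear lookup, O(n) in the number of keys. Needs no preparation, which is
// why it tends to be fine for the small key sets expected here.
bool has_key_linear(const std::vector<char32_t> &keys, char32_t c) {
	return std::find(keys.begin(), keys.end(), c) != keys.end();
}

// Sorted lookup: `keys` is taken by value (the copy), sorted once, and then
// each character is checked with an O(log n) binary search.
std::u32string replace_chars_sorted(const std::u32string &src, std::vector<char32_t> keys, char32_t with) {
	std::sort(keys.begin(), keys.end());
	std::u32string out = src;
	for (char32_t &c : out) {
		if (std::binary_search(keys.begin(), keys.end(), c)) {
			c = with;
		}
	}
	return out;
}
```

The copy and sort are paid on every call, so the binary search only wins when the key set is large enough to offset that fixed cost.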

++old_ptr;
}

*new_ptrw = 0;
Contributor

There's a const _null for the zero char.

@Hilderin
Contributor

Impressive work!!

I did some quick performance testing. I'm concerned that each character is copied one by one in the replace_char function even if the character is never found. In my testing, when the character is not found, the new replace_char function is slower than the current string-based replace. I put some suggestions in a version of replace_char below (note: I did not do extensive optimization on this code).

Test 1 (character replaced 14 times):

Source string: "C:\Projects\godot-jolt\build\windows-msvc-x64\External\Build\jolt\CMakeFiles_CMakeLTOTest-CXX\bin\CMakeFiles\boo.dir\Debug\main.cpp.obj"
Iterations: 1000000 times
Replacing \ for X

Using replace (string): 1200ms
Your version of replace_char: 320ms
Alternative version (see below): 300ms

Test 2 (character not replaced):

Source string: "C:\Projects\godot-jolt\build\windows-msvc-x64\External\Build\jolt\CMakeFiles_CMakeLTOTest-CXX\bin\CMakeFiles\boo.dir\Debug\main.cpp.obj"
Iterations: 1000000 times
Replacing * for X

Using replace (string): 160ms
Your version of replace_char: 320ms
Alternative version (see below): 140ms

Test 3 (small string 1 replace):

Source string: "te_st"
Iterations: 1000000 times
Replacing _ for X

Using replace (string): 425ms
Your version of replace_char: 150ms
Alternative version (see below): 140ms

Test 4 (small string and character not replaced):

Source string: "test"
Iterations: 1000000 times
Replacing _ for X

Using replace (string): 52ms
Your version of replace_char: 150ms
Alternative version (see below): 52ms

Suggested alternative version, which returns the string unchanged if the search character is not found and uses memcpy for the copy:

String String::replace_char(char32_t p_key, char32_t p_with) const {
	ERR_FAIL_COND_V_MSG(p_with == 0, String(), "`with` must not be null.");

	int len = length();
	if (p_key == 0 || len == 0) {
		return *this;
	}

	int index = 0;
	const char32_t *old_ptr = ptr();
	for (; index < len; index++) {
		if (*old_ptr == p_key) {
			break;
		}
		++old_ptr;
	}

	if (index == len) {
		return *this;
	}

	String new_string;
	new_string.resize(len + 1);
	char32_t *new_ptrw = new_string.ptrw();

	memcpy(new_ptrw, ptr(), len * sizeof(char32_t));

	new_ptrw += index;
	*new_ptrw = p_with;

	++new_ptrw;
	++old_ptr;

	while (*old_ptr) {
		if (*old_ptr == p_key) {
			*new_ptrw = p_with;
		}
		++new_ptrw;
		++old_ptr;
	}

	*new_ptrw = _null;

	return new_string;
}

@AThousandShips
Member Author

I'll take a look and do some measurements comparing these. I have some theories: I suspect that a method which checks for existence first and then uses my method for the replacement might be faster, as it accesses the data twice rather than three times, but I'll test and compare.
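One way to read the theory above, as a minimal standalone sketch: scan once to check whether the key exists, and only then run the single-pass replacement. This uses `std::u32string` as a stand-in for Godot's `String`, and `replace_char_checked` is a hypothetical name; it is an interpretation of the comment, not measured or merged code.

```cpp
#include <cstddef>
#include <string>

// Hypothetical two-step variant: an existence check (first read of the
// source), followed by one combined copy-and-replace pass (second read).
std::u32string replace_char_checked(const std::u32string &src, char32_t key, char32_t with) {
	// Step 1: bail out early if there is nothing to replace.
	if (src.find(key) == std::u32string::npos) {
		// Note: this copies a std::u32string; with a copy-on-write string
		// type, returning the original would only share the buffer.
		return src;
	}

	// Step 2: single pass that copies and replaces at the same time.
	std::u32string out(src.size(), U'\0');
	for (std::size_t i = 0; i < src.size(); ++i) {
		out[i] = (src[i] == key) ? with : src[i];
	}
	return out;
}
```

Compared with the memcpy-based suggestion, this skips the separate bulk copy, at the price of re-reading the prefix before the first match during the replacement pass.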

Thank you!
