
Reuse mashed word String #3

Merged
merged 1 commit into from
Apr 25, 2021

Conversation

@wezm (Contributor) commented Apr 25, 2021

The profile shows a good bit of time spent allocating the mashed word string. If we instead reuse it each time we shave another 20% off the benchmark.

@wezm wezm changed the title Reused mashed word String Reuse mashed word String Apr 25, 2021
@sts10 sts10 merged commit b768125 into sts10:main Apr 25, 2021
@sts10 (Owner) commented Apr 25, 2021

Interesting -- so it's the let that's costly?

Also, can you point me to some tips on the way you're profiling the code?

@wezm (Contributor, Author) commented Apr 25, 2021

Interesting -- so it's the let that's costly?

It's this that's expensive:

root_word.to_owned().to_owned() + second_word;

Ignoring the double to_owned (which may have been cloning the string twice, making the impact worse), it is doing the following:

  • Make a call to the OS to allocate memory to hold the copy of root_word, let's call this mashed_word.
  • Copy the contents of the root_word into the newly allocated memory.

Then the + does:

  • Check if mashed_word has enough capacity to append second_word (it probably doesn't)
  • Make a call to the OS to expand the original memory allocation to be large enough to hold second_word as well
  • Copy second_word into the newly expanded allocation.

All of that is quite fast but it's happening in a nested loop so it's happening a lot. Calls to the OS (system calls) are slower than regular function calls as there's typically a context switch amongst other things, so avoiding these can help.
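A minimal sketch of that per-iteration allocation pattern (the word lists here are made up for illustration; csafe's real loop reads them from a word-list file):

```rust
// Sketch of the pattern the PR removes: a fresh String is allocated on
// every iteration of the nested loop.
fn main() {
    let root_words = ["news", "paper"];
    let second_words = ["stand", "boy"];

    for &root_word in root_words.iter() {
        for &second_word in second_words.iter() {
            // `to_owned()` allocates and copies root_word; `+` may then
            // reallocate to make room for second_word.
            let mashed_word = root_word.to_owned() + second_word;
            assert_eq!(mashed_word.len(), root_word.len() + second_word.len());
        }
    }
}
```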

With the new code, mashed_word is allocated and grown on the first pass through the loop; after that, it only calls the system allocator when the combined length of root_word + second_word exceeds any previously seen combination. So for many words the operations are only:

  • Check if there's capacity for root_word in mashed_word (there probably is)
  • Copy root_word into mashed_word
  • Check if there's capacity for second_word in mashed_word (there probably is)
  • Copy second_word into mashed_word

The .clear() call sets the length of mashed_word back to 0 (empty string) but it retains the allocated memory so it can be reused next time.
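A minimal sketch of the reuse pattern described above (again with made-up word lists; csafe's real loop reads them from a file):

```rust
// One String is allocated once and reused across every iteration, so the
// allocator is only hit when the buffer's capacity has to grow.
fn main() {
    let root_words = ["news", "paper"];
    let second_words = ["stand", "boy"];

    let mut mashed_word = String::new();
    for &root_word in root_words.iter() {
        for &second_word in second_words.iter() {
            mashed_word.clear(); // length back to 0, capacity retained
            mashed_word.push_str(root_word);
            mashed_word.push_str(second_word);
            assert_eq!(mashed_word, format!("{}{}", root_word, second_word));
        }
    }
    // The buffer still holds capacity for the longest combination seen.
    assert!(mashed_word.capacity() >= "paperstand".len());
}
```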

Also, can you point me to some tips on the way you're profiling the code?

I have added the following to the Cargo.toml so that debug symbols are present in release builds (so that the profile contains meaningful stack traces):

[profile.release]
debug = true

And then I'm profiling with perf (on Linux). I use the following to capture the profile data:

perf record --call-graph dwarf -F99 ./target/release/csafe word_lists/eff-word-list.txt

Note: I don't wait for that to finish; I just let it run for 5–10 seconds, then Ctrl-C to stop it.

Then this to produce a report from it:

perf report -g graph | mdemangle

@wezm wezm deleted the reuse-mashed-word branch April 25, 2021 04:34