
Reuse mashed word String #3

Merged
merged 1 commit into from
Apr 25, 2021

Conversation

@wezm (Contributor) commented Apr 25, 2021

The profile shows a good bit of time spent allocating the mashed word string. If we instead reuse it each time we shave another 20% off the benchmark.

@wezm wezm changed the title Reused mashed word String Reuse mashed word String Apr 25, 2021
@sts10 sts10 merged commit b768125 into sts10:main Apr 25, 2021
@sts10 (Owner) commented Apr 25, 2021

Interesting -- so it's the let that's costly?

Also, can you point me to some tips on the way you're profiling the code?

@wezm (Contributor, Author) commented Apr 25, 2021

Interesting -- so it's the let that's costly?

It's this that's expensive:

root_word.to_owned().to_owned() + second_word;

Ignoring the double to_owned (which may have been cloning the string twice, making the impact worse), it is doing the following:

  • Make a call to the OS to allocate memory to hold the copy of root_word, let's call this mashed_word.
  • Copy the contents of the root_word into the newly allocated memory.

Then the + does:

  • Check if mashed_word has enough capacity to append second_word (it probably doesn't)
  • Make a call to the OS to expand the original memory allocation to be large enough to hold second_word as well
  • Copy second_word into the newly expanded allocation.

All of that is quite fast but it's happening in a nested loop so it's happening a lot. Calls to the OS (system calls) are slower than regular function calls as there's typically a context switch amongst other things, so avoiding these can help.
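A minimal sketch of that per-iteration allocation pattern (the word lists here are made up for illustration; csafe's real loop reads them from a word-list file):

```rust
// Sketch of the pattern the PR removes: a fresh String is allocated on
// every iteration of the nested loop.
fn main() {
    let root_words = ["news", "paper"];
    let second_words = ["stand", "boy"];

    for &root_word in root_words.iter() {
        for &second_word in second_words.iter() {
            // `to_owned()` allocates and copies root_word; `+` may then
            // reallocate to make room for second_word.
            let mashed_word = root_word.to_owned() + second_word;
            assert_eq!(mashed_word.len(), root_word.len() + second_word.len());
        }
    }
}
```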

With the new code, mashed_word is allocated and grown on the first pass through the loop; after that, it only calls the system allocator when the combined length of root_word + second_word exceeds any previously seen combination. So for many words the operations are only:

  • Check if there's capacity for root_word in mashed_word (there probably is)
  • Copy root_word into mashed_word
  • Check if there's capacity for second_word in mashed_word (there probably is)
  • Copy second_word into mashed_word

The .clear() call sets the length of mashed_word back to 0 (empty string) but it retains the allocated memory so it can be reused next time.
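A minimal sketch of the reuse pattern described above (again with made-up word lists; csafe's real loop reads them from a file):

```rust
// One String is allocated once and reused across every iteration, so the
// allocator is only hit when the buffer's capacity has to grow.
fn main() {
    let root_words = ["news", "paper"];
    let second_words = ["stand", "boy"];

    let mut mashed_word = String::new();
    for &root_word in root_words.iter() {
        for &second_word in second_words.iter() {
            mashed_word.clear(); // length back to 0, capacity retained
            mashed_word.push_str(root_word);
            mashed_word.push_str(second_word);
            assert_eq!(mashed_word, format!("{}{}", root_word, second_word));
        }
    }
    // The buffer still holds capacity for the longest combination seen.
    assert!(mashed_word.capacity() >= "paperstand".len());
}
```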

Also, can you point me to some tips on the way you're profiling the code?

I have added the following to the Cargo.toml so that debug symbols are present in release builds (so that the profile contains meaningful stack traces):

[profile.release]
debug = true

And then I'm profiling with perf (on Linux). I use the following to capture the profile data:

perf record --call-graph dwarf -F99 ./target/release/csafe word_lists/eff-word-list.txt

Note: I don't wait for that to finish; I just let it run for 5–10 seconds, then Ctrl-C to stop it.

Then this to produce a report from it:

perf report -g graph | mdemangle

@wezm wezm deleted the reuse-mashed-word branch April 25, 2021 04:34