Improve WeakList performance during staggered removals #3893

Merged: 14 commits into ppy:master from the weaklist-improvement branch, Sep 25, 2020


smoogipoo
Contributor

@smoogipoo smoogipoo commented Sep 23, 2020

Description

"Staggered removals" are removals of items appearing mostly in the middle of the list and rarely at the ends of the list - they do not follow a linear pattern, and are hard to optimise. WeakList already employs optimisations when items are removed at the ends of the list.

For WeakList, such staggered cases require most elements in the list to be iterated over for each subsequent removal, including elements that have lost their references and/or have previously been Remove()d.
When this happens, WeakList calls TryGetTarget on every element that hasn't been Remove()d, which turns out to be quite expensive when done repeatedly: in my benchmarking, retrieving an object through a WeakReference is ~6x more expensive than through a direct reference.

Solution

The only two cases where this comes up are Remove() and Contains(), neither of which really needs to know the exact object stored - only that the given object is present somewhere in the list. The object is guaranteed to be alive because the caller passes it in as the argument to these two methods.

So for the purpose of the above two methods, this change stores the object's hash code to perform comparisons against, eliminating calls to TryGetTarget.
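A minimal sketch of the idea described above, not the framework's actual code: each entry keeps the hash code computed at insertion next to the weak reference, so Contains() can compare integers instead of resolving every target. `HashedWeakEntry` and `HashOnlyWeakList` are hypothetical names introduced for illustration. In this hash-only form, any two objects with equal hash codes are treated as the same item, which is the concern raised in review further down.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical entry type: pairs a weak reference with the hash code taken at insertion.
public sealed class HashedWeakEntry<T> where T : class
{
    public readonly WeakReference<T> Reference;
    public readonly int ObjectHashCode;

    public HashedWeakEntry(T item)
    {
        Reference = new WeakReference<T>(item);
        ObjectHashCode = EqualityComparer<T>.Default.GetHashCode(item);
    }
}

// Hypothetical list: Contains() compares stored hash codes only and never calls TryGetTarget.
public sealed class HashOnlyWeakList<T> where T : class
{
    private readonly List<HashedWeakEntry<T>> entries = new List<HashedWeakEntry<T>>();

    public void Add(T item) => entries.Add(new HashedWeakEntry<T>(item));

    public bool Contains(T item)
    {
        int hash = EqualityComparer<T>.Default.GetHashCode(item);

        foreach (var entry in entries)
        {
            if (entry.ObjectHashCode == hash)
                return true;
        }

        return false;
    }
}

public static class HashOnlyDemo
{
    public static void Main()
    {
        var list = new HashOnlyWeakList<string>();
        string kept = "kept alive by this local";
        list.Add(kept);

        Console.WriteLine(list.Contains(kept)); // prints "True"
        GC.KeepAlive(kept);
    }
}
```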

Benchmarks

Framework-side benchmarks

Before

|               Method | ItemCount |           Mean |       Error |      StdDev | Ratio | RatioSD |  Gen 0 | Gen 1 | Gen 2 | Allocated |
|--------------------- |---------- |---------------:|------------:|------------:|------:|--------:|-------:|------:|------:|----------:|
|                  Add |         1 |       234.2 ns |     0.92 ns |     0.77 ns |  1.00 |    0.00 | 0.0043 |     - |     - |     144 B |
|            RemoveOne |         1 |       268.6 ns |     2.94 ns |     2.75 ns |  1.15 |    0.01 | 0.0043 |     - |     - |     144 B |
| RemoveAllIteratively |         1 |       277.3 ns |     1.23 ns |     1.15 ns |  1.18 |    0.00 | 0.0043 |     - |     - |     144 B |
|   RemoveAllStaggered |         1 |       285.4 ns |     5.36 ns |     5.02 ns |  1.21 |    0.02 | 0.0043 |     - |     - |     144 B |
|                Clear |         1 |       233.9 ns |     1.27 ns |     1.18 ns |  1.00 |    0.01 | 0.0043 |     - |     - |     144 B |
|             Contains |         1 |       266.5 ns |     1.81 ns |     1.69 ns |  1.14 |    0.01 | 0.0043 |     - |     - |     144 B |
|      AddAndEnumerate |         1 |       458.0 ns |     3.41 ns |     3.02 ns |  1.96 |    0.02 | 0.0081 |     - |     - |     280 B |
|    ClearAndEnumerate |         1 |       350.6 ns |     2.91 ns |     2.73 ns |  1.50 |    0.01 | 0.0057 |     - |     - |     192 B |
|                      |           |                |             |             |       |         |        |       |       |           |
|                  Add |        10 |     1,932.7 ns |    28.76 ns |    24.02 ns |  1.00 |    0.00 | 0.0153 |     - |     - |     600 B |
|            RemoveOne |        10 |     2,067.5 ns |    24.56 ns |    21.77 ns |  1.07 |    0.02 | 0.0153 |     - |     - |     600 B |
| RemoveAllIteratively |        10 |     2,243.6 ns |    15.67 ns |    14.65 ns |  1.16 |    0.02 | 0.0153 |     - |     - |     600 B |
|   RemoveAllStaggered |        10 |     2,533.8 ns |    11.20 ns |     9.36 ns |  1.31 |    0.01 | 0.0153 |     - |     - |     600 B |
|                Clear |        10 |     1,931.4 ns |    31.11 ns |    29.10 ns |  1.00 |    0.02 | 0.0153 |     - |     - |     600 B |
|             Contains |        10 |     2,058.9 ns |    30.45 ns |    28.48 ns |  1.06 |    0.02 | 0.0153 |     - |     - |     600 B |
|      AddAndEnumerate |        10 |     2,631.9 ns |    19.72 ns |    18.44 ns |  1.36 |    0.02 | 0.0267 |     - |     - |     984 B |
|    ClearAndEnumerate |        10 |     2,226.4 ns |    32.64 ns |    28.93 ns |  1.15 |    0.02 | 0.0191 |     - |     - |     648 B |
|                      |           |                |             |             |       |         |        |       |       |           |
|                  Add |       100 |    17,710.5 ns |   277.36 ns |   259.44 ns |  1.00 |    0.00 | 0.1221 |     - |     - |    4624 B |
|            RemoveOne |       100 |    18,739.0 ns |   270.38 ns |   239.69 ns |  1.06 |    0.02 | 0.1221 |     - |     - |    4624 B |
| RemoveAllIteratively |       100 |    20,714.2 ns |   209.64 ns |   185.84 ns |  1.17 |    0.02 | 0.1221 |     - |     - |    4624 B |
|   RemoveAllStaggered |       100 |    48,036.4 ns |   479.15 ns |   424.76 ns |  2.72 |    0.03 | 0.1221 |     - |     - |    4624 B |
|                Clear |       100 |    18,009.5 ns |   259.10 ns |   242.37 ns |  1.02 |    0.02 | 0.1221 |     - |     - |    4624 B |
|             Contains |       100 |    18,911.1 ns |   236.12 ns |   220.87 ns |  1.07 |    0.02 | 0.1221 |     - |     - |    4624 B |
|      AddAndEnumerate |       100 |    20,285.2 ns |   396.45 ns |   389.37 ns |  1.15 |    0.03 | 0.1831 |     - |     - |    6752 B |
|    ClearAndEnumerate |       100 |    17,859.7 ns |   253.31 ns |   224.55 ns |  1.01 |    0.02 | 0.1221 |     - |     - |    4672 B |
|                      |           |                |             |             |       |         |        |       |       |           |
|                  Add |      1000 |   169,754.8 ns | 3,257.73 ns | 4,235.97 ns |  1.00 |    0.00 | 0.9766 |     - |     - |   40632 B |
|            RemoveOne |      1000 |   177,412.8 ns | 3,509.32 ns | 5,359.10 ns |  1.05 |    0.04 | 0.9766 |     - |     - |   40632 B |
| RemoveAllIteratively |      1000 |   201,369.7 ns | 3,461.56 ns | 3,237.94 ns |  1.19 |    0.04 | 0.9766 |     - |     - |   40633 B |
|   RemoveAllStaggered |      1000 | 2,709,139.5 ns | 1,899.45 ns | 1,683.81 ns | 16.00 |    0.44 |      - |     - |     - |   40650 B |
|                Clear |      1000 |   174,270.9 ns | 3,410.76 ns | 3,190.43 ns |  1.03 |    0.03 | 0.9766 |     - |     - |   40634 B |
|             Contains |      1000 |   178,983.4 ns | 3,567.36 ns | 3,162.37 ns |  1.06 |    0.04 | 0.9766 |     - |     - |   40632 B |
|      AddAndEnumerate |      1000 |   196,082.2 ns | 2,264.67 ns | 2,118.37 ns |  1.16 |    0.03 | 1.7090 |     - |     - |   57289 B |
|    ClearAndEnumerate |      1000 |   171,380.5 ns | 1,762.46 ns | 1,562.38 ns |  1.01 |    0.03 | 0.9766 |     - |     - |   40680 B |

After

|               Method | ItemCount |         Mean |       Error |      StdDev | Ratio | RatioSD |  Gen 0 |  Gen 1 | Gen 2 | Allocated |
|--------------------- |---------- |-------------:|------------:|------------:|------:|--------:|-------:|-------:|------:|----------:|
|                  Add |         1 |     255.1 ns |     1.17 ns |     1.04 ns |  1.00 |    0.00 | 0.0052 |      - |     - |     176 B |
|            RemoveOne |         1 |     262.3 ns |     1.92 ns |     1.80 ns |  1.03 |    0.01 | 0.0052 |      - |     - |     176 B |
| RemoveAllIteratively |         1 |     288.0 ns |     1.15 ns |     1.02 ns |  1.13 |    0.01 | 0.0052 |      - |     - |     176 B |
|   RemoveAllStaggered |         1 |     280.9 ns |     3.21 ns |     2.84 ns |  1.10 |    0.01 | 0.0052 |      - |     - |     176 B |
|                Clear |         1 |     252.8 ns |     3.93 ns |     3.68 ns |  0.99 |    0.01 | 0.0052 |      - |     - |     176 B |
|             Contains |         1 |     263.8 ns |     2.27 ns |     2.12 ns |  1.03 |    0.01 | 0.0052 |      - |     - |     176 B |
|      AddAndEnumerate |         1 |     490.6 ns |     4.41 ns |     3.91 ns |  1.92 |    0.02 | 0.0086 |      - |     - |     304 B |
|    ClearAndEnumerate |         1 |     377.5 ns |     3.41 ns |     3.19 ns |  1.48 |    0.02 | 0.0062 |      - |     - |     216 B |
|                      |           |              |             |             |       |         |        |        |       |           |
|                  Add |        10 |   2,132.1 ns |    10.91 ns |     9.67 ns |  1.00 |    0.00 | 0.0229 |      - |     - |     824 B |
|            RemoveOne |        10 |   2,116.5 ns |     9.14 ns |     8.10 ns |  0.99 |    0.01 | 0.0229 |      - |     - |     824 B |
| RemoveAllIteratively |        10 |   2,301.3 ns |    20.34 ns |    19.03 ns |  1.08 |    0.01 | 0.0229 |      - |     - |     824 B |
|   RemoveAllStaggered |        10 |   2,348.6 ns |    22.81 ns |    21.33 ns |  1.10 |    0.01 | 0.0229 |      - |     - |     824 B |
|                Clear |        10 |   2,105.9 ns |    13.85 ns |    11.57 ns |  0.99 |    0.01 | 0.0229 |      - |     - |     824 B |
|             Contains |        10 |   2,122.7 ns |    15.04 ns |    14.07 ns |  0.99 |    0.01 | 0.0229 |      - |     - |     824 B |
|      AddAndEnumerate |        10 |   2,755.3 ns |    17.44 ns |    16.31 ns |  1.29 |    0.01 | 0.0343 |      - |     - |    1200 B |
|    ClearAndEnumerate |        10 |   2,257.0 ns |    22.97 ns |    20.36 ns |  1.06 |    0.01 | 0.0229 |      - |     - |     864 B |
|                      |           |              |             |             |       |         |        |        |       |           |
|                  Add |       100 |  19,576.5 ns |   265.03 ns |   247.91 ns |  1.00 |    0.00 | 0.1831 |      - |     - |    6640 B |
|            RemoveOne |       100 |  19,835.4 ns |   317.30 ns |   264.96 ns |  1.01 |    0.02 | 0.1831 |      - |     - |    6640 B |
| RemoveAllIteratively |       100 |  22,151.3 ns |   415.82 ns |   388.96 ns |  1.13 |    0.03 | 0.1831 |      - |     - |    6640 B |
|   RemoveAllStaggered |       100 |  29,841.6 ns |   340.86 ns |   318.84 ns |  1.52 |    0.03 | 0.1831 |      - |     - |    6640 B |
|                Clear |       100 |  19,742.8 ns |   383.56 ns |   410.41 ns |  1.01 |    0.02 | 0.1831 |      - |     - |    6640 B |
|             Contains |       100 |  19,618.4 ns |   196.54 ns |   183.84 ns |  1.00 |    0.02 | 0.1831 |      - |     - |    6640 B |
|      AddAndEnumerate |       100 |  22,945.1 ns |   221.00 ns |   206.72 ns |  1.17 |    0.01 | 0.2441 |      - |     - |    8760 B |
|    ClearAndEnumerate |       100 |  20,149.2 ns |   276.06 ns |   244.72 ns |  1.03 |    0.02 | 0.1831 |      - |     - |    6680 B |
|                      |           |              |             |             |       |         |        |        |       |           |
|                  Add |      1000 | 190,514.3 ns | 3,707.17 ns | 4,413.12 ns |  1.00 |    0.00 | 1.4648 |      - |     - |   56984 B |
|            RemoveOne |      1000 | 189,230.3 ns | 3,669.94 ns | 4,507.01 ns |  0.99 |    0.04 | 1.4648 |      - |     - |   56984 B |
| RemoveAllIteratively |      1000 | 205,037.5 ns | 1,441.21 ns | 1,203.47 ns |  1.07 |    0.03 | 1.4648 |      - |     - |   56985 B |
|   RemoveAllStaggered |      1000 | 819,562.0 ns | 1,894.96 ns | 1,772.55 ns |  4.30 |    0.10 | 0.9766 |      - |     - |   56985 B |
|                Clear |      1000 | 187,795.9 ns | 3,722.48 ns | 4,707.74 ns |  0.99 |    0.03 | 1.4648 |      - |     - |   56984 B |
|             Contains |      1000 | 188,888.6 ns | 3,745.33 ns | 4,458.55 ns |  0.99 |    0.03 | 1.4648 |      - |     - |   56984 B |
|      AddAndEnumerate |      1000 | 216,565.4 ns | 2,040.07 ns | 1,908.28 ns |  1.14 |    0.03 | 2.1973 | 0.2441 |     - |   73632 B |
|    ClearAndEnumerate |      1000 | 186,243.5 ns | 3,706.45 ns | 3,640.23 ns |  0.98 |    0.04 | 1.4648 |      - |     - |   57024 B |

osu!-side testing

Tested exiting from gameplay on Centipede:

Before

[screenshot: Screenshot_2020-09-23_22-32-07]

After

[image]

@peppy
Sponsor Member

peppy commented Sep 23, 2020

Is the optimised case common enough to warrant 5-18% decrease across all other benchmarks? Probably worth touching on that for the record.

@smoogipoo
Contributor Author

smoogipoo commented Sep 23, 2020

I made a few more optimisations that bring it closer, and updated the benchmarks. The difference is coming from the additional size of InvalidatableWeakReference, but I don't know how to do anything about that besides not making this change in the first place...
Keep in mind that all the individual ratios are better than current master - raw time is a little bit deceiving in the benchmarks because it includes the time for populating the list (time for Add).

In a profile of loading Centipede (the non-optimised version from earlier today), which creates all those bindables, I see the following:
[image]
Which means, assuming a 10% difference (Add), master should be getting ~690ms instead of 762ms in the same scenario - a difference of roughly 70ms. I think this is pretty irrelevant considering the sheer load this map imposes and compared to the >1s improvement on disposal with these changes.

Also, in most of our usages, bindables are loaded async and then have their value changed event bound in LoadComplete, whereas right now bindables have to unbind synchronously, which could even occur on the finalizer. It's possible that this changes in the future, say by bindable having a property that stops it from receiving value changes, but that's a whole different thing with its own considerations imo.

// Check whether the reference exists.
if (weakReference == null || !weakReference.TryGetTarget(out var obj))
{
    // If the reference doesn't exist, it must have previously been removed and can be skipped.
Contributor Author

Note that I've removed the RemoveAt() code from here since the enumerators have been split out and it was unnecessary overhead.

This enumerator is only used by GetEnumerator() which does its own pre-trimming, so most of its use is negated.
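As a rough illustration of an enumerator that only skips dead slots and never removes them, here is a sketch under assumptions: `SkippingEnumeration` and `AliveItems` are hypothetical names, and the framework's enumerator is a separate type rather than an iterator method.

```csharp
using System;
using System.Collections.Generic;

public static class SkippingEnumeration
{
    // Yields only targets that are still alive; null or dead slots are skipped, never removed.
    public static IEnumerable<T> AliveItems<T>(IReadOnlyList<WeakReference<T>> slots) where T : class
    {
        foreach (var weakReference in slots)
        {
            // Check whether the reference exists; if not, skip it without mutating the list.
            if (weakReference == null || !weakReference.TryGetTarget(out T obj))
                continue;

            yield return obj;
        }
    }

    public static void Main()
    {
        string alive = "alive";
        var slots = new List<WeakReference<string>>
        {
            null, // stands in for a previously removed slot
            new WeakReference<string>(alive),
        };

        foreach (string item in AliveItems<string>(slots))
            Console.WriteLine(item); // prints "alive"

        GC.KeepAlive(alive);
    }
}
```

Because the loop never calls RemoveAt(), enumeration leaves the backing list untouched; any trimming would have to happen before the enumerator is created.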

osu.Framework/Lists/WeakList_AllItemsEnumerator.cs (outdated, resolved)
{
    Reference = new WeakReference<T>(reference);
    ObjectHashCode = reference == null ? 0 : EqualityComparer<T>.Default.GetHashCode(reference);
Collaborator

I don't know that this is correct, unfortunately... For one thing, this can break if the class implements IEquatable<T>, but even if you use the more-correct-for-that-case RuntimeHelpers.GetHashCode(), a docs note states:

Note that GetHashCode always returns identical hash codes for equal object references. However, the reverse is not true: equal hash codes do not indicate equal object references. A particular hash code value is not unique to a particular object reference; different object references can generate identical hash codes.

Unless I'm reading this wrong, this indicates that collisions are potentially possible and that the hash code is not a uniquely identifying invertible function. Sure, that possibility is probably slim, but I don't know that we ever want to debug one of those...

I suppose for full correctness this would be salvageable by keeping hash buckets, iterating over those rather than stopping at first match, and using ReferenceEquals for absolute certainty, but that might kneecap the optimisation completely.

Contributor Author

There's little cost associated with using object equality as absolute truth. It's technically already using "hash buckets", so it'll degenerate back to the original performance with many hash collisions, but that's a rare case.

I'll keep using EqualityComparer though - want to avoid boxing.

Updated the o!f benchmarks, only about 100us difference in RemoveAllStaggered(1000).
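The approach discussed in this thread can be sketched as follows, under stated assumptions and not as the merged implementation: the stored hash code acts only as a cheap pre-filter, and a candidate is confirmed by resolving the reference and comparing with EqualityComparer<T>.Default. `VerifiedWeakList` and `FixedHash` are hypothetical names, and List.RemoveAt is used here purely for brevity.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical list: hash codes pre-filter candidates, an equality check confirms the match.
public sealed class VerifiedWeakList<T> where T : class
{
    private readonly List<(WeakReference<T> Reference, int Hash)> entries =
        new List<(WeakReference<T> Reference, int Hash)>();

    public void Add(T item) =>
        entries.Add((new WeakReference<T>(item), EqualityComparer<T>.Default.GetHashCode(item)));

    public bool Contains(T item) => indexOf(item) >= 0;

    public bool Remove(T item)
    {
        int index = indexOf(item);
        if (index < 0)
            return false;

        entries.RemoveAt(index);
        return true;
    }

    private int indexOf(T item)
    {
        int hash = EqualityComparer<T>.Default.GetHashCode(item);

        for (int i = 0; i < entries.Count; i++)
        {
            // Cheap integer comparison first: most entries are rejected here without TryGetTarget.
            if (entries[i].Hash != hash)
                continue;

            // Equal hash codes do not prove equal objects, so confirm with a real comparison.
            if (entries[i].Reference.TryGetTarget(out T target) && EqualityComparer<T>.Default.Equals(target, item))
                return i;
        }

        return -1;
    }
}

// Hypothetical type whose instances all share one hash code, forcing collisions.
public sealed class FixedHash
{
    public override int GetHashCode() => 42;
}

public static class VerifiedDemo
{
    public static void Main()
    {
        var list = new VerifiedWeakList<FixedHash>();
        var a = new FixedHash();
        var b = new FixedHash();
        list.Add(a);

        Console.WriteLine(list.Contains(a)); // prints "True"
        Console.WriteLine(list.Contains(b)); // prints "False": same hash code, different object
        GC.KeepAlive(a);
    }
}
```

With many colliding hash codes every candidate falls through to TryGetTarget, so the cost degenerates towards the original behaviour, as noted in the reply above.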

@smoogipoo
Contributor Author

I'd also recommend testing with https://osu.ppy.sh/beatmapsets/121591#osu/311269, for me exiting goes from 10s -> 1s after this change.

@bdach
Collaborator

bdach commented Sep 24, 2020

I think the code looks good now and can reproduce performance improvements.

I can get behind the enabling of nullable reference types done in these files in the last commit, but I'm not entirely sure we'll all be on the same page on that matter. @peppy is that ok by you?

@peppy
Sponsor Member

peppy commented Sep 25, 2020

Seems local enough to not be an issue.

@peppy peppy merged commit 65424f4 into ppy:master Sep 25, 2020
@peppy peppy deleted the weaklist-improvement branch September 25, 2020 03:00