-
Notifications
You must be signed in to change notification settings - Fork 52
Performance improvement #236
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Performance improvement #236
Conversation
Windows .Net Framework Tests FAILed. |
Linux Net Core Tests FAILed. |
Windows Net Core Tests FAILed. |
verify |
Windows .Net Framework Tests FAILed. |
036c96f
to
f91f491
Compare
Linux Net Core Tests FAILed. |
Windows Net Core Tests FAILed. |
object value; | ||
if (_nearCache.TryGetValue(keyData, out value)) | ||
{ | ||
keyList.Remove(keyData); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are iterating over the keyList with indexes and calling remove in the middle. It looks like we will have array index out of bound exception.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reverted the iteration
Linux Net Core Tests FAILed. |
Windows .Net Framework Tests FAILed. |
Windows Net Core Tests FAILed. |
} | ||
|
||
internal class ResponseParameters | ||
internal static void DecodeResponse(IClientMessage clientMessage, ConcurrentQueue<KeyValuePair<IData, object>> result) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the optimization using a concrete implementation instead of the interface? Can you share the performance numbers how this improves performance?
What about elimination of the class ResponseParameters class with the static method?
I see that only 5 codecs are changed, should this be applied to all the other codecs as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a IMap optimization that we can backport in 4.0 series to all codecs
dataSize += ParameterUtil.CalculateDataSize(name); | ||
dataSize += Bits.IntSizeInBytes; | ||
foreach (var keysItem in keys) | ||
for (int i = 0; i < keys.Count; i++) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this change affect performance?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
on .Net foreach is slower than for loop
var newCapacity = QuickMath.NextPowerOfTwo(requiredCapacity); | ||
var newBuffer = new byte[newCapacity]; | ||
Array.Copy(_protocolBuffer.ByteArray(), 0, newBuffer, 0, _capacity); | ||
Buffer.BlockCopy(_protocolBuffer.ByteArray(), 0, newBuffer, 0, _capacity); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this the performance improvement? :
Array.Copy iterates each element while Buffer.BlockCopy performs block of memcpy? So, can we assume that we always need to prefer memcpy to iteration?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Buffer.BlockCopy is faster than Array.Copy. both are native calls and I'm not sure of the reason. I can measure improvement.
} | ||
|
||
public virtual void PutLong(int index, long value) | ||
public void PutLong(int index, long value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is interesting. So, virtual methods and interfaces seem to decrease performance for .Net. I am surprised that the .NET virtual machine does not optimize these usages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes virtual methods and interface calls add extra overhead.
public virtual void GetBytes(int index, byte[] dst, int offset, int length) | ||
public void GetBytes(int index, byte[] dst, int offset, int length) | ||
{ | ||
BoundsCheck(index, length); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to be also unneeded even if previously when Array.Copy
was used, since Array.Copy
method also does bounds check and throw ArgumentOutOfRangeException
when length was not sufficient, right? I wonder how much performance affect this had.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed it as it's unnecessary. This is bound check is on all of the hot path so every peny count on this kink of hot paths.
for (var i = 0; i < resultingKeyValuePairs.Count;) | ||
}); | ||
base.GetAllInternal(partitionToKeyData, resultingKeyValuePairs); | ||
Parallel.ForEach(resultingKeyValuePairs, kvp => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
interesting optimization. does this speed up when number of entries is over a certain number or is it better to use even if the number of keys is small like say 5 (since I assume there will be a cost of splitting the data to different threads and merging) ?
ExceptionUtil.Rethrow(e); | ||
} | ||
return result; | ||
return new ReadOnlyLazyDictionary<TKey, TValue>(resultingKeyValuePairs, GetContext().GetSerializationService()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have a performance number for how much this lazy usage improve the performance in case where the object is always used and deserialized finally event when lazy?
var partitions = new ArrayList(partitionCount); | ||
for (var i = 0; i < partitionCount; i++) | ||
{ | ||
partitions.Add(ArrayList.Synchronized(new ArrayList())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we really need a Synchronized list here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes the next parallel loop will use it but when I benchmark it, I saw itS even faster than concurrentQueue as most of the time is used on the serialization and threads rarely make sync calls.
} | ||
else | ||
{ | ||
if (itemData.Equals(serializationService.ToData(dValue))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would you benefit if you cache this deserialized object? A second call to Contains will be much faster using the cache, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A second call to Contains is quite a rare case for this list. 99% will just loop over it. But if I cache it, that will be a memory overhead for every case.
Moreover contains method is just implemented just not to leave it empty.
f91f491
to
a8b605c
Compare
Linux Net Core Tests FAILed. |
Windows .Net Framework Tests FAILed. |
Windows Net Core Tests FAILed. |
a8b605c
to
2f45345
Compare
Linux Net Core Tests FAILed. |
Windows .Net Framework Tests FAILed. |
Windows Net Core Tests FAILed. |
2f45345
to
1cdb044
Compare
Linux Net Core Tests FAILed. |
Windows .Net Framework Tests FAILed. |
Windows Net Core Tests FAILed. |
1cdb044
to
0d4ae8c
Compare
Linux Net Core Tests FAILed. |
Windows .Net Framework Tests FAILed. |
Windows Net Core Tests FAILed. |
0f08169
to
b719af2
Compare
Windows .Net Framework Tests FAILed. |
Windows Net Core Tests FAILed. |
Linux Net Core Tests FAILed. |
1 similar comment
Linux Net Core Tests FAILed. |
Windows Net Core Tests FAILed. |
Windows .Net Framework Tests FAILed. |
verify |
Linux Net Core Tests FAILed. |
Windows Net Core Tests PASSed. |
Windows .Net Framework Tests FAILed. |
run-net |
Windows .Net Framework Tests FAILed. |
b719af2
to
052dc34
Compare
Linux Net Core Tests PASSed. |
Windows .Net Framework Tests PASSed. |
Windows Net Core Tests PASSed. |
performance improvement on serialization bits, codecs and lazyDictioary introduced