-
Notifications
You must be signed in to change notification settings - Fork 341
Faster parsing for multiple rows #1330
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -113,6 +317,13 @@ private ColumnDefinitionPayload(ResizableArraySegment<byte> originalData, Charac | |||
Decimals = decimals; | |||
} | |||
|
|||
public abstract object GetValueCore(ReadOnlySpan<byte> data); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public
APIs should not have a Core
method name suffix; that's the suffix for a protected
method (that will be overridden by a derived type) to provide the "core" of the implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I like the approach and it's not one I had considered. Looks like it replaces one virtual method call (to Row.GetValueCore
) with one to ColumnDefinitionPayload.GetValueCore
, so I don't think it will be adding any additional virtual dispatch overhead, although the former was probably much more likely to be optimised because there were only two possible derived types.
I don't like mixing the "business logic" of reading a column's value with the ColumnDefinitionPayload
class; the Payload
objects are simple "bundles of data" that represent exactly what gets received/sent on the wire. Instead, these should be a new hierarchy of ColumnReader
classes (possibly in a new MySqlConnector.ColumnReaders
namespace/folder) with ReadValue
, ReadInt32
, etc. methods. A factory method would create the correct derived type based on the data in the ColumnDefinitionPayload
.
right, i'll change PR accordingly |
force push correction. |
When result-set contains multiple rows, in order to get field value, the columns details are checked for each field parsing. This PR permit to set parsing method once for the result-set. This doesn't change much in terms of performance, but still not negligible. Performance results: ``` [Benchmark] public async Task getInt64() { using var cmd = Connection.CreateCommand(); cmd.CommandText = "SELECT * FROM seq_1_to_100000"; using var reader = await cmd.ExecuteReaderAsync(); long total = 0; do { while (await reader.ReadAsync()) { total += reader.GetInt64(0); } } while (await reader.NextResultAsync()); } ``` Initial results: ``` | Method | Library | Mean | Error | |--------- |--------------- |---------:|---------:| | getInt64 | MySqlConnector | 17.91 ms | 0.149 ms | ``` PR results: ``` | Method | Library | Mean | Error | |--------- |--------------- |---------:|---------:| | getInt64 | MySqlConnector | 17.59 ms | 0.125 ms | ```
btw, good job compare to MySql.Data 8.0.33 :
|
MySqlConnector doesn't implement lossy type conversions in The exception is for |
} | ||
|
||
if (!Session.SupportsDeprecateEof) | ||
{ | ||
payload = await Session.ReceiveReplyAsync(ioBehavior, CancellationToken.None).ConfigureAwait(false); | ||
EofPayload.Create(payload.Span); | ||
} | ||
if (!Session.SupportsDeprecateEof) | ||
{ | ||
payload = await Session.ReceiveReplyAsync(ioBehavior, CancellationToken.None).ConfigureAwait(false); | ||
EofPayload.Create(payload.Span); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this change correct/intentional?
It seems like it could require an additional EOF packet for the case when metadata from a prepared statement is reused. If so, it feels like integration tests shouldn't pass both before and after this change. Perhaps MARIADB_CLIENT_CACHE_METADATA
always implies CLIENT_DEPRECATE_EOF
so that this condition is always false
and the if
block is never entered?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just reviewed https://mariadb.com/kb/en/result-set-packets/ and confirmed that this change is correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, this might have been done through another PR. This is a correction that might normally never occurs (there is no version that support skipping metadata without supporting EOF) but better to correct that just in case.
|
||
internal sealed class BinaryBooleanColumnReader : IColumnReader | ||
{ | ||
internal static BinaryBooleanColumnReader Instance { get; } = new BinaryBooleanColumnReader(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Singleton implementations of these classes is a good idea to reduce allocations 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These Instance
properties (or constructors for some types) are still logically part of the public
API and should be public
(even on an internal
type).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can also use target-typed new
here.
Will fix in 4d05604.
private bool allowZeroDateTime; | ||
private bool convertZeroDateTime; | ||
private DateTimeKind dateTimeKind; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please follow the style of existing classes:
- fields are placed at the end of the class definition
- private fields are prefixed with
m_
Additionally, these fields should be readonly
.
Will fix in 10a3c08.
@@ -567,4 +518,5 @@ private static void CheckBufferArguments<T>(long dataOffset, T[] buffer, int buf | |||
private readonly int[] m_dataOffsets; | |||
private readonly int[] m_dataLengths; | |||
private ReadOnlyMemory<byte> m_data; | |||
private IColumnReader[] columnReaders; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
private readonly IColumnReader[] m_columnReaders;
Will fix in 1cc81ef.
case MySqlGuidFormat.Binary16: | ||
return Guid16ColumnReader.Instance; | ||
case MySqlGuidFormat.TimeSwapBinary16: | ||
return TimeSwapBinary16ColumnReader.Instance; | ||
default: | ||
return GuidBytesColumnReader.Instance; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These classes have somewhat random and inconsistent names. It would make more sense for them to be named consistently with the MySqlGuidFormat
enum values, e.g., GuidBinary16ColumnReader
, GuidTimeSwapBinary16ColumnReader
, GuidLittleEndianBinary16ColumnReader
.
Will fix in e400f96.
return Guid36ColumnReader.Instance; | ||
if (connection.GuidFormat == MySqlGuidFormat.Char32 | ||
&& columnDefinition.ColumnLength / ProtocolUtility.GetBytesPerCharacter(columnDefinition.CharacterSet) == 32) | ||
return Guid32ColumnReader.Instance; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As above, should be GuidChar36ColumnReader
and GuidChar32ColumnReader
.
namespace MySqlConnector.ColumnReaders; | ||
using MySqlConnector.Protocol.Payloads; | ||
|
||
internal interface IColumnReader |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like an abstract
base class would be more useful here; it could contain the commonly repeated throw new InvalidCastException
implementation for GetInt32
.
Additionally, if a (say) GetInt64
method were added in the future, a common implementation could be added once in the ABC, instead of having to be repeated in each concrete implementation.
Will fix in e130a7d.
if ((columnDefinition.ColumnFlags & ColumnFlags.Binary) == 0) | ||
{ | ||
// when the Binary flag IS NOT set, the BIT column is transmitted as MSB byte array | ||
ulong bitValue = 0; | ||
for (int i = 0; i < data.Length; i++) | ||
bitValue = bitValue * 256 + data[i]; | ||
return checked((int) bitValue); | ||
} | ||
else if (columnDefinition.ColumnLength <= 5 && data.Length == 1 && data[0] < (byte) (1 << (int) columnDefinition.ColumnLength)) | ||
{ | ||
// a server bug may return the data as binary even when we expect text: https://github.com/mysql-net/MySqlConnector/issues/713 | ||
// in this case, the data can't possibly be an ASCII digit, so assume it's the binary serialisation of BIT(n) where n <= 5 | ||
return checked((int) data[0]); | ||
} | ||
else | ||
{ | ||
// when the Binary flag IS set, the BIT column is transmitted as text | ||
if (!Utf8Parser.TryParse(data, out long value, out var bytesConsumed) || bytesConsumed != data.Length) | ||
{ | ||
throw new FormatException(); | ||
} | ||
return checked((int) value); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's some significant code duplication happening here. This should be refactored into a helper method.
Will fix in 34bf3aa.
if (!Utf8Parser.TryParse(data, out decimal decimalValue, out int bytesConsumed) || bytesConsumed != data.Length) | ||
{ | ||
throw new FormatException(); | ||
} | ||
return (int) decimalValue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code duplication could be refactored into a helper method, too.
This restores the content of the exception message to what it was before mysql-net#1330. Signed-off-by: Bradley Grainger <bgrainger@gmail.com>
|
||
public int ReadInt32(ReadOnlySpan<byte> data, ColumnDefinitionPayload columnDefinition) | ||
{ | ||
throw new InvalidCastException($"Can't convert {columnDefinition.ColumnType} to Int32"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This uses the internal ColumnType
enum. The previous code used a MySqlDbType
value:
throw new InvalidCastException($"Can't convert {ResultSet.GetColumnType(ordinal)} to Int32");
Will fix in 88a54a6.
var isUnsigned = (columnDefinition.ColumnFlags & ColumnFlags.Unsigned) != 0; | ||
if (binary) | ||
{ | ||
switch (columnDefinition.ColumnType) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's quite a bit of duplicated logic between the binary
and !binary
switch
statements that could be consolidated.
Will fix in 808f91f.
columnReaders = Array.ConvertAll(ResultSet.ColumnDefinitions, | ||
new Converter<ColumnDefinitionPayload, IColumnReader>(column => ColumnReaderFactory.GetReader(binary, column, resultSet.Connection))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using Array.ConvertAll
and a delegate allocation seems needlessly inefficient compared to a foreach
loop.
Will fix in 3e48b93.
@@ -11,7 +11,7 @@ namespace MySqlConnector.Core; | |||
internal sealed class BinaryRow : Row |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's so little code left in these classes they could probably be eliminated.
Will fix in acd9b94.
When result-set contains multiple rows, in order to get field value, the columns details are checked for each field parsing.
This PR permit to set parsing method once for the result-set.
This doesn't change much in terms of performance, but still not negligible.
Performance results:
Initial results:
PR results: