Faster parsing for multiple rows #1330

Merged
merged 1 commit into from
Jun 16, 2023
@rusher rusher commented Jun 15, 2023

When a result set contains multiple rows, the column details are checked again every time a field value is parsed.

This PR selects the parsing method once per result set.
The performance gain is small, but not negligible.

Performance results:

```
[Benchmark]
public async Task getInt64()
{
	using var cmd = Connection.CreateCommand();
	cmd.CommandText = "SELECT * FROM seq_1_to_100000";
	using var reader = await cmd.ExecuteReaderAsync();
	long total = 0;
	do
	{
		while (await reader.ReadAsync())
		{
			total += reader.GetInt64(0);
		}
	} while (await reader.NextResultAsync());
}
```

Initial results:

|   Method |        Library |     Mean |    Error |
|--------- |--------------- |---------:|---------:|
| getInt64 | MySqlConnector | 18.01 ms | 0.171 ms |

PR results:

|   Method |        Library |     Mean |    Error |
|--------- |--------------- |---------:|---------:|
| getInt64 | MySqlConnector | 17.67 ms | 0.171 ms |

@@ -113,6 +317,13 @@ private ColumnDefinitionPayload(ResizableArraySegment<byte> originalData, Charac
Decimals = decimals;
}

public abstract object GetValueCore(ReadOnlySpan<byte> data);
Member

public APIs should not have a Core method name suffix; that's the suffix for a protected method (that will be overridden by a derived type) to provide the "core" of the implementation.

Member

@bgrainger bgrainger left a comment

Thanks! I like the approach and it's not one I had considered. Looks like it replaces one virtual method call (to Row.GetValueCore) with one to ColumnDefinitionPayload.GetValueCore, so I don't think it will be adding any additional virtual dispatch overhead, although the former was probably much more likely to be optimised because there were only two possible derived types.

I don't like mixing the "business logic" of reading a column's value with the ColumnDefinitionPayload class; the Payload objects are simple "bundles of data" that represent exactly what gets received/sent on the wire. Instead, these should be a new hierarchy of ColumnReader classes (possibly in a new MySqlConnector.ColumnReaders namespace/folder) with ReadValue, ReadInt32, etc. methods. A factory method would create the correct derived type based on the data in the ColumnDefinitionPayload.
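
To illustrate the shape being suggested here, the following is a minimal sketch and not the code that was merged. `SketchColumnType`, `SketchColumnDefinition`, `SketchColumnReader`, `TextInt32Reader`, and `Utf8StringReader` are all hypothetical names; the real `ColumnDefinitionPayload` carries far more metadata than the stand-in record.

```csharp
using System;
using System.Buffers.Text;
using System.Text;

// Hypothetical stand-ins for the real payload types; only what the sketch needs.
internal enum SketchColumnType { Int32, VarString }
internal sealed record SketchColumnDefinition(SketchColumnType ColumnType);

internal abstract class SketchColumnReader
{
	public abstract object ReadValue(ReadOnlySpan<byte> data);

	// Factory: the reader is chosen once per column, so rows need not re-inspect the metadata.
	public static SketchColumnReader Create(SketchColumnDefinition column) =>
		column.ColumnType switch
		{
			SketchColumnType.Int32 => TextInt32Reader.Instance,
			_ => Utf8StringReader.Instance,
		};
}

internal sealed class TextInt32Reader : SketchColumnReader
{
	public static TextInt32Reader Instance { get; } = new();

	// Parses a text-protocol integer; the whole span must be consumed.
	public override object ReadValue(ReadOnlySpan<byte> data) =>
		Utf8Parser.TryParse(data, out int value, out var consumed) && consumed == data.Length
			? value
			: throw new FormatException();
}

internal sealed class Utf8StringReader : SketchColumnReader
{
	public static Utf8StringReader Instance { get; } = new();

	public override object ReadValue(ReadOnlySpan<byte> data) => Encoding.UTF8.GetString(data);
}
```

A row type would call `SketchColumnReader.Create` once for each column when the result set's metadata arrives, then reuse the returned readers for every row.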

Contributor Author

rusher commented Jun 15, 2023

Right, I'll change the PR accordingly.

Contributor Author

rusher commented Jun 16, 2023

Force-pushed a correction.
One thing puzzles me: I left it as it was before, but why can't columns of type FLOAT and DOUBLE be read with reader.GetInt32() when that is allowed for DECIMAL?

When a result set contains multiple rows, the column details are checked again every time a field value is parsed.

This PR selects the parsing method once per result set.
The performance gain is small, but not negligible.

Performance results:

```
[Benchmark]
public async Task getInt64()
{
	using var cmd = Connection.CreateCommand();
	cmd.CommandText = "SELECT * FROM seq_1_to_100000";
	using var reader = await cmd.ExecuteReaderAsync();
	long total = 0;
	do
	{
		while (await reader.ReadAsync())
		{
			total += reader.GetInt64(0);
		}
	} while (await reader.NextResultAsync());
}
```

Initial results:

```
|   Method |        Library |     Mean |    Error |
|--------- |--------------- |---------:|---------:|
| getInt64 | MySqlConnector | 17.91 ms | 0.149 ms |
```

PR results:

```
|   Method |        Library |     Mean |    Error |
|--------- |--------------- |---------:|---------:|
| getInt64 | MySqlConnector | 17.59 ms | 0.125 ms |
```
Contributor Author

rusher commented Jun 16, 2023

By the way, good job compared to MySql.Data 8.0.33:

|   Method |        Library |      Mean |    Error |
|--------- |--------------- |----------:|---------:|
| getInt64 |     MySql.Data | 163.46 ms | 0.625 ms |
| getInt64 | MySqlConnector |  17.60 ms | 0.256 ms |

@bgrainger
Member

> why the columns of type FLOAT and DOUBLE are not allowed to retrieve a value for reader.GetInt32() since it's allowed for DECIMAL

MySqlConnector doesn't implement lossy type conversions in GetInt32. (As a workaround, one can call Convert.ToInt32(GetValue()) instead and potentially lose data accuracy.)

The exception is for DECIMAL, because SUM and AVG will return a DECIMAL value even for integral data: #54 (comment). This avoids an unexpected failure when using GetInt32 to read the SUM of an INT column.
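
As a small illustration of the workaround mentioned above, the sketch below uses boxed `double` values as stand-ins for what `GetValue()` would return for a DOUBLE column (an assumption made so the example runs without a database). It shows why the conversion is considered lossy: `Convert.ToInt32` rounds to the nearest integer, and midpoints go to the even neighbour.

```csharp
using System;

// Stand-ins for the boxed values a reader's GetValue(0) would return for a DOUBLE column.
object midpointValue = 2.5;
object fractionalValue = 3.7;

// Convert.ToInt32 rounds to nearest, with ties going to the even integer.
Console.WriteLine(Convert.ToInt32(midpointValue));   // 2
Console.WriteLine(Convert.ToInt32(fractionalValue)); // 4
```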

Comment on lines +150 to +155
}

if (!Session.SupportsDeprecateEof)
{
payload = await Session.ReceiveReplyAsync(ioBehavior, CancellationToken.None).ConfigureAwait(false);
EofPayload.Create(payload.Span);
}
if (!Session.SupportsDeprecateEof)
{
payload = await Session.ReceiveReplyAsync(ioBehavior, CancellationToken.None).ConfigureAwait(false);
EofPayload.Create(payload.Span);
Member

Is this change correct/intentional?

It seems like it could require an additional EOF packet for the case when metadata from a prepared statement is reused. If so, it feels like integration tests shouldn't pass both before and after this change. Perhaps MARIADB_CLIENT_CACHE_METADATA always implies CLIENT_DEPRECATE_EOF so that this condition is always false and the if block is never entered?

Member

Just reviewed https://mariadb.com/kb/en/result-set-packets/ and confirmed that this change is correct.

Contributor Author

Yes, this might have been done in a separate PR. It corrects a case that should normally never occur (no version supports skipping metadata without supporting EOF), but it's better to fix it just in case.

@bgrainger bgrainger merged commit a092906 into mysql-net:master Jun 16, 2023

internal sealed class BinaryBooleanColumnReader : IColumnReader
{
internal static BinaryBooleanColumnReader Instance { get; } = new BinaryBooleanColumnReader();
Member

Singleton implementations of these classes are a good idea to reduce allocations 👍

Member

These Instance properties (or constructors for some types) are still logically part of the public API and should be public (even on an internal type).

Member

We can also use target-typed new here.

Will fix in 4d05604.

Comment on lines +14 to +16
private bool allowZeroDateTime;
private bool convertZeroDateTime;
private DateTimeKind dateTimeKind;
Member

Please follow the style of existing classes:

  • fields are placed at the end of the class definition
  • private fields are prefixed with m_

Additionally, these fields should be readonly.

Will fix in 10a3c08.

@@ -567,4 +518,5 @@ private static void CheckBufferArguments<T>(long dataOffset, T[] buffer, int buf
private readonly int[] m_dataOffsets;
private readonly int[] m_dataLengths;
private ReadOnlyMemory<byte> m_data;
private IColumnReader[] columnReaders;
Member

private readonly IColumnReader[] m_columnReaders;

Will fix in 1cc81ef.

Comment on lines +55 to +60
case MySqlGuidFormat.Binary16:
return Guid16ColumnReader.Instance;
case MySqlGuidFormat.TimeSwapBinary16:
return TimeSwapBinary16ColumnReader.Instance;
default:
return GuidBytesColumnReader.Instance;
Member

These classes have somewhat random and inconsistent names. It would make more sense for them to be named consistently with the MySqlGuidFormat enum values, e.g., GuidBinary16ColumnReader, GuidTimeSwapBinary16ColumnReader, GuidLittleEndianBinary16ColumnReader.

Will fix in e400f96.

Comment on lines +131 to +134
return Guid36ColumnReader.Instance;
if (connection.GuidFormat == MySqlGuidFormat.Char32
&& columnDefinition.ColumnLength / ProtocolUtility.GetBytesPerCharacter(columnDefinition.CharacterSet) == 32)
return Guid32ColumnReader.Instance;
Member

As above, should be GuidChar36ColumnReader and GuidChar32ColumnReader.

namespace MySqlConnector.ColumnReaders;
using MySqlConnector.Protocol.Payloads;

internal interface IColumnReader
Member

Seems like an abstract base class would be more useful here; it could contain the commonly repeated throw new InvalidCastException implementation for GetInt32.

Additionally, if a (say) GetInt64 method were added in the future, a common implementation could be added once in the ABC, instead of having to be repeated in each concrete implementation.

Will fix in e130a7d.
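
A rough sketch of the abstract-base-class idea, under the assumption of simplified signatures (the real `ReadInt32` also receives the `ColumnDefinitionPayload`). `SketchReaderBase`, `SketchBytesReader`, and `SketchTinyIntReader` are hypothetical names, and the exception text is illustrative only.

```csharp
using System;

internal abstract class SketchReaderBase
{
	public abstract object ReadValue(ReadOnlySpan<byte> data);

	// Default for column types with no integer conversion; readers that can convert override it.
	public virtual int ReadInt32(ReadOnlySpan<byte> data) =>
		throw new InvalidCastException($"Can't convert data read by {GetType().Name} to Int32");
}

internal sealed class SketchBytesReader : SketchReaderBase
{
	// ReadInt32 is inherited, so the throwing logic is not repeated here.
	public override object ReadValue(ReadOnlySpan<byte> data) => data.ToArray();
}

internal sealed class SketchTinyIntReader : SketchReaderBase
{
	// Binary-protocol signed TINYINT: a single byte reinterpreted as sbyte.
	public override object ReadValue(ReadOnlySpan<byte> data) => (sbyte) data[0];

	public override int ReadInt32(ReadOnlySpan<byte> data) => (sbyte) data[0];
}
```

If a `ReadInt64` were added later, it would likewise need only one default implementation in the base class.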

Comment on lines +37 to +59
if ((columnDefinition.ColumnFlags & ColumnFlags.Binary) == 0)
{
// when the Binary flag IS NOT set, the BIT column is transmitted as MSB byte array
ulong bitValue = 0;
for (int i = 0; i < data.Length; i++)
bitValue = bitValue * 256 + data[i];
return checked((int) bitValue);
}
else if (columnDefinition.ColumnLength <= 5 && data.Length == 1 && data[0] < (byte) (1 << (int) columnDefinition.ColumnLength))
{
// a server bug may return the data as binary even when we expect text: https://github.com/mysql-net/MySqlConnector/issues/713
// in this case, the data can't possibly be an ASCII digit, so assume it's the binary serialisation of BIT(n) where n <= 5
return checked((int) data[0]);
}
else
{
// when the Binary flag IS set, the BIT column is transmitted as text
if (!Utf8Parser.TryParse(data, out long value, out var bytesConsumed) || bytesConsumed != data.Length)
{
throw new FormatException();
}
return checked((int) value);
}
Member

There's some significant code duplication happening here. This should be refactored into a helper method.

Will fix in 34bf3aa.
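
One possible shape for such a helper is sketched below; it is not the refactoring from the referenced commit. `BitColumnSketch`, `ReadBit`, and `ReadBitAsInt32` are hypothetical names, and the column's Binary flag and length are passed as plain parameters instead of a `ColumnDefinitionPayload` so the example is self-contained.

```csharp
using System;
using System.Buffers.Text;

internal static class BitColumnSketch
{
	// Keeps the three-way branch in one place so each typed accessor only narrows the result.
	public static ulong ReadBit(ReadOnlySpan<byte> data, bool binaryFlag, uint columnLength)
	{
		if (!binaryFlag)
		{
			// Binary flag not set: the value arrives as a most-significant-byte-first array.
			ulong bitValue = 0;
			foreach (var b in data)
				bitValue = bitValue * 256 + b;
			return bitValue;
		}

		// Workaround branch from the quoted code: one byte too small to be an ASCII digit.
		if (columnLength <= 5 && data.Length == 1 && data[0] < (byte) (1 << (int) columnLength))
			return data[0];

		// Otherwise the value is transmitted as text.
		if (!Utf8Parser.TryParse(data, out long value, out var bytesConsumed) || bytesConsumed != data.Length)
			throw new FormatException();
		return checked((ulong) value);
	}

	public static int ReadBitAsInt32(ReadOnlySpan<byte> data, bool binaryFlag, uint columnLength) =>
		checked((int) ReadBit(data, binaryFlag, columnLength));
}
```

For example, the two bytes `0x01 0x02` with the Binary flag unset decode to 1 × 256 + 2 = 258.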

Comment on lines +18 to +22
if (!Utf8Parser.TryParse(data, out decimal decimalValue, out int bytesConsumed) || bytesConsumed != data.Length)
{
throw new FormatException();
}
return (int) decimalValue;
Member

This code duplication could be refactored into a helper method, too.
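
As a sketch of what that helper could look like (`DecimalColumnSketch` and its members are hypothetical names, not identifiers from the PR), the parse-and-validate step is written once and each accessor only performs its own cast:

```csharp
using System;
using System.Buffers.Text;

internal static class DecimalColumnSketch
{
	// Single copy of the parse-and-validate step; the whole span must be consumed.
	public static decimal ParseDecimal(ReadOnlySpan<byte> data) =>
		Utf8Parser.TryParse(data, out decimal value, out int bytesConsumed) && bytesConsumed == data.Length
			? value
			: throw new FormatException();

	// The explicit cast truncates toward zero, matching the (int) cast in the quoted code.
	public static int ReadInt32(ReadOnlySpan<byte> data) => (int) ParseDecimal(data);
}
```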

bgrainger added a commit to bgrainger/MySqlConnector that referenced this pull request Jun 17, 2023
This restores the content of the exception message to what it was before mysql-net#1330.

Signed-off-by: Bradley Grainger <bgrainger@gmail.com>

public int ReadInt32(ReadOnlySpan<byte> data, ColumnDefinitionPayload columnDefinition)
{
throw new InvalidCastException($"Can't convert {columnDefinition.ColumnType} to Int32");
Member

This uses the internal ColumnType enum. The previous code used a MySqlDbType value:

throw new InvalidCastException($"Can't convert {ResultSet.GetColumnType(ordinal)} to Int32");

Will fix in 88a54a6.

var isUnsigned = (columnDefinition.ColumnFlags & ColumnFlags.Unsigned) != 0;
if (binary)
{
switch (columnDefinition.ColumnType)
Member

There's quite a bit of duplicated logic between the binary and !binary switch statements that could be consolidated.

Will fix in 808f91f.

Comment on lines +378 to +379
columnReaders = Array.ConvertAll(ResultSet.ColumnDefinitions,
new Converter<ColumnDefinitionPayload, IColumnReader>(column => ColumnReaderFactory.GetReader(binary, column, resultSet.Connection)));
Member

Using Array.ConvertAll and a delegate allocation seems needlessly inefficient compared to a foreach loop.

Will fix in 3e48b93.
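
For comparison, a loop-based version might look like the sketch below. `SketchReaderFactory` and `ReaderArraySketch` are hypothetical stand-ins (the "reader" is just a string and the column is an `int`) so the example runs on its own. Because the lambda in the quoted code captures `binary` and `resultSet`, it needs a closure object and a delegate each time it runs; the loop needs neither.

```csharp
using System;

internal static class SketchReaderFactory
{
	// Hypothetical placeholder for ColumnReaderFactory.GetReader.
	public static string GetReader(bool binary, int columnType) =>
		(binary ? "Binary" : "Text") + "Reader#" + columnType;
}

internal static class ReaderArraySketch
{
	public static string[] CreateReaders(bool binary, int[] columnDefinitions)
	{
		// One array allocation; no delegate or closure object is created.
		var readers = new string[columnDefinitions.Length];
		for (var i = 0; i < columnDefinitions.Length; i++)
			readers[i] = SketchReaderFactory.GetReader(binary, columnDefinitions[i]);
		return readers;
	}
}
```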

@@ -11,7 +11,7 @@ namespace MySqlConnector.Core;
internal sealed class BinaryRow : Row
Member

There's so little code left in these classes they could probably be eliminated.

Will fix in acd9b94.
