Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CONJ-805] >1Byte UTF8 character issue if maxFieldSize is set #156

Merged
merged 2 commits into from Sep 24, 2020

Conversation

MarkusLutum
Copy link

@MarkusLutum MarkusLutum commented Jul 1, 2020

Detected at one of my clients.
They use polish special characters like ść
The length in TextRowProtocol is set to 2 for each of those characters so 4 in our example.
With this the current code
return new String(buf, pos, Math.min(maxFieldSize * 3, length), StandardCharsets.UTF_8) .substring(0, Math.min(maxFieldSize, length));
results into an ArrayOutOfBoundsError at the substring call.

I added a test case and a quick fix for it.
Travis fails on environment issues. Nothing to do with my change.

@MarkusLutum
Copy link
Author

Files a bug for this as well:
https://jira.mariadb.org/browse/CONJ-805

@MarkusLutum MarkusLutum changed the title >1Byte UTF8 character issue if maxFieldSize is set [CONJ-805] >1Byte UTF8 character issue if maxFieldSize is set Jul 9, 2020
@harawata
Copy link
Contributor

harawata commented Jul 23, 2020

Both the current implementation and the proposed fix count (or try to count) the length of the string, however, the doc says...

Sets the limit for the maximum number of bytes that can be returned for character and binary column values in a ResultSet object produced by this Statement object.

So, it should look something like this, shouldn't it?

if (maxFieldSize > 0) {
  return new String(buf, pos, Math.min(maxFieldSize, length), StandardCharsets.UTF_8);
}

@rusher
Copy link
Collaborator

rusher commented Jul 24, 2020

good point @harawata

In JDBC from spec :

8.3.1 Silent Truncation
The Statement.setMaxFieldSize method allows a maximum size (in bytes) to
be set. This limit applies only to the BINARY, VARBINARY, LONGVARBINARY, CHAR,
VARCHAR, LONGVARCHAR, NCHAR, NVARCHAR, and LONGNVARCHAR data types.
If a limit has been set using setMaxFieldSize and there is an attempt to read data
that exceeds the limit, any truncation that occurs as a result of exceeding the set limit
will not be reported.

CHAR and VARCHAR are explicitly listed.

This is then not just a bug correction, but a change of behavior compares to current driver implementation, so has then to be in the next 2.7 version, not in a corrective patch version.

@MarkusLutum
Copy link
Author

MarkusLutum commented Jul 25, 2020

@rusher: I change my pull request to @harawata s suggestion.

rusher added a commit that referenced this pull request Sep 10, 2020
Correction of possible ArrayOutOfBoundsError when using maxFieldSize
Correct behaviour of maxFieldSize, limiting not length of the string, but byte length.
@rusher rusher merged commit 88189ff into mariadb-corporation:master Sep 24, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants