Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

expression: fix builtin 'CharLength' for binary string input #7410

Merged
merged 6 commits into from Aug 20, 2018

Conversation

@spongedu
Copy link
Contributor

commented Aug 16, 2018

What problem does this PR solve?

Currently in TiDB, builtin CharLength's behavior is not consistent with MySQL when deal with binary strings.

In MySQL:

mysql> select char_length(binary("数 据 库"));
+------------------------------------+
| char_length(binary("数 据 库"))    |
+------------------------------------+
|                                 11 |
+------------------------------------+
1 row in set (0.00 sec)

While in TiDB:

tidb> select char_length(binary("数 据 库"));
+------------------------------------+
| char_length(binary("数 据 库"))    |
+------------------------------------+
|                                  5 |
+------------------------------------+
1 row in set (0.00 sec)

What is changed and how it works?

take special care for binary strings when create builtinSig

Check List

Tests

  • Unit test
  • Integration test

Code changes

Side effects

Related changes

Copy link
Member

left a comment

LGTM

@spongedu

This comment has been minimized.

Copy link
Contributor Author

commented Aug 16, 2018

/run-all-tests

if isNull || err != nil {
return 0, isNull, errors.Trace(err)
}
return int64(len([]byte(val))), false, nil

This comment has been minimized.

Copy link
@coocood

coocood Aug 17, 2018

Member

Just len(val) will do.

This comment has been minimized.

Copy link
@spongedu

spongedu Aug 17, 2018

Author Contributor

ok

@shenli

This comment has been minimized.

Copy link
Member

commented Aug 17, 2018

/run-all-tests

@coocood

This comment has been minimized.

Copy link
Member

commented Aug 17, 2018

@spongedu
In this way, If we support more charset in the future, we may need to define a signature for each charset.

Have you considered adding a field to charLengthSig that indicates if it is binary?

@spongedu

This comment has been minimized.

Copy link
Contributor Author

commented Aug 17, 2018

At first sight I think we need only distinguish binary strings and non-binary strings. I'm not sure about this for now. I'll look into it and make some tests today or so.

Copy link
Member

left a comment

LGTM

Copy link
Member

left a comment

LGTM

@zz-jason

This comment has been minimized.

Copy link
Member

commented Aug 20, 2018

/run-all-tests

@XuHuaiyu XuHuaiyu merged commit 5754f62 into pingcap:master Aug 20, 2018
4 checks passed
4 checks passed
ci/circleci Your tests passed on CircleCI!
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
jenkins-ci-tidb/build Jenkins job succeeded.
Details
license/cla Contributor License Agreement is signed.
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
6 participants
You can’t perform that action at this time.