expression: handle empty input and improve compatibility for `format` #8797

Merged
eurekaka merged 9 commits into pingcap:master from eurekaka:format_empty_arg on Jan 2, 2019

4 participants

eurekaka commented Dec 25, 2018

What problem does this PR solve?

Before this PR, in TiDB:

mysql> select format('', 1);
ERROR 2013 (HY000): Lost connection to MySQL server during query

mysql> select format(1, '');
ERROR 2013 (HY000): Lost connection to MySQL server during query

mysql> select length(format(1, 10240));
+--------------------------+
| length(format(1, 10240)) |
+--------------------------+
|                    10242 |
+--------------------------+
1 row in set (0.00 sec)

while in MySQL 5.7.10:

mysql> select format('', 1);
+---------------+
| format('', 1) |
+---------------+
| 0.0           |
+---------------+
1 row in set (0.00 sec)

mysql> select format(1, '');
+---------------+
| format(1, '') |
+---------------+
| 1             |
+---------------+
1 row in set, 1 warning (0.01 sec)

mysql> select format(1, 10240);
+----------------------------------+
| format(1, 10240)                 |
+----------------------------------+
| 1.000000000000000000000000000000 |
+----------------------------------+
1 row in set (0.00 sec)

mysql> select length(format(1, 10240));
+--------------------------+
| length(format(1, 10240)) |
+--------------------------+
|                       32 |
+--------------------------+
1 row in set (0.00 sec)

What is changed and how it works?

const int FORMAT_MAX_DECIMALS= 30;

String *Item_func_format::val_str_ascii(String *str)
{
  /* Number of decimal digits */
  int dec;

  dec= (int) args[1]->val_int();
  if (args[1]->null_value)
  {
    null_value=1;
    return NULL;
  }

  dec= set_zone(dec, 0, FORMAT_MAX_DECIMALS);
  ...
}
  • Make the result of select format(1, 4, null) compatible with MySQL.
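
To illustrate the clamping in the MySQL excerpt above, here is a minimal Go sketch. The names clampFormatDecimals and formatMaxDecimals are hypothetical and chosen for illustration only; they are not the identifiers used in TiDB or in this PR.

```go
package main

import "fmt"

// formatMaxDecimals mirrors MySQL's FORMAT_MAX_DECIMALS (30).
// The Go name is an assumption made for this sketch.
const formatMaxDecimals = 30

// clampFormatDecimals is a hypothetical helper that restricts the requested
// number of decimal places to the range [0, formatMaxDecimals].
func clampFormatDecimals(d int64) int64 {
	if d < 0 {
		return 0
	}
	if d > formatMaxDecimals {
		return formatMaxDecimals
	}
	return d
}

func main() {
	for _, d := range []int64{-3, 0, 4, 10240} {
		fmt.Printf("requested %d -> used %d\n", d, clampFormatDecimals(d))
	}
}
```

With the decimals capped at 30, format(1, 10240) produces "1." followed by 30 zeros, which is the 32-character length MySQL reports above.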

Check List

Tests

  • Unit test: new unit tests added, and existing unit tests are updated to be compatible with MySQL.

Code changes

N/A

Side effects

N/A

Related changes

  • Need to cherry-pick to the release branch
  • Some compatibility issues still exist for the format function:
    • #8796
    • The warning type returned differs from MySQL's, but this is a separate, general issue that affects a number of built-in functions.

eurekaka added some commits Dec 19, 2018

- change arg type
- change format with locale
- modify tests
- handle null in format with locale
Member

winoros left a comment

rest lgtm

expression/builtin_string.go (outdated)
  // evalString evals FORMAT(X,D).
  // See https://dev.mysql.com/doc/refman/5.7/en/string-functions.html#function_format
  func (b *builtinFormatSig) evalString(row chunk.Row) (string, bool, error) {
- 	x, isNull, err := b.args[0].EvalString(b.ctx, row)
+ 	x, isNull, err := b.args[0].EvalReal(b.ctx, row)

zz-jason Dec 25, 2018

Member

The argument types are all ETString: https://github.com/pingcap/tidb/pull/8797/files#diff-314e997a9df9b116e8f0aad4149df468R2920. We should not change EvalString() to EvalReal().

eurekaka Dec 26, 2018

Contributor

This is the output of MySQL:

mysql> select format("1a","2a");
+-------------------+
| format("1a","2a") |
+-------------------+
| 1.00              |
+-------------------+
1 row in set, 2 warnings (0.00 sec)

mysql> show warnings;
+---------+------+-----------------------------------------+
| Level   | Code | Message                                 |
+---------+------+-----------------------------------------+
| Warning | 1292 | Truncated incorrect INTEGER value: '2a' |
| Warning | 1292 | Truncated incorrect DOUBLE value: '1a'  |
+---------+------+-----------------------------------------+
2 rows in set (0.00 sec)

It seems MySQL treats the parameters as DOUBLE and INTEGER, but this is not reflected in the documentation at https://dev.mysql.com/doc/refman/5.7/en/string-functions.html#function_format.
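
The warnings above suggest lenient parsing: the numeric prefix of '2a' is used and the remainder is dropped with a warning. A rough Go sketch of that idea for the integer argument follows. The helper intPrefix is hypothetical, it is not the conversion code of TiDB or MySQL, and it does not model when MySQL raises a warning for empty input.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// intPrefix is a hypothetical helper: it parses the longest leading integer
// in s and reports whether trailing text had to be ignored (the situation in
// which a "Truncated incorrect INTEGER value" warning would apply).
// Input without any leading digits yields 0.
func intPrefix(s string) (int64, bool) {
	s = strings.TrimSpace(s)
	end := 0
	if end < len(s) && (s[end] == '+' || s[end] == '-') {
		end++
	}
	digitsStart := end
	for end < len(s) && s[end] >= '0' && s[end] <= '9' {
		end++
	}
	if end == digitsStart {
		// No digits found: the value is 0, and any non-empty text was ignored.
		return 0, len(s) > 0
	}
	n, err := strconv.ParseInt(s[:end], 10, 64)
	if err != nil {
		// Only a range error is possible here; ParseInt then returns the
		// nearest representable value, which this sketch keeps.
		return n, true
	}
	return n, end < len(s)
}

func main() {
	for _, in := range []string{"2a", "10", "", "abc"} {
		n, truncated := intPrefix(in)
		fmt.Printf("%q -> %d (truncated: %v)\n", in, n, truncated)
	}
}
```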

eurekaka Dec 26, 2018

Contributor

In the MySQL source code, for format(X, D), X is evaluated as a decimal or a double, while D is evaluated as an integer.

  dec= (int) args[1]->val_int();
  if (args[0]->result_type() == DECIMAL_RESULT ||
      args[0]->result_type() == INT_RESULT)
  {
    res= args[0]->val_decimal(&dec_val);
  }
  else
  {
    double nr= args[0]->val_real();
  }


eurekaka commented Dec 26, 2018

@zz-jason @winoros comments addressed, PTAL

Contributor

eurekaka commented Dec 29, 2018

Member

lamxTyler left a comment

LGTM

  	}

- 	d, isNull, err := b.args[1].EvalString(b.ctx, row)
+ 	x, d, isNull, err := evalNumDecArgsForFormat(b, row)

zz-jason Dec 29, 2018

Member

Actually, the goal of using different sigs for different input types is to remove type-checking branches from the execution phase, which gives the same kind of improvement as the JIT method. For example:

if arg0.GetType().EvalType() == types.ETDecimal {
	...
} else {
	...
}

In the execution phase the type is constant, so the branch above can be eliminated. This avoids CPU branch prediction and makes full use of the CPU pipeline.

eurekaka Dec 29, 2018

Contributor

We cannot eliminate the first branch at line 2922, since we need the exact type to construct the builtinFunc, right? And it seems we cannot eliminate the second branch at line 2950 either?

zz-jason Dec 29, 2018

Member

Yes, the first branch at line 2922 cannot be eliminated, but the second branch at line 2950 can. By that point, arg0.GetType().EvalType() is already determined if we create a separate signature of FORMAT() for each input argument type, like:

builtinFormatDecimalWithLocaleSig -- FORMAT(decimal, int, string)
builtinFormatRealWithLocaleSig    -- FORMAT(float64, int, string)
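
A simplified Go sketch of this idea: the type check runs once when the signature is constructed, so the per-row eval method contains no type branch. All names here (evalType, formatSig, formatRealSig, formatDecimalSig, newFormatSig) are hypothetical stand-ins rather than TiDB's real types, the first argument is passed as a string for brevity, and thousands separators and locale handling are left out.

```go
package main

import (
	"fmt"
	"math/big"
	"strconv"
)

// evalType is a simplified, hypothetical stand-in for an expression's
// evaluation type.
type evalType int

const (
	etReal evalType = iota
	etDecimal
)

// formatSig is a hypothetical per-type signature: each implementation knows
// its input type, so eval needs no type branch.
type formatSig interface {
	eval(x string, d int) string
}

type formatRealSig struct{}
type formatDecimalSig struct{}

// eval treats x as a float64 and renders it with d decimal places.
func (formatRealSig) eval(x string, d int) string {
	f, _ := strconv.ParseFloat(x, 64) // unparsable input falls back to 0
	return strconv.FormatFloat(f, 'f', d, 64)
}

// eval treats x as an exact decimal and renders it with d decimal places.
func (formatDecimalSig) eval(x string, d int) string {
	r, ok := new(big.Rat).SetString(x)
	if !ok {
		r = new(big.Rat) // unparsable input falls back to 0
	}
	return r.FloatString(d)
}

// newFormatSig performs the type check once, at construction time.
func newFormatSig(t evalType) formatSig {
	if t == etDecimal {
		return formatDecimalSig{}
	}
	return formatRealSig{}
}

func main() {
	sig := newFormatSig(etDecimal)
	for _, row := range []string{"1", "1234.5", "0.125"} {
		fmt.Println(sig.eval(row, 2)) // no type branch per row
	}
}
```

The cost of this design is one type per combination of argument types, which is the trade-off weighed in the rest of this thread.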

eurekaka Jan 2, 2019

Contributor

Oh, I get it. But isn't the drawback of separate sigs a lot of redundant, duplicated code? FORMAT takes 2 or 3 arguments, and the first argument can be of decimal or double type, so we would have 4 combinations of sigs, which may look ugly IMHO. Since this function is not called frequently, I would prefer readability over eliminating the branch for performance here.

zz-jason Jan 2, 2019

Member

it may look ugly

Can't agree more! In the near future, expression evaluation will be refactored into a vectorized version. And since, as you said, "this function would not be called frequently", it's fine to keep the current implementation for readability.

Member

zz-jason left a comment

LGTM

@zz-jason zz-jason added status/LGT2 and removed status/LGT1 labels Jan 2, 2019

Member

zz-jason commented Jan 2, 2019

/run-all-tests

Contributor

eurekaka commented Jan 2, 2019

/run-common-test

@eurekaka eurekaka merged commit 477e252 into pingcap:master Jan 2, 2019

3 of 4 checks passed

  • continuous-integration/travis-ci/pr: The Travis CI build is in progress
  • ci/circleci: Your tests passed on CircleCI!
  • idc-jenkins-ci-tidb/build: Jenkins job succeeded.
  • license/cla: Contributor License Agreement is signed.

@eurekaka eurekaka deleted the eurekaka:format_empty_arg branch Jan 2, 2019
