Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update EBNF for admin statements and add checksum details #17617

Open
wants to merge 10 commits into
base: master
Choose a base branch
from

Conversation

dveeden
Copy link
Contributor

@dveeden dveeden commented May 24, 2024

What is changed, added or deleted? (Required)

  1. Some pages about specific admin statements also had EBNF for other admin statements.

  2. Added some info to the page about ADMIN CHECKSUM TABLE about what checksum can be expected the same.

Closes #15468

  1. Formatted the AdminStmt EBNF to be more readable.

Which TiDB version(s) do your changes apply to? (Required)

Tips for choosing the affected version(s):

By default, CHOOSE MASTER ONLY so your changes will be applied to the next TiDB major or minor releases. If your PR involves a product feature behavior change or a compatibility change, CHOOSE THE AFFECTED RELEASE BRANCH(ES) AND MASTER.

For details, see tips for choosing the affected versions.

  • master (the latest development version)
  • v8.2 (TiDB 8.2 versions)
  • v8.1 (TiDB 8.1 versions)
  • v8.0 (TiDB 8.0 versions)
  • v7.6 (TiDB 7.6 versions)
  • v7.5 (TiDB 7.5 versions)
  • v7.1 (TiDB 7.1 versions)
  • v6.5 (TiDB 6.5 versions)
  • v6.1 (TiDB 6.1 versions)
  • v5.4 (TiDB 5.4 versions)
  • v5.3 (TiDB 5.3 versions)
  • v5.2 (TiDB 5.2 versions)
  • v5.1 (TiDB 5.1 versions)

What is the related PR or file link(s)?

  • This PR is translated from:
  • Other reference link(s):

Do your changes match any of the following descriptions?

  • Delete files
  • Change aliases
  • Need modification after applied to another branch
  • Might cause conflicts after applied to another branch

@dveeden dveeden requested review from kennytm and qiancai May 24, 2024 13:55
@ti-chi-bot ti-chi-bot bot added missing-translation-status This PR does not have translation status info. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels May 24, 2024
@dveeden
Copy link
Contributor Author

dveeden commented May 24, 2024

With the example from the checksum docs:

mysql> SELECT TIDB_VERSION()\G
*************************** 1. row ***************************
TIDB_VERSION(): Release Version: v8.0.0
Edition: Community
Git Commit Hash: 43cf9a2245b91c647b46816ad3d5424ef90f1070
Git Branch: HEAD
UTC Build Time: 2024-03-27 09:37:44
GoVersion: go1.21.4
Race Enabled: false
Check Table Before Drop: false
Store: tikv
1 row in set (0.01 sec)

So this is v8.0.0 with TiKV (not unistore). This is on Linux with a TiUP Playground.

mysql> CREATE TABLE t1(id INT PRIMARY KEY);
Query OK, 0 rows affected (0.09 sec)

mysql> INSERT INTO t1 VALUES (1),(2),(3);
Query OK, 3 rows affected (0.00 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> ADMIN CHECKSUM TABLE t1;
+---------+------------+----------------------+-----------+-------------+
| Db_name | Table_name | Checksum_crc64_xor   | Total_kvs | Total_bytes |
+---------+------------+----------------------+-----------+-------------+
| test    | t1         | 17934567675667613934 |         3 |          75 |
+---------+------------+----------------------+-----------+-------------+
1 row in set (0.00 sec)

The above is what the docs have in the example, but with a different checksum.

mysql> DELETE FROM t1;
Query OK, 3 rows affected (0.01 sec)

mysql> ADMIN CHECKSUM TABLE t1;
+---------+------------+--------------------+-----------+-------------+
| Db_name | Table_name | Checksum_crc64_xor | Total_kvs | Total_bytes |
+---------+------------+--------------------+-----------+-------------+
| test    | t1         |                  0 |         0 |           0 |
+---------+------------+--------------------+-----------+-------------+
1 row in set (0.00 sec)

mysql> INSERT INTO t1 VALUES (1),(2),(3);
Query OK, 3 rows affected (0.01 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> ADMIN CHECKSUM TABLE t1;
+---------+------------+----------------------+-----------+-------------+
| Db_name | Table_name | Checksum_crc64_xor   | Total_kvs | Total_bytes |
+---------+------------+----------------------+-----------+-------------+
| test    | t1         | 17934567675667613934 |         3 |          75 |
+---------+------------+----------------------+-----------+-------------+
1 row in set (0.00 sec)

So if we remove all rows then the checksum is 0 as it isn't checksumming any rows. Then if we add the rows back then the checksum is the same again.

mysql> TRUNCATE TABLE t1;
Query OK, 0 rows affected (0.11 sec)

mysql> ADMIN CHECKSUM TABLE t1;
+---------+------------+--------------------+-----------+-------------+
| Db_name | Table_name | Checksum_crc64_xor | Total_kvs | Total_bytes |
+---------+------------+--------------------+-----------+-------------+
| test    | t1         |                  0 |         0 |           0 |
+---------+------------+--------------------+-----------+-------------+
1 row in set (0.00 sec)

mysql> INSERT INTO t1 VALUES (1),(2),(3);
Query OK, 3 rows affected (0.01 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> ADMIN CHECKSUM TABLE t1;
+---------+------------+----------------------+-----------+-------------+
| Db_name | Table_name | Checksum_crc64_xor   | Total_kvs | Total_bytes |
+---------+------------+----------------------+-----------+-------------+
| test    | t1         | 11772166126661881374 |         3 |          75 |
+---------+------------+----------------------+-----------+-------------+
1 row in set (0.00 sec)

But if we truncate the table instead of deleting the data and then add the rows back then the checksum is different. So the checksum seems to depend on the prefix with the table_id.

Note that the table has one column, which is an int, which is 4 bytes, but here each row shows 25 bytes.

@dveeden
Copy link
Contributor Author

dveeden commented May 24, 2024

Btw. Looks like this is where the checksum is calculated:

https://github.com/tikv/tikv/blob/fba1fb2f65620b7ab1c03a9cf918ef3968969398/src/coprocessor/checksum.rs#L83

@dveeden dveeden requested a review from breezewish May 24, 2024 14:08
@dveeden
Copy link
Contributor Author

dveeden commented May 24, 2024

I think TiDB Lightning executes this command when using a tidb backend, but isn't using the result?

@dveeden
Copy link
Contributor Author

dveeden commented May 24, 2024

With Unistore it behaves like this:

sql> ADMIN CHECKSUM TABLE t1;
+---------+------------+--------------------+-----------+-------------+
| Db_name | Table_name | Checksum_crc64_xor | Total_kvs | Total_bytes |
+---------+------------+--------------------+-----------+-------------+
| test    | t1         |                  1 |         1 |           1 |
+---------+------------+--------------------+-----------+-------------+
1 row in set (0.0008 sec)

sql> SELECT COUNT(*) FROM t1;
+----------+
| COUNT(*) |
+----------+
|        6 |
+----------+
1 row in set (0.0013 sec)

So the values it reports are static. Should we put a note in the docs about this?

@breezewish
Copy link
Member

breezewish commented May 24, 2024

@dveeden I believe the ADMIN CHECKSUM is simply not implemented in UniStore. However this behavior difference should not be documented, as UniStore is not documented and user will be confused.

Ref: Lightning calculates the Checksum in the same way, so that Lightning could know the integrity of the imported data.

https://github.com/pingcap/tidb/blob/8f958dec25cf100370f0e5e261aa7f43aaf3d118/pkg/lightning/verification/checksum.go#L60

Considering that this SQL is only used by TiDB Lightning and doesn't seem to provide any user benefits when it is used standalone (Table ID is counted in the Checksum so that it is actually a Physical / Internal Data Checksum), I think we may even consider hiding this document? Then user could find something more useful, with less noise.

@kennytm
Copy link
Contributor

kennytm commented May 25, 2024

@breezewish Some customer chooses to skip the ADMIN CHECKSUM execution in Lightning (set checksum = "off") and perform the check later manually. I'm not sure what to mean by "hiding" but it would be beneficial for the support team to still have some kind of documentation to point to about how to execute the command.

Copy link

ti-chi-bot bot commented May 25, 2024

@kennytm: adding LGTM is restricted to approvers and reviewers in OWNERS files.

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@ti-chi-bot ti-chi-bot bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels May 27, 2024
@dveeden
Copy link
Contributor Author

dveeden commented May 27, 2024

Thanks @kennytm @breezewish

  1. I agree that we don't need to document this for Unistore.

  2. I have updated the ADMIN CHECKSUM TABLE text to better reflect what TiDB Lightning does. PTAL. I hope this makes it more clear to users that ADMIN CHECKSUM TABLE is linked to TiDB Lightning and not a replacement for CHECKSUM TABLE in MySQL.

  3. It looks like we could modify this in TiKV/TiDB to only checksum the table data, providing a more stable checksum that isn't influenced by table_id etc. This could maybe be used to have something similar to CHECKSUM TABLE, even if the values are not the same as MySQL (different algorithm?). Note that MySQL checksum is dependent on the row format etc, so users are probably not expecting this to be very stable. Note that I would suggest to add this functionality instead of replacing or chaning ADMIN CHECKSUM TABLE. This would implement Feature request: Support CHECKSUM TABLE command tidb#1895 If we do this we might want to implement per-partition checksum (cc @mjonss )

@qiancai qiancai self-assigned this May 28, 2024
@qiancai qiancai added the translation/doing This PR's assignee is translating this PR. label May 28, 2024
@ti-chi-bot ti-chi-bot bot removed the missing-translation-status This PR does not have translation status info. label May 28, 2024
@qiancai qiancai added area/sql-infra Indicates that the Issue or PR belongs to the area of sql-infra and sql-metadata. needs-cherry-pick-release-8.1 Should cherry pick this PR to release-8.1 branch. labels May 28, 2024
Copy link

ti-chi-bot bot commented Jun 17, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from qiancai, ensuring that each of them provides their approval before proceeding. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/sql-infra Indicates that the Issue or PR belongs to the area of sql-infra and sql-metadata. needs-cherry-pick-release-8.1 Should cherry pick this PR to release-8.1 branch. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. translation/doing This PR's assignee is translating this PR.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Enrich Documentation: ADMIN CHECKSUM TABLE
5 participants