Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

execution: semi join takes too long to run #47424

Closed
qw4990 opened this issue Oct 7, 2023 · 1 comment · Fixed by #47764
Closed

execution: semi join takes too long to run #47424

qw4990 opened this issue Oct 7, 2023 · 1 comment · Fixed by #47764

Comments

@qw4990
Copy link
Contributor

qw4990 commented Oct 7, 2023

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

From https://asktug.com/t/topic/1014053/4

Prepare TPCH dataset with sf=1:

go-tpc tpch --sf=1 prepare

Then the 2-way semi-join with 150000 rows takes 13s to run.

mysql> EXPLAIN ANALYZE select count(*) from customer a where exists (select 1 from customer b where a.C_NATIONKEY=b.C_NATIONKEY);
+-------------------------------+-----------+---------+-----------+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------+----------+---------+
| id                            | estRows   | actRows | task      | access object | execution info                                                                                                                                                                                                                                                                                                                                                                                     | operator info                                                               | memory   | disk    |
+-------------------------------+-----------+---------+-----------+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------+----------+---------+
| HashAgg_10                    | 1.00      | 1       | root      |               | time:13.1s, loops:2, RU:34.346561, partial_worker:{wall_time:13.1241415s, concurrency:5, task_num:148, tot_wait:1m5.616187036s, tot_exec:4.228498ms, tot_time:1m5.620451542s, max:13.124107333s, p95:13.124107333s}, final_worker:{wall_time:13.124157041s, concurrency:5, task_num:5, tot_wait:1m5.620598336s, tot_exec:19.292µs, tot_time:1m5.620619751s, max:13.124129334s, p95:13.124129334s}  | funcs:count(1)->Column#18                                                   | 165.3 KB | N/A     |
| └─HashJoin_12                 | 120000.00 | 150000  | root      |               | time:13.1s, loops:149, build_hash_table:{total:7.92ms, fetch:2.93ms, build:4.99ms}, probe:{concurrency:5, total:1m5.1s, max:13.1s, probe:1m5.1s, fetch:44.1ms}                                                                                                                                                                                                                                     | semi join, equal:[eq(test.customer.c_nationkey, test.customer.c_nationkey)] | 3.57 MB  | 0 Bytes |
|   ├─TableReader_16(Build)     | 150000.00 | 150000  | root      |               | time:2.81ms, loops:149, cop_task: {num: 11, max: 805.8µs, min: 144.5µs, avg: 330.9µs, p95: 805.8µs, max_proc_keys: 2016, p95_proc_keys: 2016, tot_proc: 1.19ms, tot_wait: 364µs, rpc_num: 11, rpc_time: 3.61ms, copr_cache_hit_ratio: 0.64, build_task_duration: 666ns, max_distsql_concurrency: 1}                                                                                                | data:TableFullScan_15                                                       | 657.0 KB | N/A     |
|   │ └─TableFullScan_15        | 150000.00 | 150000  | cop[tikv] | table:b       | tikv_task:{proc max:17ms, min:0s, avg: 5.36ms, p80:9ms, p95:17ms, iters:190, tasks:11}, scan_detail: {total_process_keys: 3712, total_process_keys_size: 756069, total_keys: 3716, get_snapshot_time: 188µs, rocksdb: {key_skipped_count: 3712, block: {cache_hit_count: 29}}}                                                                                                                     | keep order:false                                                            | N/A      | N/A     |
|   └─TableReader_14(Probe)     | 150000.00 | 150000  | root      |               | time:3.97ms, loops:149, cop_task: {num: 11, max: 894.9µs, min: 330.5µs, avg: 568.2µs, p95: 894.9µs, max_proc_keys: 2016, p95_proc_keys: 2016, tot_proc: 1.28ms, tot_wait: 684.8µs, rpc_num: 11, rpc_time: 6.14ms, copr_cache_hit_ratio: 0.64, build_task_duration: 4.29µs, max_distsql_concurrency: 1}                                                                                             | data:TableFullScan_13                                                       | 657.0 KB | N/A     |
|     └─TableFullScan_13        | 150000.00 | 150000  | cop[tikv] | table:a       | tikv_task:{proc max:17ms, min:0s, avg: 5.36ms, p80:9ms, p95:17ms, iters:190, tasks:11}, scan_detail: {total_process_keys: 3712, total_process_keys_size: 756069, total_keys: 3716, get_snapshot_time: 392.9µs, rocksdb: {key_skipped_count: 3712, block: {cache_hit_count: 29}}}                                                                                                                   | keep order:false                                                            | N/A      | N/A     |
+-------------------------------+-----------+---------+-----------+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------+----------+---------+
6 rows in set (13.12 sec)

The algorithm complexity is O(150000.00 * log(150000.00)), which shouldn't take such a long time to run.

From the profile, it seems most time is spent on probe:

image

2. What did you expect to see? (Required)

3. What did you see instead (Required)

4. What is your TiDB version? (Required)

mysql> select tidb_version();
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| tidb_version()                                                                                                                                                                                                                                                 |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Release Version: v7.3.0
Edition: Community
Git Commit Hash: 40b72e7a9a4fc9670d4c5d974dd503a3c6097471
Git Branch: heads/refs/tags/v7.3.0
UTC Build Time: 2023-08-08 09:27:28
GoVersion: go1.20.7
Race Enabled: false
Check Table Before Drop: false
Store: tikv |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
@qw4990 qw4990 added type/bug This issue is a bug. sig/execution SIG execution labels Oct 7, 2023
@ti-chi-bot ti-chi-bot bot added may-affects-5.3 This bug maybe affects 5.3.x versions. may-affects-5.4 This bug maybe affects 5.4.x versions. may-affects-6.1 may-affects-6.5 may-affects-7.1 labels Oct 8, 2023
@zanmato1984 zanmato1984 added type/bug This issue is a bug. sig/execution SIG execution and removed type/bug This issue is a bug. sig/execution SIG execution labels Oct 9, 2023
@zanmato1984
Copy link
Contributor

This is hardly a bug. I'm removing the bug label. But we are looking into why semi join is unexpectedly slow.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants