Please answer these questions before submitting your issue. Thanks!
From https://asktug.com/t/topic/1014053/4
Prepare the TPCH dataset with `sf=1`:

```shell
go-tpc tpch --sf=1 prepare
```

Then the 2-way semi join over 150000 rows takes 13s to run.
```
mysql> EXPLAIN ANALYZE select count(*) from customer a where exists (select 1 from customer b where a.C_NATIONKEY=b.C_NATIONKEY);
+-------------------------------+-----------+---------+-----------+---------------+----------------+---------------+----------+---------+
| id                            | estRows   | actRows | task      | access object | execution info | operator info | memory   | disk    |
+-------------------------------+-----------+---------+-----------+---------------+----------------+---------------+----------+---------+
| HashAgg_10                    | 1.00      | 1       | root      |               | time:13.1s, loops:2, RU:34.346561, partial_worker:{wall_time:13.1241415s, concurrency:5, task_num:148, tot_wait:1m5.616187036s, tot_exec:4.228498ms, tot_time:1m5.620451542s, max:13.124107333s, p95:13.124107333s}, final_worker:{wall_time:13.124157041s, concurrency:5, task_num:5, tot_wait:1m5.620598336s, tot_exec:19.292µs, tot_time:1m5.620619751s, max:13.124129334s, p95:13.124129334s} | funcs:count(1)->Column#18 | 165.3 KB | N/A |
| └─HashJoin_12                 | 120000.00 | 150000  | root      |               | time:13.1s, loops:149, build_hash_table:{total:7.92ms, fetch:2.93ms, build:4.99ms}, probe:{concurrency:5, total:1m5.1s, max:13.1s, probe:1m5.1s, fetch:44.1ms} | semi join, equal:[eq(test.customer.c_nationkey, test.customer.c_nationkey)] | 3.57 MB | 0 Bytes |
|   ├─TableReader_16(Build)     | 150000.00 | 150000  | root      |               | time:2.81ms, loops:149, cop_task: {num: 11, max: 805.8µs, min: 144.5µs, avg: 330.9µs, p95: 805.8µs, max_proc_keys: 2016, p95_proc_keys: 2016, tot_proc: 1.19ms, tot_wait: 364µs, rpc_num: 11, rpc_time: 3.61ms, copr_cache_hit_ratio: 0.64, build_task_duration: 666ns, max_distsql_concurrency: 1} | data:TableFullScan_15 | 657.0 KB | N/A |
|   │ └─TableFullScan_15        | 150000.00 | 150000  | cop[tikv] | table:b       | tikv_task:{proc max:17ms, min:0s, avg: 5.36ms, p80:9ms, p95:17ms, iters:190, tasks:11}, scan_detail: {total_process_keys: 3712, total_process_keys_size: 756069, total_keys: 3716, get_snapshot_time: 188µs, rocksdb: {key_skipped_count: 3712, block: {cache_hit_count: 29}}} | keep order:false | N/A | N/A |
|   └─TableReader_14(Probe)     | 150000.00 | 150000  | root      |               | time:3.97ms, loops:149, cop_task: {num: 11, max: 894.9µs, min: 330.5µs, avg: 568.2µs, p95: 894.9µs, max_proc_keys: 2016, p95_proc_keys: 2016, tot_proc: 1.28ms, tot_wait: 684.8µs, rpc_num: 11, rpc_time: 6.14ms, copr_cache_hit_ratio: 0.64, build_task_duration: 4.29µs, max_distsql_concurrency: 1} | data:TableFullScan_13 | 657.0 KB | N/A |
|     └─TableFullScan_13        | 150000.00 | 150000  | cop[tikv] | table:a       | tikv_task:{proc max:17ms, min:0s, avg: 5.36ms, p80:9ms, p95:17ms, iters:190, tasks:11}, scan_detail: {total_process_keys: 3712, total_process_keys_size: 756069, total_keys: 3716, get_snapshot_time: 392.9µs, rocksdb: {key_skipped_count: 3712, block: {cache_hit_count: 29}}} | keep order:false | N/A | N/A |
+-------------------------------+-----------+---------+-----------+---------------+----------------+---------------+----------+---------+
6 rows in set (13.12 sec)
```
The expected algorithm complexity is `O(150000 * log(150000))`, which shouldn't take such a long time to run.
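For context (not stated in the issue itself): TPCH has only 25 distinct nation keys, so at sf=1 each hash bucket holds roughly 150000 / 25 = 6000 build rows. If the probe walks an entire bucket instead of stopping at the first match, the work balloons well past the estimate above. A back-of-the-envelope sketch in Go (`probeComparisons` is a hypothetical helper, not TiDB code; only the 25-nation figure comes from the TPCH schema):

```go
package main

import "fmt"

// probeComparisons estimates key comparisons done by the hash-join probe.
// rows is the row count on each side; distinct is the number of distinct
// join keys, so each hash bucket holds rows/distinct build rows.
func probeComparisons(rows, distinct int, earlyExit bool) int {
	if earlyExit {
		// One successful lookup per probe row is enough for a semi join.
		return rows
	}
	// Every probe row is compared against its whole bucket.
	return rows * (rows / distinct)
}

func main() {
	// TPCH sf=1: 150000 customer rows, 25 distinct nation keys.
	fmt.Println(probeComparisons(150000, 25, false)) // 900000000
	fmt.Println(probeComparisons(150000, 25, true))  // 150000
}
```

Under that assumption, a full-bucket probe does on the order of 9×10^8 key comparisons, which would plausibly account for the 13s.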
From the profile, it seems most time is spent on probe:
```
mysql> select tidb_version();
+----------------------------------------------------------------------+
| tidb_version()                                                       |
+----------------------------------------------------------------------+
| Release Version: v7.3.0
Edition: Community
Git Commit Hash: 40b72e7a9a4fc9670d4c5d974dd503a3c6097471
Git Branch: heads/refs/tags/v7.3.0
UTC Build Time: 2023-08-08 09:27:28
GoVersion: go1.20.7
Race Enabled: false
Check Table Before Drop: false
Store: tikv |
+----------------------------------------------------------------------+
1 row in set (0.00 sec)
```
This is hardly a bug. I'm removing the bug label. But we are looking into why semi join is unexpectedly slow.
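The title of the linked fix ("optimize (left outer) (anti) semi join which has no other condition") suggests the optimization: when a semi join carries no extra join condition, the probe only needs to know whether at least one matching build row *exists*, so it can stop at the first match instead of walking every duplicate. A minimal sketch of that idea in Go — hypothetical names and a plain map, not TiDB's actual executor code:

```go
package main

import "fmt"

// semiJoinCount returns how many probe keys have at least one match in build.
// With no other join condition, existence in the hash table is all that
// matters, so each probe is a single lookup rather than a bucket walk.
func semiJoinCount(build, probe []int64) int {
	seen := make(map[int64]struct{}, len(build))
	for _, k := range build {
		seen[k] = struct{}{} // duplicates collapse: only existence is kept
	}
	n := 0
	for _, k := range probe {
		if _, ok := seen[k]; ok {
			n++ // early exit: no need to visit duplicate matches
		}
	}
	return n
}

func main() {
	// Mimic the issue's shape: many rows, only 25 distinct nation keys.
	keys := make([]int64, 150000)
	for i := range keys {
		keys[i] = int64(i % 25)
	}
	fmt.Println(semiJoinCount(keys, keys)) // 150000
}
```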
executor: optimize (left outer) (anti) semi join which has no other condition (#47764), close #47424 — commit 1d3b53e
executor: optimize (left outer) (anti) semi join which has no other condition (pingcap#47764), close pingcap#47424 — commit b06ebab