Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(JP) JOIN操作を含むSQL文の実行中にAssertionFailedとなることがある #13

Closed
taiki-k opened this issue May 13, 2014 · 7 comments
Labels
bug developer confirmed the steps to reproduce the problem, and does not work as expected

Comments

@taiki-k
Copy link
Contributor

taiki-k commented May 13, 2014

JOIN操作を含むSQL文を実行すると、まれにAssertionFailedとなることがあります。
psqlから実行したもので、多重実行は行っていません。

  • 実行したSQL文
SELECT gender,avg(score) FROM member_score JOIN member_gender USING(id) GROUP BY gender;
  • テーブル定義
 Table "public.member_score"
 Column |  Type   | Modifiers
--------+---------+-----------
 id     | integer |
 date   | date    |
 score  | integer |

 Table "public.member_gender"
 Column |  Type   | Modifiers
--------+---------+-----------
 id     | integer |
 gender | integer |
  • バックトレース
#0  0x0000003cdbc32925 in raise () from /lib64/libc.so.6
#1  0x0000003cdbc34105 in abort () from /lib64/libc.so.6
#2  0x0000000000949085 in ExceptionalCondition (
    conditionName=0xac4e28 "!(( ((void) ((bool) ((! assert_enabled) || ! (!((buffer) <= NBuffers && (buffer) >= -NLocBuffer)) || (ExceptionalCondition(\"!((buffer) <= NBuffers && (buffer) >= -NLocBuffer)\", (\"FailedAssertion\"), \"e"...,
    errorType=0xac4d45 "FailedAssertion", fileName=0xac4d38 "execTuples.c", lineNumber=341) at assert.c:54
#3  0x0000000000674c55 in ExecStoreTuple (tuple=0x2cc7f58, slot=0x2cc0048, buffer=182, shouldFree=1 '\001') at execTuples.c:341
#4  0x00007f078bfdab68 in gpuscan_next_tuple (gss=0x2cbfab0, slot=0x2cc0048) at gpuscan.c:1128
#5  0x00007f078bfdb2bc in gpuscan_fetch_tuple (node=0x2cbfab0) at gpuscan.c:1157
#6  0x00007f078bfdb354 in gpuscan_exec (node=0x2cbfab0) at gpuscan.c:1307
#7  0x00000000006807e7 in ExecCustomPlan (cpstate=0x2cbfab0) at nodeCustom.c:37
#8  0x00000000006684b5 in ExecProcNode (node=0x2cbfab0) at execProcnode.c:453
#9  0x0000000000684a9a in ExecHashJoin (node=0x2cbee00) at nodeHashjoin.c:154
#10 0x00000000006684f4 in ExecProcNode (node=0x2cbee00) at execProcnode.c:468
#11 0x000000000067de7f in agg_fill_hash_table (aggstate=0x2cbe500) at nodeAgg.c:1351
#12 0x000000000067d9b1 in ExecAgg (node=0x2cbe500) at nodeAgg.c:1113
#13 0x0000000000668548 in ExecProcNode (node=0x2cbe500) at execProcnode.c:487
#14 0x0000000000665efe in ExecutePlan (estate=0x2cbe3e8, planstate=0x2cbe500, operation=CMD_SELECT, sendTuples=1 '\001',
    numberTuples=0, direction=ForwardScanDirection, dest=0x2caf180) at execMain.c:1475
#15 0x0000000000663da8 in standard_ExecutorRun (queryDesc=0x2ca2748, direction=ForwardScanDirection, count=0) at execMain.c:308
#16 0x0000000000663beb in ExecutorRun (queryDesc=0x2ca2748, direction=ForwardScanDirection, count=0) at execMain.c:256
#17 0x00000000007f964e in PortalRunSelect (portal=0x2c99a58, forward=1 '\001', count=0, dest=0x2caf180) at pquery.c:946
#18 0x00000000007f9275 in PortalRun (portal=0x2c99a58, count=9223372036854775807, isTopLevel=1 '\001', dest=0x2caf180,
    altdest=0x2caf180, completionTag=0x7fff342c4f10 "") at pquery.c:790
#19 0x00000000007f29e2 in exec_simple_query (
    query_string=0x2c556b8 "select gender,avg(score) from member_score join member_gender using(id) group by gender;")
    at postgres.c:1045
#20 0x00000000007f7162 in PostgresMain (argc=1, argv=0x2be95b8, dbname=0x2be9418 "pgstrom_test", username=0x2be93f8 "tkondo")
    at postgres.c:4004
#21 0x000000000076e18a in BackendRun (port=0x2c11230) at postmaster.c:4104
#22 0x000000000076d840 in BackendStartup (port=0x2c11230) at postmaster.c:3778
#23 0x0000000000769bfe in ServerLoop () at postmaster.c:1569
#24 0x0000000000769251 in PostmasterMain (argc=1, argv=0x2be83b0) at postmaster.c:1222
#25 0x00000000006b8aee in main (argc=1, argv=0x2be83b0) at main.c:223
@kaigai
Copy link
Contributor

kaigai commented May 13, 2014

スキャンの終了条件の判定に起因する同じようなエラーを、LWLock => Transaction Lockへの修正中に見つけて修正しています。
最新版に pull して再現確認をお願いできますか?

@taiki-k
Copy link
Contributor Author

taiki-k commented May 13, 2014

最新版で再現確認したところ、再現しました。

今のところ、次の操作で確実に再現しています。

SET enable_seqscan TO off;
SELECT score, count(*) FROM member_score GROUP BY score;
SELECT gender, avg(score) FROM member_score JOIN member_gender USING(id) GROUP BY gender;

@kaigai
Copy link
Contributor

kaigai commented May 13, 2014

了解。当方でも試してみますので、データを生成するSQLを教えていただけますか?
(テーブルの型定義は上にありますが、データ本体の方です)

@taiki-k
Copy link
Contributor Author

taiki-k commented May 13, 2014

データを生成した際のSQL文です。

create table member_score as select (trunc(random()*1000000+1))::int4 as id,
 ('01/01/2014'::date + trunc(random() * 365) * '1 day'::interval)::date as date,
 (trunc(random()*1000))::int4 as score
 from generate_series(1,10000000);
 create table member_gender as select generate_series(1,1000000)::int4 as id, trunc(random() * 2 + 1)::int4 as gender;

@kaigai
Copy link
Contributor

kaigai commented May 13, 2014

再現しません…。

make clean && make install してもらえますか?

PostgreSQLのMakefileは *.h の変更時に関連する *.c を再コンパイルするようになっていないので、もしかすると、一部のファイルが古い構造体定義を見ているかもしれません。

@kaigai
Copy link
Contributor

kaigai commented May 13, 2014

postgres=# explain analyze SELECT gender, avg(score) FROM member_score JOIN member_gender USING(id) GROUP BY gender;
QUERY PLAN



HashAggregate (cost=513017.46..513017.48 rows=2 width=8) (actual time=9758.470..9758.473 rows=2 loops=1)
Group Key: member_gender.gender
-> Hash Join (cost=50932.00..463017.85 rows=9999922 width=8) (actual time=378.813..7863.591 rows=10000000 loops=1
)
Hash Cond: (member_score.id = member_gender.id)
-> Custom (GpuScan) on member_score (cost=10000.00..165054.21 rows=9999922 width=8) (actual time=2.182..174
2.249 rows=10000000 loops=1)
Host References: id, score
Device References:
-> Hash (cost=24525.00..24525.00 rows=1000000 width=8) (actual time=376.003..376.003 rows=1000000 loops=1)
Buckets: 16384 Batches: 16 Memory Usage: 2457kB
-> Custom (GpuScan) on member_gender (cost=10000.00..24525.00 rows=1000000 width=8) (actual time=2.22
1..180.555 rows=1000000 loops=1)
Host References: id, gender
Device References:
Planning time: 0.306 ms
Execution time: 9759.835 ms
(14 rows)

GpuScanを使っているのは間違いないです。

@kaigai
Copy link
Contributor

kaigai commented May 14, 2014

解析の結果、初回スキャンでtcacheにカラムXが乗っており、次回スキャンで
検索条件にXを、読み取りはYをという場合の Hybrid Scan モード時に
heap bufferから読み出したタプルを結果スロットにセットする際のフラグの
Assertion Checkに引っかかってしまったようでした。

ExecStoreTuple 周りの処理を修正して不再現確認。クローズとします。

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug developer confirmed the steps to reproduce the problem, and does not work as expected
Projects
None yet
Development

No branches or pull requests

2 participants