Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(JP) shmem_alloc/shmem_freeを並列実行すると、shmem_alloc時にダブルブッキングすることがある #3

Closed
taiki-k opened this issue Apr 16, 2014 · 4 comments
Labels
bug developer confirmed the steps to reproduce the problem, and does not work as expected

Comments

@taiki-k
Copy link
Contributor

taiki-k commented Apr 16, 2014

8KB固定サイズでshmem_allocを約8万回行った後、shmem_freeで1つずつ開放する処理を3並列で行ったところ、すでに割り当てられ、かつ未開放の領域が割り当たることがある。(ダブルブッキングする)
この後、shmem_freeが行われた時点でAssertionFailedによるSIGABRTが発生する。

テスト方法は、以下のshmem_test.cを使用し、pgbenchでshmem_test.cの関数を呼び出すSQL文(shmem_test.sql)を実行3並列(クライアント数3、スレッド数3、トランザクション数2)した。

  • スタックトレース
#0  0x0000003cdbc32925 in raise () from /lib64/libc.so.6
#1  0x0000003cdbc34105 in abort () from /lib64/libc.so.6
#2  0x0000000000949085 in ExceptionalCondition (conditionName=0x7f8f7b03f2b8 "!(block->active.nullmark == ((void *)0))",
    errorType=0x7f8f7b03f174 "FailedAssertion", fileName=0x7f8f7b03f19d "shmem.c", lineNumber=228) at assert.c:54
#3  0x00007f8f7b02bd16 in pgstrom_shmem_zone_block_free (zone=0x7f8efa96c000, address=0x7f8f2f11c000) at shmem.c:228
#4  0x00007f8f7b02c1d3 in pgstrom_shmem_free (address=0x7f8f2f11c000) at shmem.c:343
#5  0x00007f8ef170bafc in pgstrom_shmem_test_8Kfree (fcinfo=0x2c23710) at shmem_test.c:78
#6  0x000000000066c190 in ExecMakeFunctionResultNoSets (fcache=0x2c236a0, econtext=0x2c23478,
    isNull=0x2c24050 "\177~\177\177\177\177\177\177xJ\301\002", isDone=0x2c24190) at execQual.c:2001
#7  0x000000000066cbbe in ExecEvalFunc (fcache=0x2c236a0, econtext=0x2c23478,
    isNull=0x2c24050 "\177~\177\177\177\177\177\177xJ\301\002", isDone=0x2c24190) at execQual.c:2385
#8  0x00000000006737dd in ExecTargetList (targetlist=0x2c24158, econtext=0x2c23478, values=0x2c24030,
    isnull=0x2c24050 "\177~\177\177\177\177\177\177xJ\301\002", itemIsDone=0x2c24190, isDone=0x7fff35a28600) at execQual.c:5267
#9  0x0000000000673dfa in ExecProject (projInfo=0x2c24070, isDone=0x7fff35a28600) at execQual.c:5482
#10 0x0000000000691ec9 in ExecResult (node=0x2c23360) at nodeResult.c:155
#11 0x0000000000668365 in ExecProcNode (node=0x2c23360) at execProcnode.c:380
#12 0x0000000000665efe in ExecutePlan (estate=0x2c23248, planstate=0x2c23360, operation=CMD_SELECT, sendTuples=1 '\001',
    numberTuples=0, direction=ForwardScanDirection, dest=0x2be8448) at execMain.c:1475
#13 0x0000000000663da8 in standard_ExecutorRun (queryDesc=0x2c1fda8, direction=ForwardScanDirection, count=0) at execMain.c:308
#14 0x0000000000663beb in ExecutorRun (queryDesc=0x2c1fda8, direction=ForwardScanDirection, count=0) at execMain.c:256
#15 0x00000000007f964e in PortalRunSelect (portal=0x2b9bbd8, forward=1 '\001', count=0, dest=0x2be8448) at pquery.c:946
#16 0x00000000007f9275 in PortalRun (portal=0x2b9bbd8, count=9223372036854775807, isTopLevel=1 '\001', dest=0x2be8448,
    altdest=0x2be8448, completionTag=0x7fff35a289e0 "") at pquery.c:790
#17 0x00000000007f29e2 in exec_simple_query (query_string=0x2be6b08 "select pgstrom_shmem_test_8Kfree();") at postgres.c:1045
#18 0x00000000007f7162 in PostgresMain (argc=1, argv=0x2b78f30, dbname=0x2b78e28 "pgstrom_test", username=0x2b78e08 "tkondo")
    at postgres.c:4004
#19 0x000000000076e18a in BackendRun (port=0x2ba07d0) at postmaster.c:4104
#20 0x000000000076d840 in BackendStartup (port=0x2ba07d0) at postmaster.c:3778
#21 0x0000000000769bfe in ServerLoop () at postmaster.c:1569
#22 0x0000000000769251 in PostmasterMain (argc=1, argv=0x2b77dc0) at postmaster.c:1222
#23 0x00000000006b8aee in main (argc=1, argv=0x2b77dc0) at main.c:223
  • 用いたテスト用コード(shmem_test.c)
#include "postgres.h"
#include "fmgr.h"
#include <string.h>

#include "pg_strom.h"

PG_MODULE_MAGIC;

#define ALLOC_MAX_COUNT 262144 /* 256 * 1024 */
void *pgstrom_test8K_addrs[ALLOC_MAX_COUNT];
bool pgstrom_test8K_alloc_done = false;

PG_FUNCTION_INFO_V1(pgstrom_shmem_test_8K);
Datum
pgstrom_shmem_test_8K(PG_FUNCTION_ARGS)
{
  int64 i = 0;
  int64 count = PG_GETARG_INT64(0);

  /* Already allocated */
  if (pgstrom_test8K_alloc_done)
  {
    PG_RETURN_BOOL(false);
  }

  /* Out of range */
  if (count > ALLOC_MAX_COUNT)
  {
    PG_RETURN_BOOL(false);
  }

  /* Clear all addresses */
  for (i = 0; i < ALLOC_MAX_COUNT; i++)
  {
    pgstrom_test8K_addrs[i] = NULL;
  }

  for (i = 0; i < count; i++)
  {
    pgstrom_test8K_addrs[i] = pgstrom_shmem_alloc((Size)(8192 - 4));
    elog(INFO, "Alloc No.%d (addr=%p)", i, pgstrom_test8K_addrs[i]);
    if (pgstrom_test8K_addrs[i] == NULL)
      break;
  }

  elog(LOG, "Allocated %d partition(s).", i);
  pgstrom_test8K_alloc_done = true;

  PG_RETURN_BOOL(true);
}

PG_FUNCTION_INFO_V1(pgstrom_shmem_test_8Kfree);
Datum
pgstrom_shmem_test_8Kfree(PG_FUNCTION_ARGS)
{
  int64 i = 0LL;

  /* Not allocated yet */
  if (!pgstrom_test8K_alloc_done)
  {
    PG_RETURN_BOOL(false);
  }

  /* Free now! */
  do {
    elog(INFO, "Free No.%d (addr=%p)", i, pgstrom_test8K_addrs[i]);
    pgstrom_shmem_free(pgstrom_test8K_addrs[i]);
    pgstrom_test8K_addrs[i++] = NULL;
  } while ((i < ALLOC_MAX_COUNT) && (pgstrom_test8K_addrs[i] != NULL));

  elog(LOG, "Freed %d partition(s).", i);
  pgstrom_test8K_alloc_done = false;

  PG_RETURN_BOOL(true);
}
  • 用いたテストコード(shmem_test.sql)
set log_min_duration_statement=0;
set log_min_messages=info;
set client_min_messages=error;
select pg_backend_pid();
select pgstrom_shmem_test_8K(81947);
select pgstrom_shmem_test_8Kfree();
  • PostgreSQLのログ (log_line_prefix='%a, %s, %t, %p, %c ') ※問題のアドレスでgrepしたもの
pgbench, 2014-04-16 17:11:20 JST, 2014-04-16 17:11:22 JST, 32656, 534e3b28.7f90  INFO:  Alloc No.73129 (addr=0x7f8f2f11c000)
pgbench, 2014-04-16 17:11:20 JST, 2014-04-16 17:11:24 JST, 32656, 534e3b28.7f90  INFO:  Free No.73129 (addr=0x7f8f2f11c000)
pgbench, 2014-04-16 17:11:20 JST, 2014-04-16 17:11:24 JST, 32656, 534e3b28.7f90  INFO:  Alloc No.193 (addr=0x7f8f2f11c000)
pgbench, 2014-04-16 17:11:20 JST, 2014-04-16 17:11:25 JST, 32655, 534e3b28.7f8f  INFO:  Alloc No.22065 (addr=0x7f8f2f11c000)
pgbench, 2014-04-16 17:11:20 JST, 2014-04-16 17:11:25 JST, 32655, 534e3b28.7f8f  INFO:  Alloc No.54966 (addr=0x7f8f2f11c000)
pgbench, 2014-04-16 17:11:20 JST, 2014-04-16 17:11:26 JST, 32656, 534e3b28.7f90  INFO:  Free No.193 (addr=0x7f8f2f11c000)
pgbench, 2014-04-16 17:11:20 JST, 2014-04-16 17:11:27 JST, 32655, 534e3b28.7f8f  INFO:  Free No.22065 (addr=0x7f8f2f11c000)
@taiki-k taiki-k added the bug label Apr 16, 2014
@kaigai
Copy link
Contributor

kaigai commented Apr 16, 2014

pgstrom_shmem_zone_block_free()が未使用ブロック同士を併合するロジックの中で、
例えば 0 - 7 番目のブロックが
0-1 Free
2-3 Active
4-7 Active -> Free
状態の時に、4-7ブロックとペアになるブロック列0-3をチェックし、その先頭ブロックの状態
だけを確認して 0-7 をFree状態にするロジックを発見。

現在修正中。

@kaigai
Copy link
Contributor

kaigai commented Apr 16, 2014

250c8db に修正を入れました。

上記バグが原因だと並列実行は関係ありませんが、再現をトライしてみてください。

テストに使ったクエリは以下の通りです。

select count(*) from (select pgstrom_shmem_free(p) from (select pgstrom_shmem_alloc((random() * 1000000)::int) p from generate_series(1,100000) order by random()) as foo where p > 0) as var;

@taiki-k
Copy link
Contributor Author

taiki-k commented Apr 17, 2014

250c8db で再現するかを確認しました。

10並列まで試して再現しませんでしたので、修正できたと判断します。

@taiki-k taiki-k closed this as completed Apr 17, 2014
@kaigai
Copy link
Contributor

kaigai commented Apr 17, 2014

ありがとうございます。

後で発現すると厄介なバグになりかねない場所なので、早期発見できてよかったです。

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug developer confirmed the steps to reproduce the problem, and does not work as expected
Projects
None yet
Development

No branches or pull requests

2 participants