Internal state corruption #624

Closed
neoxic opened this issue Oct 1, 2020 · 6 comments

neoxic commented Oct 1, 2020

Hello,

This is our second (and hopefully final) attempt to report an issue with LuaJIT 2.1 that leads to all sorts of buggy behaviour. The first attempt, #615, was regrettably rejected without due attention. We have since reduced the test case further, to about 4.5K lines of code, and deliberately removed the techniques in question to illustrate that the issue has nothing to do with them. In particular, the code:

  • does not use the package system, i.e. require(), package.*, etc.;
  • does not use the dofile(), loadfile(), loadstring(), load() functions;
  • uses debug.getfenv()/debug.setfenv() to get/change a coroutine's environment;
  • uses debug.traceback() to get stack traces.

We have been unable to reduce it any further, as any major restructuring, or even renaming functions or fields, hides the bug. The code itself obviously makes no sense and serves solely as a way to reproduce the issue reliably without much effort.

Link to the test case --> test.zip

By simply running luajit-2.1.0-beta3 test.lua, one can observe all kinds of beautiful things like:

LuaJIT ASSERT lj_obj.h:883: checklivetv: mismatch of TValue type 8 vs GC type 0
LuaJIT ASSERT lj_state.c:179: close_state: memory leak of -8192 bytes
LuaJIT ASSERT lj_gc.c:197: gc_traverse_tab: TValue and GC type mismatch
LuaJIT ASSERT lj_debug.c:105: debug_framepc: return bytecode expected
LuaJIT ASSERT lj_str.c:332: lj_str_new: (o)->gch.gct == ~LJ_TSTR
LuaJIT ASSERT lj_debug.c:307: lj_debug_funcname: pc < pt->sizebc

test.lua:2061: attempt to concatenate field 'r' (a nil value)
test.lua:2061: attempt to perform arithmetic on a nil value
test.lua:2061: attempt to compare table with number
test.lua:2061: attempt to index a number value
test.lua:2103: 'for' initial value must be a number
test.lua:2103: 'for' step must be a number

Segmentation fault
Trace/breakpoint trap
Bus error

Some familiar/new crash traces to get you started:

Program received signal SIGSEGV, Segmentation fault.
0x0000000000476ced in lj_BC_USETN () at buildvm_x86.dasc:520
520	buildvm_x86.dasc: No such file or directory.
(gdb) bt
#0  0x0000000000476ced in lj_BC_USETN () at buildvm_x86.dasc:520
#1  0x0000000000478b6d in lj_ff_coroutine_resume () at buildvm_x86.dasc:1737
#2  0x000000000046d884 in lua_pcall (L=L@entry=0x7ffff7fcc380, nargs=nargs@entry=0, nresults=-1, errfunc=errfunc@entry=2) at lj_api.c:1169
#3  0x0000000000404287 in docall (L=L@entry=0x7ffff7fcc380, narg=narg@entry=0, clear=clear@entry=0) at luajit.c:121
#4  0x0000000000404a91 in handle_script (L=L@entry=0x7ffff7fcc380, argx=argx@entry=0x7fffffffe3c0) at luajit.c:292
#5  0x0000000000405186 in pmain (L=0x7ffff7fcc380) at luajit.c:553
#6  0x0000000000477d1d in lj_BC_FUNCC () at buildvm_x86.dasc:849
#7  0x000000000046dc20 in lua_cpcall (L=L@entry=0x7ffff7fcc380, func=func@entry=0x405063 <pmain>, ud=ud@entry=0x0) at lj_api.c:1197
#8  0x0000000000405243 in main (argc=2, argv=0x7fffffffe3b8) at luajit.c:582
(gdb) r
The program being debugged has been started already.

Program received signal SIGSEGV, Segmentation fault.
0x0000000000000033 in ?? ()
(gdb) bt
#0  0x0000000000000033 in ?? ()
#1  0x00000000004799a3 in lj_fff_fallback () at buildvm_x86.dasc:2247
#2  0x0000000000478b6d in lj_ff_coroutine_resume () at buildvm_x86.dasc:1737
#3  0x000000000046d884 in lua_pcall (L=L@entry=0x7ffff7fcc380, nargs=nargs@entry=0, nresults=-1, errfunc=errfunc@entry=2) at lj_api.c:1169
#4  0x0000000000404287 in docall (L=L@entry=0x7ffff7fcc380, narg=narg@entry=0, clear=clear@entry=0) at luajit.c:121
#5  0x0000000000404a91 in handle_script (L=L@entry=0x7ffff7fcc380, argx=argx@entry=0x7fffffffe3c0) at luajit.c:292
#6  0x0000000000405186 in pmain (L=0x7ffff7fcc380) at luajit.c:553
#7  0x0000000000477d1d in lj_BC_FUNCC () at buildvm_x86.dasc:849
#8  0x000000000046dc20 in lua_cpcall (L=L@entry=0x7ffff7fcc380, func=func@entry=0x405063 <pmain>, ud=ud@entry=0x0) at lj_api.c:1197
#9  0x0000000000405243 in main (argc=2, argv=0x7fffffffe3b8) at luajit.c:582

Program received signal SIGSEGV, Segmentation fault.
0x0000007a00000079 in ?? ()
(gdb) bt
#0  0x0000007a00000079 in ?? ()
#1  0x0000000000000007 in ?? ()
#2  0x0000000000000000 in ?? ()

neoxic commented Oct 1, 2020

FYI: this is with the latest v2.1 HEAD (e9af1ab), built as:

make amalg XCFLAGS="-DLUA_USE_APICHECK -DLUA_USE_ASSERT -Og -g"


corsix commented Oct 12, 2020

I can get the test case down to 3.2K lines at present, but that still qualifies as huge.

My leading hypothesis right now is that PC ends up wrong when coming out of a snapshot, which causes random data to be interpreted as bytecode instructions and executed. Depending on what that random data is, different things can happen.

In support of that hypothesis, the following workaround seems to mask the problem for me:

diff --git a/src/lj_parse.c b/src/lj_parse.c
index 3ae05446e3..97ce5adf70 100644
--- a/src/lj_parse.c
+++ b/src/lj_parse.c
@@ -1331,6 +1331,7 @@ static void fs_fixup_bc(FuncState *fs, GCproto *pt, BCIns *bc, MSize n)
                   fs->framesize, 0);
   for (i = 1; i < n; i++)
     bc[i] = base[i].ins;
+  bc[n] = BCINS_AD(BC_RET0, 0, 1);
 }
 
 /* Fixup upvalues for child prototype, step #2. */
@@ -1576,7 +1577,7 @@ static GCproto *fs_finish(LexState *ls, BCLine line)
   fs_fixup_ret(fs);
 
   /* Calculate total size of prototype including all colocated arrays. */
-  sizept = sizeof(GCproto) + fs->pc*sizeof(BCIns) + fs->nkgc*sizeof(GCRef);
+  sizept = sizeof(GCproto) + (fs->pc+1)*sizeof(BCIns) + fs->nkgc*sizeof(GCRef);
   sizept = (sizept + sizeof(TValue)-1) & ~(sizeof(TValue)-1);
   ofsk = sizept; sizept += fs->nkn*sizeof(TValue);
   ofsuv = sizept; sizept += ((fs->nuv+1)&~1)*2;

Alternatively, the following catches the problem with an assertion failure very early, before it gets a chance to corrupt further state:

diff --git a/src/lj_meta.c b/src/lj_meta.c
index f6e6d46a1d..5b3ea666ef 100644
--- a/src/lj_meta.c
+++ b/src/lj_meta.c
@@ -426,6 +426,7 @@ void lj_meta_istype(lua_State *L, BCReg ra, BCReg tp)
 {
   L->top = curr_topL(L);
   ra++; tp--;
+  lj_assertL(tp <= 13, "bad type %d", (int)tp);
   lj_assertL(LJ_DUALNUM || tp != ~LJ_TNUMX, "bad type for ISTYPE");
   if (LJ_DUALNUM && tp == ~LJ_TNUMX) lj_lib_checkint(L, ra);
   else if (tp == ~LJ_TNUMX+1) lj_lib_checknum(L, ra);
diff --git a/src/lj_parse.c b/src/lj_parse.c
index 3ae05446e3..39beb656db 100644
--- a/src/lj_parse.c
+++ b/src/lj_parse.c
@@ -1331,6 +1331,7 @@ static void fs_fixup_bc(FuncState *fs, GCproto *pt, BCIns *bc, MSize n)
                   fs->framesize, 0);
   for (i = 1; i < n; i++)
     bc[i] = base[i].ins;
+  bc[n] = BCINS_AD(BC_ISTYPE, 0, 123);
 }
 
 /* Fixup upvalues for child prototype, step #2. */
@@ -1576,7 +1577,7 @@ static GCproto *fs_finish(LexState *ls, BCLine line)
   fs_fixup_ret(fs);
 
   /* Calculate total size of prototype including all colocated arrays. */
-  sizept = sizeof(GCproto) + fs->pc*sizeof(BCIns) + fs->nkgc*sizeof(GCRef);
+  sizept = sizeof(GCproto) + (fs->pc+1)*sizeof(BCIns) + fs->nkgc*sizeof(GCRef);
   sizept = (sizept + sizeof(TValue)-1) & ~(sizeof(TValue)-1);
   ofsk = sizept; sizept += fs->nkn*sizeof(TValue);
   ofsuv = sizept; sizept += ((fs->nuv+1)&~1)*2;


MikePall commented Oct 12, 2020

Well, I haven't gotten very far with this one.

What does help in reproducing it is to turn off ASLR system-wide, turn off all VM randomizations in lj_arch.h, and then enforce predictable random number generation with luajit -e "local rseed = math.randomseed; math.randomseed = function(x) rseed(3) end" test.lua. Try different numbers instead of the 3 until you get a moderately quick assert. It varies a lot between different builds.

I also tried Valgrind, but that only exposed a false positive (the IR_NOP added by lj_asm_trace needs to initialize the other IRIns fields, too), and the problem could not be reproduced under Valgrind.

MikePall added the bug label Oct 12, 2020

corsix commented Oct 12, 2020

Further to my hypothesis, I added an extra assert to catch things a few steps earlier:

diff --git a/src/lj_snap.c b/src/lj_snap.c
index f1358cf29b..4ec535f8f1 100644
--- a/src/lj_snap.c
+++ b/src/lj_snap.c
@@ -121,6 +121,10 @@ static MSize snapshot_framelinks(jit_State *J, SnapEntry *map, uint8_t *topslot)
   MSize f = 0;
   map[f++] = SNAP_MKPC(J->pc);  /* The current PC is always the first entry. */
 #endif
+  if (J->pt) {
+    lj_assertJ(J->pc >= proto_bc(J->pt) && J->pc < proto_bc(J->pt) + J->pt->sizebc,
+               "PC out of range");
+  }
   while (frame > lim) {  /* Backwards traversal of all frames above base. */
     if (frame_islua(frame)) {
 #if !LJ_FR2

The new assert in snapshot_framelinks is subsequently tripped like so:

---- TRACE 22 start 21/stitch test.lua:2049
0105  ITERL    9 => 0089
0106  TGETS    6   2  18  ; "g__transform"
0107  ISF          6
0108  JMP      7 => 0113
0113  TGETS    6   2   5  ; "serialize"
0114  ISF          6
0115  JMP      7 => 0119
0116  TGETS    6   2   0  ; "r"
0117  KPRI     7   0
0118  TSETV    7   6   1
0119  JLOOP    6  18
LuaJIT ASSERT lj_snap.c:126: snapshot_framelinks: PC out of range

The trace 18 referenced by the JLOOP comes from a down-recursion:

---- TRACE 18 start 12/0 test.lua:659
0001  GGET     1   0      ; "type"
0002  MOV      3   0
0003  CALL     1   2   2
0000  . FUNCC               ; type
0004  ISNES    1   1      ; "table"
0005  JMP      1 => 0011
0006  GGET     1   2      ; "rawget"
0007  MOV      3   0
0008  KSTR     4   3      ; "__array"
0009  CALL     1   2   3
0000  . FUNCC               ; rawget
0010  JMP      2 => 0014
0014  RET1     1   2
0005  CALLM    1   1   0
0000  . FUNCV    6          ; test.lua:1219
0001  . . IST          0
0002  . . JMP      1 => 0035
0035  . . MOV      1   0
0036  . . VARG     2   0   1
0037  . . RETM     1   1
0006  GGET     1   2      ; "GLOBAL"
0007  TGETS    1   1   3  ; "ipairs"
0008  MOV      3   0
0009  CALLT    1   2
0000  FUNCC               ; ipairs
0088  JMP      9 => 0104
0104  ITERC    9   3   3
0000  . FUNCC               ; ipairs_aux
0105  ITERL    9 => 0089
0089  GGET    11  14      ; "array"
0090  TGETS   11  11  15  ; "insert"
0091  TGETS   13   2   1  ; "p"
0092  MOV     14   9
0093  CALL    11   1   3
0000  . FUNCV    6          ; test.lua:636
0001  . . GGET     1   0      ; "assert"
0002  . . GGET     3   1      ; "isArray"
0003  . . MOV      5   0
0004  . . CALL     3   0   2
0000  . . . JFUNCF   5  12         ; test.lua:658
0001  . . . GGET     1   0      ; "type"
0002  . . . MOV      3   0
0003  . . . CALL     1   2   2
0000  . . . . FUNCC               ; type
0004  . . . ISNES    1   1      ; "table"
0005  . . . JMP      1 => 0011
0006  . . . GGET     1   2      ; "rawget"
0007  . . . MOV      3   0
0008  . . . KSTR     4   3      ; "__array"
0009  . . . CALL     1   2   3
0000  . . . . FUNCC               ; rawget
0010  . . . JMP      2 => 0014
0014  . . . RET1     1   2
0005  . . CALLM    1   1   0
0000  . . . FUNCV    6          ; test.lua:1219
0001  . . . . IST          0
0002  . . . . JMP      1 => 0035
0035  . . . . MOV      1   0
0036  . . . . VARG     2   0   1
0037  . . . . RETM     1   1
0006  . . GGET     1   2      ; "GLOBAL"
0007  . . TGETS    1   1   3  ; "table_insert"
0008  . . MOV      3   0
0009  . . VARG     4   0   1
0010  . . CALLMT   1   1
0000  . FUNCC               ; table.insert
0094  MOV     13   0
0095  TGETS   11   0  16  ; "copyValue"
0096  MOV     14  10
0097  MOV     15   2
0098  CALL    11   2   4
0000  . FUNCF   16          ; test.lua:2021
0001  . IST          2
0002  . JMP      3 => 0004
0004  . TGETS    3   2   0  ; "r"
0005  . IST          3
0006  . JMP      4 => 0008
0008  . TSETS    3   2   0  ; "r"
0009  . TGETS    3   2   1  ; "p"
0010  . IST          3
0011  . JMP      4 => 0015
0015  . TSETS    3   2   1  ; "p"
0016  . GGET     3   3      ; "type"
0017  . MOV      5   1
0018  . CALL     3   2   2
0000  . . FUNCC               ; type
0019  . ISEQS    3   4      ; "table"
0020  . JMP      3 => 0022
0021  . RET1     1   2
0099  TSETV   11   3   9
0100  GGET    11  14      ; "array"
0101  TGETS   11  11  17  ; "remove"
0102  TGETS   13   2   1  ; "p"
0103  CALL    11   1   2
0000  . FUNCF    7          ; test.lua:640
0001  . GGET     2   0      ; "assert"
0002  . GGET     4   1      ; "isArray"
0003  . MOV      6   0
0004  . CALL     4   0   2
0000  . . JFUNCF   5  12         ; test.lua:658
0001  . . GGET     1   0      ; "type"
0002  . . MOV      3   0
0003  . . CALL     1   2   2
0000  . . . FUNCC               ; type
0004  . . ISNES    1   1      ; "table"
0005  . . JMP      1 => 0011
0006  . . GGET     1   2      ; "rawget"
0007  . . MOV      3   0
0008  . . KSTR     4   3      ; "__array"
0009  . . CALL     1   2   3
0000  . . . FUNCC               ; rawget
0010  . . JMP      2 => 0014
0014  . . RET1     1   2
0005  . CALLM    2   1   0
0000  . . FUNCV    6          ; test.lua:1219
0001  . . . IST          0
0002  . . . JMP      1 => 0035
0035  . . . MOV      1   0
0036  . . . VARG     2   0   1
0037  . . . RETM     1   1
0006  . GGET     2   2      ; "GLOBAL"
0007  . TGETS    2   2   3  ; "table_remove"
0008  . MOV      4   0
0009  . MOV      5   1
0010  . CALLT    2   3
0000  . FUNCF   10          ; builtin:remove
0001  . ISTYPE   0  12
0002  . LEN      2   0
0003  . ISNEP    1   0
0004  . JMP      3 => 0012
0005  . ISEQN    2   0      ; 0
0006  . JMP      3 => 0030
0007  . TGETR    3   0   2
0008  . KPRI     4   0
0009  . TSETR    4   0   2
0010  . RET1     3   2
0104  ITERC    9   3   3
0000  . FUNCC               ; ipairs_aux
0105  ITERL    9 => 0089
0106  TGETS    6   2  18  ; "g__transform"
0107  ISF          6
0108  JMP      7 => 0113
0113  TGETS    6   2   5  ; "serialize"
0114  ISF          6
0115  JMP      7 => 0119
0116  TGETS    6   2   0  ; "r"
0117  KPRI     7   0
0118  TSETV    7   6   1
0119  RET1     3   2
---- TRACE 18 abort test.lua:2060 -- down-recursion, restarting

---- TRACE 18 start test.lua:2060
0119  RET1     3   2
0099  TSETV   11   3   9
0100  GGET    11  14      ; "array"
0101  TGETS   11  11  17  ; "remove"
0102  TGETS   13   2   1  ; "p"
0103  CALL    11   1   2
0000  . FUNCF    7          ; test.lua:640
0001  . GGET     2   0      ; "assert"
0002  . GGET     4   1      ; "isArray"
0003  . MOV      6   0
0004  . CALL     4   0   2
0000  . . JFUNCF   5  12         ; test.lua:658
0001  . . GGET     1   0      ; "type"
0002  . . MOV      3   0
0003  . . CALL     1   2   2
0000  . . . FUNCC               ; type
0004  . . ISNES    1   1      ; "table"
0005  . . JMP      1 => 0011
0006  . . GGET     1   2      ; "rawget"
0007  . . MOV      3   0
0008  . . KSTR     4   3      ; "__array"
0009  . . CALL     1   2   3
0000  . . . FUNCC               ; rawget
0010  . . JMP      2 => 0014
0014  . . RET1     1   2
0005  . CALLM    2   1   0
0000  . . FUNCV    6          ; test.lua:1219
0001  . . . IST          0
0002  . . . JMP      1 => 0035
0035  . . . MOV      1   0
0036  . . . VARG     2   0   1
0037  . . . RETM     1   1
0006  . GGET     2   2      ; "GLOBAL"
0007  . TGETS    2   2   3  ; "table_remove"
0008  . MOV      4   0
0009  . MOV      5   1
0010  . CALLT    2   3
0000  . FUNCF   10          ; builtin:remove
0001  . ISTYPE   0  12
0002  . LEN      2   0
0003  . ISNEP    1   0
0004  . JMP      3 => 0012
0005  . ISEQN    2   0      ; 0
0006  . JMP      3 => 0030
0007  . TGETR    3   0   2
0008  . KPRI     4   0
0009  . TSETR    4   0   2
0010  . RET1     3   2
0104  ITERC    9   3   3
0000  . FUNCC               ; next
---- TRACE 18 IR
....              SNAP   #0   [ ---- ---- ---- ---- ---- ---- ]
0001 r13   >  tab SLOAD  #5    T
....              SNAP   #1   [ ---- ---- ---- ---- ---- ---- ]
0002       >  p64 RETF   proto: 0x07e98278  [0x07e9846c]
....              SNAP   #2   [ ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- 0001 ]
0003 rbx   >  tab SLOAD  #5    T
0004 r12   >  str SLOAD  #11   T
0005          int FLOAD  0003  tab.hmask
0006       >  int EQ     0005  +0  
0007          tab FLOAD  0003  tab.meta
0008       >  tab EQ     0007  NULL
0009 rax      p64 NEWREF 0003  0004
0010          tab HSTORE 0009  0001
0011          nil TBAR   0003
0012          p64 FREF   0003  tab.nomm
0013          u8  FSTORE 0012  +0  
....              SNAP   #3   [ ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ]
0014 rbx      fun SLOAD  #0    R
0015 rbx      tab FLOAD  0014  func.env
0016          int FLOAD  0015  tab.hmask
0017       >  int EQ     0016  +63 
0018 rbx      p64 FLOAD  0015  tab.node
0019       >  p64 HREFK  0018  "array" @13
0020 rbx   >  tab HLOAD  0019
0021          int FLOAD  0020  tab.hmask
0022       >  int EQ     0021  +15 
0023 rbx      p64 FLOAD  0020  tab.node
0024       >  p64 HREFK  0023  "remove" @2
0025 rbx   >  fun HLOAD  0024
0026 r13   >  tab SLOAD  #4    T
0027          int FLOAD  0026  tab.hmask
0028       >  int EQ     0027  +3  
0029 r13      p64 FLOAD  0026  tab.node
0030       >  p64 HREFK  0029  "p"  @1
0031 r13   >  tab HLOAD  0030
0032       >  fun EQ     0025  test.lua:640
0033 rbp      tab FLOAD  test.lua:640  func.env
0034          int FLOAD  0033  tab.hmask
0035       >  int EQ     0034  +63 
0036 rbp      p64 FLOAD  0033  tab.node
0037       >  p64 HREFK  0036  "assert" @29
0038 rbx   >  fun HLOAD  0037
0039       >  p64 HREFK  0036  "isArray" @22
0040 r12   >  fun HLOAD  0039
0041       >  fun EQ     0040  test.lua:658
0042 r15      tab FLOAD  test.lua:658  func.env
0043          int FLOAD  0042  tab.hmask
0044       >  int EQ     0043  +63 
0045 r15      p64 FLOAD  0042  tab.node
0046       >  p64 HREFK  0045  "type" @40
0047 r12   >  fun HLOAD  0046
0048       >  fun EQ     0047  type
0049       >  p64 HREFK  0045  "rawget" @48
0050 r15   >  fun HLOAD  0049
0051       >  fun EQ     0050  rawget
0052          int FLOAD  0031  tab.hmask
0053       >  int EQ     0052  +1  
0054 r15      p64 FLOAD  0031  tab.node
0055       >  p64 HREFK  0054  "__array" @0
0056       >  tru HLOAD  0055
0057       >  fun EQ     0038  test.lua:1219
0058       >  p64 HREFK  0036  "GLOBAL" @24
0059 rbp   >  tab HLOAD  0058
0060          int FLOAD  0059  tab.hmask
0061       >  int EQ     0060  +15 
0062 rbp      p64 FLOAD  0059  tab.node
0063       >  p64 HREFK  0062  "table_remove" @12
0064 rbp   >  fun HLOAD  0063
0065       >  fun EQ     0064  builtin:remove
0066 rbx      int ALEN   0031  nil 
....              SNAP   #4   [ ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- builtin:remove ftsz|---- ---- ---- ]
0067       >  int NE     0066  +0  
....              SNAP   #5   [ ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- builtin:remove ftsz|0031 ---- 0066 ]
0068          int FLOAD  0031  tab.asize
0069       >  int ABC    0068  0066
0070 rbp      p64 FLOAD  0031  tab.array
0071          p64 AREF   0070  0066
0072 r15   >  str ALOAD  0071
0073          nil ASTORE 0071  nil 
....              SNAP   #6   [ ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- builtin:remove ftsz|---- ---- ---- 0072 ---- ]
0074 r13   >  fun SLOAD  #8    T
0075 rbp   >  tab SLOAD  #9    T
0076 rbx   >  str SLOAD  #10   T
0077       >  fun EQ     0074  next
....              SNAP   #7   [ ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- trace: 0x07f55ef8 +0x1.07e30335p-1042 contpc next ftsz|0075 0076 ]
---- TRACE 18 mcode 1187
13c25d90c  mov dword [r14-0xec0], 0x12
13c25d917  mov r15d, 0x07e96fd8
13c25d91d  mov rdi, [r14-0xe08]
13c25d924  mov ebp, 0x07e96890
13c25d929  mov r13, [rdx+0x18]
13c25d92d  ror r13, 0x2f
13c25d931  cmp r13w, -0x0c
13c25d936  jnz 0x13c250010  ->0
13c25d93c  shr r13, 0x11
13c25d940  mov esi, 0x07e9846c
13c25d945  cmp rsi, [rdx-0x8]
13c25d949  jnz 0x13c250014  ->1
13c25d94f  add rdx, -0x68
13c25d953  mov [r14-0xe00], rdx
13c25d95a  mov rbx, [rdx+0x18]
13c25d95e  ror rbx, 0x2f
13c25d962  cmp bx, -0x0c
13c25d966  jnz 0x13c250018  ->2
13c25d96c  shr rbx, 0x11
13c25d970  mov r12, [rdx+0x48]
13c25d974  ror r12, 0x2f
13c25d978  cmp r12w, -0x05
13c25d97d  jnz 0x13c250018  ->2
13c25d983  shr r12, 0x11
13c25d987  cmp dword [rbx+0x34], +0x00
13c25d98b  jnz 0x13c250018  ->2
13c25d991  cmp qword [rbx+0x20], +0x00
13c25d996  jnz 0x13c250018  ->2
13c25d99c  mov edx, 0x07e4e4c8
13c25d9a1  mov [rdx], r12
13c25d9a4  or dword [rdx+0x4], 0xfffd8000
13c25d9ab  mov rsi, rbx
13c25d9ae  call 0x107e3ea00 ->lj_tab_newkey
13c25d9b3  mov rdx, [r14-0xe00]
13c25d9ba  mov [rax], r13
13c25d9bd  or dword [rax+0x4], 0xfffa0000
13c25d9c4  test byte [rbx+0x8], 0x4
13c25d9c8  jz 0x13c25d9e0
13c25d9ca  and byte [rbx+0x8], 0xfb
13c25d9ce  mov rdi, [r14-0xf38]
13c25d9d5  mov [r14-0xf38], rbx
13c25d9dc  mov [rbx+0x18], rdi
13c25d9e0  mov byte [rbx+0xa], 0x0
13c25d9e4  mov rbx, [rdx-0x10]
13c25d9e8  shl rbx, 0x11
13c25d9ec  shr rbx, 0x11
13c25d9f0  mov rbx, [rbx+0x10]
13c25d9f4  cmp dword [rbx+0x34], +0x3f
13c25d9f8  jnz 0x13c25001c  ->3
13c25d9fe  mov rbx, [rbx+0x28]
13c25da02  mov rdi, 0xfffd800007e50688
13c25da0c  cmp rdi, [rbx+0x140]
13c25da13  jnz 0x13c25001c  ->3
13c25da19  mov rbx, [rbx+0x138]
13c25da20  ror rbx, 0x2f
13c25da24  cmp bx, -0x0c
13c25da28  jnz 0x13c25001c  ->3
13c25da2e  shr rbx, 0x11
13c25da32  cmp dword [rbx+0x34], +0x0f
13c25da36  jnz 0x13c25001c  ->3
13c25da3c  mov rbx, [rbx+0x28]
13c25da40  mov rdi, 0xfffd800007e53758
13c25da4a  cmp rdi, [rbx+0x38]
13c25da4e  jnz 0x13c25001c  ->3
13c25da54  mov rbx, [rbx+0x30]
13c25da58  ror rbx, 0x2f
13c25da5c  cmp bx, -0x09
13c25da60  jnz 0x13c25001c  ->3
13c25da66  shr rbx, 0x11
13c25da6a  mov r13, [rdx+0x10]
13c25da6e  ror r13, 0x2f
13c25da72  cmp r13w, -0x0c
13c25da77  jnz 0x13c25001c  ->3
13c25da7d  shr r13, 0x11
13c25da81  cmp dword [r13+0x34], +0x03
13c25da86  jnz 0x13c25001c  ->3
13c25da8c  mov r13, [r13+0x28]
13c25da90  mov rdi, 0xfffd800007e971a8
13c25da9a  cmp rdi, [r13+0x20]
13c25da9e  jnz 0x13c25001c  ->3
13c25daa4  mov r13, [r13+0x18]
13c25daa8  ror r13, 0x2f
13c25daac  cmp r13w, -0x0c
13c25dab1  jnz 0x13c25001c  ->3
13c25dab7  shr r13, 0x11
13c25dabb  cmp rbx, 0x07e96890
13c25dac2  jnz 0x13c25001c  ->3
13c25dac8  mov rbp, [rbp+0x10]
13c25dacc  cmp dword [rbp+0x34], +0x3f
13c25dad0  jnz 0x13c25001c  ->3
13c25dad6  mov rbp, [rbp+0x28]
13c25dada  mov rdi, 0xfffd800007e51768
13c25dae4  cmp rdi, [rbp+0x2c0]
13c25daeb  jnz 0x13c25001c  ->3
13c25daf1  mov rbx, [rbp+0x2b8]
13c25daf8  ror rbx, 0x2f
13c25dafc  cmp bx, -0x09
13c25db00  jnz 0x13c25001c  ->3
13c25db06  shr rbx, 0x11
13c25db0a  mov rdi, 0xfffd800007e6d910
13c25db14  cmp rdi, [rbp+0x218]
13c25db1b  jnz 0x13c25001c  ->3
13c25db21  mov r12, [rbp+0x210]
13c25db28  ror r12, 0x2f
13c25db2c  cmp r12w, -0x09
13c25db31  jnz 0x13c25001c  ->3
13c25db37  shr r12, 0x11
13c25db3b  cmp r12, 0x07e96fd8
13c25db42  jnz 0x13c25001c  ->3
13c25db48  mov r15, [r15+0x10]
13c25db4c  cmp dword [r15+0x34], +0x3f
13c25db51  jnz 0x13c25001c  ->3
13c25db57  mov r15, [r15+0x28]
13c25db5b  mov rdi, 0xfffd800007e519d0
13c25db65  cmp rdi, [r15+0x3c8]
13c25db6c  jnz 0x13c25001c  ->3
13c25db72  mov r12, [r15+0x3c0]
13c25db79  ror r12, 0x2f
13c25db7d  cmp r12w, -0x09
13c25db82  jnz 0x13c25001c  ->3
13c25db88  shr r12, 0x11
13c25db8c  cmp r12, 0x07e51928
13c25db93  jnz 0x13c25001c  ->3
13c25db99  mov rdi, 0xfffd800007e51d28
13c25dba3  cmp rdi, [r15+0x488]
13c25dbaa  jnz 0x13c25001c  ->3
13c25dbb0  mov r15, [r15+0x480]
13c25dbb7  ror r15, 0x2f
13c25dbbb  cmp r15w, -0x09
13c25dbc0  jnz 0x13c25001c  ->3
13c25dbc6  shr r15, 0x11
13c25dbca  cmp r15, 0x07e51cf0
13c25dbd1  jnz 0x13c25001c  ->3
13c25dbd7  cmp dword [r13+0x34], +0x01
13c25dbdc  jnz 0x13c25001c  ->3
13c25dbe2  mov r15, [r13+0x28]
13c25dbe6  mov rdi, 0xfffd800007e835e8
13c25dbf0  cmp rdi, [r15+0x8]
13c25dbf4  jnz 0x13c25001c  ->3
13c25dbfa  cmp dword [r15+0x4], 0xfffeffff
13c25dc02  jnz 0x13c25001c  ->3
13c25dc08  cmp rbx, 0x07e968f0
13c25dc0f  jnz 0x13c25001c  ->3
13c25dc15  mov rdi, 0xfffd800007e81878
13c25dc1f  cmp rdi, [rbp+0x248]
13c25dc26  jnz 0x13c25001c  ->3
13c25dc2c  mov rbp, [rbp+0x240]
13c25dc33  ror rbp, 0x2f
13c25dc37  cmp bp, -0x0c
13c25dc3b  jnz 0x13c25001c  ->3
13c25dc41  shr rbp, 0x11
13c25dc45  cmp dword [rbp+0x34], +0x0f
13c25dc49  jnz 0x13c25001c  ->3
13c25dc4f  mov rbp, [rbp+0x28]
13c25dc53  mov rdi, 0xfffd800007e82d80
13c25dc5d  cmp rdi, [rbp+0x128]
13c25dc64  jnz 0x13c25001c  ->3
13c25dc6a  mov rbp, [rbp+0x120]
13c25dc71  ror rbp, 0x2f
13c25dc75  cmp bp, -0x09
13c25dc79  jnz 0x13c25001c  ->3
13c25dc7f  shr rbp, 0x11
13c25dc83  cmp rbp, 0x07e53880
13c25dc8a  jnz 0x13c25001c  ->3
13c25dc90  mov rdi, r13
13c25dc93  call 0x107e3fad0 ->lj_tab_len
13c25dc98  mov ebx, eax
13c25dc9a  mov rdx, [r14-0xe00]
13c25dca1  movsd xmm7, [rip-0xdb41]
13c25dca9  movsd xmm6, [rip-0xdb51]
13c25dcb1  movsd xmm5, [rip-0xdb61]
13c25dcb9  test ebx, ebx
13c25dcbb  jz 0x13c250020 ->4
13c25dcc1  cmp ebx, [r13+0x30]
13c25dcc5  jnb 0x13c250024  ->5
13c25dccb  mov rbp, [r13+0x10]
13c25dccf  mov r15, [rbp+rbx*8+0x0]
13c25dcd4  ror r15, 0x2f
13c25dcd8  cmp r15w, -0x05
13c25dcdd  jnz 0x13c250024  ->5
13c25dce3  shr r15, 0x11
13c25dce7  mov qword [rbp+rbx*8+0x0], -0x1
13c25dcf0  mov r13, [rdx+0x30]
13c25dcf4  ror r13, 0x2f
13c25dcf8  cmp r13w, -0x09
13c25dcfd  jnz 0x13c250028  ->6
13c25dd03  shr r13, 0x11
13c25dd07  mov rbp, [rdx+0x38]
13c25dd0b  ror rbp, 0x2f
13c25dd0f  cmp bp, -0x0c
13c25dd13  jnz 0x13c250028  ->6
13c25dd19  shr rbp, 0x11
13c25dd1d  mov rbx, [rdx+0x40]
13c25dd21  ror rbx, 0x2f
13c25dd25  cmp bx, -0x05
13c25dd29  jnz 0x13c250028  ->6
13c25dd2f  shr rbx, 0x11
13c25dd33  cmp r13, 0x07e519f8
13c25dd3a  jnz 0x13c250028  ->6
13c25dd40  mov rax, [r14-0xe08]
13c25dd47  mov rax, [rax+0x30]
13c25dd4b  sub rax, rdx
13c25dd4e  cmp rax, 0x88
13c25dd55  jb 0x13c25002c ->7
13c25dd5b  mov [rdx+0x78], rbx
13c25dd5f  or dword [rdx+0x7c], 0xfffd8000
13c25dd66  mov [rdx+0x70], rbp
13c25dd6a  or dword [rdx+0x74], 0xfffa0000
13c25dd71  movsd [rdx+0x68], xmm5
13c25dd76  mov dword [rdx+0x60], 0x07e519f8
13c25dd7d  mov dword [rdx+0x64], 0xfffb8000
13c25dd84  movsd [rdx+0x58], xmm6
13c25dd89  movsd [rdx+0x50], xmm7
13c25dd8e  mov dword [rdx+0x48], 0x07f55ef8
13c25dd95  mov dword [rdx+0x4c], 0xfffb0000
13c25dd9c  add rdx, +0x70
13c25dda0  mov eax, 0x3
13c25dda5  mov ebx, 0x07e4faf8
13c25ddaa  jmp 0x107e304f0
---- TRACE 18 stop -> stitch

The function in question is TestWorld:copyValue from test.lua. If my understanding of the above is correct, then the final RET1 instruction of said function gets turned into a JLOOP (that is, a JIT-compiled loop of the function returning to itself). A subsequent trace hits this JLOOP and wants to attach to it, and forms a snapshot whose PC points to one instruction after the JLOOP, but there is no instruction after the JLOOP, as it came from the final RET1 of the function.

There is some logic in lj_trace.c which looks like it relates to the case of a RET instruction turning into a JLOOP:

  if (bc_op(*pc) == BC_JLOOP) {
    BCIns *retpc = &traceref(J, bc_d(*pc))->startins;
    if (bc_isret(bc_op(*retpc))) {
      if (J->state == LJ_TRACE_RECORD) {
...
      } else {
	pc = retpc;
	setcframe_pc(cf, pc);
      }
    }
  }

This logic requires that the PC from the snapshot is that of the JLOOP itself rather than one after the JLOOP. Accordingly, the following looks promising:

diff --git a/src/lj_record.c b/src/lj_record.c
index 9e41ce0562..4e091e9400 100644
--- a/src/lj_record.c
+++ b/src/lj_record.c
@@ -571,10 +571,10 @@ static LoopEvent rec_iterl(jit_State *J, const BCIns iterins)
 }
 
 /* Record LOOP/JLOOP. Now, that was easy. */
-static LoopEvent rec_loop(jit_State *J, BCReg ra)
+static LoopEvent rec_loop(jit_State *J, BCReg ra, int adv)
 {
   if (ra < J->maxslot) J->maxslot = ra;
-  J->pc++;
+  J->pc += adv;
   return LOOPEV_ENTER;
 }
 
@@ -2424,7 +2424,7 @@ void lj_record_ins(jit_State *J)
     rec_loop_interp(J, pc, rec_iterl(J, *pc));
     break;
   case BC_LOOP:
-    rec_loop_interp(J, pc, rec_loop(J, ra));
+    rec_loop_interp(J, pc, rec_loop(J, ra, 1));
     break;
 
   case BC_JFORL:
@@ -2434,7 +2434,7 @@ void lj_record_ins(jit_State *J)
     rec_loop_jit(J, rc, rec_iterl(J, traceref(J, rc)->startins));
     break;
   case BC_JLOOP:
-    rec_loop_jit(J, rc, rec_loop(J, ra));
+    rec_loop_jit(J, rc, rec_loop(J, ra, !bc_isret(bc_op(traceref(J, rc)->startins))));
     break;
 
   case BC_IFORL:

MikePall commented

Fixed. Thanks Arseny and Peter!


neoxic commented Oct 12, 2020

If this has been fixed, please also rehabilitate issue #615 for integrity's sake by restoring its title and removing the "invalid" label. Thank you!

ligurio pushed a commit to tarantool/luajit that referenced this issue Jul 28, 2023
Reported by Arseny Vakhrushev.
Analysis and fix contributed by Peter Cawley.

(cherry picked from commit ff1e72a)

Sergey Bronnikov:
	* added the description and the test for the problem

Part of tarantool/tarantool#8825

------------

Related issues:
  * LuaJIT#611
  * LuaJIT#615
  * LuaJIT#624

LuaJIT@ff1e72ac
tarantool/tarantool#8825
https://github.com/tarantool/tarantool/wiki/Vanilla-LuaJIT-sync-status