Understanding "last iteration" trace aborts #1393

Open
alexandergall opened this issue Dec 10, 2018 · 2 comments

@alexandergall
Contributor

alexandergall commented Dec 10, 2018

I'm trying to find an explanation for a certain kind of trace abort that is among the main causes of loop blacklisting by the compiler in my use cases.

As an example, I'm using code from the apps.ipv6.fragmenter app, which is currently waiting to be upstreamed (#1383 and #1384). The actual code doesn't matter; I see the same effect all the time with various pieces of code. I have not found a reasonably small program that exhibits the issue in a reproducible manner.

Here is the relevant part of the push() method of this particular version of the fragmenter app:

   for _ = 1, link.nreadable(input) do  -- Code line 313
      local pkt = link.receive(input)
      local mtu = self.mtu
      if self.pmtud then
         local h = ffi.cast(ether_ipv6_header_ptr_t, pkt.data)
         local entry = self.dcache:lookup_ptr(h.ipv6.dst_ip)
         if entry then
            mtu = entry.value.mtu
         end
      end
      -- FIXME: assumes that there is always room to store the MTU at
      -- the end of the payload.
      ffi.cast("uint16_t *", pkt.data + pkt.length)[0] = mtu
      if pkt.length <= mtu + ether_header_len then
         -- No need to fragment; forward it on.
         counter.add(self.shm["out-ipv6-frag-not"])
         link.transmit(output, pkt)
      else
         -- Packet doesn't fit into MTU; need to fragment.
         link.transmit(input, pkt)
      end
   end

   for _ = 1, link.nreadable(input) do -- Code line 336
      local pkt = link.receive(input)
      local mtu = ffi.cast("uint16_t *", pkt.data + pkt.length)[0]
      local next_header =
         ffi.cast(ether_ipv6_header_ptr_t, pkt.data).ipv6.next_header
      pkt_box[0] = pkt
      self:fragment_and_transmit(next_header, pkt_box, mtu)
      packet.free(pkt_box[0])
   end

The line numbers that are relevant in the following JIT dumps are 313 (the start of the first loop) and 336 (the start of the second loop).

In the JIT dump, there are multiple attempts to compile the loop starting at line 313, but they all fail and the loop eventually gets blacklisted. All of these attempts come in one of exactly two flavors. The first one looks like this:

---- TRACE 584 start fragment.lua:313
0124  KSHORT   5   1
---- TRACE 584 abort fragment.lua:336 -- leaving loop in root trace

What's happening here is that the interpreter is in the middle of an iteration of the loop when the FORL byte code becomes hot (there must have been at least one iteration at this point, because otherwise the FORL byte code would not have been executed at all). The FORL instruction is not part of the trace, but it is evaluated, which means that the loop variable is first incremented and checked against the upper limit (which was determined by link.nreadable(input) earlier, when the FORI byte code was executed). If the loop condition still holds, the interpreter jumps to the loop body and recording continues.

However, in this case it must have turned out that the loop condition no longer holds (i.e. the iteration that ran just before recording started was the last one), and the interpreter records the first instruction that follows the loop, which happens to be the load of the lower bound of the loop starting at line 336, corresponding to the byte code KSHORT 5 1. The recorder notices that this instruction is not within the body of the loop and aborts.
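To make the mechanics concrete, here is a toy model of this first flavor (Python, not LuaJIT source; the function and strings are mine): FORL is executed rather than recorded, so if that execution exits the loop, the very first recorded instruction already lies outside the loop body.

```python
# Toy model (not LuaJIT source) of the first abort flavor: recording starts
# when the FORL byte code becomes hot, but FORL itself is executed rather
# than recorded. If that execution exits the loop, the first instruction
# the recorder sees lies outside the loop body, and it must abort.

def attempt_trace(i, limit):
    """Simulate the recorder triggering at the FORL of iteration i."""
    i += 1                      # FORL: increment loop variable (executed, not recorded)
    if i <= limit:              # loop condition still holds:
        return "record loop body"
    # Condition failed: the first recorded instruction is the KSHORT that
    # loads the lower bound of the *next* loop -- outside this loop.
    return "abort: leaving loop in root trace"

print(attempt_trace(3, 10))   # hot mid-loop: recording continues normally
print(attempt_trace(10, 10))  # hot on the last iteration: immediate abort
```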

The second flavor of aborted traces related to that loop is this:

---- TRACE 567 start fragment.lua:313
0071  UGET     9   0      ; link
0072  TGETS    9   9   7  ; "receive"
0073  MOV     10   1
0074  CALL     9   2   2
0000  . JFUNCF   5  17         ; link.lua:57
0001  . TGETS    1   0   0  ; "packets"
0000  . . . FUNCC               ; ffi.meta.__index
0002  . TGETS    2   0   1  ; "read"
0000  . . . FUNCC               ; ffi.meta.__index
0003  . TGETV    1   1   2
0000  . . . FUNCC               ; ffi.meta.__index
0004  . UGET     2   0      ; band
0005  . TGETS    3   0   1  ; "read"
0000  . . . FUNCC               ; ffi.meta.__index
0006  . ADDVN    3   3   0  ; 1
0007  . UGET     4   1      ; size
0008  . SUBVN    4   4   0  ; 1
0009  . CALL     2   2   3
0000  . . FUNCC               ; bit.band
0010  . TSETS    2   0   1  ; "read"
0000  . . . FUNCC               ; ffi.meta.__newindex
0011  . UGET     2   2      ; counter
0012  . TGETS    2   2   2  ; "add"
0013  . TGETS    3   0   3  ; "stats"
0000  . . . FUNCC               ; ffi.meta.__index
0014  . TGETS    3   3   4  ; "rxpackets"
0000  . . . FUNCC               ; ffi.meta.__index
0015  . CALL     2   1   2
0000  . . JFUNCF   4   5         ; counter.lua:91
0001  . . TGETS    2   0   0  ; "c"
0000  . . . . FUNCC               ; ffi.meta.__index
0002  . . ISTC     3   1
0003  . . JMP      3 => 0005
0004  . . KSHORT   3   1
0005  . . ADDVV    2   2   3
0000  . . . . FUNCC               ; ffi.meta.__add
0006  . . TSETS    2   0   0  ; "c"
0000  . . . . FUNCC               ; ffi.meta.__newindex
0007  . . RET0     0   1
0016  . UGET     2   2      ; counter
0017  . TGETS    2   2   2  ; "add"
0018  . TGETS    3   0   3  ; "stats"
0000  . . . FUNCC               ; ffi.meta.__index
0019  . TGETS    3   3   5  ; "rxbytes"
0000  . . . FUNCC               ; ffi.meta.__index
0020  . TGETS    4   1   6  ; "length"
0000  . . . FUNCC               ; ffi.meta.__index
0021  . CALL     2   1   3
0000  . . JFUNCF   4   5         ; counter.lua:91
0001  . . TGETS    2   0   0  ; "c"
0000  . . . . FUNCC               ; ffi.meta.__index
0002  . . ISTC     3   1
0003  . . JMP      3 => 0005
0005  . . ADDVV    2   2   3
0000  . . . . FUNCC               ; ffi.meta.__add
0006  . . TSETS    2   0   0  ; "c"
0000  . . . . FUNCC               ; ffi.meta.__newindex
0007  . . RET0     0   1
0022  . RET1     1   2
0075  TGETS   10   0  18  ; "mtu"
0076  TGETS   11   0  19  ; "pmtud"
0077  ISF         11
0078  JMP     12 => 0094
0079  UGET    11   1      ; ffi
0080  TGETS   11  11   8  ; "cast"
0081  UGET    12   2      ; ether_ipv6_header_ptr_t
0082  TGETS   13   9   9  ; "data"
0000  . . FUNCC               ; ffi.meta.__index
0083  CALL    11   2   3
0000  . FUNCC               ; ffi.cast
0084  TGETS   12   0  20  ; "dcache"
0085  MOV     13  12
0086  TGETS   12  12  21  ; "lookup_ptr"
0087  TGETS   14  11  22  ; "ipv6"
0000  . . FUNCC               ; ffi.meta.__index
0088  TGETS   14  14  23  ; "dst_ip"
0000  . . FUNCC               ; ffi.meta.__index
0089  CALL    12   2   3
0000  . FUNCF    7          ; ctable.lua:367
0001  . TGETS    2   0   0  ; "hash_fn"
0002  . MOV      3   1
0003  . CALL     2   2   2
0000  . . FUNCC               ; ffi.meta.__call
0004  . TGETS    3   0   1  ; "entries"
0005  . UGET     4   0      ; hash_to_index
0006  . MOV      5   2
0007  . TGETS    6   0   2  ; "scale"
0008  . CALL     4   2   3
0000  . . FUNCF    4          ; ctable.lua:69
0001  . . UGET     2   0      ; floor
0002  . . MULVV    3   0   1
0003  . . CALL     2   2   2
0000  . . . FUNCC               ; math.floor
0004  . . RET1     2   2
0009  . ADDVV    3   3   4
0000  . . . FUNCC               ; ffi.meta.__add
0010  . TGETS    4   3   3  ; "hash"
0000  . . . FUNCC               ; ffi.meta.__index
0011  . ISNEV    2   4
0012  . JMP      4 => 0020
0020  . TGETS    4   3   3  ; "hash"
0000  . . . FUNCC               ; ffi.meta.__index
0021  . ISGE     4   2
0022  . JMP      4 => 0026
0026  . TGETS    4   3   3  ; "hash"
0000  . . . FUNCC               ; ffi.meta.__index
0027  . ISNEV    4   2
0028  . JMP      4 => 0039
0039  . KPRI     4   0
0040  . RET1     4   2
0090  ISF         12
0091  JMP     13 => 0094
0094  UGET    11   1      ; ffi
0095  TGETS   11  11   8  ; "cast"
0096  KSTR    12  25      ; "uint16_t *"
0097  TGETS   13   9   9  ; "data"
0000  . . FUNCC               ; ffi.meta.__index
0098  TGETS   14   9  16  ; "length"
0000  . . FUNCC               ; ffi.meta.__index
0099  ADDVV   13  13  14
0000  . . FUNCC               ; ffi.meta.__add
0100  CALL    11   2   3
0000  . FUNCC               ; ffi.cast
0101  TSETB   10  11   0
0000  . . FUNCC               ; ffi.meta.__newindex
0102  TGETS   11   9  16  ; "length"
0000  . . FUNCC               ; ffi.meta.__index
0103  UGET    12   8      ; ether_header_len
0104  ADDVV   12  10  12
0105  ISGT    11  12
0106  JMP     11 => 0118
0107  UGET    11   5      ; counter
0108  TGETS   11  11  12  ; "add"
0109  TGETS   12   0  13  ; "shm"
0110  TGETS   12  12  14  ; "out-ipv6-frag-not"
0111  CALL    11   1   2
0000  . JFUNCF   4   5         ; counter.lua:91
0001  . TGETS    2   0   0  ; "c"
0000  . . . FUNCC               ; ffi.meta.__index
0002  . ISTC     3   1
0003  . JMP      3 => 0005
0004  . KSHORT   3   1
0005  . ADDVV    2   2   3
0000  . . . FUNCC               ; ffi.meta.__add
0006  . TSETS    2   0   0  ; "c"
0000  . . . FUNCC               ; ffi.meta.__newindex
0007  . RET0     0   1
0112  UGET    11   0      ; link
0113  TGETS   11  11  15  ; "transmit"
0114  MOV     12   2
0115  MOV     13   9
0116  CALL    11   1   3
0000  . JFUNCF   5   6         ; link.lua:71
0001  . GGET     2   0      ; "full"
0002  . MOV      3   0
0003  . CALL     2   2   2
0000  . . FUNCF    4          ; link.lua:90
0001  . . UGET     1   0      ; band
0002  . . TGETS    2   0   0  ; "write"
0000  . . . . FUNCC               ; ffi.meta.__index
0003  . . ADDVN    2   2   0  ; 1
0004  . . UGET     3   1      ; size
0005  . . SUBVN    3   3   0  ; 1
0006  . . CALL     1   2   3
0000  . . . FUNCC               ; bit.band
0007  . . TGETS    2   0   1  ; "read"
0000  . . . . FUNCC               ; ffi.meta.__index
0008  . . ISEQV    1   2
0009  . . JMP      1 => 0012
0010  . . KPRI     1   1
0011  . . JMP      2 => 0013
0013  . . RET1     1   2
0004  . ISF          2
0005  . JMP      3 => 0016
0016  . TGETS    2   0   5  ; "packets"
0000  . . . FUNCC               ; ffi.meta.__index
0017  . TGETS    3   0   6  ; "write"
0000  . . . FUNCC               ; ffi.meta.__index
0018  . TSETV    1   2   3
0000  . . . FUNCC               ; ffi.meta.__newindex
0019  . UGET     2   2      ; band
0020  . TGETS    3   0   6  ; "write"
0000  . . . FUNCC               ; ffi.meta.__index
0021  . ADDVN    3   3   0  ; 1
0022  . UGET     4   3      ; size
0023  . SUBVN    4   4   0  ; 1
0024  . CALL     2   2   3
0000  . . FUNCC               ; bit.band
0025  . TSETS    2   0   6  ; "write"
0000  . . . FUNCC               ; ffi.meta.__newindex
0026  . UGET     2   0      ; counter
0027  . TGETS    2   2   1  ; "add"
0028  . TGETS    3   0   2  ; "stats"
0000  . . . FUNCC               ; ffi.meta.__index
0029  . TGETS    3   3   7  ; "txpackets"
0000  . . . FUNCC               ; ffi.meta.__index
0030  . CALL     2   1   2
0000  . . JFUNCF   4   5         ; counter.lua:91
0001  . . TGETS    2   0   0  ; "c"
0000  . . . . FUNCC               ; ffi.meta.__index
0002  . . ISTC     3   1
0003  . . JMP      3 => 0005
0004  . . KSHORT   3   1
0005  . . ADDVV    2   2   3
0000  . . . . FUNCC               ; ffi.meta.__add
0006  . . TSETS    2   0   0  ; "c"
0000  . . . . FUNCC               ; ffi.meta.__newindex
0007  . . RET0     0   1
0031  . UGET     2   0      ; counter
0032  . TGETS    2   2   1  ; "add"
0033  . TGETS    3   0   2  ; "stats"
0000  . . . FUNCC               ; ffi.meta.__index
0034  . TGETS    3   3   8  ; "txbytes"
0000  . . . FUNCC               ; ffi.meta.__index
0035  . TGETS    4   1   9  ; "length"
0000  . . . FUNCC               ; ffi.meta.__index
0036  . CALL     2   1   3
0000  . . JFUNCF   4   5         ; counter.lua:91
0001  . . TGETS    2   0   0  ; "c"
0000  . . . . FUNCC               ; ffi.meta.__index
0002  . . ISTC     3   1
0003  . . JMP      3 => 0005
0005  . . ADDVV    2   2   3
0000  . . . . FUNCC               ; ffi.meta.__add
0006  . . TSETS    2   0   0  ; "c"
0000  . . . . FUNCC               ; ffi.meta.__newindex
0007  . . RET0     0   1
0037  . RET0     0   1
0117  JMP     11 => 0123
0123  FORL     5 => 0071
---- TRACE 567 abort fragment.lua:336 -- leaving loop in root trace

In this case, execution of the FORL instruction results in the recording of an actual iteration of the loop. The trace passes through the then-branches of both if statements, executes link.transmit(output, pkt) and finally arrives at the FORL instruction of the first loop (located at the end of the loop body). Why does it abort there? The trace recorder recognizes that this iteration happened to be the last one for this execution of the loop. Therefore, it cannot record the loop back-branch itself and must abort.
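Sketched the same way (again a toy Python model with names of my choosing, not LuaJIT source), the second flavor differs only in where recording starts: a full iteration gets recorded, and the decision falls at the FORL at the end of the body.

```python
# Toy model of the second flavor: recording starts at the top of the loop
# body and records one full iteration. To close the trace into a loop, the
# back-branch of the FORL at the end of the body must be taken; on the last
# iteration it falls through instead, and the recorder must abort.

def record_iteration(i, limit):
    trace = ["0071..0122: loop body"]   # the byte codes recorded in TRACE 567
    i += 1                              # 0123 FORL
    if i <= limit:
        trace.append("back-branch taken: loop trace can be closed")
    else:
        trace.append("abort: leaving loop in root trace")
    return trace

print(record_iteration(5, 10)[-1])   # mid-loop: trace closes
print(record_iteration(10, 10)[-1])  # last iteration: abort
```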

Here is the complete sequence of aborted traces that eventually leads to the blacklisting of the FORL byte code of the loop starting at line 313:

---- TRACE 567 start fragment.lua:313
0071  UGET     9   0      ; link
...
0123  FORL     5 => 0071
---- TRACE 567 abort fragment.lua:336 -- leaving loop in root trace
---- TRACE 574 start fragment.lua:313
0124  KSHORT   5   1
---- TRACE 574 abort fragment.lua:336 -- leaving loop in root trace
---- TRACE 584 start fragment.lua:313
0124  KSHORT   5   1
---- TRACE 584 abort fragment.lua:336 -- leaving loop in root trace
---- TRACE 585 start fragment.lua:313
0071  UGET     9   0      ; link
...
0123  FORL     5 => 0071
---- TRACE 585 abort fragment.lua:336 -- leaving loop in root trace
---- TRACE 597 start fragment.lua:313
0124  KSHORT   5   1
---- TRACE 597 abort fragment.lua:336 -- leaving loop in root trace
---- TRACE 601 start fragment.lua:313
0071  UGET     9   0      ; link
...
0123  FORL     5 => 0071
---- TRACE 601 abort fragment.lua:336 -- leaving loop in root trace
---- TRACE 601 start fragment.lua:313
0124  KSHORT   5   1
---- TRACE 601 abort fragment.lua:336 -- leaving loop in root trace
---- TRACE 603 start fragment.lua:313
0124  KSHORT   5   1
---- TRACE 603 abort fragment.lua:336 -- leaving loop in root trace
---- TRACE 608 start fragment.lua:313
0071  UGET     9   0      ; link
...
0123  FORL     5 => 0071
---- TRACE 608 abort fragment.lua:336 -- leaving loop in root trace
---- TRACE 611 start fragment.lua:313
0124  KSHORT   5   1
---- TRACE 611 abort fragment.lua:336 -- leaving loop in root trace
---- TRACE 613 start fragment.lua:313
0071  UGET     9   0      ; link
...
0123  FORL     5 => 0071
---- TRACE 613 abort fragment.lua:336 -- leaving loop in root trace

After the abort of trace 613, the FORL is replaced by an IFORL, and all traces passing through it from then on abort with something like

0123  IFORL    5 => 0071
---- TRACE 634 abort fragment.lua:313 -- blacklisted

What all of this means, I guess, is that this loop appears to have exactly one or two iterations left whenever it becomes hot, right up until it is blacklisted. But how can that be?

The loop starts out with the default hot counter (it is counted down, and reaching zero triggers the recorder). The first thing to note is that loop byte codes don't have individual hot counters but share a pretty small hash table of counters indexed (essentially) by the address of the byte code. That means that a particular loop can be traced earlier than expected, but that does no harm and seems unrelated to the issue under discussion.
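A sketch of that scheme, for illustration (the constants and the hash function are my assumptions from reading LuaJIT 2.1's sources, and the effective per-backedge decrement is simplified; this is a model, not the real implementation):

```python
# Sketch of LuaJIT-style shared hot counters (assumed: a 64-entry table
# indexed by hashing the byte-code PC, default 'hotloop' threshold of 56
# backedges). Two distinct loops can hash to the same slot, so one loop's
# iterations can make another loop "hot" earlier than its own count alone
# would suggest.

HOTCOUNT_SIZE = 64   # number of shared counter slots (assumed)
HOT_THRESHOLD = 56   # default 'hotloop' parameter (assumed)

counters = [HOT_THRESHOLD] * HOTCOUNT_SIZE

def hotcount_slot(pc):
    # byte codes are 4 bytes apart, hence the >> 2 before masking (assumed)
    return (pc >> 2) & (HOTCOUNT_SIZE - 1)

def on_loop_backedge(pc):
    """Returns True when the loop at `pc` triggers the trace recorder."""
    slot = hotcount_slot(pc)
    counters[slot] -= 1
    if counters[slot] <= 0:
        counters[slot] = HOT_THRESHOLD   # reset before recording starts
        return True
    return False

# Two loops whose byte codes alias to the same slot share one counter:
pc_a, pc_b = 0x1000, 0x1000 + 64 * 4
assert hotcount_slot(pc_a) == hotcount_slot(pc_b)
```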

Each time recording is aborted, the hot counter is reset with a penalty value added, so it takes more iterations until the next recording attempt. The penalty value is not constant but includes some pseudo-randomness.
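As a sketch of that back-off (constants assumed from my reading of LuaJIT 2.1's lj_trace.c: a minimum penalty of 36, a maximum of 60000, 4 bits of randomness, roughly doubling on each abort; again a model, not the real code):

```python
import random

# Sketch of the abort penalty as I read it in LuaJIT 2.1's lj_trace.c
# (assumed constants: PENALTY_MIN 36, PENALTY_MAX 60000, 4 random bits).
# Each abort roughly doubles the hot-count penalty, plus a little
# pseudo-randomness; once the penalty would exceed the maximum, the
# byte code is blacklisted instead of being penalized again.

PENALTY_MIN = 36
PENALTY_MAX = 60000
PENALTY_RNDBITS = 4

def next_penalty(val):
    """Penalty after one more abort; None means the loop is blacklisted."""
    val = (val << 1) + random.getrandbits(PENALTY_RNDBITS)
    return None if val > PENALTY_MAX else val

# Count how many consecutive aborts a loop survives before blacklisting:
val, aborts = PENALTY_MIN, 1
while (val := next_penalty(val)) is not None:
    aborts += 1
print(aborts)  # → 11: with these constants, the 12th abort blacklists
```

With these assumed constants the model predicts eleven penalized aborts before blacklisting, which happens to match the eleven aborted attempts (traces 567 through 613) in the dump above.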

Given all that, how can it be that recording always happens during the last one or two iterations? The only reasonable explanation seems to be that the loop actually never has more than two packets to process over an extended period of time. I guess this can, in fact, happen occasionally, but I'm seeing the same behavior over and over with all sorts of loops, which makes this explanation look pretty improbable to me.

I'm looking for a deeper understanding of this phenomenon because it seems to be at the core of some of the performance issues I see when my programs are subjected to certain changes in workload.

@lukego
Member

lukego commented Dec 11, 2018

Great writeup!

I have to read the JIT code and think about this but please indulge me in a hot-take :-)

Is this just a bug in lj_record.c? If the JIT is waiting for the program counter to return to the start of the loop in order to complete the trace, and this is not happening because the loop is terminating, then perhaps the JIT should instead complete the trace when it reaches the FORL, i.e. "pretend" for the purpose of code generation that the branch back into the loop is taken? This seems straightforward if the taken branch would have immediately completed the trace without recording any further instructions.

Generally I am bugged whenever heuristics like "leaving loop in root trace" lead to blacklistings. Just shouldn't happen IMHO.

@lukego
Member

lukego commented Dec 11, 2018

I have been able to reproduce this with a minimal example over at raptorjit/raptorjit#203.
