R. Bernstein edited this page Oct 29, 2019 · 5 revisions

See https://github.com/rocky/python-uncompyle6/issues/283

Step 1: Create test case and reproduce the problem

The first step is to create a little test case.

There was one in the issue:

G = ( c for c in "spam, Spam, SPAM!" if c > 'A' and c < 'S')

I used pyenv to switch to Python 2.6.9 and produced a byte-compiled file:

$ pyenv local 2.6.9
$ ./test/stdlib/compile-file.py  /tmp/bug.py
compiling /tmp/bug.py to /tmp/bug-2.6.pyc
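
The compile-file.py helper itself is not shown on this page. As a rough stand-in, here is a minimal sketch that uses only the standard library's py_compile module and names the output after the running interpreter's version, mirroring the /tmp/bug-2.6.pyc naming above. The function name compile_with_version is hypothetical, and the helper in the repository may work differently.

```python
import os
import py_compile
import sys


def compile_with_version(source_path, out_dir=None):
    """Byte-compile source_path with whichever interpreter runs this code.

    The result is written as <basename>-<major>.<minor>.pyc.
    Hypothetical stand-in; this is NOT the repository's compile-file.py.
    """
    base = os.path.splitext(os.path.basename(source_path))[0]
    out_dir = out_dir or os.path.dirname(os.path.abspath(source_path))
    version = "%d.%d" % sys.version_info[:2]
    target = os.path.join(out_dir, "%s-%s.pyc" % (base, version))
    # doraise=True turns a compile failure into an exception instead of
    # a message on stderr, so a broken test case is noticed right away.
    py_compile.compile(source_path, cfile=target, doraise=True)
    return target
```

Run under the interpreter that pyenv selected, calling compile_with_version("/tmp/bug.py") would write a file named after that interpreter's version next to the source.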

Now run:

$ ./bin/uncompyle6 /tmp/bug-2.6.pyc
...
Parse error at or near `None' instruction at offset -1
$ 

OK, I can reproduce the problem. The offset -1 basically means that there is no code before we hit the error. When this happens, you can sometimes add a valid statement before the failing code, and that will often produce an offset that is more easily tracked. However, here the error is inside the generator ( c for c in "spam, Spam, SPAM!" if c > 'A' and c < 'S'), so this can't easily be done.

Step 2: See if there is a neighboring version that doesn't have the problem

It so happens that this bug doesn't appear in Python 2.7:

$ pyenv local 2.7.16
$ ./test/stdlib/compile-file.py  /tmp/bug.py
compiling /tmp/bug.py to /tmp/bug-2.7.pyc
$ ./bin/uncompyle6 /tmp/bug-2.7.pyc
...
G = (c for c in 'spam, Spam, SPAM!' if c > 'A' and c < 'S')
# okay decompiling /tmp/bug-2.7.pyc

Great! We now have a way to home in on the parser error.

Step 3: Compare working and non-working with debug output

Now that we have two examples, one that works in uncompyle6 and one that doesn't, we can get debug output and see where things go awry. The options I often use are -agT:

$ ./bin/uncompyle6  -agT /tmp/bug-2.7.pyc > /tmp/bug-good.log
$ ./bin/uncompyle6  -agT /tmp/bug-2.6.pyc > /tmp/bug-bad.log 2>&1
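
Because bytecode offsets and token counts differ between the two versions, a plain diff of the two logs is noisy. The following is a minimal sketch, not part of uncompyle6, that keeps only the grammar-reduction lines, strips the leading offset (or offset range) and the trailing token count, and then diffs what is left. The regular expression is an assumption based on the trace lines quoted on this page.

```python
import difflib
import re

# Matches lines such as
#     "         9-37  comp_if ::= expr jmp_false comp_iter (15)"
# and captures only "comp_if ::= expr jmp_false comp_iter".
REDUCTION = re.compile(
    r"^\s*(?:\d+(?:-\d+)?\s+)?(\S+ ::= .*?)\s*\(\d+\)\s*$"
)


def reductions(lines):
    """Return the grammar reductions in lines, without offsets or counts."""
    found = []
    for line in lines:
        match = REDUCTION.match(line)
        if match:
            found.append(match.group(1))
    return found


def diff_traces(good_lines, bad_lines):
    """Unified diff of the normalized reductions from two debug logs."""
    return list(difflib.unified_diff(
        reductions(good_lines), reductions(bad_lines),
        "good", "bad", lineterm=""))
```

To use it on the logs produced above, read /tmp/bug-good.log and /tmp/bug-bad.log with readlines() and pass the two lists to diff_traces(). Lines such as the pseudo-assembly listing or "Reduce and invalid by check" are ignored because they do not look like reductions.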

Looking at bug-bad.log:

The first set of pseudo-assembly instructions that starts:

L.   1       0  LOAD_CONST               'spam, Spam, SPAM!'
             3  BUILD_LIST_1          1 
             6  STORE_NAME            0  'x'

can be skipped because it is just the setup for calling the generator.

And so can the grammar reductions that occur right after it:

               expr ::= LOAD_CONST (1)
               ret_expr ::= expr (1)
               assert_expr ::= expr (1)
               list ::= expr BUILD_LIST_1 (2)

So you want to focus attention on the grammar reductions that occur after those. They start:

         3     expr ::= LOAD_FAST (2)
         3     ret_expr ::= expr (2)
         3     assert_expr ::= expr (2)
         9     store ::= STORE_FAST (4)
        12     expr ::= LOAD_FAST (5)
        15     expr ::= LOAD_CONST (6)
        12-18  compare_single ::= expr expr COMPARE_OP (7)
        12     compare ::= compare_single (7)
        12     expr ::= compare (7)

And for Python 2.7, which works, the corresponding trace looks like:

               expr ::= LOAD_FAST (1)
               ret_expr ::= expr (1)
               assert_expr ::= expr (1)
         6     store ::= STORE_FAST (3)
         9     expr ::= LOAD_FAST (4)
        12     expr ::= LOAD_CONST (5)
         9-15  compare_single ::= expr expr COMPARE_OP (6)
         9     compare ::= compare_single (6)
         9     expr ::= compare (6)

Down toward the bottom in the "good" 2.7 parse we have:

         9-36  and ::= expr jmp_false expr \e_come_from_opt (14)
Reduce and invalid by check
        33-37  gen_comp_body ::= expr YIELD_VALUE POP_TOP (15)
        33     comp_body ::= gen_comp_body (15)
        33     comp_iter ::= comp_body (15)
         9-37  comp_if ::= expr jmp_false comp_iter (15)
         9     comp_iter ::= comp_if (15)
               genexpr_func ::= LOAD_FAST FOR_ITER store comp_iter JUMP_BACK (16)
               stmt ::= genexpr_func (16)
               sstmt ::= stmt (16)
               stmts ::= sstmt (16)
               START ::= |- stmts (16)

while in the bad 2.6 reductions this corresponds to:

        12-41  and ::= expr jmp_false expr \e_come_from_opt (16)
Reduce and invalid by check
        38-42  gen_comp_body ::= expr YIELD_VALUE POP_TOP (17)
        38     comp_body ::= gen_comp_body (17)
        38     comp_iter ::= comp_body (17)
        25-42  comp_if ::= expr jmp_false comp_iter (17)
        12-42  comp_if ::= expr jmp_false comp_iter (17)
        25     comp_iter ::= comp_if (17)
        12     comp_iter ::= comp_if (17)
         3-43  genexpr_func ::= LOAD_FAST FOR_ITER store comp_iter JUMP_BACK (18)
         3     stmt ::= genexpr_func (18)
         3     _stmts ::= stmt (18)
         3     l_stmts ::= _stmts (18)
         3     l_stmts_opt ::= l_stmts (18)
        46     come_from_opt ::= COME_FROM (19)

The thing to notice is that we have a reduction of come_from_opt that isn't getting used. The genexpr_func should somehow subsume it. Further down in bug-bad.log, we count 19 instructions and see that the pseudo-assembly is this:

   2       0  SETUP_LOOP           48  'to 51'   (1)
           3  LOAD_FAST             0  '.0'      (2)
           6  FOR_ITER             41  'to 50'   (3)
           9  STORE_FAST            1  'c'       (4)
          12  LOAD_FAST             1  'c'       (5)
          15  LOAD_CONST               'A'       (6)
          18  COMPARE_OP            4  >         (7)
          21  JUMP_IF_FALSE        22  'to 46'   (8)
          24  POP_TOP                            (9)
          25  LOAD_FAST             1  'c'      (10)
          28  LOAD_CONST               'S'      (11)
          31  COMPARE_OP            0  <        (12) 
          34  JUMP_IF_FALSE         9  'to 46'  (13)
          37  POP_TOP                           (14)
          38  LOAD_FAST             1  'c'      (15)
          41  YIELD_VALUE                       (16)
          42  POP_TOP                           (17)
          43  JUMP_BACK             6  'to 6'   (18)
        46_0  COME_FROM            34  '34'     (19)
        46_1  COME_FROM            21  '21'
          46  POP_TOP          
          47  JUMP_BACK             6  'to 6'
          50  POP_BLOCK        
        51_0  COME_FROM             0  '0'

Both COME_FROMs should be part of the genexpr_func. The fact that genexpr_func matched LOAD_FAST FOR_ITER store comp_iter JUMP_BACK is okay, but it just isn't the full story in this case.
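
The reason two COME_FROMs pile up at offset 46 can be checked by hand from the listing. In Python 2.x bytecode an instruction that takes an argument occupies 3 bytes, and relative jumps such as SETUP_LOOP, FOR_ITER and 2.6's JUMP_IF_FALSE count from the start of the next instruction. The small sketch below (the helper name relative_target is made up for illustration) recomputes the 'to NN' annotations shown above:

```python
# One opcode byte plus a two-byte argument in Python 2.x bytecode.
ARG_INSTRUCTION_SIZE = 3


def relative_target(offset, arg):
    """Target of a relative jump located at offset with argument arg."""
    return offset + ARG_INSTRUCTION_SIZE + arg


# (offset, opcode, argument, target shown in the listing)
JUMPS = [
    (0, "SETUP_LOOP", 48, 51),
    (6, "FOR_ITER", 41, 50),
    (21, "JUMP_IF_FALSE", 22, 46),
    (34, "JUMP_IF_FALSE", 9, 46),
]

for offset, opname, arg, shown in JUMPS:
    # Each computed target agrees with the 'to NN' column of the listing.
    assert relative_target(offset, arg) == shown, opname
```

Both JUMP_IF_FALSE instructions, at 21 and 34, land on offset 46, which is why the listing shows the pseudo-instructions 46_0 and 46_1 there.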

I added this rule:

        genexpr_func ::= setup_loop_lf FOR_ITER store comp_iter JUMP_BACK come_froms
                         POP_TOP jb_pb_come_from

which matches all of the instructions up to 51_0. Note, though, that I added it only after thinking about whether those instructions should be considered part of the generator function.

I believe they should.
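
To see what the longer rule buys, here is a toy coverage check. It is only an illustration and not how the uncompyle6 parser works internally. The expansions of setup_loop_lf, come_froms and jb_pb_come_from are assumptions guessed from their names and from the listing; the real definitions live in the grammar and may differ. Offsets 12 through 42 of the listing are collapsed into the single symbol comp_iter.

```python
# Instruction stream of the 2.6 listing, with the filter and yield
# (offsets 12-42) collapsed into the nonterminal "comp_iter".
STREAM = [
    "SETUP_LOOP", "LOAD_FAST", "FOR_ITER", "STORE_FAST", "comp_iter",
    "JUMP_BACK", "COME_FROM", "COME_FROM", "POP_TOP",
    "JUMP_BACK", "POP_BLOCK", "COME_FROM",
]

# ASSUMED expansions, for illustration only.
EXPANSIONS = {
    "setup_loop_lf": ["SETUP_LOOP", "LOAD_FAST"],
    "store": ["STORE_FAST"],
    "come_froms": ["COME_FROM", "COME_FROM"],
    "jb_pb_come_from": ["JUMP_BACK", "POP_BLOCK", "COME_FROM"],
}

OLD_RULE = "LOAD_FAST FOR_ITER store comp_iter JUMP_BACK"
NEW_RULE = ("setup_loop_lf FOR_ITER store comp_iter JUMP_BACK come_froms "
            "POP_TOP jb_pb_come_from")


def expand(rhs):
    """Replace each known nonterminal in rhs by its assumed expansion."""
    symbols = []
    for symbol in rhs.split():
        symbols.extend(EXPANSIONS.get(symbol, [symbol]))
    return symbols


def uncovered(stream, rhs):
    """Symbols of stream left over after matching rhs once, or None."""
    pattern = expand(rhs)
    for start in range(len(stream) - len(pattern) + 1):
        if stream[start:start + len(pattern)] == pattern:
            return stream[:start] + stream[start + len(pattern):]
    return None


print(uncovered(STREAM, OLD_RULE))  # leftover symbols under the old rule
print(uncovered(STREAM, NEW_RULE))  # nothing left over under the new rule
```

Under these assumptions the old right-hand side leaves the leading SETUP_LOOP and everything from the first COME_FROM onward unmatched, while the new one accounts for the whole stream.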

Step 4: Add test case

The last step is to add a test case to make sure we don't regress in the future. Here is a full test case for this example:

# Issue #283 in Python 2.6
# See https://github.com/rocky/python-uncompyle6/issues/283

# This code is RUNNABLE!

G = ( c for c in "spam, Spam, SPAM!" if c > 'A' and c < 'S')
assert list(G) == ["P", "M"]

This code, when byte-compiled, decompiled, and run, will check itself. I put this in test/simple_source/bug26/00_generator.py.

And to add it:

$ git add test/simple_source/bug26/00_generator.py
$ cd test && ./add-test.py simple_source/bug26/00_generator.py
...
$ git add --force bytecode_2.6_run/00_generator.pyc