Skip to content

Conversation

@x0ret
Copy link
Collaborator

@x0ret x0ret commented May 16, 2019

This is for getting ready for finalizing python 3.7 decompiler.

@rocky, Please check the continue for COME_FROM_LOOP in scanner3.py since i didn't find a proper way. Since we changed COME_FROM proc, COME_FROM_LOOP will be at it's correct location but i couldn't adjust a proper grammer for it.

@x0ret x0ret requested a review from rocky May 16, 2019 01:40
@x0ret
Copy link
Collaborator Author

x0ret commented May 16, 2019

About failed tests, this work is in progress.

@rocky
Copy link
Owner

rocky commented May 16, 2019

@x0ret looking at the diff I don't see any problem. Adding the label _SETUP or _LOOP was supposed to help, but if it doesn't no problem.

But exactly what is an example of a for loop that uses generates code that matches the rule?

@x0ret
Copy link
Collaborator Author

x0ret commented May 16, 2019

The example is the itn function of 3.7.3/lib/python3.7/tarfile.py, And simplifying the case:

if a == 10:
    print(0)
else:
    if 11 <= a <= 15:
        for i in range(10):
            print(i)
    else:
        print(3)

for earlier versions like 3.6 we have following disasm:

   7      56  LOAD_NAME                'print'
          58  LOAD_NAME                'i'
          60  CALL_FUNCTION         1  ''
          62  POP_TOP
          64  JUMP_BACK            52  'to 52'
          66  POP_BLOCK
        68_0  COME_FROM_LOOP       42  '42'
          68  JUMP_FORWARD         78  'to 78'
          70  ELSE                     '78'

   9      70  LOAD_NAME                'print'
          72  LOAD_CONST            3  3
          74  CALL_FUNCTION         1  ''
          76  POP_TOP
        78_0  COME_FROM            68  '68'
        78_1  COME_FROM            16  '16'

however after our changes in python3.7 we have following case:

    7      56  LOAD_NAME                'print'
          58  LOAD_NAME                'i'
          60  CALL_FUNCTION         1  ''
          62  POP_TOP
          64  JUMP_BACK            52  'to 52'
          66  POP_BLOCK
          68  JUMP_FORWARD         78  'to 78'
        70_0  COME_FROM            40  '40'
        70_1  COME_FROM            34  '34'

   9      70  LOAD_NAME                'print'
          72  LOAD_CONST            3  3
          74  CALL_FUNCTION         1  ''
          76  POP_TOP
        78_0  COME_FROM            68  '68'
        78_1  COME_FROM_LOOP       42  '42'
        78_2  COME_FROM            16  '16'

Please note COME_FROM_LOOP location.

I will commit all my testcases later.

Thanks.

@rocky
Copy link
Owner

rocky commented May 16, 2019

Thanks for the additional information. This helps me understand things a lot. It also give a good example of what I was talking about with the dishonest come-froms, and why we need something like python control flow to do better.

If you look at the 3.6 code the COME_FROM_LOOP is dishonest:

L.   5      42  SETUP_LOOP           78  'to 78'
            44  LOAD_NAME                'range'
            46  LOAD_CONST           10  10
            48  CALL_FUNCTION         1  ''
            50  GET_ITER         
            52  FOR_ITER             66  'to 66'
            54  STORE_NAME               'i'

L.   6      56  LOAD_NAME                'print'
            58  LOAD_NAME                'i'
            60  CALL_FUNCTION         1  ''
            62  POP_TOP          
            64  JUMP_BACK            52  'to 52'
            66  POP_BLOCK        
          68_0  COME_FROM_LOOP       42  '42'
            68  JUMP_FORWARD         78  'to 78'

It seems to be put there so as to mark where the loop ends. Here Python could have probably had SETUP_LOOP 68 instead of SETUP_LOOP 78 since the instruction afterward is a jump to offset 78.

I am guessing that the reason this is working in 3.6 is because there is a pseudo instruction ELSE inserted which doesn't appear in 3.7 and that guides the parser along. But how the ELSE and the the faking up of the COME_FROM_LOOP is done, (to put it in the wrong place) is all black art guessing with ad hoc means.

In 3.7 the COME_FROM_LOOP/COME_FROM really doesn't mark the end boundary of the for loop. In pre 3.6 days when there was less jump optimization (changing jump to a; ... a: jump to b into jump to b then that COME_FROM_LOOP would always appear at the end of the loop.

In 3.8 SETUP_LOOP I believe is removed altogether.

I tried looking to see how python-control-flow handles this. It creates the control-flow graph and dominators correctly, but it comes up with the wrong control flow. I will look at how to fix that over the weekend. I will also create a new decompile project for just 3.7 stripped down from uncompyle6.

@rocky
Copy link
Owner

rocky commented May 19, 2019

I spent some time looking at this. I am now of the opinion that fixing this up properly at this stage is just too much work, so we not merge this. And just live with the bugs for now.

I'll create a new repository for Python 3.7 and beyond and start stripping things and maybe we can add better ways to do control flow there. Possibly at some point this can get backported.

The next iteration that I have tried but not committed adds and/or terminators where the COME_FROMs were improperly placed. Noting the end of an and/or is not dishonest and narrows that use.

What do you think?

@x0ret
Copy link
Collaborator Author

x0ret commented May 19, 2019

Thanks for that, I agree with you.

Besides i'm so interested in python-control-flow. It's so promising. I spent sometime to get familiar with it's code base. Do you have the plan for use python-control-flow for Python 3.7+?

@rocky
Copy link
Owner

rocky commented May 19, 2019

Do you have the plan for use python-control-flow for Python 3.7+?

Absolutely. That was the whole point behind creating this.

@x0ret
Copy link
Collaborator Author

x0ret commented May 19, 2019

Great. Please count on me in the developing process.

@rocky
Copy link
Owner

rocky commented May 19, 2019

Ok. https://github.com/rocky/python-decompile3/ is now there.

It minimally runs. There are some run errors that need fixing,.

Some older code is around because it was imported/subclassed. I will rebase things off of 3.7 later.

There is a way to dump the grammar. There is also a parser profiler so you can see which rules are not used. But this is all the time I can spend on this for now. Go at it!

I am closing this PR. But we should keep the branch around. I'll be putting out a release with everything else soon.

@rocky rocky closed this May 19, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants