R. Bernstein edited this page Apr 25, 2017 · 4 revisions

I've been trying to improve the accuracy of uncompyle6, and most avenues have problems. This is not specific to uncompyle6: pycdc has the same problems even though it works slightly differently. See, for example, https://github.com/zrax/pycdc/issues/103

Right now I'm thinking that the best I can probably do is run a grammar sanity checker over the result at the end, to look for things I know are problematic.

Specifically:

  • break or continue statements that are not inside a loop
  • return statements outside of a function
  • augmented-assignment operators (e.g. +=) inside non-statement expressions (e.g. the condition of an if)
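
As a rough illustration of what such a checker could look like, here is a minimal sketch that works on the decompiled source text rather than inside uncompyle6. It assumes the output is meant to be valid for the Python running the check. The names `ContextChecker` and `sanity_check` are made up for this example and are not part of uncompyle6. An operator like += inside an if condition already fails in `ast.parse`, so only break, continue and return need an explicit walk.

```python
import ast


class ContextChecker(ast.NodeVisitor):
    """Record break/continue outside a loop and return outside a function."""

    def __init__(self):
        self.loop_depth = 0
        self.func_depth = 0
        self.problems = []

    def _loop(self, node):
        # Only the loop body counts as "inside the loop"; the else
        # clause of a for/while does not.
        self.loop_depth += 1
        for stmt in node.body:
            self.visit(stmt)
        self.loop_depth -= 1
        for stmt in node.orelse:
            self.visit(stmt)

    visit_For = visit_AsyncFor = visit_While = _loop

    def _function(self, node):
        # A nested function starts a fresh loop context.
        saved = self.loop_depth
        self.loop_depth = 0
        self.func_depth += 1
        for stmt in node.body:
            self.visit(stmt)
        self.func_depth -= 1
        self.loop_depth = saved

    visit_FunctionDef = visit_AsyncFunctionDef = _function

    def visit_ClassDef(self, node):
        # A class body is neither inside the enclosing loop nor function.
        saved = (self.loop_depth, self.func_depth)
        self.loop_depth = self.func_depth = 0
        for stmt in node.body:
            self.visit(stmt)
        self.loop_depth, self.func_depth = saved

    def visit_Break(self, node):
        if self.loop_depth == 0:
            self.problems.append(f"line {node.lineno}: 'break' outside loop")

    def visit_Continue(self, node):
        if self.loop_depth == 0:
            self.problems.append(f"line {node.lineno}: 'continue' outside loop")

    def visit_Return(self, node):
        if self.func_depth == 0:
            self.problems.append(f"line {node.lineno}: 'return' outside function")


def sanity_check(source):
    """Return a list of problem descriptions; an empty list means none found."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        # Covers things like `if x += 1:` that are not valid expressions.
        return [f"line {exc.lineno}: syntax error: {exc.msg}"]
    checker = ContextChecker()
    checker.visit(tree)
    return checker.problems
```

For source aimed at the running interpreter, simply calling the built-in `compile()` on the text would reject the same constructs with a SyntaxError. The explicit walk is shown because it reports every offending line instead of stopping at the first one.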

In theory there is a way to write the grammar so that it handles these cases. Specifically, the grammar has l_stmts for statements inside a loop. But the proliferation of rules needed to include both l_stmts and stmts apparently became too unwieldy, and at some point in the past this idea was largely dropped. (The proliferation in the grammar is sort of like the difference between grammars that don't allow you to specify operator precedence and those that do.)

Another way to handle this is probably to change the Earley-algorithm parser, spark, to allow us to call a function before doing a reduction. For example, before reducing continue_stmt, one might call a continue-checking function which looks at the state to see that there is an (unreduced) SETUP_LOOP earlier on. Because we are using the Earley algorithm, which handles generic context-free grammars, rather than an LL or LR parser, there is no single stack. Nevertheless, we can find the corresponding stack for a particular rule.
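
To make the idea concrete, here is a minimal standalone sketch of such a pre-reduction check. It does not use spark's actual interface. The function names, the flat list of token names, and the `CONTINUE` token are assumptions made for illustration, and the block tracking is deliberately simplified.

```python
# Opcodes that open a block which a later POP_BLOCK closes (simplified).
BLOCK_OPENERS = {"SETUP_LOOP", "SETUP_EXCEPT", "SETUP_FINALLY", "SETUP_WITH"}


def inside_loop(tokens, first):
    """Return True if an unclosed SETUP_LOOP precedes tokens[first]."""
    block_stack = []
    for name in tokens[:first]:
        if name in BLOCK_OPENERS:
            block_stack.append(name)
        elif name == "POP_BLOCK" and block_stack:
            block_stack.pop()
    return "SETUP_LOOP" in block_stack


def reduction_is_invalid(lhs, tokens, first):
    """Hypothetical hook: called before reducing to nonterminal `lhs`,
    where `first` is the index of the first token the rule covers."""
    if lhs == "continue_stmt":
        return not inside_loop(tokens, first)
    return False


# Token names only; a real token would also carry operands and offsets.
tokens = [
    "SETUP_LOOP", "LOAD_NAME", "GET_ITER", "FOR_ITER", "STORE_NAME",
    "CONTINUE",    # index 5: SETUP_LOOP is still open here
    "POP_BLOCK",
    "LOAD_CONST",
    "CONTINUE",    # index 8: the loop block has already been closed
]
print(reduction_is_invalid("continue_stmt", tokens, 5))  # False
print(reduction_is_invalid("continue_stmt", tokens, 8))  # True
```

A parser with such a hook would discard the candidate reduction whenever the check returns True and leave the other parses in play.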

The approach in the preceding paragraph has been implemented and works to some degree. The current best line of attack is to handle control flow outside of the grammar by breaking the code up into basic blocks and applying control-flow analysis with dominator regions. This would then annotate the instructions in a way that can be easily picked up by the grammar.
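
As a sketch of the kind of analysis meant here, the following computes dominators for a small control-flow graph with the textbook iterative algorithm, then uses them to find loop headers: a back edge is an edge whose target dominates its source. The graph, the block names, and both function names are invented for this example. It is not the code used in uncompyle6.

```python
def dominators(cfg, entry):
    """Map each block to the set of blocks that dominate it.

    `cfg` maps a block name to a list of successor block names; every
    successor must also appear as a key.
    """
    nodes = set(cfg)
    preds = {n: set() for n in nodes}
    for n, succs in cfg.items():
        for s in succs:
            preds[s].add(n)

    # Start from "everything dominates everything" and shrink to a fixed point.
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            if preds[n]:
                new = set.intersection(*(dom[p] for p in preds[n])) | {n}
            else:
                new = {n}  # unreachable block
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


def loop_headers(cfg, entry):
    """Return blocks that are the target of a back edge."""
    dom = dominators(cfg, entry)
    return {s for n, succs in cfg.items() for s in succs if s in dom[n]}


# Basic blocks of something shaped like:  while cond: body
cfg = {
    "entry": ["header"],
    "header": ["body", "exit"],
    "body": ["header"],   # back edge
    "exit": [],
}
print(loop_headers(cfg, "entry"))  # {'header'}
```

With loop headers and the blocks they dominate known, instructions inside a loop region could be tagged before parsing, so that the grammar only sees a break or continue token where one is legal.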