# test AST python parser

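The code below finds the names bound by assignment statements (`ast.Assign` nodes) in the Python syntax tree, along with every variable name that appears in the source.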

In [73]:
import ast
source = "import pandas as pd \nfrom imblearn.over_sampling import RandomOverSampler, SMOTE \nimport matplotlib; x = train_df.drop(columns = \"target\", random_seed=seed)"
root = ast.parse(source)
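# names bound by assignment; assumes every target is a plain name (tuple unpacking such as `a, b = ...` would raise AttributeError)
assignment = {n.id for node in ast.walk(root) if isinstance(node, ast.Assign) for n in node.targets}
# every Name node in the tree, whether it is read or written
all_vars = {node.id for node in ast.walk(root) if isinstance(node, ast.Name)}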
print(all_vars)
print(assignment)

{'seed', 'train_df', 'x'}
{'x'}


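The `targets` attribute of an `Assign` node holds the left-hand side. It is a list because a chained assignment such as `a = b = 1` has several targets; tuple unpacking such as `a, b = 1, 2` is a single `Tuple` target.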

In [36]:
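names = [node for node in ast.walk(root) if isinstance(node, ast.Assign)]  # assumed definition of `names`, which the original cell did not show
names[0].targets[0].id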

'x'
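The plain `.id` lookup works only when every target is a simple name. A minimal sketch of a more tolerant collector follows; the helper name `assigned_names` and the sample statements are illustrative and not part of the notebook. It walks each target and keeps only `Name` nodes in `Store` context, so tuple unpacking is handled, while attribute and subscript targets contribute no new names.

```python
import ast

def assigned_names(source):
    """Return the set of names bound by plain assignments in `source`."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Assign):
            for target in node.targets:
                # A target may be a Name, a Tuple/List of names,
                # an Attribute (obj.attr) or a Subscript (xs[i]).
                for sub in ast.walk(target):
                    if isinstance(sub, ast.Name) and isinstance(sub.ctx, ast.Store):
                        names.add(sub.id)
    return names

chained = ast.parse("a = b = 1").body[0]
unpacked = ast.parse("c, d = 1, 2").body[0]
print(len(chained.targets))                # 2: one entry per chained target
print(type(unpacked.targets[0]).__name__)  # Tuple: a single target holding two names

sample = "a = b = 1\nc, d = 1, 2\nobj.attr = 3\nxs[i] = 4"
print(sorted(assigned_names(sample)))      # ['a', 'b', 'c', 'd']
```

`obj` and `xs` are not reported because assigning to an attribute or an item modifies an existing object rather than binding a new name.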

The right-hand side is stored in the `value` attribute.

In [37]:
names[0].value.__dict__

{'func': <ast.Attribute at 0x7fd9e8ec19a0>,
 'args': [],
 'keywords': [<ast.keyword at 0x7fd9f8bafb50>],
 'lineno': 2,
 'col_offset': 4,
 'end_lineno': 2,
 'end_col_offset': 37}
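The `__dict__` output shows that the right-hand side here is a call whose `func` is an attribute access. As a hedged illustration, the sketch below pulls the individual pieces out of such a node; the variable names (`call`, `receiver`, and so on) are chosen for this example only.

```python
import ast

source = 'x = train_df.drop(columns="target", random_seed=seed)'
assign = ast.parse(source).body[0]
call = assign.value                                # right-hand side: an ast.Call node

method_name = call.func.attr                       # 'drop'
receiver = call.func.value.id                      # 'train_df'
keyword_names = [kw.arg for kw in call.keywords]   # ['columns', 'random_seed']

# Every name that is read anywhere on the right-hand side
rhs_names = sorted(n.id for n in ast.walk(call) if isinstance(n, ast.Name))
print(method_name, receiver, keyword_names, rhs_names)
```

Walking only `assign.value` rather than the whole tree is one way to obtain the right values of a single statement directly.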

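Two ways to traverse the tree: follow attributes from `root.body` by hand (using `ast.dump` to see the structure), or iterate over every node with `ast.walk`.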

In [43]:
ast.dump(root)

"Module(body=[Import(names=[alias(name='pandas', asname='pd')]), ImportFrom(module='imblearn.over_sampling', names=[alias(name='RandomOverSampler'), alias(name='SMOTE')], level=0), Import(names=[alias(name='matplotlib')]), Assign(targets=[Name(id='x', ctx=Store())], value=Call(func=Attribute(value=Name(id='train_df', ctx=Load()), attr='drop', ctx=Load()), args=[], keywords=[keyword(arg='columns', value=Constant(value='target'))]))], type_ignores=[])"

In [54]:
print(root.body[0].names[0].asname) # import as
print(root.body[1].names[0].name) # import from
print(root.body[2].names[0].name) # without as

pd
RandomOverSampler
matplotlib
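The one-line `ast.dump` output is hard to read when deciding which attributes to follow. Assuming Python 3.9 or later, `ast.dump` accepts an `indent` argument that prints one nested node per line; the two-line source string below is only an illustration.

```python
import ast

tree = ast.parse("import pandas as pd\nx = pd.DataFrame()")

# `indent` is available from Python 3.9 onwards
pretty = ast.dump(tree, indent=2)
print(pretty)
```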


In [68]:
# collect every name bound by an import statement
all_imports = set()
for node in ast.walk(root):
    if isinstance(node, (ast.Import, ast.ImportFrom)):
        for n in node.names:
            all_imports.add(n.asname or n.name)
all_imports

{'RandomOverSampler', 'SMOTE', 'matplotlib', 'pd'}

In [76]:
# the same collection as a set comprehension
all_imports = {
    n.asname or n.name
    for node in ast.walk(root)
    if isinstance(node, (ast.Import, ast.ImportFrom))
    for n in node.names
}
all_imports

{'RandomOverSampler', 'SMOTE', 'matplotlib', 'pd'}
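Two cases are not covered by `n.asname or n.name`: `import os.path` binds only the top-level name `os`, and `from m import *` binds names that cannot be known from the syntax tree alone. The following is a sketch under those assumptions; `imported_names` is a hypothetical helper, not something defined earlier in the notebook.

```python
import ast

def imported_names(source):
    """Return the names that the import statements in `source` bind."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # `import a.b` binds `a`; `import a.b as c` binds `c`
                names.add(alias.asname or alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                if alias.name != "*":  # star imports cannot be resolved statically
                    names.add(alias.asname or alias.name)
    return names

sample = "import os.path\nimport numpy as np\nfrom a import b as c, d\nfrom m import *"
print(sorted(imported_names(sample)))  # ['c', 'd', 'np', 'os']
```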

In [42]:
for node in ast.walk(root):
    print(type(node))

<class 'ast.Module'>
<class 'ast.Import'>
<class 'ast.ImportFrom'>
<class 'ast.Import'>
<class 'ast.Assign'>
<class 'ast.alias'>
<class 'ast.alias'>
<class 'ast.alias'>
<class 'ast.alias'>
<class 'ast.Name'>
<class 'ast.Call'>
<class 'ast.Store'>
<class 'ast.Attribute'>
<class 'ast.keyword'>
<class 'ast.Name'>
<class 'ast.Load'>
<class 'ast.Constant'>
<class 'ast.Load'>
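`ast.walk` yields a flat stream of nodes, and its documentation states that the order is unspecified, so the nesting is lost. When the parent/child structure matters, a `NodeVisitor` subclass can track depth while it recurses. The class below is a small sketch; `TypePrinter` and its attributes are names made up for this example.

```python
import ast

class TypePrinter(ast.NodeVisitor):
    """Record the type name of every node, indented by its depth in the tree."""

    def __init__(self):
        self.depth = 0
        self.lines = []

    def generic_visit(self, node):
        # Called for every node, since no visit_<Type> methods are defined
        self.lines.append("  " * self.depth + type(node).__name__)
        self.depth += 1
        super().generic_visit(node)  # recurse into the child nodes
        self.depth -= 1

printer = TypePrinter()
printer.visit(ast.parse("x = f(y)"))
print("\n".join(printer.lines))
```

The output starts with `Module`, then `Assign` one level in, with the target `Name` and the `Call` as its children.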


Establishing which names a cell assigns and which names it uses on the right-hand side is a strong indicator of code ordering, presuming the notebook runs without errors: code that uses a name must come after the code that defines it.

The existing variables are the right values, including variables used in a function or method call without assignment (e.g. `nn.fit(df)`).

The newly created variables are the left values of assignments and the imported names.

Since the latter are easier to find, we collect all variable names and subtract the left values to obtain the right values.

Note: function definitions are not handled here.
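The subtraction described above can be sketched as a single function. This is an illustration under simplifying assumptions, not a complete analysis: `defined_and_used` and the two sample cells are hypothetical, any `Name` in `Store` context counts as a definition (which also covers loop and `with` targets), and a name that is both read and assigned in the same cell (e.g. `x = x + 1`) is reported only as defined.

```python
import ast

def defined_and_used(source):
    """Return (names the code defines, names it needs from elsewhere)."""
    defined, loaded = set(), set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                if alias.name != "*":
                    defined.add(alias.asname or alias.name.split(".")[0])
        elif isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                defined.add(node.id)   # left value
            else:
                loaded.add(node.id)    # read (or deleted) somewhere
    # all names minus the left values gives the right values
    return defined, loaded - defined

cell_a = 'import pandas as pd\ntrain_df = pd.read_csv("train.csv")'
cell_b = 'x = train_df.drop(columns="target")'

defined_a, used_a = defined_and_used(cell_a)
defined_b, used_b = defined_and_used(cell_b)

# cell_b needs a name that cell_a defines, so cell_a should run first
print(sorted(used_b & defined_a))  # ['train_df']
```

Built-in names such as `print` or `len` would also show up as used, so a fuller version would probably filter them out.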

In [None]:
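import ast
import operator
from ast import NodeVisitor

# Assumed mapping: the original cell used `operators` without defining it.
operators = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.UAdd: operator.pos,
    ast.USub: operator.neg,
}

class EvalVisitor(NodeVisitor):
    """Evaluate an arithmetic expression, resolving names from keyword arguments."""

    def __init__(self, **kwargs):
        self._namespace = kwargs

    def visit_Expression(self, node):
        # root node produced by ast.parse(..., mode="eval")
        return self.visit(node.body)

    def visit_Name(self, node):
        return self._namespace[node.id]

    def visit_Constant(self, node):
        # replaces visit_Num and visit_NameConstant, deprecated since Python 3.8
        return node.value

    def visit_UnaryOp(self, node):
        val = self.visit(node.operand)
        return operators[type(node.op)](val)

    def visit_BinOp(self, node):
        lhs = self.visit(node.left)
        rhs = self.visit(node.right)
        return operators[type(node.op)](lhs, rhs)

    def generic_visit(self, node):
        # any node type without a visit_* method above is rejected
        raise ValueError("malformed node or string: " + repr(node))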
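The same `NodeVisitor` pattern could also be applied to the variable-tracking goal of this notebook, including the function definitions that were set aside earlier. The sketch below is one possible approach rather than an established recipe; `DefinitionCollector` is a hypothetical class that records a function or class name as a definition and deliberately does not descend into its body, because names bound there are local.

```python
import ast

class DefinitionCollector(ast.NodeVisitor):
    """Collect names defined at module level: assignments, imports, def and class."""

    def __init__(self):
        self.defined = set()

    def visit_FunctionDef(self, node):
        # Record the name only; skipping generic_visit keeps local names out
        self.defined.add(node.name)

    visit_AsyncFunctionDef = visit_FunctionDef
    visit_ClassDef = visit_FunctionDef

    def visit_Import(self, node):
        for alias in node.names:
            self.defined.add(alias.asname or alias.name.split(".")[0])

    def visit_ImportFrom(self, node):
        for alias in node.names:
            if alias.name != "*":
                self.defined.add(alias.asname or alias.name)

    def visit_Name(self, node):
        if isinstance(node.ctx, ast.Store):
            self.defined.add(node.id)

cell = "import numpy as np\ndef scale(v):\n    tmp = v * 2\n    return tmp\ny = scale(np.ones(3))"
collector = DefinitionCollector()
collector.visit(ast.parse(cell))
print(sorted(collector.defined))  # ['np', 'scale', 'y']
```

`tmp` and the parameter `v` are not reported, since they exist only inside `scale`.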

### References

Exploring the Python AST

https://mvdwoord.github.io/exploration/2017/08/18/ast_explore.html

AST documentation

https://docs.python.org/3/library/ast.html

How to get all variable and method names used in a script

https://stackoverflow.com/questions/33554036/how-to-get-all-variable-and-method-names-used-in-script

NodeVisitor

https://stackoverflow.com/questions/26398179/replace-variable-names-with-actual-values-in-an-expression-in-ast-python
