# Final Year Project
## Finding Call Sequences

Target Project: Sudoku Robot Master (https://github.com/Sanahm/Sudoku-robot) <br>
Target Library: TensorFlow

In the open-source Sudoku Robot Master project, the **TensorFlow** library is used in the Python file **mnist_model_convolutionnel.py**.

### Creating Module Object for Syntax Tree

First, the whole Python file is saved as a string so that LibCST can build a syntax tree from it. <br>
This string is then parsed into a module object, whose tree is analysed to identify the method invocations.

In [1]:
from mnist_model_convolutionnel import file_string
import libcst as cst
source_tree = cst.parse_module(file_string)

### Classifying Statements

The child nodes of the main module are mainly of 2 types: **SimpleStatementLine** and **FunctionDef**. <br>
The code below groups the nodes of these 2 types for later analysis.

In [4]:
statements = source_tree.children
simple_statement_lines = []  # SimpleStatementLine nodes (placeholder strings elsewhere)
function_def_lines = []  # FunctionDef nodes (placeholder strings elsewhere)

def sort_statements(statements):
    for statement in statements:
        # Both lists receive one entry per classified statement, so they
        # share the same indexes; the other list gets a placeholder string.
        if isinstance(statement, cst.SimpleStatementLine):
            simple_statement_lines.append(statement)
            function_def_lines.append("Simple Statement")
        elif isinstance(statement, cst.FunctionDef):
            function_def_lines.append(statement)
            simple_statement_lines.append("Function Def")

sort_statements(statements)

### Analysing Simple Statement Lines

TensorFlow method invocations are grouped together if they are close to each other. <br>
To decide whether 2 method invocations are close, we analyse the nodes of type **SimpleStatementLine**. <br>
Method invocations are nodes of type **Call**, and each such node is a descendant (child, grandchild, and so on) of a **SimpleStatementLine** node.<br>
Therefore, method invocations that come from the same SimpleStatementLine node, or from adjacent SimpleStatementLine nodes, are grouped together. <br>

A recursive search finds the TensorFlow method invocations, which are then collected into a master list.

In [16]:
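calls = []  # Call nodes that invoke the TensorFlow API
calls_pIndex = []  # index of the statement that contains each call

def call_search_SSL(parent, index):
    if isinstance(parent, cst.Call):
        try:
            # Matches tf.<name>(...): func.value is the Name node `tf`
            if parent.func.value.value == 'tf':
                calls.append(parent)
                calls_pIndex.append(index)
            # Matches tf.<module>.<name>(...): one Attribute level deeper
            if parent.func.value.value.value == 'tf':
                calls.append(parent)
                calls_pIndex.append(index)
        except AttributeError:
            # The callee is not shaped like tf.x or tf.x.y
            pass
    # Recurse into every child so that nested calls are found as well
    for child in parent.children:
        call_search_SSL(child, index)

for i, statement in enumerate(simple_statement_lines):
    # Skip the placeholder strings inserted by sort_statements
    if isinstance(statement, cst.SimpleStatementLine):
        call_search_SSL(statement, i)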

In [17]:
simpleStatementLine_calls = []

for i in range(len(calls)):
    simpleStatementLine_calls.append([calls_pIndex[i], calls[i]])

def cluster(data, maxgap):
    '''Arrange data into groups where successive elements
       differ by no more than *maxgap*

        >>> cluster([1, 6, 9, 100, 102, 105, 109, 134, 139], maxgap=10)
        [[1, 6, 9], [100, 102, 105, 109], [134, 139]]

        >>> cluster([1, 6, 9, 99, 100, 102, 105, 134, 139, 141], maxgap=10)
        [[1, 6, 9], [99, 100, 102, 105], [134, 139, 141]]

    '''
    data.sort()
    groups = [[data[0]]]
    for x in data[1:]:
        if abs(x - groups[-1][-1]) <= maxgap:
            groups[-1].append(x)
        else:
            groups.append([x])
    return groups

cluster(calls_pIndex, maxgap=1)

[[34, 35, 36, 37], [45, 46, 47, 47, 48, 49, 50, 50, 51, 52, 53]]

The output above shows the two clusters identified. <br>
<br>
In the first cluster, [34, 35, 36, 37] are grouped together.<br>
The numbers in the list are the indexes of the **SimpleStatementLine** nodes.<br>
These nodes are grouped together because they are next to each other (consecutive indexes differ by at most 1).<br>
<br>
Each occurrence of an index also corresponds to one specific call to the TensorFlow API, so an index that appears twice (such as 47) marks a statement that contains two calls.<br>
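
To make the grouping rule concrete, here is a small sketch that clusters (index, label) pairs instead of bare indexes, so each cluster keeps track of which call it contains. The function `cluster_pairs` and the data in `demo_pairs` are hypothetical and chosen only for illustration; they are not the indexes or calls found in the target file.

```python
def cluster_pairs(pairs, maxgap=1):
    """Group (index, label) pairs whose successive indexes differ by at most maxgap."""
    ordered = sorted(pairs, key=lambda pair: pair[0])
    groups = []
    for pair in ordered:
        # Compare against the index of the last pair in the current group
        if groups and pair[0] - groups[-1][-1][0] <= maxgap:
            groups[-1].append(pair)
        else:
            groups.append([pair])
    return groups


# Hypothetical data: statement 4 contains two calls, so its index appears twice
demo_pairs = [
    (3, "tf.placeholder"),
    (4, "tf.Variable"),
    (4, "tf.zeros"),
    (9, "tf.nn.softmax"),
    (10, "tf.reduce_mean"),
]

grouped = cluster_pairs(demo_pairs, maxgap=1)
for group in grouped:
    print(group)
```

Unlike `cluster`, this sketch does not sort its input in place and returns an empty list for empty input.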

In [18]:
for called in simpleStatementLine_calls:
    if(called[0] == 34):
        print(called)

[34, Call(
    func=Attribute(
        value=Name(
            value='tf',
            lpar=[],
            rpar=[],
        ),
        attr=Name(
            value='placeholder',
            lpar=[],
            rpar=[],
        ),
        dot=Dot(
            whitespace_before=SimpleWhitespace(
                value='',
            ),
            whitespace_after=SimpleWhitespace(
                value='',
            ),
        ),
        lpar=[],
        rpar=[],
    ),
    args=[
        Arg(
            value=SimpleString(
                value='"float"',
                lpar=[],
                rpar=[],
            ),
            keyword=None,
            equal=MaybeSentinel.DEFAULT,
            comma=Comma(
                whitespace_before=SimpleWhitespace(
                    value='',
                ),
                whitespace_after=SimpleWhitespace(
                    value='',
                ),
            ),
            star='',
            whitespace_after_star=Simp

The (truncated) output above shows that index 34 corresponds to a call to 'tf.placeholder()'.

In [19]:
#Cluster 1
for called in simpleStatementLine_calls:
    if(called[0] in [34, 35, 36, 37]):
        print(called[1])

Call(
    func=Attribute(
        value=Name(
            value='tf',
            lpar=[],
            rpar=[],
        ),
        attr=Name(
            value='placeholder',
            lpar=[],
            rpar=[],
        ),
        dot=Dot(
            whitespace_before=SimpleWhitespace(
                value='',
            ),
            whitespace_after=SimpleWhitespace(
                value='',
            ),
        ),
        lpar=[],
        rpar=[],
    ),
    args=[
        Arg(
            value=SimpleString(
                value='"float"',
                lpar=[],
                rpar=[],
            ),
            keyword=None,
            equal=MaybeSentinel.DEFAULT,
            comma=Comma(
                whitespace_before=SimpleWhitespace(
                    value='',
                ),
                whitespace_after=SimpleWhitespace(
                    value='',
                ),
            ),
            star='',
            whitespace_after_star=SimpleWhi

In [20]:
#Cluster 2
for called in simpleStatementLine_calls:
    if(called[0] in [45, 46, 47, 47, 48, 49, 50, 50, 51, 52, 53]):
        print(called[1])

Call(
    func=Attribute(
        value=Attribute(
            value=Name(
                value='tf',
                lpar=[],
                rpar=[],
            ),
            attr=Name(
                value='nn',
                lpar=[],
                rpar=[],
            ),
            dot=Dot(
                whitespace_before=SimpleWhitespace(
                    value='',
                ),
                whitespace_after=SimpleWhitespace(
                    value='',
                ),
            ),
            lpar=[],
            rpar=[],
        ),
        attr=Name(
            value='softmax',
            lpar=[],
            rpar=[],
        ),
        dot=Dot(
            whitespace_before=SimpleWhitespace(
                value='',
            ),
            whitespace_after=SimpleWhitespace(
                value='',
            ),
        ),
        lpar=[],
        rpar=[],
    ),
    args=[
        Arg(
            value=Name(
                value='connecte

### Analysing Function Definitions

TensorFlow method invocations are grouped together if they come from the same **FunctionDef** node. <br>
Method invocations are nodes of type **Call**, and each such node is a descendant (child, grandchild, and so on) of a **FunctionDef** node.<br>
Therefore, all method invocations found inside the same FunctionDef node are grouped together. <br>

In [21]:
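groups = []  # one list of TensorFlow Call nodes per FunctionDef

def call_search_FD(parent, group):
    if isinstance(parent, cst.Call):
        try:
            # Matches tf.<name>(...)
            if parent.func.value.value == 'tf':
                group.append(parent)
            # Matches tf.<module>.<name>(...)
            if parent.func.value.value.value == 'tf':
                group.append(parent)
        except AttributeError:
            # The callee is not shaped like tf.x or tf.x.y
            pass
    # Recurse into every child so that nested calls are found as well
    for child in parent.children:
        call_search_FD(child, group)

for function_def in function_def_lines:
    # Skip the placeholder strings inserted by sort_statements
    if isinstance(function_def, cst.FunctionDef):
        group = []
        call_search_FD(function_def, group)
        groups.append(group)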

groups

[[Call(
      func=Attribute(
          value=Name(
              value='tf',
              lpar=[],
              rpar=[],
          ),
          attr=Name(
              value='Variable',
              lpar=[],
              rpar=[],
          ),
          dot=Dot(
              whitespace_before=SimpleWhitespace(
                  value='',
              ),
              whitespace_after=SimpleWhitespace(
                  value='',
              ),
          ),
          lpar=[],
          rpar=[],
      ),
      args=[
          Arg(
              value=Call(
                  func=Attribute(
                      value=Name(
                          value='tf',
                          lpar=[],
                          rpar=[],
                      ),
                      attr=Name(
                          value='truncated_normal',
                          lpar=[],
                          rpar=[],
                      ),
                      dot=Dot(
                 

The output below shows that the module contains 7 function definition nodes.

In [22]:
len(groups)

7

For demonstration purposes, the TensorFlow calls found in the first function definition node are printed below.

In [26]:
print(len(groups[0]))
groups[0]

2


[Call(
     func=Attribute(
         value=Name(
             value='tf',
             lpar=[],
             rpar=[],
         ),
         attr=Name(
             value='Variable',
             lpar=[],
             rpar=[],
         ),
         dot=Dot(
             whitespace_before=SimpleWhitespace(
                 value='',
             ),
             whitespace_after=SimpleWhitespace(
                 value='',
             ),
         ),
         lpar=[],
         rpar=[],
     ),
     args=[
         Arg(
             value=Call(
                 func=Attribute(
                     value=Name(
                         value='tf',
                         lpar=[],
                         rpar=[],
                     ),
                     attr=Name(
                         value='truncated_normal',
                         lpar=[],
                         rpar=[],
                     ),
                     dot=Dot(
                         whitespace_before=SimpleWhites

We can see that there are 2 function calls (TensorFlow method invocations) in this FunctionDef node.
These 2 function calls will be grouped together.
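
As an independent cross-check of the per-function grouping, the same idea can be sketched with Python's standard `ast` module instead of LibCST. This is a swapped-in technique rather than the approach used above, and `dotted_name`, `tf_calls_per_function` and `demo_source` are hypothetical names and data, not taken from the target file.

```python
import ast


def dotted_name(node):
    """Return 'a.b.c' for a Name/Attribute chain, or None for other callees."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None


def tf_calls_per_function(source):
    """Map each top-level function name to the tf.* calls found inside it."""
    result = {}
    for stmt in ast.parse(source).body:
        if isinstance(stmt, ast.FunctionDef):
            names = []
            # ast.walk visits all descendants; its order is not source order
            for node in ast.walk(stmt):
                if isinstance(node, ast.Call):
                    name = dotted_name(node.func)
                    if name is not None and name.split(".")[0] == "tf":
                        names.append(name)
            result[stmt.name] = sorted(names)
    return result


# Hypothetical source used only for this illustration
demo_source = '''
def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)

def helper(x):
    return len(x)
'''

result = tf_calls_per_function(demo_source)
print(result)
```

Comparing the number of calls per function from a sketch like this with `len(groups[i])` is one way to sanity-check the LibCST results.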