# OpenBugger
This notebook is a self-contained demo of the OpenBugger package, which automatically injects bugs into Python code using LibCST.

The plan for the notebook is to develop all the components needed to build a pipeline that, starting from a Python script, extracts its Concrete Syntax Tree and uses it to apply a sequence of reversible syntactic code mutations, automatically generating training data for debugging language models.

1. Tools to ensure that the sequence of modifications is consistent and does not overwrite previously introduced mutations.
2. A Bugger class that stores the local context of multiple bugs that can be applied in a chain.
3. An InverseTransformer class that can reverse the transformation of any other transformer sharing the same Bugger context.
4. 5/12 example LibCST transformers, each implementing a different logical bug.
5. Use of the InverseTransformer to generate accurate debugging instructions to be used as training data for Large Language Models.
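
The idea behind items 2, 3 and 5 can be sketched without LibCST: if every mutation records the span it touched together with the original text, replaying the log newest-first restores the source, and each log entry doubles as a fix instruction. The sketch below works on plain strings rather than syntax trees. `Mutation`, `apply_mutation` and `revert_all` are hypothetical names used for illustration only; they are not part of the OpenBugger API.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Mutation:
    """One recorded edit (hypothetical structure, not OpenBugger's)."""
    start: int      # offset of the edit in the string it was applied to
    original: str   # text that was replaced
    bugged: str     # text that replaced it


def apply_mutation(code: str, start: int, end: int, bugged: str,
                   log: List[Mutation]) -> str:
    """Replace code[start:end] with `bugged` and record how to undo it."""
    log.append(Mutation(start=start, original=code[start:end], bugged=bugged))
    return code[:start] + bugged + code[end:]


def revert_all(code: str, log: List[Mutation]) -> Tuple[str, List[str]]:
    """Undo the log newest-first and collect one fix instruction per edit."""
    instructions = []
    for mut in reversed(log):
        end = mut.start + len(mut.bugged)
        instructions.append(f"replace <{code[mut.start:end]}> with <{mut.original}>")
        code = code[:mut.start] + mut.original + code[end:]
    return code, instructions


source = "while x < y:\n    x = x + 1\n"
log: List[Mutation] = []
bugged = apply_mutation(source, 8, 9, ">=", log)       # "<" becomes ">="
pos = bugged.index("x + 1")
bugged = apply_mutation(bugged, pos, pos + 5, "x", log)  # "x + 1" becomes "x"
restored, fixes = revert_all(bugged, log)
```

Undoing edits in reverse order means each recorded offset is still valid when it is used, which is the simplest way to keep a chain of mutations reversible.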

## LibCST

In [1]:
import uuid
from copy import deepcopy
from typing import List

import libcst as cst
import libcst.matchers as m
from libcst import Equal, GreaterThanEqual, GreaterThan, CSTNode, Module
from libcst.codemod import Codemod, CodemodContext, ContextAwareTransformer, ContextAwareVisitor
from libcst.metadata import BatchableMetadataProvider, CodePosition, CodeRange, MetadataWrapper, PositionProvider
from openbugger.bugger import deep_equals_with_print

In [2]:
script = "while x < y[10]:\n\tprint(x)\n\tx = x[2] + 1+y[2:10]"
script2 = "while x < y[10]:\n\tprint(x)\n\tx = x[2] + 1+z[2:10]"

# Lowercase names avoid shadowing the Module class imported from libcst
module = cst.parse_module(script)
module2 = cst.parse_module(script2)
deep_equals_with_print(module, module2)

Value mismatch in field 'value': y != z
Value mismatch in field 'value': Name(
    value='y',
    lpar=[],
    rpar=[],
) != Name(
    value='z',
    lpar=[],
    rpar=[],
)
Value mismatch in field 'right': Subscript(
    value=Name(
        value='y',
        lpar=[],
        rpar=[],
    ),
    slice=[
        SubscriptElement(
            slice=Slice(
                lower=Integer(
                    value='2',
                    lpar=[],
                    rpar=[],
                ),
                upper=Integer(
                    value='10',
                    lpar=[],
                    rpar=[],
                ),
                step=None,
                first_colon=Colon(
                    whitespace_before=SimpleWhitespace(
                        value='',
                    ),
                    whitespace_after=SimpleWhitespace(
                        value='',
                    ),
                ),
                second_colon=MaybeSentinel.DEFAULT,
      

False

Useful LibCST docs:

https://libcst.readthedocs.io/en/latest/metadata.html#position-metadata

https://libcst.readthedocs.io/en/latest/_modules/libcst/metadata/position_provider.html#PositionProvider

https://libcst.readthedocs.io/en/latest/_modules/libcst/metadata/position_provider.html#WhitespaceInclusivePositionProvidingCodegenState


https://libcst.readthedocs.io/en/latest/parser.html

## PositionContextUpdater and is_modified
Because we want to apply potentially random changes to the code, we need a few simple helper methods. The is_modified function uses the metadata saved by a transformer after it modifies a node to prevent any other transformer from applying further modifications to that node, its parent, or its children.
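
A minimal sketch of the kind of check this implies, assuming positions are compared as (line, column) pairs: a candidate node is off limits if its range equals, contains, or is contained in an already modified range. `Pos`, `Span` and `touches_modified` are hypothetical names; the actual is_modified, is_parent_CodeRange and is_child_CodeRange helpers in openbugger.context may be implemented differently.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True, order=True)
class Pos:
    line: int
    column: int


@dataclass(frozen=True)
class Span:
    start: Pos
    end: Pos

    def contains(self, other: "Span") -> bool:
        """True if `other` lies inside this span (equal spans included)."""
        return self.start <= other.start and other.end <= self.end


def touches_modified(candidate: Span, modified: Iterable[Span]) -> bool:
    """True if the candidate equals, contains, or is contained in a modified span."""
    return any(m.contains(candidate) or candidate.contains(m) for m in modified)


modified = [Span(Pos(1, 12), Pos(1, 14))]   # e.g. the "10" in y[10]
whole_line = Span(Pos(1, 0), Pos(1, 16))    # a parent of the modified node
other_line = Span(Pos(3, 7), Pos(3, 8))     # an unrelated node
```

With these definitions `touches_modified(whole_line, modified)` is true while `touches_modified(other_line, modified)` is false, so only the unrelated node remains eligible for a further mutation.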

Since each node modification might add or remove code, we also use the PositionContextUpdater to keep the context scratchpad consistent after each mutation.
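
To make the bookkeeping problem concrete, here is a simplified sketch restricted to single-line ranges: when an edit changes the length of a line, every saved range that starts after the edit on that line has to move by the same amount. `shift_ranges` is a hypothetical helper written for this illustration; the real PositionContextUpdater may take a different approach, for example recomputing positions from LibCST metadata.

```python
from typing import List, Tuple

# (line, start_column, end_column); single-line ranges only, end exclusive
Range = Tuple[int, int, int]


def shift_ranges(saved: List[Range], line: int, edit_end_column: int,
                 delta: int) -> List[Range]:
    """Shift saved ranges on `line` that start at or after the end of an edit.

    `delta` is the change in length introduced by the edit (new minus old).
    """
    shifted = []
    for rng_line, start, end in saved:
        if rng_line == line and start >= edit_end_column:
            shifted.append((rng_line, start + delta, end + delta))
        else:
            shifted.append((rng_line, start, end))
    return shifted


# "x[2] + y[2:10]" becomes "x[200] + y[2:10]": the edit replaces the
# character at column 2 (ending at column 3) and adds two characters.
saved = [(1, 9, 13), (2, 0, 4)]   # the "2:10" slice on line 1, a node on line 2
updated = shift_ranges(saved, line=1, edit_end_column=3, delta=2)
```

The range on line 1 moves from columns 9 to 13 to columns 11 to 15, while the range on line 2 is untouched.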

In [3]:
from openbugger.context import is_modified, save_modified, is_parent_CodeRange, is_child_CodeRange, is_equal_Coderange, PositionContextUpdater

## Bugger and InverseTransformer

In [4]:
from openbugger.bugger import Bugger, TestTransformer, bugger_example, InverseTransformer

### Example Debugging Output Using the Test Transformer

In [5]:
transformers = [TestTransformer]
# Get the script as a string; it should contain a while loop and span multiple lines
script = "while x < y[10]:\n\tprint(x)\n\tx = x[2] + 1+y[2:10]"
bugger_example(transformers,script)


original_code
while x < y[10]:
	print(x)
	x = x[2] + 1+y[2:10]
tainted_code
while x < y[10]:
	print(x)
	x = x[2] + 1+y[2:10]
The result of deep_equals between the concrete syntax tree of the original and the bugged code is True
Checking for bugs...
The following Node has a bug of type TestTransformer-a868 starting at line 1, column 12 and ending at line 1, column 14.
The bug can be fixed by substituting the bugged code-string <10> with the following code-string <10>
The following Node has a bug of type TestTransformer-a868 starting at line 3, column 7 and ending at line 3, column 8.
The bug can be fixed by substituting the bugged code-string <2> with the following code-string <2>
The following Node has a bug of type TestTransformer-a868 starting at line 3, column 16 and ending at line 3, column 20.
The bug can be fixed by substituting the bugged code-string <2:10> with the following code-string <2:10>
Debugging...
clean_code
while x < y[10]:
	print(x)
	x = x[2] + 1+y[2:10]
Checking if 

# Example Bugs 

In this section of the notebook we develop the 5/12 example bugs as LibCST ContextAwareTransformers. Each one uses the is_modified method to check whether the node was already targeted by a mutation, then applies the mutation and saves it to the scratchpad with the save_modified method.

The bugs are:

1. incorrect_comparison_operator - Done
2. comparison_swap - Done
3. forgetting_to_update_variable - Done
4. infinite_loop - Done
5. off_by_k_index - Done
6. incorrect_return_value
7. incorrect_boolean_operator
8. using_wrong_type_of_loop
9. using_loop_variable_outside_loop
10. using_variable_before_assignment
11. using_wrong_variable_scope
12. incorrect_use_of_exception_handling
13. incorrect_function_call

The current approach only allows the scratchpad context to be passed to the transformers at initialization. For bugs that take additional inputs, such as which operator to swap, we therefore use generator functions that return a configured transformer. These transformers apply the bug to ALL the target instances they find; we will derive some wrappers in a later part of the notebook to control the number of bugs or to target a specific code range.
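
The pattern can be illustrated with a small factory function: the extra parameters are captured in a closure, and the returned class still has a constructor that takes only the context. To keep the sketch runnable without third-party packages it uses the standard-library `ast` module (Python 3.9+ for `ast.unparse`) instead of LibCST, so it does not preserve formatting. `gen_compare_swap` and `CompareSwap` are hypothetical names, and this is not how gen_ComparisonTargetTransfomer is implemented.

```python
import ast


def gen_compare_swap(old_op: type, new_op: type) -> type:
    """Return a transformer class with `old_op` and `new_op` baked in."""

    class CompareSwap(ast.NodeTransformer):
        def __init__(self, context: dict):
            # The constructor still receives only the shared context.
            self.context = context

        def visit_Compare(self, node: ast.Compare) -> ast.Compare:
            self.generic_visit(node)
            # Swap every occurrence of old_op inside the comparison chain.
            node.ops = [new_op() if isinstance(op, old_op) else op
                        for op in node.ops]
            return node

    return CompareSwap


SwapEqForNotEq = gen_compare_swap(ast.Eq, ast.NotEq)
tree = ast.parse("x == 1 + 2 == 3 + 2 != 3 + 4 > 3")
bugged = ast.unparse(SwapEqForNotEq(context={}).visit(tree))
```

Because the factory returns a class rather than an instance, its result can sit in the same `transformers` list as classes that need no configuration.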

## LogicalBugs

In [6]:
from openbugger.bugs.logical import gen_ComparisonTargetTransfomer, ComparisonSwapTransformer

### Incorrect Comparison Operator
This bug takes two LibCST comparison operators as input and swaps every instance of the first for the second.

In [7]:
transformers = [gen_ComparisonTargetTransfomer('==','!=')]
# Get the script as a string
script = "x == 1 + 2 == 3 + 2 != 3 + 4 > 3"

bugger_example(transformers,script)

original_code
x == 1 + 2 == 3 + 2 != 3 + 4 > 3
tainted_code
x != 1 + 2 != 3 + 2 != 3 + 4 > 3
The result of deep_equals between the concrete syntax tree of the original and the bugged code is False
Checking for bugs...
The following Node has a bug of type ComparisonTargetTransformer-1d15 starting at line 1, column 1 and ending at line 1, column 10.
The bug can be fixed by substituting the bugged code-string < != 1 + 2> with the following code-string < == 1 + 2>
The following Node has a bug of type ComparisonTargetTransformer-1d15 starting at line 1, column 10 and ending at line 1, column 19.
The bug can be fixed by substituting the bugged code-string < != 3 + 2> with the following code-string < == 3 + 2>
Debugging...
clean_code
x == 1 + 2 == 3 + 2 != 3 + 4 > 3
Checking if the debugged code is equal to the original code..
The result of deep_equals between the concrete syntax tree of the original and debugged code is True


### Comparison Swap

In [8]:
transformers = [ComparisonSwapTransformer]
# Get the script as a string
script = "x == 1 + 2 == 3 + 2 != 3 + 4 > 3 "
bugger_example(transformers,script)

original_code
x == 1 + 2 == 3 + 2 != 3 + 4 > 3 
tainted_code
1 + 2 == x == 3 + 2 != 3 + 4 > 3 
The result of deep_equals between the concrete syntax tree of the original and the bugged code is False
Checking for bugs...
The following Node has a bug of type ComparisonSwapTransformer-0ce3 starting at line 1, column 0 and ending at line 1, column 32.
The bug can be fixed by substituting the bugged code-string <1 + 2 == x == 3 + 2 != 3 + 4 > 3> with the following code-string <x == 1 + 2 == 3 + 2 != 3 + 4 > 3>
Debugging...
clean_code
x == 1 + 2 == 3 + 2 != 3 + 4 > 3 
Checking if the debugged code is equal to the original code..
The result of deep_equals between the concrete syntax tree of the original and debugged code is True


## ControlFlow Bugs

In [9]:
from openbugger.bugs.controlflow import ForgettingToUpdateVariableTransformer, InfiniteWhileTransformer,gen_OffByKIndexTransformer

### ForgettingToUpdateVariable

In [10]:
transformers = [ForgettingToUpdateVariableTransformer]
# Get the script as a string
script = "while x == 1 + 2 == 3 + 2 != 3 + 4 > 3 : \n  y = 1 + 2"
bugger_example(transformers,script)

original_code
while x == 1 + 2 == 3 + 2 != 3 + 4 > 3 : 
  y = 1 + 2
tainted_code
while x == 1 + 2 == 3 + 2 != 3 + 4 > 3 : 
  y = y
The result of deep_equals between the concrete syntax tree of the original and the bugged code is False
Checking for bugs...
The following Node has a bug of type ForgettingToUpdateVariableTransformer-9e6b starting at line 2, column 2 and ending at line 2, column 7.
The bug can be fixed by substituting the bugged code-string <y = y> with the following code-string <y = 1 + 2>
Debugging...
clean_code
while x == 1 + 2 == 3 + 2 != 3 + 4 > 3 : 
  y = 1 + 2
Checking if the debugged code is equal to the original code..
The result of deep_equals between the concrete syntax tree of the original and debugged code is True


### Infinite While Loop

In [11]:
transformers = [InfiniteWhileTransformer]
# Get the script as a string
script = "while x == 1 + 2 == 3 + 2 != 3 + 4 > 3: \n  y = 1 + 2"
bugger_example(transformers,script)

original_code
while x == 1 + 2 == 3 + 2 != 3 + 4 > 3: 
  y = 1 + 2
tainted_code
while True: 
  y = 1 + 2
The result of deep_equals between the concrete syntax tree of the original and the bugged code is False
Checking for bugs...
The following Node has a bug of type InfiniteWhileTransformer-6dc3 starting at line 1, column 0 and ending at line 2, column 11.
The bug can be fixed by substituting the bugged code-string <while True: 
  y = 1 + 2
> with the following code-string <while x == 1 + 2 == 3 + 2 != 3 + 4 > 3: 
  y = 1 + 2
>
Debugging...
clean_code
while x == 1 + 2 == 3 + 2 != 3 + 4 > 3: 
  y = 1 + 2
Checking if the debugged code is equal to the original code..
The result of deep_equals between the concrete syntax tree of the original and debugged code is True


### OffByKIndex 

In [12]:
transformers = [gen_OffByKIndexTransformer(1)]
# Get the script as a string
script = "while x == 1 + 2 == 3 + 2 != 3 + 4 > 3: \n  y = 1 + 2\nx[1:2]\nx[1]"
bugger_example(transformers,script)

original_code
while x == 1 + 2 == 3 + 2 != 3 + 4 > 3: 
  y = 1 + 2
x[1:2]
x[1]
tainted_code
while x == 1 + 2 == 3 + 2 != 3 + 4 > 3: 
  y = 1 + 2
x[2:3]
x[2]
The result of deep_equals between the concrete syntax tree of the original and the bugged code is False
Checking for bugs...
The following Node has a bug of type OffByKIndexTransformer-b1b6 starting at line 3, column 2 and ending at line 3, column 5.
The bug can be fixed by substituting the bugged code-string <2:3> with the following code-string <1:2>
The following Node has a bug of type OffByKIndexTransformer-b1b6 starting at line 4, column 2 and ending at line 4, column 3.
The bug can be fixed by substituting the bugged code-string <2> with the following code-string <1>
Debugging...
clean_code
while x == 1 + 2 == 3 + 2 != 3 + 4 > 3: 
  y = 1 + 2
x[1:2]
x[1]
Checking if the debugged code is equal to the original code..
The result of deep_equals between the concrete syntax tree of the original and debugged code is True


### Incorrect Exception Handling
Catching the wrong exceptions, or incorrect use of the `try/except/finally` block.
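
As a short illustration of why an overly broad handler counts as a bug, the following sketch compares a handler that names its exception with a bare `except`. The function names are made up for this example.

```python
def safe_divide(a, b):
    """Original: only a division by zero is handled."""
    try:
        return a / b
    except ZeroDivisionError:
        return "Zero division error!"


def bugged_divide(a, b):
    """Bugged: the bare `except` also hides unrelated errors."""
    try:
        return a / b
    except:  # noqa: E722 (deliberately too broad)
        return "Zero division error!"


def raised_type_error(func) -> bool:
    """Report whether calling func("1", 2) lets a TypeError escape."""
    try:
        func("1", 2)   # str / int raises TypeError
    except TypeError:
        return True
    return False
```

With the original handler the `TypeError` reaches the caller; with the bare `except` it is swallowed and misreported as a division by zero.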

In [13]:
class IncorrectExceptionHandlerTransformer(ContextAwareTransformer):
    METADATA_DEPENDENCIES = (PositionProvider,)
    def __init__(self, context: CodemodContext):
        super().__init__(context)
        self.id = f"{self.__class__.__name__}-{uuid.uuid4().hex[:4]}"
    def mutate(self, tree: cst.Module, reverse: bool = False) -> cst.Module:
        return self.transform_module(tree)
    def leave_ExceptHandler(self, original_node: cst.ExceptHandler, updated_node: cst.ExceptHandler) -> cst.ExceptHandler:
        meta_pos = self.get_metadata(PositionProvider, original_node)
        already_modified = is_modified(original_node,meta_pos,self.context)
        if not already_modified:
            # Drop the exception type so the handler becomes a bare `except`
            updated_node = original_node.with_changes(type=None)
            save_modified(self.context,meta_pos,original_node,updated_node,self.id)
        return updated_node

In [14]:
script= """try:
    x = 1 / 0
except ZeroDivisionError:
    print('Zero division error!')
"""
expected_output= """try:
    x = 1 / 0
except:
    print('Zero division error!')
"""

In [15]:
transformers = [IncorrectExceptionHandlerTransformer]
bugger_example(transformers,script)

original_code
try:
    x = 1 / 0
except ZeroDivisionError:
    print('Zero division error!')

tainted_code
try:
    x = 1 / 0
except :
    print('Zero division error!')

The result of deep_equals between the concrete syntax tree of the original and the bugged code is False
Checking for bugs...
The following Node has a bug of type IncorrectExceptionHandlerTransformer-c132 starting at line 3, column 0 and ending at line 4, column 33.
The bug can be fixed by substituting the bugged code-string <except :
    print('Zero division error!')
> with the following code-string <except ZeroDivisionError:
    print('Zero division error!')
>
Debugging...
clean_code
try:
    x = 1 / 0
except ZeroDivisionError:
    print('Zero division error!')

Checking if the debugged code is equal to the original code..
The result of deep_equals between the concrete syntax tree of the original and debugged code is True


### Missing Argument Transformer
Calling a function with its last argument missing.

In [16]:
class MissingArgumentTransformer(ContextAwareTransformer):
    METADATA_DEPENDENCIES = (PositionProvider,)
    def __init__(self, context: CodemodContext):
        super().__init__(context)
        self.id = f"{self.__class__.__name__}-{uuid.uuid4().hex[:4]}"
    def mutate(self, tree: cst.Module, reverse: bool = False) -> cst.Module:
        self.reverse = reverse
        return self.transform_module(tree)
    def leave_Call(self, original_node: cst.Call, updated_node: cst.Call) -> cst.Call:
        meta_pos = self.get_metadata(PositionProvider, original_node)
        already_modified = is_modified(original_node,meta_pos,self.context)
        if not already_modified and original_node.args:
            # Drop the last argument of the call
            updated_node = original_node.with_changes(args=original_node.args[:-1])
            save_modified(self.context,meta_pos,original_node,updated_node,self.id)
        return updated_node


In [17]:
script= """print(str(123))"""
expected_output = """print(str())"""
transformers = [MissingArgumentTransformer]
bugger_example(transformers,script)


original_code
print(str(123))
tainted_code
print(str())
The result of deep_equals between the concrete syntax tree of the original and the bugged code is False
Checking for bugs...
The following Node has a bug of type MissingArgumentTransformer-039e starting at line 1, column 6 and ending at line 1, column 11.
The bug can be fixed by substituting the bugged code-string <str()> with the following code-string <str(123)>
Debugging...
clean_code
print(str(123))
Checking if the debugged code is equal to the original code..
The result of deep_equals between the concrete syntax tree of the original and debugged code is True


### ReturningEarly
Returning from a function before all the necessary computations have been made.

In [18]:
class ReturningEarlyTransformer(ContextAwareTransformer):
    METADATA_DEPENDENCIES = (PositionProvider,)
    def __init__(self, context: CodemodContext):
        super().__init__(context)
        self.id = f"{self.__class__.__name__}-{uuid.uuid4().hex[:4]}"
    def mutate(self, tree: cst.Module, reverse: bool = False) -> cst.Module:
        self.reverse = reverse
        return self.transform_module(tree)
    def leave_Return(self, original_node: cst.Return, updated_node: cst.Return) -> cst.Return:
        meta_pos = self.get_metadata(PositionProvider, original_node)
        already_modified = is_modified(original_node,meta_pos,self.context)
        if not already_modified:
            # Replace the statement with a bare `return`, discarding the returned value
            updated_node = cst.Return(value=None)
            save_modified(self.context,meta_pos,original_node,updated_node,self.id)
        return updated_node


In [19]:
script= """def add(x, y):
    return x + y
    print('This will not be printed.')
"""
expected_output = """def add(x, y):
    return
    print('This will not be printed.')
"""
transformers = [ReturningEarlyTransformer]
bugger_example(transformers,script)

original_code
def add(x, y):
    return x + y
    print('This will not be printed.')

tainted_code
def add(x, y):
    return
    print('This will not be printed.')

The result of deep_equals between the concrete syntax tree of the original and the bugged code is False
Checking for bugs...
The following Node has a bug of type ReturningEarlyTransformer-b830 starting at line 2, column 4 and ending at line 2, column 10.
The bug can be fixed by substituting the bugged code-string <return> with the following code-string <return x + y>
Debugging...
clean_code
def add(x, y):
    return x + y
    print('This will not be printed.')

Checking if the debugged code is equal to the original code..
The result of deep_equals between the concrete syntax tree of the original and debugged code is True


## Data Related Bugs

### Incorrect variable initialization
Variables are initialized with incorrect values.
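
When the replacement is drawn at random from a small pool, a draw can coincide with the original value and produce a mutation that changes nothing. One possible guard, sketched here with a hypothetical helper `pick_different` that is not part of the transformer in this section, filters the pool before drawing.

```python
import random
from typing import Any, Optional, Sequence


def pick_different(old_value: Any, candidates: Sequence[Any],
                   rng: Optional[random.Random] = None) -> Optional[Any]:
    """Return a random candidate that differs from `old_value`, or None if none exists."""
    chooser = rng if rng is not None else random
    # Compare the type as well as the value so that True is not treated as equal to 1.
    pool = [c for c in candidates
            if type(c) is not type(old_value) or c != old_value]
    return chooser.choice(pool) if pool else None


rng = random.Random(0)   # seeded only to make the example repeatable
draws = [pick_different(5, [1, 2, 3, 4, 5], rng) for _ in range(50)]
```

Returning `None` when no alternative exists lets the caller skip the node instead of recording a mutation that leaves the code unchanged.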

In [20]:
import random

DEFAULT_VALUES = {
    int: [1, 2, 3, 4, 5],
    str: ['foo', 'bar', 'baz'],
    list: [[1, 2, 3], ['a', 'b', 'c'], []],
    dict: [{'key': 'value'}, {}, {'num': 1, 'bool': False}],
    bool: [True, False],
    None: [None]
}

from libcst import matchers
from libcst.metadata import ParentNodeProvider


class IncorrectVariableInitializationTransformer(ContextAwareTransformer):
    METADATA_DEPENDENCIES = (PositionProvider, ParentNodeProvider)

    def __init__(self, context: CodemodContext):
        super().__init__(context)
        self.id = f"{self.__class__.__name__}-{uuid.uuid4().hex[:4]}"

    def mutate(self, tree: cst.Module, reverse: bool = False) -> cst.Module:
        return self.transform_module(tree)

    def leave_Assign(self, original_node: cst.Assign, updated_node: cst.Assign) -> cst.Assign:
        meta_pos = self.get_metadata(PositionProvider, original_node)
        already_modified = is_modified(original_node,meta_pos,self.context)
        if not already_modified:
            if matchers.matches(updated_node.value, matchers.SimpleString()):
                old_value = updated_node.value.value
                old_value_type = str
            elif matchers.matches(updated_node.value, matchers.Integer()):
                old_value = int(updated_node.value.value)
                old_value_type = int
            elif matchers.matches(updated_node.value, matchers.Float()):
                old_value = float(updated_node.value.value)
                old_value_type = float
            elif matchers.matches(updated_node.value, matchers.Name()):
                old_value = updated_node.value.value
                if old_value == "True" or old_value == "False":
                    old_value_type = bool
                elif old_value == "None":
                    old_value_type = type(None)
                else:
                    return updated_node
            else:
                return updated_node

            # Types without an entry in DEFAULT_VALUES (float and NoneType here) fall back to 0
            new_value = random.choice(DEFAULT_VALUES.get(old_value_type, [0]))
            
            if old_value_type == str:
                new_value = f'"{new_value}"'

            updated_node = original_node.with_changes(value=cst.parse_expression(str(new_value)))
            save_modified(self.context,meta_pos,original_node,updated_node,self.id)
        return updated_node
    
    def leave_List(self, original_node: cst.List, updated_node: cst.List) -> cst.List:
        meta_pos = self.get_metadata(PositionProvider, original_node)
        already_modified = is_modified(original_node,meta_pos,self.context)
        
        parent_node = self.get_metadata(ParentNodeProvider, original_node)
        if isinstance(parent_node, cst.Assign):
            if not already_modified and len(updated_node.elements) > 0:
                idx = random.randint(0, len(updated_node.elements)-1)
                new_value = random.choice(DEFAULT_VALUES.get(int, [0]))
                updated_elements = list(updated_node.elements)
                updated_elements[idx] = updated_elements[idx].with_changes(value=cst.Integer(str(new_value)))
                updated_node = updated_node.with_changes(elements=tuple(updated_elements))
                save_modified(self.context,meta_pos,original_node,updated_node,self.id)
        return updated_node
    
    def leave_Dict(self, original_node: cst.Dict, updated_node: cst.Dict) -> cst.Dict:
        meta_pos = self.get_metadata(PositionProvider, original_node)
        already_modified = is_modified(original_node, meta_pos, self.context)

        parent_node = self.get_metadata(ParentNodeProvider, original_node)
        if isinstance(parent_node, cst.Assign):
            if not already_modified and len(updated_node.elements) > 0:
                idx = random.randint(0, len(updated_node.elements) - 1)
                old_element = updated_node.elements[idx]

                if isinstance(old_element, cst.DictElement):
                    new_value = random.choice(DEFAULT_VALUES.get(str, ["foo", "bar", "baz"]))
                    updated_element = old_element.with_changes(value=cst.SimpleString(f'"{new_value}"'))

                    updated_elements = list(updated_node.elements)
                    updated_elements[idx] = updated_element
                    updated_node = updated_node.with_changes(elements=tuple(updated_elements))
                    save_modified(self.context, meta_pos, original_node, updated_node, self.id)
        return updated_node




In [21]:
script = """var1 = 5
var2 = "hello"
var3 = [1, 2, 3]
var4 = {"key": "value"}
var5 = True
var6 = None
"""
possible_output = """var1 = 2
var2 = "bar"
var3 = ['a', 'b', 'c']
var4 = {}
var5 = False
var6 = None
"""
transformers = [IncorrectVariableInitializationTransformer]
bugger_example(transformers,script)

original_code
var1 = 5
var2 = "hello"
var3 = [1, 2, 3]
var4 = {"key": "value"}
var5 = True
var6 = None

tainted_code
var1 = 5
var2 = "bar"
var3 = [1, 5, 3]
var4 = {"key": "baz"}
var5 = False
var6 = 0

The result of deep_equals between the concrete syntax tree of the original and the bugged code is False
Checking for bugs...
The following Node has a bug of type IncorrectVariableInitializationTransformer-3aa0 starting at line 1, column 0 and ending at line 1, column 8.
The bug can be fixed by substituting the bugged code-string <var1 = 5> with the following code-string <var1 = 5>
The following Node has a bug of type IncorrectVariableInitializationTransformer-3aa0 starting at line 2, column 0 and ending at line 2, column 12.
The bug can be fixed by substituting the bugged code-string <var2 = "bar"> with the following code-string <var2 = "hello">
The following Node has a bug of type IncorrectVariableInitializationTransformer-3aa0 starting at line 3, column 7 and ending at line 3, column 16

### Variable Name Typo
Mistyping the names of variables, causing them to be treated as new, uninitialized variables.

In [22]:
from libcst import matchers as m

class VariableNameTypoTransformer(ContextAwareTransformer):
    METADATA_DEPENDENCIES = (PositionProvider,)

    def __init__(self, context: CodemodContext):
        super().__init__(context)
        self.id = f"{self.__class__.__name__}-{uuid.uuid4().hex[:4]}"
        self.seen_variables = set()

    def mutate(self, tree: cst.Module, reverse: bool = False) -> cst.Module:
        return self.transform_module(tree)

    def leave_Assign(self, original_node: cst.Assign, updated_node: cst.Assign) -> cst.Assign:
        meta_pos = self.get_metadata(PositionProvider, original_node)
        already_modified = is_modified(original_node, meta_pos, self.context)

        if not already_modified:
            targets = []
            for target in original_node.targets:
                if isinstance(target.target, cst.Name):
                    var_name = target.target.value
                    self.seen_variables.add(var_name)
                    extra_character = random.choice(list(self.seen_variables)) if self.seen_variables else ''
                    new_name = cst.Name(var_name + extra_character)
                    target = target.with_changes(target=new_name)
                targets.append(target)
            updated_node = original_node.with_changes(targets=targets)
            save_modified(self.context, meta_pos, original_node, updated_node, self.id)
        return updated_node


In [23]:
script = """var1 = 5
var2 = "hello"
var1 = var2
"""
possible_output = """var1 = 5
var2 = "hello"
var1_var2 = var2
"""
transformers = [VariableNameTypoTransformer]
bugger_example(transformers,script)

original_code
var1 = 5
var2 = "hello"
var1 = var2

tainted_code
var1var1 = 5
var2var2 = "hello"
var1var1 = var2

The result of deep_equals between the concrete syntax tree of the original and the bugged code is False
Checking for bugs...
The following Node has a bug of type VariableNameTypoTransformer-64be starting at line 1, column 0 and ending at line 1, column 12.
The bug can be fixed by substituting the bugged code-string <var1var1 = 5> with the following code-string <var1 = 5>
The following Node has a bug of type VariableNameTypoTransformer-64be starting at line 2, column 0 and ending at line 2, column 18.
The bug can be fixed by substituting the bugged code-string <var2var2 = "hello"> with the following code-string <var2 = "hello">
The following Node has a bug of type VariableNameTypoTransformer-64be starting at line 3, column 0 and ending at line 3, column 15.
The bug can be fixed by substituting the bugged code-string <var1var1 = var2> with the following code-string <var1 = var

### Mutable Default Arguments
Using mutable types (like lists or dictionaries) as default function arguments.
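
For readers unfamiliar with this pitfall, the short example below shows the runtime effect: a default list is created once, when the function is defined, and is then shared by every call that relies on it. The function names are made up for this example.

```python
def append_item(item, bucket=[]):   # bug: one list shared by all calls
    bucket.append(item)
    return bucket


def append_item_fixed(item, bucket=None):
    if bucket is None:
        bucket = []                 # a fresh list on every call
    bucket.append(item)
    return bucket


first = list(append_item(1))        # copy the shared list as it is now
second = list(append_item(2))       # still contains the 1 from the previous call
fixed_first = append_item_fixed(1)
fixed_second = append_item_fixed(2)
```

The second call to `append_item` sees the element left behind by the first, whereas the fixed version starts from an empty list each time.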

In [24]:
from typing import Optional
class MutableDefaultArgumentTransformer(ContextAwareTransformer):
    METADATA_DEPENDENCIES = (PositionProvider,)

    def __init__(self, context: CodemodContext):
        super().__init__(context)
        self.id = f"{self.__class__.__name__}-{uuid.uuid4().hex[:4]}"
        self.used_variables = set()
        self.first_pass = True

    def visit_Name(self, node: cst.Name) -> Optional[bool]:
        # During the first pass, record all the variable names used in the module
        if self.first_pass:
            self.used_variables.add(node.value)
        return None

    def leave_FunctionDef(
        self, original_node: cst.FunctionDef, updated_node: cst.FunctionDef
    ) -> cst.FunctionDef:
        meta_pos = self.get_metadata(PositionProvider, original_node)
        already_modified = is_modified(original_node, meta_pos, self.context)
        # Only modify function definitions during the second pass
        if not self.first_pass and self.used_variables and not already_modified:
            # Only replace the last variable in the parameters list
            variable_to_replace = updated_node.params.params[-1].name.value if updated_node.params.params else None
            if variable_to_replace and variable_to_replace in self.used_variables:
                self.used_variables.remove(variable_to_replace)
                
                # Build new parameters, replacing the last variable with a mutable default argument
                new_params_list = list(updated_node.params.params[:-1])
                new_params_list.append(cst.Param(name=cst.Name(variable_to_replace), default=cst.List([])))
                new_params = cst.Parameters(params=tuple(new_params_list))

                
                # Build a new function body that modifies the mutable default argument
                assignment = cst.parse_statement(f"{variable_to_replace}.append(1)")

                # If the body is non-empty, insert the assignment before the last statement. 
                # Otherwise, add the assignment as the only statement.
                if updated_node.body.body:
                    new_body = cst.IndentedBlock(
                        body=list(updated_node.body.body[:-1]) + [assignment] + [updated_node.body.body[-1]]
                    )
                else:
                    new_body = cst.IndentedBlock(
                        body=[assignment]
                    )

                # Return the updated function definition
                updated_node = updated_node.with_changes(params=new_params, body=new_body)
                save_modified(self.context, meta_pos, original_node, updated_node, self.id)
                return updated_node
        return updated_node




    def mutate(self, tree: cst.Module, reverse: bool = False) -> cst.Module:
        # First pass: collect variable names
        self.first_pass = True
        self.transform_module(tree)
        # Second pass: introduce bugs
        self.first_pass = False
        return self.transform_module(tree)


In [25]:
script = """def func1(arg1, arg2, arg3):
    return arg1 + arg2"""

transformers = [MutableDefaultArgumentTransformer]
bugger_example(transformers,script)

original_code
def func1(arg1, arg2, arg3):
    return arg1 + arg2
tainted_code
def func1(arg1, arg2, arg3 = []):
    arg3.append(1)
    return arg1 + arg2
The result of deep_equals between the concrete syntax tree of the original and the bugged code is False
Checking for bugs...
The following Node has a bug of type MutableDefaultArgumentTransformer-6491 starting at line 1, column 0 and ending at line 3, column 22.
The bug can be fixed by substituting the bugged code-string <def func1(arg1, arg2, arg3 = []):
    arg3.append(1)
    return arg1 + arg2
> with the following code-string <def func1(arg1, arg2, arg3):
    return arg1 + arg2
>
Debugging...
clean_code
def func1(arg1, arg2, arg3):
    return arg1 + arg2
Checking if the debugged code is equal to the original code..
The result of deep_equals between the concrete syntax tree of the original and debugged code is True


### Using Variable before Assignment
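
The bug class named in the heading can be shown with a small runtime example: a name that is assigned anywhere in a function body is local to that function, so reading it before the assignment raises `UnboundLocalError`. This is a hand-written illustration and not the output of the transformer below.

```python
def add_two_numbers(a, b):
    total = a + temp      # bug: `temp` is read here...
    temp = b              # ...but only assigned here, so it is still unbound
    return total


try:
    add_two_numbers(1, 2)
    error_name = None
except UnboundLocalError as exc:
    error_name = type(exc).__name__
```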

In [26]:
from typing import List, Dict
import libcst as cst
from libcst.metadata import PositionProvider
from libcst.codemod import CodemodContext, ContextAwareTransformer
import uuid
import random

class UseBeforeDefinitionTransformer(ContextAwareTransformer):
    METADATA_DEPENDENCIES = (PositionProvider,)

    def __init__(self, context: CodemodContext):
        super().__init__(context)
        self.id = f"{self.__class__.__name__}-{uuid.uuid4().hex[:4]}"
        self.first_pass = True
        self.function_scopes: Dict[str, List[str]] = {}
        self.current_function = None

    def visit_FunctionDef(self, node: cst.FunctionDef) -> None:
        self.current_function = node.name.value
        if self.first_pass:
            self.function_scopes[self.current_function] = []

    def visit_Param(self, node: cst.Param) -> None:
        if self.first_pass:
            self.function_scopes[self.current_function].append(node.name.value)

    def leave_FunctionDef(self, original_node: cst.FunctionDef, updated_node: cst.FunctionDef) -> cst.FunctionDef:
        meta_pos = self.get_metadata(PositionProvider, original_node)
        already_modified = is_modified(original_node, meta_pos, self.context)
        if not self.first_pass and not already_modified:
            local_vars = self.function_scopes[updated_node.name.value]
            if len(local_vars) >= 2:
                # Choose two variable names randomly
                var1, var2 = random.sample(local_vars, 2)
                new_var_name = var1 + var2
                
                # Prepend an assignment that derives a new variable from one of the parameters.
                new_statement = cst.parse_statement(f"{new_var_name} = {var1} + 1")
                new_body = cst.IndentedBlock(
                    body=[new_statement] + list(updated_node.body.body)
                )
                updated_node = updated_node.with_changes(body=new_body)
                save_modified(self.context, meta_pos, original_node, updated_node, self.id)
                return updated_node
        return updated_node

    def mutate(self, tree: cst.Module, reverse: bool = False) -> cst.Module:
        self.first_pass = True
        self.transform_module(tree)  # First pass: collect function parameters
        self.first_pass = False
        return self.transform_module(tree)  # Second pass: introduce bugs


In [27]:
script = """def add_two_numbers(a, b):
    return a + b"""
# Illustrative target only; it is not passed to bugger_example
expected_output = """def add_two_numbers(a, b):
    return a + temp  # 'temp' is used before it's defined
    temp = b"""
transformers = [UseBeforeDefinitionTransformer]
bugger_example(transformers,script)

original_code
def add_two_numbers(a, b):
    return a + b
tainted_code
def add_two_numbers(a, b):
    ba = b + 1
    return a + b
The result of deep_equals between the concrete syntax tree of the original and the bugged code is False
Checking for bugs...
The following Node has a bug of type UseBeforeDefinitionTransformer-36d9 starting at line 1, column 0 and ending at line 3, column 16.
The bug can be fixed by substituting the bugged code-string <def add_two_numbers(a, b):
    ba = b + 1
    return a + b
> with the following code-string <def add_two_numbers(a, b):
    return a + b
>
Debugging...
clean_code
def add_two_numbers(a, b):
    return a + b
Checking if the debugged code is equal to the original code..
The result of deep_equals between the concrete syntax tree of the original and debugged code is True
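
The snippet stored in `expected_output` describes a different failure mode from the one injected in this run: a local name that is read before it is assigned. A minimal sketch of how that behaves at runtime, in plain Python (the function name `add_with_late_temp` is illustrative and not part of OpenBugger):

```python
def add_with_late_temp(a, b):
    # `temp` is assigned later in this function, so Python treats it as a
    # local name; reading it here happens before any value is bound.
    result = a + temp
    temp = b
    return result


try:
    add_with_late_temp(1, 2)
    error_name = None
except UnboundLocalError as exc:
    error_name = type(exc).__name__

print(error_name)
```

The error only appears when the function is called, not when it is defined.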


## Type-related Bugs

### Incorrect type used
Using an integer where a string was expected, or vice versa.

In [28]:

class IncorrectTypeTransformer(ContextAwareTransformer):
    METADATA_DEPENDENCIES = (PositionProvider,)
    def __init__(self, context: CodemodContext):
        super().__init__(context)
        self.id = f"{self.__class__.__name__}-{uuid.uuid4().hex[:4]}"
        self.type_transformations = {
            cst.Integer: cst.SimpleString,
            cst.SimpleString: cst.Integer,
            cst.Float: cst.Integer,
        }

    def leave_Integer(self, original_node: cst.Integer, updated_node: cst.Integer) -> cst.CSTNode:
        meta_pos = self.get_metadata(PositionProvider, original_node)
        already_modified = is_modified(original_node, meta_pos, self.context)
        if m.matches(updated_node, m.Integer()) and not already_modified:
            updated_node = cst.SimpleString(f'"{original_node.value}"')
            save_modified(self.context, meta_pos, original_node, updated_node, self.id)
            return updated_node
        return updated_node

    def leave_SimpleString(self, original_node: cst.SimpleString, updated_node: cst.SimpleString) -> cst.CSTNode:
        meta_pos = self.get_metadata(PositionProvider, original_node)
        already_modified = is_modified(original_node, meta_pos, self.context)
        if m.matches(updated_node, m.SimpleString()) and not already_modified:
            # Removing the quotes around the string before converting to integer
            value = original_node.value.strip('\"\'')
            try:
                int_value = int(value)
                updated_node = cst.Integer(str(int_value))
                save_modified(self.context, meta_pos, original_node, updated_node, self.id)
                return updated_node
            except ValueError:
                # If the string cannot be converted to an integer, keep it as a string
                return updated_node
        return updated_node

    def leave_Float(self, original_node: cst.Float, updated_node: cst.Float) -> cst.CSTNode:
        meta_pos = self.get_metadata(PositionProvider, original_node)
        already_modified = is_modified(original_node, meta_pos, self.context)
        if m.matches(updated_node, m.Float()) and not already_modified:
            updated_node = cst.Integer(str(int(float(str(original_node.value)))))
            save_modified(self.context, meta_pos, original_node, updated_node, self.id)
            return updated_node
        return updated_node

    def mutate(self, tree: cst.Module, reverse: bool = False) -> cst.Module:
        return self.transform_module(tree)


In [29]:
script = """def function():
    return 10"""
# script  = """def function():
#     return "10" """
# # script = """def function():
# #     return 10.5"""
transformers = [IncorrectTypeTransformer]
bugger_example(transformers,script)


original_code
def function():
    return 10
tainted_code
def function():
    return "10"
The result of deep_equals between the concrete syntax tree of the original and the bugged code is False
Checking for bugs...
The following Node has a bug of type IncorrectTypeTransformer-16d8 starting at line 2, column 11 and ending at line 2, column 15.
The bug can be fixed by substituting the bugged code-string <"10"> with the following code-string <10>
Debugging...
clean_code
def function():
    return 10
Checking if the debugged code is equal to the original code..
The result of deep_equals between the concrete syntax tree of the original and debugged code is True
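
To see why this kind of mutation makes useful debugging data, it may help to look at what the tainted function does at runtime. The sketch below uses plain Python only; `function_original` and `function_tainted` are illustrative names that mirror the two versions printed above.

```python
def function_original():
    return 10


def function_tainted():
    return "10"  # the integer literal was replaced by a string literal


# Some uses fail loudly ...
try:
    function_tainted() + 1
    loud_failure = None
except TypeError as exc:
    loud_failure = type(exc).__name__

# ... while others keep running and silently produce a different value.
doubled_original = function_original() * 2
doubled_tainted = function_tainted() * 2

print(loud_failure, doubled_original, repr(doubled_tainted))
```

The second case is the harder one to debug, since no exception points at the changed literal.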


### Calling non existing Methods

In [30]:
from libcst import matchers as m
from libcst import MaybeSentinel, RemovalSentinel
from libcst.codemod import ContextAwareTransformer
from libcst.codemod.visitors import AddImportsVisitor
from libcst.metadata import PositionProvider
import uuid
from typing import Dict, Union
import libcst as cst
import copy

class NonExistingMethodTransformer(ContextAwareTransformer):
    METADATA_DEPENDENCIES = (PositionProvider,)

    # Define a mapping from existing methods to non-existing methods
    METHOD_TRANSFORM_MAP: Dict[str, str] = {
        "append": "update",  # list to dictionary
        "add": "extend",  # set to list
        "update": "add",  # dictionary to set
        "extend": "append"  # list to list
    }

    def __init__(self, context):
        super().__init__(context)
        self.id = f"{self.__class__.__name__}-{uuid.uuid4().hex[:4]}"
        self.mutated = False

    def mutate(self, tree: cst.Module, reverse: bool = False) -> cst.Module:
        return self.transform_module(tree)

    def leave_Call(self, original_node: cst.Call, updated_node: cst.Call) -> cst.Call:
        meta_pos = self.get_metadata(PositionProvider, original_node)
        # Skip nodes already touched by an earlier transformer in the chain
        if is_modified(original_node, meta_pos, self.context):
            return updated_node
        if m.matches(updated_node, m.Call(func=m.Attribute())):
            attribute = updated_node.func
            if isinstance(attribute.attr, cst.Name):
                attr_name = attribute.attr.value
                if attr_name in self.METHOD_TRANSFORM_MAP:
                    # Swap the method name for its counterpart in the map
                    new_attr = cst.Name(self.METHOD_TRANSFORM_MAP[attr_name])
                    updated_node = updated_node.with_changes(func=attribute.with_changes(attr=new_attr))
                    save_modified(self.context, meta_pos, original_node, updated_node, self.id)
        return updated_node


In [31]:
script = """my_list = [1, 2, 3]
my_list.append(4)
"""
transformers = [NonExistingMethodTransformer]
bugger_example(transformers,script)

original_code
my_list = [1, 2, 3]
my_list.append(4)

tainted_code
my_list = [1, 2, 3]
my_list.update(4)

The result of deep_equals between the concrete syntax tree of the original and the bugged code is False
Checking for bugs...
The following Node has a bug of type NonExistingMethodTransformer-e353 starting at line 2, column 0 and ending at line 2, column 17.
The bug can be fixed by substituting the bugged code-string <my_list.update(4)> with the following code-string <my_list.append(4)>
Debugging...
clean_code
my_list = [1, 2, 3]
my_list.append(4)

Checking if the debugged code is equal to the original code..
The result of deep_equals between the concrete syntax tree of the original and debugged code is True
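
The entries in `METHOD_TRANSFORM_MAP` do not all fail in the same way. As a rough illustration in plain Python (no OpenBugger code involved), the `append` to `update` swap raises immediately on a list, whereas the `extend` to `append` swap still runs and changes the result:

```python
my_list = [1, 2, 3]

# "append" -> "update": lists have no `update` method.
try:
    my_list.update(4)
    append_to_update = None
except AttributeError as exc:
    append_to_update = type(exc).__name__

# "extend" -> "append": the call still succeeds, but the argument is
# added as a single nested element instead of being unpacked.
extended = [1, 2, 3]
extended.extend([4, 5])
appended = [1, 2, 3]
appended.append([4, 5])

print(append_to_update)
print(extended)
print(appended)
```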


### Using wrong type of loop
Using a `for` loop when a `while` loop would be more appropriate, and vice versa.

In [32]:
from typing import Optional
import libcst as cst
from libcst.metadata import PositionProvider
from libcst.codemod import CodemodContext, ContextAwareTransformer
import uuid

class SwapForTransformer(ContextAwareTransformer):
    METADATA_DEPENDENCIES = (PositionProvider,)

    def __init__(self, context: CodemodContext):
        super().__init__(context)
        self.id = f"{self.__class__.__name__}-{uuid.uuid4().hex[:4]}"

    def leave_For(self, original_node: cst.For, updated_node: cst.For) -> cst.BaseStatement:
        # For to While
        meta_pos = self.get_metadata(PositionProvider, original_node)
        already_modified = is_modified(original_node, meta_pos, self.context)

        if not already_modified:
            # Syntax for for loop is 'for target in iterable'
            target = original_node.target
            iterable = original_node.iter
            # While loops are structured 'while condition'
            # The condition calls `next` directly on the iterable and the loop target is dropped,
            # so the resulting loop is intentionally incorrect
            condition = cst.Call(func=cst.Name("next"), args=[cst.Arg(iterable)])
            while_node = cst.While(test=condition, body=updated_node.body)
            save_modified(self.context, meta_pos, original_node, while_node, self.id)
            return while_node

        return updated_node

    def mutate(self, tree: cst.Module, reverse: bool = False) -> cst.Module:
        return self.transform_module(tree)


In [33]:
script = """for i in range(10):
    print(i)"""

transformers = [SwapForTransformer]
bugger_example(transformers,script)

original_code
for i in range(10):
    print(i)
tainted_code
while next(range(10)):
    print(i)
The result of deep_equals between the concrete syntax tree of the original and the bugged code is False
Checking for bugs...
The following Node has a bug of type SwapForTransformer-e301 starting at line 1, column 0 and ending at line 2, column 12.
The bug can be fixed by substituting the bugged code-string <while next(range(10)):
    print(i)
> with the following code-string <for i in range(10):
    print(i)
>
Debugging...
clean_code
for i in range(10):
    print(i)
Checking if the debugged code is equal to the original code..
The result of deep_equals between the concrete syntax tree of the original and debugged code is True
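
The tainted loop is broken in more than one way: the loop variable `i` is never assigned, and `next` is called on a `range` object. The sketch below, in plain Python, shows the second problem and, for comparison, one way a `while` loop equivalent to the original `for` loop could be written (the names `iterator` and `sentinel` are illustrative):

```python
# The tainted condition calls next() directly on a range object, which is
# iterable but not an iterator.
try:
    next(range(10))
    tainted_error = None
except TypeError as exc:
    tainted_error = type(exc).__name__

# For reference, an equivalent while loop needs an explicit iterator and a
# sentinel value to detect exhaustion.
for_result = [i for i in range(10)]

while_result = []
iterator = iter(range(10))
sentinel = object()
item = next(iterator, sentinel)
while item is not sentinel:
    while_result.append(item)
    item = next(iterator, sentinel)

print(tainted_error)
print(while_result == for_result)
```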


## Numpy Bugs

### numpy creation bug
Replaces `np.array` with `np.empty` when creating a NumPy array.

In [34]:
class NumpyArrayCreationTransformer(ContextAwareTransformer):
    METADATA_DEPENDENCIES = (PositionProvider,)

    def __init__(self, context: CodemodContext):
        super().__init__(context)
        self.id = f"{self.__class__.__name__}-{uuid.uuid4().hex[:4]}"

    def mutate(self, tree: cst.Module, reverse: bool = False) -> cst.Module:
        return self.transform_module(tree)

    def leave_Call(self, original_node: cst.Call, updated_node: cst.Call) -> cst.Call:
        meta_pos = self.get_metadata(PositionProvider, original_node)
        already_modified = is_modified(original_node, meta_pos, self.context)
        
        if not already_modified:
            if m.matches(original_node, m.Call(func=m.Attribute(value=m.Name("np"), attr=m.Name("array")))):
                updated_node = original_node.with_changes(func=cst.Attribute(value=cst.Name("np"), attr=cst.Name("empty")))
                save_modified(self.context, meta_pos, original_node, updated_node, self.id)
            elif m.matches(original_node, m.Call(func=m.Attribute(value=m.Name("numpy"), attr=m.Name("array")))):
                updated_node = original_node.with_changes(func=cst.Attribute(value=cst.Name("numpy"), attr=cst.Name("empty")))
                save_modified(self.context, meta_pos, original_node, updated_node, self.id)

        return updated_node


In [35]:
script = """def create_array():
    arr = numpy.array([1, 2, 3])
    return arr"""
transformers = [NumpyArrayCreationTransformer]
bugger_example(transformers,script)

original_code
def create_array():
    arr = numpy.array([1, 2, 3])
    return arr
tainted_code
def create_array():
    arr = numpy.empty([1, 2, 3])
    return arr
The result of deep_equals between the concrete syntax tree of the original and the bugged code is False
Checking for bugs...
The following Node has a bug of type NumpyArrayCreationTransformer-7191 starting at line 2, column 10 and ending at line 2, column 32.
The bug can be fixed by substituting the bugged code-string <numpy.empty([1, 2, 3])> with the following code-string <numpy.array([1, 2, 3])>
Debugging...
clean_code
def create_array():
    arr = numpy.array([1, 2, 3])
    return arr
Checking if the debugged code is equal to the original code..
The result of deep_equals between the concrete syntax tree of the original and debugged code is True
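
The effect of this swap is easy to miss because both calls accept the same argument. A small sketch, assuming NumPy is installed, of how the two calls interpret the list:

```python
import numpy as np

original = np.array([1, 2, 3])  # builds an array from the given values
tainted = np.empty([1, 2, 3])   # interprets the list as a shape

print(original.shape)
print(tainted.shape)
```

The contents of `tainted` are uninitialized, so only its shape is meaningful here.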


### Numpy Sort to argsort 

In [36]:
class NumpyMethodMisuseTransformer(ContextAwareTransformer):
    METADATA_DEPENDENCIES = (PositionProvider,)

    def __init__(self, context: CodemodContext):
        super().__init__(context)
        self.id = f"{self.__class__.__name__}-{uuid.uuid4().hex[:4]}"
    def mutate(self, tree: cst.Module, reverse: bool = False) -> cst.Module:
        return self.transform_module(tree)
    def leave_Call(self, original_node: cst.Call, updated_node: cst.Call) -> cst.Call:
        meta_pos = self.get_metadata(PositionProvider, original_node)
        already_modified = is_modified(original_node, meta_pos, self.context)
        
        if not already_modified:
            if m.matches(original_node, m.Call(func=m.Attribute(value=m.Name("np"), attr=m.Name("sort")))):
                updated_node = original_node.with_changes(func=cst.Attribute(value=cst.Name("np"), attr=cst.Name("argsort")))
                save_modified(self.context, meta_pos, original_node, updated_node, self.id)
            elif m.matches(original_node, m.Call(func=m.Attribute(value=m.Name("numpy"), attr=m.Name("sort")))):
                updated_node = original_node.with_changes(func=cst.Attribute(value=cst.Name("numpy"), attr=cst.Name("argsort")))
                save_modified(self.context, meta_pos, original_node, updated_node, self.id)
        return updated_node


In [37]:
script = """
import numpy as np

def sort_array():
    arr = np.array([3, 1, 2])
    sorted_arr = np.sort(arr)
    return sorted_arr
"""
transformers = [NumpyMethodMisuseTransformer]
bugger_example(transformers,script)

original_code

import numpy as np

def sort_array():
    arr = np.array([3, 1, 2])
    sorted_arr = np.sort(arr)
    return sorted_arr

tainted_code

import numpy as np

def sort_array():
    arr = np.array([3, 1, 2])
    sorted_arr = np.argsort(arr)
    return sorted_arr

The result of deep_equals between the concrete syntax tree of the original and the bugged code is False
Checking for bugs...
The following Node has a bug of type NumpyMethodMisuseTransformer-5f6e starting at line 6, column 17 and ending at line 6, column 32.
The bug can be fixed by substituting the bugged code-string <np.argsort(arr)> with the following code-string <np.sort(arr)>
Debugging...
clean_code

import numpy as np

def sort_array():
    arr = np.array([3, 1, 2])
    sorted_arr = np.sort(arr)
    return sorted_arr

Checking if the debugged code is equal to the original code..
The result of deep_equals between the concrete syntax tree of the original and debugged code is True
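
Both functions return an array of the same length, which is what makes this bug subtle. A short sketch, assuming NumPy is installed, using the same input as the script above:

```python
import numpy as np

arr = np.array([3, 1, 2])

sorted_values = np.sort(arr)    # the values in ascending order
sort_indices = np.argsort(arr)  # the indices that would sort the array

print(sorted_values.tolist())
print(sort_indices.tolist())

# Indexing with the argsort result recovers the sorted values.
recovered = arr[sort_indices]
```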


### WrongReshape

In [38]:
import random

class NumpyReshapeMisuseTransformer(ContextAwareTransformer):
    METADATA_DEPENDENCIES = (PositionProvider,)

    def __init__(self, context: CodemodContext):
        super().__init__(context)
        self.id = f"{self.__class__.__name__}-{uuid.uuid4().hex[:4]}"

    def mutate(self, tree: cst.Module, reverse: bool = False) -> cst.Module:
        return self.transform_module(tree)

    def leave_Call(self, original_node: cst.Call, updated_node: cst.Call) -> cst.Call:
        meta_pos = self.get_metadata(PositionProvider, original_node)
        already_modified = is_modified(original_node, meta_pos, self.context)

        # Matching a Call node for reshape function with a list as the first argument
        reshape_matcher = m.Call(
            func=m.Attribute(value=m.Name(), attr=m.Name("reshape")),
            args=[m.AtLeastN(n=1, matcher=m.Arg(m.List()))]
        )

        if not already_modified and m.matches(original_node, reshape_matcher):
            first_arg = updated_node.args[0].value
            elements = first_arg.elements
            first_elem = elements[0].value
            if isinstance(first_elem, cst.Integer):
                first_elem = int(elements[0].value.value)  # Convert libcst.Integer to Python int
                random_increment = random.randint(1, len(elements))  # Generate a random number
                new_first_elem = cst.Element(cst.Integer(str(first_elem + random_increment)))  
                new_elements = [new_first_elem] + list(elements[1:])
                # Replace only the shape argument and keep any remaining arguments
                updated_node = updated_node.with_changes(args=[cst.Arg(cst.List(new_elements))] + list(updated_node.args[1:]))
                save_modified(self.context, meta_pos, original_node, updated_node, self.id)

        return updated_node


In [39]:
script = """def reshape_array():
    arr = np.array([[1, 2, 3], [4, 5, 6]])
    reshaped_arr = arr.reshape([2, 3])
    return reshaped_arr"""
transformers = [NumpyReshapeMisuseTransformer]
bugger_example(transformers,script)

original_code
def reshape_array():
    arr = np.array([[1, 2, 3], [4, 5, 6]])
    reshaped_arr = arr.reshape([2, 3])
    return reshaped_arr
tainted_code
def reshape_array():
    arr = np.array([[1, 2, 3], [4, 5, 6]])
    reshaped_arr = arr.reshape([4, 3])
    return reshaped_arr
The result of deep_equals between the concrete syntax tree of the original and the bugged code is False
Checking for bugs...
The following Node has a bug of type NumpyReshapeMisuseTransformer-e2e2 starting at line 3, column 19 and ending at line 3, column 38.
The bug can be fixed by substituting the bugged code-string <arr.reshape([4, 3])> with the following code-string <arr.reshape([2, 3])>
Debugging...
clean_code
def reshape_array():
    arr = np.array([[1, 2, 3], [4, 5, 6]])
    reshaped_arr = arr.reshape([2, 3])
    return reshaped_arr
Checking if the debugged code is equal to the original code..
The result of deep_equals between the concrete syntax tree of the original and debugged code is True
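
Increasing the first dimension changes the number of elements the target shape requires, so the tainted call fails when executed. A minimal sketch, assuming NumPy is installed, with the shapes from the run above:

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

reshaped = arr.reshape([2, 3])  # 2 * 3 matches the 6 available elements

try:
    arr.reshape([4, 3])  # 4 * 3 = 12 does not match 6 elements
    reshape_error = None
except ValueError as exc:
    reshape_error = type(exc).__name__

print(reshaped.shape)
print(reshape_error)
```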


### numpy arange

In [40]:
class NumpyArangeMisuseTransformer(ContextAwareTransformer):
    METADATA_DEPENDENCIES = (PositionProvider,)

    def __init__(self, context: CodemodContext):
        super().__init__(context)
        self.id = f"{self.__class__.__name__}-{uuid.uuid4().hex[:4]}"

    def mutate(self, tree: cst.Module, reverse: bool = False) -> cst.Module:
        return self.transform_module(tree)

    def leave_Call(self, original_node: cst.Call, updated_node: cst.Call) -> cst.Call:
        meta_pos = self.get_metadata(PositionProvider, original_node)
        already_modified = is_modified(original_node, meta_pos, self.context)

        arange_matcher = m.Call(
            func=m.Attribute(value=m.Name(), attr=m.Name("arange")),
            args=[m.AtLeastN(n=1)]
        )

        if not already_modified and m.matches(original_node, arange_matcher):
            first_arg = updated_node.args[0].value
            if isinstance(first_arg, cst.Integer):
                stop_val = int(first_arg.value)
                decimal_increment = round(random.uniform(0.1, 0.9), 1)  # Generate a random decimal number
                new_stop_val = cst.Arg(cst.Float(str(stop_val + decimal_increment)))  
                updated_node = updated_node.with_changes(args=[new_stop_val] + list(updated_node.args[1:]))
                save_modified(self.context, meta_pos, original_node, updated_node, self.id)
        return updated_node

In [41]:
script= """data = np.arange(10)"""
transformers = [NumpyArangeMisuseTransformer]
bugger_example(transformers,script)

original_code
data = np.arange(10)
tainted_code
data = np.arange(10.6)
The result of deep_equals between the concrete syntax tree of the original and the bugged code is False
Checking for bugs...
The following Node has a bug of type NumpyArangeMisuseTransformer-bb89 starting at line 1, column 7 and ending at line 1, column 22.
The bug can be fixed by substituting the bugged code-string <np.arange(10.6)> with the following code-string <np.arange(10)>
Debugging...
clean_code
data = np.arange(10)
Checking if the debugged code is equal to the original code..
The result of deep_equals between the concrete syntax tree of the original and debugged code is True
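
A fractional stop value changes both the length and the dtype of the result. A small sketch, assuming NumPy is installed, using the values from the run above:

```python
import numpy as np

original = np.arange(10)
tainted = np.arange(10.6)

print(len(original), original.dtype.kind)  # integer values 0..9
print(len(tainted), tainted.dtype.kind)    # float values 0.0..10.0
```

The extra element is the kind of off-by-one difference that tends to surface later, for example as a shape mismatch.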


### Numpy axis misuses

In [42]:
class NumpyAxisMisuseTransformer(ContextAwareTransformer):
    METADATA_DEPENDENCIES = (PositionProvider,)

    def __init__(self, context: CodemodContext):
        super().__init__(context)
        self.id = f"{self.__class__.__name__}-{uuid.uuid4().hex[:4]}"

    def mutate(self, tree: cst.Module, reverse: bool = False) -> cst.Module:
        return self.transform_module(tree)

    def leave_Call(self, original_node: cst.Call, updated_node: cst.Call) -> cst.Call:
        meta_pos = self.get_metadata(PositionProvider, original_node)
        already_modified = is_modified(original_node, meta_pos, self.context)

        # Define the matchers for the numpy functions that we want to target
        func_matchers = [
            m.Attribute(value=m.Name(), attr=m.Name(func_name)) 
            for func_name in ["sum", "mean", "min", "max", "std", "var"]
        ]

        axis_arg_matcher = m.Arg(keyword=m.Name("axis"), value=m.Integer())
        
        if not already_modified and any(m.matches(original_node, m.Call(func=func_matcher)) for func_matcher in func_matchers):
            new_args = []
            changed = False  # track edits explicitly instead of comparing a list with updated_node.args
            for arg in updated_node.args:
                # If the argument matches the "axis" keyword argument with an integer value
                if m.matches(arg, axis_arg_matcher):
                    axis_val = int(arg.value.value)
                    new_arg = cst.Arg(keyword=cst.Name("axis"), value=cst.Integer(str((axis_val + 1) % 2)))
                    new_args.append(new_arg)
                    changed = True
                else:
                    new_args.append(arg)

            if changed:
                updated_node = updated_node.with_changes(args=new_args)
                save_modified(self.context, meta_pos, original_node, updated_node, self.id)

        return updated_node


In [43]:
script = """data = numpy.array([[1, 2, 3], [4, 5, 6]])
sum_along_axis = numpy.sum(data, axis=16)"""
transformers = [NumpyAxisMisuseTransformer]
bugger_example(transformers,script)

original_code
data = numpy.array([[1, 2, 3], [4, 5, 6]])
sum_along_axis = numpy.sum(data, axis=16)
tainted_code
data = numpy.array([[1, 2, 3], [4, 5, 6]])
sum_along_axis = numpy.sum(data, axis = 1)
The result of deep_equals between the concrete syntax tree of the original and the bugged code is False
Checking for bugs...
The following Node has a bug of type NumpyAxisMisuseTransformer-80ca starting at line 2, column 17 and ending at line 2, column 42.
The bug can be fixed by substituting the bugged code-string <numpy.sum(data, axis = 1)> with the following code-string <numpy.sum(data, axis=16)>
Debugging...
clean_code
data = numpy.array([[1, 2, 3], [4, 5, 6]])
sum_along_axis = numpy.sum(data, axis=16)
Checking if the debugged code is equal to the original code..
The result of deep_equals between the concrete syntax tree of the original and debugged code is True
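
For a 2-D array, flipping the axis between 0 and 1 changes which dimension is reduced. The sketch below, assuming NumPy is installed, shows both results for the array used above. It also shows that `axis=16`, as written in the example script, is out of range for a 2-D array, so that script would raise if it were executed:

```python
import numpy as np

data = np.array([[1, 2, 3], [4, 5, 6]])

column_sums = np.sum(data, axis=0)  # reduce over rows
row_sums = np.sum(data, axis=1)     # reduce over columns

try:
    np.sum(data, axis=16)
    axis_out_of_range = False
except ValueError:
    # NumPy's AxisError is a subclass of ValueError
    axis_out_of_range = True

print(column_sums.tolist())
print(row_sums.tolist())
print(axis_out_of_range)
```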


## Chaining Multiple Bugs

Bugs can be chained in a sequence and are applied in order. A bug does not modify a node that has already been modified, nor a node whose parent or children have been modified. As an example, we apply InfiniteWhileTransformer, OffByKIndexTransformer and ComparisonTargetTransfomer to the script:

while x == 1 + 2 == 3 + 2 != 3 + 4 > 3: \n  y = 1 + 2\nx[1:2]\nx[1]\nx[1]==3

Because InfiniteWhileTransformer runs first and rewrites the whole while statement, ComparisonTargetTransfomer and OffByKIndexTransformer only modify nodes outside of it.
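
The non-overlap rule can be pictured as an interval check on source positions. The sketch below is only an illustration: `Span`, `overlaps` and `can_modify` are hypothetical names, and it is an assumption that comparing (line, column) ranges approximates what the package's `is_modified` helper does internally.

```python
from typing import List, NamedTuple, Tuple


class Span(NamedTuple):
    start: Tuple[int, int]  # (line, column) where the node begins
    end: Tuple[int, int]    # (line, column) where the node ends


def overlaps(a: Span, b: Span) -> bool:
    # Two ranges overlap when each one starts before the other ends;
    # this also covers the parent/child (containment) case.
    return a.start < b.end and b.start < a.end


def can_modify(candidate: Span, modified: List[Span]) -> bool:
    # A node may be mutated only if it overlaps no previously mutated range.
    return not any(overlaps(candidate, span) for span in modified)


# Hypothetical record of an earlier mutation: a while statement that spans
# line 1, column 0 to line 2, column 11.
modified = [Span((1, 0), (2, 11))]

inside_while = Span((1, 6), (1, 12))  # e.g. a comparison in the while test
outside_while = Span((3, 2), (3, 5))  # e.g. a slice on line 3

print(can_modify(inside_while, modified))
print(can_modify(outside_while, modified))
```

Tuples compare lexicographically, so `(line, column)` pairs can be ordered directly without extra helper code.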

In [44]:
transformers = [InfiniteWhileTransformer,gen_OffByKIndexTransformer(1),gen_ComparisonTargetTransfomer("==",">=")]
# Get the script as a string
script = "while x == 1 + 2 == 3 + 2 != 3 + 4 > 3: \n  y = 1 + 2\nx[1:2]\nx[1]\nx[1]==3"
bugger_example(transformers,script)

original_code
while x == 1 + 2 == 3 + 2 != 3 + 4 > 3: 
  y = 1 + 2
x[1:2]
x[1]
x[1]==3
tainted_code
while True: 
  y = 1 + 2
x[2:3]
x[2]
x[2] >= 3
The result of deep_equals between the concrete syntax tree of the original and the bugged code is False
Checking for bugs...
The following Node has a bug of type InfiniteWhileTransformer-577c starting at line 1, column 0 and ending at line 2, column 11.
The bug can be fixed by substituting the bugged code-string <while True: 
  y = 1 + 2
> with the following code-string <while x == 1 + 2 == 3 + 2 != 3 + 4 > 3: 
  y = 1 + 2
>
The following Node has a bug of type OffByKIndexTransformer-1193 starting at line 3, column 2 and ending at line 3, column 5.
The bug can be fixed by substituting the bugged code-string <2:3> with the following code-string <1:2>
The following Node has a bug of type OffByKIndexTransformer-1193 starting at line 4, column 2 and ending at line 4, column 3.
The bug can be fixed by substituting the bugged code-string <2> with 