# README

*   Execute cell 1 to install the asts library and import it. All other cells depend on cell 1.
*   Upload the **code-samples** zip folder and use cell 2 to unzip its contents.
*   Upload the resolved labels CSV file.
*   Create a folder named **mutated-data** to hold the mutated code files.
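
Before running the later cells, it can help to confirm that the uploaded CSV has the columns the processing script reads (`FileName`, `LineStart`, `CharStart`, `CharEnd`, `Code`, `LabelValue`). The following is a minimal sketch only; the helper name `missing_label_columns` is hypothetical and not part of the notebook.

```python
import csv

# Columns that the "Process resolved Labels CSV" script reads from each row.
REQUIRED_COLUMNS = ["FileName", "LineStart", "CharStart", "CharEnd", "Code", "LabelValue"]

def missing_label_columns(csv_path):
    """Return the required columns that are absent from the CSV header."""
    # utf-8-sig drops a leading byte-order mark, matching how the notebook opens the file.
    with open(csv_path, "r", encoding="utf-8-sig", newline="") as csv_file:
        header = csv.DictReader(csv_file).fieldnames or []
    return [name for name in REQUIRED_COLUMNS if name not in header]
```

An empty list means every expected column is present, for example when calling `missing_label_columns("resolved_value_based_labels.csv")` on a well-formed file.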

To get mutated data for specific label values, execute the first three scripts below.  
[1. GetMutation method](#scrollTo=FO_zo_t1CWKv&line=1&uniqifier=1)

[2. Process resolved Labels CSV and get mutated code](#scrollTo=myUtvjQQCTzl&line=1&uniqifier=1)

[3. Create mutated CSV files](#scrollTo=7IWTa1EwfYjI)

For the second script, we need to do three things:
* Upload one of the two resolved labels CSV files (*resolved_range_based_labels.csv* or *resolved_value_based_labels.csv*).
* Set the **process_csv_file_name** variable to the name of the uploaded resolved labels CSV file.
* Uncomment the code for the case that matches the type of mutation we want to perform, and set the **value** variable for that case. Cases 1, 2 and 3 are value-based mutations. Cases 4 and 5 are range-based mutations.
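
How the four case variables select label values can be condensed into one function. This is a sketch that mirrors the branching inside `process_csv`; the name `label_matches` is hypothetical and does not exist in the notebook.

```python
def label_matches(labelvalue, value, lower_range_wise, upper_range_wise, concrete):
    """Mirror of the selection rule applied by process_csv to a single label value."""
    label = float(labelvalue)
    if lower_range_wise == upper_range_wise:
        # Both flags False (case 1) or both True: exact match only.
        return label == value
    if lower_range_wise:
        # Lower side: the value and one step below it, or everything at or below it.
        return label in (value, value - 1) if concrete else label <= value
    # Upper side: the value and one step above it, or everything at or above it.
    return label in (value, value + 1) if concrete else label >= value
```

With the case 2 settings (`value = 1`, `upper_range_wise = True`, `concrete = True`), labels 1 and 2 are selected but 1.5 is not. With the case 5 settings (`value = -1.5`, `lower_range_wise = True`, `concrete = False`), label -2 is selected but -1 is not.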

For the third script, we must set the **mutated_data_folder** variable to the name of the **mutated-data** folder created earlier, which is where the mutated files are stored.
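
The third script opens its output files inside this folder, so the folder has to exist before the script runs. A minimal sketch for creating it from a cell, assuming the folder is called `mutated-data/` (substitute your own name if it differs):

```python
import os

mutated_data_folder = "mutated-data/"  # assumed folder name; use your own if it differs

# exist_ok=True makes the call safe to repeat when the cell is re-run.
os.makedirs(mutated_data_folder, exist_ok=True)
```

The trailing slash matters because the script builds each output path by concatenating the folder name and the file name.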

After executing the three scripts, we can manually download the log files: the mutation error report files and the success report file (for example *muterrorreport.csv* and *successfulmutationstatusreport.csv*).

The fourth script copies the mutated data files to Google Drive for download. Set the **drive_folder** variable to your personal Google Drive folder.

[4. Copy mutated data to google drive](#scrollTo=iWY7llGDv1AV&line=1&uniqifier=1)
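
The fourth script itself is not reproduced in this section, so the following is only an illustrative sketch of such a copy step, not the notebook's code. The function name `copy_mutated_data` and the example paths are assumptions.

```python
import shutil

def copy_mutated_data(source_folder, drive_folder):
    """Copy every file under source_folder into drive_folder and return the destination path."""
    # dirs_exist_ok=True (Python 3.8+) lets the copy be repeated into an existing folder.
    return shutil.copytree(source_folder, drive_folder, dirs_exist_ok=True)

# In Colab, Drive is typically mounted first, for example:
#   from google.colab import drive
#   drive.mount('/content/drive')
# after which drive_folder could be something like '/content/drive/MyDrive/mutated-data'.
```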

**Install asts library and import**

In [1]:
! pip3 install asts
import asts


Collecting asts
  Downloading asts-0.9.7-py3-none-manylinux_2_24_x86_64.whl (35.7 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 35.7/35.7 MB 11.0 MB/s eta 0:00:00
Collecting backports.cached-property (from asts)
  Downloading backports.cached_property-1.0.2-py3-none-any.whl (6.1 kB)
Installing collected packages: backports.cached-property, asts
Successfully installed asts-0.9.7 backports.cached-property-1.0.2


In [2]:
!unzip code-samples.zip

Archive:  code-samples.zip
   creating: code-samples/.vscode/
  inflating: code-samples/.vscode/settings.json  
  inflating: code-samples/Clone0.java  
  inflating: code-samples/Clone105.java  
  inflating: code-samples/Clone107.java  
  inflating: code-samples/Clone109.java  
  inflating: code-samples/Clone12.java  
  inflating: code-samples/Clone125.java  
  inflating: code-samples/Clone127.java  
  inflating: code-samples/Clone13.java  
  inflating: code-samples/Clone131.java  
  inflating: code-samples/Clone132.java  
  inflating: code-samples/Clone134.java  
  inflating: code-samples/Clone136.java  
  inflating: code-samples/Clone138.java  
  inflating: code-samples/Clone142.java  
  inflating: code-samples/Clone154.java  
  inflating: code-samples/Clone159.java  
  inflating: code-samples/Clone16.java  
  inflating: code-samples/Clone162.java  
  inflating: code-samples/Clone165.java  
  inflating: code-samples/Clone168.java  
  inflating: code-samples/Clone172.java  
  inflating

# Get Mutated Code (MethodDef)

In [3]:
import asts  # installed and first imported in cell 1
import csv
def add_escape_before_double_quotes(input_string):
    return input_string.replace('"', '\\"')

def get_substrings(original_string, positions):
    substrings = []

    for start, end in sorted(positions, key=lambda x: x[0]):  # Sort positions by start
        start = start - 2
        end = end -2
        if start < 0 or end < start or end > len(original_string):
            # Handle invalid input for each position
            continue

        substring = original_string[start:end]
        substrings.append(substring)

    return substrings


def get_closest_substrings(source_ranges, positions):

        closest_node = None
        closest_distance = float('inf')  # Initialize with a large value
        closest_nodes = []

        for start, end in positions:
            closest_node = None
            closest_distance = float('inf')  # Initialize with a large value
            closest_start_distance = float('inf')
            closest_end_distance = float('inf')

            for node, (node_start, node_end) in source_ranges:
              start_distance = abs(start - node_start[1])
              end_distance = abs(end - node_end[1])
              # print(start_distance)
              # print(end_distance)

              if (start_distance + end_distance)  <= (closest_start_distance + closest_end_distance):
                  closest_start_distance = start_distance
                  closest_end_distance = end_distance

                  closest_node = node
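            # closest_nodes holds source-text strings, so compare against the node's text
            if closest_node is not None and closest_node.source_text not in closest_nodes:
                  closest_nodes.append(closest_node.source_text)

        # Return only after every position has been processed
        return closest_nodes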



def remove_substrings(original_string, substrings):
    result = original_string

    for substring in substrings:
        result = result.replace(substring, '', 1)  # Remove the first occurrence of the substring

    return result

def getMutatedCode(filename, linenum, originalcode, positions):

    muterror_filename = 'muterrorreport.csv'
    successmutation_filename = 'successfulmutationstatusreport.csv'

    with open(muterror_filename,'a') as error_file, open(successmutation_filename, 'a') as success_file:
        csvwriter = csv.writer(error_file)
        csvwritersuccess = csv.writer(success_file)

        # Determine the number of spaces at the beginning of the original code
        leading_spaces = len(originalcode) - len(originalcode.lstrip())

        # Filter positions based on end value and adjust start and end positions
        positions = [
            (max(1, start - leading_spaces + 1), end - leading_spaces + 1)
            for start, end in positions if end > leading_spaces
        ]
        print(positions)

        # print('original code:', originalcode)
        try:
            root = asts.AST.from_string(
                originalcode,
                language=asts.ASTLanguage.Java,
                    deepest= True
            )
            source_ranges = root.ast_source_ranges()
            print(source_ranges)

            for node, (node_start, node_end) in source_ranges:
                source_text = originalcode[node_start[0] - 1:node_end[0]]
                print(f"Source Text: {node.source_text}, Start: {node_start[1]}, End: {node_end[1]}")
        except Exception:
          print('issue with ast creation from input string')
          # `node` may not be bound yet if parsing itself failed, so log an empty placeholder
          error_items = ['issue with ast creation from input string', '', '', filename, linenum, '']
          all_errors = ', '.join(str(item) for item in error_items)
          csvwriter.writerow([all_errors])
          return 'error'

          # csvwriter.writerow('issue with ast creation from input string')
        try:
            for a in root.traverse():
                if isinstance(a, asts.types.JavaErrorTree) or isinstance(a, asts.types.JavaErrorVariationPointTree):
                    # founderrornode = True
                    print('original code has errortree')
                    # make special cut
                    # Get specified substrings
                    substrings_to_remove = get_closest_substrings(source_ranges, positions)
                    # Remove specified substrings
                    specialmutation = remove_substrings(root.source_text, substrings_to_remove)
                    success_items = ['special replace success', '-','-', filename, linenum, specialmutation]
                    all_success = ', '.join(str(item) for item in success_items)
                    csvwritersuccess.writerow([all_success])
                    # return mutation
                    print('specialmutation',specialmutation)
                    return specialmutation

        except Exception as e:
            print('issue with special mutation', e)

            error_items = ['issue with special mutation', node, '', filename, linenum, e]
            all_errors = ', '.join(str(item) for item in error_items)
            # Write the row; the original built the message but never recorded it
            csvwriter.writerow([all_errors])
            return 'error'

        # print(source_ranges)
        closest_node = None
        closest_distance = float('inf')  # Initialize with a large value
        closest_nodes = []

        for start, end in positions:
            closest_node = None
            closest_distance = float('inf')  # Initialize with a large value
            closest_start_distance = float('inf')
            closest_end_distance = float('inf')

            for node, (node_start, node_end) in source_ranges:
              start_distance = abs(start - node_start[1])
              end_distance = abs(end - node_end[1])
              # print(start_distance)
              # print(end_distance)

              if (start_distance + end_distance)  <= (closest_start_distance + closest_end_distance):
                  closest_start_distance = start_distance
                  closest_end_distance = end_distance

                  closest_node = node
            if closest_node is not None and closest_node not in closest_nodes:
                  closest_nodes.append(closest_node)
            # print(f"Closest node for start {start} and end {end}: {closest_node.source_text}")


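        # print("Array of closest nodes:", closest_nodes)
        mutationcounter = 0
        # Defaults so the final report does not raise NameError when no node was selected
        node = None
        itsparent = None
        label_Selection = ''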
        for node in closest_nodes:
          print(node.ast_at_point)
          label_Selection = node.source_text
          print('labelselection', label_Selection)

          if node != root:
            itsparent = node.parent(root)
            try:
                newroot = asts.AST.cut(root, node)
                print('modified code:', newroot.source_text)
                # print('cut ok')
                root = newroot
                success_items = ['node cut success', node, itsparent, filename, linenum, label_Selection]
                all_success = ', '.join(str(item) for item in success_items)
                csvwritersuccess.writerow([all_success])
                mutationcounter = mutationcounter + 1
                continue

            except Exception:
                print('cannot cut node')
                error_items = ['cannot cut node', node, itsparent, filename, linenum, label_Selection]
                all_errors = ', '.join(str(item) for item in error_items)
                # Record the failure, then fall through to the fallback mutations below
                csvwriter.writerow([all_errors])
            # try:
            #         newroot = asts.AST.replace(root, node,'')
            #         # print('modified code:', newroot.source_text)
            #         print('node replaced with space')
            #         root = newroot
            #         return newroot.source_text
            # except:


                      # print('cannot replace node with space')
            itsparent = node.parent(root)
                      # print('its parent: ',itsparent.source_text, itsparent.ast_source_ranges()[0][0])
                      # check if node is a return statement
            if isinstance(node, asts.types.JavaReturnStatement) or isinstance(node, asts.types.JavaReturnStatement0) or isinstance(node, asts.types.JavaReturnStatement1):
                        #remove the return variable from ast
                try:
                  returnstmtidentifier = node.children[0]
                        # print(returnstmtidentifier)
                  newroot = asts.AST.cut(root, returnstmtidentifier)
                        # print('modified code:', newroot.source_text)
                  root = newroot
                  success_items = ['return stmt mutation success', node, itsparent, filename, linenum, label_Selection]
                  all_success = ', '.join(str(item) for item in success_items)
                  csvwritersuccess.writerow([all_success])
                  mutationcounter = mutationcounter + 1
                  continue

                except Exception:
                    error_items = ['cannot mutate return statement', node, itsparent, filename, linenum, label_Selection]
                    all_errors = ', '.join(str(item) for item in error_items)
                    # Record the failure instead of discarding the message
                    csvwriter.writerow([all_errors])



                      # check the parent is a binary expression
            elif isinstance(itsparent, asts.types.JavaBinaryExpression):
                   try:
                        rhs = itsparent.children[-1]
                        lhs = itsparent.children[-3]
                        # print(lhs.source_text)
                        # print()
                        if (label_Selection in lhs.source_text) or (label_Selection == lhs.source_text):
                          newroot = asts.AST.replace(root, itsparent, rhs)
                          # print('modified code:', newroot.source_text)
                          root = newroot

                        else:
                          newroot = asts.AST.replace(root, itsparent, lhs)
                          # print('modified code:', newroot.source_text)
                          root = newroot
                        success_items = ['parent binary expression mutation success', node, itsparent, filename, linenum, label_Selection]
                        all_success = ', '.join(str(item) for item in success_items)
                        csvwritersuccess.writerow([all_success])
                        mutationcounter = mutationcounter + 1
                        continue
                   except:
                        error_items = ['cannot mutate parent binary expr',node, itsparent, filename, linenum, label_Selection]
                        all_errors = ', '.join(str(item) for item in error_items)
                        csvwriter = csv.writer(error_file)

            elif isinstance(itsparent, asts.types.JavaArgumentList) or isinstance(itsparent, asts.types.JavaArgumentList0) or isinstance(itsparent, asts.types.JavaArgumentList1):
                          # print('Ifstatement')
                          # replace with empty string
                    try:
                          for child in itsparent.children:
                              newroot = asts.AST.cut(root, child)
                              # print('modified code:', newroot.source_text)
                              root = newroot
                          success_items = ['arg list mutation success', node, itsparent, filename, linenum, label_Selection]
                          all_success = ', '.join(str(item) for item in success_items)
                          csvwritersuccess.writerow([all_success])
                          mutationcounter = mutationcounter + 1
                          continue
                    except:
                        error_items = ['cannot mutate arg list',node, itsparent, filename, linenum, label_Selection]
                        all_errors = ', '.join(str(item) for item in error_items)
                        csvwriter = csv.writer(error_file)

            # elif isinstance(node, asts.types.JavaTypeIdentifier):
            #               newroot = asts.AST.replace(root, node,'')
            #               root = newroot

            else:

                          print('Node is not processed:',node, itsparent, filename, linenum, label_Selection )
                          # with open(muterror_filename,'a') as error_file:
                          error_items = ['no mutation worked',node, itsparent, filename, linenum, label_Selection]
                          all_errors = ', '.join(str(item) for item in error_items)
                          csvwriter = csv.writer(error_file)
                          csvwriter.writerow([all_errors])
                          return 'remove'

                          # replace with empty string
                          try:
                              if root != itsparent:

                                newroot = asts.AST.replace(root, itsparent,'')

                                print(newroot.ast_source_ranges())
                                print('modified code:', newroot.source_text)
                                root = newroot
                          except Exception as e:
                              print("Error while processing AST:")
                              with open(muterror_filename,'a') as error_file:
                                  error_items = ['parent replacement failed',node, itsparent, filename, linenum, label_Selection]
                                  all_errors = ', '.join(str(item) for item in error_items)
                                  csvwriter = csv.writer(error_file)
                                  csvwriter.writerow([all_errors])
                              return 'asterror'


          else:
            print ('root and node are the same')
            with open(muterror_filename,'a') as error_file:
                        error_items = ['closest node is root',node, '-', filename, linenum, label_Selection]
                        all_errors = ', '.join(str(item) for item in error_items)
                        csvwriter = csv.writer(error_file)
                        csvwriter.writerow([all_errors])
            return 'remove'


        newroot = root
        try:
            print(newroot.source_text)
            success_items = ['mutation success', node, itsparent, filename, linenum, label_Selection]
            all_success = ', '.join(str(item) for item in success_items)
            csvwritersuccess.writerow([all_success])
            return newroot.source_text
        except:
              # with open(muterror_filename,'a') as error_file:
              error_items = ['invalid AST after mutation',node, '-', filename, linenum, label_Selection]
              all_errors = ', '.join(str(item) for item in error_items)
              csvwriter = csv.writer(error_file)
              csvwriter.writerow([all_errors])
              return 'remove'

# Example usage
originalcode = "    if (s == null || size <= s.length ()) return s;"
positions = [(5, 18)]  # List of input start and end positions

# originalcode = "    int before = (len - text.length ()) / 2;"
# positions = [(1, 4),(4,44)]  # List of input start and end positions

# originalcode = "    try(File in = new File ())"
# positions = [(5,8)]

# print("Adjusted positions:", positions)

# mutated_code = getMutatedCode('DummyFile.java',1,originalcode, positions)
# print('Mutated Code:', mutated_code)
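
The position adjustment at the start of `getMutatedCode` can be tried in isolation. The sketch below copies that arithmetic into a standalone helper; the name `adjust_positions` is hypothetical and is not defined in the notebook.

```python
def adjust_positions(originalcode, positions):
    """Shift character positions by the line's leading whitespace, as getMutatedCode does."""
    leading_spaces = len(originalcode) - len(originalcode.lstrip())
    return [
        (max(1, start - leading_spaces + 1), end - leading_spaces + 1)
        for start, end in positions
        if end > leading_spaces  # drop selections that end inside the indentation
    ]

print(adjust_positions("    if (s == null || size <= s.length ()) return s;", [(5, 18)]))
# prints [(2, 15)]
```

The example line has four leading spaces, so the selection `(5, 18)` becomes `(2, 15)`.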


# Process Resolved Labels CSV and get mutated code


---



In [8]:
process_csv_file_name = "lxresolution_280_authorlabels.csv"

import csv
import linecache
# import asts

class LineData:
    def __init__(self, positions, codes, labelvalues):
        # self.mutop = mutop
        # self.modifiedcode = modifiedcode
        self.positions = positions
        self.codes = codes
        self.labelvalues = labelvalues

class MutatedLineData:
    def __init__(self, mutatedcode,mutop):

        self.mutatedcode = mutatedcode
        self.mutop = mutop

def find_lonely_brackets(java_code):
    stack = []
    unmatched_brackets = []

    # Define bracket pairs
    bracket_pairs = {')': '(', '}': '{', ']': '['}

    for i, char in enumerate(java_code):
        if char in '([{':
            stack.append((char, i))
        elif char in ')]}':
            if not stack:
                unmatched_brackets.append((char, i))
            else:
                top_char, _ = stack.pop()
                if top_char != bracket_pairs[char]:
                    unmatched_brackets.append((top_char, i))

    # Any remaining characters in the stack are unmatched
    while stack:
        char, i = stack.pop()
        unmatched_brackets.append((char, i))

    return unmatched_brackets

def is_whole_line_selected(code, start, end):
    # selected_text = code[start-1:end+1]
    # return selected_text == code and start==1
    trimmedcode = code.lstrip().rstrip("\n")
    labeledcode = code[start:end]
    # print('labeled code', code, labeledcode)
    trimmedlabelcode = labeledcode.lstrip()
    # print(trimmedcode == trimmedlabelcode)
    return trimmedcode == trimmedlabelcode

def process_csv(filename, value, lower_range_wise, upper_range_wise, concrete ):
    with open(filename, 'r', encoding="utf-8-sig") as csv_file:
        csv_reader = csv.DictReader(csv_file)
        current_filename = None
        current_line = None
        char_positions_for_line = []
        positions = []
        line_data_dict = {}

        for row in csv_reader:
            print(row)
            filename = row['FileName']
            line_start = int(row['LineStart'])
            char_start = int(row['CharStart'])
            char_end = int(row['CharEnd'])
            code = row['Code']
            labelvalue = row['LabelValue']

            condition = False
            if lower_range_wise == False and upper_range_wise == False:
                if float(labelvalue) == value:
                  condition = True
                else:
                  condition = False
            elif lower_range_wise == True and upper_range_wise == False:
              if concrete == False:
                if float(labelvalue) <= value:
                  condition = True
                else:
                  condition = False
              else:
                if float(labelvalue) == value or float(labelvalue) == value-1:
                  condition = True
                else:
                  condition = False
            elif lower_range_wise == False and upper_range_wise == True :
              if concrete == False:
                if float(labelvalue) >= value:
                  condition = True
                else:
                  condition = False
              else:
                 if float(labelvalue) == value or float(labelvalue) == value+1:
                  condition = True
                 else:
                  condition = False

            elif lower_range_wise == True and upper_range_wise == True:
                if float(labelvalue) == value:
                  condition = True
                else:
                  condition = False

            if condition:

                char_positions_for_line = []
                positions = []
                codes = []
                labelvalues = []
                positions.append(char_start)
                positions.append(char_end)
                line_data = line_data_dict.get((filename, line_start))
                if line_data:
                    line_data.positions.append(positions)
                    line_data.codes.append(code)
                    line_data.labelvalues.append(labelvalue)
                else:
                    char_positions_for_line.append(positions)
                    codes.append(code)
                    labelvalues.append(labelvalue)
                    line_data_dict[(filename, line_start)] = LineData(char_positions_for_line, codes, labelvalues)


    return line_data_dict


# Read the resolved labels data, perform the mutations, and collect the line-wise mutations


# line_data_dict = process_csv('resolvedauthorlabels.csv',2.0)

# case 1: Value-based mutation. Remove core sim (2) or core diff (-2). Removes lines with label value set to value = 2 or -2
value = -2
lower_range_wise = False
upper_range_wise = False
concrete = False  # don't care: ignored when both range flags are False


# case 2: Value-based mutation. Remove any non-core sim (1) or core sim (2). Removes lines with label value = 1 and one value above it e.g. if value is 1, will consider value 2 ones for removal also.
# value = 1
# lower_range_wise = False
# upper_range_wise = True
# concrete = True

# case 3: Value-based mutation. Remove any non-core diff (-1) or core diff (-2). Removes lines with label value = -1 and one value below it e.g. if value is -1, will consider value -2 ones for removal also.
# value = -1
# lower_range_wise = True
# upper_range_wise = False
# concrete = True


# case 4: Range-based mutation. Remove core sim if value=1.5 or any sim if value=0.5. Removes lines with label value and all values above it e.g. if value is 1.5, will consider value 1.6 etc. for removal also till 2.
# value can be 0.5 or 1.5 to select all core and noncore similarities or just core similarities respectively
# value = 0.5
# lower_range_wise = False
# upper_range_wise = True
# concrete = False

# case 5: Range-based mutation. Remove core diff if value= -1.5 or any diff if value= -0.5. Removes lines with input label value and all values below it e.g. if value is -0.5, will consider value -1.5 etc. for removal also till -2
# value can be -0.5 or -1.5 to select all core and noncore differences or just core differences respectively
# value = -1.5
# lower_range_wise = True
# upper_range_wise = False
# concrete = False

# line_data_dict = process_csv('resolved_280_authorlabels.csv',value, lower_range_wise, upper_range_wise, concrete)
line_data_dict = process_csv(process_csv_file_name, value, lower_range_wise, upper_range_wise, concrete)
print(line_data_dict)
# for (filename, line_start), line_data in line_data_dict.items():
    # print(f"Filename: {filename}, Line: {line_start}")
    # print(f"Positions: {line_data.positions} codes: {line_data.codes} labelvalues: {line_data.labelvalues}")
    # print()  # Print a blank line between entries

# for each line get mutated code
mutated_line_data_dict = {}

for (filename, line_start), line_data in line_data_dict.items():
    # print(f"Filename: {filename}, Line: {line_start}")
    line = linecache.getline('code-samples/'+ filename, line_start+1)
    print("Line", line_start, ":", line)
    # Determine the number of spaces at the beginning of the original code
    leading_spaces = len(line) - len(line.lstrip())

    # Filter positions based on end value
    # if end position occurs in leading whitespace we dont need that position
    positions = [
        (start, end)
        for start, end in line_data.positions if end > leading_spaces
    ]
    line_data.positions = positions

    print("Adjusted positions:", positions, len(positions))

    if (len(line_data.positions) == 1):
                # check if position covers whole line, set remove as mutop else mod

                if(is_whole_line_selected(line, line_data.positions[0][0], line_data.positions[0][1])):
                  # mutated_line_data_dict[(filename, line_start+1)] = MutatedLineData('',"remove")
                  # print("Line ", line_start, "removed")
                      unmatched_brackets = find_lonely_brackets(line)
                      brackets_string = ''
                      if unmatched_brackets:
                          for bracket, position in unmatched_brackets:
                              # print(f"Unmatched Bracket '{bracket}' at position {position}")
                              brackets_string = brackets_string + bracket
                          mutated_code = brackets_string
                          mutated_line_data_dict[(filename, line_start + 1)] = MutatedLineData(mutated_code, "mutate")
                      else:
                          mutated_line_data_dict[(filename, line_start+1)] = MutatedLineData('',"remove")
                          print("Line ", line_start, "removed")

                else:
                  print("Line ", line_start, "mutated to ")
                  mutated_code = getMutatedCode(filename, line_start, line, line_data.positions)
                  if(mutated_code == "remove"):
                      unmatched_brackets = find_lonely_brackets(mutated_code)
                      brackets_string = ''
                      if unmatched_brackets:
                          for bracket, position in unmatched_brackets:
                              # print(f"Unmatched Bracket '{bracket}' at position {position}")
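                              # str has no append(); concatenate instead.
                              # Note: mutated_code is the literal "remove" at this point, which contains
                              # no brackets, so as written this loop body is never reached.
                              brackets_string += bracket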
                          mutated_code = brackets_string
                          mutated_line_data_dict[(filename, line_start + 1)] = MutatedLineData(mutated_code, "mutate")
                      else:
                          mutated_line_data_dict[(filename, line_start+1)] = MutatedLineData('',"remove")
                          print("Line ", line_start, "removed")

                  else:
                       mutated_line_data_dict[(filename, line_start + 1)] = MutatedLineData(mutated_code, "mutate")
    elif (len(line_data.positions) > 1):
        print("Line ", line_start, "mutated to ")
        mutated_code = getMutatedCode(filename, line_start, line, line_data.positions)
        # mutated_line_data_dict[(filename, line_start + 1)] = MutatedLineData(mutated_code, "mutate")
        if(mutated_code == "remove"):
                      unmatched_brackets = find_lonely_brackets(mutated_code)
                      brackets_string = ''
                      if unmatched_brackets:
                          for bracket, position in unmatched_brackets:
                              # print(f"Unmatched Bracket '{bracket}' at position {position}")
                              # str has no append(); concatenate instead
                              brackets_string += bracket
                          mutated_code = brackets_string
                          mutated_line_data_dict[(filename, line_start + 1)] = MutatedLineData(mutated_code, "mutate")
                      else:
                          mutated_line_data_dict[(filename, line_start+1)] = MutatedLineData('',"remove")
                          print("Line ", line_start, "removed")

        else:
                       mutated_line_data_dict[(filename, line_start + 1)] = MutatedLineData(mutated_code, "mutate")



# build a new mutated file
for (filename, line_start), line_data in mutated_line_data_dict.items():
    print(filename,line_start, line_data.mutatedcode, line_data.mutop)

# original_filename = filename
# mutated_filename = filename + 'mutated.java'

# Read the original file and create the mutated file
# with open(original_filename, 'r') as original_file, open(mutated_filename, 'w') as mutated_file:
#     for line_number, line in enumerate(original_file, start=1):
#         # Check if there's mutated data for this line
#         mutated_line_data = mutated_line_data_dict.get((original_filename, line_number))

#         if mutated_line_data:
#             # Replace the line with mutated data
#             mutated_file.write(mutated_line_data.mutatedcode + '\n')
#         else:
#             # Copy the original line as-is
#             mutated_file.write(line)




Streaming output truncated to the last 5000 lines.
Line  94 removed
Line 95 :     }

Adjusted positions: [(1, 5)] 1
Line 9 :     Class unsafeClass = Class.forName ("sun.misc.Unsafe");

Adjusted positions: [(1, 58)] 1
Line  9 removed
Line 10 :     Field f = unsafeClass.getDeclaredField ("theUnsafe");

Adjusted positions: [(1, 57)] 1
Line  10 removed
Line 11 :     f.setAccessible (true);

Adjusted positions: [(1, 27)] 1
Line  11 removed
Line 12 :     Unsafe unsafe = (Unsafe) f.get (null);

Adjusted positions: [(1, 42)] 1
Line  12 removed
Line 13 :     System.out.print ("4..3..2..1...");

Adjusted positions: [] 0
Line 14 :     try {

Adjusted positions: [(1, 9)] 1
Line 15 :         for (;;) unsafe.allocateMemory (1024 * 1024);

Adjusted positions: [(1, 53)] 1
Line  15 removed
Line 16 :     } catch (Error e) {

Adjusted positions: [(1, 23)] 1
Line 17 :         System.out.println ("Boom :)");

Adjusted positions: [(1, 39)] 1
Line 18 :         e.printStackTrace ();

Adjusted po

# Create mutated CSV file

In [7]:
# Build the mutated Java files; methods that fail the AST check are logged to a CSV report
import csv

unique_file_names = {key[0] for key in mutated_line_data_dict.keys()}
error_filename = 'methoderrorsreport.csv'
# Output folder for the mutated files (also read by the Drive-copy cell)
mutated_data_folder = 'mutated-data-lx-diff-only/'

for filename in unique_file_names:
    original_filename = 'code-samples/' + filename
    mutated_filename = mutated_data_folder + filename + 'mutated.java'
    linecount = 0

    with open(original_filename, 'r') as original_file, \
            open(mutated_filename, 'w') as mutated_file, \
            open(error_filename, 'a', newline='') as error_file:
        csvwriter = csv.writer(error_file)
        method_lines = []    # lines of the method currently being collected
        write_to_file = []   # chunks written to the mutated file at the end
        prev_line = ''
        for line in original_file:
            linecount += 1
            line = line.strip()
            if linecount == 9:
                print('Line9:', line)
            if linecount > 8:
                mutated_line_data = mutated_line_data_dict.get((filename, linecount))
                if mutated_line_data:
                    print(mutated_line_data)
                    mutated_code = mutated_line_data.mutatedcode
                    mutop = mutated_line_data.mutop
                    if mutop == "remove":
                        # A removed line contributes nothing to the method
                        print('do nothing')
                    elif mutated_code != "":
                        # Replace the line with the mutated code
                        method_lines.append(mutated_code)
                elif line != '':
                    # Keep the original line if it is non-empty
                    method_lines.append(line)

                # A blank line right after a closing brace marks the end of a method
                if prev_line == "}" and line == "":
                    print('method code', method_lines)
                    # Use a separate loop variable so `line` is not overwritten
                    for method_line in method_lines:
                        print(method_line)
                    method_code = '\n'.join(method_lines) + "\n"
                    print('code for ast', method_code)
                    # Build an AST to check that the mutated method still parses
                    try:
                        method_code_escaped = add_escape_before_double_quotes(method_code)
                        # `Wrapper` is a placeholder class name so the wrapper is valid Java
                        root = asts.AST.from_string(
                            'public class Wrapper { ' + method_code_escaped + '}',
                            language=asts.ASTLanguage.Java,
                            deepest=True
                        )
                    except Exception as e:
                        # Log the failure; the method is still written below
                        print("Error processing method:", filename, method_code, e)
                        csvwriter.writerow([filename, method_code, str(e)])
                    write_to_file.append(method_code + '\n')
                    method_lines = []
                prev_line = line
            else:
                # The first 8 lines are copied through unchanged (apart from stripping)
                write_to_file.append(line + '\n')
        for code in write_to_file:
            mutated_file.write(code)
        # Close the enclosing class
        mutated_file.write('}')

Streaming output truncated to the last 5000 lines.
<__main__.MutatedLineData object at 0x7a48bd6d8f70>
do nothing
<__main__.MutatedLineData object at 0x7a48bd4775b0>
do nothing
<__main__.MutatedLineData object at 0x7a48bd29c3a0>
do nothing
<__main__.MutatedLineData object at 0x7a48bd29ce80>
do nothing
<__main__.MutatedLineData object at 0x7a48bd29d7e0>
do nothing
<__main__.MutatedLineData object at 0x7a48bd29e380>
do nothing
<__main__.MutatedLineData object at 0x7a48bd29ee00>
do nothing
<__main__.MutatedLineData object at 0x7a48bd29f880>
do nothing
<__main__.MutatedLineData object at 0x7a48bd2a3c10>
do nothing
<__main__.MutatedLineData object at 0x7a48bd2a0700>
do nothing
<__main__.MutatedLineData object at 0x7a48bd2a09d0>
do nothing
<__main__.MutatedLineData object at 0x7a48bd2a2b30>
do nothing
<__main__.MutatedLineData object at 0x7a48bd2a38e0>
<__main__.MutatedLineData object at 0x7a48bd2a87f0>
do nothing
<__main__.MutatedLineData object at 0x7a48bd2a3b20>
do nothing
<
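
The cell that writes the mutated files (In [7]) decides where a method ends with a simple rule: a blank line that directly follows a line consisting only of `}`. The sketch below isolates that rule so it can be tried on a small input. The function name `split_into_methods` and the returned `leftover` list are assumptions made for illustration; they are not part of the notebook.

```python
def split_into_methods(lines):
    """Group stripped source lines into methods.

    A method is closed when a blank line directly follows a line that is
    exactly "}". Blank lines are never stored. Lines collected after the
    last closed method are returned separately as `leftover`.
    """
    methods = []
    current = []
    prev_line = ""
    for raw in lines:
        line = raw.strip()
        if line:
            current.append(line)
        if prev_line == "}" and line == "":
            methods.append(current)
            current = []
        prev_line = line
    return methods, current


sample = [
    "void a() {",
    "    int x = 1;",
    "}",
    "",
    "void b() {",
    "}",
    "",
    "}",  # closing brace of the enclosing class
]
methods, leftover = split_into_methods(sample)
print(methods)
print(leftover)
```

Because every line is stripped first, the rule cannot tell a method's closing brace from the closing brace of an inner block. An `if` or `for` block followed by a blank line inside a method would therefore split that method early, which is worth keeping in mind when reading the error report.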

# Copy the mutated data files to Google Drive for download

In [None]:
from google.colab import drive
drive.mount('/content/drive')

import shutil
import os

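# Destination directory on Google Drive; the source is mutated_data_folder, set in the cell that writes the mutated files
drive_folder = '/content/drive/My Drive/MutatedData/ValueBasedCoreDiffRemoved/'  # Change this to your Google Drive folder path
os.makedirs(drive_folder, exist_ok=True)  # Create the destination folder if it does not exist yet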

# List all files in the folder
files = os.listdir(mutated_data_folder)

# Count the number of files
file_count = len(files)

# Print the file count
print(f"Number of files in the folder: {file_count}")
# Copy each file to the Google Drive folder
for file in files:
    source_file = os.path.join(mutated_data_folder, file)
    destination_file = os.path.join(drive_folder, file)
    print(source_file,destination_file)
    shutil.copy(source_file, destination_file)

print("Files copied successfully!")
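
If a single download is more convenient than many individual files, the folder can be zipped before it is placed on Drive. This is a minimal sketch using only the standard library. The helper name `archive_folder` and the default archive name `mutated-data` are assumptions for illustration, not part of the notebook.

```python
import os
import shutil


def archive_folder(source_folder, destination_folder, archive_name="mutated-data"):
    """Zip every file in source_folder and place the archive in destination_folder.

    Returns the full path of the created .zip file.
    """
    os.makedirs(destination_folder, exist_ok=True)
    base_name = os.path.join(destination_folder, archive_name)
    # make_archive appends the ".zip" extension to base_name itself
    return shutil.make_archive(base_name, "zip", root_dir=source_folder)
```

In this notebook the call would look like `archive_folder(mutated_data_folder, drive_folder)`, run after `drive.mount(...)` has succeeded.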