
Sweep: Using SweepBot, write a benchmarking script that tests modify_file. This script should parse a file for the necessary context and then try to successfully make a code change using the methods in diff.py #1367

Open
1 task done
wwzeng1 opened this issue Aug 23, 2023 · 1 comment · May be fixed by #1368
Labels
sweep Assigns Sweep to an issue or pull request.

Comments


wwzeng1 commented Aug 23, 2023

Details

No response

Checklist
  • sweepai/utils/benchmark_modify_file.py

• Import the necessary modules at the beginning of the file. This includes the time module for benchmarking and the modify_file function from diff.py.
• Define a function named benchmark_modify_file that takes a file path as an argument.
• Inside the benchmark_modify_file function, open and read the file using the provided file path.
• Still within the benchmark_modify_file function, record the current time before calling the modify_file function.
• Call the modify_file function with the necessary arguments, including the context obtained from the file.
• After the modify_file function call, record the current time again and calculate the difference to get the execution time.
• Print the execution time of the modify_file function.
• At the end of the file, add a conditional statement to call the benchmark_modify_file function when the script is run directly.
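As a sketch, the checklist above could come together like this. Note that `modify_file` here is a hypothetical stand-in: the real script would import the modify-file machinery from `diff.py` and pass it the context parsed from the file.

```python
import sys
import time


def modify_file(contents: str) -> str:
    # Hypothetical stand-in: the real script would import the modify-file
    # machinery from sweepai/utils/diff.py and pass it the parsed context.
    return contents


def benchmark_modify_file(file_path: str) -> float:
    # Open and read the file to obtain the context for modify_file
    with open(file_path) as f:
        contents = f.read()
    start = time.perf_counter()  # record the time before the call
    modify_file(contents)
    elapsed = time.perf_counter() - start  # time after the call, minus start
    print(f"modify_file took {elapsed:.6f} seconds")
    return elapsed


if __name__ == "__main__" and len(sys.argv) > 1:
    benchmark_modify_file(sys.argv[1])
```

`time.perf_counter()` is preferred over `time.time()` for benchmarking because it is monotonic and has higher resolution.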

@wwzeng1 wwzeng1 added the sweep Assigns Sweep to an issue or pull request. label Aug 23, 2023

sweep-nightly bot commented Aug 23, 2023

Here's the PR! #1368.

💎 Sweep Pro: I used GPT-4 to create this ticket. You have 107 GPT-4 tickets left for the month. To retrigger Sweep, edit the issue.


Step 1: 🔍 Code Search

I found the following snippets in your repository. I will now analyze these snippets and come up with a plan.

Some code snippets I looked at. If some file is missing from here, you can mention the path in the ticket description.

```python
import difflib
import re

from sweepai.core.entities import SweepContext
from sweepai.utils.chat_logger import discord_log_error


def diff_contains_dups_or_removals(diff, new_code):
    # The regex pattern for lines removed or added in the actual code
    removed_line_pattern = r"^-.*"
    added_line_pattern = r"^\+.*"

    lines_removed = False
    duplicate_lines_added = False

    # Split the diff and new_code into separate lines
    diff_lines = diff.split("\n")[3:]  # Start from the third line
    new_code_lines = [line.strip() for line in new_code.split("\n")]

    # Check if there are removed lines
    for line in diff_lines:
        if re.match(removed_line_pattern, line):
            lines_removed = True

    # Check if there are duplicate lines added
    added_lines = [
        line[1:].strip() for line in diff_lines if re.match(added_line_pattern, line)
    ]
    for line in added_lines:
        if new_code_lines.count(line) > 1:
            duplicate_lines_added = True
            break

    return lines_removed or duplicate_lines_added


def generate_diff(old_code, new_code):
    old_code = old_code.strip()
    new_code = new_code.strip()
    diff = difflib.unified_diff(
        old_code.splitlines(keepends=True), new_code.splitlines(keepends=True)
    )
    diff_text = "".join(diff)
    return diff_text


def revert_whitespace_changes(original_file_str, modified_file_str):
    original_lines = original_file_str.split("\n")
    modified_lines = modified_file_str.split("\n")
    diff = difflib.SequenceMatcher(None, original_lines, modified_lines)
    final_lines = []
    for opcode in diff.get_opcodes():
        if opcode[0] == "equal" or opcode[0] == "replace":
            # If the lines are equal or replace (means the change is not whitespace only)
            # use original lines.
            final_lines.extend(original_lines[opcode[1] : opcode[2]])
        elif opcode[0] == "insert":
            # If the lines are inserted in the modified file, check if it's just whitespace changes
            # If it's just whitespace changes, ignore them.
            for line in modified_lines[opcode[3] : opcode[4]]:
                if line.strip() != "":
                    final_lines.append(line)
    return "\n".join(final_lines)


def format_contents(file_contents, is_markdown=False):
    """
    Add arbitrary postprocessing here, this affects files and diffs
    """
    lines = file_contents.split("\n")

    # Handle small files
    if len(lines) <= 5:
        start_idx = 0
        end_idx = len(lines)
        for idx, line in enumerate(lines):
            if start_idx == 0 and line.strip().startswith("```"):
                start_idx = idx + 1
            if start_idx != 0 and line.strip().endswith("```"):
                end_idx = idx
        lines = lines[start_idx:end_idx]
        return "\n".join(lines)

    first_three_lines = lines[:3]
    last_three_lines = lines[-3:]
    first_line_idx = 0
    last_line_idx = 3
    for idx, line in enumerate(first_three_lines):
        line = line.strip()
        if line.startswith("```"):
            first_line_idx = max(first_line_idx, idx + 1)
        if "user_code>" in line:
            first_line_idx = max(first_line_idx, idx + 1)
    for idx, line in enumerate(last_three_lines):  # Check in reverse
        line = line.strip()
        if line.endswith("```"):
            last_line_idx = min(idx, last_line_idx)
        if "user_code>" in line:
            last_line_idx = min(idx, last_line_idx)
    first_three_lines = first_three_lines[first_line_idx:]
    last_three_lines = last_three_lines[:last_line_idx]
    lines = first_three_lines + lines[3:-3] + last_three_lines
    return "\n".join(lines)


def generate_new_file(
    modify_file_response: str, old_file_content: str, chunk_offset: int = 0
) -> str:
    result_file = ""
    old_file_lines = old_file_content.split("\n")

    # Extract content between <new_file> tags
    new_file = re.search(
        r".*?<new_file>\n?(.*)\n<\/new_file>", modify_file_response, re.DOTALL
    ).group(1)
    if "<copy_lines" not in new_file:
        return new_file

    # v5
    result = []
    lines = new_file.split("\n")
    for line_number, line in enumerate(lines):
        # Todo: make it support 1 number only
        matches = re.finditer(r"<copy_lines\s(\d+-\d+)/?>", line)
        copied_lines = False
        for match in matches:
            copied_lines = True
            start, end = match.group(1).split("-")
            start, end = int(start) - 1, int(end) - 1
            if chunk_offset != 0:  # Correct for the line numbers being much higher
                start -= chunk_offset
                end -= chunk_offset
            start = max(0, start)
            end = min(len(old_file_lines) - 1, end)
            replacements = old_file_lines[start : end + 1]
            replacements_str = "\n".join(replacements)
            line = line.replace(match.group(0), replacements_str)

        # check if line was incorrectly duplicated
        append = True
        if not copied_lines:  # if bot generated, and line before is not bot generated
            if len(result) > 0:
                # Get last line in results
                last_group = result[-1]
                # last_line = last_group
                if "\n" in last_group:
                    last_line = last_group[
                        last_group.rindex("\n") + 1 :
                    ]  # if its multiple lines
                    # if last line is same is current line
                    if last_line == line:
                        append = False
        if append:
            result.append(line)
    result = "\n".join(result)
    return result


NOT_FOUND = "NOT_FOUND"
IDENTICAL_LINES = "NO MATCHES FOUND"
MULTIPLE_HITS = "MULTIPLE_HITS"
INCOMPLETE_MATCH = "INCOMPLETE_MATCH"


def match_string(
    original, search, start_index=None, exact_match=False, ignore_comments=False
):
    index = -1
    max_similarity = 0
    current_hits = 0
    # sliding window comparison from original to search
    # Todo: 2 pointer approach (find start, then find end)
    # Todo: use rapidfuzz to compute fuzzy similarity over code
    for i in range(start_index or 0, len(original)):
        count = 0
        for j in range(len(search)):
            if i + j >= len(original):
                continue
            original_line = original[i + j]
            if ignore_comments:
                # Remove comments
                original_line = original_line.rsplit("#")[0].rsplit("//")[0]
            match = (
                search[j] == original_line
                if exact_match
                else search[j].strip() == original_line.strip()
            )
            if match:
                count += 1
                # If searching for previous snippet (like regex)
                if start_index is not None and search[j] == original[i + j]:
                    count += 0.001
        if count > max_similarity:
            index = i
            max_similarity = count
            current_hits = 1
        elif count == max_similarity:
            current_hits += 1
    return index, max_similarity, current_hits


def lstrip_max(s, chars, max_count):
    count = 0
    for char in s:
        if char in chars and count < max_count:
            count += 1
        else:
            break
    return s[count:]


def get_snippet_with_padding(original, index, search):
    snippet = original[index : index + len(search)]

    # Fix whitespace
    if len(search[0]) - len(search[0].lstrip()) == 0:
        spaces = " " * (len(snippet[0]) - len(snippet[0].lstrip()))
        strip = False
    else:  # Do diff between snippet and search
        # Todo(lukejagg): This might need to be more robust.
        # Check multiple lines for their whitespace
        min_whitespace = min([len(s) - len(s.lstrip()) for s in search])
        spaces = " " * min_whitespace
        strip = True
    return snippet, spaces, strip


def sliding_window_replacement(
    original, search, replace, search_context_before=None, **kwargs
):
    status, replace_index = None, None

    # First, do check for "..." (example: define method, then put ... to ignore initial lines)
    canDoDotCheck = not any(
        "..." in line.strip() for line in original
    )  # If ... not in original file
    if canDoDotCheck:
        # Check first 3 lines for '...'
        first_line_idx = -1
        for i in range(len(search)):
            if search[i].strip() == "...":
                first_line_idx = i
                break

        # Do this for replace too
        first_line_idx_replace = -1
        for i in range(len(replace)):
            if replace[i].strip() == "...":
                first_line_idx_replace = i
                break

        if first_line_idx == 0 and first_line_idx_replace == 0:
            search = search[1:]
            replace = replace[1:]
        elif (
            first_line_idx == len(search) - 1
            and first_line_idx_replace == len(replace) - 1
        ):
            search = search[:first_line_idx]
            replace = replace[:first_line_idx_replace]
        elif first_line_idx != -1 and first_line_idx_replace != -1:
            # SPLIT INTO TWO PARTS
            # TODO(lukejagg): pass in the first and last lines as context for matching (so ambiguous ... can be matched)
            search_context_before = search[:first_line_idx]
            original, replace_index, status = sliding_window_replacement(
                original,
                search[first_line_idx + 1 :],
                replace[first_line_idx_replace + 1 :],
                search_context_before,
                **kwargs,
            )
            search = search[:first_line_idx]
            replace = replace[:first_line_idx_replace]

    exact_match = kwargs.get("exact_match", False)
    ignore_comments = kwargs.get("ignore_comments", False)

    index, max_similarity, current_hits = match_string(
        original, search, exact_match=exact_match, ignore_comments=ignore_comments
    )
    # No changes could be found. Return original code.
    if max_similarity == 0:
        if not ignore_comments:  # In case Sweep decided not to include comments
            return sliding_window_replacement(
                original,
                search,
                replace,
                ignore_comments=True,
                **{k: v for k, v in kwargs.items() if k != "ignore_comments"},
            )
        print("WARNING: No identical lines")
        return original, None, IDENTICAL_LINES

    if current_hits > 1:
        success = False
        if search_context_before:
            old_index, _, current_hits = match_string(
                original,
                search_context_before,
                exact_match=exact_match,
            )
            _, old_spaces, _ = get_snippet_with_padding(
                original, old_index, search_context_before
            )
            if current_hits == 1:
                index, max_similarity, current_hits = match_string(
                    original,
                    [old_spaces + s for s in search],
                    start_index=old_index + 1,
                    exact_match=exact_match,
                )
                current_hits = 1  # Ignore multiple hits, use first complete comparison
                success = True

        if not success:
            if (
                len(replace) == 1 and not replace[0] and not search_context_before
            ):  # Backup 1: independent line matches
                exact_matches = [line for line in original if line in search]
                # If there are no duplicates and all lines have a match
                if len(set(exact_matches)) == len(search):
                    # Remove all of those corresponding lines in the content
                    original = [line for line in original if line not in search]
                    return original, None, None

            if not exact_match:  # Backup 2: exact line matches
                return sliding_window_replacement(
                    original,
                    search,
                    replace,
                    exact_match=True,
                    **{k: v for k, v in kwargs.items() if k != "exact_match"},
                )

            print("WARNING: Multiple hits")
            return original, None, MULTIPLE_HITS

    # Todo(lukejagg): Remove unreachable code
    if index == -1:
        # First, try matching beginning of search
        return original, None, NOT_FOUND

    # Todo(lukejagg): this doesn't seem to work, add later
    # if int(max_similarity) != len(search):
    #     return original, None, INCOMPLETE_MATCH
    # if max_similarity != len(search):

    snippet, spaces, strip = get_snippet_with_padding(original, index, search)

    if strip:
        # Todo: What if whitespace in search is incorrect
        first_line_spaces = min([len(s) - len(s.lstrip()) for s in search])
        modified = [
            spaces + (lstrip_max(line, [" "], first_line_spaces) if strip else line)
            for line in replace
        ]
    else:
        modified = [spaces + line for line in replace]

    # replaced original with modified
    original = original[:index] + modified + original[index + len(search) :]
    return original, index, None


def get_all_diffs(modify_file_response: str) -> str:
    matches = re.findall(
        r"(<<<<.*?\n(.*?)\n====[^\n=]*\n(.*?)\n?>>>>)", modify_file_response, re.DOTALL
    )
    result = "\n\n".join([_match for _match, *_ in matches])
    return result


def generate_new_file_from_patch(
    modify_file_response: str,
    old_file_content: str,
    chunk_offset: int = 0,
    sweep_context: SweepContext = None,
):
    old_file_lines = old_file_content.split("\n")

    # Extract content between <new_file> tags
    matches = re.findall(
        r"<<<<.*?\n(.*?)\n====[^\n=]*\n(.*?)\n?>>>>", modify_file_response, re.DOTALL
    )

    errors = []
    if not old_file_content.strip():
        # If old file is empty, just return the first match
        print(matches)
        search_and_replace, *_ = matches
        return search_and_replace[1]

    for search, replace in matches:
        # Remove trailing tags
        if search.lstrip().startswith("<old_file>") and replace.lstrip().startswith(
            "<old_file"
        ):
            search = search.lstrip()[len("<old_file>") :]
            replace = replace.lstrip()[len("<old_file>") :]
        # Remove trailing tags
        if search.rstrip().endswith("</old_file>") and replace.rstrip().endswith(
            "</old_file"
        ):
            search = search.rstrip()[: -len("</old_file>")]
            replace = replace.rstrip()[: -len("</old_file>")]
        old_file_lines, replace_index, status = sliding_window_replacement(
            old_file_lines, search.split("\n"), replace.split("\n")
        )
        if status is not None:
            s = search.replace("`", "\\`")
            r = replace.replace("`", "\\`")
            errors.append(f"- {status}\n```\n{s}\n```\n\n```\n{r}\n```")

    if len(errors) > 0:
        log = "\n\n".join(errors)
        discord_log_error(
            f"{sweep_context.issue_url}\nModify Parsing Errors {'gpt3.5' if sweep_context.use_faster_model else 'gpt4'}: \n"
            + log,
            priority=0 if sweep_context.use_faster_model else 1,
        )

    result = "\n".join(old_file_lines)
    return result, errors


def join_contents_k(first, second, k):
    """
    Join contents together removing k duplicate lines
    """
    first_lines = first.split("\n")
    second_lines = second.split("\n")
    for i in range(k, 0, -1):
        if len(first_lines) < k or len(second_lines) < k:
            continue
        if first_lines[-i:] == second_lines[:i]:
            return "\n".join(first_lines) + "\n" + "\n".join(second_lines[i:])
    return "\n".join(first_lines) + "\n" + "\n".join(second_lines)


def is_markdown(filename):
    return (
        filename.endswith(".md")
        or filename.endswith(".rst")
        or filename.endswith(".txt")
```

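For reference, `generate_diff` above is a thin wrapper over the standard library's `difflib.unified_diff`; a minimal standalone demonstration of the diff text it produces:

```python
import difflib

old_code = "def add(a, b):\n    return a + b\n"
new_code = "def add(a, b):\n    # sum two values\n    return a + b\n"

# Join the generator into one diff string, as generate_diff does
diff = "".join(
    difflib.unified_diff(
        old_code.splitlines(keepends=True),
        new_code.splitlines(keepends=True),
    )
)
print(diff)  # the inserted comment line appears prefixed with "+"
```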
```python
if do_map:
    subissues: list[ProposedIssue] = sweep_bot.generate_subissues()
    edit_sweep_comment(
        f"I'm creating the following subissues:\n\n"
        + "\n\n".join(
            [
                f"* #{subissue.title}:\n> "
                + subissue.body.replace("\n", "\n> ")
                for subissue in subissues
            ]
        ),
        3,
    )
    for subissue in tqdm(subissues):
        subissue.issue_id = repo.create_issue(
            title="Sweep: " + subissue.title,
            body=subissue.body + f"\n\nParent issue: #{issue_number}",
            assignee=username,
        ).number
    subissues_checklist = "\n\n".join(
        [
            f"- [ ] #{subissue.issue_id}\n\n> "
            + f"**{subissue.title}**\n{subissue.body}".replace("\n", "\n> ")
            for subissue in subissues
        ]
    )
    current_issue.edit(
        body=summary + "\n\n---\n\nChecklist:\n\n" + subissues_checklist
    )
    edit_sweep_comment(
        f"I finished creating the subissues! Track them at:\n\n"
        + "\n".join(f"* #{subissue.issue_id}" for subissue in subissues),
        4,
    )
    edit_sweep_comment(f"N/A", 5)
    edit_sweep_comment(f"I finished creating all the subissues.", 6)
    return {"success": True}

# COMMENT ON ISSUE
# TODO: removed issue commenting here
logger.info("Fetching files to modify/create...")
file_change_requests, plan = sweep_bot.get_files_to_change()

if not file_change_requests:
    if len(title + summary) < 60:
        edit_sweep_comment(
            "Sorry, I could not find any files to modify, can you please provide more details? Please make sure that the title and summary of the issue are at least 60 characters.",
            -1,
        )
    else:
        edit_sweep_comment(
            "Sorry, I could not find any files to modify, can you please provide more details?",
            -1,
        )
    raise Exception("No files to modify.")

sweep_bot.summarize_snippets(plan)

file_change_requests = sweep_bot.validate_file_change_requests(
    file_change_requests
)

table = tabulate(
    [
        [
            f"`{file_change_request.filename}`",
            file_change_request.instructions_display.replace(
                "\n", "<br/>"
            ).replace("```", "\\```"),
        ]
        for file_change_request in file_change_requests
    ],
    headers=["File Path", "Proposed Changes"],
    tablefmt="pipe",
)
edit_sweep_comment(
    "From looking through the relevant snippets, I decided to make the following modifications:\n\n"
    + table
    + "\n\n",
    2,
)

# TODO(lukejagg): Generate PR after modifications are made
# CREATE PR METADATA
logger.info("Generating PR...")
pull_request = sweep_bot.generate_pull_request()
pull_request_content = pull_request.content.strip().replace("\n", "\n>")
pull_request_summary = f"**{pull_request.title}**\n`{pull_request.branch_name}`\n>{pull_request_content}\n"
edit_sweep_comment(
    f"I have created a plan for writing the pull request. I am now working my plan and coding the required changes to address this issue. Here is the planned pull request:\n\n{pull_request_summary}",
    3,
)

logger.info("Making PR...")
files_progress = [
    (
        file_change_request.filename,
        file_change_request.instructions_display,
        "⏳ In Progress",
    )
    for file_change_request in file_change_requests
]
checkboxes_progress = [
    (file_change_request.filename, file_change_request.instructions, " ")
    for file_change_request in file_change_requests
]
checkboxes_message = collapsible_template.format(
    summary="Checklist",
```

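The `tabulate(..., tablefmt="pipe")` calls above render GitHub-flavored markdown tables. An equivalent can be sketched with plain string formatting and no third-party dependency (the row data here is illustrative):

```python
rows = [("`sweepai/utils/benchmark_modify_file.py`", "Create the benchmarking script")]
headers = ("File Path", "Proposed Changes")

# Build a pipe-delimited markdown table: header row, separator row, data rows
table = "\n".join(
    [
        "| " + " | ".join(headers) + " |",
        "| " + " | ".join("---" for _ in headers) + " |",
    ]
    + ["| " + " | ".join(row) + " |" for row in rows]
)
print(table)
```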
```python
from loguru import logger

from sweepai.core.chat import Function
from sweepai.utils.diff import format_contents

modify_file_function = Function(
    name="modify_file",
    description="Edits the code in a file. Use start_line and end_line to completely cover the line indexes of code that should be replaced. Indent and format the code in the edits. Output the code in the order it should appear in the file. Make sure start_line and end_line do not overlap between code edits.",
    parameters={
        "type": "object",
        "properties": {
            "file_name": {
                "type": "string",
                "description": "The name of the file to modify.",
            },
            "code_edits": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "start_line": {
                            "type": "integer",
                            "description": "The index where the code should start being inserted/replaced.",
                        },
                        "end_line": {
                            "type": "integer",
                            "description": "The index where the code should stop being inserted/replaced. Add 1 to this number to include the line.",
                        },
                        "inserted_code": {
                            "type": "string",
                            "description": "Only the new code to insert into the file. Indent and format this code properly using spaces, keeping in mind the entire block will be affected by num_indents. To delete a line, set this to '' (single quoted empty string).",
                        },
                        "num_indents": {
                            "type": "integer",
                            "description": "Use this to indent the entire inserted_code. BE SURE to match the indentation to be inline. There will be two spaces for however many num_indents are set. num_indents can be set to 0, but ONLY IF NEEDED. When it is ambiguous, set as many num_indents as possible.",
                        },
                    },
                    "required": ["start_line", "end_line", "inserted_code", "num_indents"],
                },
                "description": "An array of edits. Each `code_edit` represents a slice of the code split by newlines and delimited by `start_line` and `end_line`. Both `start_line` and `end_line` are zero-indexed and inclusive.",
            },
        },
        "required": ["file_name", "code_edits"],
    },
)


def apply_code_edits(file_contents, code_edits):
    modifications = []
    for edit in code_edits:
        start_line = int(edit["start_line"])
        end_line = int(edit["end_line"])
        new_code = format_contents(edit["inserted_code"])

        # Indentation
        indentation = int(edit["num_indents"])
        logger.info(f"The code {new_code} has {indentation} indents")

        # Starts with or ends with "" should be swapped to '' for json
        if len(new_code) >= 2 and new_code[:2] == '""':
            new_code = "''" + new_code[2:]
        elif len(new_code) >= 2 and new_code[-2:] == '""':
            new_code = new_code[:-2] + "''"

        new_code = edit["inserted_code"].split("\n")
        modifications.append((start_line, end_line, new_code, indentation))

    # Sort modifications by start line in reverse order
    modifications.sort(key=lambda x: x[0], reverse=True)
    lines = file_contents.split("\n")
    for start_line, end_line, new_code, indentation in modifications:
        if start_line > end_line:
            logger.error(f"Start line {start_line} is greater than end line {end_line}")
            continue
        if start_line < 0:
            logger.error(f"Start line {start_line} is less than 0")
            continue
        if end_line > len(lines) - 1:
            logger.error(
                f"End line {end_line} is greater than the number of lines in the file {len(lines)}"
            )
            continue

        # Handle duplicate lines between the existing code and new code
        indents = "  " * indentation
        if (
            start_line > 0
            and end_line < len(lines)
            and new_code[0] == lines[start_line - 1]
            and new_code[-1] == lines[end_line]
        ):
            new_code = new_code[1:-1]
            new_code = [indents + line for line in new_code]
            lines[start_line:end_line] = new_code
            continue
        elif start_line > 0 and new_code[0] == lines[start_line - 1]:
            new_code = new_code[1:]
            new_code = [indents + line for line in new_code]
            lines[start_line - 1 : end_line + 1] = new_code  # Exit and merge first line
            continue
        elif end_line < len(lines) and new_code[-1] == lines[end_line]:
            new_code = new_code[:-1]
            new_code = [indents + line for line in new_code]
            lines[start_line:end_line] = new_code  # Exit and merge last line
            continue

        # Check index error
        if end_line > len(lines) - 1:
            end_line = len(lines) - 1
        new_code = [indents + line for line in new_code]
        lines[start_line : end_line + 1] = new_code  # Start and end are inclusive
```

(Two fixes relative to the captured snippet: the first quote-swap check compared a one-character slice `new_code[:1]` against the two-character string `'""'`, which can never match, so it now slices two characters; and the `required` list named a nonexistent `"code"` property where the schema defines `"inserted_code"`.)

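The reverse sort in `apply_code_edits` matters: splicing from the bottom of the file upward means each edit leaves the line indexes of earlier (higher-up) edits unchanged. A minimal sketch of that invariant:

```python
lines = ["a", "b", "c", "d"]
# (start_line, end_line, replacement) triples; end is exclusive, as in list slicing
edits = [(0, 1, ["A"]), (2, 4, ["C"])]

# Apply edits from the bottom of the file up so earlier indexes stay valid
for start, end, new in sorted(edits, key=lambda e: e[0], reverse=True):
    lines[start:end] = new

print(lines)  # → ['A', 'b', 'C']
```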
```python
    contents_line_numbers: str = "",
    branch=None,
    chunking: bool = False,
    chunk_offset: int = 0,
    retries: int = 1,
) -> tuple[str, str]:
    for count in range(retries):
        key = f"file_change_modified_{file_change_request.filename}"
        file_markdown = is_markdown(file_change_request.filename)
        # TODO(sweep): edge case at empty file
        message = modify_file_prompt_3.format(
            filename=file_change_request.filename,
            instructions=file_change_request.instructions,
            code=contents_line_numbers,
            line_count=contents.count("\n") + 1,
        )
        try:
            if chunking:
                # TODO (sweep): make chunking / streaming better
                message = chunking_prompt + message
                modify_file_response = self.chat(
                    message,
                    message_key=key,
                )
                self.delete_messages_from_chat(key)
            else:
                modify_file_response = self.chat(
                    message,
                    message_key=key,
                )
        except Exception as e:  # Check for max tokens error
            if "max tokens" in str(e).lower():
                logger.error(
                    f"Max tokens exceeded for {file_change_request.filename}"
                )
                raise MaxTokensExceeded(file_change_request.filename)
        try:
            logger.info(
                f"generate_new_file with contents: {contents} and modify_file_response: {modify_file_response}"
            )
            new_file, errors = generate_new_file_from_patch(
                modify_file_response,
                contents,
                chunk_offset=chunk_offset,
                sweep_context=self.sweep_context,
            )
            new_file = format_contents(new_file, file_markdown)
            commit_message_match = re.search(
                'Commit message: "(?P<commit_message>.*)"', modify_file_response
            )
            if commit_message_match:
                commit_message = commit_message_match.group("commit_message")
            else:
                commit_message = f"Updated {file_change_request.filename}"
            commit_message = commit_message[: min(len(commit_message), 50)]
            # self.delete_messages_from_chat(key)

            # proposed_diffs = get_all_diffs(modify_file_response)
            # proposed_diffs = (
            #     f"<proposed_diffs>\n{proposed_diffs}\n</proposed_diffs>\n\n"
            #     if proposed_diffs
            #     else ""
            # )

            # validation step
            # logger.info("Validating file change request...")
            # new_diffs = self.chat(
            #     code_repair_modify_prompt.format(
            #         filename=file_change_request.filename,
            #         instructions=file_change_request.instructions,
            #         code=new_file,
            #     ),
            #     message_key=key + "-validation",
            # )
            # final_file, errors = generate_new_file_from_patch(
            #     new_diffs,
            #     new_file,
            #     chunk_offset=chunk_offset,
            #     sweep_context=self.sweep_context,
            # )
            # final_file = format_contents(final_file, file_markdown)
            # logger.info("Done validating file change request")
            # return final_file, commit_message

            return new_file, commit_message
        except Exception as e:
            tb = traceback.format_exc()
            logger.warning(
                f"Failed to parse. Retrying for the {count}th time. Received error {e}\n{tb}"
            )
            self.delete_messages_from_chat(key)
            continue
    raise Exception(f"Failed to parse response after {retries} attempts.")


def change_files_in_github(
    self,
    file_change_requests: list[FileChangeRequest],
    branch: str,
    blocked_dirs: list[str] = [],
    sandbox=None,
):
    # should check if branch exists, if not, create it
    logger.debug(file_change_requests)
    num_fcr = len(file_change_requests)
    completed = 0
    for _, changed_file in self.change_files_in_github_iterator(
        file_change_requests, branch, blocked_dirs, sandbox=sandbox
    ):
        if changed_file:
            completed += 1
    return completed, num_fcr


def change_files_in_github_iterator(
    self,
    file_change_requests: list[FileChangeRequest],
    branch: str,
    blocked_dirs: list[str],
    sandbox=None,
) -> Generator[tuple[FileChangeRequest, bool], None, None]:
    # should check if branch exists, if not, create it
    logger.debug(file_change_requests)
    num_fcr = len(file_change_requests)
    completed = 0
    added_modify_hallucination = False
    for file_change_request in file_change_requests:
        changed_file = False
        try:
            if self.is_blocked(file_change_request.filename, blocked_dirs)[
                "success"
            ]:
                logger.info(
                    f"Skipping {file_change_request.filename} because it is blocked."
                )
                continue
            print(
                f"Processing {file_change_request.filename} for change type {file_change_request.change_type}..."
            )
            match file_change_request.change_type:
                case "create":
                    # Add example for more consistent generation
                    if not added_modify_hallucination:
                        added_modify_hallucination = True
                        # Add hallucinated example for better parsing
                        for message in modify_file_hallucination_prompt:
                            self.messages.append(Message(**message))
                    changed_file = self.handle_create_file(
                        file_change_request, branch, sandbox=sandbox
                    )
                case "modify":
                    # Add example for more consistent generation
                    if not added_modify_hallucination:
                        added_modify_hallucination = True
                        # Add hallucinated example for better parsing
                        for message in modify_file_hallucination_prompt:
                            self.messages.append(Message(**message))
                    # Remove snippets from this file if they exist
                    snippet_msgs = [
                        m for m in self.messages if m.key == BOT_ANALYSIS_SUMMARY
                    ]
                    if len(snippet_msgs) > 0:  # Should always be true
                        snippet_msg = snippet_msgs[0]
                        # Use regex to remove this snippet from the message
                        file = re.escape(file_change_request.filename)
                        regex = rf'<snippet source="{file}:\d*-?\d*.*?<\/snippet>'
                        snippet_msg.content = re.sub(
                            regex,
                            "",
                            snippet_msg.content,
                            flags=re.DOTALL,
                        )
                    changed_file = self.handle_modify_file(
                        file_change_request, branch, sandbox=sandbox
                    )
                case "delete":
                    contents = self.repo.get_contents(
                        file_change_request.filename, ref=branch
                    )
                    self.repo.delete_file(
                        file_change_request.filename,
                        f"Deleted {file_change_request.filename}",
                        sha=contents.sha,
                        branch=branch,
                    )
                    changed_file = True
                case "rename":
                    contents = self.repo.get_contents(
                        file_change_request.filename, ref=branch
                    )
                    self.repo.create_file(
                        file_change_request.instructions,
                        f"Renamed {file_change_request.filename} to {file_change_request.instructions}",
                        contents.decoded_content,
                        branch=branch,
                    )
                    self.repo.delete_file(
                        file_change_request.filename,
                        f"Deleted {file_change_request.filename}",
                        sha=contents.sha,
                        branch=branch,
                    )
                    changed_file = True
                case _:
                    raise Exception(
                        f"Unknown change type {file_change_request.change_type}"
                    )
            print(f"Done processing {file_change_request.filename}.")
            yield file_change_request, changed_file
        except MaxTokensExceeded as e:
```

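The commit-message extraction in the retry loop above can be exercised standalone (the response text here is made up for illustration):

```python
import re

modify_file_response = 'Commit message: "Add benchmarking script for modify_file"\n<<<<\n...\n>>>>'

# Pull the quoted commit message out of the model response
m = re.search(r'Commit message: "(?P<commit_message>.*)"', modify_file_response)
commit_message = m.group("commit_message") if m else "Updated file"
commit_message = commit_message[:50]  # truncate to 50 characters, as in modify_file
print(commit_message)
```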
```python
        body="\n".join(
            [
                checkbox_template.format(
                    check=check,
                    filename=filename,
                    instructions=instructions.replace("\n", "\n> "),
                )
                for filename, instructions, check in checkboxes_progress
            ]
        ),
    )
    issue = repo.get_issue(number=issue_number)
    issue.edit(body=summary + "\n\n" + checkboxes_message)

    delete_branch = False
    generator = create_pr_changes(
        file_change_requests,
        pull_request,
        sweep_bot,
        username,
        installation_id,
        issue_number,
        sandbox=sandbox,
        chat_logger=chat_logger,
    )
    table_message = tabulate(
        [
            (f"`(unknown)`", instructions.replace("\n", "<br/>"), progress)
            for filename, instructions, progress in files_progress
        ],
        headers=["File", "Instructions", "Progress"],
        tablefmt="pipe",
    )
    logger.info(files_progress)
    edit_sweep_comment(table_message, 4)
    response = {"error": NoFilesException()}
    for item in generator:
        if isinstance(item, dict):
            response = item
            break
        file_change_request, changed_file = item
        if changed_file:
            commit_hash = repo.get_branch(pull_request.branch_name).commit.sha
            commit_url = f"https://github.com/{repo_full_name}/commit/{commit_hash}"
            files_progress = [
                (
                    file,
                    instructions,
                    f"✅ Commit [`{commit_hash[:7]}`]({commit_url})",
                )
                if file_change_request.filename == file
                else (file, instructions, progress)
                for file, instructions, progress in files_progress
            ]
            checkboxes_progress = [
                (file, instructions, "X")
                if file_change_request.filename == file
                else (file, instructions, progress)
                for file, instructions, progress in checkboxes_progress
            ]
            checkboxes_message = collapsible_template.format(
                summary="Checklist",
                body="\n".join(
                    [
                        checkbox_template.format(
                            check=check,
                            filename=filename,
                            instructions=instructions.replace("\n", "\n> "),
                        )
                        for filename, instructions, check in checkboxes_progress
                    ]
                ),
            )
            issue = repo.get_issue(number=issue_number)
            issue.edit(body=summary + "\n\n" + checkboxes_message)
        else:
            files_progress = [
                (file, instructions, "❌ Failed")
                if file_change_request.filename == file
                else (file, instructions, progress)
                for file, instructions, progress in files_progress
            ]
        logger.info(files_progress)
        logger.info(f"Edited {file_change_request.filename}")
        table_message = tabulate(
            [
                (f"`(unknown)`", instructions.replace("\n", "<br/>"), progress)
                for filename, instructions, progress in files_progress
            ],
            headers=["File", "Instructions", "Progress"],
            tablefmt="pipe",
        )
        edit_sweep_comment(table_message, 4)

    if not response.get("success"):
        raise Exception(f"Failed to create PR: {response.get('error')}")
    pr_changes = response["pull_request"]

    edit_sweep_comment(
        table_message
        + "I have finished coding the issue. I am now reviewing it for completeness.",
        4,
    )
    review_message = f"Here are my self-reviews of my changes at [`{pr_changes.pr_head}`](https://github.com/{repo_full_name}/commits/{pr_changes.pr_head}).\n\n"
    lint_output = None
    try:
        current_issue.delete_reaction(eyes_reaction.id)
    except:
        pass

    # Clone repo and perform local tests (linters, formatters, GHA)
    try:
        lint_sandbox = Sandbox.from_token(username, user_token, repo)
        if lint_sandbox is None:
            raise Exception("Sandbox is disabled")
```

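The progress-table updates in the snippet above replace one tuple in a list of `(file, instructions, progress)` entries while leaving the rest untouched; a minimal sketch with stand-in data:

```python
files_progress = [
    ("sweepai/utils/benchmark_modify_file.py", "Create the script", "⏳ In Progress"),
    ("sweepai/utils/diff.py", "No changes", "⏳ In Progress"),
]
changed = "sweepai/utils/benchmark_modify_file.py"

# Mark only the file that was just committed; keep the others' progress as-is
files_progress = [
    (file, instructions, "✅ Commit `abc1234`")
    if file == changed
    else (file, instructions, progress)
    for file, instructions, progress in files_progress
]
print(files_progress[0][2])  # → ✅ Commit `abc1234`
```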

Step 2: 🧐 Snippet Analysis

From looking through the relevant snippets, I decided to make the following modifications:

File Path: sweepai/utils/benchmark_modify_file.py
Proposed Changes: Create sweepai/utils/benchmark_modify_file.py with contents:
• Import the necessary modules at the beginning of the file. This includes the time module for benchmarking and the modify_file function from diff.py.
• Define a function named benchmark_modify_file that takes a file path as an argument.
• Inside the benchmark_modify_file function, open and read the file using the provided file path.
• Still within the benchmark_modify_file function, record the current time before calling the modify_file function.
• Call the modify_file function with the necessary arguments, including the context obtained from the file.
• After the modify_file function call, record the current time again and calculate the difference to get the execution time.
• Print the execution time of the modify_file function.
• At the end of the file, add a conditional statement to call the benchmark_modify_file function when the script is run directly.
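
A minimal sketch of what the planned script could look like. The real `modify_file` lives in `diff.py` and its exact signature is not shown in this issue, so the stand-in below assumes a hypothetical `(file_contents, instructions)` interface purely for illustration:

```python
import time

# Hypothetical stand-in for sweepai's modify_file; the real script would import it:
#   from sweepai.utils.diff import modify_file
def modify_file(file_contents: str, instructions: str) -> str:
    # Placeholder body: the real function applies an LLM-generated change.
    return file_contents

def benchmark_modify_file(file_path: str) -> float:
    """Time a single modify_file call on the given file; return seconds elapsed."""
    with open(file_path) as f:
        file_contents = f.read()
    start = time.perf_counter()  # monotonic clock, better suited to timing than time.time()
    modify_file(file_contents, "example instructions")
    elapsed = time.perf_counter() - start
    print(f"modify_file took {elapsed:.4f}s")
    return elapsed

if __name__ == "__main__":
    # Benchmark against this script itself as convenient sample input.
    benchmark_modify_file(__file__)
```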

Step 3: 📝 Planning

I have created a plan for writing the pull request. I am now working through my plan and coding the required changes to address this issue. Here is the planned pull request:

Add benchmarking script for modify_file function
sweep/add-benchmarking-script

Description

This PR adds a benchmarking script for the modify_file function in the diff.py module. The benchmarking script is located at sweepai/utils/benchmark_modify_file.py. It allows for testing the performance of the modify_file function by parsing a file for the necessary context and measuring the execution time.

Summary of Changes

  • Created a new Python script benchmark_modify_file.py in the sweepai/utils directory.
  • Imported the necessary modules for benchmarking and the modify_file function from diff.py.
  • Defined the benchmark_modify_file function that takes a file path as an argument.
  • Opened and read the file using the provided file path.
  • Recorded the current time before and after calling the modify_file function to measure the execution time.
  • Printed the execution time of the modify_file function.

Please review and merge this PR to enable benchmarking of the modify_file function.
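
The before/after timing described in the summary is the standard stopwatch pattern. A small helper shows the idea; `time.perf_counter()` is generally preferred over `time.time()` here because it is monotonic and higher-resolution:

```python
import time

def timed(fn, *args, **kwargs):
    """Run fn(*args, **kwargs) and return (result, seconds elapsed)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# Example with an arbitrary workload in place of modify_file:
result, elapsed = timed(sum, range(1_000_000))
print(f"sum took {elapsed:.4f}s -> {result}")
```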


Step 4: ⌨️ Coding

File | Instructions | Progress | Error logs
sweepai/utils/benchmark_modify_file.py Create sweepai/utils/benchmark_modify_file.py with contents:
• Import the necessary modules at the beginning of the file. This includes the time module for benchmarking and the modify_file function from diff.py.
• Define a function named benchmark_modify_file that takes a file path as an argument.
• Inside the benchmark_modify_file function, open and read the file using the provided file path.
• Still within the benchmark_modify_file function, record the current time before calling the modify_file function.
• Call the modify_file function with the necessary arguments, including the context obtained from the file.
• After the modify_file function call, record the current time again and calculate the difference to get the execution time.
• Print the execution time of the modify_file function.
• At the end of the file, add a conditional statement to call the benchmark_modify_file function when the script is run directly.
✅ Commit 00a2183 No errors.

Step 5: 🔁 Code Review

Here are my self-reviews of my changes at sweep/add-benchmarking-script.

Here is the 1st review

Thanks for your contribution. There are a couple of changes that need to be made:

  • In the file sweepai/utils/benchmark_modify_file.py on lines 15-17, the modify_file method is not defined in the SweepBot class. Please ensure that this method is implemented in the SweepBot class or call the correct method.
  • In the same file on lines 26-28, the argument to the benchmark_modify_file function needs to be an actual file path. Please replace "path_to_file" with the path of the file you want to benchmark.

Please make these changes and submit a new pull request. If you need any help, feel free to ask.
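
One way to address the second review point (the hard-coded "path_to_file") is to take the path from the command line instead. This is a sketch with illustrative argument names; the file-read stand-in marks where the real script would call `modify_file`:

```python
import argparse
import sys
import time

def benchmark_modify_file(file_path: str) -> float:
    # Minimal stand-in: time reading the file; the real script would
    # call modify_file from diff.py on the contents here.
    start = time.perf_counter()
    with open(file_path) as f:
        f.read()
    return time.perf_counter() - start

def main(argv=None) -> list:
    parser = argparse.ArgumentParser(description="Benchmark modify_file on a given file")
    parser.add_argument("file_path", help="path of the file to benchmark")
    parser.add_argument("--runs", type=int, default=1, help="number of timed repetitions")
    args = parser.parse_args(argv)
    timings = [benchmark_modify_file(args.file_path) for _ in range(args.runs)]
    print(f"mean over {args.runs} run(s): {sum(timings) / len(timings):.4f}s")
    return timings

if __name__ == "__main__":
    # Fall back to benchmarking this script itself when no path is supplied.
    main(sys.argv[1:] or [__file__])
```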

I finished incorporating these changes.



@sweep-nightly sweep-nightly bot linked a pull request Aug 23, 2023 that will close this issue
@wwzeng1 wwzeng1 added sweep Assigns Sweep to an issue or pull request. and removed sweep Assigns Sweep to an issue or pull request. labels Aug 24, 2023