
Sweep: Using SweepBot, write a benchmarking script that tests modify_file. This script should parse a file for the necessary context and then try to successfully make a code change using the methods in diff.py #1367

Open
1 task done
wwzeng1 opened this issue Aug 23, 2023 · 1 comment · May be fixed by #1368
Labels
sweep Assigns Sweep to an issue or pull request.

Comments


wwzeng1 commented Aug 23, 2023

Details

No response

Checklist
  • sweepai/utils/benchmark_modify_file.py

• Import the necessary modules at the beginning of the file. This includes the time module for benchmarking and the modify_file function from diff.py.
• Define a function named benchmark_modify_file that takes a file path as an argument.
• Inside the benchmark_modify_file function, open and read the file using the provided file path.
• Still within the benchmark_modify_file function, record the current time before calling the modify_file function.
• Call the modify_file function with the necessary arguments, including the context obtained from the file.
• After the modify_file function call, record the current time again and calculate the difference to get the execution time.
• Print the execution time of the modify_file function.
• At the end of the file, add a conditional statement to call the benchmark_modify_file function when the script is run directly.
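As a sketch, the checklist above could come together like this. Note that `modify_file` here is a hypothetical stand-in: the real script would import the modify-file machinery from `diff.py` and pass it the context parsed from the file.

```python
import sys
import time


def modify_file(contents: str) -> str:
    # Hypothetical stand-in: the real script would import the modify-file
    # machinery from sweepai/utils/diff.py and pass it the parsed context.
    return contents


def benchmark_modify_file(file_path: str) -> float:
    # Open and read the file to obtain the context for modify_file
    with open(file_path) as f:
        contents = f.read()
    start = time.perf_counter()  # record the time before the call
    modify_file(contents)
    elapsed = time.perf_counter() - start  # time after the call, minus start
    print(f"modify_file took {elapsed:.6f} seconds")
    return elapsed


if __name__ == "__main__" and len(sys.argv) > 1:
    benchmark_modify_file(sys.argv[1])
```

`time.perf_counter()` is preferred over `time.time()` for benchmarking because it is monotonic and has higher resolution.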

@wwzeng1 wwzeng1 added the sweep Assigns Sweep to an issue or pull request. label Aug 23, 2023

sweep-nightly bot commented Aug 23, 2023

Here's the PR! #1368.

💎 Sweep Pro: I used GPT-4 to create this ticket. You have 107 GPT-4 tickets left for the month. To retrigger Sweep, edit the issue.


Step 1: 🔍 Code Search

I found the following snippets in your repository. I will now analyze these snippets and come up with a plan.

Some code snippets I looked at. If some file is missing from here, you can mention the path in the ticket description.

```python
import difflib
import re

from sweepai.core.entities import SweepContext
from sweepai.utils.chat_logger import discord_log_error


def diff_contains_dups_or_removals(diff, new_code):
    # The regex pattern for lines removed or added in the actual code
    removed_line_pattern = r"^-.*"
    added_line_pattern = r"^\+.*"

    lines_removed = False
    duplicate_lines_added = False

    # Split the diff and new_code into separate lines
    diff_lines = diff.split("\n")[3:]  # Start from the third line
    new_code_lines = [line.strip() for line in new_code.split("\n")]

    # Check if there are removed lines
    for line in diff_lines:
        if re.match(removed_line_pattern, line):
            lines_removed = True

    # Check if there are duplicate lines added
    added_lines = [
        line[1:].strip() for line in diff_lines if re.match(added_line_pattern, line)
    ]
    for line in added_lines:
        if new_code_lines.count(line) > 1:
            duplicate_lines_added = True
            break

    return lines_removed or duplicate_lines_added


def generate_diff(old_code, new_code):
    old_code = old_code.strip()
    new_code = new_code.strip()
    diff = difflib.unified_diff(
        old_code.splitlines(keepends=True), new_code.splitlines(keepends=True)
    )
    diff_text = "".join(diff)
    return diff_text


def revert_whitespace_changes(original_file_str, modified_file_str):
    original_lines = original_file_str.split("\n")
    modified_lines = modified_file_str.split("\n")
    diff = difflib.SequenceMatcher(None, original_lines, modified_lines)
    final_lines = []
    for opcode in diff.get_opcodes():
        if opcode[0] == "equal" or opcode[0] == "replace":
            # If the lines are equal or replace (means the change is not whitespace only)
            # use original lines.
            final_lines.extend(original_lines[opcode[1] : opcode[2]])
        elif opcode[0] == "insert":
            # If the lines are inserted in the modified file, check if it's just whitespace changes
            # If it's just whitespace changes, ignore them.
            for line in modified_lines[opcode[3] : opcode[4]]:
                if line.strip() != "":
                    final_lines.append(line)
    return "\n".join(final_lines)


def format_contents(file_contents, is_markdown=False):
    """
    Add arbitrary postprocessing here, this affects files and diffs
    """
    lines = file_contents.split("\n")

    # Handle small files
    if len(lines) <= 5:
        start_idx = 0
        end_idx = len(lines)
        for idx, line in enumerate(lines):
            if start_idx == 0 and line.strip().startswith("```"):
                start_idx = idx + 1
            if start_idx != 0 and line.strip().endswith("```"):
                end_idx = idx
        lines = lines[start_idx:end_idx]
        return "\n".join(lines)

    first_three_lines = lines[:3]
    last_three_lines = lines[-3:]
    first_line_idx = 0
    last_line_idx = 3
    for idx, line in enumerate(first_three_lines):
        line = line.strip()
        if line.startswith("```"):
            first_line_idx = max(first_line_idx, idx + 1)
        if "user_code>" in line:
            first_line_idx = max(first_line_idx, idx + 1)
    for idx, line in enumerate(last_three_lines):  # Check in reverse
        line = line.strip()
        if line.endswith("```"):
            last_line_idx = min(idx, last_line_idx)
        if "user_code>" in line:
            last_line_idx = min(idx, last_line_idx)
    first_three_lines = first_three_lines[first_line_idx:]
    last_three_lines = last_three_lines[:last_line_idx]
    lines = first_three_lines + lines[3:-3] + last_three_lines
    return "\n".join(lines)


def generate_new_file(
    modify_file_response: str, old_file_content: str, chunk_offset: int = 0
) -> str:
    result_file = ""
    old_file_lines = old_file_content.split("\n")

    # Extract content between <new_file> tags
    new_file = re.search(
        r".*?<new_file>\n?(.*)\n<\/new_file>", modify_file_response, re.DOTALL
    ).group(1)
    if "<copy_lines" not in new_file:
        return new_file

    # v5
    result = []
    lines = new_file.split("\n")
    for line_number, line in enumerate(lines):
        # Todo: make it support 1 number only
        matches = re.finditer(r"<copy_lines\s(\d+-\d+)/?>", line)
        copied_lines = False
        for match in matches:
            copied_lines = True
            start, end = match.group(1).split("-")
            start, end = int(start) - 1, int(end) - 1
            if chunk_offset != 0:  # Correct for the line numbers being much higher
                start -= chunk_offset
                end -= chunk_offset
            start = max(0, start)
            end = min(len(old_file_lines) - 1, end)
            replacements = old_file_lines[start : end + 1]
            replacements_str = "\n".join(replacements)
            line = line.replace(match.group(0), replacements_str)

        # check if line was incorrectly duplicated
        append = True
        if not copied_lines:  # if bot generated, and line before is not bot generated
            if len(result) > 0:
                # Get last line in results
                last_group = result[-1]
                # last_line = last_group
                if "\n" in last_group:
                    last_line = last_group[
                        last_group.rindex("\n") + 1 :
                    ]  # if its multiple lines
                    # if last line is same is current line
                    if last_line == line:
                        append = False
        if append:
            result.append(line)
    result = "\n".join(result)
    return result


NOT_FOUND = "NOT_FOUND"
IDENTICAL_LINES = "NO MATCHES FOUND"
MULTIPLE_HITS = "MULTIPLE_HITS"
INCOMPLETE_MATCH = "INCOMPLETE_MATCH"


def match_string(
    original, search, start_index=None, exact_match=False, ignore_comments=False
):
    index = -1
    max_similarity = 0
    current_hits = 0
    # sliding window comparison from original to search
    # Todo: 2 pointer approach (find start, then find end)
    # Todo: use rapidfuzz to compute fuzzy similarity over code
    for i in range(start_index or 0, len(original)):
        count = 0
        for j in range(len(search)):
            if i + j >= len(original):
                continue
            original_line = original[i + j]
            if ignore_comments:
                # Remove comments
                original_line = original_line.rsplit("#")[0].rsplit("//")[0]
            match = (
                search[j] == original_line
                if exact_match
                else search[j].strip() == original_line.strip()
            )
            if match:
                count += 1
                # If searching for previous snippet (like regex)
                if start_index is not None and search[j] == original[i + j]:
                    count += 0.001
        if count > max_similarity:
            index = i
            max_similarity = count
            current_hits = 1
        elif count == max_similarity:
            current_hits += 1
    return index, max_similarity, current_hits


def lstrip_max(s, chars, max_count):
    count = 0
    for char in s:
        if char in chars and count < max_count:
            count += 1
        else:
            break
    return s[count:]


def get_snippet_with_padding(original, index, search):
    snippet = original[index : index + len(search)]

    # Fix whitespace
    if len(search[0]) - len(search[0].lstrip()) == 0:
        spaces = " " * (len(snippet[0]) - len(snippet[0].lstrip()))
        strip = False
    else:  # Do diff between snippet and search
        # Todo(lukejagg): This might need to be more robust.
        # Check multiple lines for their whitespace
        min_whitespace = min([len(s) - len(s.lstrip()) for s in search])
        spaces = " " * min_whitespace
        strip = True
    return snippet, spaces, strip


def sliding_window_replacement(
    original, search, replace, search_context_before=None, **kwargs
):
    status, replace_index = None, None

    # First, do check for "..." (example: define method, then put ... to ignore initial lines)
    canDoDotCheck = not any(
        "..." in line.strip() for line in original
    )  # If ... not in original file
    if canDoDotCheck:
        # Check first 3 lines for '...'
        first_line_idx = -1
        for i in range(len(search)):
            if search[i].strip() == "...":
                first_line_idx = i
                break

        # Do this for replace too
        first_line_idx_replace = -1
        for i in range(len(replace)):
            if replace[i].strip() == "...":
                first_line_idx_replace = i
                break

        if first_line_idx == 0 and first_line_idx_replace == 0:
            search = search[1:]
            replace = replace[1:]
        elif (
            first_line_idx == len(search) - 1
            and first_line_idx_replace == len(replace) - 1
        ):
            search = search[:first_line_idx]
            replace = replace[:first_line_idx_replace]
        elif first_line_idx != -1 and first_line_idx_replace != -1:
            # SPLIT INTO TWO PARTS
            # TODO(lukejagg): pass in the first and last lines as context for matching (so ambiguous ... can be matched)
            search_context_before = search[:first_line_idx]
            original, replace_index, status = sliding_window_replacement(
                original,
                search[first_line_idx + 1 :],
                replace[first_line_idx_replace + 1 :],
                search_context_before,
                **kwargs,
            )
            search = search[:first_line_idx]
            replace = replace[:first_line_idx_replace]

    exact_match = kwargs.get("exact_match", False)
    ignore_comments = kwargs.get("ignore_comments", False)

    index, max_similarity, current_hits = match_string(
        original, search, exact_match=exact_match, ignore_comments=ignore_comments
    )
    # No changes could be found. Return original code.
    if max_similarity == 0:
        if not ignore_comments:  # In case Sweep decided not to include comments
            return sliding_window_replacement(
                original,
                search,
                replace,
                ignore_comments=True,
                **{k: v for k, v in kwargs.items() if k != "ignore_comments"},
            )
        print("WARNING: No identical lines")
        return original, None, IDENTICAL_LINES

    if current_hits > 1:
        success = False
        if search_context_before:
            old_index, _, current_hits = match_string(
                original,
                search_context_before,
                exact_match=exact_match,
            )
            _, old_spaces, _ = get_snippet_with_padding(
                original, old_index, search_context_before
            )
            if current_hits == 1:
                index, max_similarity, current_hits = match_string(
                    original,
                    [old_spaces + s for s in search],
                    start_index=old_index + 1,
                    exact_match=exact_match,
                )
                current_hits = 1  # Ignore multiple hits, use first complete comparison
                success = True

        if not success:
            if (
                len(replace) == 1 and not replace[0] and not search_context_before
            ):  # Backup 1: independent line matches
                exact_matches = [line for line in original if line in search]
                # If there are no duplicates and all lines have a match
                if len(set(exact_matches)) == len(search):
                    # Remove all of those corresponding lines in the content
                    original = [line for line in original if line not in search]
                    return original, None, None

            if not exact_match:  # Backup 2: exact line matches
                return sliding_window_replacement(
                    original,
                    search,
                    replace,
                    exact_match=True,
                    **{k: v for k, v in kwargs.items() if k != "exact_match"},
                )

            print("WARNING: Multiple hits")
            return original, None, MULTIPLE_HITS

    # Todo(lukejagg): Remove unreachable code
    if index == -1:
        # First, try matching beginning of search
        return original, None, NOT_FOUND

    # Todo(lukejagg): this doesn't seem to work, add later
    # if int(max_similarity) != len(search):
    #     return original, None, INCOMPLETE_MATCH
    # if max_similarity != len(search):

    snippet, spaces, strip = get_snippet_with_padding(original, index, search)

    if strip:
        # Todo: What if whitespace in search is incorrect
        first_line_spaces = min([len(s) - len(s.lstrip()) for s in search])
        modified = [
            spaces + (lstrip_max(line, [" "], first_line_spaces) if strip else line)
            for line in replace
        ]
    else:
        modified = [spaces + line for line in replace]

    # replaced original with modified
    original = original[:index] + modified + original[index + len(search) :]
    return original, index, None


def get_all_diffs(modify_file_response: str) -> str:
    matches = re.findall(
        r"(<<<<.*?\n(.*?)\n====[^\n=]*\n(.*?)\n?>>>>)", modify_file_response, re.DOTALL
    )
    result = "\n\n".join([_match for _match, *_ in matches])
    return result


def generate_new_file_from_patch(
    modify_file_response: str,
    old_file_content: str,
    chunk_offset: int = 0,
    sweep_context: SweepContext = None,
):
    old_file_lines = old_file_content.split("\n")

    # Extract content between <new_file> tags
    matches = re.findall(
        r"<<<<.*?\n(.*?)\n====[^\n=]*\n(.*?)\n?>>>>", modify_file_response, re.DOTALL
    )

    errors = []
    if not old_file_content.strip():
        # If old file is empty, just return the first match
        print(matches)
        search_and_replace, *_ = matches
        return search_and_replace[1]

    for search, replace in matches:
        # Remove trailing tags
        if search.lstrip().startswith("<old_file>") and replace.lstrip().startswith(
            "<old_file"
        ):
            search = search.lstrip()[len("<old_file>") :]
            replace = replace.lstrip()[len("<old_file>") :]
        # Remove trailing tags
        if search.rstrip().endswith("</old_file>") and replace.rstrip().endswith(
            "</old_file"
        ):
            search = search.rstrip()[: -len("</old_file>")]
            replace = replace.rstrip()[: -len("</old_file>")]
        old_file_lines, replace_index, status = sliding_window_replacement(
            old_file_lines, search.split("\n"), replace.split("\n")
        )
        if status is not None:
            s = search.replace("`", "\\`")
            r = replace.replace("`", "\\`")
            errors.append(f"- {status}\n```\n{s}\n```\n\n```\n{r}\n```")

    if len(errors) > 0:
        log = "\n\n".join(errors)
        discord_log_error(
            f"{sweep_context.issue_url}\nModify Parsing Errors {'gpt3.5' if sweep_context.use_faster_model else 'gpt4'}: \n"
            + log,
            priority=0 if sweep_context.use_faster_model else 1,
        )

    result = "\n".join(old_file_lines)
    return result, errors


def join_contents_k(first, second, k):
    """
    Join contents together removing k duplicate lines
    """
    first_lines = first.split("\n")
    second_lines = second.split("\n")
    for i in range(k, 0, -1):
        if len(first_lines) < k or len(second_lines) < k:
            continue
        if first_lines[-i:] == second_lines[:i]:
            return "\n".join(first_lines) + "\n" + "\n".join(second_lines[i:])
    return "\n".join(first_lines) + "\n" + "\n".join(second_lines)


def is_markdown(filename):
    return (
        filename.endswith(".md")
        or filename.endswith(".rst")
        or filename.endswith(".txt")
```

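For reference, `generate_diff` above is a thin wrapper over the standard library's `difflib.unified_diff`; a minimal standalone demonstration of the diff text it produces:

```python
import difflib

old_code = "def add(a, b):\n    return a + b\n"
new_code = "def add(a, b):\n    # sum two values\n    return a + b\n"

# Join the generator into one diff string, as generate_diff does
diff = "".join(
    difflib.unified_diff(
        old_code.splitlines(keepends=True),
        new_code.splitlines(keepends=True),
    )
)
print(diff)  # the inserted comment line appears prefixed with "+"
```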
```python
if do_map:
    subissues: list[ProposedIssue] = sweep_bot.generate_subissues()
    edit_sweep_comment(
        f"I'm creating the following subissues:\n\n"
        + "\n\n".join(
            [
                f"* #{subissue.title}:\n> "
                + subissue.body.replace("\n", "\n> ")
                for subissue in subissues
            ]
        ),
        3,
    )
    for subissue in tqdm(subissues):
        subissue.issue_id = repo.create_issue(
            title="Sweep: " + subissue.title,
            body=subissue.body + f"\n\nParent issue: #{issue_number}",
            assignee=username,
        ).number
    subissues_checklist = "\n\n".join(
        [
            f"- [ ] #{subissue.issue_id}\n\n> "
            + f"**{subissue.title}**\n{subissue.body}".replace("\n", "\n> ")
            for subissue in subissues
        ]
    )
    current_issue.edit(
        body=summary + "\n\n---\n\nChecklist:\n\n" + subissues_checklist
    )
    edit_sweep_comment(
        f"I finished creating the subissues! Track them at:\n\n"
        + "\n".join(f"* #{subissue.issue_id}" for subissue in subissues),
        4,
    )
    edit_sweep_comment(f"N/A", 5)
    edit_sweep_comment(f"I finished creating all the subissues.", 6)
    return {"success": True}

# COMMENT ON ISSUE
# TODO: removed issue commenting here
logger.info("Fetching files to modify/create...")
file_change_requests, plan = sweep_bot.get_files_to_change()

if not file_change_requests:
    if len(title + summary) < 60:
        edit_sweep_comment(
            "Sorry, I could not find any files to modify, can you please provide more details? Please make sure that the title and summary of the issue are at least 60 characters.",
            -1,
        )
    else:
        edit_sweep_comment(
            "Sorry, I could not find any files to modify, can you please provide more details?",
            -1,
        )
    raise Exception("No files to modify.")

sweep_bot.summarize_snippets(plan)

file_change_requests = sweep_bot.validate_file_change_requests(
    file_change_requests
)

table = tabulate(
    [
        [
            f"`{file_change_request.filename}`",
            file_change_request.instructions_display.replace(
                "\n", "<br/>"
            ).replace("```", "\\```"),
        ]
        for file_change_request in file_change_requests
    ],
    headers=["File Path", "Proposed Changes"],
    tablefmt="pipe",
)
edit_sweep_comment(
    "From looking through the relevant snippets, I decided to make the following modifications:\n\n"
    + table
    + "\n\n",
    2,
)

# TODO(lukejagg): Generate PR after modifications are made
# CREATE PR METADATA
logger.info("Generating PR...")
pull_request = sweep_bot.generate_pull_request()
pull_request_content = pull_request.content.strip().replace("\n", "\n>")
pull_request_summary = f"**{pull_request.title}**\n`{pull_request.branch_name}`\n>{pull_request_content}\n"
edit_sweep_comment(
    f"I have created a plan for writing the pull request. I am now working my plan and coding the required changes to address this issue. Here is the planned pull request:\n\n{pull_request_summary}",
    3,
)

logger.info("Making PR...")
files_progress = [
    (
        file_change_request.filename,
        file_change_request.instructions_display,
        "⏳ In Progress",
    )
    for file_change_request in file_change_requests
]
checkboxes_progress = [
    (file_change_request.filename, file_change_request.instructions, " ")
    for file_change_request in file_change_requests
]
checkboxes_message = collapsible_template.format(
    summary="Checklist",
```

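The `tabulate(..., tablefmt="pipe")` calls above render GitHub-flavored markdown tables. An equivalent can be sketched with plain string formatting and no third-party dependency (the row data here is illustrative):

```python
rows = [("`sweepai/utils/benchmark_modify_file.py`", "Create the benchmarking script")]
headers = ("File Path", "Proposed Changes")

# Build a pipe-delimited markdown table: header row, separator row, data rows
table = "\n".join(
    [
        "| " + " | ".join(headers) + " |",
        "| " + " | ".join("---" for _ in headers) + " |",
    ]
    + ["| " + " | ".join(row) + " |" for row in rows]
)
print(table)
```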
```python
from loguru import logger

from sweepai.core.chat import Function
from sweepai.utils.diff import format_contents

modify_file_function = Function(
    name="modify_file",
    description="Edits the code in a file. Use start_line and end_line to completely cover the line indexes of code that should be replaced. Indent and format the code in the edits. Output the code in the order it should appear in the file. Make sure start_line and end_line do not overlap between code edits.",
    parameters={
        "type": "object",
        "properties": {
            "file_name": {
                "type": "string",
                "description": "The name of the file to modify.",
            },
            "code_edits": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "start_line": {
                            "type": "integer",
                            "description": "The index where the code should start being inserted/replaced.",
                        },
                        "end_line": {
                            "type": "integer",
                            "description": "The index where the code should stop being inserted/replaced. Add 1 to this number to include the line.",
                        },
                        "inserted_code": {
                            "type": "string",
                            "description": "Only the new code to insert into the file. Indent and format this code properly using spaces, keeping in mind the entire block will be affected by num_indents. To delete a line, set this to '' (single quoted empty string).",
                        },
                        "num_indents": {
                            "type": "integer",
                            "description": "Use this to indent the entire inserted_code. BE SURE to match the indentation to be inline. There will be two spaces for however many num_indents are set. num_indents can be set to 0, but ONLY IF NEEDED. When it is ambiguous, set as many num_indents as possible.",
                        },
                    },
                    "required": ["start_line", "end_line", "inserted_code", "num_indents"],
                },
                "description": "An array of edits. Each `code_edit` represents a slice of the code split by newlines and delimited by `start_line` and `end_line`. Both `start_line` and `end_line` are zero-indexed and inclusive.",
            },
        },
        "required": ["file_name", "code_edits"],
    },
)


def apply_code_edits(file_contents, code_edits):
    modifications = []
    for edit in code_edits:
        start_line = int(edit["start_line"])
        end_line = int(edit["end_line"])
        new_code = format_contents(edit["inserted_code"])

        # Indentation
        indentation = int(edit["num_indents"])
        logger.info(f"The code {new_code} has {indentation} indents")

        # Starts with or ends with "" should be swapped to '' for json
        if len(new_code) >= 2 and new_code[:2] == '""':
            new_code = "''" + new_code[2:]
        elif len(new_code) >= 2 and new_code[-2:] == '""':
            new_code = new_code[:-2] + "''"

        new_code = edit["inserted_code"].split("\n")
        modifications.append((start_line, end_line, new_code, indentation))

    # Sort modifications by start line in reverse order
    modifications.sort(key=lambda x: x[0], reverse=True)
    lines = file_contents.split("\n")
    for start_line, end_line, new_code, indentation in modifications:
        if start_line > end_line:
            logger.error(f"Start line {start_line} is greater than end line {end_line}")
            continue
        if start_line < 0:
            logger.error(f"Start line {start_line} is less than 0")
            continue
        if end_line > len(lines) - 1:
            logger.error(
                f"End line {end_line} is greater than the number of lines in the file {len(lines)}"
            )
            continue

        # Handle duplicate lines between the existing code and new code
        indents = "  " * indentation
        if (
            start_line > 0
            and end_line < len(lines)
            and new_code[0] == lines[start_line - 1]
            and new_code[-1] == lines[end_line]
        ):
            new_code = new_code[1:-1]
            new_code = [indents + line for line in new_code]
            lines[start_line:end_line] = new_code
            continue
        elif start_line > 0 and new_code[0] == lines[start_line - 1]:
            new_code = new_code[1:]
            new_code = [indents + line for line in new_code]
            lines[start_line - 1 : end_line + 1] = new_code  # Exit and merge first line
            continue
        elif end_line < len(lines) and new_code[-1] == lines[end_line]:
            new_code = new_code[:-1]
            new_code = [indents + line for line in new_code]
            lines[start_line:end_line] = new_code  # Exit and merge last line
            continue

        # Check index error
        if end_line > len(lines) - 1:
            end_line = len(lines) - 1
        new_code = [indents + line for line in new_code]
        lines[start_line : end_line + 1] = new_code  # Start and end are inclusive
```

(Two fixes relative to the captured snippet: the first quote-swap check compared a one-character slice `new_code[:1]` against the two-character string `'""'`, which can never match, so it now slices two characters; and the `required` list named a nonexistent `"code"` property where the schema defines `"inserted_code"`.)

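The reverse sort in `apply_code_edits` matters: splicing from the bottom of the file upward means each edit leaves the line indexes of earlier (higher-up) edits unchanged. A minimal sketch of that invariant:

```python
lines = ["a", "b", "c", "d"]
# (start_line, end_line, replacement) triples; end is exclusive, as in list slicing
edits = [(0, 1, ["A"]), (2, 4, ["C"])]

# Apply edits from the bottom of the file up so earlier indexes stay valid
for start, end, new in sorted(edits, key=lambda e: e[0], reverse=True):
    lines[start:end] = new

print(lines)  # → ['A', 'b', 'C']
```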
```python
    contents_line_numbers: str = "",
    branch=None,
    chunking: bool = False,
    chunk_offset: int = 0,
    retries: int = 1,
) -> tuple[str, str]:
    for count in range(retries):
        key = f"file_change_modified_{file_change_request.filename}"
        file_markdown = is_markdown(file_change_request.filename)
        # TODO(sweep): edge case at empty file
        message = modify_file_prompt_3.format(
            filename=file_change_request.filename,
            instructions=file_change_request.instructions,
            code=contents_line_numbers,
            line_count=contents.count("\n") + 1,
        )
        try:
            if chunking:
                # TODO (sweep): make chunking / streaming better
                message = chunking_prompt + message
                modify_file_response = self.chat(
                    message,
                    message_key=key,
                )
                self.delete_messages_from_chat(key)
            else:
                modify_file_response = self.chat(
                    message,
                    message_key=key,
                )
        except Exception as e:  # Check for max tokens error
            if "max tokens" in str(e).lower():
                logger.error(
                    f"Max tokens exceeded for {file_change_request.filename}"
                )
                raise MaxTokensExceeded(file_change_request.filename)
        try:
            logger.info(
                f"generate_new_file with contents: {contents} and modify_file_response: {modify_file_response}"
            )
            new_file, errors = generate_new_file_from_patch(
                modify_file_response,
                contents,
                chunk_offset=chunk_offset,
                sweep_context=self.sweep_context,
            )
            new_file = format_contents(new_file, file_markdown)
            commit_message_match = re.search(
                'Commit message: "(?P<commit_message>.*)"', modify_file_response
            )
            if commit_message_match:
                commit_message = commit_message_match.group("commit_message")
            else:
                commit_message = f"Updated {file_change_request.filename}"
            commit_message = commit_message[: min(len(commit_message), 50)]
            # self.delete_messages_from_chat(key)

            # proposed_diffs = get_all_diffs(modify_file_response)
            # proposed_diffs = (
            #     f"<proposed_diffs>\n{proposed_diffs}\n</proposed_diffs>\n\n"
            #     if proposed_diffs
            #     else ""
            # )

            # validation step
            # logger.info("Validating file change request...")
            # new_diffs = self.chat(
            #     code_repair_modify_prompt.format(
            #         filename=file_change_request.filename,
            #         instructions=file_change_request.instructions,
            #         code=new_file,
            #     ),
            #     message_key=key + "-validation",
            # )
            # final_file, errors = generate_new_file_from_patch(
            #     new_diffs,
            #     new_file,
            #     chunk_offset=chunk_offset,
            #     sweep_context=self.sweep_context,
            # )
            # final_file = format_contents(final_file, file_markdown)
            # logger.info("Done validating file change request")
            # return final_file, commit_message

            return new_file, commit_message
        except Exception as e:
            tb = traceback.format_exc()
            logger.warning(
                f"Failed to parse. Retrying for the {count}th time. Received error {e}\n{tb}"
            )
            self.delete_messages_from_chat(key)
            continue
    raise Exception(f"Failed to parse response after {retries} attempts.")


def change_files_in_github(
    self,
    file_change_requests: list[FileChangeRequest],
    branch: str,
    blocked_dirs: list[str] = [],
    sandbox=None,
):
    # should check if branch exists, if not, create it
    logger.debug(file_change_requests)
    num_fcr = len(file_change_requests)
    completed = 0
    for _, changed_file in self.change_files_in_github_iterator(
        file_change_requests, branch, blocked_dirs, sandbox=sandbox
    ):
        if changed_file:
            completed += 1
    return completed, num_fcr


def change_files_in_github_iterator(
    self,
    file_change_requests: list[FileChangeRequest],
    branch: str,
    blocked_dirs: list[str],
    sandbox=None,
) -> Generator[tuple[FileChangeRequest, bool], None, None]:
    # should check if branch exists, if not, create it
    logger.debug(file_change_requests)
    num_fcr = len(file_change_requests)
    completed = 0
    added_modify_hallucination = False
    for file_change_request in file_change_requests:
        changed_file = False
        try:
            if self.is_blocked(file_change_request.filename, blocked_dirs)[
                "success"
            ]:
                logger.info(
                    f"Skipping {file_change_request.filename} because it is blocked."
                )
                continue
            print(
                f"Processing {file_change_request.filename} for change type {file_change_request.change_type}..."
            )
            match file_change_request.change_type:
                case "create":
                    # Add example for more consistent generation
                    if not added_modify_hallucination:
                        added_modify_hallucination = True
                        # Add hallucinated example for better parsing
                        for message in modify_file_hallucination_prompt:
                            self.messages.append(Message(**message))
                    changed_file = self.handle_create_file(
                        file_change_request, branch, sandbox=sandbox
                    )
                case "modify":
                    # Add example for more consistent generation
                    if not added_modify_hallucination:
                        added_modify_hallucination = True
                        # Add hallucinated example for better parsing
                        for message in modify_file_hallucination_prompt:
                            self.messages.append(Message(**message))
                    # Remove snippets from this file if they exist
                    snippet_msgs = [
                        m for m in self.messages if m.key == BOT_ANALYSIS_SUMMARY
                    ]
                    if len(snippet_msgs) > 0:  # Should always be true
                        snippet_msg = snippet_msgs[0]
                        # Use regex to remove this snippet from the message
                        file = re.escape(file_change_request.filename)
                        regex = rf'<snippet source="{file}:\d*-?\d*.*?<\/snippet>'
                        snippet_msg.content = re.sub(
                            regex,
                            "",
                            snippet_msg.content,
                            flags=re.DOTALL,
                        )
                    changed_file = self.handle_modify_file(
                        file_change_request, branch, sandbox=sandbox
                    )
                case "delete":
                    contents = self.repo.get_contents(
                        file_change_request.filename, ref=branch
                    )
                    self.repo.delete_file(
                        file_change_request.filename,
                        f"Deleted {file_change_request.filename}",
                        sha=contents.sha,
                        branch=branch,
                    )
                    changed_file = True
                case "rename":
                    contents = self.repo.get_contents(
                        file_change_request.filename, ref=branch
                    )
                    self.repo.create_file(
                        file_change_request.instructions,
                        f"Renamed {file_change_request.filename} to {file_change_request.instructions}",
                        contents.decoded_content,
                        branch=branch,
                    )
                    self.repo.delete_file(
                        file_change_request.filename,
                        f"Deleted {file_change_request.filename}",
                        sha=contents.sha,
                        branch=branch,
                    )
                    changed_file = True
                case _:
                    raise Exception(
                        f"Unknown change type {file_change_request.change_type}"
                    )
            print(f"Done processing {file_change_request.filename}.")
            yield file_change_request, changed_file
        except MaxTokensExceeded as e:
```

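The commit-message extraction in the retry loop above can be exercised standalone (the response text here is made up for illustration):

```python
import re

modify_file_response = 'Commit message: "Add benchmarking script for modify_file"\n<<<<\n...\n>>>>'

# Pull the quoted commit message out of the model response
m = re.search(r'Commit message: "(?P<commit_message>.*)"', modify_file_response)
commit_message = m.group("commit_message") if m else "Updated file"
commit_message = commit_message[:50]  # truncate to 50 characters, as in modify_file
print(commit_message)
```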
```python
        body="\n".join(
            [
                checkbox_template.format(
                    check=check,
                    filename=filename,
                    instructions=instructions.replace("\n", "\n> "),
                )
                for filename, instructions, check in checkboxes_progress
            ]
        ),
    )
    issue = repo.get_issue(number=issue_number)
    issue.edit(body=summary + "\n\n" + checkboxes_message)

    delete_branch = False
    generator = create_pr_changes(
        file_change_requests,
        pull_request,
        sweep_bot,
        username,
        installation_id,
        issue_number,
        sandbox=sandbox,
        chat_logger=chat_logger,
    )
    table_message = tabulate(
        [
            (f"`(unknown)`", instructions.replace("\n", "<br/>"), progress)
            for filename, instructions, progress in files_progress
        ],
        headers=["File", "Instructions", "Progress"],
        tablefmt="pipe",
    )
    logger.info(files_progress)
    edit_sweep_comment(table_message, 4)
    response = {"error": NoFilesException()}
    for item in generator:
        if isinstance(item, dict):
            response = item
            break
        file_change_request, changed_file = item
        if changed_file:
            commit_hash = repo.get_branch(pull_request.branch_name).commit.sha
            commit_url = f"https://github.com/{repo_full_name}/commit/{commit_hash}"
            files_progress = [
                (
                    file,
                    instructions,
                    f"✅ Commit [`{commit_hash[:7]}`]({commit_url})",
                )
                if file_change_request.filename == file
                else (file, instructions, progress)
                for file, instructions, progress in files_progress
            ]
            checkboxes_progress = [
                (file, instructions, "X")
                if file_change_request.filename == file
                else (file, instructions, progress)
                for file, instructions, progress in checkboxes_progress
            ]
            checkboxes_message = collapsible_template.format(
                summary="Checklist",
                body="\n".join(
                    [
                        checkbox_template.format(
                            check=check,
                            filename=filename,
                            instructions=instructions.replace("\n", "\n> "),
                        )
                        for filename, instructions, check in checkboxes_progress
                    ]
                ),
            )
            issue = repo.get_issue(number=issue_number)
            issue.edit(body=summary + "\n\n" + checkboxes_message)
        else:
            files_progress = [
                (file, instructions, "❌ Failed")
                if file_change_request.filename == file
                else (file, instructions, progress)
                for file, instructions, progress in files_progress
            ]
        logger.info(files_progress)
        logger.info(f"Edited {file_change_request.filename}")
        table_message = tabulate(
            [
                (f"`(unknown)`", instructions.replace("\n", "<br/>"), progress)
                for filename, instructions, progress in files_progress
            ],
            headers=["File", "Instructions", "Progress"],
            tablefmt="pipe",
        )
        edit_sweep_comment(table_message, 4)

    if not response.get("success"):
        raise Exception(f"Failed to create PR: {response.get('error')}")
    pr_changes = response["pull_request"]

    edit_sweep_comment(
        table_message
        + "I have finished coding the issue. I am now reviewing it for completeness.",
        4,
    )
    review_message = f"Here are my self-reviews of my changes at [`{pr_changes.pr_head}`](https://github.com/{repo_full_name}/commits/{pr_changes.pr_head}).\n\n"
    lint_output = None
    try:
        current_issue.delete_reaction(eyes_reaction.id)
    except:
        pass

    # Clone repo and perform local tests (linters, formatters, GHA)
    try:
        lint_sandbox = Sandbox.from_token(username, user_token, repo)
        if lint_sandbox is None:
            raise Exception("Sandbox is disabled")
```

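The progress-table updates in the snippet above replace one tuple in a list of `(file, instructions, progress)` entries while leaving the rest untouched; a minimal sketch with stand-in data:

```python
files_progress = [
    ("sweepai/utils/benchmark_modify_file.py", "Create the script", "⏳ In Progress"),
    ("sweepai/utils/diff.py", "No changes", "⏳ In Progress"),
]
changed = "sweepai/utils/benchmark_modify_file.py"

# Mark only the file that was just committed; keep the others' progress as-is
files_progress = [
    (file, instructions, "✅ Commit `abc1234`")
    if file == changed
    else (file, instructions, progress)
    for file, instructions, progress in files_progress
]
print(files_progress[0][2])  # → ✅ Commit `abc1234`
```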

Step 2: 🧐 Snippet Analysis

From looking through the relevant snippets, I decided to make the following modifications:

File Path: sweepai/utils/benchmark_modify_file.py
Proposed Changes: Create sweepai/utils/benchmark_modify_file.py with contents:
• Import the necessary modules at the beginning of the file. This includes the time module for benchmarking and the modify_file function from diff.py.
• Define a function named benchmark_modify_file that takes a file path as an argument.
• Inside the benchmark_modify_file function, open and read the file using the provided file path.
• Still within the benchmark_modify_file function, record the current time before calling the modify_file function.
• Call the modify_file function with the necessary arguments, including the context obtained from the file.
• After the modify_file function call, record the current time again and calculate the difference to get the execution time.
• Print the execution time of the modify_file function.
• At the end of the file, add a conditional statement to call the benchmark_modify_file function when the script is run directly.
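
A minimal sketch of what the planned script could look like. The real `modify_file` lives in `diff.py` and its exact signature is not shown in this issue, so the stand-in below assumes a hypothetical `(file_contents, instructions)` interface purely for illustration:

```python
import time

# Hypothetical stand-in for sweepai's modify_file; the real script would import it:
#   from sweepai.utils.diff import modify_file
def modify_file(file_contents: str, instructions: str) -> str:
    # Placeholder body: the real function applies an LLM-generated change.
    return file_contents

def benchmark_modify_file(file_path: str) -> float:
    """Time a single modify_file call on the given file; return seconds elapsed."""
    with open(file_path) as f:
        file_contents = f.read()
    start = time.perf_counter()  # monotonic clock, better suited to timing than time.time()
    modify_file(file_contents, "example instructions")
    elapsed = time.perf_counter() - start
    print(f"modify_file took {elapsed:.4f}s")
    return elapsed

if __name__ == "__main__":
    # Benchmark against this script itself as convenient sample input.
    benchmark_modify_file(__file__)
```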

Step 3: 📝 Planning

I have created a plan for writing the pull request. I am now working through my plan and coding the required changes to address this issue. Here is the planned pull request:

Add benchmarking script for modify_file function
sweep/add-benchmarking-script

Description

This PR adds a benchmarking script for the modify_file function in the diff.py module. The benchmarking script is located at sweepai/utils/benchmark_modify_file.py. It allows for testing the performance of the modify_file function by parsing a file for the necessary context and measuring the execution time.

Summary of Changes

  • Created a new Python script benchmark_modify_file.py in the sweepai/utils directory.
  • Imported the necessary modules for benchmarking and the modify_file function from diff.py.
  • Defined the benchmark_modify_file function that takes a file path as an argument.
  • Opened and read the file using the provided file path.
  • Recorded the current time before and after calling the modify_file function to measure the execution time.
  • Printed the execution time of the modify_file function.

Please review and merge this PR to enable benchmarking of the modify_file function.
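
The before/after timing described in the summary is the standard stopwatch pattern. A small helper shows the idea; `time.perf_counter()` is generally preferred over `time.time()` here because it is monotonic and higher-resolution:

```python
import time

def timed(fn, *args, **kwargs):
    """Run fn(*args, **kwargs) and return (result, seconds elapsed)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# Example with an arbitrary workload in place of modify_file:
result, elapsed = timed(sum, range(1_000_000))
print(f"sum took {elapsed:.4f}s -> {result}")
```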


Step 4: ⌨️ Coding

File | Instructions | Progress | Error logs
sweepai/utils/benchmark_modify_file.py Create sweepai/utils/benchmark_modify_file.py with contents:
• Import the necessary modules at the beginning of the file. This includes the time module for benchmarking and the modify_file function from diff.py.
• Define a function named benchmark_modify_file that takes a file path as an argument.
• Inside the benchmark_modify_file function, open and read the file using the provided file path.
• Still within the benchmark_modify_file function, record the current time before calling the modify_file function.
• Call the modify_file function with the necessary arguments, including the context obtained from the file.
• After the modify_file function call, record the current time again and calculate the difference to get the execution time.
• Print the execution time of the modify_file function.
• At the end of the file, add a conditional statement to call the benchmark_modify_file function when the script is run directly.
✅ Commit 00a2183 No errors.

Step 5: 🔁 Code Review

Here are my self-reviews of my changes at sweep/add-benchmarking-script.

Here is the 1st review

Thanks for your contribution. There are a couple of changes that need to be made:

  • In the file sweepai/utils/benchmark_modify_file.py on lines 15-17, the modify_file method is not defined in the SweepBot class. Please ensure that this method is implemented in the SweepBot class or call the correct method.
  • In the same file on lines 26-28, the argument to the benchmark_modify_file function needs to be an actual file path. Please replace "path_to_file" with the path of the file you want to benchmark.

Please make these changes and submit a new pull request. If you need any help, feel free to ask.
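
One way to address the second review point (the hard-coded "path_to_file") is to take the path from the command line instead. This is a sketch with illustrative argument names; the file-read stand-in marks where the real script would call `modify_file`:

```python
import argparse
import sys
import time

def benchmark_modify_file(file_path: str) -> float:
    # Minimal stand-in: time reading the file; the real script would
    # call modify_file from diff.py on the contents here.
    start = time.perf_counter()
    with open(file_path) as f:
        f.read()
    return time.perf_counter() - start

def main(argv=None) -> list:
    parser = argparse.ArgumentParser(description="Benchmark modify_file on a given file")
    parser.add_argument("file_path", help="path of the file to benchmark")
    parser.add_argument("--runs", type=int, default=1, help="number of timed repetitions")
    args = parser.parse_args(argv)
    timings = [benchmark_modify_file(args.file_path) for _ in range(args.runs)]
    print(f"mean over {args.runs} run(s): {sum(timings) / len(timings):.4f}s")
    return timings

if __name__ == "__main__":
    # Fall back to benchmarking this script itself when no path is supplied.
    main(sys.argv[1:] or [__file__])
```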

I finished incorporating these changes.



@sweep-nightly sweep-nightly bot linked a pull request Aug 23, 2023 that will close this issue
@wwzeng1 wwzeng1 added sweep Assigns Sweep to an issue or pull request. and removed sweep Assigns Sweep to an issue or pull request. labels Aug 24, 2023