Adding a notebook to make use of the TSSB-3M bugs dataset (#1425)
Closes #1395

The ManySStuBs4J corpus is a collection of simple fixes to Java bugs,
designed for evaluating program repair techniques. We collect all
bug-fixing changes using the SZZ heuristic, and then filter these to
obtain a data set of small bug fix changes.

It is easy to put it into dialogue form:

User: Find the bug in the following code:
{CODE}
Reply: The bugfix can be described as follows:
{COMMIT_MESSAGE}
The fixed code is:
{FIXED-CODE}
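
A minimal sketch of how the template above could be filled in, assuming each dataset row provides the buggy code, the fixed code, and optionally a commit message. The function name `build_dialogue` and the wording of the code-only variant are illustrative assumptions, not taken from the notebook.

```python
from typing import Optional

# Template from the description above, with the commit message included.
TEMPLATE_WITH_MESSAGE = (
    "User: Find the bug in the following code:\n"
    "{code}\n"
    "Reply: The bugfix can be described as follows:\n"
    "{commit_message}\n"
    "The fixed code is:\n"
    "{fixed_code}"
)

# Assumed fallback wording for rows that have no commit message.
TEMPLATE_CODE_ONLY = (
    "User: Find the bug in the following code:\n"
    "{code}\n"
    "Reply: The fixed code is:\n"
    "{fixed_code}"
)


def build_dialogue(code: str, fixed_code: str, commit_message: Optional[str] = None) -> str:
    """Render one bug-fix pair as a User/Reply dialogue (hypothetical helper)."""
    if commit_message:
        return TEMPLATE_WITH_MESSAGE.format(
            code=code, commit_message=commit_message, fixed_code=fixed_code
        )
    return TEMPLATE_CODE_ONLY.format(code=code, fixed_code=fixed_code)
```

Because the code snippets are passed as format arguments rather than embedded in the template string, braces inside the buggy or fixed code do not interfere with formatting.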

For now, I create prompts from the broken and fixed code only, without
the commit messages, which would have to be scraped from GitHub. Even so,
the resulting data should be very useful.

## Contributing

Adding a way to incorporate commit messages into the prompts would be a
great contribution. This can be done by scraping the GitHub API for the
commit messages based on the commit hash, or by downloading the
repository with the full history and extracting the commit messages from
there.
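
As a starting point for the first option, here is a hedged sketch that looks up a commit message through the GitHub REST API endpoint `GET /repos/{owner}/{repo}/commits/{sha}`. The helper names are hypothetical, and it assumes the owner, repository name, and commit hash can be derived from each dataset row. Unauthenticated requests are rate-limited, so passing a token is advisable for bulk scraping.

```python
import json
import urllib.request
from typing import Optional

API_ROOT = "https://api.github.com"


def commit_api_url(owner: str, repo: str, sha: str) -> str:
    """Build the REST API URL for a single commit."""
    return f"{API_ROOT}/repos/{owner}/{repo}/commits/{sha}"


def message_from_payload(payload: dict) -> str:
    """Extract the commit message from the API's JSON response."""
    return payload["commit"]["message"]


def fetch_commit_message(owner: str, repo: str, sha: str, token: Optional[str] = None) -> str:
    """Fetch one commit message; performs a network request."""
    request = urllib.request.Request(
        commit_api_url(owner, repo, sha),
        headers={"Accept": "application/vnd.github+json"},
    )
    if token:
        # Optional personal access token to raise the rate limit.
        request.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(request, timeout=30) as response:
        return message_from_payload(json.load(response))
```

For the second option, running `git log -1 --format=%B <hash>` inside a full clone prints the message of a given commit without any API calls.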

Co-authored-by: Oliver Stanley <olivergestanley@gmail.com>
RiccardoRiglietti and olliestanley committed Feb 21, 2023
1 parent 19c9ca7 commit ac41901
Showing 2 changed files with 920 additions and 0 deletions.
