Merge bug mining into master #135

Merged
merged 277 commits into from Jan 18, 2019

5 participants
@tomecho
Collaborator

tomecho commented Dec 22, 2017

fixes #120, fixes #123, fixes #138

Major Changes Introduced

  • All scripts that used to live only in the bug-mining branch are now ported to master.
  • Bug-mining now uses a dedicated working directory and does not touch the core framework before a project and all its bugs are ready to be promoted to the main database.
  • The create-project.pl script now takes care of the entire configuration and initialization -- no more manual steps required.
  • The merge-commit-db.pl script now issues bug ids (bids) differently: it first preserves existing bids and then issues higher bids to newer bugs. Previously, a lower bid indicated a newer bug.
  • The directories for a project's perl module and meta data can be globally configured in Defects4J to support bug mining in a dedicated working directory.
  • Project->checkout_vid now has an additional parameter is_bugmine; if set to 1, the checkout will skip the processing of meta data, which may not exist during bug mining.
  • New subroutines in each of the project modules (e.g., framework/core/Project/Lang.pm):
    • determine_layout: dynamically determine the directory layout and identify src and test directories for a checked-out revision. The results are cached in a file named dir-layout.csv.
    • initialize_revision: a bootstrapping routine, used during bug mining, that identifies and logs flaky or random tests to prepare a revision for use with the core framework.
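
The new bid policy described above can be illustrated with a minimal sketch. This is not the code of merge-commit-db.pl; the subroutine name `assign_bids`, the commit ids, and the data shapes are assumptions made only for illustration.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical helper, not taken from merge-commit-db.pl.
# $existing: hash ref mapping a fix-commit id to its already issued bid.
# $mined:    array ref of fix-commit ids, ordered oldest to newest.
# Returns a hash ref mapping every commit id to a bid.
sub assign_bids {
    my ($existing, $mined) = @_;
    my %bids = %{$existing};
    # Continue numbering after the highest bid issued so far.
    my $next = (max(values %bids) // 0) + 1;
    for my $commit (@{$mined}) {
        next if exists $bids{$commit};   # respect existing bids
        $bids{$commit} = $next++;        # newer bugs get higher bids
    }
    return \%bids;
}

my $bids = assign_bids({ 'a1' => 1, 'b2' => 2 }, ['a1', 'b2', 'c3', 'd4']);
printf "%s -> %d\n", $_, $bids->{$_} for sort keys %{$bids};
```

Because existing bids are never renumbered, references to already published bugs stay stable when more bugs are mined later.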

@tomecho tomecho requested a review from rjust Dec 22, 2017

`$cmd`; $?==0 or confess("Couldn't checkout $TAG_POST_FIX");
my $rev1 = $self->lookup("${bid}f");
my $rev2 = $self->lookup("${bid}b");
# TODO: svn doesn't support diffing of binary files
# -> checkout and tag the pre-fix revision instead
$self->{_vcs}->export_diff($rev1, $rev2, $tmp_file);
$self->{_vcs}->apply_patch($work_dir, $tmp_file);
$self->export_diff($rev1, $rev2, $tmp_file, "src/");

@rjust

rjust Feb 4, 2018

Owner

What's the hard-coded "src/" argument?

@tomecho

tomecho Feb 4, 2018

Author Collaborator

I explicitly write src here to export the diff only from the src tree of the project, meaning test/ and other directories will be excluded from any diff.

@rjust

rjust Feb 4, 2018

Owner

The issue is that some projects store the sources in a differently named directory. For example, Chart's sources live in source. Also, isn't this diff supposed to include all the changes? This code computes and applies the diff from the post-fix revision to the pre-fix revision, which should be equivalent to checking out the pre-fix revision.

@tomecho

tomecho Feb 4, 2018

Author Collaborator

Okay, I will adjust that so it exports the diff of the entire project.


=pod
$project->checkout_vid(vid [, work_dir, is_bugmine])
Checks out the provided version id (C<vid>) to F<work_dir>, and tags the the buggy AND
the fixed program version of this bug. Format of C<vid>: C<\d+[bf]>.
The working directory (C<work_dir>) is optional, the default is C<prog_root>.

@rjust

rjust Feb 4, 2018

Owner

This comment needs to be updated (work_dir should be prog_root, right?).

@tomecho

tomecho Feb 4, 2018

Author Collaborator

Yes.

@@ -95,6 +95,17 @@ project-specific build file ("project_id"/"project_id".build.xml) for the
<fail unless="ant.refid:all.manual.tests" />
</target>

<!--
Light weight sanity check for bug mining

@rjust

rjust Feb 4, 2018

Owner

Can you add one sentence about what the sanity check does or is supposed to do in general? Seems like this is checking for the required properties.

my $name = "commons-lang";
my $src = "src/main/java";
my $test = "src/test";
my $vcs = Vcs::Git->new($PID,
"$REPO_DIR/$name.git",
"$SCRIPT_DIR/projects/$PID/commit-db",
(shift // "$SCRIPT_DIR/projects/$PID/commit-db"),

@rjust

rjust Feb 4, 2018

Owner

This subroutine now has two parameters. Can you make this more explicit by assigning them early on (my (...) = @_)? The call to shift at this location may cause trouble in the future.

Note that this comment also applies to the other project modules.
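
The suggestion above can be sketched as follows. This is a stand-alone illustration, not the project module itself: the subroutine name `new_project`, the parameter names `$commit_db` and `$build_file`, and the placeholder paths are assumptions, since the excerpt only shows that the subroutine takes two parameters.

```perl
use strict;
use warnings;

# Stand-in values for illustration; the framework defines its own.
my $SCRIPT_DIR = "/path/to/framework";
my $PID        = "Lang";

# Hypothetical constructor-like subroutine. The parameter names are
# assumptions made for this sketch.
sub new_project {
    # Unpack all arguments once, at the top of the subroutine.
    my ($commit_db, $build_file) = @_;
    # Apply defaults up front instead of calling shift mid-expression.
    $commit_db  //= "$SCRIPT_DIR/projects/$PID/commit-db";
    $build_file //= "$SCRIPT_DIR/projects/$PID/$PID.build.xml";
    return { commit_db => $commit_db, build_file => $build_file };
}

my $default = new_project();
my $custom  = new_project("/tmp/bug-mining/commit-db");
print "$default->{commit_db}\n$custom->{commit_db}\n";
```

Unpacking `@_` early makes the parameter list visible at a glance and keeps the result independent of how many times `shift` happens to be called before the value is needed.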

@rjust

Owner

rjust commented Feb 4, 2018

@tomecho, overall this looks good. Can you please update the PR by pulling in the latest changes from master and by addressing the minor comments? Once all changes from master are pulled in, all bugs will be reproduced by travis, which would be a great sanity check.

@tomecho

Collaborator Author

tomecho commented Feb 4, 2018

Sounds good, @rjust. I should have that done by tomorrow evening.

jose and others added some commits Dec 7, 2018

Added module List::Util to the list of required Perl modules as the download-issues.pl script in the bug-mining framework requires two additional functions only available in recent versions: the 'all' function is available since version 1.33 and 'pairmap' function is available since 1.29.
Several command-line options were not enabled. Some are still not used, but I allow them to be set, and have enabled the query option to actually work, as that is needed for projects like GSON that do not label issues as 'bug'
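
For context on the List::Util requirement in the first commit message above, here is a small sketch of the two functions it names. It does not reproduce download-issues.pl; the label list and the query parameters are made-up examples.

```perl
use strict;
use warnings;
# Requesting version 1.33 covers both functions: pairmap (since 1.29)
# and all (since 1.33). Perl aborts at compile time if the installed
# List::Util is older.
use List::Util 1.33 qw(all pairmap);

# all returns true only if the block is true for every element.
my @labels = ("bug", "defect");
my $all_nonempty = all { length $_ } @labels;

# pairmap walks a flat key/value list two elements at a time,
# exposing each pair as $a and $b.
my @params = pairmap { "$a=$b" } (state => "closed", labels => "bug");
print join("&", @params), "\n";
```

Stating the minimum version in the `use` line makes an outdated installation fail early with a clear message instead of failing later on a missing function.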
@Greg4cr

Collaborator

Greg4cr commented Dec 13, 2018

Currently, sub _init_version in initialize-revisions.pl does not work for projects that are subprojects of a larger project, that is, projects that live in a subfolder of the default work directory. For example, gson is actually in $work_dir/gson.

This means I get an "Unsupported build system" error, even though Gson is actually a Maven project: there is no pom.xml file in the default work directory. I need to be able to point the script to $work_dir/gson instead, which does have a pom.xml.

This same issue affects the functionality in the project file too. There, I can make it work by altering the project file to account for the directory shift. Of course, I could also do this in initialize-revisions.pl, but I wonder if we could support subprojects in a cleaner manner that does not require one-off edits to core bug mining files. Any thoughts?
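
One possible shape for such support is sketched below. It is only an illustration of the idea, not code from initialize-revisions.pl: the helper name `detect_build_system`, its optional subdirectory parameter, and the build files it checks are assumptions.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper, not part of initialize-revisions.pl.
# $work_dir: root of the checked-out revision.
# $sub_dir:  optional path of the subproject relative to $work_dir
#            (e.g., "gson"); defaults to the root itself.
# Returns "maven", "ant", or undef if no known build file is found.
sub detect_build_system {
    my ($work_dir, $sub_dir) = @_;
    my $base = (defined $sub_dir && length $sub_dir)
        ? File::Spec->catdir($work_dir, $sub_dir)
        : $work_dir;
    return "maven" if -e File::Spec->catfile($base, "pom.xml");
    return "ant"   if -e File::Spec->catfile($base, "build.xml");
    return undef;
}

# Try the subproject directory first, then fall back to the root.
my $work_dir = @ARGV ? $ARGV[0] : ".";
my $system = detect_build_system($work_dir, "gson")
    // detect_build_system($work_dir);
print defined $system
    ? "Build system: $system\n"
    : "Unsupported build system\n";
```

Keeping the subdirectory as a per-project setting, rather than editing the script for each project, would avoid the one-off changes to core bug-mining files mentioned above.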

=head1 NAME
download-issues.pl -- Collect all issues from the project issue tracker.

@Greg4cr

Greg4cr Dec 20, 2018

Collaborator

Potential enhancement:
For GitHub, the issue URL identifies whether the matching number refers to an issue or a pull request. For example:
40,FasterXML/jackson-core#40
39,FasterXML/jackson-core#39
"40" represents an issue, while "39" represents a pull request. We could add an optional argument to filter this file for only issues, removing pull requests identified by the regex. This should be optional, rather than mandatory, as some pull requests also identify issues (GSON has examples of this).
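
The optional filter proposed above might look like the following sketch. It is not part of download-issues.pl. It assumes that each record has the form `<number>,<url>` and that the URLs follow GitHub's convention of `/issues/<n>` for issues and `/pull/<n>` for pull requests; the sample URLs are reconstructed under that assumption, and the name `filter_issues` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical filter. Assumes each record is "<number>,<url>" and that the
# URL ends in /issues/<n> for issues and /pull/<n> for pull requests.
sub filter_issues {
    my ($lines, $issues_only) = @_;
    return @{$lines} unless $issues_only;           # default: keep everything
    return grep { m{/issues/\d+\s*$} } @{$lines};   # drop pull requests
}

my @records = (
    "40,https://github.com/FasterXML/jackson-core/issues/40",
    "39,https://github.com/FasterXML/jackson-core/pull/39",
);
print "$_\n" for filter_issues(\@records, 1);
```

Leaving the filter off by default keeps pull requests that also describe bugs, which matters for projects such as GSON.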

@Greg4cr
Collaborator

Greg4cr left a comment

I have worked through the process with JacksonCore. I have only worked through a couple of the example bugs, but each stage of the bug mining process appears to work.

There are some compilation errors that prevent a full replication of the earlier process I followed. I'll look into those when I have more time. However, I suspect these are not a result of anything inherent to the bug mining process.

I think we can merge in the PR. After the holidays, I can give this a more extensive evaluation by working through the new projects.

jose and others added some commits Jan 18, 2019

@rjust rjust merged commit 8830c99 into rjust:master Jan 18, 2019

1 check was pending

continuous-integration/travis-ci/pr The Travis CI build is in progress