Data Cleansing Manual #2

masud-technope opened this issue May 24, 2019 · 0 comments

Phase I: Steps to be followed

  1. Select an issue report like this one.
  2. Determine whether it discusses a bug or a new feature.
  3. For example, this is a new feature, but this is a bug.
  4. Create a spreadsheet for each subject system, and mark each of its issue IDs as either new feature (NF) or bug (B).
  5. We have 8 subject systems, so please create a separate file for each system.
  6. We have 2,885 issue reports from the 8 systems. I expect this to take a few days.
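If the per-system spreadsheets are kept as CSV files, the Phase I labels can be validated and saved with a small script. This is a minimal sketch under assumptions that are not in the manual: plain CSV files stand in for spreadsheets, and the `write_phase1_labels` helper, the `<system>_labels.csv` naming scheme, and the demo system names and issue IDs are all hypothetical.

```python
import csv
import os
import tempfile

# Allowed Phase I labels: new feature (NF) or bug (B).
VALID_LABELS = {"NF", "B"}


def write_phase1_labels(labels_by_system, out_dir):
    """Write one CSV file per subject system with (IssueID, Label) rows.

    labels_by_system maps a system name to a list of (issue_id, label) pairs.
    The "<system>_labels.csv" file naming is a hypothetical convention.
    Returns the list of file paths written.
    """
    paths = []
    for system, entries in labels_by_system.items():
        path = os.path.join(out_dir, f"{system}_labels.csv")
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["IssueID", "Label"])
            for issue_id, label in entries:
                # Reject anything other than NF or B so typos are caught early.
                if label not in VALID_LABELS:
                    raise ValueError(f"{system} issue {issue_id}: unknown label {label!r}")
                writer.writerow([issue_id, label])
        paths.append(path)
    return paths


if __name__ == "__main__":
    # Hypothetical demo data; the real study covers 8 systems.
    demo = {
        "systemA": [(1001, "B"), (1002, "NF")],
        "systemB": [(2001, "B")],
    }
    with tempfile.TemporaryDirectory() as tmp:
        for p in write_phase1_labels(demo, tmp):
            print(os.path.basename(p))
```

Failing fast on an unknown label keeps a stray value (for example a lowercase "b") from silently entering the data set.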

Phase II: Steps to be followed

  1. Check the changed files for each bug report or feature request. For example, this is the changeset for this bug report.

  2. Count the number of changed Java files in each changeset.

  3. If only one Java file was changed, mark it as a valid changeset.

  4. Otherwise, if the count <= 5, take a close look at the files. Are they really related to the bug report/feature request? If yes, mark it as a valid changeset. You should spend at most 3 minutes on this.

  5. If the count > 5, take a closer look at the changeset. Are the files really related to the bug report/feature request? In my experience, such changesets may include changed files that are unrelated to the bug fix or the feature implementation. If you find that a changeset contains one or more files unrelated to either the bug fix or the feature implementation, mark it as a bloated changeset. For example, this is definitely a bloated changeset for the issue report #263537. You should spend at most 6 minutes on this. If you cannot decide within 6 minutes, just mark it as a bloated changeset.

  6. The output format for each issue report entry is:
    BugID, #ChangedFiles, Valid/Bloated
    Please create a separate file for each subject system.

  7. You can use short codes: VC for Valid Changeset and BC for Bloated Changeset.
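The decision rules in steps 2 to 5 can be sketched in code. This is an illustrative Python sketch, not part of the original workflow: the helper names (`count_java_files`, `triage`, `final_label`) are hypothetical, Java files are identified only by their `.java` extension, and two cases the manual leaves open are handled by labeled assumptions. A count of 0 falls under step 4 by a literal reading of `count <= 5`, and a "no" answer in step 4 is mapped to BC by applying the bloated-changeset definition from step 5. An undecided small changeset returns `None`, because the manual gives no rule for it.

```python
from dataclasses import dataclass
from typing import Optional

# Review time budgets in minutes, taken from Phase II steps 4 and 5.
SMALL_REVIEW_MINUTES = 3   # at most 5 changed Java files
LARGE_REVIEW_MINUTES = 6   # more than 5 changed Java files


@dataclass
class Triage:
    java_files: int
    auto_label: Optional[str]  # "VC" when no manual review is needed
    review_minutes: int        # time budget for manual review (0 if none)


def count_java_files(changed_paths):
    """Step 2: count changed files with a .java extension."""
    return sum(1 for path in changed_paths if path.endswith(".java"))


def triage(changed_paths):
    """Steps 3-5: decide whether a changeset needs manual review, and for how long."""
    n = count_java_files(changed_paths)
    if n == 1:
        return Triage(n, "VC", 0)  # step 3: a single Java file is valid
    if n <= 5:
        # Step 4 (assumption: a count of 0 also lands here by the literal rule).
        return Triage(n, None, SMALL_REVIEW_MINUTES)
    return Triage(n, None, LARGE_REVIEW_MINUTES)  # step 5


def final_label(t, all_related):
    """Combine a triage result with the reviewer's verdict.

    all_related: True if every file relates to the issue, False if at least
    one does not, None if the reviewer could not decide within the budget.
    Returns "VC", "BC", or None when the manual gives no rule.
    """
    if t.auto_label is not None:
        return t.auto_label
    if all_related is True:
        return "VC"
    if all_related is False:
        # Assumption for step 4: any unrelated file makes it bloated (step 5 definition).
        return "BC"
    # Undecided: step 5 says to mark large changesets BC; step 4 has no rule.
    return "BC" if t.java_files > 5 else None


if __name__ == "__main__":
    changed = ["src/Foo.java", "README.md"]  # hypothetical changeset
    t = triage(changed)
    print(t, final_label(t, None))
```

Keeping the time budget on the `Triage` record makes it easy to show the reviewer how long to spend before falling back to the default.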
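Steps 6 and 7 describe one output file per subject system, with one `BugID, #ChangedFiles, Valid/Bloated` entry per issue using the VC/BC codes. The following is a minimal sketch that builds that text as CSV. It assumes comma-separated files with a header row; the header wording, the `phase2_csv_by_system` name, and the demo data are hypothetical.

```python
import csv
import io

# Short codes from step 7: Valid Changeset / Bloated Changeset.
ALLOWED_CODES = {"VC", "BC"}


def phase2_csv_by_system(results_by_system):
    """Build one CSV text per subject system (step 6: separate files per system).

    results_by_system maps a system name to a list of
    (bug_id, changed_file_count, code) tuples, where code is "VC" or "BC".
    Returns a dict mapping each system name to its CSV text.
    """
    outputs = {}
    for system, rows in results_by_system.items():
        buf = io.StringIO()
        writer = csv.writer(buf, lineterminator="\n")
        writer.writerow(["BugID", "#ChangedFiles", "Valid/Bloated"])
        for bug_id, n_files, code in rows:
            # Only the two codes from step 7 are accepted.
            if code not in ALLOWED_CODES:
                raise ValueError(f"{system} bug {bug_id}: expected VC or BC, got {code!r}")
            writer.writerow([bug_id, n_files, code])
        outputs[system] = buf.getvalue()
    return outputs


if __name__ == "__main__":
    # Hypothetical results for one system.
    texts = phase2_csv_by_system({"systemA": [(101, 1, "VC"), (102, 8, "BC")]})
    print(texts["systemA"])
```

Returning strings rather than writing files keeps the formatting logic separate from where each system's file is stored.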
