This project was developed for the Measurement and Experimentation Laboratory in Software Engineering course. The content has been translated and adapted from this repository.
GitHub repositories have a space dedicated to Issues. Issues are topics submitted by users and contributors of a repository, and are used to report problems, ask questions, report vulnerabilities, and so on.
One example is the issues page of the React repository. Some issues there carry a label (for example, 'Type: Bug'), but the label often has to be entered manually by the user submitting the issue. Because issues are frequently unlabeled or mislabeled, many of the bugs reported by users and contributors go unnoticed by repository maintainers.
The aim of this project is to create a mechanism that identifies whether an issue reports a bug or not, so that issues can be classified automatically in the future. This way, the developers responsible for the repository will be able to filter reported bugs more effectively.
To carry out the project, we will use a pre-processed sample of the GitHub Bugs Prediction dataset, made available on the community platform Kaggle.
The dataset consists of three columns:
- Title - The title of the GitHub issue
- Body - The body of the GitHub issue
- Label - The category assigned to the issue (Bug, Feature, or Question)
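
As a rough illustration of how these three columns map to the bug / not-bug question, the sketch below reads issues from CSV and derives a binary target. It is not the project's actual pipeline: the inline sample rows are made up, the helper name `load_issues` and the field `is_bug` are hypothetical, and it assumes the Label column stores the text values listed above (the real file may encode them differently).

```python
import csv
import io

# Made-up sample rows standing in for the real dataset file (assumption:
# the columns are named Title, Body and Label, and Label holds text values).
SAMPLE_CSV = """Title,Body,Label
App crashes on startup,"Stack trace attached, happens after update.",Bug
Add dark mode,It would be nice to have a dark theme.,Feature
How do I configure the proxy?,I could not find this in the docs.,Question
"""


def load_issues(handle):
    """Read issues from a CSV file-like object and add a binary `is_bug` field."""
    issues = []
    for row in csv.DictReader(handle):
        issues.append({
            # Title and body are joined so a classifier can see both.
            "text": f"{row['Title']} {row['Body']}".strip(),
            "label": row["Label"],
            # Feature and Question both collapse into the "not a bug" class.
            "is_bug": row["Label"].strip().lower() == "bug",
        })
    return issues


issues = load_issues(io.StringIO(SAMPLE_CSV))
for issue in issues:
    print(issue["is_bug"], issue["label"], issue["text"][:40])
```

To try this on the real data, an open file handle for the downloaded CSV could be passed to `load_issues` instead of the in-memory sample, provided the column names and label values match the assumptions above.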