Automated detection of Observed Behaviour (OB), Expected Behaviour (EB) and Steps to Reproduce (S2R) in issue report descriptions

ethemutku/issueReportQuality

Improving the Quality of Software Issue Report Descriptions in Turkish: An Industrial Case Study

We labeled 1,041 software issue reports at Softtech Inc. according to whether they contain observed behaviour (OB), expected behaviour (EB) and steps to reproduce (S2R), and marked the sentences that indicate each. The reports are written in Turkish, an agglutinative language in which whole sentences can be formed by adding suffixes to roots, especially verbs. We therefore use morphological analysis, performed with the Zemberek tool [1], to extract features. We also extract the patterns that indicate OB, EB and S2R in these sentences. For security reasons, we cannot publish the issue reports used in the studies. However, the scripts we used in the experiments are available in this repository.

Getting Started

The repository includes the following Jupyter notebooks, written in Python 3.9:

  • single_project_qual_eval.ipynb
  • cross_project_qual_eval.ipynb

Single-project evaluation uses the reports of a single project to predict on issue reports of the same project, while cross-project evaluation holds out 1 of the n projects in the dataset for testing and trains on the remaining projects.
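
The cross-project setup can be sketched as a leave-one-project-out split. This is a minimal illustration only, not the code from the notebooks: the function name `leave_one_project_out`, the `(project, text, label)` tuple layout, and the toy project names are all assumptions.

```python
from collections import defaultdict


def leave_one_project_out(reports):
    """Yield (held_out_project, train, test) once per project.

    `reports` is a list of (project, text, label) tuples. This layout is
    a hypothetical one chosen for illustration.
    """
    by_project = defaultdict(list)
    for project, text, label in reports:
        by_project[project].append((text, label))

    for held_out in sorted(by_project):
        # The held-out project is the test set.
        test = by_project[held_out]
        # Every other project contributes to the training set.
        train = [
            item
            for project, items in by_project.items()
            if project != held_out
            for item in items
        ]
        yield held_out, train, test


# Toy data with placeholder project names, sentences and labels.
reports = [
    ("A", "sentence 1", "OB"),
    ("A", "sentence 2", "EB"),
    ("B", "sentence 3", "S2R"),
    ("C", "sentence 4", "OB"),
]
splits = list(leave_one_project_out(reports))
```

Each element of `splits` holds one project's reports for testing and all other projects' reports for training, so a dataset with n projects yields n train/test pairs.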

The following notebooks use a BERT model (specifically DistilBERT [2]) for evaluation, while the notebooks above use the Logistic Regression, Linear SVC and Random Forest algorithms.

  • single_project_qual_eval_bert.ipynb
  • cross_project_qual_eval_bert.ipynb

Before running any notebook, download the issue reports and save them as a CSV file. If you have a stop-word list, assign it to the variable "stop_word_list".
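
As a rough sketch of this preparation step, the snippet below reads a CSV export and filters stop words. Only the variable name `stop_word_list` comes from this README. The column names `description` and `label`, the helper functions, and the sample row are hypothetical, and the notebooks may load the data differently.

```python
import csv
import io

# Stand-in for the exported CSV file. The column names "description" and
# "label" are hypothetical; use the columns of your own export.
csv_text = "description,label\nEkran donuyor ve hata veriyor,OB\n"

# Hypothetical stop-word list; replace it with your own.
stop_word_list = ["ve"]


def load_reports(handle):
    """Read all rows of a CSV file object into a list of dicts."""
    return list(csv.DictReader(handle))


def remove_stop_words(text, stop_words):
    """Drop whitespace-separated tokens that appear in `stop_words`."""
    stop = {word.lower() for word in stop_words}
    return " ".join(tok for tok in text.split() if tok.lower() not in stop)


rows = load_reports(io.StringIO(csv_text))
cleaned = [remove_stop_words(row["description"], stop_word_list) for row in rows]
```

For a real file, replace `io.StringIO(csv_text)` with `open(path, newline="", encoding="utf-8")`, where `path` points to your CSV export.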

[1] Akın A A, Akın M D (2007) Zemberek, an open source NLP framework for Turkic languages. Structure, 10(2007), 1-5.
[2] Sanh V, Debut L, Chaumond J, Wolf T (2019) DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108.
