This artifact contains the dataset used in the paper Understanding the Role of External Pull Requests in the NPM Ecosystem.
We provide three script files:
- preliminary_A1_2.ipynb - for the preliminary A1 and A2
- preliminary_A3.ipynb - for the preliminary A3
- rq1.ipynb - for RQ1
Note that no script is provided for RQ2: three authors worked in separate spreadsheets, and we then counted the results.
The appendix contains our quantitative and qualitative datasets, shown in Figure 1 of the paper. The datasets are provided via this repository, except for the large datasets* (file size over 100 MB), which are provided via Zenodo.
There are eleven data files, described below:
1. filtered_dataset_with_label.csv* - A filtered dataset containing 945,291 PRs.
2. external_PR_dataset.csv* - List of the external PRs used in the preliminary study, RQ1, and RQ2.
3. internal_PR_dataset.csv* - List of the internal PRs used in the preliminary study, RQ1, and RQ2.
4. bot_PR_dataset.csv* - List of the bot PRs used in the preliminary study, RQ1, and RQ2.
5. attention_label_ranking.json - The attention labels of each PR type, with counts, used in RQ1.
6. ranking_label_bot.csv - The labels appearing in the bot PRs, already ranked; used in RQ1.
7. ranking_label_external.csv - The labels appearing in the external PRs, already ranked; used in RQ1.
8. ranking_label_internal.csv - The labels appearing in the internal PRs, already ranked; used in RQ1.
9. sampled_dataset_external_PR.csv - Results of the manual classification of the external PRs used in RQ2.
10. sampled_dataset_inside_PR.csv - Results of the manual classification of the internal PRs used in RQ2.
11. sampled_dataset_bot_PR.csv - Results of the manual classification of the bot PRs used in RQ2.
Note that files 1-4 (marked *) are the large dataset files provided via Zenodo.
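Because the large files must be downloaded separately from Zenodo, it can help to check that a local copy of the artifact is complete before running the notebooks. The following is a minimal sketch that is not part of the original artifact. It assumes that all eleven files have been placed in one local directory, that each CSV file is UTF-8 encoded with a single header row, and that `check_artifact` is a hypothetical helper name. It uses only the standard-library `csv` module, so it needs no third-party packages, and it counts CSV records rather than raw lines, so PR text that contains newlines is not over-counted.

```python
import csv
from pathlib import Path

# File names as listed in this README; the first four are the large files on Zenodo.
LARGE_FILES = [
    "filtered_dataset_with_label.csv",
    "external_PR_dataset.csv",
    "internal_PR_dataset.csv",
    "bot_PR_dataset.csv",
]
OTHER_FILES = [
    "attention_label_ranking.json",
    "ranking_label_bot.csv",
    "ranking_label_external.csv",
    "ranking_label_internal.csv",
    "sampled_dataset_external_PR.csv",
    "sampled_dataset_inside_PR.csv",
    "sampled_dataset_bot_PR.csv",
]

# PR titles/bodies may exceed the csv module's default field size limit (131072 chars);
# 2**31 - 1 is large but still fits a 32-bit C long on every platform.
csv.field_size_limit(2**31 - 1)


def check_artifact(data_dir):
    """Return (missing, row_counts) for the artifact files found in data_dir.

    Hypothetical helper. Assumes each .csv file has exactly one header row;
    the returned counts exclude it. The .json file is only checked for presence.
    """
    data_dir = Path(data_dir)
    missing = []
    row_counts = {}
    for name in LARGE_FILES + OTHER_FILES:
        path = data_dir / name
        if not path.exists():
            missing.append(name)
            continue
        if path.suffix == ".csv":
            with path.open(newline="", encoding="utf-8") as f:
                reader = csv.reader(f)
                next(reader, None)  # skip the (assumed) header row
                # Count parsed records, so quoted multi-line fields count once.
                row_counts[name] = sum(1 for _ in reader)
    return missing, row_counts
```

For example, `missing, counts = check_artifact("data")` (where `data` is a hypothetical local directory) lists any files that still need to be downloaded. If `filtered_dataset_with_label.csv` stores one PR per row, its count should match the 945,291 PRs stated above.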