This repository contains everything needed to replicate the method described in the paper "Who Drives Company-Owned OSS Projects: Internals or Externals Members?", published at JBCS 2018.
If you want to build your own version of the dataset, run "get_data.py" [1] with Python 2.7. The script saves all output files to a folder named "Dataset"; your system may ask you to allow this. A ready-made copy of this folder is already available in this repository [2]. To include new projects in the dataset, add them at line 410 of "get_data.py" before running it.
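After the script finishes, a quick sanity check can confirm that each project folder under "Dataset" contains the six CSV files listed below. This is a hypothetical helper based only on the layout described in this README, not code from the repository:

```python
# Sanity-check a project folder produced by get_data.py (hypothetical helper;
# the expected file names are taken from the dataset description in this README).
import os

EXPECTED_FILES = {
    "casual_contributors.csv",
    "external_contributors.csv",
    "closed_pull_requests_summary.csv",
    "merged_pull_requests_summary.csv",
    "merged_pull_requests_reviews.csv",
    "pull_requests_per_month.csv",
}

def missing_files(project_dir):
    """Return the expected CSV files that are absent from a project folder."""
    present = set(os.listdir(project_dir))
    return sorted(EXPECTED_FILES - present)
```

Calling `missing_files("Dataset/<project>")` on each project folder should return an empty list when the crawl completed successfully.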
* Dataset:
  * Project:
    * casual_contributors.csv (General information about casual contributors in the project)
    * external_contributors.csv (General information about external contributors in the project)
    * closed_pull_requests_summary.csv (General information about closed pull requests)
    * merged_pull_requests_summary.csv (General information about merged pull requests)
    * merged_pull_requests_reviews.csv (General information about reviews in merged pull requests)
    * pull_requests_per_month.csv (Monthly distribution of opened, closed and merged pull requests)
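Any of these CSV files can be inspected with Python's standard `csv` module. A minimal sketch (the exact columns depend on what "get_data.py" produced, so nothing is assumed about the header):

```python
# Load a CSV file produced by get_data.py as a list of dictionaries,
# one per row, keyed by whatever column names the file's header declares.
import csv

def read_rows(path):
    """Read a CSV file and return its rows as dicts keyed by the header."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))
```

For example, `read_rows("Dataset/<project>/casual_contributors.csv")` returns the rows for that project, ready for filtering or counting.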
All the analyses made in the paper, including the images, can be reproduced by executing the files available in the Scripts folder [3] with the R language. During execution, a set of images is saved to a "Figures" folder; your system may ask you to allow this [4].
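One way to run all the analysis scripts in batch is sketched below. It assumes `Rscript` is on your PATH and the repository root is the working directory; no script names are assumed, every `.R` file in the folder is executed in order:

```python
# Run every R script in the Scripts folder via Rscript (sketch, not part of
# the repository). The Figures folder is created up front so the scripts
# can save their images into it.
import os
import subprocess

def run_r_scripts(scripts_dir="Scripts", figures_dir="Figures"):
    """Execute each .R file in scripts_dir with Rscript, failing fast on errors."""
    os.makedirs(figures_dir, exist_ok=True)
    for name in sorted(os.listdir(scripts_dir)):
        if name.endswith(".R"):
            subprocess.run(["Rscript", os.path.join(scripts_dir, name)],
                           check=True)
```

`check=True` stops the batch at the first failing script, which makes it easier to spot a missing R package.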
The "Crawler" folder contains the back-end scripts used to extract data from the GitHub API. Feel free to reuse the scripts in this folder in your own methodology.
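For reference, here is a minimal sketch of paginated GitHub REST API access in the spirit of such a crawler. The endpoint, parameters, and helper names are illustrative assumptions, not taken from the Crawler scripts:

```python
# Illustrative GitHub REST API access (not the repository's Crawler code):
# build the URL for one page of a repository's pull requests and fetch it.
import json
import urllib.request

API_ROOT = "https://api.github.com"

def pull_requests_url(owner, repo, state="all", page=1, per_page=100):
    """Build the URL for one page of a repository's pull requests."""
    return ("{root}/repos/{owner}/{repo}/pulls?state={state}"
            "&page={page}&per_page={per_page}").format(
                root=API_ROOT, owner=owner, repo=repo,
                state=state, page=page, per_page=per_page)

def fetch_json(url, token=None):
    """Fetch a GitHub API URL and decode the JSON response."""
    request = urllib.request.Request(url)
    if token:  # authenticated requests get a much higher rate limit
        request.add_header("Authorization", "token {}".format(token))
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Unauthenticated requests are rate-limited, so passing a personal access token is advisable for any real crawl.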
If you need any support, send us an e-mail: fronchetti at usp . br