JBCS-2018

This repository contains all the steps needed to replicate the method described in the paper "Who Drives Company-Owned OSS Projects: Internals or Externals Members?", published at JBCS 2018.

Reproducing the dataset:

If you want to create your own version of the dataset, execute the file "get_data.py" [1] using Python 2.7. After the script finishes, all files will be saved in a folder called "Dataset"; you may need to allow the script to create this folder on your system. A ready-made copy of this folder is already available in this repository [2]. Feel free to add new projects to the dataset by including them at line 410 of "get_data.py" before running the script.
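
The exact contents around line 410 depend on the script itself, but as a hypothetical sketch (the repository names below are placeholders, not the projects used in the paper), the list of projects to crawl usually looks something like this:

```python
# Hypothetical sketch only -- the real list lives around line 410 of "get_data.py".
# Each entry identifies a GitHub repository as "owner/name"; replace or extend
# the placeholder entries below with the projects you want to crawl.
projects = [
    "some-company/some-project",
    "another-company/another-repo",
]
```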

Dataset structure:

* Dataset
  * Project (one folder per project)
    * casual_contributors.csv (General information about casual contributors in the project)
    * external_contributors.csv (General information about external contributors in the project)
    * closed_pull_requests_summary.csv (Summary with general information about closed pull requests)
    * merged_pull_requests_summary.csv (Summary with general information about merged pull requests)
    * merged_pull_requests_reviews.csv (Summary with information about reviews in merged pull requests)
    * pull_requests_per_month.csv (Monthly distribution of opened, closed, and merged pull requests)
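
If you want to inspect these files programmatically, the minimal sketch below loads one of them with pandas; note that pandas is not required by the original scripts, the project folder name is a placeholder, and the exact column names depend on the generated CSVs.

```python
import os
import pandas as pd  # not required by the original scripts; used here only for inspection

# Placeholder project name: replace with any project folder inside "Dataset".
project = "example-project"
summary_path = os.path.join("Dataset", project, "merged_pull_requests_summary.csv")

# Load the summary of merged pull requests and take a quick look at it.
merged_summary = pd.read_csv(summary_path)
print(merged_summary.columns.tolist())
print(merged_summary.head())
```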

Reproducing images and statistical analysis:

All the analyses presented in the paper, including the figures, can be reproduced by executing the files available in the Scripts folder [3]. Use the R language to run them. During execution, a set of images will be saved in a "Figures" folder; you may need to allow the scripts to create this folder on your system [4].
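
If you prefer to batch-run the scripts instead of opening them one by one, the sketch below is one convenient way to do so; it assumes "Rscript" is available on your PATH and that every file in the Scripts folder is a standalone R script. Running them directly from R, as described above, works just as well.

```python
import glob
import subprocess

# Convenience sketch: run every R script found in the "Scripts" folder.
# Assumes "Rscript" is on your PATH and each script can run standalone.
for script in sorted(glob.glob("Scripts/*.R")):
    print("Running {}...".format(script))
    subprocess.call(["Rscript", script])
```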

Author Notes:

The "Crawler" folder contains the back-end scripts used to extract data from the GitHub API. Feel free to use the scripts inside this folder in your research.

If you need any support, send us an e-mail: fronchetti at usp . br