Artifacts (scripts) and data used in the SBES 2022 paper "Towards Merge Conflict Resolution by Combining Existing Lines of Code" and in the JSERD 2023 submission "How code composition strategies affect merge conflict resolution?".
These artifacts are related to the analysis of merge conflicts that were resolved by the developers using a combination of the conflicting lines. The detailed methodology is described in the paper.
To reproduce the results obtained in the study, follow these instructions:
- Clone this repository.
- Install the requirements listed in requirements.txt using pip.
Follow the execution order displayed in the table below. Skip the scripts dataCollection.py and extract_data.py if you do not have access to the original conflicts dataset collected by Ghiotto et al. (2020).
| Execution order | Script | Input | Output | Purpose |
|---|---|---|---|---|
| -1 | dataCollection.py | Conflicts database, GitHub API, github_keys, data/INITIAL_DATASET.csv | data/result.json | Script used to generate the dataset file containing the conflicting chunks information. Not required if using the provided JSON dataset. |
| 0 | extract_data.py | result.json, Conflicts database | dataset.json | Adds complementary data to the JSON dataset file. Not required if using the provided JSON dataset. |
| 1 | download_dataset_file.py | Google Drive (repository) | data/dataset.json, data/result.json, data/INITIAL_DATASET.csv | Downloads the JSON dataset file containing the conflicting chunks information. |
| 2 | partial_order.py | dataset.json | data/partial_order_result.csv | Script used to check the partial order in the conflicting chunks resolution lines. |
| 3 | select_chunk_sample.py | data/partial_order_result.csv | data/violate_partial_order_sample.csv | Script used to select a subsample of the conflicting chunks that violate the partial order to be analyzed manually. |
| 3 | v1_v2_percentage.py | data/partial_order_result.csv | data/resolution_composition.csv | Script to analyze the composition of the conflicting chunks resolution lines. |
| | duplicated_lines.py | dataset.json | data/has_duplication_result.csv | (DEPRECATED) Script used to analyze if duplicated lines from the chunk are used in the resolution. |
| | inspect_util.py | data/dataset.json | | Script used to support the manual analysis of conflicting chunks. |
| | debug_chunk.py | data/dataset.json | | Script for manually debugging other scripts. |
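
The numbered steps in the table above can be chained with a small driver. The following is only a sketch: it assumes the scripts sit in the repository root and take no command-line arguments, neither of which is stated in this README. The names `PIPELINE`, `build_commands`, and `run_pipeline` are hypothetical helpers, not part of the repository.

```python
import subprocess
import sys
from pathlib import Path

# Scripts for steps 1 to 3, in the order given in the table above.
# Steps -1 and 0 are left out because they need the original conflicts database.
PIPELINE = [
    "download_dataset_file.py",
    "partial_order.py",
    "select_chunk_sample.py",
    "v1_v2_percentage.py",
]


def build_commands(scripts=PIPELINE, python=sys.executable):
    """Return one argv list per script, in execution order."""
    return [[python, script] for script in scripts]


def run_pipeline(repo_dir=".", scripts=PIPELINE):
    """Run each script from repo_dir, stopping at the first failure."""
    repo_dir = Path(repo_dir)
    for command in build_commands(scripts):
        script = command[1]
        # Assumption: scripts live directly in the repository root.
        if not (repo_dir / script).is_file():
            raise FileNotFoundError(f"{script} not found in {repo_dir.resolve()}")
        print(f"Running {script} ...")
        # check=True raises CalledProcessError if the script exits with an error.
        subprocess.run(command, cwd=repo_dir, check=True)
```

Calling `run_pipeline()` from the cloned repository would then execute the four scripts one after another. Stopping at the first failure is deliberate, since each later script reads a file produced by an earlier one.
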
After executing the scripts in the order above, use each of the following notebooks to obtain the data used to answer the research questions.
| Execution order | Notebook | Input | Output | Purpose |
|---|---|---|---|---|
| | inspect_chunk.ipynb | | | Notebook used to inspect chunks manually. |
| 1 | find_malformed_chunks.ipynb | data/dataset.json | data/malformed_chunks.csv | Notebook to filter malformed chunks in the dataset. |
| 2 | analyze_distributions.ipynb | data/all_chunks_ghiotto.csv, data/chunks_info.csv, data/partial_order_result.csv, data/resolution_composition.csv, data/projects_intersection.csv | | Notebook to perform analysis about the distribution and characteristics of the conflicts (RQ1). |
| 3 | partial_order_analysis.ipynb | data/partial_order_result.csv, data/malformed_chunks.csv | | Notebook to perform analysis using the collected partial order data (RQ2). |
| 4 | violation_inspection_analysis.ipynb | data/violate_partial_order_inspection.csv (generated after manual analysis of the chunk sample) | | Notebook for analyzing cases where the partial order is violated (RQ2). |
| 5 | resolution_composition_analysis.ipynb | data/resolution_composition.csv | | Notebook for analyzing the composition of conflicting chunks resolution lines (RQ3). |
| 6 | conflict_kind_analysis.ipynb | | | Notebook for analyzing the language constructs present in the conflicting chunks (RQ4). |
| 7 | rq5_analysis.ipynb | | | Notebook for analyzing the language constructs present in the conflicting chunks resolutions (RQ5). |
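
Before opening the notebooks, it can help to confirm that the input files listed in the table above are in place. The sketch below copies the input lists from that table. The `NOTEBOOK_INPUTS` mapping and the `missing_inputs` helper are hypothetical and not part of the repository, and notebooks with no listed input are left out.

```python
from pathlib import Path

# Input files per notebook, copied from the table above.
NOTEBOOK_INPUTS = {
    "find_malformed_chunks.ipynb": ["data/dataset.json"],
    "analyze_distributions.ipynb": [
        "data/all_chunks_ghiotto.csv",
        "data/chunks_info.csv",
        "data/partial_order_result.csv",
        "data/resolution_composition.csv",
        "data/projects_intersection.csv",
    ],
    "partial_order_analysis.ipynb": [
        "data/partial_order_result.csv",
        "data/malformed_chunks.csv",
    ],
    "violation_inspection_analysis.ipynb": [
        "data/violate_partial_order_inspection.csv",
    ],
    "resolution_composition_analysis.ipynb": ["data/resolution_composition.csv"],
}


def missing_inputs(repo_dir="."):
    """Map each notebook to the listed input files that are not on disk."""
    repo_dir = Path(repo_dir)
    report = {}
    for notebook, inputs in NOTEBOOK_INPUTS.items():
        missing = [path for path in inputs if not (repo_dir / path).is_file()]
        if missing:
            report[notebook] = missing
    return report
```

An empty dictionary from `missing_inputs()` means every listed input was found. Note that data/violate_partial_order_inspection.csv is produced by manual analysis, so it is expected to be reported as missing until that step is done.
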
References:
Ghiotto, G., Murta, L., Barros, M., & van der Hoek, A. (2020). On the Nature of Merge Conflicts: A Study of 2,731 Open Source Java Projects Hosted by GitHub. IEEE Transactions on Software Engineering, 46(8), 892–915. https://doi.org/10.1109/TSE.2018.2871083
Campos Junior, H. D. S., de Menezes, G. G. L., Barros, M. D. O., van der Hoek, A., & Murta, L. G. P. (2022, October). Towards Merge Conflict Resolution by Combining Existing Lines of Code. In Proceedings of the XXXVI Brazilian Symposium on Software Engineering (pp. 425-434). https://doi.org/10.1145/3555228.3555229