gems-uff/combination_conflicts_analysis

Artifacts (scripts) and data used in the SBES 2022 paper "Towards Merge Conflict Resolution by Combining Existing Lines of Code" and in the JSERD 2023 submission "How code composition strategies affect merge conflict resolution?"

These artifacts support the analysis of merge conflicts that developers resolved by combining the conflicting lines. The detailed methodology is described in the paper.
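To make the idea of a resolution built from conflicting lines concrete, here is a minimal sketch. It is not the repository's implementation. It assumes a conflicting chunk is available as three lists of lines (called `v1`, `v2`, and `resolution` here, names chosen only for illustration) and that lines are compared after whitespace normalization, which is an assumption of this sketch rather than something the README states.

```python
from collections import Counter


def normalize(line: str) -> str:
    """Collapse runs of whitespace so formatting-only differences are ignored (assumption)."""
    return " ".join(line.split())


def classify_resolution(v1, v2, resolution):
    """Count how many non-blank resolution lines come from v1, v2, both sides, or neither."""
    v1_lines = {normalize(line) for line in v1}
    v2_lines = {normalize(line) for line in v2}
    counts = Counter()
    for line in resolution:
        key = normalize(line)
        if not key:
            continue  # blank lines carry no information about origin
        in_v1, in_v2 = key in v1_lines, key in v2_lines
        if in_v1 and in_v2:
            counts["both"] += 1
        elif in_v1:
            counts["v1"] += 1
        elif in_v2:
            counts["v2"] += 1
        else:
            counts["new"] += 1
    return counts


def is_combination(v1, v2, resolution):
    """True when every non-blank resolution line already exists in v1 or v2."""
    counts = classify_resolution(v1, v2, resolution)
    return sum(counts.values()) > 0 and counts["new"] == 0


# Hypothetical chunk, for illustration only.
v1 = ["int a = 1;", "foo();"]
v2 = ["int a = 2;", "bar();"]
resolution = ["int a = 2;", "foo();"]
print(is_combination(v1, v2, resolution))
```

A resolution that introduces a line found on neither side (for example `baz();`) would be classified as containing new code and would not count as a combination under this sketch.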

To reproduce the results obtained in the study, follow these instructions:

  1. Clone this repository.
  2. Install the requirements listed in requirements.txt using pip (pip install -r requirements.txt).

Run the scripts in the execution order shown in the table below. If you do not have access to the original conflicts dataset collected by Ghiotto et al. (2020), skip dataCollection.py and extract_data.py.

Scripts:

| Execution order | Script | Input | Output | Purpose |
|---|---|---|---|---|
| -1 | dataCollection.py | Conflicts database, Github API, github_keys, data/INITIAL_DATASET.csv | data/result.json | Script used to generate the dataset file containing the conflicting chunks information. Not required if using the provided JSON dataset. |
| 0 | extract_data.py | result.json, Conflicts database | dataset.json | Adds complementary data to the JSON dataset file. Not required if using the provided JSON dataset. |
| 1 | download_dataset_file.py | Google Drive (repository) | data/dataset.json, data/result.json, data/INITIAL_DATASET.csv | Downloads the JSON dataset file containing the conflicting chunks information. |
| 2 | partial_order.py | dataset.json | data/partial_order_result.csv | Script used to check the partial order in the conflicting chunks resolution lines. |
| 3 | select_chunk_sample.py | data/partial_order_result.csv | data/violate_partial_order_sample.csv | Script used to select a subsample of the conflicting chunks that violate the partial order to be analyzed manually. |
| 3 | v1_v2_percentage.py | data/partial_order_result.csv | data/resolution_composition.csv | Script to analyze the composition of the conflicting chunks resolution lines. |
| | duplicated_lines.py | dataset.json | data/has_duplication_result.csv | (DEPRECATED) Script used to analyze if duplicated lines from the chunk are used in the resolution. |
| | inspect_util.py | data/dataset.json | | Script used to support the manual analysis of conflicting chunks. |
| | debug_chunk.py | data/dataset.json | | Script for manually debugging other scripts. |
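The execution order in the table can be scripted. The following is a rough sketch, not part of the repository. It assumes each script is run from the repository root without command-line arguments, which the README does not confirm. The helper names `scripts_to_run` and `run_pipeline` are hypothetical.

```python
import subprocess
import sys
from pathlib import Path

# (execution order, script) pairs copied from the table above.
PIPELINE = [
    (-1, "dataCollection.py"),
    (0, "extract_data.py"),
    (1, "download_dataset_file.py"),
    (2, "partial_order.py"),
    (3, "select_chunk_sample.py"),
    (3, "v1_v2_percentage.py"),
]


def scripts_to_run(min_order=1):
    """Return the scripts whose execution order is at least min_order.

    The default of 1 skips dataCollection.py and extract_data.py, which
    need the original conflicts database.
    """
    return [name for order, name in PIPELINE if order >= min_order]


def run_pipeline(repo_root, min_order=1):
    """Run each selected script in order, stopping at the first failure.

    Assumption: the scripts take no arguments and expect the repository
    root as the working directory.
    """
    for script in scripts_to_run(min_order):
        print(f"Running {script}")
        subprocess.run([sys.executable, script], cwd=Path(repo_root), check=True)


# Example (not executed here): run_pipeline(".")
print(scripts_to_run())
```

Passing `min_order=-1` would include the two data collection scripts for readers who do have access to the original conflicts database.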

After executing the scripts in the order above, use each of the following notebooks to obtain the data used to answer the research questions.

Notebooks:

| Execution order | Notebook | Input | Output | Purpose |
|---|---|---|---|---|
| | inspect_chunk.ipynb | | | Notebook used to inspect chunks manually. |
| 1 | find_malformed_chunks.ipynb | data/dataset.json | data/malformed_chunks.csv | Notebook to filter malformed chunks in the dataset. |
| 2 | analyze_distributions.ipynb | data/all_chunks_ghiotto.csv, data/chunks_info.csv, data/partial_order_result.csv, data/resolution_composition.csv, data/projects_intersection.csv | | Notebook to perform analysis about the distribution and characteristics of the conflicts (RQ1). |
| 3 | partial_order_analysis.ipynb | data/partial_order_result.csv, data/malformed_chunks.csv | | Notebook to perform analysis using the collected partial order data (RQ2). |
| 4 | violation_inspection_analysis.ipynb | data/violate_partial_order_inspection.csv (generated after manual analysis of chunks' sample) | | Notebook for analyzing cases where the partial order is violated (RQ2). |
| 5 | resolution_composition_analysis.ipynb | data/resolution_composition.csv | | Notebook for analyzing the composition of conflicting chunks resolution lines (RQ3). |
| 6 | conflict_kind_analysis.ipynb | | | Notebook for analyzing the language constructs present in the conflicting chunks (RQ4). |
| 7 | rq5_analysis.ipynb | | | Notebook for analyzing the language constructs present in the conflicting chunks resolutions (RQ5). |
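Before opening a notebook, it can help to confirm that the input files listed in the table exist. The sketch below is an illustration, not part of the repository. The file lists are copied from the table, the helper names are hypothetical, and the `jupyter nbconvert` command assumes Jupyter with nbconvert is installed, which the README does not state.

```python
from pathlib import Path

# Notebook -> input files, in execution order, copied from the table above.
# Notebooks with no listed inputs map to an empty list.
NOTEBOOK_INPUTS = {
    "find_malformed_chunks.ipynb": ["data/dataset.json"],
    "analyze_distributions.ipynb": [
        "data/all_chunks_ghiotto.csv",
        "data/chunks_info.csv",
        "data/partial_order_result.csv",
        "data/resolution_composition.csv",
        "data/projects_intersection.csv",
    ],
    "partial_order_analysis.ipynb": [
        "data/partial_order_result.csv",
        "data/malformed_chunks.csv",
    ],
    "violation_inspection_analysis.ipynb": [
        "data/violate_partial_order_inspection.csv",
    ],
    "resolution_composition_analysis.ipynb": ["data/resolution_composition.csv"],
    "conflict_kind_analysis.ipynb": [],
    "rq5_analysis.ipynb": [],
}


def missing_inputs(repo_root):
    """Map each notebook to the listed input files that are absent under repo_root."""
    root = Path(repo_root)
    return {
        notebook: [path for path in inputs if not (root / path).is_file()]
        for notebook, inputs in NOTEBOOK_INPUTS.items()
    }


def nbconvert_command(notebook):
    """Build a command that executes a notebook in place without opening a browser."""
    return ["jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace", notebook]


for notebook, missing in missing_inputs(".").items():
    status = "ready" if not missing else "missing: " + ", ".join(missing)
    print(f"{notebook}: {status}")
```

Note that data/violate_partial_order_inspection.csv comes from the manual analysis of the chunk sample, so it will be reported as missing until that step is done.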

References:

Ghiotto, G., Murta, L., Barros, M., & van der Hoek, A. (2020). On the Nature of Merge Conflicts: A Study of 2,731 Open Source Java Projects Hosted by GitHub. IEEE Transactions on Software Engineering, 46(8), 892–915. https://doi.org/10.1109/TSE.2018.2871083

Campos Junior, H. D. S., de Menezes, G. G. L., Barros, M. D. O., van der Hoek, A., & Murta, L. G. P. (2022, October). Towards Merge Conflict Resolution by Combining Existing Lines of Code. In Proceedings of the XXXVI Brazilian Symposium on Software Engineering (pp. 425-434). https://doi.org/10.1145/3555228.3555229
