Team

This is a reproduction project, part of the MSR course 2021/22 at UniKo (CS department, SoftLang Team).

This repository is forked from https://github.com/gorjatschev/applying-apis for the MSR 2021/22 coursework.

Team

Team: Mike

Prayuth Patumcharoenpol
Jorge Gavilan

Baseline study

Aspect of the reproduction project

Reproduction of the visualization stage of the empirical study, in which treemaps are generated to represent the applied API categorization and make the results easier to understand.

Input data

The input is a CSV file, pre-processed by the previous stage (data analysis), that lists the APIs and their corresponding API categories in the following columns:

filePath, packageName, className, methodName, line, column, javaParserTypeOfElement, usedClassOfElement, isAPIClass, api, mcrCategories, mcrTags

The original CSV file can be found in the project's repository; it contains the same data as the file used for this replication.
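As an illustration of how this input can be consumed, the following minimal sketch reads a CSV with the columns listed above, using only the Python standard library, and collects the distinct APIs per MCR category. It is not the project's code: the function name `apis_per_category` is hypothetical, and it assumes that one `mcrCategories` cell may hold several categories joined by a separator (`|` here), which should be checked against the actual file.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Set, TextIO

REQUIRED = {"api", "mcrCategories"}  # subset of the columns listed above


def apis_per_category(stream: TextIO, sep: str = "|") -> Dict[str, Set[str]]:
    """Map each MCR category to the set of distinct APIs used under it.

    Assumption: one mcrCategories cell may contain several categories
    joined by `sep`; verify the real encoding before relying on this.
    """
    reader = csv.DictReader(stream)
    missing = REQUIRED - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    result: Dict[str, Set[str]] = defaultdict(set)
    for row in reader:
        api = (row["api"] or "").strip()
        for category in (row["mcrCategories"] or "").split(sep):
            category = category.strip()
            if api and category:  # skip rows without an API or a category
                result[category].add(api)
    return dict(result)


if __name__ == "__main__":
    # Toy input with made-up values, only to demonstrate the call.
    sample = (
        "api,mcrCategories\n"
        "org.example:lib-a,Testing|Logging\n"
        "org.example:lib-b,Testing\n"
    )
    print(apis_per_category(io.StringIO(sample)))
```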

Output data

Treemaps with the hierarchical structure and size of abstractions
Colorized APIs and API categories
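The treemaps are produced with plotly, where a hierarchy is typically passed to `plotly.graph_objects.Treemap` as parallel `ids`, `labels`, `parents` and `values` lists. The sketch below is an assumption about how such lists could be derived, not the project's implementation: it aggregates paths such as (packageName, className, methodName) so that each node's size is the number of rows beneath it. The function name `treemap_lists` is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def treemap_lists(paths: Iterable[Tuple[str, ...]]) -> Dict[str, List]:
    """Aggregate hierarchical paths, e.g. (packageName, className, methodName),
    into parallel ids/labels/parents/values lists for a plotly treemap.

    Every path adds 1 to its leaf and to each ancestor, so a node's value is
    never smaller than the sum of its children (compatible with
    branchvalues="total").
    """
    counts: Counter = Counter()
    for path in paths:
        for depth in range(1, len(path) + 1):
            counts[tuple(path[:depth])] += 1

    ids: List[str] = []
    labels: List[str] = []
    parents: List[str] = []
    values: List[int] = []
    # Sorting the tuples lists each parent before its children.
    for node in sorted(counts):
        ids.append("/".join(node))           # assumes names contain no "/"
        labels.append(node[-1])
        parents.append("/".join(node[:-1]))  # "" means top level
        values.append(counts[node])
    return {"ids": ids, "labels": labels, "parents": parents, "values": values}


if __name__ == "__main__":
    rows = [("p", "A", "m1"), ("p", "A", "m2"), ("q", "C", "m")]  # made-up paths
    print(treemap_lists(rows))
```

Assuming plotly (and kaleido for static export) is installed, the result could be rendered with `go.Figure(go.Treemap(branchvalues="total", **treemap_lists(paths)))` and saved with `fig.write_html(...)` or `fig.write_image(...)`. Coloring by API or API category would need an additional color list passed via `marker=dict(colors=...)`, which is not shown here.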

Findings of replication

Process delta

We believe that the process for creating the visualizations is identical to the one in the thesis, since we relied on the same external library (plotly) to generate them. In addition, the data we used is identical to the data in the thesis, so the process should not differ.

Output delta

Since the data used to generate the visualizations was the same as in the original work [Gorjatschev21], the results did not show major differences: the output data is identical and differs only in the figure structure, which we do not consider relevant. We expect to observe a more significant delta when processing the data generated by other teams, which we will then compare with the results obtained in this stage.

Implementation of replication

This replication uses the code from the [Gorjatschev21] repository as a baseline. We removed the files used for parts of the study other than visualization and restructured the project. We then installed all software requirements and adjusted the application configuration accordingly.

To generate the visualizations:

python process/repositories_visualizer.py

Hardware requirements

Operating system: Linux (Ubuntu 16.04 or higher recommended), macOS, or Windows 7 to 10.
Memory: at least 4 GB RAM (8 GB preferable)

Software requirements

Java 11
Python 3.9.6 (plotly>=5.1.0, pyspark>=3.1.2)
kaleido (Python module for static image export)
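Before running the visualizer, the Python side of these requirements could be checked with a short script such as the following minimal sketch. It uses `importlib.metadata` from the standard library; the minimum versions are copied from the list above, the interpreter check assumes any Python 3.9 or newer is acceptable, and Java 11 is not checked. The helper names are hypothetical, and the version parsing is deliberately simplified.

```python
import re
import sys
from importlib import metadata
from typing import Dict, Tuple

# Minimums copied from the requirements list; kaleido has no stated minimum.
MINIMUMS: Dict[str, str] = {"plotly": "5.1.0", "pyspark": "3.1.2", "kaleido": "0"}


def version_tuple(version: str) -> Tuple[int, ...]:
    """Simplified parsing: '5.1' -> (5, 1, 0); suffixes like 'rc1' are ignored."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    parts = [int(p) for p in match.group(0).split(".")] if match else []
    return tuple(parts + [0] * (3 - len(parts)))  # pad so 5.1 == 5.1.0


def check_requirements(minimums: Dict[str, str] = MINIMUMS) -> Dict[str, str]:
    """Return {package: problem} for packages that are missing or too old."""
    problems: Dict[str, str] = {}
    for package, minimum in minimums.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            problems[package] = "not installed"
            continue
        if version_tuple(installed) < version_tuple(minimum):
            problems[package] = f"{installed} is older than {minimum}"
    return problems


if __name__ == "__main__":
    if sys.version_info < (3, 9):  # assumption: any 3.9+ interpreter is fine
        print(f"Python {sys.version.split()[0]} found, 3.9 or newer expected")
    for package, problem in check_requirements().items():
        print(f"{package}: {problem}")
```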

Validation

Comparing the visualizations with the input data shows that the hierarchy of the data, particularly the dependency relationships between the APIs, becomes much easier to interpret. As for the execution of the code, since we could not compare the runtime data (the data produced while the code is running) with the thesis itself, we checked the results by hand, going through each visualization figure and comparing the final results.
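Part of this manual comparison could be automated by diffing per-category counts computed from the input data against the totals shown in a figure. The sketch below only implements the diff; how the totals would be obtained from a figure (for example from the `values` of the treemap trace) is an assumption, and `count_delta` is a hypothetical helper, not part of the original code.

```python
from typing import Dict, Mapping, Tuple


def count_delta(
    expected: Mapping[str, int], actual: Mapping[str, int]
) -> Dict[str, Tuple[int, int]]:
    """Return {key: (expected, actual)} for every key whose counts differ.

    Keys present on only one side are treated as 0 on the other side.
    """
    keys = sorted(set(expected) | set(actual))
    return {
        key: (expected.get(key, 0), actual.get(key, 0))
        for key in keys
        if expected.get(key, 0) != actual.get(key, 0)
    }


if __name__ == "__main__":
    # Made-up numbers, only to show the output shape.
    from_input = {"Testing": 12, "Logging": 4}
    from_figure = {"Testing": 12, "Logging": 3, "JSON": 1}
    print(count_delta(from_input, from_figure))
    # {'JSON': (0, 1), 'Logging': (4, 3)}
```

An empty result means the two sources agree for every category; any entry points to a category worth inspecting by hand.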

Data

The input file contains the analyzed data produced by the analysis stage of the thesis, with the same columns as described under Input data.

The output consists of treemap visualizations in the form of HTML and PDF files, one for each particular visualization.

Running the main process also generates intermediate data:

Spark, used to analyze the parsed repositories, generates a CSV file with [packageName, className, methodName, mcrCategories, mcrTags] as columns, plus a _SUCCESS flag file for the overall process.
A "characterization" folder is also created, containing CSV files with [packageName, className, methodName, mcrCategories, mcrTags] as columns. The main process uses these files to visualize the data by characterization type and to group it by dependency relationships when generating the visualizations.
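For readers unfamiliar with Spark's output layout, the following standard-library sketch mimics the first intermediate step: it projects the rows onto the five listed columns, removes duplicate rows, and writes a part file followed by an empty `_SUCCESS` marker. It is an illustration under assumptions (that duplicates are dropped, and the file name `part-00000.csv`), not the project's Spark job; the characterization split is omitted because the characterization types are not specified here. In PySpark, a comparable step might look like `df.select(*cols).distinct().write.csv(path, header=True)`.

```python
import csv
import tempfile
from pathlib import Path
from typing import Iterable, List, Mapping, Tuple

# Columns of the intermediate CSV files described above.
INTERMEDIATE_COLUMNS = ["packageName", "className", "methodName", "mcrCategories", "mcrTags"]


def write_intermediate(rows: Iterable[Mapping[str, str]], out_dir: Path) -> Path:
    """Project rows onto INTERMEDIATE_COLUMNS, drop duplicate rows, and write
    them Spark-style: a part file first, then an empty _SUCCESS marker."""
    out_dir.mkdir(parents=True, exist_ok=True)
    seen = set()
    unique: List[Tuple[str, ...]] = []
    for row in rows:
        key = tuple(row[c] for c in INTERMEDIATE_COLUMNS)
        if key not in seen:  # keep the first occurrence, preserve order
            seen.add(key)
            unique.append(key)
    part = out_dir / "part-00000.csv"  # hypothetical; Spark adds a unique suffix
    with part.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(INTERMEDIATE_COLUMNS)
        writer.writerows(unique)
    (out_dir / "_SUCCESS").touch()  # signals that the output is complete
    return part


if __name__ == "__main__":
    rows = [  # made-up rows; extra columns such as "api" are ignored
        {"packageName": "p", "className": "C", "methodName": "m",
         "mcrCategories": "Testing", "mcrTags": "t", "api": "x"},
        {"packageName": "p", "className": "C", "methodName": "m",
         "mcrCategories": "Testing", "mcrTags": "t", "api": "y"},
    ]
    with tempfile.TemporaryDirectory() as tmp:
        part = write_intermediate(rows, Path(tmp) / "spark_output")
        print(part.read_text())
```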

Interaction

For the interaction, we selected the datasets from teams Whiskey and Xray. Both teams generated a consistent set of CSV files following the original structure from [Gorjatschev21], with the following fields:

filePath, packageName, className, methodName, line, column, javaParserTypeOfElement, usedClassOfElement, isAPIClass, api, mcrCategories, mcrTags

They also provided analyzed data, which we needed to run the visualization process.
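Since the visualization depends on this exact structure, a quick header check on the other teams' files can catch mismatches before running the visualizer. The sketch below is hypothetical (the names `read_header` and `header_problems` are ours) and only compares the first row of each CSV against the expected field list.

```python
import csv
from pathlib import Path
from typing import List

# Field list shared by the original data and the other teams' CSV files.
EXPECTED_FIELDS = [
    "filePath", "packageName", "className", "methodName", "line", "column",
    "javaParserTypeOfElement", "usedClassOfElement", "isAPIClass", "api",
    "mcrCategories", "mcrTags",
]


def read_header(path: Path) -> List[str]:
    """Return the first row of a CSV file (an empty list for an empty file)."""
    with path.open(newline="") as handle:
        return next(csv.reader(handle), [])


def header_problems(header: List[str]) -> List[str]:
    """List the differences between a header and EXPECTED_FIELDS."""
    problems = [f"missing: {c}" for c in EXPECTED_FIELDS if c not in header]
    problems += [f"unexpected: {c}" for c in header if c not in EXPECTED_FIELDS]
    if not problems and header != EXPECTED_FIELDS:
        problems.append("column order or count differs")
    return problems
```

For example, `header_problems(read_header(Path("whiskey.csv")))` returns an empty list when the header matches; the file name is only illustrative.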

We confirm that the datasets generated by both teams allow us to reproduce visualizations similar to those generated by [Gorjatschev21]. Compared with generating the visualizations from the original [Gorjatschev21] data, we did not observe a large delta in the process. Because the datasets represent different repositories, the visualizations do show differences in the hierarchical data that are inherent to each repository's components, and these differences were easy to interpret thanks to the generated visualizations.
