Use File_Level and Base_Feature with file granularity? #81

clhunsen · 2017-12-18T09:43:53Z

When looking at the File_Level and Base_Feature artifacts, we only look at them with project granularity, i.e., we do not distinguish the different instances by the files they are changed in. However, in the Conway analysis, I have seen such a differentiation between file and project granularity.

Should we add an option for this differentiation to the network configuration?
@ecklbarb, do you need that for your studies?

The code preventing this differentiation (at least for File_Level) is the following; here, we basically convert file granularity back to project granularity:
https://github.com/se-passau/codeface-extraction-r/blob/a53d04c745add2e30cbfa0d06450485596ca071e/util-read.R#L80


bockthom · 2017-12-18T10:06:45Z

@clhunsen Could you please provide a short example? Then it would be easier for everyone to understand, I guess. It took me a few minutes to get the point...

So, basically, we talk about splitting the Base_Feature (resp. File_Level) into several smaller artifacts here, right?
So, instead of having one File_Level node in the function network, we would get several File_Level nodes (each representing the file-level code of a different file). So, for File_Level that will be easy - and in some cases also useful.
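To make the proposed splitting concrete, here is a minimal Python sketch (not the library's actual R code) of which vertices a function network would contain under each granularity. It assumes that change data is available as `(file, artifact)` pairs and that ordinary functions are always qualified with their file path; `vertex_names` and `file_granularity` are hypothetical names used only for illustration.

```python
# Hypothetical sketch, not codeface-extraction-r code: collect the vertex
# names of a function network from (file, artifact) change records.

FILE_LEVEL = "File_Level"


def vertex_names(changes, file_granularity):
    """Return the sorted, de-duplicated vertex names for the given changes."""
    names = set()
    for path, artifact in changes:
        if artifact == FILE_LEVEL and not file_granularity:
            # Project granularity: every file's file-level code maps to one node.
            names.add(FILE_LEVEL)
        else:
            # Functions (and File_Level under file granularity) are path-qualified.
            names.add(f"{path}::{artifact}")
    return sorted(names)


changes = [
    ("foo.c", "function1"),
    ("foo.c", FILE_LEVEL),
    ("bar.c", FILE_LEVEL),
]
print(vertex_names(changes, file_granularity=False))
# ['File_Level', 'foo.c::function1']
print(vertex_names(changes, file_granularity=True))
# ['bar.c::File_Level', 'foo.c::File_Level', 'foo.c::function1']
```

Under project granularity, both files share a single File_Level node; under file granularity, each file gets its own.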

However, for features, it is much more difficult, as we have already seen in previous work on call graphs some years ago. There, we had a lot of different splitting strategies (file-based, function-based, clustering functions into groups, ...). In the end, we should think about whether a file-based splitting of the Base_Feature is useful enough or whether we should also provide different base-feature splitting methods (which would make it much more complex, as we would have to read both proximity and feature data simultaneously to get the needed information...)

clhunsen · 2017-12-18T16:18:15Z

> So, basically, we talk about splitting the Base_Feature (resp. File_Level) into several smaller artifacts here, right?
> So, instead of having one File_Level node in the function network, we would get several File_Level nodes (each representing the file-level code of a different file). So, for File_Level that will be easy - and in some cases also useful.

Right! Sorry for phrasing this so confusingly, but you got the point. The same should basically apply to the Base_Feature artifact.

Here is an example: When constructing artifact names for the vertices in the network, we augment the function data with the relative path of the file containing the current function (see rows 1 and 2 of the table below, column "project granularity").
For the File_Level artifact, we currently take the original name (see rows 3 and 4, column "project granularity"), but this way, we cannot distinguish the artifact across different files, although we may want to (in a configurable way). To distinguish this artifact across files (and that is the proposed feature for the network library), we need to prepend the relative path here as well to obtain distinguishable names (see column "file granularity").

| file | artifact | project granularity | file granularity |
|---|---|---|---|
| foo.c | function1 | foo.c::function1 | foo.c::function1 |
| folder/foo.c | function1 | folder/foo.c::function1 | folder/foo.c::function1 |
| foo.c | File_Level | File_Level | foo.c::File_Level |
| folder/foo.c | File_Level | File_Level | folder/foo.c::File_Level |
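The naming rule from the table can be sketched in a few lines of Python. This is only an illustration under assumptions: `Granularity` stands in for a hypothetical new configuration option (it does not exist in the library), and `function_artifact_name` is an illustrative helper, not the library's actual naming code.

```python
from enum import Enum


class Granularity(Enum):
    """Hypothetical configuration values; not an existing option of the library."""
    PROJECT = "project"
    FILE = "file"


FILE_LEVEL = "File_Level"


def function_artifact_name(path, artifact, granularity):
    """Build a vertex name for a function-level artifact.

    Ordinary functions are always qualified with the file's relative path;
    File_Level is qualified only under file granularity.
    """
    if artifact == FILE_LEVEL and granularity is Granularity.PROJECT:
        return FILE_LEVEL
    return f"{path}::{artifact}"


# The four rows of the table above as (file, artifact) pairs.
rows = [
    ("foo.c", "function1"),
    ("folder/foo.c", "function1"),
    ("foo.c", FILE_LEVEL),
    ("folder/foo.c", FILE_LEVEL),
]
for path, artifact in rows:
    print(path, artifact,
          function_artifact_name(path, artifact, Granularity.PROJECT),
          function_artifact_name(path, artifact, Granularity.FILE),
          sep="\t")
```

Keeping the special case inside a single naming function means the rest of the network construction would not need to know about the granularity at all.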

The same holds analogously for the Base_Feature artifact, although we do not prepend relative paths to "ordinary" features for either project or file granularity; we would only do that for the base feature.
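For features, the rule differs only in that ordinary features never receive a path prefix. A minimal Python sketch, assuming a boolean `file_granularity` switch; the helper name and the feature name `FEATURE_A` are hypothetical and used only for illustration:

```python
# Hypothetical sketch: naming rule for feature vertices as proposed above.

BASE_FEATURE = "Base_Feature"


def feature_artifact_name(path, feature, file_granularity):
    """Return the vertex name of a feature changed in the file at `path`.

    Ordinary features are never path-qualified, regardless of granularity;
    only Base_Feature is split per file when file_granularity is True.
    """
    if feature == BASE_FEATURE and file_granularity:
        return f"{path}::{BASE_FEATURE}"
    return feature


for path in ("foo.c", "folder/foo.c"):
    print(feature_artifact_name(path, "FEATURE_A", file_granularity=True),
          feature_artifact_name(path, BASE_FEATURE, file_granularity=False),
          feature_artifact_name(path, BASE_FEATURE, file_granularity=True))
```

With file granularity, the two files thus yield two distinct Base_Feature vertices, while `FEATURE_A` remains a single vertex shared by both files.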

> However, for features, it is much more difficult, as we have already seen in previous work on call graphs some years ago. There, we had a lot of different splitting strategies (file-based, function-based, clustering functions into groups, ...). In the end, we should think about whether a file-based splitting of the Base_Feature is useful enough or whether we should also provide different base-feature splitting methods (which would make it much more complex, as we would have to read both proximity and feature data simultaneously to get the needed information...)

I agree that it is easy to implement for the "file-based" splitting strategy (as you call it), for both File_Level and Base_Feature. And this is basically what I had in mind to implement.
Regarding the other splitting strategies you mentioned, I do not think we need implementations for them, as they are also not that easy to achieve without any call-graph data.

clhunsen added help wanted question labels Dec 18, 2017

clhunsen added this to the v3.1 milestone Dec 18, 2017

bockthom modified the milestones: v3.1, v3.2 Feb 27, 2018

clhunsen modified the milestones: v3.2, v3.3 May 2, 2018

clhunsen modified the milestones: v3.3, Future Aug 8, 2018

clhunsen assigned jkronaw Nov 26, 2018

bockthom mentioned this issue Dec 15, 2018

Change commit filtering and network building regarding the untracked files and base artifact #149 (merged)

clhunsen modified the milestones: Future, v3.5 Dec 17, 2018

clhunsen modified the milestones: v3.5, Future Mar 8, 2019
