Use File_Level and Base_Feature with file granularity? #81

Open
clhunsen opened this issue Dec 18, 2017 · 2 comments

@clhunsen
Collaborator

When looking at the File_Level and Base_Feature artifacts, we only look at them with project granularity, i.e., we do not distinguish the different instances by the very files they are changed in. But, in the Conway analysis, I have seen such differentiation of file and project granularity.

Should we add a possibility for differentiation to the network configuration?
@ecklbarb, do you need that for your studies?


The code that currently prevents this differentiation (for File_Level, at least) is linked below. There, we basically convert file granularity back to project granularity.
https://github.com/se-passau/codeface-extraction-r/blob/a53d04c745add2e30cbfa0d06450485596ca071e/util-read.R#L80

@clhunsen clhunsen added this to the v3.1 milestone Dec 18, 2017
@bockthom
Collaborator

@clhunsen Could you please provide a short example? Then it would be easier for everyone to understand, I guess. It took me a few minutes to get the point...

So, basically, we talk about splitting the Base_Feature (resp. File_Level) into several smaller artifacts here, right?
So, instead of having one File_Level node in the function network, we would get several File_Level nodes (each representing the file-level code of a different file). So, for File_Level that will be easy - and in some cases also useful.

However, for features, it is much more difficult, as we already have seen in previous work on call graphs some years ago. Here we had a lot of different splitting strategies (file-based, function-based, clustering functions into groups, ...). In the end, we should think whether a file-based splitting of the Base_Feature is useful enough, or whether we should also provide different base-feature splitting methods (which would make it much more complex as we would have to read proximity and feature data both simultaneously to get the needed information...)

@clhunsen
Collaborator Author

clhunsen commented Dec 18, 2017

> So, basically, we talk about splitting the Base_Feature (resp. File_Level) into several smaller artifacts here, right?
> So, instead of having one File_Level node in the function network, we would get several File_Level nodes (each representing the file-level code of a different file). So, for File_Level that will be easy - and in some cases also useful.

Right! Sorry for phrasing this so confusingly. But you got the point. The same should basically apply to the Base_Feature artifact.

Here is an example: When constructing artifact names for the vertices in the network, we prepend the relative path of the containing file to each function name (see Lines 1 and 2 of the table below, column project granularity).
For the File_Level artifact, we currently take the original name unchanged (see Lines 3 and 4 of the table below, column project granularity). This way, we cannot distinguish the artifact across different files, although we may want to (in a configurable way). To distinguish this artifact across files (and that is the proposed feature for the network library), we need to prepend the relative path here as well (see column file granularity).

| file | artifact | project granularity | file granularity |
|---|---|---|---|
| foo.c | function1 | foo.c::function1 | foo.c::function1 |
| folder/foo.c | function1 | folder/foo.c::function1 | folder/foo.c::function1 |
| foo.c | File_Level | File_Level | foo.c::File_Level |
| folder/foo.c | File_Level | File_Level | folder/foo.c::File_Level |

The same holds analogously for the Base_Feature artifact. Note that we do not prepend relative paths to "ordinary" features for either project or file granularity; we would only do that for the base feature.
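
A minimal sketch of the naming rule described above, written in Python purely for illustration (the library itself is written in R). The function `artifact_name`, its parameters, and the `granularity` switch are hypothetical names chosen for this example and are not part of the library's API.

```python
# Hypothetical sketch of the proposed naming rule; not the library's code.

# Catch-all artifacts that are currently shared across all files.
CATCH_ALL_ARTIFACTS = {"File_Level", "Base_Feature"}


def artifact_name(file, artifact, artifact_type="function", granularity="project"):
    """Build the vertex name for one (file, artifact) pair.

    artifact_type: "function" or "feature" (assumed values).
    granularity:   "project" (current behavior) or "file" (proposed option).
    """
    if artifact in CATCH_ALL_ARTIFACTS:
        # Project granularity: one shared vertex for the whole project.
        # File granularity: one vertex per file, made distinct by the path.
        if granularity == "file":
            return f"{file}::{artifact}"
        return artifact
    if artifact_type == "function":
        # Functions always carry the relative path of their file.
        return f"{file}::{artifact}"
    # "Ordinary" features are never prefixed, at either granularity.
    return artifact


# Reproduce the rows of the table above for both granularities.
rows = [
    ("foo.c", "function1"),
    ("folder/foo.c", "function1"),
    ("foo.c", "File_Level"),
    ("folder/foo.c", "File_Level"),
]
for file, artifact in rows:
    print(
        file,
        artifact,
        artifact_name(file, artifact, granularity="project"),
        artifact_name(file, artifact, granularity="file"),
    )
```

With file granularity, the two File_Level rows map to two different vertex names, whereas with project granularity they collapse into a single vertex.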

> However, for features, it is much more difficult, as we already have seen in previous work on call graphs some years ago. Here we had a lot of different splitting strategies (file-based, function-based, clustering functions into groups, ...). In the end, we should think whether a file-based splitting of the Base_Feature is useful enough, or whether we should also provide different base-feature splitting methods (which would make it much more complex as we would have to read proximity and feature data both simultaneously to get the needed information...)

I agree that the "file-based" splitting strategy (as you call it) is easy to implement for both File_Level and Base_Feature. And this is basically what I thought about implementing.
Regarding the other splitting strategies you mentioned, I do not think we need implementations for those, as they are also not that easy to achieve without any call-graph data.

@bockthom bockthom modified the milestones: v3.1, v3.2 Feb 27, 2018
@clhunsen clhunsen modified the milestones: v3.2, v3.3 May 2, 2018
@clhunsen clhunsen modified the milestones: v3.3, Future Aug 8, 2018
@clhunsen clhunsen modified the milestones: Future, v3.5 Dec 17, 2018
@clhunsen clhunsen modified the milestones: v3.5, Future Mar 8, 2019

3 participants