CLI extension for automated collaboration analyses #303

Open
wants to merge 13 commits into
base: master

nicolehoess
Collaborator

The purpose of this pull request is to expand kaiaulu's exec scripts to support the automation of collaboration analyses with interfaces for:

  • git log entity analysis
  • parallelized git log entity analysis
  • bipartite graph projections
  • temporal collaboration networks

Adds a CLI to parse entities (functions, classes, etc.) from a previously
parsed git log, covering the entire timespan of that log.

Adds a CLI to parse git entities from a previously parsed git log for
multiple time windows in parallel. Time windows can be configured either
as explicitly defined dates or as a window size in days (see the
configuration example kaiaulu_analysis.yml).

Git interfaces also perform identity matching and file filtering as
specified in the configuration file.

Signed-off-by: Nicole Hoess <nicole.hoess@oth-regensburg.de>
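
The two ways of configuring time windows described in the commit above (explicit dates or a window size in days) can be illustrated with a minimal Python sketch. It is not kaiaulu's R implementation; the function names `windows_from_dates` and `windows_from_size` and the ISO date strings are assumptions made for the example.

```python
from datetime import datetime, timedelta


def windows_from_dates(boundaries):
    """Pair consecutive boundary dates into (start, end) windows.

    `boundaries` is a list of ISO date strings (hypothetical input format).
    """
    dates = [datetime.fromisoformat(b) for b in boundaries]
    return list(zip(dates[:-1], dates[1:]))


def windows_from_size(start, end, size_days):
    """Cut the span from `start` to `end` into windows of `size_days` days.

    The last window is shortened so that it never extends past `end`.
    """
    if size_days <= 0:
        raise ValueError("size_days must be positive")
    start_dt = datetime.fromisoformat(start)
    end_dt = datetime.fromisoformat(end)
    step = timedelta(days=size_days)
    windows = []
    cursor = start_dt
    while cursor < end_dt:
        windows.append((cursor, min(cursor + step, end_dt)))
        cursor += step
    return windows
```

Because each window is an independent (start, end) pair, the per-window analyses do not depend on each other, which is what makes running them in parallel possible.
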
The merge() function's argument "sorted" results in an "unknown
argument" warning. Replace the "sorted" argument with "sort" to fix this.

Signed-off-by: Nicole Hoess <nicole.hoess@oth-regensburg.de>
Adds interfaces to create bipartite projections and temporal
collaboration networks from a previously parsed git log or from git log
entities.

An additional configuration file is added to keep track of the CLI
parameter choices.

Signed-off-by: Nicole Hoess <nicole.hoess@oth-regensburg.de>
Replace the absolute path to the git repository with a relative path.

Signed-off-by: Nicole Hoess <nicole.hoess@oth-regensburg.de>
Users may configure only a subset of the possible filter options.

If a filter option was missing, it could corrupt the git log. For
instance, not specifying any file path substrings to remove
(remove_filepaths_containing) caused all substrings to be removed,
resulting in an empty git log.

Also, the commit size filter option is now respected.

Signed-off-by: Nicole Hoess <nicole.hoess@oth-regensburg.de>
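
The failure mode described in the commit above, where an unset filter ends up removing everything, can be illustrated with a small Python sketch. This is not the kaiaulu code: the helper name `remove_paths_containing` and the dictionary-based configuration are assumptions, and only the key `remove_filepaths_containing` comes from the commit message. In Python the analogous hazard is that the empty string is a substring of every string, so the sketch skips the filter when the option is missing, empty, or contains only empty entries.

```python
def remove_paths_containing(paths, config):
    """Drop paths containing any configured substring.

    A missing, None, or empty `remove_filepaths_containing` option means
    "remove nothing", so the input is returned unchanged.
    """
    raw = config.get("remove_filepaths_containing") or []
    # Ignore empty strings: "" is contained in every path and would
    # otherwise remove all files.
    substrings = [s for s in raw if s]
    if not substrings:
        return list(paths)
    return [p for p in paths if not any(s in p for s in substrings)]


paths = ["R/git.R", "tests/test_git.R", "vignettes/intro.Rmd"]
print(remove_paths_containing(paths, {}))
print(remove_paths_containing(paths, {"remove_filepaths_containing": ["test"]}))
```
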
The CLI configuration (e.g. kaiaulu_cli.yml) now has a section for git
exec. This section lets users specify whether developer identities
should be matched. It also offers a configuration option to match
identities by name only.

Signed-off-by: Nicole Hoess <nicole.hoess@oth-regensburg.de>
The author timestamp was accidentally overwritten by the committer
timestamp, causing the git log to be split according to the committer
timestamp instead of the author timestamp as suggested by the vignettes.

Also, make sure that the range boundary commits are included in both
ranges.

Signed-off-by: Nicole Hoess <nicole.hoess@oth-regensburg.de>
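
The splitting behaviour described in the commit above (split on the author timestamp, with boundary commits included in both adjacent ranges) can be sketched in Python as follows. This is an illustration rather than the kaiaulu implementation; the function `split_by_window` and the field names `author_time` and `committer_time` are hypothetical.

```python
from datetime import datetime


def split_by_window(commits, windows, timestamp_field="author_time"):
    """Assign commits to time windows using closed intervals.

    Both comparisons are inclusive, so a commit whose timestamp equals a
    shared boundary appears in the window ending there and in the window
    starting there.
    """
    return [
        [c for c in commits if start <= c[timestamp_field] <= end]
        for start, end in windows
    ]


t1, t2, t3 = datetime(2020, 1, 1), datetime(2020, 2, 1), datetime(2020, 3, 1)
commits = [
    {"hash": "a", "author_time": t1, "committer_time": t1},
    {"hash": "b", "author_time": t2, "committer_time": t3},
]
by_author = split_by_window(commits, [(t1, t2), (t2, t3)])
print([[c["hash"] for c in part] for part in by_author])
```

Keeping the timestamp field as an explicit parameter makes it visible which timestamp drives the split, which is the kind of mix-up the fix addresses.
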
In evolutionary analyses, users can generate time windows either based
on the author timestamp or the committer timestamp. Adds an option to
choose the desired timestamp.

Also removes the filtering options from the tabulation CLI, as we are
interested in the entire git log here.

Signed-off-by: Nicole Hoess <nicole.hoess@oth-regensburg.de>
Identity matching in the git CLI was so far limited to author names and
e-mail addresses. Now, the committer names and e-mail addresses are
matched as well.

Signed-off-by: Nicole Hoess <nicole.hoess@oth-regensburg.de>
@carlosparadis
Member

@nicolehoess Thank you for the PR. So sorry I have been slow on this. I am now targeting the 3-week gap in mid-August, so I have some respite! Please let me know if there is anything else I can help you with in the meantime.

During entity analysis, we save an empty data frame in case no entities
were found in the respective time window. This indicates that a specific
range has not been skipped accidentally, but did not contain any changed
entities. Change the header of this data frame to match the standard
format to facilitate subsequent analyses.

Signed-off-by: Nicole Hoess <nicole.hoess@oth-regensburg.de>
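
The idea in the commit above, writing a header-only table when a window contains no entities, can be sketched in Python with the standard `csv` module. The column names in `STANDARD_COLUMNS` and the function `write_entities` are hypothetical placeholders and not kaiaulu's actual output format.

```python
import csv
import io

# Hypothetical column set standing in for the standard entity table header.
STANDARD_COLUMNS = ["commit_hash", "author", "entity", "timestamp"]


def write_entities(rows, out):
    """Write entity rows as CSV, always emitting the standard header.

    With no rows, the output holds only the header line, so downstream
    code can read every window's file with the same column layout.
    """
    writer = csv.DictWriter(out, fieldnames=STANDARD_COLUMNS)
    writer.writeheader()
    writer.writerows(rows)


buffer = io.StringIO()
write_entities([], buffer)
print(buffer.getvalue().splitlines())
```

A header-only file distinguishes "this window was analysed and had no changed entities" from "this window was never processed", where no file would exist at all.
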
Allows the CLI users to choose whether to include the time window
boundaries (start and end time) in the parallel entity analysis.

Allows the CLI users to choose which columns to include in identity
matching.

Signed-off-by: Nicole Hoess <nicole.hoess@oth-regensburg.de>
In the project configuration file, we can specify options such as file
filters which can be applied to file-based or entity-based analysis
modes or both.

So far, the application of these options was hard-coded in both git
CLIs.

Now, users may specify the desired options and their application to
file-based and entity-based analysis modes separately in the CLI
configuration file. This gives users more flexibility in their analyses.

Signed-off-by: Nicole Hoess <nicole.hoess@oth-regensburg.de>
Similar to the git CLI, users may want to choose different
configurations for file and entity network construction. Thus, add
separate options to the CLI configuration file.

Signed-off-by: Nicole Hoess <nicole.hoess@oth-regensburg.de>

2 participants