kaiaulu 0.0.0.9700 (in development)

NEW FEATURES

  • The GitHub API module has been expanded to support refresh, along with other functions. github_api_project_issue_search has been added to call the search/issues endpoint. github_api_project_issue_or_pr_comments_by_date and github_api_project_issue_by_date have been added to download issue data and comments by date range. github_parse_search_issues_refresh has been added to parse the issue data downloaded from the search endpoint in the refresh_issues folder. github_api_project_issue_refresh and github_api_project_issue_or_pr_comment_refresh have been added to download issue data or comments, respectively, that have not already been downloaded. format_created_at_from_file has been added to retrieve the greatest date from a JSON file. See the Reference Docs on GitHub section for more details. #282
  • config.R now contains a set of getter functions that centralize the gathering of configuration data, and code that reads configuration file information has been refactored to use them. For example, loading a value by direct variable assignment, git_repo_path <- config_file[["version_control"]][["log"]], becomes git_repo_path <- get_git_repo_path(config_file) with a config.R getter function. #230
  • refresh_jira_issues() has been added. It is a wrapper function for the previous downloader and downloads only issues greater than the greatest key already downloaded. #275
  • download_jira_issues(), download_jira_issues_by_issue_key(), and download_jira_issues_by_date() have been added. They allow downloading Jira issues without JirAgileR and specifying issue id and created date ranges. They also interact with parse_jira_latest_date() to implement a refresh capability. #275
  • make_jira_issue() and make_jira_issue_tracker() no longer create fake issues following the JirAgileR format, but instead follow the raw data format obtained from the JIRA API. This is compatible with the new parser function for JIRA. #277
  • parse_jira() now parses folders containing raw JIRA JSON files without depending on JirAgileR. #276
  • parse_jira_latest_date() has been added. This function returns the file name of the downloaded JIRA JSON containing the latest date, which download_jira_issues() uses to implement a refresh capability. #276
  • Kaiaulu architecture has been refactored. Instead of a parser, download, and network module structure, Kaiaulu now uses a combination of data type and tool structure. Accordingly, various functions of download.R, parser.R, and network.R are now separated into git.R, jira.R, etc. When only a small part of a tool's functionality is required, functions are grouped by the data type they are associated with, for example, src.R. Kaiaulu API documentation has been updated accordingly. Function signatures and behavior remain the same: the only modification is the new placement of functions into files. See the issue for further rationale and changes. #241
  • Temporal bipartite projections are now weighted. The temporal projection can be parameterized by weight_scheme_cum_temporal() or weight_scheme_pairwise_cum_temporal() when all time lag edges are used, or by the existing weight schemes when a single lag is used. The all-lag weight schemes reproduce the behavior described in Codeface's paper. See the issue for details. #229
  • make_jira_issue() and make_jira_issue_tracker() have been added, alongside examples and unit tests for parse_jira(). #228
  • Fake mailing lists can now be generated with make_mbox_reply and make_mbox_mailing_list for unit testing and tool comparison. #238
  • parse_gitlog() and git_log() now check whether the user points them to an empty repository and return a more helpful message than "object 'data.Author' not found". A unit test also verifies the behavior of the tryCatch on an empty repo. #108
  • Adds fake data generator infrastructure to Kaiaulu. Specifically, it refines git_create_sample_log(), which can only create a fixed example, by adding new commands to the git interface (which can also be used for other purposes). The new example.R module can now be used to document examples that reflect edge cases on raw data, using the extended git.R interface. The unit tests can then rely on the example functions to temporarily create a fake minimal example, test parser functionality against it, and subsequently delete it. In essence, this allows unit testing of parser data and, consequently, automated checks that the behavior of 3rd party tools Kaiaulu relies on remains consistent across their updates for the features we care about. Examples are in fact a way to create the minimal reproducible examples often requested on Stack Overflow. Old unit tests which rely on git_create_sample_log() will be updated in a subsequent commit to rely on the new interface via example datasets. #227
  • The dv8_mdsmb_to_flaws() function now offers an optional boolean parameter is_file_only_metric which can be used to compute file metrics more efficiently. Note this should not be used if the intent is to aggregate the file metrics, as they may be counted twice or more if the files participate in the same flaw pattern id. See the causal flaws notebook for an example on how to use it. #246.
  • Move Kaiaulu to R version 4.0 due to XML dependency #245.
  • An example jira fake data was added to jira.R along with a test-jira.R which identifies a bug in parse-jira.R (the function parse_jira() will be moved to jira.R at a later date for consistency). The bug is still present, so the test should fail on GitHub Actions. A solution to the bug will be added on a subsequent commit to pass the test. #244
  • Refactor io_create_folder(), io_make_sample_file(), git_init(), git_add(), git_commit() in R/git.R and create test cases with unit tests in R/example.R and testthat/test-parser.R. #227
  • A parallel version of the git log entity analysis was added to process multiple time windows in parallel. See the new gitlog_entity_showcase_parallel.Rmd for details. #231
  • Refactor GoF Notebook in Graph GoF and Text GoF Notebooks #224
  • Adds GoF module and various utility functions to facilitate integrating identified pattern classes to files. #223
  • Adds parse_jira_rss_xml(), which enables reusing the full 26 projects dataset of our prior TSE work. #218
  • Adds metric_file_bug_frequency(), metric_file_non_bug_frequency(), metric_file_bug_churn(), metric_file_non_bug_churn(), metric_file_churn() to R/metric.R #214
  • Adds Gang of Four parser for Tsantalis' parser4.jar #211
  • A new text module was added. The module allows for extracting identifiers from source code. See the new src_text_showcase.Rmd for details. #206
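
The getter-based configuration access described in the config.R entry above can be sketched as follows. This is a minimal illustration, not Kaiaulu's implementation: the config_file list is built by hand instead of being loaded from a project configuration file, the path it holds is made up, and the body of get_git_repo_path() is an assumption based only on the key path quoted in the entry.

```r
# Hypothetical stand-in for a parsed project configuration file.
# The path below is illustrative only.
config_file <- list(
  version_control = list(
    log = "../rawdata/git_repo/example/.git"
  )
)

# Assumed body of a getter: it hides the nested key path from callers,
# so a change to the configuration layout only has to be made in one place.
get_git_repo_path <- function(config_file) {
  git_repo_path <- config_file[["version_control"]][["log"]]
  if (is.null(git_repo_path)) {
    warning("Attribute does not exist in the configuration file.")
  }
  return(git_repo_path)
}

# Before: direct access to the nested list, repeated wherever it is needed.
git_repo_path_direct <- config_file[["version_control"]][["log"]]

# After: a single call to the getter.
git_repo_path <- get_git_repo_path(config_file)
```

Both forms return the same value; the getter only moves knowledge of the configuration layout out of the calling code.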

MINOR IMPROVEMENTS

  • Issue #275, when introducing the concept of refresh on JIRA, affected some notebooks that still relied on data in the previous format. This change updates either the notebook or the config file to conform to the new JIRA downloader. #312
  • The line metrics notebook now provides further guidance on adjusting the snapshot and filtering.
  • The R File and R Function parsers can now properly parse R folders which contain nested folders (not following R package structure). Both .r and .R files are also now captured (previously only one of the two was specified, but R accepts both). #235
  • Parameterize the dsm type in parse_dv8_architectural_flaws so users can specify the dsm that should be used when reconstructing the architectural flaw instances per file. #222
  • Adds additional label to reflect an issue is closed (i.e. "Resolved" used in Cassandra) #221
  • Added line metrics, triangle and square motifs to causal_flaws.Rmd notebook #220
  • Added Anti-Motif from prior paper analysis #210
  • parse_gof_patterns() now includes the instance id of a given pattern. The column names were also renamed to match the XML. This should now fully tabulate all information provided by the XML. Note that patterns for which no instance is reported are not included in the table (they occur in the XML, but with no instances reported). #206
  • Bugzilla API now allows for output file to be specified. #202
  • Paired parser functions now expect a filepath instead of a json string character. #202
  • A new filter, filter_by_commit_size(), has been added. This filter mitigates outlier co-change resulting from git log projections, which may lead to an "all-vs-all" explosion of edges. E.g. the Apache Geronimo SVN to Git migration contains a commit which modifies 1522 files. Said 1522 files would be co-changed with each other, generating 1522 choose 2 = 1,157,481 edges alone, which does not accurately reflect actual "co-change". Use of this filter is strongly encouraged for graph_to_dsmj or any operation that requires git log projection. #209
  • Re-Implements Motif Analysis from prior TSE paper. #210
  • Adds tags column to github_parse_project_issue(), github_parse_project_pull_request() so bug count can also be computed from GitHub API. #216
  • A progress bar has been added to parse_dv8_architectural_flaws(). Each tick tracks one folder of flaws (progressBar auto resets the tick to 0 on loop completion, so instance progress bar requires further function refactoring and is deferred for now). #209
  • graph_to_dsmj is now vectorized, increasing performance #209
  • Refactored file organization in config files for clearer hierarchy. #230
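
The edge explosion that motivates filter_by_commit_size() can be checked with a short calculation, followed by a sketch of the filtering idea. The git_log table, its column names, and the drop_large_commits() helper are hypothetical and only illustrate the approach; they are not the package's implementation or signature.

```r
# Number of file pairs produced by a single commit that touches n files.
n_files <- 1522
n_pairs <- choose(n_files, 2)  # 1522 * 1521 / 2

# Hypothetical git log table: one row per (commit, file) pair.
git_log <- data.frame(
  commit_hash = c("a1", "a1", "b2", "b2", "b2", "b2", "c3"),
  file = c("x.R", "y.R", "x.R", "y.R", "z.R", "w.R", "x.R"),
  stringsAsFactors = FALSE
)

# Sketch of a commit size filter: drop every commit that modifies more
# than commit_size files, so it cannot dominate the co-change projection.
drop_large_commits <- function(git_log, commit_size) {
  files_per_commit <- table(git_log$commit_hash)
  keep <- names(files_per_commit)[as.vector(files_per_commit) <= commit_size]
  git_log[git_log$commit_hash %in% keep, ]
}

# Commit "b2" touches four files and is removed with commit_size = 2.
filtered_log <- drop_large_commits(git_log, commit_size = 2)
```

A suitable threshold depends on the project; very large commits often come from migrations or bulk reformatting rather than from files that actually change together.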

BUG FIXES

  • The keyword internal is now required to omit functions from the docs API. #241
  • Fixes duplication of issue rows due to multiple components in component field #244
  • Fixes mismatch of filepath due to leading / remaining in relative filepath of parse_dependencies(). #219

DOCUMENTATION FIXES

  • A few unit tests have been added to sanity check the metrics module as a consequence of bug 244. #244
  • Improved the documentation of the line metrics Notebook. #240
  • Documentation was improved for the Causal Flaws Notebook #220
  • Moved learning resources to the wiki. Minor editing to guidelines for clarity and common mistakes. #150

kaiaulu 0.0.0.9600

NEW FEATURES

  • Adds Bugzilla Notebook showcasing various Bugzilla Functions. #164
  • Adds bugzilla crawler downloader and parser functions parse_bugzilla_rest_issues, parse_bugzilla_rest_comments, download_bugzilla_rest_issues_comments, and parse_bugzilla_rest_issues_comments. #164
  • Adds bugzilla crawler downloader functions download_bugzilla_rest_issues and download_bugzilla_rest_comments to download project data from bugzilla site using REST API. #177
  • Adds bugzilla functions download_bugzilla_perceval_traditional_issue_comments, download_bugzilla_perceval_rest_issue_comments, parse_bugzilla_perceval_traditional_issue_comments, and parse_bugzilla_perceval_rest_issue_comments to download and parse project data from bugzilla site using perceval. #155
  • Adds milestone 3.4 DV8 functions graph_to_dsmj, transform_dependencies_to_sdsmj, transform_gitlog_to_hdsmj, transform_temporal_gitlog_to_adsmj to convert a dsm into a json format. #184
  • DV8 Showcase vignette now uses gitlog and dependency functions transformers, which enable using Kaiaulu filters. #184
  • R/graph.R function was modularized to allow for different weight schemes. #184
  • Adds DV8 Notebook showcasing various DV8 Functions. The project configuration file of APR has been expanded to demonstrate available parameters for DV8. #186 and #182.
  • Adds functions dv8_clsxb_to_clsxj, parse_dv8_clusters, dependencies_to_sdsmj, and gitlog_to_hdsmj for DV8 integration with Kaiaulu. #168
  • Adds functions dv8_gitlog_to_gitnumstat, dv8_gitnumstat_to_hdsmb, dv8_hsdsmb_to_decoupling_level, dv8_hsdsmb_to_hierclsxb, dv8_hsdsmb_drhier_to_excel, and parse_dv8_metrics_decoupling_level. #169
  • Adds issue commit flow. See issue_social_smell_showcase.Rmd vignette for details. #144
  • Adds a new download_mod_mbox_per_month() function which allows the intermediate downloaded mbox files to be saved to a chosen folder (as opposed to tmp). The function is showcased in the download_mod_mbox.Rmd vignette. #141
  • A CLI interface for calculating smells over multiple branches was added. Consistent with other interfaces, the input is the tools.yml, the project configuration file, and the file save path. #132
  • Re-implements the socio technical congruence metric, using the built-in graph model. #137
  • Social smell notebook now uses the project configuration file to determine which kind of reply data (mbox, github issues and pull requests comments or jira issue comments) to use. Moreover, in case multiple branches are specified, only the first (top) will be executed. This fully automates the notebook based on the project configuration file. #132
  • Kaiaulu now uses a new format for project configuration files which improves readability and accounts for new notebooks added during previous releases. More documentation was also added as comments to the project configuration file so it is more self-contained. #111
  • download_github_comments.Rmd now includes author and committer name and e-mail to support identity matching. #133
  • The social smell notebook now performs git checkout before parse_gitlog. The branch parameter, which is also used later in the notebook to reset the branch after performing git checkout to calculate line metrics, is now a project configuration file parameter. #132
  • Combining JIRA Issue Comments, GitHub Issue Comments, GitHub Pull Request Comments, and Mailing Lists is now possible and showcased on the social_smell_showcase.Rmd Notebook. Moreover, both download_jira_data.Rmd and download_github_comments.Rmd have been standardized to provide the raw json data, whereas parse_jira_replies() and parse_github_replies() provide the same formatted reply table as parse_mbox(), which allows combining the various sources simply by using native rbind() function. #133.
  • Kaiaulu can now download project's communication from GitHub issue comments and GitHub pull request comments. See the new notebook download_github_comments.Rmd for example usage. #130
  • A module to use the GitHub API has been added, built on top of the gh library. Three types of functions were added on an as-needed basis: functions to obtain an endpoint response, functions to iterate over the pages of the responses, and functions to parse the raw data format (json) into tables. The iterator function also provides an optional parameter to save the raw data. Note that while the json data is provided "as-is", the parser functions only tabulate a portion of the data in the interest of time. If additional columns are needed for your particular use case, please open an issue. #86
  • Kaiaulu is now public. #124
  • A CLI has been added using Kaiaulu API. With this, users can use some of Kaiaulu features directly, without requiring knowledge of R. Available functionality is currently limited, and more will be added in the future based on user preference. #123
  • With version 0.0.0.9600, three social smells (org silo, missing link, and radio silence) are refactored to the master branch. The social smells no longer depend on igraph, and OSLOM is used for community detection instead of igraph's random walk. Because of the closer integration to source, the social smell notebook includes a new section where any time slice can be explored to assess the social metrics, including coloring by community. A separate branch will contain a notebook comparing the re-implementation to the existing metric. While org silo and missing link should result in the same metric value, radio silence results will be different due to the use of a different community detection algorithm. #114.
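
The entry above on combining JIRA, GitHub, and mailing list replies relies on every parser returning the same table layout, so that base R rbind() is enough to merge them. A minimal sketch with two hand-made tables follows; the column names and values are hypothetical placeholders and are not the schema returned by parse_mbox() or parse_jira_replies().

```r
# Hypothetical reply tables sharing the same column layout.
mbox_replies <- data.frame(
  reply_id = c("m1", "m2"),
  reply_from = c("alice@example.org", "bob@example.org"),
  reply_subject = c("Release plan", "Re: Release plan"),
  stringsAsFactors = FALSE
)

jira_replies <- data.frame(
  reply_id = "j1",
  reply_from = "carol@example.org",
  reply_subject = "PROJ-1 build fails",
  stringsAsFactors = FALSE
)

# Identical columns mean no source-specific glue code is needed.
all_replies <- rbind(mbox_replies, jira_replies)
```

rbind() on data frames matches columns by name, so it fails if a source adds or omits a column, which makes schema drift between sources visible early.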

MINOR IMPROVEMENTS

  • Flaws folder is no longer hardcoded to "apr-"
  • DV8 Notebook no longer hardcodes project names to "apr-". #190
  • Updated parse_dependencies() with output_dir folder for more flexibility. #168
  • Adds SetUp and TearDown unit tests for a sample git log. #154
  • Added new citation for Kaiaulu work on README and references of works using Kaiaulu. #143
  • Moves suggested rawdata folder assumed on configuration files one level above to avoid rawdata git logs being incorrectly parsed when parsing Kaiaulu architecture during documentation generation. Minor function documentation was also fixed. #142
  • For multi-branch analysis, specifying a single commit hash will not work as it will only apply to a single branch. The CLI has been modified to rely on the start_datetime and end_datetime instead. #132
  • a kaiaulu.conf has been added. Now that GitHub API is available, we can measure the social smells of the tool via the social_smell_showcase.Rmd. #133
  • Sometimes, mbox files contain the e-mail body under the body.plain or body.plain.simple. The parse_mbox() function now handles both cases. #133

BUG FIXES

  • In the Gitlog explore Notebook, co-change was being calculated with the wrong weight scheme (weight sum). This is now fixed to eliminate node counts. #184
  • parse_dependencies() now returns a list of nodes and edgelist as opposed to just edgelist tables. Prior to this change, files that contain no dependency to other files would be missed (as they would not exist in the edgelist table, and only in the node table). This is consistent to both Depends and DV8 tables, as there is a 1 to 1 mapping from Depends/DV8 variables field on the generated JSON and the node table in Kaiaulu's graph memory representation, and the Depends/DV8 cells field, and Kaiaulu's graph edgelist table. #189.
  • parse_gitlog() now renames files on their first commit renames, instead of when the renamed file is first modified. Prior logic led to situations where, when a file is renamed and never again modified, the new file name is never mentioned on the git log. This was due to how Perceval encoded file renaming, by including in a rarely present field called "newfile" instead of including the new file name under "file". #184
  • parse_dependencies() no longer truncates full file paths, but instead turns them into relative paths. The Dependencies notebook also now shows a sample of the table and dependency graph. #172
  • parse_mbox() can now parse .mbox files that contain fewer fields. #185
  • The CLI interface for git and mailinglist has been updated to conform to the new project configuration file format. #111
  • Fixes incorrect column name usage when calculating churn, which resulted in churn returning 0 as metric. #135
  • If OSLOM detected a developer as belonging to more than one community, the radio silence function would throw warnings when said developer was a neighbor of others, choosing the first of the available groups. This is because the original smell function used a community detection algorithm which did not assign multiple groups. The warning is fixed by explicitly implementing what it did by default, i.e. choosing the first group of those assigned. In the future, the smell function can be improved to account for more than one group. #134
  • The ordering of rows was previously done alphabetically over the date. It is now correctly done based on time. #126
  • In social_smell_showcase.Rmd, the variables i_commit_hash and j_commit_hash were subject to the ordering of the rows as input. The code now correctly chooses the earliest date and latest date within a time window, instead of assuming the first row and last row are such. In turn, this now reflects in the correct commit hash interval being reported in the final table, and the correct git checkouts being applied to line metrics. #126
  • In smells.R, used by social_smell_showcase.Rmd, the mapping of text to numerical identities in smell_organizational_silo, smell_missing_links, and smell_radio_silence was incorrect or missing. One side effect of this error, as reported in the issue, is that different orderings of the rows provided as input to the function caused different metric values. However, the metric should be independent of the ordering. This fix addresses the ordering side effect and corrects the metric value. #126
  • In download_jira_data.Rmd, the jira issue downloader's output contained a mismatch between column names and values when converting the json to table. The conversion is now done in Kaiaulu instead of the external package, and the external package is only used to obtain the json. In addition, parse_jira_comments() has been refactored into parse_jira(), which handles both issues and/or comments jsons obtained from the external package. #120
  • OSLOM now assigns cluster ids to isolated nodes for consistency. #115
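
The two #126 fixes on row ordering come down to the difference between sorting datetime strings and sorting parsed times. The sketch below uses a made-up commits table and an assumed day/month/year string format chosen so that the two orderings differ; the variable names i_commit_hash and j_commit_hash follow the entry, everything else is hypothetical.

```r
# Hypothetical commits with datetimes stored as day/month/year strings.
commits <- data.frame(
  commit_hash = c("h1", "h2", "h3"),
  datetime = c("02/03/2021 10:00:00",
               "01/04/2021 09:00:00",
               "26/02/2021 18:30:00"),
  stringsAsFactors = FALSE
)

# Sorting the strings compares characters, not points in time.
alphabetical <- commits$commit_hash[order(commits$datetime)]

# Parsing first gives a chronological ordering.
parsed <- as.POSIXct(commits$datetime, format = "%d/%m/%Y %H:%M:%S", tz = "UTC")
chronological <- commits$commit_hash[order(parsed)]

# Pick the window boundaries by earliest and latest time,
# instead of assuming they are the first and last rows.
i_commit_hash <- commits$commit_hash[which.min(as.numeric(parsed))]
j_commit_hash <- commits$commit_hash[which.max(as.numeric(parsed))]
```

Here the string sort puts the April commit first, while the parsed sort correctly starts with the February commit.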

DOCUMENTATION FIXES

  • Added new paper citation, and moved references from .bib to a vignette.
  • README was substantially updated, and made more concise. Additional third party tool documentation was moved to the wiki where it can expand more freely. #191
  • Kaiaulu docs now use the most recent version of pkgdown, which now includes a search field.
  • Forward Referencing and Backward Referencing navigation across all functions have been added. "Check" was also used for incorrect referencing on ghost parameters or functions. Dangling stringr call was replaced by stringi. #190
  • Add Malia Liu, and Nicholas Lee as contributors on DESCRIPTION file.
  • Fixes various inconsistencies across documentation, missing parameter hyperlinking, seealso, etc. Functions in R/dv8.R were re-ordered to follow the expected order of function calls in an analysis, which is consistent with their ordering in _pkgdown.yml. #186
  • New vignette for DV8 functions called "dv8.Rmd" #168
  • New configuration file for the Apache Thrift project called "thrift.yml". #148
  • Adds unit tests for parse_mbox() and parse_jira(). #154
  • Adds unit tests for git_checkout() and parse_gitlog(). #154
  • Fixed minor grammar mistakes and vague wording. #151
  • Adds unit tests for functions get_date_from_commit_hash() and filter_by_file_extensions() #154

kaiaulu 0.0.0.9500

NEW FEATURES

  • mailinglist_showcase.Rmd has been renamed to reply_communication_showcase.Rmd to account for issue tracker network communication. Likewise, transform_mbox_to_bipartite_network has been renamed to transform_reply_to_bipartite_network to reflect that it accepts both mbox and jira reply data as parameter. The notebook also now presents how to load jira issue comment networks (obtained using download_jira_data.Rmd) and how to combine the networks. A new function, parse_jira_comments(), was also added to standardize the input to conform to Kaiaulu nomenclature of communication data. #113

  • mod_mbox_downloader now accepts a save_path compatible with the project configuration file instead of saving to the working directory, and has a verbose mode to display progress. A new notebook showcasing how to use the function, download_mod_mbox.Rmd, has also been added. #112

  • Two new R notebooks, download_jira_data.Rmd and bug_count.Rmd, and one project configuration file, geronimo.yml now demonstrate how JIRA issue data can be downloaded and used to calculate file bug count using existing Kaiaulu functionality and an external JIRA API R package. In combination with the existing gitlog_vulnerabilities_showcase.Rmd, Kaiaulu can now download and parse both software vulnerabilities (CVEs) and issue IDs. The download_jira_data.Rmd can also be used to obtain issue comment data, which may be used to construct communication networks in combination to mailing list data. #110

  • Existing network visualizations can now be re-colored using recolor_network_by_community. #94

  • Added download_mod_mbox() function to download.R module, allowing the composition of .mbox files from Apache mod_mbox archives. #99.

  • Added download.R module enabling downloading and conversion of pipermail archives into the .mbox format using the download_pipermail() and convert_pipermail_to_mbox functions #93.

  • adds built-in bipartite graph projection transformation to graph.R bipartite_graph_projection() #75.

  • parser.R and network.R API now abide by a standardized nomenclature for the data columns, instead of using third party software nomenclature, which led to multiple names when data overlapped among third party software. The Network module function prefix was also changed from parse_*_network to transform_*_network. Various transformation functions were also renamed to explicitly indicate that they generate bipartite networks (previously they did not), instead of temporal. The network functions to transform git logs, be it bipartite or temporal, now account for all types of networks (i.e. author-file, author-entity, committer-file, committer-entity, etc). The "mode" parameter is also more explicit on what types of functions it can create. #43

  • Parser functions no longer normalize the timezone to UTC. Normalization is now exemplified in all Notebooks instead, for when time slices are needed. Therefore, it is now possible to implement the socio-technical metric num.tz. To minimize the risk of timestamps becoming misaligned, datetimes are left as strings instead of being parsed as POSIXct objects. #89
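
The timezone entry above can be illustrated with a short sketch: datetimes stay as strings with their original offsets, and conversion to UTC happens only when an analysis needs comparable times. The string format and the variable names are assumptions made for this example; the format actually produced by the parsers may differ, and counting distinct offsets is only one plausible input to a timezone metric such as num.tz.

```r
# Hypothetical author datetimes kept as strings with their UTC offsets.
author_datetimetz <- c("2021-04-24 08:00:00 -1000",
                       "2021-04-24 20:30:00 +0200")

# Because the offset is preserved, timezone information is still available,
# e.g. to count how many distinct offsets appear in the data.
tz_offsets <- sub(".* ", "", author_datetimetz)
num_distinct_offsets <- length(unique(tz_offsets))

# When time slices are needed, normalize to UTC at analysis time.
utc_datetime <- as.POSIXct(author_datetimetz,
                           format = "%Y-%m-%d %H:%M:%S %z",
                           tz = "UTC")

# The two events are 30 minutes apart once expressed in the same timezone.
minutes_apart <- as.numeric(difftime(utc_datetime[2], utc_datetime[1],
                                     units = "mins"))
```

Normalizing inside the parser would have discarded the offsets, which is why the conversion is left to the notebooks.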

MINOR IMPROVEMENTS

  • All notebooks now use the new identity match interface from #56. Consequently, for both bipartite and temporal transformations, users can now choose whether to display the nodes with the developers' name and e-mail or with their id, which protects the privacy of the project's developers when publishing information online. #90

  • Fixes the column naming for the parse_dependencies(). Previously src and dest, and now from and to, consistent to other networks derived from graph.R. #75

  • Fixes tools.yml to use the correct undir and dir of OSLOM (previously the paths were inverted). #75

BUG FIXES

  • Fixes incorrect datetime assignment from committer to author in gitlog_showcase.Rmd. #110
  • Fixes outdated column names in commit_message_id_coverage. #110
  • Fixes download_mod_mbox missing leading zeros. #107

DOCUMENTATION FIXES

  • CONTRIBUTING.md now contains details on how to contribute code to Kaiaulu. #102
  • README.md has been updated to reflect current functionality, examples and how to cite. A pointer to this NEWS.md file has also been added. #105
  • The gitlog_showcase Notebook was renamed to "Explore Git Log", and now contains extensive textual documentation explaining all the file functions, both bipartite and temporal. It also briefly introduces the information used from the project configuration file. Some notebooks which had redundant content were also deleted and re-organized into this one. The software vulnerabilities notebook was also renamed to "Issues, Software Vulnerabilities and Weaknesses", and now focuses on commit log message parsing only. The notebook which presents the method to parse git log entities was renamed to "Extending Git Logs from Files to Entities"; it was also reorganized so as to not depend on a saved local rds file. It now loads a very small amount of data so the documentation generation does not take too long, as the processing of a full log takes a while. #91

kaiaulu 0.0.0.9000 (04/24/2021)

NEW FEATURES

  • added a dependencies parser for R using utils abstract syntax tree parser, and API functions. Kaiaulu architecture is used for showcase, which should facilitate understanding new functions being added and their dependencies. See kaiaulu_architecture.Rmd for details. #84
  • added an interface to OSLOM community detection algorithm. #81
  • added git log parser at lower granularity, entity (e.g. function git log parser) parse_gitlog_entity(), and associated network functions to visualize both author-entity bipartite network parse_gitlog_entity_network() and temporal parse_gitlog_entity_temporal_network(). It is therefore now possible to compare networks at file or any type of entity of interest, with different network construction methods. See vignettes/gitlog_entity_showcase.Rmd for details. #79
  • added a new parse_gitlog_temporal_network() which provides a directed network for collaboration at file level. #78
  • modify parse_line_type() to parse_line_type_file() to take as input information from git history instead of a local computer file, so it can be used to analyze git log changes. #2
  • add git_blame() wrapper and parser. #68
  • several fixes and improvements to R/string.R to assign identity under different ways to define name and e-mail between different sources. All tests now pass, and assign_exact_identity() can also perform identity match based on name only, should the e-mails be redacted (e.g. Google Groups) or missing. #72
  • add unit tests framework testthat and several tests for identity service. #38
  • add new module R/git.R to facilitate checking current branch git_head() and checkout to a particular commit git_checkout(), the latter required to analyze multiple intervals with static code analysis such as parse_line_metrics() and parse_dependencies(). A vignette will be added at a later date showcasing the functions. #62
  • add source code line type identification using universal ctags parse_line_type(). See line_type_showcase.Rmd for usage. #60
  • adds various file line metrics parse_line_metrics(). See line_metrics_showcase.Rmd for example usage. #59
  • adds gitlog parser for java code refactorings parse_java_code_refactoring_json(). See refactoringminer_showcase.Rmd for example usage. #57
  • file-cve-cwe networks can now be obtained by parsing nvd feeds for cve-cwe mapping parse_nvdfeed() and parse_cve_cwe_file_network(). See gitlog_showcase.Rmd for example usage. #51
  • users can now specify the dependency types they wish to see for Depends on config file #49
  • the number of commit messages which contains a given id can now be computed with commit_message_id_coverage(). See example on gitlog_showcase.Rmd. #46
  • git log commit messages can now be parsed parse_commit_message_id_network(). example of interesting labels are issue ids and cve-ids. You can now also specify them directly on the config files (see conf folder). Vignettes/gitlog_showcase.Rmd has been updated to showcase a cve-id network. #46
  • adds a built-in R static code parser relying on base R Abstract Syntax Tree Parser (vignette will be added in a future commit showcasing the network). #47
  • a new vignettes/interval_and_metric_showcase.Rmd was added to replace vignettes/churn_metrics.Rmd #19.
  • churn metric functions in R/metric.R, metric_churn_per_commit_interval() and metric_churn_per_commit_per_file(), were substantially simplified, and can now be used with interval.R. #19
  • add minimal interval analysis support with interval.R/interval_commit_metric() and parsers.R/filter_by_commit_interval(). #44
  • add filters filter_by_file_extension() filter_by_filepath_substring() for files not relevant for metrics. config file schema also has been extended to provide parameters to the filters. #30
  • config files per project have been defined, and used across all showcase vignettes. #41
  • add a simple identity mapping function, assign_identity(), which assigns a single id from authors who use different names and emails in parse_gitlog(), parse_mbox(), or across both data. This allows parse_gitlog_network() and parse_mbox_network() to be merged into a single network. See vignettes/merging_networks_showcase.Rmd for details. A normalized edit distance function was also added for future implementation of partial matching normalized_levenshtein(). #31
  • add CONTRIBUTION.md and some tips on signed-off-by via commit -s. #36
  • add NEWS.md #37
  • churn metric is now available. metric_churn() and metric_commit_interval_churn(). See vignettes/churn_metrics.Rmd for details. #19
  • provides gitlog data via interface to Perceval parse_gitlog(), and edgelist export for network libraries parse_gitlog_network(). See vignettes/gitlog_showcase.Rmd for details. #1
  • provides mailing list data via interface to Perceval parse_mbox(), and edgelist export for network libraries parse_mbox_network(). See vignettes/mailinglist_showcase.Rmd for details. #4
  • provides file dependency data via interface to Depends parse_dependencies(), and edgelist export for network libraries parse_dependencies_network(). See vignettes/depends_showcase.Rmd for details. #8
  • project is now licensed under MPL 2.0 #12
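
The identity mapping entry above mentions a normalized edit distance intended for future partial matching. A minimal sketch of that idea with base R's utils::adist() is shown below. The function name normalized_edit_distance() and the choice to normalize by the length of the longer string are assumptions for illustration; normalized_levenshtein() in the package may be defined differently.

```r
# Sketch of a normalized edit distance built on utils::adist().
# Assumption: divide the Levenshtein distance by the longer string's length,
# so the result falls between 0 (identical) and 1 (nothing in common).
normalized_edit_distance <- function(a, b) {
  max_length <- max(nchar(a), nchar(b))
  if (max_length == 0) {
    return(0)
  }
  as.numeric(utils::adist(a, b)) / max_length
}

# Identical names, a near match (one deleted character), and a clear mismatch.
d_same <- normalized_edit_distance("john doe", "john doe")
d_close <- normalized_edit_distance("john doe", "jon doe")
d_far <- normalized_edit_distance("john doe", "alice")
```

A partial matcher could treat two names as the same identity when the distance falls below a chosen threshold; picking that threshold is a trade-off between merging distinct people and splitting one person into several ids.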

MINOR IMPROVEMENTS

  • heavily refactored R/network.R API into R/graph.R to separate graph representation and algorithms from various types of networks that can be constructed from git logs, mailing lists, etc. #81
  • refactored git_log() from parse_gitlog and parse_commit_message_id() from R/parsers.R. #74
  • various functions were moved to different files to clarify the API. #73
  • config files were refactored for clarity and to accommodate dv8 wrapper. #50
  • vignettes dependencies_showcase.Rmd and gitlog_showcase.Rmd now also make use of the chosen heuristics to filter files. Up to this point only interval_and_metric_showcase.Rmd used them. #30
  • various functions which assumed tables to have certain column names now require the name by parameter. this is work in progress to define a common interface as more data parsing is added to this codebase. #43
  • added a logo to the project. #35
  • removed unnecessary step to parse gitlogs from Perceval. #33
  • igraph is no longer a dependency of the package. parse_log_*() functions now provide edgelist instead of igraph objects. vignettes were adjusted to showcase usage. #14
  • lubridate dependency was removed, this package now uses base R POSIXct to handle dates. #13
  • stringr was replaced by stringi to respect license terms of this and stringr packages. #21

BUG FIXES

  • non defined function parameter on mbox has been fixed. #25
  • incorrect parameter removal of .git has been fixed. #22

DOCUMENTATION FIXES

  • add pkgdown documentation to repo #26
  • README.md now provides example vignettes according to data of interest. #24