JohT/code-graph-analysis-examples

Code Graph Analysis Pipeline Examples

This repository provides examples of how to analyze TypeScript code and Java artifacts in a fully automated GitHub Actions workflow built on the code-graph-analysis-pipeline.

The process involves three steps:

  1. Extract: Upload TypeScript source code and/or Java artifacts, optionally including their Git history, using actions/upload-artifact.

  2. Analyze: Use the shared workflow JohT/code-graph-analysis-pipeline/.github/workflows/public-analyze-code-graph.yml to analyze the code and artifacts, then upload the results.

  3. Use: Download the analysis results with actions/download-artifact and consume them as needed.

🚀 TypeScript Code Pipeline

This example demonstrates how to analyze TypeScript code in a GitHub Actions workflow.

  1. The first job, prepare-code-to-analyze, in the workflow typescript-code-analysis.yml, shows how to extract TypeScript code from a repository and upload it for analysis.

  2. The second job, analyze-code-graph, calls the shared analysis workflow using the uploaded artifacts' names as parameters. Example:

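name: Analyze Code Graph
needs: [prepare-code-to-analyze]
# A reusable workflow call needs a ref suffix (@<ref>). "@main" is a placeholder assumption;
# pin a tag or commit digest instead if you prefer reproducible runs.
uses: JohT/code-graph-analysis-pipeline/.github/workflows/public-analyze-code-graph.yml@main
with:
  # Both values come from the outputs of the prepare-code-to-analyze job.
  analysis-name: ${{ needs.prepare-code-to-analyze.outputs.analysis-name }}
  sources-upload-name: ${{ needs.prepare-code-to-analyze.outputs.sources-upload-name }}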

  3. The third job, analyze-code-graph, demonstrates how to download the analysis results and commit them back to the repository.

☕ Java Artifacts Pipeline

Java artifacts are analyzed similarly to TypeScript code. The main difference is that Java artifacts are downloaded from a Maven repository instead of being part of the repository.

To include Git history in the analysis, check out the corresponding source repository and upload it as the source artifact, as in the TypeScript example. The Java source code isn't used in the analysis, so a bare git clone is sufficient.

The first job, prepare-code-to-analyze, in the workflow java-code-analysis.yml, shows how to prepare the Java artifacts and Git history for analysis.

The second and third jobs are the same as in the TypeScript example.

📑 CSV Report Reference

CSV_REPORTS.md lists all CSV Cypher query result reports inside the analysis-results directory. It can be generated as described in Generate CSV Report Reference.

📓 Jupyter Notebook Report Reference

JUPYTER_REPORTS.md lists all Jupyter Notebook reports inside the analysis-results directory. It can be generated as described in Generate Jupyter Notebook Report Reference.

🖼️ Image Reference

IMAGES.md lists all PNG images inside the analysis-results directory. It can be generated as described in Generate Image Reference.

♻️ Update Analysis Workflow with Renovate

This repository uses Renovate to automatically update the analysis workflow to the latest version. To enable this, add the following extension to your Renovate configuration:

"extends": [
  "github>JohT/code-graph-analysis-pipeline//renovate-presets/code-graph-analysis-workflow-latest-digest.json5"
]

You can find the complete configuration in the renovate.json file.

📄 License

This repository is licensed under the Apache License, Version 2.0. See LICENSE for the full license text.

📊 Analysis Results

Below are examples drawn from more than a hundred reports produced by the analysis. They illustrate results from analyzing AxonFramework, a Java framework for evolutionary, message-driven microservices on the JVM. For the complete set of reports, see the analysis-results directory.

External Dependencies of Java Packages

External dependencies of Java packages

Dependencies Graph of Java Artifacts

Dependencies graph of Java artifacts

Longest Path(s) of Java Artifacts

Longest path of Java artifacts

All Pairs Shortest Paths of Java Packages per Artifact

All pairs shortest paths of Java packages per artifact

Object-Oriented Design Metrics for Java Packages

Object-oriented design metrics for Java packages

Effective Line Count of Java Methods

Effective line count of Java methods

Cyclomatic Complexity Distribution for Java Methods

Cyclomatic complexity distribution for Java methods

Visibility of Java Types

Visibility of Java types

Communities and Node Embeddings of Java Packages

Communities and node embeddings of Java packages

Word Cloud of Git Authors

Word cloud of Git authors

Number of Distinct Commit Authors

Number of distinct commit authors

Main Authors with Highest Number of Commits

Main authors with highest number of commits

Clustering Coefficient vs. PageRank

The scatter plot below compares the importance of Java types to the density of their connections. The Y axis shows the PageRank score (higher values indicate more important and frequently used types). The X axis shows the clustering coefficient (higher values indicate more densely connected neighborhoods). Important bridge or hub types appear toward the top-left; highly influential nodes in dense communities appear toward the top-right.

Clustering Coefficient vs. PageRank
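
To make the two axes concrete, here is a minimal sketch that computes both measures for a small, made-up dependency graph. The type names, the damping factor of 0.85, and the choice to measure clustering on the undirected graph are assumptions for illustration; the pipeline's actual queries and parameters may differ.

```python
from itertools import combinations

# Hypothetical toy graph: each key depends on the listed types.
DEPENDENCIES = {
    "A": ["Core"],
    "B": ["Core"],
    "C": ["Core"],
    "D": ["E", "F"],
    "E": ["F", "D"],
    "F": ["D", "E"],
    "Core": [],
}


def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank; nodes without outgoing edges spread their rank evenly."""
    count = len(graph)
    rank = {node: 1.0 / count for node in graph}
    for _ in range(iterations):
        dangling = sum(rank[node] for node in graph if not graph[node])
        base = (1 - damping) / count + damping * dangling / count
        updated = {node: base for node in graph}
        for source, targets in graph.items():
            for target in targets:
                updated[target] += damping * rank[source] / len(targets)
        rank = updated
    return rank


def clustering_coefficient(graph):
    """Local clustering coefficient, ignoring edge direction."""
    neighbors = {node: set() for node in graph}
    for source, targets in graph.items():
        for target in targets:
            if source != target:
                neighbors[source].add(target)
                neighbors[target].add(source)
    result = {}
    for node, adjacent in neighbors.items():
        pairs = list(combinations(adjacent, 2))
        linked = sum(1 for a, b in pairs if b in neighbors[a])
        result[node] = linked / len(pairs) if pairs else 0.0
    return result


ranks = pagerank(DEPENDENCIES)
clustering = clustering_coefficient(DEPENDENCIES)
for name in DEPENDENCIES:
    # X axis: clustering coefficient, Y axis: PageRank
    print(f"{name}: clustering={clustering[name]:.2f} pagerank={ranks[name]:.3f}")
```

In this toy graph, the hypothetical Core type has a clustering coefficient of 0 because none of its dependents are connected to each other, while D, E, and F form a triangle and each reach 1.0.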

Java Types That Are Surprisingly Central or Popular

Surprisingly central or popular Java Types

Largest Java Type Clusters

Largest Java Type Clusters

Java Type Anomalies

Based on an anomaly detection model that combines multiple graph-based features (centrality, clustering, node embeddings), the following visualization contrasts several kinds of anomalous Java types in the codebase with a few "very normal" types.

Java Type Anomalies

The full Markdown report, which describes all detected anomalies in a form readable by both humans and large language models, is available here: Anomaly Detection Report.

Java Type Top 1 Authority

An "Authority" is a code unit many important parts depend on: it has high global importance (PageRank) but low local support (ArticleRank). A large PageRank − ArticleRank gap flags widely used utilities or entry points that are central but not well supported locally.

Top 1 Java Type Authority Graph Visualization
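
The PageRank − ArticleRank gap can be sketched in a few lines. This is an illustration under assumptions, not the pipeline's implementation: it uses the non-normalized PageRank recurrence and an ArticleRank variant that adds the graph's average out-degree to each source's out-degree in the denominator, which is how ArticleRank is commonly described. The graph and its type names are hypothetical.

```python
# Hypothetical toy graph: each key depends on the listed types.
DEPENDENCIES = {
    "A": ["Util"],
    "B": ["Util"],
    "C": ["Util"],
    "D": ["A", "B", "C"],
    "Util": [],
}


def rank(graph, damping=0.85, iterations=50, article_rank=False):
    """Iterative rank computation.

    With article_rank=True, the average out-degree of the graph is added to
    each source's out-degree, which dampens the influence of sources that
    have only a few outgoing dependencies.
    """
    average_out_degree = sum(len(targets) for targets in graph.values()) / len(graph)
    extra = average_out_degree if article_rank else 0.0
    scores = {node: 1.0 for node in graph}
    for _ in range(iterations):
        updated = {node: 1 - damping for node in graph}
        for source, targets in graph.items():
            for target in targets:
                updated[target] += damping * scores[source] / (len(targets) + extra)
        scores = updated
    return scores


def authority_gap(graph):
    """PageRank minus ArticleRank for every node."""
    page = rank(graph)
    article = rank(graph, article_rank=True)
    return {node: page[node] - article[node] for node in graph}


gap = authority_gap(DEPENDENCIES)
top_authority = max(gap, key=gap.get)
print(top_authority, round(gap[top_authority], 3))
```

In this sketch the hypothetical Util type ends up with the largest gap, because its score comes entirely from dependents that each have a single outgoing dependency.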

Java Type Top 1 Bottleneck

A "Bottleneck" is a code unit with exceptionally high betweenness centrality: it lies on many shortest paths between other nodes, so it mediates a large fraction of dependency flows and is a potential single point of failure or architectural hotspot. It may indicate an unintended concentration of dependencies; if it is removed, communication between modules breaks.

Top 1 Java Type Bottleneck Graph Visualization
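
As a rough illustration of betweenness centrality, the following sketch applies Brandes' algorithm to a small, hypothetical directed dependency graph. The scores are unnormalized, and the type names are made up; the pipeline itself may compute and scale betweenness differently.

```python
from collections import deque

# Hypothetical toy graph: two callers reach two targets only through "Gateway".
DEPENDENCIES = {
    "A": ["Gateway"],
    "B": ["Gateway"],
    "Gateway": ["X", "Y"],
    "X": [],
    "Y": [],
}


def betweenness(graph):
    """Brandes' algorithm for unweighted, directed graphs (unnormalized)."""
    score = {node: 0.0 for node in graph}
    for start in graph:
        order = []  # nodes in breadth-first discovery order
        predecessors = {node: [] for node in graph}
        paths = {node: 0 for node in graph}  # number of shortest paths from start
        distance = {node: -1 for node in graph}
        paths[start], distance[start] = 1, 0
        queue = deque([start])
        while queue:
            node = queue.popleft()
            order.append(node)
            for neighbor in graph[node]:
                if distance[neighbor] < 0:
                    distance[neighbor] = distance[node] + 1
                    queue.append(neighbor)
                if distance[neighbor] == distance[node] + 1:
                    paths[neighbor] += paths[node]
                    predecessors[neighbor].append(node)
        # Accumulate dependencies from the farthest nodes back to the start.
        dependency = {node: 0.0 for node in graph}
        for node in reversed(order):
            for pred in predecessors[node]:
                dependency[pred] += paths[pred] / paths[node] * (1 + dependency[node])
            if node != start:
                score[node] += dependency[node]
    return score


scores = betweenness(DEPENDENCIES)
print(max(scores, key=scores.get), scores)
```

Here every shortest path from A or B to X or Y passes through the hypothetical Gateway type, so it is the only node with a nonzero score.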

Java Type Top 1 Bridge

A "Bridge" is a code unit that connects different parts of the codebase. It is detected as an anomaly with a high contribution of node embedding features, which encode the structural position in the graph. It highlights code that might integrate various layers or boundaries (e.g., API facades) or violate the architecture (tangled dependencies).

Top 1 Java Type Bridge Graph Visualization

Java Type Top 1 Hub

A "Hub" is a code unit with a high out-degree (many dependencies) but low clustering coefficient (its neighbors are not well connected). Hubs are central dependencies that many other parts rely on, making them potential fragile hotspots in the architecture. The low clustering coefficient indicates that these hubs may not be well integrated into the surrounding code, increasing the risk of failure if the hub encounters issues.

Top 1 Java Type Hub Graph Visualization
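
The two criteria named above, high out-degree and low clustering coefficient, can be combined into a simple filter. The sketch below is only an illustration: the graph, the type names, and both thresholds (`min_out_degree`, `max_clustering`) are hypothetical, and the actual anomaly model combines more features than this.

```python
from itertools import combinations

# Hypothetical toy graph: each key depends on the listed types.
DEPENDENCIES = {
    "Facade": ["A", "B", "C", "D"],
    "A": [], "B": [], "C": [], "D": [],
    "P": ["Q", "R", "S"],
    "Q": ["R", "S"],
    "R": ["S"],
    "S": [],
}


def undirected_neighbors(graph):
    """Neighbor sets that ignore the direction of each dependency."""
    neighbors = {node: set() for node in graph}
    for source, targets in graph.items():
        for target in targets:
            if source != target:
                neighbors[source].add(target)
                neighbors[target].add(source)
    return neighbors


def local_clustering(node, neighbors):
    """Fraction of neighbor pairs that are connected to each other."""
    pairs = list(combinations(neighbors[node], 2))
    if not pairs:
        return 0.0
    return sum(1 for a, b in pairs if b in neighbors[a]) / len(pairs)


def hub_candidates(graph, min_out_degree=3, max_clustering=0.2):
    """Nodes with many outgoing dependencies whose neighbors are loosely connected."""
    neighbors = undirected_neighbors(graph)
    return sorted(
        node
        for node, targets in graph.items()
        if len(targets) >= min_out_degree
        and local_clustering(node, neighbors) <= max_clustering
    )


print(hub_candidates(DEPENDENCIES))
```

Both Facade and P have a high out-degree here, but only Facade passes the filter: the neighbors of P all depend on each other, so its clustering coefficient is 1.0.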

Java Type Top 1 Outlier

An "Outlier" is a code unit that significantly deviates from typical patterns in the codebase. It has a low clustering probability and a high distance to the nearest cluster centroid in the node embedding space. This indicates that the outlier has a unique structural position in the dependency graph, potentially representing specialized functionality or an architectural anomaly.

Top 1 Java Type Outlier Graph Visualization
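
The "distance to the nearest cluster centroid" part of this definition can be sketched as follows. The two-dimensional embeddings and cluster names are made up for illustration (real node embeddings have many more dimensions), and the clustering probability, which would come from the clustering algorithm itself, is not covered here.

```python
import math

# Hypothetical 2-D node embeddings, already grouped by cluster.
CLUSTERS = {
    "cluster-0": [[0, 0], [0, 2], [2, 0], [2, 2]],
    "cluster-1": [[10, 10], [10, 12], [12, 10], [12, 12]],
}


def centroid(points):
    """Component-wise mean of a list of equally sized vectors."""
    return [sum(axis) / len(points) for axis in zip(*points)]


def distance_to_nearest_centroid(embedding, centroids):
    """Euclidean distance from an embedding to its closest cluster centroid."""
    return min(math.dist(embedding, center) for center in centroids)


centroids = [centroid(points) for points in CLUSTERS.values()]

typical = [1, 2]   # lies inside cluster-0
unusual = [6, 1]   # lies between the clusters

print(distance_to_nearest_centroid(typical, centroids))
print(distance_to_nearest_centroid(unusual, centroids))
```

A type whose embedding sits far from every centroid, like the hypothetical `unusual` point, is the kind of candidate this definition describes.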
