This repository provides examples of how to analyze TypeScript code and Java artifacts in a fully automated GitHub Actions workflow using the code-graph-analysis-pipeline.
The process involves three steps:
- Extract: Upload TypeScript source code and/or Java artifacts, optionally including their Git history, using actions/upload-artifact.
- Analyze: Use the shared workflow JohT/code-graph-analysis-pipeline/.github/workflows/public-analyze-code-graph.yml to analyze the code and artifacts, then upload the results.
- Use: Download the analysis results with actions/download-artifact and consume them as needed.
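The three steps above can be sketched as a single workflow with three jobs. This is a minimal sketch, not the repository's actual workflow file: the trigger, the step id, the analysis name, the artifact name, the `@main` ref, and the download directory are illustrative assumptions, and the directory layout that the shared workflow expects inside the uploaded artifact is not covered here.

```yaml
# Hypothetical sketch: names, paths, and the "@main" ref are assumptions.
name: Code Graph Analysis (sketch)

on:
  workflow_dispatch:

jobs:
  # Step 1 (Extract): upload the code to analyze as a workflow artifact.
  prepare-code-to-analyze:
    runs-on: ubuntu-latest
    outputs:
      analysis-name: ${{ steps.set-names.outputs.analysis-name }}
      sources-upload-name: ${{ steps.set-names.outputs.sources-upload-name }}
    steps:
      - name: Check out the code including its full Git history
        uses: actions/checkout@v4
        with:
          fetch-depth: 0
          # Keep the token out of .git/config, since .git gets uploaded below.
          persist-credentials: false
      - name: Define analysis and artifact names (example values)
        id: set-names
        run: |
          echo "analysis-name=my-project-1.0.0" >> "$GITHUB_OUTPUT"
          echo "sources-upload-name=my-project-sources" >> "$GITHUB_OUTPUT"
      - name: Upload the sources
        uses: actions/upload-artifact@v4
        with:
          name: ${{ steps.set-names.outputs.sources-upload-name }}
          path: .
          # Hidden files are skipped by default, which would drop .git.
          include-hidden-files: true
          if-no-files-found: error

  # Step 2 (Analyze): call the shared workflow with the artifact names.
  analyze-code-graph:
    needs: [prepare-code-to-analyze]
    # Assumption: "@main" is a placeholder ref; pin a tag or commit SHA in practice.
    uses: JohT/code-graph-analysis-pipeline/.github/workflows/public-analyze-code-graph.yml@main
    with:
      analysis-name: ${{ needs.prepare-code-to-analyze.outputs.analysis-name }}
      sources-upload-name: ${{ needs.prepare-code-to-analyze.outputs.sources-upload-name }}

  # Step 3 (Use): download what the run produced and inspect it.
  use-analysis-results:
    needs: [analyze-code-graph]
    runs-on: ubuntu-latest
    steps:
      - name: Download all artifacts of this run (no artifact name given)
        uses: actions/download-artifact@v4
        with:
          path: downloaded-artifacts
      - name: Show the top-level structure of the downloads
        run: find downloaded-artifacts -maxdepth 2
```

Omitting `name` in the last job downloads every artifact of the run into its own subdirectory, which avoids guessing the name of the results artifact. The uploaded sources are downloaded as well, so a real workflow would more likely select the results artifact explicitly.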
- Table of Contents
- TypeScript Code Pipeline
- Java Artifacts Pipeline
- CSV Report Reference
- Jupyter Notebook Report Reference
- Image Reference
- Update Analysis Workflow with Renovate
- License
- Analysis Results
- External Dependencies of Java Packages
- Dependencies Graph of Java Artifacts
- Longest Paths of Java Artifacts
- All Pairs Shortest Paths of Java Packages per Artifact
- Object-Oriented Design Metrics for Java Packages
- Effective Line Count of Java Methods
- Cyclomatic Complexity Distribution for Java Methods
- Visibility of Java Types
- Communities and Node Embeddings of Java Packages
- Word Cloud of Git Authors
- Number of distinct commit authors
- Main Authors with highest number of commits
- Clustering coefficient vs. Page Rank
- Java Types that are surprisingly central or popular
- Largest Java Type Clusters
- Java Type Anomalies
- Java Type Top 1 Authority
- Java Type Top 1 Bottleneck
- Java Type Top 1 Bridge
- Java Type Top 1 Hub
- Java Type Top 1 Outlier
This example demonstrates how to analyze TypeScript code in a GitHub Actions workflow.
- The first job, prepare-code-to-analyze, in the workflow typescript-code-analysis.yml, shows how to extract TypeScript code from a repository and upload it for analysis.
- The second job, analyze-code-graph, calls the shared analysis workflow using the uploaded artifacts' names as parameters. Example:

  ```yaml
  analyze-code-graph:
    name: Analyze Code Graph
    needs: [prepare-code-to-analyze]
    # A reusable workflow reference also needs a trailing @<ref>
    # (tag, branch, or commit SHA), which is omitted here.
    uses: JohT/code-graph-analysis-pipeline/.github/workflows/public-analyze-code-graph.yml
    with:
      analysis-name: ${{ needs.prepare-code-to-analyze.outputs.analysis-name }}
      sources-upload-name: ${{ needs.prepare-code-to-analyze.outputs.sources-upload-name }}
  ```

- The third job demonstrates how to download the analysis results and commit them back to the repository.
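A third job that downloads the results and commits them back could look roughly like the following. It is a sketch under stated assumptions rather than the job from typescript-code-analysis.yml: the job id, the artifact name `code-graph-analysis-results`, the target directory, and the commit identity are hypothetical. Replace the artifact name with the one the shared workflow actually reports. The sketch also assumes the workflow runs on a branch (not on a pull request merge ref), so that `git push` has a branch to push to.

```yaml
# Hypothetical sketch of a "download and commit back" job.
jobs:
  commit-analysis-results:
    needs: [analyze-code-graph]
    runs-on: ubuntu-latest
    permissions:
      contents: write  # lets the job's GITHUB_TOKEN push the commit
    steps:
      - name: Check out the repository that receives the results
        uses: actions/checkout@v4
      - name: Download the analysis results
        uses: actions/download-artifact@v4
        with:
          name: code-graph-analysis-results  # assumption: hypothetical artifact name
          path: analysis-results
      - name: Commit and push the results if anything changed
        run: |
          git config user.name "github-actions[bot]"
          git config user.email "41898282+github-actions[bot]@users.noreply.github.com"
          git add analysis-results
          if git diff --cached --quiet; then
            echo "No changes to commit"
          else
            git commit -m "Update analysis results"
            git push
          fi
```

Checking `git diff --cached --quiet` before committing keeps the job from failing on runs that produce identical results.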
Java artifacts are analyzed similarly to TypeScript code. The main difference is that Java artifacts are downloaded from a Maven repository instead of being part of the repository.
To include Git history in the analysis, check out the corresponding source repository and upload it as the source artifact, as in the TypeScript example. The Java source code isn't used in the analysis, so a bare git clone is sufficient.
The first job, prepare-code-to-analyze, in the workflow java-code-analysis.yml, shows how to prepare the Java artifacts and Git history for analysis.
The second and third jobs are the same as in the TypeScript example.
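The preparation for Java can be sketched as follows. This is not the job from java-code-analysis.yml: the Maven coordinates, the version, the directory names, and the artifact names are example values, and placing the bare clone under `source/AxonFramework/.git` is an assumption about the expected layout. How the shared workflow receives the name of the Java artifacts upload is not shown in this document, so check its declared inputs before wiring the outputs through.

```yaml
# Hypothetical sketch: coordinates, version, paths, and names are examples only.
jobs:
  prepare-code-to-analyze:
    runs-on: ubuntu-latest
    env:
      PROJECT_VERSION: 4.10.0  # assumption: example version
    outputs:
      analysis-name: ${{ steps.set-names.outputs.analysis-name }}
      sources-upload-name: ${{ steps.set-names.outputs.sources-upload-name }}
      artifacts-upload-name: ${{ steps.set-names.outputs.artifacts-upload-name }}
    steps:
      - name: Define analysis and artifact names (example values)
        id: set-names
        run: |
          echo "analysis-name=AxonFramework-${PROJECT_VERSION}" >> "$GITHUB_OUTPUT"
          echo "sources-upload-name=java-git-history" >> "$GITHUB_OUTPUT"
          echo "artifacts-upload-name=java-artifacts" >> "$GITHUB_OUTPUT"
      - name: Download a Java artifact from Maven Central
        run: |
          mkdir -p artifacts
          curl --fail --location \
            --output "artifacts/axon-messaging-${PROJECT_VERSION}.jar" \
            "https://repo1.maven.org/maven2/org/axonframework/axon-messaging/${PROJECT_VERSION}/axon-messaging-${PROJECT_VERSION}.jar"
      - name: Clone only the Git history (bare clone, no working tree)
        run: git clone --bare https://github.com/AxonFramework/AxonFramework.git source/AxonFramework/.git
      - name: Upload the Java artifacts
        uses: actions/upload-artifact@v4
        with:
          name: ${{ steps.set-names.outputs.artifacts-upload-name }}
          path: artifacts
          if-no-files-found: error
      - name: Upload the Git history as the source artifact
        uses: actions/upload-artifact@v4
        with:
          name: ${{ steps.set-names.outputs.sources-upload-name }}
          path: source
          # .git is a hidden directory and would be skipped by default.
          include-hidden-files: true
          if-no-files-found: error
```

`curl --fail` makes the step fail on an HTTP error instead of saving an error page as a JAR file.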
CSV_REPORTS.md lists all CSV Cypher query result reports inside the analysis-results directory. It can be generated as described in Generate CSV Report Reference.
JUPYTER_REPORTS.md lists all Jupyter Notebook reports inside the analysis-results directory. It can be generated as described in Generate Jupyter Notebook Report Reference.
IMAGES.md lists all PNG images inside the analysis-results directory. It can be generated as described in Generate Image Reference.
This repository uses Renovate to automatically update the analysis workflow to the latest version. To enable this, add the following extension to your Renovate configuration:
```json
"extends": [
  "github>JohT/code-graph-analysis-pipeline//renovate-presets/code-graph-analysis-workflow-latest-digest.json5"
]
```

You can find the complete configuration in the renovate.json file.
This repository is licensed under the Apache License, Version 2.0. See LICENSE for the full license text.
Below are examples drawn from more than a hundred reports produced by the analysis. They illustrate results from analyzing AxonFramework, a Java framework for evolutionary, message-driven microservices on the JVM. For the complete set of reports, see the analysis-results directory.
The scatter plot below compares the importance of Java types to the density of their connections. The Y axis shows the PageRank score (higher values indicate more important and frequently used types). The X axis shows the clustering coefficient (higher values indicate more densely connected neighborhoods). Important bridge or hub types appear toward the top-left; highly influential nodes in dense communities appear toward the top-right.
The following visualization is based on an anomaly detection model that combines multiple graph-based features (centrality, clustering, node embeddings). It highlights different kinds of anomalous Java types in the codebase and contrasts them with some "very normal" types.
The full Markdown report, which describes all detected anomalies in a form readable by both humans and large language models, is available in the Anomaly Detection Report.
An "Authority" is a code unit many important parts depend on: it has high global importance (PageRank) but low local support (ArticleRank). A large gap between PageRank and ArticleRank flags widely used utilities or entry points that are central but not well supported locally.
A "Bottleneck" is a code unit with exceptionally high betweenness centrality: it lies on many shortest paths between other nodes, so it mediates a large fraction of dependency flows and is a potential single point of failure or architectural hotspot. It may indicate an unintended concentration of dependencies: if the unit is removed, communication between modules breaks.
A "Bridge" is a code unit that connects different parts of the codebase. It is detected as an anomaly with a high contribution of node embedding features, which encode the structural position in the graph. It points to code that might integrate various layers or boundaries (e.g., API facades) or violate the architecture (tangled dependencies).
A "Hub" is a code unit with a high out-degree (many dependencies) but low clustering coefficient (its neighbors are not well connected). Hubs are central dependencies that many other parts rely on, making them potential fragile hotspots in the architecture. The low clustering coefficient indicates that these hubs may not be well integrated into the surrounding code, increasing the risk of failure if the hub encounters issues.
An "Outlier" is a code unit that significantly deviates from typical patterns in the codebase. It has a low clustering probability and a high distance to the nearest cluster centroid in the node embedding space. This indicates that the outlier has a unique structural position in the dependency graph, potentially representing specialized functionality or an architectural anomaly.