Java performance profiling for Kubernetes services. Find where a HotSpot JVM is spending CPU, allocating memory, waiting on locks, pausing for GC, or blocking on Java I/O, using real async-profiler/JFR-derived data and a service-focused UI.
Most observability stacks tell you that a Java service is slow. java-profiler is for the next question: which Java stack is responsible?
- Kubernetes-native opt-in: enable profiling with annotations or labels. No application code changes.
- Real JVM profile data: CPU, Wall Clock, allocation, lock-delay, Java I/O wait, and GC evidence come from async-profiler/JFR-derived collection.
- Expert Java workbench: Top Table, Flame Graph, selected-frame details, native-frame filtering, target status, deadlocks, and ingestion health in one workflow.
- Ownable storage: profile data lands in ClickHouse with retention bounded to 7 days or less.
- Focused scope: no required Pyroscope, Parca, or Grafana backend.
- Built for proof: real acceptance requires non-empty CPU, Wall Clock, Java I/O wait, GC, allocation, lock, ClickHouse, ingestion, and browser UI evidence.
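The 7-day retention bound in the list above could be enforced wherever a retention setting is read. The following is a minimal sketch, not the project's implementation: `MaxRetention` and `ValidateRetention` are hypothetical names, and it assumes retention is configured as a duration.

```go
package main

import (
	"fmt"
	"time"
)

// MaxRetention mirrors the 7-day upper bound stated in the feature list.
// The constant name is hypothetical.
const MaxRetention = 7 * 24 * time.Hour

// ValidateRetention rejects non-positive values and anything above the bound.
// It is an illustrative helper, not part of java-profiler.
func ValidateRetention(d time.Duration) error {
	if d <= 0 {
		return fmt.Errorf("retention must be positive, got %s", d)
	}
	if d > MaxRetention {
		return fmt.Errorf("retention %s exceeds the %s limit", d, MaxRetention)
	}
	return nil
}

func main() {
	fmt.Println(ValidateRetention(3 * 24 * time.Hour))  // within the bound, no error
	fmt.Println(ValidateRetention(30 * 24 * time.Hour)) // above the bound, returns an error
}
```

Rejecting an oversized value, rather than silently clamping it, keeps a misconfiguration visible to the operator.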
Enable temporary profiling on a workload pod template:
metadata:
  annotations:
    java-profiler.io/profile-mode: temporary
    java-profiler.io/profile-duration: 15m

Open the Web UI, select the namespace, service, and time range, then start with:
- status to confirm the JVM was accepted.
- cpu to find expensive Java methods.
- wall when latency is not explained by CPU alone.
- io to isolate Java-owned socket or file blocking paths.
- gc to correlate JVM pause evidence with allocation pressure.
- memory to inspect allocation pressure.
- locks and deadlocks to investigate contention.
- ingestion to confirm profile batches were accepted.
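To illustrate how a collector might interpret the two annotations shown earlier, here is a minimal sketch. It assumes the duration value uses Go duration syntax (so `15m` parses with `time.ParseDuration`) and that a mode without a duration is an error. Both are assumptions, and `ProfileRequest` and `ParseProfileRequest` are hypothetical names rather than the project's API.

```go
package main

import (
	"fmt"
	"time"
)

// Annotation keys as they appear in the pod template example.
const (
	modeKey     = "java-profiler.io/profile-mode"
	durationKey = "java-profiler.io/profile-duration"
)

// ProfileRequest is a hypothetical in-memory form of the opt-in annotations.
type ProfileRequest struct {
	Mode     string
	Duration time.Duration
}

// ParseProfileRequest reads the annotations from a pod's annotation map.
// It returns false when the pod has not opted in.
// Assumption: a mode without a valid, positive duration is treated as an error.
func ParseProfileRequest(annotations map[string]string) (ProfileRequest, bool, error) {
	mode := annotations[modeKey]
	if mode == "" {
		return ProfileRequest{}, false, nil
	}
	raw, present := annotations[durationKey]
	if !present {
		return ProfileRequest{}, false, fmt.Errorf("%s is set but %s is missing", modeKey, durationKey)
	}
	d, err := time.ParseDuration(raw)
	if err != nil {
		return ProfileRequest{}, false, fmt.Errorf("invalid %s %q: %w", durationKey, raw, err)
	}
	if d <= 0 {
		return ProfileRequest{}, false, fmt.Errorf("%s must be positive, got %s", durationKey, d)
	}
	return ProfileRequest{Mode: mode, Duration: d}, true, nil
}

func main() {
	req, ok, err := ParseProfileRequest(map[string]string{
		modeKey:     "temporary",
		durationKey: "15m",
	})
	fmt.Println(req.Mode, req.Duration, ok, err)
}
```

Returning a separate opted-in flag keeps "not annotated" distinct from "annotated incorrectly", which matches the accepted, disabled, and rejected states a status view would need to tell apart.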
See the Quickstart and Performance Analysis Manual.
- CPU hotspots: high-cost Java methods, self time, total time, and sampled stack context.
- Wall Clock latency: Java stack time spent runnable, blocked, waiting, sleeping, or doing I/O.
- Java I/O wait: socket or file blocking paths when JVM/JFR evidence preserves Java ownership.
- GC pauses: JVM GC event evidence correlated with allocation profiles and the incident window.
- Allocation hotspots: methods and call paths creating allocation pressure.
- Lock delay: synchronized or monitor paths that block under contention.
- Thread evidence: snapshots for CPU, lock, sleep, blocked, and waiting states.
- Deadlock evidence: deadlock cycles reported by the target JVM.
- Profiling health: accepted, disabled, unsupported, attach failure, profiler conflict, rejected upload, or dropped ingestion data.
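The self time and total time mentioned under CPU hotspots can be derived from sampled stacks. The sketch below shows the usual sampling-profiler convention, not java-profiler's own code: a sample counts toward a method's self cost when that method is the leaf frame, and toward its total cost when the method appears anywhere on the stack. The type and function names are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// Stack is one sampled call stack, ordered root first, leaf last.
type Stack []string

// FrameCost holds sample counts for one method.
type FrameCost struct {
	Frame string
	Self  int // samples where the method was the leaf frame
	Total int // samples where the method appeared anywhere on the stack
}

// Aggregate turns raw samples into per-method costs, sorted by self count
// (descending) and then by frame name for a stable order.
func Aggregate(samples []Stack) []FrameCost {
	costs := map[string]*FrameCost{}
	get := func(name string) *FrameCost {
		c, ok := costs[name]
		if !ok {
			c = &FrameCost{Frame: name}
			costs[name] = c
		}
		return c
	}
	for _, s := range samples {
		if len(s) == 0 {
			continue
		}
		seen := map[string]bool{}
		for _, f := range s {
			if seen[f] {
				continue // a recursive frame counts once per sample
			}
			seen[f] = true
			get(f).Total++
		}
		get(s[len(s)-1]).Self++
	}
	out := make([]FrameCost, 0, len(costs))
	for _, c := range costs {
		out = append(out, *c)
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Self != out[j].Self {
			return out[i].Self > out[j].Self
		}
		return out[i].Frame < out[j].Frame
	})
	return out
}

func main() {
	samples := []Stack{
		{"Handler.serve", "Json.encode"},
		{"Handler.serve", "Json.encode"},
		{"Handler.serve", "Db.query"},
		{"Handler.serve"},
	}
	for _, c := range Aggregate(samples) {
		fmt.Printf("%-14s self=%d total=%d\n", c.Frame, c.Self, c.Total)
	}
}
```

In this example `Handler.serve` appears on every stack, so its total is high while its self count is low. That pattern usually means the cost sits in its callees rather than in the method body.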
Kubernetes metadata
|
v
Node-local collector DaemonSet
|
v
async-profiler/JFR + thread diagnostics
|
v
Backend API -> ClickHouse
|
v
Service diagnosis UI
The first version targets Java services running on Kubernetes, starting with HotSpot-compatible JVMs. Profiling is controlled through Kubernetes metadata, collected node-locally, stored in ClickHouse, and exposed through a compact UI for service owners and platform engineers.
These screenshots come from a real Kubernetes acceptance environment, not mocked UI state.
- Target status evidence
- CPU profile analysis
- Allocation evidence
- Wall Clock latency evidence
- Java I/O wait evidence
- GC pause and allocation correlation
- Deadlock diagnosis surface
- Ingestion health evidence
Regenerate them from a port-forwarded real UI:
export REAL_ACCEPTANCE_BASE_URL=http://127.0.0.1:18081
export REAL_ACCEPTANCE_NAMESPACE=java-profiler-qa
export REAL_ACCEPTANCE_SERVICE=jdk17-http-demo
node scripts/capture-doc-screenshots.mjs

Run local checks before changing profiling, ingestion, backend APIs, or UI behavior:
go test ./...
javac --release 11 java-helper/thread-diagnostics/src/main/java/com/ebpfjava/threads/*.java
cd examples/jdk17-http-demo && mvn test
cd ../../web && npm ci && npm test && npm run build

Build the docs site:
cd docs
npm install
npm run docs:build

For changes touching collector profiling, ingestion, ClickHouse storage, backend query APIs, deployment, the demo service, or profile UI, run real Kubernetes acceptance. See Contributing and the Real Profiling Acceptance Standard.
- Online docs
- 中文文档
- Quickstart
- Analyze a Java service
- Enable profiling
- Deploy and operate the platform
- Development setup
- Localization policy
- Architecture
The first version does not include non-Java profiling, OpenJ9 support, heap dump analysis, distributed ClickHouse, tracing, log analysis, service maps, dashboarding, alerting, or Prometheus metric storage.
Metrics may be exposed by collector/backend exporters, but Prometheus-compatible systems own metric storage, dashboards, alerting, and retention.
