makriss/deepscope
deepscope

Granular in-service Node.js monitoring that tells you which subsystem is causing high memory or CPU — not just that something is wrong.



What & Why

Standard observability tools (APM platforms, default prom-client metrics) are excellent at alerting you that memory is high or that the process is under load. What they rarely tell you is which subsystem inside your service is responsible. During an incident you need to know: is it the logger pipeline backing up? The Kafka consumer batch processor? Mongoose operations on a hot collection? The external API call pool?

deepscope adds a targeted instrumentation layer that sits inside your process and exposes exactly that signal. It tracks HTTP request queue depth, Kafka consumer lag and batch timing, MongoDB per-collection operation latency, outbound HTTP call duration by host, event-loop time attribution per async subsystem, and your logger pipeline's queue depth and drop rate — all surfaced as Prometheus metrics that any scraper can collect.

The library was extracted from a production monitoring layer built to triage a real incident: a downstream log-shipping agent began applying backpressure on writes, causing the in-process logger queue to grow without bound, which exhausted the heap and eventually OOM-crashed the service. Neither the APM tool nor the default prom-client Node.js metrics could localize the cause — they showed "memory is rising" but not "the logger queue has 400,000 pending messages." deepscope was built to fill that gap. It is now a standalone, generic library you can drop into any Node.js service.


Features

  • 7 first-party monitors: process, eventLoop, httpQueue, mongo, kafka, externalCall, logger
  • Tiered model: Tier 1 metrics are always-on and cheap; Tier 2 metrics are togglable at runtime for deeper per-operation instrumentation during an incident
  • Pluggable architecture: extend [BaseMonitor](src/core/baseMonitor.logic.ts) for lifecycle scaffolding, or implement IMonitorPlugin from scratch
  • Pull-based via prom-client: metrics are exposed through a standard Registry you own — wire it to any Prometheus scraper or your existing /metrics endpoint
  • Optional debug endpoint: opt-in worker-thread HTTP server for heap snapshots, CPU profiles, and runtime tier toggling — off by default, localhost-bound
  • No global state: all state lives on a Deepscope instance; safe to run multiple instances in the same process
  • TypeScript-first: strict mode, ESM output, full type exports
  • Peer-dep optional: prom-client is the only required peer; express, mongoose, kafkajs, axios are optional and only activated when you call .use() with their monitor

Installation

npm install deepscope prom-client

# Optional peers — install only the ones you need:
npm install express       # for httpQueueMonitor
npm install mongoose      # for mongoMonitor
npm install kafkajs       # for kafkaMonitor
npm install axios         # for externalCallMonitor

prom-client ^15.0.0 is the only required peer dependency. The framework/library peers are optional — deepscope never imports them at module load time; it receives the instance you pass into the factory function.


Quick start

import express from 'express';
import { Registry } from 'prom-client';
import {
  Deepscope,
  processMonitor,
  httpQueueMonitor,
  kafkaMonitor,
  loggerMonitor,
} from 'deepscope';
import type { ILoggerStatsProvider } from 'deepscope';

// Your express app and Kafka consumer (created elsewhere)
const app = express();
const consumer = getKafkaConsumer(); // kafkajs Consumer

// Adapt your logger to the ILoggerStatsProvider interface
const loggerAdapter: ILoggerStatsProvider = {
  getQueueDepth: () => myLogger.internalQueue.length,
  getDroppedCount: () => myLogger.droppedCount,
  getEnqueueRate: () => myLogger.enqueueRate,
};

const registry = new Registry();

const scope = new Deepscope({ registry, serviceName: 'my-service', tier: 'tier1' })
  .use(processMonitor())
  .use(httpQueueMonitor({ app }))
  .use(kafkaMonitor({ consumer }))
  .use(loggerMonitor({ provider: loggerAdapter, pollIntervalMs: 5000 }))
  .start();

// Expose metrics on your existing express app
app.get('/metrics', async (_req, res) => {
  res.set('Content-Type', registry.contentType);
  res.end(await registry.metrics());
});

// Upgrade to tier2 during an incident for deeper instrumentation
scope.setTier('tier2');

// Graceful shutdown
process.on('SIGTERM', async () => {
  await scope.stop();
});

API overview

Deepscope class

import { Deepscope } from 'deepscope';
import type { IDeepscopeOptions } from 'deepscope';

Constructor

new Deepscope(opts: IDeepscopeOptions)

interface IDeepscopeOptions {
  readonly registry: Registry;      // prom-client Registry you own
  readonly serviceName: string;     // injected as the `service` label on every metric
  readonly tier?: Tier;             // 'tier1' (default) | 'tier2'
}

Methods

| Method | Signature | Description |
| --- | --- | --- |
| `.use()` | `use(plugin: IMonitorPlugin): this` | Register a monitor. Must be called before `.start()`. Returns `this` for chaining. |
| `.start()` | `start(): this` | Register and start all monitors for the current tier. |
| `.stop()` | `stop(): Promise<void>` | Stop all monitors and shut down the debug endpoint if running. |
| `.setTier()` | `setTier(tier: Tier): void` | Toggle between `'tier1'` and `'tier2'` at runtime. Starts/stops tier2 monitors accordingly. |
| `.getTier()` | `getTier(): Tier` | Returns the current active tier. |
| `.takeHeapSnapshot()` | `takeHeapSnapshot(): Readable` | Returns a readable stream of a V8 heap snapshot. Drain it to a `.heapsnapshot` file. |
| `.startCpuProfile()` | `startCpuProfile(durationMs: number): Promise<string>` | Runs a CPU profile for `durationMs` milliseconds and returns the profile JSON as a string. |
| `.enableDebugEndpoint()` | `enableDebugEndpoint(opts: IDebugEndpointOptions): void` | Starts the opt-in debug HTTP server in a worker thread. |
| `.debugPort()` | `debugPort(): number \| null` | Returns the debug server's port, or `null` when the endpoint is not running. |

BaseMonitor — extension base class

[src/core/baseMonitor.logic.ts](src/core/baseMonitor.logic.ts)

import { BaseMonitor } from 'deepscope';
import type { IPluginContext, Tier } from 'deepscope';

abstract class BaseMonitor implements IMonitorPlugin {
  abstract readonly name: string;
  abstract readonly tier: Tier;

  // Final lifecycle methods (do not override):
  register(ctx: IPluginContext): void { /* calls onRegister */ }
  start(): void                       { /* calls onStart */ }
  stop(): void                        { /* calls onStop */ }

  // Override these in your subclass:
  protected abstract onRegister(ctx: IPluginContext): void;
  protected abstract onStart(): void;
  protected abstract onStop(): void;
}

Lifecycle order: `register(ctx)` → `start()` → `stop()`. Create metrics in `onRegister`, attach listeners in `onStart`, detach in `onStop`. `stop()` is safe to call multiple times.

IMonitorPlugin — minimal interface

import type { IMonitorPlugin, IPluginContext, Tier } from 'deepscope';

interface IMonitorPlugin {
  readonly name: string;
  readonly tier: Tier;
  register(ctx: IPluginContext): void;
  start(): void;
  stop(): void;
}

Implement this directly if you don't want the BaseMonitor lifecycle scaffolding.
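As an illustration, a minimal monitor implementing the interface directly. The types are re-declared locally so the sketch is self-contained (in a real project, import them from deepscope), and the RSS-sampling behavior is invented for the example:

```typescript
// Local re-declarations for a self-contained sketch; in real use,
// import these types from 'deepscope' instead.
type Tier = 'tier1' | 'tier2';
interface IPluginContext { metrics: unknown }
interface IMonitorPlugin {
  readonly name: string;
  readonly tier: Tier;
  register(ctx: IPluginContext): void;
  start(): void;
  stop(): void;
}

// A bare-bones monitor that samples process RSS on a timer.
class RssSampleMonitor implements IMonitorPlugin {
  readonly name = 'rss-sample';
  readonly tier: Tier = 'tier1';
  private timer: NodeJS.Timeout | null = null;
  lastRss = 0;

  register(_ctx: IPluginContext): void {
    // a real monitor would create its gauge from ctx.metrics here
  }

  start(): void {
    this.lastRss = process.memoryUsage().rss; // take an immediate first sample
    this.timer = setInterval(() => {
      this.lastRss = process.memoryUsage().rss;
    }, 5000);
  }

  stop(): void {
    if (this.timer) clearInterval(this.timer);
    this.timer = null; // safe to call repeatedly
  }

  isRunning(): boolean {
    return this.timer !== null;
  }
}
```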

IMetricFactory — metric creation

interface IMetricFactory {
  counter(opts: ICounterOpts): ICounter;
  gauge(opts: IGaugeOpts): IGauge;
  histogram(opts: IHistogramOpts): IHistogram;
}

Available through ctx.metrics inside onRegister. Never import prom-client directly in your monitors — use this factory so the backend remains swappable.
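One payoff of the factory indirection: monitors can be unit-tested against an in-memory double with no prom-client at all. A sketch — the counter/gauge option shapes here are inferred from this README and may not match deepscope exactly:

```typescript
// In-memory stand-in for IMetricFactory, useful in monitor unit tests.
// Option/interface shapes are assumptions based on this README.
type Labels = Record<string, string>;
interface ICounter { inc(labels: Labels, value?: number): void }
interface IGauge { set(labels: Labels, value: number): void }

class InMemoryMetrics {
  // keyed by a Prometheus-style series name, e.g. depth{queue="mail"}
  readonly values = new Map<string, number>();

  private key(name: string, labels: Labels): string {
    const pairs = Object.entries(labels).map(([k, v]) => `${k}="${v}"`);
    return `${name}{${pairs.join(',')}}`;
  }

  counter(opts: { name: string }): ICounter {
    return {
      inc: (labels, value = 1) => {
        const k = this.key(opts.name, labels);
        this.values.set(k, (this.values.get(k) ?? 0) + value);
      },
    };
  }

  gauge(opts: { name: string }): IGauge {
    return {
      set: (labels, value) => this.values.set(this.key(opts.name, labels), value),
    };
  }
}
```

A test can then drive a monitor's lifecycle and assert directly on `values`, with no registry or scrape involved.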


Built-in monitors

| Monitor | Factory | Tier | Peer dep | Key metrics |
| --- | --- | --- | --- | --- |
| ProcessMonitor | `processMonitor()` | tier1 | | `nodejs_heap_space_used_bytes`, `nodejs_rss_bytes`, `nodejs_heap_used_vs_total_ratio`, `nodejs_gc_runs_total`, `nodejs_gc_pause_seconds`, `nodejs_event_loop_lag_seconds`, `nodejs_event_loop_utilization_ratio`, `nodejs_active_handles` |
| EventLoopMonitor | `eventLoopMonitor()` | tier2 | | `nodejs_eventloop_time_share_ratio`, `nodejs_eventloop_time_seconds_total`, `nodejs_async_resource_active`, `nodejs_async_resource_created_total` |
| HttpQueueMonitor | `httpQueueMonitor({ app })` | tier1 | express | `http_request_queue_depth`, `http_request_queue_seconds`, `http_active_requests` |
| MongoMonitor | `mongoMonitor({ mongoose })` | tier2 | mongoose | `mongodb_op_seconds`, `mongodb_op_errors_total` |
| KafkaMonitor | `kafkaMonitor({ consumer })` | tier1 | kafkajs | `kafka_consumer_lag`, `kafka_consumer_batch_size`, `kafka_consumer_batch_seconds`, `kafka_consumer_messages_total`, `kafka_consumer_errors_total` |
| ExternalCallMonitor | `externalCallMonitor({ axios })` | tier2 | axios | `external_http_seconds`, `external_http_total`, `external_http_errors_total` |
| LoggerMonitor | `loggerMonitor({ provider })` | tier1 | | `logger_queue_depth`, `logger_dropped_total`, `logger_enqueue_rate`\* |

*logger_enqueue_rate is only emitted when the provider implements getEnqueueRate().

For the full metric catalog including label names, types, and bucket definitions, see [docs/metrics.md](docs/metrics.md).


Tiered monitoring

deepscope uses a two-tier model so you can keep production overhead low and escalate instrumentation depth only when needed.

Tier 1 — always-on

Tier 1 monitors run continuously. They are designed to be cheap: polling intervals are coarse, they use perf_hooks and process.memoryUsage() rather than async hooks, and they do not intercept every operation. These are the metrics you leave on in production at all times.

Tier 2 — togglable

Tier 2 monitors add per-operation granularity: Mongoose per-collection query timing, Axios per-host call latency, and async_hooks-based event-loop attribution by subsystem. These have higher overhead and are intended to be enabled on-demand — either when an incident is in progress, or in non-production environments.

Switching tiers

// At construction time (default is 'tier1')
const scope = new Deepscope({ registry, serviceName: 'svc', tier: 'tier2' }).use(...).start();

// At runtime — programmatic
scope.setTier('tier2');  // enables tier2 monitors
scope.setTier('tier1');  // stops tier2 monitors

// At runtime — via debug endpoint (see below)
// POST http://127.0.0.1:9091/tier?value=tier2

Tier transitions are safe to call while the process is under load. Tier 2 monitors are stopped cleanly (listeners detached) when you downgrade back to tier 1.
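One practical use of runtime switching is automatic escalation under memory pressure. A small pure helper keeps the decision testable; the thresholds below are illustrative, not library defaults, and the gap between them is deliberate hysteresis so the tier does not flap:

```typescript
type Tier = 'tier1' | 'tier2';

// Decide the target tier from heap pressure. 0.85 / 0.6 are illustrative
// thresholds; the dead band between them prevents rapid tier flapping.
function nextTier(heapUsed: number, heapTotal: number, current: Tier): Tier {
  const ratio = heapUsed / heapTotal;
  if (ratio > 0.85) return 'tier2'; // escalate under memory pressure
  if (ratio < 0.6) return 'tier1';  // de-escalate once pressure clears
  return current;                   // hold inside the dead band
}

// Wiring sketch (assumes `scope` is a started Deepscope instance):
// setInterval(() => {
//   const { heapUsed, heapTotal } = process.memoryUsage();
//   const target = nextTier(heapUsed, heapTotal, scope.getTier());
//   if (target !== scope.getTier()) scope.setTier(target);
// }, 30_000);
```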


Debug endpoint

The debug endpoint is off by default. Enable it explicitly:

scope.enableDebugEndpoint({ port: 9091 });           // binds to 127.0.0.1 by default
scope.enableDebugEndpoint({ port: 9091, host: '0.0.0.0' }); // explicit external bind

The endpoint runs in a worker thread so it remains responsive even when the main thread is under load — which is exactly the scenario where you need it most.

Routes

| Method | Path | Description |
| --- | --- | --- |
| GET | `/healthz` | Returns `ok`. Use to confirm the worker is alive. |
| GET | `/heap` | Triggers a V8 heap snapshot on the main thread and streams the `.heapsnapshot` file as a download. Allow up to 2 minutes. |
| POST | `/cpu?ms=<duration>` | Runs a CPU profile for `ms` milliseconds (default 10000) and returns the `.cpuprofile` JSON as a download. |
| POST | `/tier?value=<tier>` | Sets the active tier. `value` must be `tier1` or `tier2`. |

Security note: there is no built-in authentication. The default bind address is 127.0.0.1. Do not bind 0.0.0.0 in production without placing the port behind a reverse proxy, VPN, or SSH tunnel that provides its own auth layer. This endpoint is a diagnostics surface — treat it accordingly.

The programmatic API (takeHeapSnapshot(), startCpuProfile(), setTier()) works independently of the HTTP server and is the preferred approach when you can deploy a code change.


Extending: write your own monitor

Any subsystem not covered by the built-in monitors can be instrumented by extending BaseMonitor. The extension contract is the same whether you are adding a first-party monitor or a user-written one — Deepscope treats them identically.

Example: a BullMQ job queue monitor

import { BaseMonitor } from 'deepscope';
import type { IPluginContext, Tier, IGauge, ICounter } from 'deepscope';
import type { Queue } from 'bullmq'; // type-only import; instance passed at runtime

interface IBullMQMonitorOptions {
  queue: Queue;
  pollIntervalMs?: number;
}

class BullMQMonitor extends BaseMonitor {
  readonly name = 'bullmq';
  readonly tier: Tier = 'tier1';

  private waitingGauge!: IGauge;
  private activeGauge!: IGauge;
  private failedTotal!: ICounter;
  private timer: NodeJS.Timeout | null = null;
  private lastFailedCount = 0;

  constructor(private readonly opts: IBullMQMonitorOptions) {
    super();
  }

  protected onRegister(ctx: IPluginContext): void {
    const f = ctx.metrics;
    this.waitingGauge = f.gauge({
      name: 'bullmq_jobs_waiting',
      help: 'Jobs currently waiting in the queue',
      labelNames: ['queue'],
    });
    this.activeGauge = f.gauge({
      name: 'bullmq_jobs_active',
      help: 'Jobs currently being processed',
      labelNames: ['queue'],
    });
    this.failedTotal = f.counter({
      name: 'bullmq_jobs_failed_total',
      help: 'Total failed jobs since start',
      labelNames: ['queue'],
    });
  }

  protected onStart(): void {
    const interval = this.opts.pollIntervalMs ?? 5000;
    this.timer = setInterval(() => {
      this.poll().catch(() => {
        // swallow transient errors (e.g. Redis unavailable); next poll retries
      });
    }, interval);
  }

  // Note: in BullMQ, per-job 'failed' events are emitted by QueueEvents (or a
  // Worker), not by Queue — so the failure counter is derived from polled counts.
  private async poll(): Promise<void> {
    const counts = await this.opts.queue.getJobCounts('waiting', 'active', 'failed');
    const name = this.opts.queue.name;
    this.waitingGauge.set({ queue: name }, counts.waiting ?? 0);
    this.activeGauge.set({ queue: name }, counts.active ?? 0);

    const failed = counts.failed ?? 0;
    if (failed > this.lastFailedCount) {
      this.failedTotal.inc({ queue: name }, failed - this.lastFailedCount);
    }
    this.lastFailedCount = failed;
  }

  protected onStop(): void {
    if (this.timer) clearInterval(this.timer);
    this.timer = null;
  }
}

// Export a factory function (same pattern as first-party monitors)
export function bullmqMonitor(opts: IBullMQMonitorOptions): BullMQMonitor {
  return new BullMQMonitor(opts);
}

Register it the same way as any other monitor:

scope.use(bullmqMonitor({ queue: myQueue }));

Rules for user-written monitors:

  • Declare all metrics in onRegister, never in onStart or hot paths
  • Use only ctx.metrics (the IMetricFactory) — never import prom-client directly
  • Accept peer-dep instances via constructor options — never import library packages at module top level
  • Name must be unique within a Deepscope instance
  • onStop must be idempotent

Logger integration

LoggerMonitor is decoupled from any specific logger via the ILoggerStatsProvider interface:

interface ILoggerStatsProvider {
  getQueueDepth(): number;       // required: current queue size
  getDroppedCount(): number;     // required: cumulative dropped messages since process start
  getEnqueueRate?(): number;     // optional: msgs/sec; omit if your logger doesn't track it
}

Write a thin adapter for your logger and pass it to loggerMonitor:

import type { ILoggerStatsProvider } from 'deepscope';

// Hypothetical logger with an internal stats object
class MyLoggerAdapter implements ILoggerStatsProvider {
  constructor(private readonly logger: MyLogger) {}

  getQueueDepth(): number {
    return this.logger.stats.pendingCount;
  }

  getDroppedCount(): number {
    return this.logger.stats.totalDropped;
  }

  getEnqueueRate(): number {
    return this.logger.stats.enqueuePerSecond;
  }
}

scope.use(loggerMonitor({ provider: new MyLoggerAdapter(myLogger), pollIntervalMs: 5000 }));

The adapter pattern means deepscope has no compile-time dependency on any specific logger. Any logger that can expose these three numbers — even via a custom stats scraper — can be monitored.
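If your logger exposes no stats at all, a thin counting wrapper around its queue is enough to satisfy the two required methods. A generic sketch — the capacity-bounded queue below is illustrative, not part of deepscope:

```typescript
// A capacity-bounded queue that counts drops. Because it exposes
// getQueueDepth() and getDroppedCount(), it satisfies the required part of
// ILoggerStatsProvider directly. Illustrative only; adapt to your logger.
class CountingLogQueue<T> {
  private readonly items: T[] = [];
  private dropped = 0;

  constructor(private readonly capacity: number) {}

  push(item: T): boolean {
    if (this.items.length >= this.capacity) {
      this.dropped += 1; // full: count the drop instead of growing without bound
      return false;
    }
    this.items.push(item);
    return true;
  }

  shift(): T | undefined {
    return this.items.shift();
  }

  getQueueDepth(): number {
    return this.items.length;
  }

  getDroppedCount(): number {
    return this.dropped;
  }
}
```

Since the wrapper implements the two required methods itself, it can be passed to `loggerMonitor` as the provider directly, with no separate adapter class.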


Contributing

Open an issue first for any non-trivial change so the approach can be agreed before implementation. Follow the conventions in CLAUDE.md strictly — in particular the file-naming convention, OOP/SOLID requirements, and the rule that every metric change (add, rename, remove) must update docs/metrics.md in the same commit and keep the reference Grafana dashboard in docs/grafana/ consistent.

Bug fixes and monitor additions are welcome. For new monitors, follow the pattern in src/monitors/ and add coverage in both test/unit/monitors/ and test/integration/.


Roadmap

Items out of scope for the current release, but tracked for future work:

  • Fastify, Koa, and Hapi adapters for httpQueueMonitor
  • OpenTelemetry and StatsD / Datadog metric backends (the IMetricFactory chokepoint makes this a one-file swap)
  • Auto-detection of installed peer dependencies (deliberately excluded — explicit .use() calls only)
  • Hosting the /metrics scrape endpoint (the host application owns this)

License

MIT
