OrchEngineX

Structural Fragility Modeling for Distributed Systems
Model, simulate, and quantify cascading failure risk — before production.

Platform · Documentation · Research · Simulations · CLI


What is OrchEngineX?

OrchEngineX is an infrastructure simulation engine that helps engineers understand how failures propagate through distributed systems.

Instead of looking at services in isolation, OrchEngineX models your architecture as a dependency graph and simulates how latency, packet loss, or service failures ripple across the system.

This makes it possible to identify hidden reliability risks — such as cascading failures and critical bottlenecks — before they appear in production.

The core thesis: Most production outages aren't caused by hardware failures — they're caused by structural coordination problems that are invisible to monitoring dashboards. OrchEngineX makes those structural risks visible and quantifiable before they reach production.

Key Capabilities

| Capability | Description |
| --- | --- |
| Architecture Builder | Visual DWFG editor with 22 node types across 7 infrastructure layers |
| Fragility Scoring | Composite risk model (0–100) combining topology, mechanics, and data plane analysis |
| Simulation Engine | Node removal, cascade propagation, retry storm, and partition sensitivity modeling |
| System Mechanics | 6-engine analysis: compute, consistency, replication, transactions, distribution, flow control |
| Data Plane Simulator | Consistency/replication/sharding/transaction impact modeling |
| Growth Mode Analysis | Scaling efficiency prediction with coordination overhead estimation |
| Failure Case Studies | Reproducible research on pool saturation cascades, queue collapse, and more |
| CLI Tooling | Full command-line interface for automation, CI/CD integration, and batch analysis |

Architecture Model

OrchEngineX models distributed systems as Directed Weighted Failure Graphs (DWFGs) — a graph-theoretic representation where:

  • Nodes represent infrastructure components (API gateways, databases, caches, message brokers, etc.)
  • Edges represent communication paths with latency, packet loss, retry, and timeout characteristics
  • Weights encode failure probability, resource capacity, and retry amplification potential

Node Types (22 types across 7 layers)

| Layer | Node Types |
| --- | --- |
| Edge | CDN, Edge Cache, Geo DNS / Traffic Router |
| Ingress | API Gateway, Load Balancer, Reverse Proxy, WAF |
| Compute | Stateless Service, Stateful Service, Background Worker, Batch Processor |
| Data | Database (Primary), Replica Set, Cache, Distributed KV Store, Search Index |
| Messaging | Message Broker, Stream Processor, Dead Letter Queue |
| Mesh | Service Mesh Proxy |
| Egress | External API, Payment Gateway, Third-Party Service |

Edge Types

| Type | Description | Default Timeout |
| --- | --- | --- |
| sync | Synchronous request/response | 3,000ms |
| async | Asynchronous event delivery | 30,000ms |
| replication | Data replication between stores | 60,000ms |
| mesh-hop | Service mesh sidecar routing | 5,000ms |

System Constraints

  • Maximum 40 nodes per architecture
  • Maximum 80 edges per architecture
  • No self-loops permitted
  • Unique node and edge IDs enforced
  • File size limit: 2MB for JSON imports
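
These limits can be checked offline before an import. The sketch below is illustrative only and is not the actual `oex validate` implementation:

```python
# Minimal offline check of the documented structural constraints.
# Illustrative sketch; not the actual `oex validate` logic.
MAX_NODES, MAX_EDGES = 40, 80

def check_constraints(arch: dict) -> list[str]:
    errors = []
    nodes, edges = arch.get("nodes", []), arch.get("edges", [])
    if len(nodes) > MAX_NODES:
        errors.append(f"too many nodes: {len(nodes)} > {MAX_NODES}")
    if len(edges) > MAX_EDGES:
        errors.append(f"too many edges: {len(edges)} > {MAX_EDGES}")
    node_ids = [n["id"] for n in nodes]
    if len(set(node_ids)) != len(node_ids):
        errors.append("duplicate node IDs")
    edge_ids = [e["id"] for e in edges]
    if len(set(edge_ids)) != len(edge_ids):
        errors.append("duplicate edge IDs")
    for e in edges:
        if e["source"] == e["target"]:  # self-loops are not permitted
            errors.append(f"self-loop on edge {e['id']}")
    return errors

arch = {"nodes": [{"id": "a"}, {"id": "b"}],
        "edges": [{"id": "e1", "source": "a", "target": "a"}]}
print(check_constraints(arch))  # → ['self-loop on edge e1']
```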

Fragility Score

The Architecture Fragility Score is a composite metric (0–100) that quantifies how susceptible a distributed architecture is to cascading failure. It is computed identically across the web UI, API, and CLI.

Composition

Fragility = Cascade Susceptibility × 0.30   (topology)
          + Partition Sensitivity  × 0.25   (data plane)
          + Quorum Fragility       × 0.20   (quorum)
          + Retry Storm Potential  × 0.15   (mechanics)
          + SPOF Penalty                    (nodes × 3, max 15)
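
As an illustration of how the weights combine, here is a minimal recomputation of the composite score. It assumes each sub-score is on a 0–100 scale and that the result is rounded and clamped to 100; the rounding and clamping behavior are assumptions, not documented semantics:

```python
# Illustrative recomputation of the composite fragility score.
# Rounding and clamping to 100 are assumptions for this sketch.
def fragility(cascade, partition, quorum, retry_storm, spof_count):
    score = (cascade * 0.30          # topology
             + partition * 0.25      # data plane
             + quorum * 0.20         # quorum
             + retry_storm * 0.15    # mechanics
             + min(spof_count * 3, 15))  # SPOF penalty, capped at 15
    return min(round(score), 100)

print(fragility(38, 45, 50, 31, 2))  # → 43
```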

Sub-Scores

| Factor | Weight | What It Measures |
| --- | --- | --- |
| Cascade Susceptibility | 30% | Fan-out amplification × dependency density × SCC cycle penalty |
| Partition Sensitivity | 25% | Data plane contribution from consistency/replication/sharding/transaction configs |
| Quorum Fragility | 20% | Risk from nodes with replicaCount < 3 (quorum threshold) |
| SPOF Penalty | +3/node | Single points of failure detected via BFS disconnection analysis |
| Retry Storm Potential | 15% | Retry amplification risk from retry policies across all edges |
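
The BFS disconnection analysis behind the SPOF penalty can be sketched as a plain reachability check: a node is a single point of failure if removing it makes some otherwise-reachable node unreachable from the entry point. The entry node and graph below are hypothetical:

```python
from collections import deque

def reachable(adj, start, removed):
    """BFS over adj, treating `removed` as deleted from the graph."""
    seen, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, []):
            if v != removed and v not in seen:
                seen.add(v)
                queue.append(v)
    return seen

def find_spofs(adj, entry):
    nodes = set(adj) | {v for vs in adj.values() for v in vs}
    baseline = reachable(adj, entry, removed=None)
    spofs = []
    for n in sorted(nodes - {entry}):
        # SPOF if removing n disconnects any other baseline-reachable node
        if (baseline - {n}) - reachable(adj, entry, removed=n):
            spofs.append(n)
    return spofs

# gateway -> orders service -> primary DB; the service is the DB's only path.
adj = {"api-gateway": ["svc-orders"], "svc-orders": ["db-primary"]}
print(find_spofs(adj, "api-gateway"))  # → ['svc-orders']
```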

Risk Thresholds

| Score | Classification | Interpretation |
| --- | --- | --- |
| 0–39 | 🟢 Low | Architecture has structural redundancy and controlled retry geometry |
| 40–69 | 🟡 Moderate | Some structural risks present — review SPOFs and retry policies |
| 70–100 | 🔴 High | Architecture is structurally fragile — cascading failure likely under stress |

Simulation Engine

Node Removal Impact

Simulates the removal of one or more nodes and computes:

  • Availability Drop — percentage of unreachable nodes post-failure
  • Latency Shift — propagated latency increase through dependent paths
  • Retry Amplification Delta — multiplier effect from retry policies on failed paths
  • Partition Risk Delta — change in network partition sensitivity

Cascade Propagation

Models how failure spreads through the graph using:

  1. BFS traversal from failed node(s)
  2. Weighted propagation based on edge failure probabilities
  3. Retry amplification at each hop
  4. Timeout threshold enforcement
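
The four steps above can be sketched as a weighted BFS. Everything concrete here is an illustrative assumption: the edges point from a dependency to its dependents, retries multiply the load a failing hop generates, and a fixed cutoff stands in for the timeout-enforcement step:

```python
from collections import deque

def cascade(edges, failed, threshold=0.2):
    """edges: (source, target, failure_prob 0-1, retries) tuples,
    oriented dependency -> dependent. Returns impact per node (0-1)."""
    out = {}
    for src, dst, p, retries in edges:
        out.setdefault(src, []).append((dst, p, retries))
    impact = {failed: 1.0}           # step 1: BFS from the failed node
    queue = deque([failed])
    while queue:
        u = queue.popleft()
        for v, p, retries in out.get(u, []):
            # step 2: weight by edge failure probability
            # step 3: amplify by the retry policy at this hop
            propagated = impact[u] * p * (1 + retries)
            # step 4: cutoff stands in for timeout enforcement
            if propagated > impact.get(v, 0.0) and propagated >= threshold:
                impact[v] = min(propagated, 1.0)
                queue.append(v)
    return impact

edges = [("db-primary", "svc-orders", 0.3, 2),
         ("svc-orders", "api-gateway", 0.2, 1)]
print(cascade(edges, "db-primary"))
```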

Growth Mode

Predicts the efficiency of horizontal scaling by analyzing:

  • Load distribution per replica
  • Retry propagation fan-out under scale
  • Coordination overhead (consensus latency for stateful nodes)
  • Mesh hop latency increase from additional replicas
  • Overall scaling efficiency (0–100)
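
One common way to model coordination-limited scaling is a Universal Scalability Law-style curve, where contention and crosstalk terms erode the gains from each added replica. The sketch below uses that as a stand-in; it is not necessarily the model OrchEngineX implements, and the coefficients are arbitrary:

```python
# Illustrative USL-style stand-in for coordination-limited scaling.
# Coefficients are arbitrary; not the OrchEngineX growth model.
def scaling_efficiency(n, contention=0.05, crosstalk=0.01):
    """Efficiency (0-100) of n replicas relative to ideal linear scaling."""
    throughput = n / (1 + contention * (n - 1) + crosstalk * n * (n - 1))
    return round(100 * throughput / n)

for n in (1, 2, 8, 32):
    print(n, scaling_efficiency(n))  # efficiency decays as n grows
```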

System Mechanics

Six composable simulation engines model different aspects of distributed system behavior:

| Engine | What It Models |
| --- | --- |
| Compute | CPU/memory pressure, thread pool exhaustion, GC amplification |
| Consistency | Linearizable vs. eventual consistency trade-offs, read/write conflict rates |
| Replication | Sync vs. async replication lag, split-brain probability, failover timing |
| Transaction | 2PC vs. Saga coordination overhead, lock contention, deadlock probability |
| Distribution | Hash vs. range vs. directory sharding, hotspot probability, rebalance cost |
| Flow Control | Backpressure propagation, buffer saturation, admission control effectiveness |

Each engine produces a composite risk score that feeds into the architecture's overall fragility assessment.


Data Plane Simulator

Models the impact of data layer configuration choices:

Configuration Axes

| Axis | Options |
| --- | --- |
| Consistency | linearizable, sequential, causal, eventual |
| Replication | sync-all, sync-quorum, async-primary, async-mesh |
| Sharding | none, hash, range, directory |
| Transaction | 2pc, saga, tcc, none |

Impact Metrics

  • Latency Impact — additional latency from coordination
  • Availability Impact — availability reduction from consistency requirements
  • Partition Tolerance — behavior during network splits
  • Throughput Impact — write/read throughput changes

CLI

The oex CLI provides full command-line access to OrchEngineX capabilities and is distributed as the @orchenginex/cli npm package.

Installation

npm install -g @orchenginex/cli

Authentication

# Browser-based OAuth (recommended)
oex login

# Token-based (CI/CD environments)
oex login --token <your-api-token>

# Verify authentication
oex whoami

Commands

Architecture Management

# List all saved architectures
oex arch list

# Import architecture from JSON
oex arch import ./my-architecture.json --name "Production Topology"

# Inspect architecture with structural analysis
oex arch inspect <architecture-id>

# Delete an architecture
oex arch delete <architecture-id> --force

oex arch inspect output:

  Production Topology
  ─────────────────────────────────────
  Nodes:                 18
  Edges:                 24
  Density:               15.7%
  Fan-Out Index:         2.4
  Critical Path Depth:   6
  Spectral Radius:       3.1

  Fragility Score:        42/100
  Cascade Susceptibility: 38
  Partition Sensitivity:  45
  Retry Amp Risk:         31

  SPOFs:                 db-primary, api-gateway

Fragility Analysis

# Get current fragility score
oex fragility score <architecture-id>

# View fragility trend (historical snapshots)
oex fragility trend <architecture-id> --limit 20

# Export trend data
oex fragility trend <architecture-id> --output trend.csv --format csv

Simulation

# Run failure simulation
oex simulate --failure-node api-gateway --latency 150 --packet-loss 5 --retries 3

# Run with saved architecture
oex simulate --arch <architecture-id> --failure-node db-primary

Experiments (Matrix Sweeps)

# Run parameter sweep experiment
oex experiments run --arch <architecture-id>

# List experiment runs
oex experiments list

# Generate experiment report
oex experiments report <experiment-id>

Visualization

# Generate architecture graph
oex visualize graph <architecture-id> --format svg --output arch.svg
oex visualize graph <architecture-id> --format png --output arch.png

# Generate fragility heatmap
oex visualize heatmap <architecture-id> --format svg

# Generate trend chart
oex visualize trend <architecture-id> --format svg

Data Export

# Export architecture data
oex export architecture <id> --format json
oex export architecture <id> --format yaml

# Export simulation results
oex export simulation <id> --format csv

# Export fragility snapshots
oex export fragility <id> --format csv

Project Scaffolding

# Initialize new project with starter template
oex init

# Validate architecture JSON offline
oex validate ./architecture.json

# Direct structural analysis
oex analyze structure ./architecture.json

Rate Limits

| Endpoint | Limit |
| --- | --- |
| /simulate | 30 req/min |
| /visualize | 20 req/min |
| /data-api | 60 req/min |
| /experiments | 10 req/min |

Configuration

  • Project manifest: .oex.yml
  • Credentials: ~/.oex/credentials.json (0o600 permissions)
  • API token env var: OEX_API_TOKEN

Architecture JSON Schema

Architectures are defined as JSON files conforming to the CustomArchitecture interface:

{
  "name": "My Architecture",
  "nodes": [
    {
      "id": "gw-1",
      "type": "api-gateway",          // one of 22 node types
      "label": "API Gateway",
      "latencyBaseline": 5,            // ms
      "retryPolicy": 2,               // max retries
      "failureProbability": 1,         // 0-100%
      "resourceCapacity": 95,         // 0-100%
      "replicaCount": 2,              // optional, default 1
      "dataPlaneConfig": {             // optional
        "consistency": "eventual",
        "replication": "async-primary",
        "sharding": "hash",
        "transaction": "saga"
      }
    }
  ],
  "edges": [
    {
      "id": "e-1",
      "source": "gw-1",
      "target": "svc-orders",
      "type": "sync",                  // sync | async | replication | mesh-hop
      "latencyDistribution": 10,       // avg ms
      "packetLossProbability": 0.5,    // 0-100%
      "retryPolicy": 2,               // max retries
      "timeoutThreshold": 3000         // ms
    }
  ]
}

See examples/ for complete importable architectures.


Research & Publications

OrchEngineX publishes failure case studies with reproducible simulations:

| Publication | Key Finding |
| --- | --- |
| The 4-Minute Queue Collapse | 10% traffic spike → total system collapse in 3:47 via queue depth amplification |
| Pool Saturation Cascade | 2% latency spike → 20-service cascading failure via connection pool exhaustion |

Each publication includes interactive simulations, architecture graphs, and exportable diagrams.


Project Structure

orchenginex
│   CHANGELOG.md
│   CONTRIBUTING.md
│   LICENSE
│   README.md
│
├── cli
│   └── cli-reference.md
│
├── docs
│   ├── architecture-modeling.md
│   ├── data-plane.md
│   ├── fragility-scoring.md
│   ├── methodology.md
│   ├── simulation-engine.md
│   └── system-mechanics.md
│
├── examples
│   ├── event-driven-pipeline.json
│   ├── full-stack-hft.json
│   ├── microservices-basic.json
│   └── multi-region-ha.json
│
└── .github
    ├── pull_request_template.md
    └── ISSUE_TEMPLATE
        ├── bug_report.md
        └── feature_request.md

Tech Stack

| Layer | Technology |
| --- | --- |
| Frontend | React 18, TypeScript, Vite |
| Styling | Tailwind CSS, shadcn/ui |
| Charts | Recharts |
| Animations | Framer Motion |
| Backend | Supabase (Edge Functions, PostgreSQL, Auth) |
| CLI | Node.js, Commander.js |
| Deployment | Vercel |

Development

Prerequisites

  • Node.js 18+
  • npm or bun

Setup

# Clone the repository
git clone https://github.com/orchenginex/orchenginex.git
cd orchenginex

# Install dependencies
npm install

# Start development server
npm run dev

Contributing

See CONTRIBUTING.md for guidelines on:

  • Reporting bugs and requesting features
  • Submitting pull requests
  • Code style and architecture conventions
  • Adding new node types or simulation engines

License

MIT — see LICENSE for details.


OrchEngineX — Structural Fragility Modeling for Distributed Systems
www.orchenginex.com
