lvonguyen/cloudforge

Cloud Aegis

Enterprise Cloud Governance Platform with Self-Service Provisioning

Cloud Aegis is a reference architecture and implementation for an Internal Developer Platform (IDP) that enables self-service cloud resource provisioning with built-in governance, compliance guardrails, and exception management workflows.

Live Demo | API

About this project — Cloud Aegis demonstrates enterprise security patterns I've designed, built, and operated across identity, infrastructure, governance, and software lifecycle domains throughout my career. My background has always been project-based: assess the current state gaps, design a solution mapped to business requirements, present trade-offs to leadership, then drive implementation hands-on across infra, dev, and ops teams through to production handoff. This project reflects that same end-to-end ownership — I don't stop at design docs, I ship working systems backed by threat models and ADRs (the 19 ADRs capture the same decision-making process I'd use to brief a CISO or engineering VP). It is a portfolio-grade reference architecture, not a production SaaS product — select vertical slices (ServiceNow GRC, JWT auth, S3/SSH remediation) are fully implemented while others are architectural stubs that document the design intent. I use security-focused systems design as my core discipline and agentic coding workflows (Claude Code) as a force multiplier for delivery.

Development rigor — Code quality is enforced through a layered toolchain: golangci-lint with gosec/gocritic/revive in CI, shared coding standards governing Go patterns, error handling, and security rules across all repos, pre-commit hooks blocking credential leaks, and systematic multi-pass QA reviews (quality, security, bug discovery) before merge. The emphasis is on elegant, maintainable code paired with comprehensive documentation and detailed architecture diagrams — minimizing tech debt throughout the SDLC rather than accruing it for later.


[/] Implementation Status

Current State: Active development (~92% feature-complete). Core API functional, GRC integration working, remediation dispatcher operational, CI/CD pipeline hardened with Lighthouse CI budgets, IaC deploy layer with multi-cloud Terraform modules and policy-as-code, self-service portal built and deployed with error states, accessibility, and performance optimizations.

Component | Status | Notes

Core API
HTTP handlers | Done | Full API surface implemented
Configuration | Done | Environment variables + custom YAML loader with env overrides
Health endpoints | Done | /health, /ready, /live

GRC Integration
Provider abstraction | Done | Interface + factory pattern
RSA Archer client | Done | Full workflow integration
ServiceNow GRC | Done | Native integration
PostgreSQL provider | Done | Lightweight option
In-Memory provider | Done | For testing
GetExceptionsByRequestor | Done | Endpoint: GET /exceptions/mine (RBAC: requester+)

Compliance
Framework engine | Done | 20+ frameworks supported
Finding deduplication | Done | Cross-framework dedup
Control mapping | Done | Framework-to-control mapping

AI Integration
Provider abstraction | Done | Claude/OpenAI interface
Risk analysis | Done | AI-powered scoring
Remediation generation | Partial | Basic implementation

AI Governance
OPA engine (embedded) | Done | In-process OPA for agent tool/data-flow control
Agent registry | Done | Observability, status tracking, lifecycle
STRIDE/ATLAS threat models | Done | Structured threat modeling per agent type
Maturity assessment | Done | Governance maturity scoring

Policy Engine
OPA integration | Done | Policy evaluation working
Rego policies | Done | Region, cost, network policies

Observability
Structured logging (zap) | Done | JSON format
Prometheus metrics | Done | /metrics endpoint
OpenTelemetry tracing | Done | 72 handler spans across 24 files; enrichment, threat intel, and AI provider sub-spans; Jaeger in docker-compose (localhost-bound); AEGIS_TRACING_ENABLED + AEGIS_OTLP_ENDPOINT configuration; pprof dev endpoint (127.0.0.1:6060)

Remediation Dispatcher
Executor engine | Done | Concurrent batch execution with semaphore
Handler interface | Done | Remediate, Validate, DryRun, Tier
Network handlers | Done | BlockPublicSSH (SSH/RDP/SG finding types)
Security services | Done | GuardDuty enablement, Azure Defender (stub)
Storage handlers | Done | S3 public access block
Compute handlers | Done | IMDSv2 enforcement
Identity handlers | Done | IAM key rotation (Tier 2)
Secrets handlers | Done | Manual rotation guidance (no-op)
Patching handlers | Done | SSM patch compliance (query-only, Tier 3)
Rollback engine | Done | 48h rollback window, state snapshots
Findings bridge | Done | Temporary bridge to cspm-aggregator types
Execute/Retry UI | Done | useExecuteRemediation mutation hook, button wiring in RemediationQueue + RemediationDetail

Security
Rate limiting | Done | Redis-backed, tier-based, wired into /api/v1 routes
JWT authentication | Done | HS256/RS256 validation, JWKS caching, wired into router
OIDC provider integration | Done | Okta JWKS auto-derived from OKTA_DOMAIN, Entra ID provider interface; full SSO requires Okta app config
Authorization (RBAC) | Done | Role-based middleware (admin/operator/requester), dev header override with enum validation

IaC / Deploy
Terraform modules (compute) | Done | Cloud Run + ECS Fargate + Azure Container Apps
Terraform modules (database) | Done | Cloud SQL + RDS + Azure PostgreSQL
Terraform modules (redis) | Done | Memorystore + ElastiCache + Azure Cache
Rego policies (IaC) | Done | 5 policies, 27 rules (security, cost, network, naming, AI governance)
Policy gate script | Done | terraform plan + conftest pipeline
Deploy Dockerfiles | Done | Multi-stage frontend (nginx) + backend (Go)
Environment configs | Done | Dev environment with GCS remote state

Portal
React SPA (frontend/) | Done | React 19 + Vite 7 + Tailwind CSS v4 + shadcn/ui
36 route pages | Done | Admin, Operator, Requester role views + attack paths + containers
Dark mode | Done | CSS variable overrides, anti-flash script
Cloudflare Pages deploy | Done | cloudaegis-demo.lvonguyen.com
API hook migration | Partial | MyRequests, useFindings (R2 fallback), useAttackPaths (mock fallback), useCostAnomalies cache fix, Execute/Retry mutations wired; remaining hooks fall back to mock on 401 in dev

Risk Intelligence
Contextual risk schema | Done | AttackPathContext, ToxicComboDetails, MITRE fields
LLM severity re-scoring | Done | Claude-powered with blast radius + EPSS + KEV inputs
Severity normalization | Done | Per-CSP normalization (AWS ASFF, Azure, GCP)
Attack path computation | Done | In-memory BFS graph engine + ReactFlow DAG visualization (ADR-008)
EPSS scoring | Done | HTTP client with 12h cache, batch fetching from FIRST API
CISA KEV catalog | Done | In-memory catalog with auto-refresh from CISA feed
GreyNoise integration | Done | HTTP client with 12h cache, classification enrichment

Testing
Unit tests | 1,894+ passing | 34 Go packages (1,474 tests), 420+ frontend tests (51 test files), 8 benchmarks. 3 packages at 100% coverage (workflow, remediation/secrets, finops/aggregator). v8 coverage thresholds (lines: 70, functions: 75, branches: 65)
Integration tests | Done | 12-step server lifecycle + 34-subtest RBAC authorization matrix (go test -tags=integration)

Package Maturity

Package | Status | Description
internal/api | Production | HTTP handlers, RBAC, rate limiting
internal/grc | Production | GRC provider abstraction (Archer, ServiceNow, PostgreSQL, Memory)
internal/compliance | Production | 20+ framework engine, dedup, control mapping
internal/ai | Production | Claude/OpenAI provider abstraction
internal/ai-governance | Production | Embedded OPA, agent registry, STRIDE/ATLAS
internal/policy | Production | OPA integration, Rego evaluation
internal/observability | Production | Structured logging (zap), Prometheus metrics
internal/findings | Production | Finding types, bridge to CSPM aggregator
pkg/remediation | Production | Executor engine, 10 handlers, rollback
internal/cicd | Partial | SAST/VCS interfaces, basic integrations
internal/finops | Production | Cost aggregation (AWS Cost Explorer wirable via FINOPS_PROVIDER=aws), anomaly detection, chargeback
internal/container | Production | K8s topology (Trivy parser wirable via TRIVY_OUTPUT_PATH), image scan interface
internal/secrets | Interface + Mock | Vault integration interface, mock provider
internal/waf | Interface + Mock | Golden template validation, compliance scanner
internal/identity | Interface + Mock | Okta/Entra ID provider stubs with mock returns
internal/workflow | Stub | Temporal workflow definitions, not wired

[+] Quality & Testing

Metric | Value
Go packages | 34 (all passing with -race)
Go tests | 1,474
Frontend tests | 447+ (52 test files)
Benchmarks | 8
CI gates | 8 (lint, gosec, Trivy, vitest, npm audit, integration, Codecov, Lighthouse)
Coverage thresholds | v8 lines: 70%, functions: 75%, branches: 65%
100% coverage packages | workflow, remediation/secrets, finops/aggregator

[!] Known Limitations

This is a platform reference implementation, not production software:

  1. Temporal Workflows — Workflow definitions exist, orchestration layer not wired into request flow
  2. Stub Packages — secrets, waf modules have interfaces and mock implementations but no production wiring
  3. RoleViewer: RoleViewer (rank 0) is implemented with a read-only surface (/findings, /compliance/frameworks, /agents + traces); fine-grained per-resource viewer scoping is not yet enforced
  4. Chrome QA Findings — 35/36 routes passing with error states, focus rings, footer landmark, OG meta tags; React 19 lazy() context edge case under Playwright (pre-existing, not prod)
  5. OIDC Auth Flow — JWT middleware is production-ready (HS256/RS256, JWKS); Okta JWKS URL auto-derives from OKTA_DOMAIN env var. Full SSO login flow requires Okta app configuration.

Production Requirements:

  • Wire Okta/Entra ID OIDC application for full SSO login flow
  • Expand RBAC with fine-grained permissions
  • Test and validate Temporal workflows

[*] What This Solves

Enterprise cloud environments face a constant tension:

  • Developers want fast, self-service access to infrastructure
  • Security needs guardrails, approvals, and audit trails
  • Finance requires cost controls, tagging, and chargeback
  • Compliance demands policy enforcement and exception documentation

Cloud Aegis bridges these needs with a unified platform that provides:

  • Self-service portal for requesting cloud resources
  • Policy-as-code guardrails (OPA/Rego)
  • Golden path Terraform modules (pre-approved, versioned)
  • Exception workflow integration with enterprise GRC tools
  • Multi-cloud support (AWS, Azure, GCP)

[/] Architecture

[Diagram: Cloud Aegis Architecture]


[/] Repository Structure

cloudforge/
├── cmd/
│   ├── server/                    # API server entrypoint
│   └── remediation-dispatcher/    # Remediation dispatcher service
├── internal/
│   ├── ai/                        # AI provider integration (Claude, OpenAI)
│   ├── ai-governance/             # AI governance module (OPA engine, agent registry, STRIDE/ATLAS)
│   ├── api/                       # API handlers and rate limiting
│   ├── cicd/                      # CI/CD security scanning
│   │   ├── sast/                  # SAST integrations (SonarQube, Checkov, Veracode)
│   │   └── vcs/                   # VCS integrations (GitHub, GitLab, Azure DevOps)
│   ├── compliance/                # Compliance frameworks and deduplication
│   ├── container/                 # Container security module
│   ├── finops/                    # FinOps cost management
│   │   ├── aggregator/            # Multi-cloud cost aggregation
│   │   ├── anomaly/               # Cost anomaly detection
│   │   ├── chargeback/            # Cost allocation engine
│   │   └── reporter/              # Showback/chargeback reports
│   ├── grc/                       # GRC provider abstraction (Archer, ServiceNow)
│   ├── identity/                  # Identity providers (Entra ID, Okta) + Zero Trust
│   ├── observability/             # Logging, metrics, tracing, health checks
│   ├── policy/                    # OPA integration
│   ├── remediation/               # Remediation domain handlers
│   │   ├── compute/               # EC2 IMDSv2 enforcement
│   │   ├── identity/              # IAM key rotation
│   │   ├── network/               # SSH/RDP ingress blocking
│   │   ├── patching/              # OS patch compliance (SSM)
│   │   ├── private_cloud/         # Private cloud remediation (planned)
│   │   ├── secrets/               # Exposed secret rotation guidance
│   │   ├── security_services/     # GuardDuty, Azure Defender
│   │   └── storage/               # S3 public access blocking
│   ├── waf/                       # WAF golden templates and compliance scanner
│   └── workflow/                  # Temporal workflow definitions
├── pkg/
│   └── remediation/               # Executor engine, Remediator interface, types
├── rust/
│   └── libaegispath/              # Rust FFI library for attack path BFS (rayon parallelism)
│       └── bridge.go              # CGo bridge: ComputeAttackPaths, LoadAndSerializeFindings
├── migrations/                    # Database migrations
├── deploy/
│   ├── terraform/
│   │   ├── modules/               # Multi-cloud Terraform modules
│   │   │   ├── compute/           # Cloud Run / ECS Fargate / Azure Container Apps
│   │   │   ├── database/          # Cloud SQL / RDS / Azure PostgreSQL
│   │   │   ├── iam/               # GCP SA / AWS IAM Roles / Azure Managed Identity
│   │   │   ├── monitoring/        # Cloud Monitoring / CloudWatch / Azure Monitor
│   │   │   ├── redis/             # Memorystore / ElastiCache / Azure Cache
│   │   │   └── secrets/           # GCP Secret Manager / AWS Secrets Manager / Azure Key Vault
│   │   ├── environments/          # Per-environment configs (dev, staging, prod)
│   │   └── policies/              # Rego policies for IaC validation (conftest)
│   ├── scripts/                   # plan-with-policy.sh, deploy.sh
│   └── docker/                    # Frontend (nginx) + Backend (Go) Dockerfiles
├── policies/                      # OPA/Rego runtime policies
├── configs/                       # Configuration templates
├── frontend/                      # Self-service portal (React 19 + Vite 7)
│   ├── src/
│   │   ├── pages/                 # 36 route pages (admin, ops, portal views)
│   │   ├── components/            # shadcn/ui component layer
│   │   ├── hooks/                 # Custom hooks (deploy preview, etc.)
│   │   ├── lib/                   # API client, auth, utilities
│   │   └── types/                 # TypeScript type definitions
│   └── public/                    # Static assets and logos
├── docs/
│   ├── core/
│   │   ├── architecture/          # HLD, DDD, DR-BC, data models
│   │   │   └── adr/               # Architecture Decision Records (19 ADRs)
│   │   ├── diagrams/              # Architecture diagrams (SVG + Mermaid + Figma)
│   │   └── runbooks/              # Operational procedures (9 runbooks)
│   ├── api/                       # OpenAPI 3.1 specification (82 operations)
│   ├── cspm/                      # CSPM aggregator HLD, DDD, schema reference
│   ├── research/                  # Technical research and POC notes
│   └── archive/                   # Historical planning docs
├── scripts/                       # Seed pipeline + build scripts
├── docs-site/                     # Docusaurus documentation site
├── k6/                            # Load testing (smoke, stress)
├── rust/                          # Rust FFI bridge (libaegispath)
└── Makefile                       # Build targets

[+] Key Features

Self-Service Portal

  • Application registration with metadata capture
  • Infrastructure request catalog (golden modules)
  • Exception request workflow
  • Compliance dashboards

Policy-as-Code

  • Region restrictions (data residency)
  • Instance size limits (cost control)
  • Network exposure rules (security)
  • Tagging requirements (governance)
  • Exception validation (GRC integration)

GRC Integration

Pluggable providers for enterprise GRC platforms:

  • RSA Archer - Full exception workflow integration
  • ServiceNow GRC - Native ServiceNow integration
  • PostgreSQL - Lightweight option for smaller orgs
  • In-Memory - For demos and testing
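
The pluggable-provider idea can be sketched as an interface plus a factory keyed on the configured provider name. This is a minimal illustration, not the code in internal/grc: the `Exception` type, the method signatures, and `NewProvider` are assumptions made for the example, and only the in-memory variant is filled in.

```go
package main

import (
	"errors"
	"fmt"
)

// Exception is a hypothetical, trimmed-down policy exception record.
type Exception struct {
	ID        string
	Requestor string
}

// Provider is an assumed shape for the GRC abstraction; the real
// interface may expose different methods.
type Provider interface {
	Name() string
	CreateException(e Exception) error
	GetExceptionsByRequestor(requestor string) ([]Exception, error)
}

// memoryProvider keeps exceptions in a slice, mirroring the In-Memory option.
type memoryProvider struct {
	items []Exception
}

func (m *memoryProvider) Name() string { return "memory" }

func (m *memoryProvider) CreateException(e Exception) error {
	if e.ID == "" {
		return errors.New("exception ID is required")
	}
	m.items = append(m.items, e)
	return nil
}

func (m *memoryProvider) GetExceptionsByRequestor(requestor string) ([]Exception, error) {
	var out []Exception
	for _, e := range m.items {
		if e.Requestor == requestor {
			out = append(out, e)
		}
	}
	return out, nil
}

// NewProvider maps a configured provider name to an implementation.
func NewProvider(kind string) (Provider, error) {
	switch kind {
	case "memory":
		return &memoryProvider{}, nil
	case "postgres", "archer", "servicenow":
		// Real clients need credentials and network access; omitted here.
		return nil, fmt.Errorf("provider %q is not implemented in this sketch", kind)
	default:
		return nil, fmt.Errorf("unknown GRC provider %q", kind)
	}
}

func main() {
	p, err := NewProvider("memory")
	if err != nil {
		panic(err)
	}
	_ = p.CreateException(Exception{ID: "EX-1", Requestor: "alice"})
	_ = p.CreateException(Exception{ID: "EX-2", Requestor: "bob"})
	mine, _ := p.GetExceptionsByRequestor("alice")
	fmt.Println(p.Name(), len(mine))
}
```

Keeping callers on the interface means a handler such as the one behind GET /exceptions/mine does not need to know which backend is configured.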

AI Intelligence

  • Contextual risk scoring with business context
  • Finding explanation generation
  • Remediation runbook generation
  • Request triage and routing

AI Governance (Merged from AgentGuard)

  • Embedded OPA engine — in-process Rego evaluation for AI agent tool and data-flow control (namespace: aegis)
  • Agent registry — lifecycle tracking, observability, status management across agent fleet
  • Threat modeling — STRIDE + ATLAS threat models per registered agent type
  • Maturity assessment — governance readiness scoring across 5 maturity dimensions
  • Dual-track OPA — cloud provisioning path uses external OPA server; AI governance uses embedded Go library — complementary, not conflicting

Infrastructure as Code (Deploy Layer)

  • Multi-cloud Terraform modules — compute (Cloud Run / ECS Fargate / Azure Container Apps), database (Cloud SQL / RDS / Azure PostgreSQL), redis (Memorystore / ElastiCache / Azure Cache)
  • Policy-as-code gate — 5 Rego policies (27 rules) validated via conftest against terraform plan JSON before any apply
  • Three-layer OPA governance — (1) plan-time IaC validation, (2) runtime policy evaluation via external OPA server, (3) in-process embedded OPA for AI agent governance
  • Deploy scripts — dry-run-by-default deployment with policy violation gate and human-readable remediation guidance
  • Container images — multi-stage Dockerfiles for frontend (nginx + SPA routing) and backend (Go + healthcheck)

[Diagram: Dual-OPA Architecture]

Risk Intelligence

  • Contextual risk scoring — LLM-powered severity re-scoring that considers asset tier, environment (prod/dev/sandbox), internet exposure, blast radius, and compensating controls
  • Severity normalization — per-CSP normalization (AWS ASFF normalized scores, Azure severity labels, GCP attack exposure scores) into unified severity taxonomy
  • Threat intel enrichment — EPSS scoring (FIRST API, 12h cache) and CISA KEV catalog (auto-refresh) integrated into risk pipeline; GreyNoise IP classification enrichment (12h cache)
  • Attack path schema: AttackPathContext with blast radius count, IAM escalation path, chokepoint detection, toxic combination flag (graph computation engine in roadmap)
  • MITRE ATT&CK mapping — tactic and technique fields on findings for kill-chain context
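
The 12h cache in front of the threat intel lookups can be pictured as a small TTL cache. The sketch below is an assumption about how such a cache might look, not the project's client: `ScoreCache`, `fakeClock`, and the placeholder key are invented for the example, and the fetch function stands in for an HTTP call. The clock is injected so expiry can be exercised without sleeping.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// epssTTL matches the 12h cache window mentioned for EPSS and GreyNoise.
const epssTTL = 12 * time.Hour

type entry struct {
	score   float64
	expires time.Time
}

// ScoreCache caches one score per key until its TTL elapses.
type ScoreCache struct {
	mu    sync.Mutex
	ttl   time.Duration
	now   func() time.Time
	items map[string]entry
}

func NewScoreCache(ttl time.Duration, now func() time.Time) *ScoreCache {
	return &ScoreCache{ttl: ttl, now: now, items: make(map[string]entry)}
}

// Get returns a fresh cached score, or calls fetch on a miss or expiry.
// The lock is held during fetch for simplicity; a real client would
// likely avoid blocking other keys while one request is in flight.
func (c *ScoreCache) Get(key string, fetch func(string) (float64, error)) (float64, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if e, ok := c.items[key]; ok && c.now().Before(e.expires) {
		return e.score, nil
	}
	score, err := fetch(key)
	if err != nil {
		return 0, err
	}
	c.items[key] = entry{score: score, expires: c.now().Add(c.ttl)}
	return score, nil
}

// fakeClock is a test double for time.Now.
type fakeClock struct{ t time.Time }

func newFakeClock() *fakeClock                 { return &fakeClock{t: time.Unix(0, 0)} }
func (f *fakeClock) Now() time.Time            { return f.t }
func (f *fakeClock) Advance(d time.Duration)   { f.t = f.t.Add(d) }

func main() {
	clock := newFakeClock()
	cache := NewScoreCache(epssTTL, clock.Now)
	calls := 0
	fetch := func(cve string) (float64, error) {
		calls++ // stands in for a request to the scoring API
		return 0.5, nil
	}
	cache.Get("CVE-0000-0000", fetch) // miss: fetches
	cache.Get("CVE-0000-0000", fetch) // hit: served from cache
	clock.Advance(epssTTL + time.Minute)
	cache.Get("CVE-0000-0000", fetch) // expired: fetches again
	fmt.Println("upstream fetches:", calls)
}
```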

Multi-Cloud Support

  • AWS (multiple Organizations, 2,400+ accounts)
  • Azure (750+ Subscriptions)
  • GCP (350+ Projects)
  • Extensible provider pattern

Automated Remediation

  • Tiered Execution: Tier 1 (auto-safe), Tier 2 (requires verification), Tier 3 (change window)
  • 10 Handlers: GuardDuty, SSH/RDP blocking, S3 public access, IMDSv2, IAM key rotation, Azure Defender, secrets guidance, OS patching
  • Dry-Run Default: All remediations preview actions before execution
  • 48-Hour Rollback: State snapshots for every remediation with automated rollback scripts
  • Concurrent Batch Execution: Semaphore-controlled parallel processing
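
A rough sketch of how tiering, dry-run, and semaphore-bounded concurrency can fit together is shown here. The method names on `Remediator` come from the handler interface listed in the status table; the signatures, the `Finding` and `Result` types, the finding type strings, and the rule that only Tier 1 applies changes automatically are all assumptions for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// Tier encodes the three execution tiers.
type Tier int

const (
	Tier1 Tier = iota + 1 // auto-safe
	Tier2                 // requires verification
	Tier3                 // change window
)

// Finding is a hypothetical, minimal finding record.
type Finding struct {
	ID   string
	Type string
}

// Remediator uses the four documented method names; signatures are assumed.
type Remediator interface {
	Tier() Tier
	Validate(f Finding) error
	DryRun(f Finding) string
	Remediate(f Finding) error
}

// Result records what happened to one finding.
type Result struct {
	FindingID string
	Action    string // "applied", "preview", "skipped", or "failed"
	Detail    string
}

func runOne(h Remediator, f Finding, dryRun bool) Result {
	if err := h.Validate(f); err != nil {
		return Result{FindingID: f.ID, Action: "skipped", Detail: err.Error()}
	}
	// Assumption: only Tier 1 handlers apply changes without a human step.
	if dryRun || h.Tier() != Tier1 {
		return Result{FindingID: f.ID, Action: "preview", Detail: h.DryRun(f)}
	}
	if err := h.Remediate(f); err != nil {
		return Result{FindingID: f.ID, Action: "failed", Detail: err.Error()}
	}
	return Result{FindingID: f.ID, Action: "applied"}
}

// ExecuteBatch processes findings with at most maxParallel in flight.
func ExecuteBatch(h Remediator, findings []Finding, maxParallel int, dryRun bool) []Result {
	if maxParallel < 1 {
		maxParallel = 1
	}
	results := make([]Result, len(findings))
	sem := make(chan struct{}, maxParallel) // buffered channel as semaphore
	var wg sync.WaitGroup
	for i, f := range findings {
		wg.Add(1)
		sem <- struct{}{} // blocks while maxParallel workers are busy
		go func(i int, f Finding) {
			defer wg.Done()
			defer func() { <-sem }()
			results[i] = runOne(h, f, dryRun) // each goroutine owns one index
		}(i, f)
	}
	wg.Wait()
	return results
}

// blockPublicS3 is a stand-in handler that makes no cloud calls.
type blockPublicS3 struct{}

func (blockPublicS3) Tier() Tier { return Tier1 }
func (blockPublicS3) Validate(f Finding) error {
	if f.Type != "s3-public-access" {
		return fmt.Errorf("unsupported finding type %q", f.Type)
	}
	return nil
}
func (blockPublicS3) DryRun(f Finding) string  { return "would enable public access block on " + f.ID }
func (blockPublicS3) Remediate(f Finding) error { return nil }

func main() {
	findings := []Finding{
		{ID: "bucket-a", Type: "s3-public-access"},
		{ID: "sg-1", Type: "open-ssh"},
	}
	for _, r := range ExecuteBatch(blockPublicS3{}, findings, 4, true) {
		fmt.Printf("%s: %s %s\n", r.FindingID, r.Action, r.Detail)
	}
}
```

Writing each result to its own slice index keeps output order stable without a mutex.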

FinOps Cost Management

  • Cost Aggregation: Multi-cloud cost data from AWS Cost Explorer, Azure Cost Management, GCP Billing
  • Anomaly Detection: ML-based spend anomaly alerting with configurable thresholds
  • Chargeback/Showback: Tag-based cost allocation with automated reports
  • Budget Tracking: Proactive budget alerts via Slack/PagerDuty
  • Optimization: Resource rightsizing and savings recommendations
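
Tag-based allocation reduces to grouping cost records by the value of one tag. The following is a minimal sketch under assumed names (`CostRecord`, `AllocateByTag`, the `team` tag key, and the sample figures are illustrative, not taken from finops/chargeback). Amounts are integer cents to avoid floating-point drift in totals.

```go
package main

import (
	"fmt"
	"sort"
)

// CostRecord is a hypothetical normalized line item from any cloud.
type CostRecord struct {
	Provider string
	Cents    int64
	Tags     map[string]string
}

// untagged collects spend that has no value for the allocation tag.
const untagged = "untagged"

// AllocateByTag sums cost per value of tagKey.
func AllocateByTag(records []CostRecord, tagKey string) map[string]int64 {
	totals := make(map[string]int64)
	for _, r := range records {
		owner := r.Tags[tagKey] // reading a nil map yields ""
		if owner == "" {
			owner = untagged
		}
		totals[owner] += r.Cents
	}
	return totals
}

func main() {
	records := []CostRecord{
		{Provider: "aws", Cents: 12000, Tags: map[string]string{"team": "payments"}},
		{Provider: "gcp", Cents: 3000, Tags: map[string]string{"team": "payments"}},
		{Provider: "azure", Cents: 4500, Tags: map[string]string{"team": "search"}},
		{Provider: "aws", Cents: 800}, // no tags at all
	}
	totals := AllocateByTag(records, "team")

	// Sort owners so the report is stable across runs.
	owners := make([]string, 0, len(totals))
	for o := range totals {
		owners = append(owners, o)
	}
	sort.Strings(owners)
	for _, o := range owners {
		fmt.Printf("%s: $%d.%02d\n", o, totals[o]/100, totals[o]%100)
	}
}
```

Surfacing the untagged bucket explicitly is one way to tie the chargeback report back to the tagging requirements enforced by policy.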

[+] Tech Stack

Component | Technology | Purpose
API Server | Go 1.25 | Core platform API
Portal | React 19 / Vite 7 | Self-service SPA (Tailwind CSS v4, shadcn/ui, Cloudflare Pages)
Workflows | Temporal | Orchestration, approvals
Policies | OPA / Rego | Guardrails, validation
IaC | Terraform | Resource provisioning
Database | PostgreSQL 16 | State, audit logs
Cache | Redis | Session, caching
AI | Anthropic Claude | Intelligence services
Identity | OIDC (Okta/Entra ID) | Authentication
Attack Path Engine | Rust / CGo FFI | High-performance BFS computation via libaegispath (rust/bridge.go)
Observability | OpenTelemetry | Tracing, metrics

[>] Quick Start

Prerequisites

  • Go 1.25+
  • Docker & Docker Compose
  • Terraform 1.5+
  • OPA CLI

Local Development

# Clone repository
git clone https://github.com/lvonguyen/cloudforge.git
cd cloudforge

# Start dependencies (Postgres, OPA, Temporal)
docker-compose up -d

# Run migrations
make migrate

# Start API server
make run

# Run tests
make test

# Build / test / bench Rust FFI library (requires Rust toolchain)
make rust-build     # cargo build --release
make rust-test      # cargo test
make rust-bench     # Criterion benchmarks
make rust-clean     # cargo clean

# Start frontend dev server
cd frontend
npm install
npm run dev       # http://localhost:5173

Configuration

# configs/config.yaml
server:
  port: 8080

database:
  host: localhost
  port: 5432
  name: aegis

grc:
  provider: memory  # memory | postgres | archer | servicenow

policy:
  opa_url: http://localhost:8181

workflow:
  temporal_host: localhost:7233
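
The status table describes a YAML loader with environment overrides. One way to layer environment values over file values is sketched below. The struct fields, the `AEGIS_*` variable names used here, and the validation rules are assumptions for illustration; `defaults()` stands in for values already parsed from configs/config.yaml.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// Config holds a subset of the settings shown in config.yaml.
type Config struct {
	ServerPort  int
	DBHost      string
	GRCProvider string
	OPAURL      string
}

// defaults stands in for the values parsed from the YAML file.
func defaults() Config {
	return Config{
		ServerPort:  8080,
		DBHost:      "localhost",
		GRCProvider: "memory",
		OPAURL:      "http://localhost:8181",
	}
}

// grcProviders lists the values documented for grc.provider.
var grcProviders = map[string]bool{
	"memory": true, "postgres": true, "archer": true, "servicenow": true,
}

// applyEnv overlays environment values on cfg. The lookup function is
// injected (os.LookupEnv in main) so the logic can be tested with a map.
// All variable names below are hypothetical.
func applyEnv(cfg Config, lookup func(string) (string, bool)) (Config, error) {
	if v, ok := lookup("AEGIS_SERVER_PORT"); ok {
		port, err := strconv.Atoi(v)
		if err != nil || port < 1 || port > 65535 {
			return cfg, fmt.Errorf("AEGIS_SERVER_PORT: invalid port %q", v)
		}
		cfg.ServerPort = port
	}
	if v, ok := lookup("AEGIS_DB_HOST"); ok {
		cfg.DBHost = v
	}
	if v, ok := lookup("AEGIS_GRC_PROVIDER"); ok {
		if !grcProviders[v] {
			return cfg, fmt.Errorf("AEGIS_GRC_PROVIDER: unknown provider %q", v)
		}
		cfg.GRCProvider = v
	}
	if v, ok := lookup("AEGIS_OPA_URL"); ok {
		cfg.OPAURL = v
	}
	return cfg, nil
}

func main() {
	cfg, err := applyEnv(defaults(), os.LookupEnv)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("%+v\n", cfg)
}
```

Failing fast on an invalid override avoids starting the server with a half-applied configuration.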

[/] Documentation

Document | Description
High-Level Design | System architecture overview (v3.0)
Detailed Design | API specs, data models
DR/BC Plan | Disaster recovery procedures (v2.1)
Component Rationale | Build vs buy decisions
Dual-OPA Architecture | Cloud provisioning OPA (HTTP) vs AI governance OPA (embedded)
Attack Path Enhancements | Graph-based attack path analysis roadmap
Compliance Deployment Models | Multi-cloud compliance topology
Failover Sequence | DR failover steps and timing
Global Deployment | Multi-region deployment layout
IaC Deploy Pipeline | Terraform/conftest CI/CD flow
Remediation Dispatcher | Automated remediation routing
Risk Intelligence Pipeline | Risk scoring data pipeline

Architecture Decision Records (19 ADRs)

ADR | Decision
ADR-001 | Programming Language (Go)
ADR-002 | Database Selection (PostgreSQL)
ADR-003 | Caching Strategy (Redis)
ADR-004 | AI Provider (Anthropic Claude)
ADR-005 | Rate Limiting Strategy
ADR-006 | Authentication (OIDC + JWT)
ADR-007 | GRC Integration Pattern
ADR-008 | Attack Path Computation (BFS + ReactFlow)
ADR-009 | Remediation Dispatcher Architecture
ADR-010 | FinOps Multi-Cloud Cost Aggregation
ADR-011 | Toxic Combination Detection Strategy
ADR-012 | Whitelabel/Multi-Tenant Architecture
ADR-013 | Resource-Scoped RBAC (ABAC)
ADR-014 | Event-Driven Finding Ingestion
ADR-015 | Graph Query Engine (PuppyGraph)
ADR-016 | Container Security Scanning
ADR-017 | Secrets Management Architecture
ADR-018 | Threat Intelligence Feed Integration
ADR-019 | Multi-Tenant Data Isolation

Runbooks

Runbook | Purpose
01-deployment | Deployment procedures
02-incident-response | Incident handling
03-dr-failover | DR failover procedures
04-performance | Performance issues
05-remediation-operations | Remediation operations
06-policy-management | OPA policy management
07-secrets-rotation | Secrets rotation procedures
08-finops-budget-alerts | FinOps budget alerting
09-identity-provider-setup | Okta/Entra ID setup

[!] Security

  • All API endpoints require authentication (OIDC via Entra ID/Okta)
  • Service-to-service communication planned for mTLS
  • Secrets managed via environment variables (HashiCorp Vault integration planned)
  • Audit logging for all provisioning actions
  • RBAC with Zero Trust policy enforcement
  • API rate limiting and throttling
  • Container security scanning
  • CI/CD pipeline security (SAST/DAST integration)
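
Role-based endpoint access with ranked roles can be sketched as a small middleware. Only the role names and the viewer rank of 0 come from this README; the other rank values, the `X-Role` header (a stand-in for a role claim extracted from a validated JWT), and all function names are assumptions for the example.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// Role is a named access level.
type Role string

// rank orders roles from least to most privileged. Viewer at rank 0 is
// documented; the remaining values are assumed.
var rank = map[Role]int{
	"viewer":    0,
	"requester": 1,
	"operator":  2,
	"admin":     3,
}

// Allowed reports whether have meets or exceeds min. Unknown roles fail closed.
func Allowed(have, min Role) bool {
	h, ok := rank[have]
	if !ok {
		return false
	}
	m, ok := rank[min]
	return ok && h >= m
}

// RequireRole rejects requests whose role is below min.
func RequireRole(min Role, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !Allowed(Role(r.Header.Get("X-Role")), min) {
			http.Error(w, "forbidden", http.StatusForbidden)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// okHandler is a placeholder for a real endpoint handler.
var okHandler = http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
	w.WriteHeader(http.StatusOK)
})

// statusFor sends one in-memory request and returns the response code.
func statusFor(h http.Handler, role string) int {
	req := httptest.NewRequest(http.MethodGet, "/exceptions/mine", nil)
	req.Header.Set("X-Role", role)
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	h := RequireRole("requester", okHandler)
	for _, role := range []string{"viewer", "requester", "admin", "intern"} {
		fmt.Println(role, statusFor(h, role))
	}
}
```

Treating an unrecognized role as denied mirrors the enum validation mentioned for the dev header override.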

[+] Observability

Capability | Implementation
Logging | Structured JSON logging with zap
Metrics | Prometheus metrics at /metrics
Tracing | OpenTelemetry distributed tracing
Health | Kubernetes probes at /health, /ready, /live
Dashboards | Grafana dashboards included

[+] Compliance Frameworks

Built-in support for 20+ frameworks:

Category | Frameworks
General | CIS, NIST CSF, ISO 27001, PCI-DSS
Cloud | AWS Security Best Practice, GCP CIS, Azure MCSB
Healthcare | HIPAA, HITRUST
Finance | SOX, GLBA, FFIEC
Government | FedRAMP, CMMC, NIST 800-53/800-171
AI | NIST AI RMF, ISO 42001
Automotive | ISO 21434, UN ECE R155, TISAX

[/] Roadmap

Phase 1: Core Platform (Complete)

  • Core API and HTTP handlers
  • GRC abstraction layer (Archer, ServiceNow, PostgreSQL)
  • OPA policy engine integration
  • AI-powered risk analysis (Claude/OpenAI)
  • Multi-cloud provider support patterns
  • Compliance framework engine (20+ frameworks)
  • Structured logging and Prometheus metrics

Phase 2: Security, Remediation & AI Governance (Complete)

  • Wire rate limiting to API routes
  • CI/CD pipeline with security scanning
  • Remediation dispatcher with 10 handlers across 8 domains
  • Tiered execution model (auto-safe / verify / change window)
  • 48-hour rollback state engine
  • Unit tests — 30 packages, 590+ functions (cspm, grc, remediation, ai, compliance, finops, server benchmarks)
  • AI governance module — embedded OPA engine, agent registry, STRIDE/ATLAS threat models
  • Security audit fixes (SEC-001 through SEC-012)
  • Architecture hardening — BOLA fix, N+1 queries, CI pinning
  • JWT authentication middleware (HS256/RS256, JWKS caching)
  • Wire Okta/Entra ID providers into auth flow (config-driven, falls back to mock)
  • RBAC authorization middleware (role-based endpoint access)
  • Handler-level unit tests (31 coverage tests across all endpoints)
  • Integration test suite (12-step lifecycle + 34-subtest RBAC matrix)
  • Merge cspm-aggregator into monorepo (cmd/cspm-aggregator)

Phase 3: IaC, Portal & Workflows (Complete)

  • Multi-cloud Terraform modules (compute, database, redis, network)
  • Rego policy gate for IaC validation (5 policies, 27 rules)
  • Deploy scripts with dry-run-by-default and policy violation gate
  • Container Dockerfiles (frontend nginx + backend Go)
  • Self-service portal UI (React 19 / Vite 7 + shadcn/ui) — deployed to cloudaegis-demo.lvonguyen.com
  • Temporal workflow testing and validation (23 tests, concurrent + lifecycle + error cases)
  • Terraform networking module and staging/prod environments

Phase 4: Risk Intelligence & Attack Path Analysis (Complete)

  • Contextual severity validation engine (environment-aware re-scoring)
  • EPSS scoring integration (FIRST API, batch fetching, 12h cache)
  • CISA KEV catalog integration (auto-refresh, known exploit lookup)
  • GreyNoise integration (API client for IP classification)
  • Attack path computation engine (in-memory BFS + ReactFlow DAG)
  • Toxic combination detection (4 patterns: public storage, IAM+noMFA, internet+CVE, SG+DB)
  • Blast radius computation (account/VPC/transit reachability)
  • False-severity edge case detection (3 FP suppression + 3 FN escalation rules)

Phase 5: FinOps & Reporting (Complete)

  • Cloud cost API integration (AWS Cost Explorer, Azure Cost Management, GCP Billing — multi-cloud aggregator)
  • Cost estimation integration (21-resource lookup table with low/mid/high ranges)
  • Chargeback report generation (GenerateReport + CSV export in finops/chargeback)
  • Compliance reporting dashboard (React frontend at /ops/compliance)
  • Budget alerting (Slack Block Kit + PagerDuty Events API v2 + BudgetMonitor)

[/] Update History

Phase | Description
Phase 5: Risk Intelligence + FinOps | EPSS/KEV/GreyNoise/HIBP/OTX threat intel, attack path BFS engine + ReactFlow viz, toxic combo detection, blast radius computation, PuppyGraph graph query integration, AWS Bedrock enrichment. FinOps multi-cloud cost aggregation, anomaly detection, chargeback engine, budget alerting
Phase 4: Frontend + QA Hardening | Self-service portal (React 19 + Vite 7, 36 routes, 3 role views, dark mode), Cloudflare Pages deploy, investigation board, DSPM classification, kanban remediation pipeline, NLQ bar, demo mode hardening. Multi-pass QA reviews (quality 4.5+, security 4.5+, bugs 4.3+)
Phase 3: IaC + Security | Multi-cloud Terraform modules (compute, database, redis, IAM, monitoring, secrets), 5 Rego policies (27 rules), policy gate script, resource-scoped RBAC, integrity hashing, audit logging, rollback encryption (AES-256-GCM), CI enforcement (gosec, Trivy, Codecov)
Phase 2: Remediation + AI Governance | 10 remediation handlers across 8 domains, batch executor with dry-run + 48h rollback, AI governance module (embedded OPA, agent registry, STRIDE/ATLAS threat models), JWT auth (HS256/RS256 + JWKS), RBAC middleware, security fixes SEC-001 through SEC-012
Phase 1: Core Platform | API server, GRC provider abstraction (Archer, ServiceNow, PostgreSQL), 20+ compliance frameworks, OPA/Rego policy engine, AI provider abstraction (Claude/OpenAI), identity module (Okta + Entra ID), container security, structured logging (zap), PostgreSQL migrations, architecture docs (HLD, DDD, 19 ADRs, DR/BC, 9 runbooks)

[*] License

MIT License - See LICENSE


[+] Contributing

Contributions welcome! Please read CONTRIBUTING.md first.


Note: This is a reference architecture demonstrating enterprise cloud governance patterns. Production deployments require additional hardening, testing, and customization for your organization's specific requirements.
