Usage‐Based Billing Architecture Guide

If you're building a SaaS or AI product, you've probably hit the point where flat-rate subscriptions don't cut it anymore.

Customers want to pay for what they use, and your billing system needs to handle that without becoming a maintenance nightmare.

This guide breaks down the architecture behind usage-based billing systems, walks through the core design patterns, and shows how Flexprice implements each layer so you don't have to build it from scratch.

What Is Usage-Based Billing?

Usage-based billing (UBB) charges customers based on actual consumption rather than a fixed monthly fee. Think API calls, compute hours, tokens processed, storage consumed, or messages sent. The billing amount is derived from metered usage data, not a static price tag.

This model is now standard across infrastructure and AI companies. AWS, Snowflake, OpenAI, and Twilio all operate on some form of consumption pricing. The reason is straightforward: it aligns revenue with the value customers receive.

From an engineering perspective, UBB introduces complexity that subscription billing doesn't have. You need to:

Ingest and store high-volume event streams in real time
Aggregate raw events into billable metrics
Apply tiered, volume, or graduated pricing logic
Generate accurate invoices at the end of each billing cycle
Handle edge cases like retries, duplicates, late-arriving events, and mid-cycle plan changes

Let's look at how this works architecturally.

The Four Layers of a Usage-Based Billing Pipeline

A well-designed UBB system decomposes into four distinct layers. Each layer has a clear responsibility, and the boundaries between them matter, especially at scale.

Layer 1: Event Ingestion

This is where raw usage data enters the system. Every billable action in your application (an API request, a model inference call, a file upload) emits an event to the billing pipeline.

A typical event payload looks like this:

{
  "event_id": "evt_7f3a9c21",
  "event_name": "api.request",
  "external_customer_id": "cust_8xk29f",
  "properties": {
    "tokens": 1500,
    "model": "gpt-4",
    "region": "us-east-1"
  },
  "timestamp": "2026-04-04T14:30:00Z",
  "source": "api-gateway"
}

Key design decisions at this layer:

Async over sync. Event ingestion should never block your application's critical path. Hand each event to a background sender and return immediately; the sender delivers with at-least-once guarantees, retrying until the pipeline acknowledges the event.
Idempotency. Events may be retried. Include a unique event_id so the pipeline can deduplicate.
Schema flexibility. The properties bag should be schemaless. You'll want to add new dimensions (region, model version, tier) without migrating your event schema.
Throughput. This layer needs to handle traffic spikes without dropping events. A message broker like Kafka sits between your application and the processing pipeline to absorb bursts.
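The async and idempotency decisions above can be sketched as a small in-process emitter. This is a minimal sketch under stated assumptions: `BackgroundEmitter` is a hypothetical name (not part of any SDK), and `send` is a hypothetical callable that delivers one event and raises on failure. It shows the two properties that matter: `emit()` returns without touching the network, and the `event_id` is assigned once, so every retry carries the same idempotency key.

```python
import queue
import threading
import time
import uuid
from typing import Callable

# Hypothetical transport: delivers one event dict, raises on failure.
Transport = Callable[[dict], None]


class BackgroundEmitter:
    """Buffers events in memory and delivers them from a worker thread."""

    def __init__(self, send: Transport, max_attempts: int = 5, backoff_s: float = 0.1):
        self._send = send
        self._max_attempts = max_attempts
        self._backoff_s = backoff_s
        self._queue = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def emit(self, event_name: str, customer_id: str, properties: dict) -> str:
        # The idempotency key is assigned once, at creation time, so all
        # delivery attempts for this event share the same event_id.
        event = {
            "event_id": str(uuid.uuid4()),
            "event_name": event_name,
            "external_customer_id": customer_id,
            "properties": properties,
        }
        self._queue.put(event)  # returns immediately; no network I/O here
        return event["event_id"]

    def _run(self) -> None:
        while True:
            event = self._queue.get()
            if event is None:  # shutdown sentinel
                return
            for attempt in range(self._max_attempts):
                try:
                    self._send(event)
                    break
                except Exception:
                    # Exponential backoff, then resend the identical event.
                    time.sleep(self._backoff_s * 2 ** attempt)
            # A production emitter would persist events that exhaust their
            # retries (dead-letter storage) instead of dropping them.

    def close(self) -> None:
        """Deliver everything already queued, then stop the worker."""
        self._queue.put(None)
        self._worker.join()
```

Because the buffer lives in process memory, events queued at crash time are lost; durable local spooling or a broker-backed client closes that gap at the cost of more moving parts.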

At scale, you're looking at millions of events per day. The ingestion layer must be horizontally scalable and decoupled from downstream processing.

Layer 2: Aggregation

Raw events aren't directly billable. You need to aggregate them into metrics that map to your pricing model. This is where the metering logic lives.

Aggregation typically runs on a columnar OLAP database optimized for fast analytical queries, such as ClickHouse, Druid, or BigQuery. The aggregation layer reads from the event store and computes billable metrics over configurable time windows (hourly, daily, or per billing cycle).

A critical subtlety: aggregation must be idempotent and replayable. If a batch of late-arriving events lands after the initial aggregation window, the system needs to recompute without double-counting.
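One way to get this property is to treat the aggregate as a pure function of the stored events in a window and recompute it, rather than incrementally adding to a saved total. A minimal sketch, assuming events shaped like the payload above and a Python list standing in for the event store (in production this would be a `GROUP BY` query in the OLAP database); `aggregate_window` is a hypothetical name.

```python
from collections import defaultdict
from datetime import datetime, timezone


def aggregate_window(events: list, start: datetime, end: datetime, field: str) -> dict:
    """SUM of properties[field] per customer over [start, end), from scratch.

    The result depends only on the stored events, so re-running it after
    late events land yields the corrected total instead of double-counting.
    """
    totals = defaultdict(int)
    for e in events:
        # Parse ISO-8601 UTC timestamps ("Z" suffix) into aware datetimes.
        ts = datetime.fromisoformat(e["timestamp"].replace("Z", "+00:00"))
        if start <= ts < end:
            totals[e["external_customer_id"]] += e["properties"].get(field, 0)
    return dict(totals)
```

Recomputing the full window costs more than an incremental update, but it makes the late-event case and the "run it again" case the same code path.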

Layer 3: Rating (Pricing)

Rating takes aggregated usage and applies pricing rules to compute the monetary amount owed. This is where your pricing model is codified.

Common pricing structures:

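Per-unit: $0.002 per API call
Tiered: Usage is split into bands with different rates, for example the first 10,000 calls at $0.002, the next 90,000 at $0.0015, and the remainder at $0.001. The bands are applied in one of two modes, volume or graduated.
Volume: The total volume selects a single rate, and all units are priced at the rate of the tier the total falls into
Package: $10 per block of 1,000 calls
Graduated: Each tier is priced independently: units within a tier are charged at that tier's rate, and the per-tier charges are summed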
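To make the difference between the models concrete, here is a minimal sketch that prices 150,000 calls with the tier table from the Tiered example. The function names are illustrative. `Decimal` avoids binary floating-point error in money arithmetic, tier upper bounds are treated as inclusive (the 10,000th call is in the first tier), and a partial package is assumed to bill as a full block; all three are choices you have to make explicitly.

```python
from decimal import Decimal

# (inclusive upper bound of the tier, unit price); None = open-ended last tier.
TIERS = [
    (10_000, Decimal("0.002")),
    (100_000, Decimal("0.0015")),
    (None, Decimal("0.001")),
]


def graduated(units: int, tiers=TIERS) -> Decimal:
    """Each slice of usage is charged at the rate of the tier it falls in."""
    total = Decimal("0")
    lower = 0
    for upper, price in tiers:
        top = units if upper is None else min(units, upper)
        if top > lower:
            total += (top - lower) * price
        if upper is None or units <= upper:
            break
        lower = upper
    return total


def volume(units: int, tiers=TIERS) -> Decimal:
    """All units are charged at the rate of the tier the total falls into."""
    for upper, price in tiers:
        if upper is None or units <= upper:
            return units * price
    raise ValueError("tier table must end with an open-ended tier")


def package(units: int, block_size: int = 1_000, block_price: Decimal = Decimal("10")) -> Decimal:
    """Charge per started block: ceiling division, so 1,001 calls bill as 2 blocks."""
    blocks = -(-units // block_size)
    return blocks * block_price
```

For 150,000 calls, graduated pricing gives $20 + $135 + $50 = $205, volume pricing gives 150,000 × $0.001 = $150, and package pricing gives 150 blocks × $10 = $1,500.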

The rating engine must support per-customer overrides, promotional pricing, and mid-cycle plan changes. In practice, this layer is where most billing bugs hide. Off-by-one errors in tier boundaries, timezone mismatches in billing cycles, and rounding inconsistencies are all common failure modes.
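Rounding inconsistencies usually come from rounding at different stages of the pipeline. A minimal illustration, assuming a policy of rounding half-up to whole cents once per line item (your currency and rounding policy may differ):

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def to_cents(amount: Decimal) -> Decimal:
    """Round a full-precision amount to cents, half-up."""
    return amount.quantize(CENT, rounding=ROUND_HALF_UP)


# Three hypothetical events, each worth $0.004 at full precision.
event_charges = [Decimal("0.004")] * 3

# Rounding each event before summing discards the fractional cents...
per_event = sum(to_cents(c) for c in event_charges)
# ...while summing at full precision and rounding once keeps them.
per_line_item = to_cents(sum(event_charges))
```

Here `per_event` comes out to $0.00 and `per_line_item` to $0.01. Across millions of events the gap becomes material, so the rounding point belongs in the rating engine's spec, not in whichever helper happens to format amounts.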

Layer 4: Invoicing

The final layer assembles rated line items into an invoice, applies credits or prepayments, calculates taxes, and triggers payment collection.

Invoices need to be:

Itemized. Customers should see exactly what they're paying for, not a single opaque line item.
Auditable. Every line item should trace back to the underlying aggregated metrics and raw events.
Idempotent. Regenerating an invoice for the same period should produce the same result.
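A sketch of how regeneration can stay idempotent and auditable, assuming line items have already been rated and rounded: the invoice ID is derived deterministically from the customer and period, line items are put in a canonical order, and each item carries a reference back to the aggregation that produced it. `LineItem`, `build_invoice`, the `trace_ref` field, and the `inv_` ID format are all hypothetical.

```python
import hashlib
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class LineItem:
    feature: str      # metered feature name
    quantity: int     # aggregated usage for the period
    amount: Decimal   # rated amount, already rounded
    trace_ref: str    # hypothetical pointer to the aggregation result


def build_invoice(customer_id: str, period_start: str, period_end: str, items: list) -> dict:
    # Canonical ordering: the same inputs always yield the same document,
    # regardless of the order in which items were rated.
    ordered = sorted(items, key=lambda li: li.feature)
    # Deterministic ID: a rerun for the same customer and period produces
    # the same invoice_id, so storage can upsert instead of duplicating.
    key = f"{customer_id}|{period_start}|{period_end}"
    invoice_id = "inv_" + hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]
    return {
        "invoice_id": invoice_id,
        "customer_id": customer_id,
        "period": [period_start, period_end],
        "line_items": [
            {"feature": li.feature, "quantity": li.quantity,
             "amount": str(li.amount), "trace_ref": li.trace_ref}
            for li in ordered
        ],
        "total": str(sum((li.amount for li in ordered), Decimal("0"))),
    }
```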

This layer also handles payment provider integration (Stripe, Razorpay), webhook notifications, and retry logic for failed charges.

Common Pitfalls Engineers Hit

Before jumping into implementation, here are the failure modes that catch most teams:

Clock skew and timezone hell. Your application servers, event pipeline, and billing engine all need to agree on what "this billing period" means. Store all timestamps in UTC. Define billing periods server-side, not based on client clocks.
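A sketch of server-side period computation. It assumes monthly periods that start at 00:00 UTC on an anchor day of the month, with anchors past the month's end (such as the 31st) clamped to the last day; both conventions are assumptions for illustration. Timestamps carrying any UTC offset are normalized to UTC before comparison, and naive timestamps are rejected outright.

```python
import calendar
from datetime import datetime, timezone


def _anchor_in_month(year: int, month: int, anchor_day: int) -> datetime:
    # Clamp the anchor to the month's last day (a 31st anchor lands on Feb 28/29).
    last = calendar.monthrange(year, month)[1]
    return datetime(year, month, min(anchor_day, last), tzinfo=timezone.utc)


def billing_period(ts: datetime, anchor_day: int) -> tuple:
    """Return the UTC [start, end) monthly billing period containing ts."""
    if ts.tzinfo is None:
        raise ValueError("timestamps must be timezone-aware")
    ts = ts.astimezone(timezone.utc)
    start = _anchor_in_month(ts.year, ts.month, anchor_day)
    if ts < start:
        # ts precedes this month's anchor, so its period began last month.
        y, m = (ts.year - 1, 12) if ts.month == 1 else (ts.year, ts.month - 1)
        start = _anchor_in_month(y, m, anchor_day)
    y, m = (start.year + 1, 1) if start.month == 12 else (start.year, start.month + 1)
    return start, _anchor_in_month(y, m, anchor_day)
```

For example, an event stamped `2026-04-01T03:00:00+05:30` is 21:30 UTC on March 31, so with an anchor day of 1 it falls in the March period, not April.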

Aggregation drift. If your aggregation runs on a snapshot and late events arrive after the window closes, you'll under-bill. Design for recomputation. Your aggregation layer should be able to reprocess a billing period and produce a corrected result without manual intervention.

Retry amplification. A transient 500 from your ingestion endpoint causes your SDK to retry. If the server processed the first request but failed to acknowledge it, you now have a duplicate event. Every event must carry an idempotency key, and your pipeline must deduplicate before aggregation.
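The deduplication step can be as simple as a filter keyed on `event_id` that runs before aggregation. A minimal sketch; `dedupe` is a hypothetical helper, and the in-memory `seen` set stands in for whatever shared state your pipeline uses.

```python
from typing import Iterable, Iterator


def dedupe(events: Iterable[dict], seen: set) -> Iterator[dict]:
    """Yield each event_id at most once; `seen` persists across batches."""
    for event in events:
        event_id = event["event_id"]
        if event_id in seen:
            continue  # a retry of an event already accepted
        seen.add(event_id)
        yield event
```

In a real pipeline the seen-set must be durable and bounded, for example a key store with a retention window longer than your maximum retry horizon. Alternatively, deduplication can be pushed into storage: ClickHouse's ReplacingMergeTree engine collapses rows with the same sorting key during background merges, so queries still have to account for duplicates that have not been merged yet.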

Price change propagation. When you update your pricing tier boundaries mid-cycle, which events get the old price and which get the new one? Most systems apply the price at invoice generation time (not ingestion time), but you need to decide this upfront and make it explicit in your rating engine.
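Making the choice explicit can be as simple as versioning prices with effective dates and passing the timestamp your policy selects to the lookup. A minimal sketch; `PriceVersion`, `price_at`, `effective_price`, and the two policy names are hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass
from datetime import datetime, timezone
from decimal import Decimal


@dataclass(frozen=True)
class PriceVersion:
    effective_from: datetime
    unit_price: Decimal


def price_at(versions: list, when: datetime) -> Decimal:
    """Price in effect at `when`; `versions` must be sorted by effective_from."""
    starts = [v.effective_from for v in versions]
    i = bisect_right(starts, when) - 1
    if i < 0:
        raise LookupError("no price in effect at that time")
    return versions[i].unit_price


def effective_price(versions: list, event_ts: datetime, invoice_ts: datetime, policy: str) -> Decimal:
    # "invoice_time": all usage is priced at the version live at invoicing.
    # "event_time": each event keeps the version live when it occurred.
    if policy == "invoice_time":
        return price_at(versions, invoice_ts)
    if policy == "event_time":
        return price_at(versions, event_ts)
    raise ValueError(f"unknown policy: {policy}")
```

With a price cut from $0.002 to $0.0015 effective April 15, an event from April 4 invoiced on May 1 is rated at $0.0015 under `invoice_time` and at $0.002 under `event_time`.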

How Flexprice Implements Each Layer

Flexprice is an open-source billing platform built specifically for this architecture. Here's how it maps to the four layers above:

Event Ingestion

Flexprice exposes a REST API for event ingestion, with SDKs in Go, Python, and JavaScript. A raw API call looks like this:

curl -X POST https://us.api.flexprice.io/v1/events \
  -H "Content-Type: application/json" \
  -H "x-api-key: $FLEXPRICE_API_KEY" \
  -d '{
    "event_name": "api.request",
    "external_customer_id": "cust_8xk29f",
    "properties": { "tokens": 1500 },
    "source": "api-gateway"
  }'

Under the hood, the API service publishes events to Kafka, which decouples ingestion from processing. The Worker service consumes from Kafka and writes to ClickHouse for aggregation. This architecture handles millions of events with minimal latency. Ingestion returns 202 Accepted immediately, and processing happens asynchronously.
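If you would rather not add an SDK dependency, the same call can be made with Python's standard library. This sketch mirrors the curl request above, using only the endpoint, headers, and fields shown there; `build_event_request` is a hypothetical helper. It builds the request without sending it, and the commented lines show how to send it and check for the 202 Accepted response.

```python
import json
import os
import urllib.request

BASE_URL = "https://us.api.flexprice.io/v1"  # US region base URL


def build_event_request(event: dict, api_key: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the same POST /v1/events call as the curl example."""
    return urllib.request.Request(
        url=f"{base_url}/events",
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )


event = {
    "event_name": "api.request",
    "external_customer_id": "cust_8xk29f",
    "properties": {"tokens": 1500},
    "source": "api-gateway",
}
req = build_event_request(event, os.environ.get("FLEXPRICE_API_KEY", ""))

# To send it (processing is asynchronous, so 202 means "accepted", not "billed"):
# with urllib.request.urlopen(req, timeout=5) as resp:
#     assert resp.status == 202
```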

For bulk workloads, there's a dedicated endpoint:

POST /v1/events/bulk   # Up to 1,000 events per request
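A backfill or batch job with more than 1,000 events has to split them across requests. A minimal chunking sketch; the `{"events": [...]}` envelope is an assumption about the bulk body shape, so check the API reference before relying on it.

```python
from typing import Iterator

MAX_BULK_EVENTS = 1_000  # per-request limit for /v1/events/bulk


def bulk_payloads(events: list, size: int = MAX_BULK_EVENTS) -> Iterator[dict]:
    """Yield request bodies holding at most `size` events each.

    The {"events": [...]} envelope is an assumption, not a confirmed schema.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    for i in range(0, len(events), size):
        yield {"events": events[i:i + size]}
```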

Rate limits are 1,000 single events/minute and 100 bulk requests/minute. These are configurable on self-hosted deployments.
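To stay under the 100 bulk requests/minute limit from a batch job, a client can pace itself. This is a minimal sliding-window limiter sketch, an illustrative helper rather than part of any Flexprice SDK; the injectable `clock` and `sleep` exist only to make it testable. It is single-threaded, so share an instance across threads only behind a lock.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `limit` calls in any rolling `window_s`-second window."""

    def __init__(self, limit: int, window_s: float = 60.0, clock=time.monotonic, sleep=time.sleep):
        self.limit = limit
        self.window_s = window_s
        self._clock = clock
        self._sleep = sleep
        self._calls = deque()  # timestamps of calls inside the current window

    def acquire(self) -> None:
        """Block until another call is allowed, then record it."""
        while True:
            now = self._clock()
            # Drop timestamps that have aged out of the window.
            while self._calls and now - self._calls[0] >= self.window_s:
                self._calls.popleft()
            if len(self._calls) < self.limit:
                self._calls.append(now)
                return
            # Wait until the oldest call in the window expires, then recheck.
            self._sleep(self.window_s - (now - self._calls[0]))
```

Usage: create `SlidingWindowLimiter(limit=100)` once and call `acquire()` before each bulk request.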

Base URLs: US region: https://us.api.flexprice.io/v1 · India region: https://api.cloud.flexprice.io/v1

Aggregation

Flexprice uses ClickHouse as its OLAP engine for fast aggregation. When you define a metered feature (in the dashboard or via API), you configure:

Event name: The event to track (e.g., api.request)
Aggregation function: COUNT, SUM, AVERAGE, MAX, COUNT_UNIQUE, LATEST, SUM_WITH_MULTIPLIER, or WEIGHTED_SUM
Aggregation field: Which property to aggregate on (e.g., tokens)
Reset behavior: Periodic (resets each billing cycle) or cumulative

Flexprice supports event filters so you can scope metrics. For example, you can count only api.request events where region = "us-east-1". The aggregation is replayable: if events arrive late, metrics are recomputed correctly.
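A local sketch of what several of these aggregation functions compute over one period's events, combined with an equality filter like the region example. It is illustrative only, not Flexprice's implementation; it omits SUM_WITH_MULTIPLIER and WEIGHTED_SUM, whose exact semantics are defined by Flexprice, and `compute_metric` is a hypothetical name.

```python
SUPPORTED = {"COUNT", "SUM", "COUNT_UNIQUE", "MAX", "AVERAGE", "LATEST"}


def compute_metric(events, event_name, agg, field=None, filters=None):
    """Aggregate one period's events the way a metered feature might."""
    if agg not in SUPPORTED:
        raise ValueError(f"unsupported aggregation: {agg}")
    filters = filters or {}
    # Keep events with the right name whose properties match every filter.
    matched = [
        e for e in events
        if e["event_name"] == event_name
        and all(e["properties"].get(k) == v for k, v in filters.items())
    ]
    if agg == "COUNT":
        return len(matched)
    values = [e["properties"][field] for e in matched if field in e["properties"]]
    if agg == "SUM":
        return sum(values)
    if agg == "COUNT_UNIQUE":
        return len(set(values))
    if not values:
        return None  # MAX, AVERAGE, and LATEST are undefined with no data
    if agg == "MAX":
        return max(values)
    if agg == "AVERAGE":
        return sum(values) / len(values)
    return values[-1]  # LATEST; assumes events are ordered by timestamp
```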

Rating & Invoicing

Pricing plans in Flexprice support flat-rate, per-unit, tiered, volume, and package models, including per-customer overrides. The rating engine is powered by Temporal workflows, which handle billing cycle orchestration, invoice generation, and retry logic with built-in durability guarantees.

Infrastructure Stack

Component	Role
Go API Service	Event ingestion, CRUD operations, and authentication
Kafka	Event streaming, decoupling ingestion from processing
ClickHouse	OLAP engine for event storage and aggregation
PostgreSQL	Transactional data: customers, subscriptions, plans
Temporal	Workflow orchestration: billing cycles, invoice generation

Self-hosting is a single command:

git clone https://github.com/flexprice/flexprice
cd flexprice
make dev-setup

This spins up Postgres, Kafka, ClickHouse, and Temporal, runs migrations, and starts the API at localhost:8080.

Build vs. Buy: The Engineering Trade-Off

Here's the honest breakdown of what you're signing up for if you build this in-house:

Concern	Build In-House	Use Flexprice
Event pipeline	Build Kafka consumers, handle backpressure, dedup	SDK + API call
Aggregation engine	Build and maintain ClickHouse queries, handle late events	Configure via dashboard or API
Pricing logic	Code every tier/volume/package model, handle edge cases	Declarative plan configuration
Invoice generation	Build PDF generation, line-item calculation, and tax logic	Automated with Temporal workflows
Payment integration	Build Stripe/Razorpay adapters, handle retries	Pre-built integrations
Idempotency & auditability	Design from scratch	Built-in event debugger and tracing
Time to production	3-6 months	Days

If billing is your core product, build it. If billing is infrastructure that supports your actual product, you're better off using a purpose-built system.

API Reference: Full endpoint documentation.

Try it yourself → flexprice.io/pricing
