Usage‐Based Billing Architecture Guide
If you're building a SaaS or AI product, you've probably hit the point where flat-rate subscriptions don't cut it anymore.
Customers want to pay for what they use, and your billing system needs to handle that without becoming a maintenance nightmare.
This guide breaks down the architecture behind usage-based billing systems, walks through the core design patterns, and shows how Flexprice implements each layer so you don't have to build it from scratch.
Usage-based billing (UBB) charges customers based on actual consumption rather than a fixed monthly fee. Think API calls, compute hours, tokens processed, storage consumed, or messages sent. The billing amount is derived from metered usage data, not a static price tag.
This model is now standard across infrastructure and AI companies. AWS, Snowflake, OpenAI, and Twilio all operate on some form of consumption pricing. The reason is straightforward: it aligns revenue with the value customers receive.
From an engineering perspective, UBB introduces complexity that subscription billing doesn't have. You need to:
- Ingest and store high-volume event streams in real time
- Aggregate raw events into billable metrics
- Apply tiered, volume, or graduated pricing logic
- Generate accurate invoices at the end of each billing cycle
- Handle edge cases like retries, duplicates, late-arriving events, and mid-cycle plan changes
Let's look at how this works architecturally.
A well-designed UBB system decomposes into four distinct layers: event ingestion, metering and aggregation, rating, and invoicing. Each layer has a clear responsibility, and the boundaries between them matter, especially at scale.
Event ingestion is where raw usage data enters the system. Every billable action in your application (an API request, a model inference call, a file upload) emits an event to the billing pipeline.
A typical event payload looks like this:
```json
{
  "event_name": "api.request",
  "external_customer_id": "cust_8xk29f",
  "properties": {
    "tokens": 1500,
    "model": "gpt-4",
    "region": "us-east-1"
  },
  "timestamp": "2026-04-04T14:30:00Z",
  "source": "api-gateway"
}
```

Key design decisions at this layer:
- Async over sync. Event ingestion should never block your application's critical path. Fire-and-forget with at-least-once delivery guarantees is the standard pattern.
- Idempotency. Events may be retried. Include a unique `event_id` so the pipeline can deduplicate.
- Schema flexibility. The `properties` bag should be schemaless. You'll want to add new dimensions (region, model version, tier) without migrating your event schema.
- Throughput. This layer needs to handle traffic spikes without dropping events. A message broker like Kafka sits between your application and the processing pipeline to absorb bursts.
At scale, you're looking at millions of events per day. The ingestion layer must be horizontally scalable and decoupled from downstream processing.
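The async and idempotency points above can be sketched on the client side as follows. This is a minimal illustration, not the Flexprice SDK: `EventEmitter`, its parameters, and the `send` callable (standing in for whatever transport you use) are hypothetical names, and the bounded in-memory buffer with a few retries is one assumption among several reasonable designs.

```python
import queue
import threading
import time
import uuid
from datetime import datetime, timezone


class EventEmitter:
    """Buffers usage events in memory and ships them from a background thread."""

    def __init__(self, send, max_buffer=10_000, max_attempts=3, backoff_seconds=0.5):
        self._send = send  # hypothetical transport: any callable that takes one event dict
        self._max_attempts = max_attempts
        self._backoff_seconds = backoff_seconds
        self._buffer = queue.Queue(maxsize=max_buffer)
        threading.Thread(target=self._drain, daemon=True).start()

    def emit(self, event_name, customer_id, properties):
        """Enqueue an event without blocking; returns False if the buffer is full."""
        event = {
            "event_id": str(uuid.uuid4()),  # generated once, reused on every retry
            "event_name": event_name,
            "external_customer_id": customer_id,
            "properties": properties,
            "timestamp": datetime.now(timezone.utc).isoformat(),
        }
        try:
            self._buffer.put_nowait(event)
            return True
        except queue.Full:
            return False  # the caller decides whether to drop, log, or spill to disk

    def _drain(self):
        while True:
            event = self._buffer.get()
            for attempt in range(self._max_attempts):
                try:
                    self._send(event)
                    break
                except Exception:
                    # Exponential backoff between attempts; the event_id is unchanged.
                    time.sleep(self._backoff_seconds * (2 ** attempt))
            self._buffer.task_done()

    def flush(self):
        """Block until every buffered event has been attempted."""
        self._buffer.join()
```

Because the `event_id` is assigned when the event is created rather than when it is sent, a retry after a lost acknowledgement carries the same key, which is what lets the pipeline deduplicate downstream.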
Raw events aren't directly billable. You need to aggregate them into metrics that map to your pricing model. The metering and aggregation layer is where this logic lives.
Aggregation typically runs on a columnar OLAP database optimized for fast analytical queries, such as ClickHouse, Druid, or BigQuery. The aggregation layer reads from the event store and computes billable metrics over configurable time windows (hourly, daily, or per billing cycle).
A critical subtlety: aggregation must be idempotent and replayable. If a batch of late-arriving events lands after the initial aggregation window, the system needs to recompute without double-counting.
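One way to get idempotent, replayable aggregation is to always recompute a window from the raw events instead of incrementing a running counter. The sketch below assumes events shaped like the payload shown earlier plus an `event_id` field; `aggregate_tokens` is a hypothetical helper, and a production system would express the same idea as a query against the OLAP store rather than a Python loop.

```python
from collections import defaultdict
from datetime import datetime


def parse_utc(timestamp):
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' on older Python versions."""
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


def aggregate_tokens(events, period_start, period_end):
    """Recompute SUM(tokens) per customer for the window [period_start, period_end).

    The result depends only on the set of events, so rerunning it after late
    events arrive yields a corrected total without double-counting.
    """
    seen_ids = set()
    totals = defaultdict(int)
    for event in events:
        ts = parse_utc(event["timestamp"])
        if not (period_start <= ts < period_end):
            continue  # outside this window
        if event["event_id"] in seen_ids:
            continue  # duplicate delivery of an event already counted
        seen_ids.add(event["event_id"])
        totals[event["external_customer_id"]] += event["properties"]["tokens"]
    return dict(totals)
```

Running the function twice over the same event set returns the same totals, and running it again after a late event lands simply includes that event once.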
Rating takes aggregated usage and applies pricing rules to compute the monetary amount owed. This is where your pricing model is codified.
Common pricing structures:
- Per-unit: $0.002 per API call
- Tiered (graduated): Each tier is priced independently and the charges are summed. For example, the first 10,000 calls at $0.002, the next 90,000 at $0.0015, and the remainder at $0.001
- Volume: The total volume selects a single tier, and every unit is priced at that tier's rate
- Package: $10 per block of 1,000 calls
The rating engine must support per-customer overrides, promotional pricing, and mid-cycle plan changes. In practice, this layer is where most billing bugs hide. Off-by-one errors in tier boundaries, timezone mismatches in billing cycles, and rounding inconsistencies are all common failure modes.
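The pricing structures listed above can be sketched as small pure functions. This is an illustration under stated assumptions, not Flexprice's rating engine: the function names are hypothetical, tier upper bounds are treated as inclusive, partial packages are rounded up to a full block, and `Decimal` is used to sidestep floating-point rounding.

```python
from decimal import Decimal

# Tiers as (inclusive upper bound, unit price); None marks the unbounded last tier.
# The numbers mirror the tiered example in the list above.
TIERS = [
    (10_000, Decimal("0.002")),
    (100_000, Decimal("0.0015")),
    (None, Decimal("0.001")),
]


def rate_per_unit(units, unit_price):
    """Every unit costs the same."""
    return units * unit_price


def rate_graduated(units, tiers):
    """Each tier prices only the units that fall inside it; charges are summed."""
    total = Decimal("0")
    lower = 0
    for upper, price in tiers:
        if units <= lower:
            break
        top = units if upper is None else min(units, upper)
        total += (top - lower) * price
        if upper is None:
            break
        lower = upper
    return total


def rate_volume(units, tiers):
    """The total volume selects one tier, and all units get that tier's price."""
    for upper, price in tiers:
        if upper is None or units <= upper:
            return units * price
    raise ValueError("tiers must end with an unbounded tier")


def rate_package(units, block_size, block_price):
    """Charge per block; a partial block is billed as a full one (an assumption)."""
    blocks = -(-units // block_size)  # ceiling division on integers
    return blocks * block_price
```

For 150,000 calls, the graduated function charges 10,000 × $0.002 + 90,000 × $0.0015 + 50,000 × $0.001 = $205, while the volume function charges 150,000 × $0.001 = $150. Boundary values such as 10,000 and 10,001 are worth covering in tests, since that is where off-by-one errors show up.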
The final layer, invoicing, assembles rated line items into an invoice, applies credits or prepayments, calculates taxes, and triggers payment collection.
Invoices need to be:
- Itemized. Customers should see exactly what they're paying for, not a single opaque line item.
- Auditable. Every line item should trace back to the underlying aggregated metrics and raw events.
- Idempotent. Regenerating an invoice for the same period should produce the same result.
This layer also handles payment provider integration (Stripe, Razorpay), webhook notifications, and retry logic for failed charges.
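The itemized and idempotent properties can be illustrated with a deterministic invoice builder. Everything here is a hypothetical sketch: `build_invoice`, the `inv_` identifier format, and rounding each line to cents with half-up rounding are assumptions, and credits, taxes, and payment collection are left out.

```python
import hashlib
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def build_invoice(customer_id, period_start, period_end, rated_items):
    """Assemble an invoice from rated line items.

    rated_items: dicts with 'metric', 'quantity', and 'amount' (Decimal).
    The invoice ID is derived from the customer and period, and lines are
    sorted, so regenerating the invoice from the same inputs gives the same
    result regardless of input order.
    """
    key = f"{customer_id}|{period_start}|{period_end}"
    invoice_id = "inv_" + hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]
    lines = [
        {**item, "amount": item["amount"].quantize(CENT, rounding=ROUND_HALF_UP)}
        for item in sorted(rated_items, key=lambda item: item["metric"])
    ]
    total = sum((line["amount"] for line in lines), Decimal("0"))
    return {
        "invoice_id": invoice_id,
        "customer_id": customer_id,
        "period": (period_start, period_end),
        "lines": lines,  # one line per metric, so the invoice stays itemized
        "total": total,
    }
```

Keeping the metric name and quantity on each line is what makes the invoice traceable back to the aggregated metrics it was rated from.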
Before jumping into implementation, here are the failure modes that catch most teams:
Clock skew and timezone hell. Your application servers, event pipeline, and billing engine all need to agree on what "this billing period" means. Store all timestamps in UTC. Define billing periods server-side, not based on client clocks.
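As a small illustration of defining billing periods server-side in UTC, the sketch below maps any timezone-aware timestamp to its calendar-month period. The `billing_period` helper and the choice of calendar months are assumptions; anniversary-based cycles would need a different boundary rule.

```python
from datetime import datetime, timedelta, timezone


def billing_period(ts):
    """Return the [start, end) calendar-month period, in UTC, that contains ts."""
    if ts.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    ts = ts.astimezone(timezone.utc)
    start = ts.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    if start.month == 12:
        end = start.replace(year=start.year + 1, month=1)
    else:
        end = start.replace(month=start.month + 1)
    return start, end


# An event stamped 20:30 on March 31 at UTC-5 is already April 1 in UTC,
# so it belongs to the April period.
local = datetime(2026, 3, 31, 20, 30, tzinfo=timezone(timedelta(hours=-5)))
start, end = billing_period(local)
```

Rejecting naive timestamps outright avoids silently interpreting them in the server's local timezone.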
Aggregation drift. If your aggregation runs on a snapshot and late events arrive after the window closes, you'll under-bill. Design for recomputation. Your aggregation layer should be able to reprocess a billing period and produce a corrected result without manual intervention.
Retry amplification. A transient 500 from your ingestion endpoint causes your SDK to retry. If the server processed the first request but failed to acknowledge it, you now have a duplicate event. Every event must carry an idempotency key, and your pipeline must deduplicate before aggregation.
Price change propagation. When you update your pricing tier boundaries mid-cycle, which events get the old price and which get the new one? Most systems apply the price at invoice generation time (not ingestion time), but you need to decide this upfront and make it explicit in your rating engine.
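The two policies can be made explicit by keeping a dated price history and choosing which timestamp to look up. This is a hypothetical sketch: `PRICE_HISTORY`, `price_at`, and the example dates and prices are invented for illustration.

```python
from bisect import bisect_right
from datetime import datetime, timezone
from decimal import Decimal

# (effective_from, unit_price), sorted by effective_from. Values are made up.
PRICE_HISTORY = [
    (datetime(2026, 1, 1, tzinfo=timezone.utc), Decimal("0.002")),
    (datetime(2026, 4, 15, tzinfo=timezone.utc), Decimal("0.0015")),
]


def price_at(history, when):
    """Return the unit price in effect at `when` (effective_from is inclusive)."""
    starts = [effective_from for effective_from, _ in history]
    index = bisect_right(starts, when) - 1
    if index < 0:
        raise ValueError("no price was in effect at that time")
    return history[index][1]


event_time = datetime(2026, 4, 4, 14, 30, tzinfo=timezone.utc)
invoice_time = datetime(2026, 5, 1, tzinfo=timezone.utc)

# Policy A: price at invoice generation time; the whole cycle uses the new price.
price_invoice_time = price_at(PRICE_HISTORY, invoice_time)

# Policy B: price at event time; usage before the change keeps the old price.
price_event_time = price_at(PRICE_HISTORY, event_time)
```

Whichever policy you pick, routing every lookup through one function like this keeps the decision in a single place instead of scattered across the rating code.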
Flexprice is an open-source billing platform built specifically for this architecture. Here's how it maps to the four layers above:
Flexprice exposes a REST API for event ingestion with SDKs in Go, Python, and JavaScript:
```bash
curl -X POST https://us.api.flexprice.io/v1/events \
  -H "Content-Type: application/json" \
  -H "x-api-key: $FLEXPRICE_API_KEY" \
  -d '{
    "event_name": "api.request",
    "external_customer_id": "cust_8xk29f",
    "properties": { "tokens": 1500 },
    "source": "api-gateway"
  }'
```

Under the hood, the API service publishes events to Kafka, which decouples ingestion from processing. The Worker service consumes from Kafka and writes to ClickHouse for aggregation. This architecture handles millions of events with minimal latency. Ingestion returns `202 Accepted` immediately, and processing happens asynchronously.
For bulk workloads, there's a dedicated endpoint:
POST /v1/events/bulk # Up to 1,000 events per request
Rate limits are 1,000 single events/minute and 100 bulk requests/minute. These are configurable on self-hosted deployments.
Base URLs: US region: https://us.api.flexprice.io/v1 · India region: https://api.cloud.flexprice.io/v1
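When backfilling or batching, the 1,000-event cap on the bulk endpoint means a larger list has to be split client-side. The helper below is a hypothetical sketch of that splitting step only; it deliberately does not build the request body, since the bulk payload shape is not shown here and should be taken from the API reference.

```python
def chunk_events(events, max_batch=1000):
    """Yield consecutive slices of `events`, each at most `max_batch` long."""
    if max_batch < 1:
        raise ValueError("max_batch must be at least 1")
    for offset in range(0, len(events), max_batch):
        yield events[offset:offset + max_batch]


# 2,500 events become three bulk requests: 1,000 + 1,000 + 500.
events = [{"event_name": "api.request", "properties": {"tokens": n}} for n in range(2500)]
batch_sizes = [len(batch) for batch in chunk_events(events)]
```

With the stated limit of 100 bulk requests per minute, a sender that paces itself to that rate can push up to 100,000 events per minute through this path.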
Flexprice uses ClickHouse as its OLAP engine for fast aggregation. When you define a metered feature (in the dashboard or via API), you configure:
- Event name: The event to track (e.g., `api.request`)
- Aggregation function: `COUNT`, `SUM`, `AVERAGE`, `MAX`, `COUNT_UNIQUE`, `LATEST`, `SUM_WITH_MULTIPLIER`, or `WEIGHTED_SUM`
- Aggregation field: Which property to aggregate on (e.g., `tokens`)
- Reset behavior: Periodic (resets each billing cycle) or cumulative
Flexprice supports event filters so you can scope metrics. For example, you can count only api.request events where region = "us-east-1". The aggregation is replayable: if events arrive late, metrics are recomputed correctly.
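To make the semantics of a metered feature concrete, here is a plain-Python model of an event name, an aggregation function, an aggregation field, and an optional property filter. It is not how Flexprice computes metrics (that happens in ClickHouse); `compute_metric` is a hypothetical helper that covers only four of the listed functions, with behavior assumed from their names.

```python
def compute_metric(events, event_name, aggregation, field=None, filters=None):
    """Aggregate matching events the way a metered feature definition describes."""
    filters = filters or {}
    matched = [
        event for event in events
        if event["event_name"] == event_name
        and all(event["properties"].get(key) == value for key, value in filters.items())
    ]
    if aggregation == "COUNT":
        return len(matched)  # counts events; no field needed
    values = [event["properties"][field] for event in matched if field in event["properties"]]
    if aggregation == "SUM":
        return sum(values)
    if aggregation == "MAX":
        return max(values, default=0)
    if aggregation == "COUNT_UNIQUE":
        return len(set(values))
    raise ValueError(f"unsupported aggregation in this sketch: {aggregation}")


events = [
    {"event_name": "api.request", "properties": {"tokens": 1500, "model": "gpt-4", "region": "us-east-1"}},
    {"event_name": "api.request", "properties": {"tokens": 700, "model": "gpt-4", "region": "eu-west-1"}},
    {"event_name": "api.request", "properties": {"tokens": 300, "model": "gpt-3.5", "region": "us-east-1"}},
    {"event_name": "file.upload", "properties": {"bytes": 2048, "region": "us-east-1"}},
]

us_requests = compute_metric(events, "api.request", "COUNT", filters={"region": "us-east-1"})
total_tokens = compute_metric(events, "api.request", "SUM", field="tokens")
distinct_models = compute_metric(events, "api.request", "COUNT_UNIQUE", field="model")
```

The same event stream backs all three metrics, which is why the `properties` bag can stay schemaless: a new metric is a new definition over existing events, not a change to the events.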
Pricing plans in Flexprice support flat-rate, per-unit, tiered, volume, and package models, including per-customer overrides. The rating engine is powered by Temporal workflows, which handle billing cycle orchestration, invoice generation, and retry logic with built-in durability guarantees.
| Component | Role |
|---|---|
| Go API Service | Event ingestion, CRUD operations, and authentication |
| Kafka | Event streaming, decoupling ingestion from processing |
| ClickHouse | OLAP engine for event storage and aggregation |
| PostgreSQL | Transactional data: customers, subscriptions, plans |
| Temporal | Workflow orchestration: billing cycles, invoice generation |
Self-hosting is a single command:
```bash
git clone https://github.com/flexprice/flexprice
cd flexprice
make dev-setup
```

This spins up Postgres, Kafka, ClickHouse, and Temporal, runs migrations, and starts the API at `localhost:8080`.
Here's the honest breakdown of what you're signing up for if you build this in-house:
| Concern | Build In-House | Use Flexprice |
|---|---|---|
| Event pipeline | Build Kafka consumers, handle backpressure, dedup | SDK + API call |
| Aggregation engine | Build and maintain ClickHouse queries, handle late events | Configure via dashboard or API |
| Pricing logic | Code every tier/volume/package model, handle edge cases | Declarative plan configuration |
| Invoice generation | Build PDF generation, line-item calculation, and tax logic | Automated with Temporal workflows |
| Payment integration | Build Stripe/Razorpay adapters, handle retries | Pre-built integrations |
| Idempotency & auditability | Design from scratch | Built-in event debugger and tracing |
| Time to production | 3-6 months | Days |
If billing is your core product, build it. If billing is infrastructure that supports your actual product, you're better off using a purpose-built system.
API Reference: Full endpoint documentation.