pricing-engine

An agent-orchestrated dynamic pricing engine for independent hospitality operators.

Boutique hotels and small resorts leave an estimated 15–30% of potential revenue on the table because they price rooms by gut feel. Enterprise revenue-management systems (Duetto, IDeaS) start at $1k+/month and are built for chains with a revenue team. This project closes the gap: a lightweight, explainable, API-first pricing service that a 10-room property can plug into.

Given a property and a stay window, it spins up a small system of cooperating agents to pull competitor rates, local events, and seasonality signals in parallel, assembles a 19-dimensional feature vector, and runs it through a trained XGBoost regressor to return an optimal nightly rate — with a confidence interval and the top drivers behind the recommendation.

POST /price
   │
   ▼
┌──────────────┐   asyncio.gather    ┌───────────────────────────────┐
│ Coordinator  │ ──────────────────▶ │  MarketRatesAgent             │
│              │                     │  EventsAgent                  │──▶ DataProvider
│              │ ◀────────────────── │  SeasonalityAgent             │    (synthetic | SerpAPI)
└──────┬───────┘   AgentResults      └───────────────────────────────┘
       │
       ▼
 FeatureBuilder ─▶ XGBoost.predict ─▶ { rate, CI, reasoning, drivers }

Quickstart

git clone https://github.com/liam-davis/pricing-engine
cd pricing-engine
pip install -r requirements.txt
uvicorn api.main:app --reload

On first boot, the API trains the XGBoost model against three years of synthetic market history (~20s) and caches the artifact to model/artifacts/. Subsequent boots load it directly.

In a second terminal:

python -m scripts.demo

Or hit the endpoint directly:

curl -s http://127.0.0.1:8000/price \
  -H 'content-type: application/json' \
  -d '{
    "property_id": "charleston_p99",
    "location": "charleston",
    "check_in":  "2026-07-18",
    "check_out": "2026-07-19",
    "room_type": "suite",
    "base_rate": 320.0,
    "star_rating": 4.3,
    "capacity": 2
  }' | jq

How it works

For a full walkthrough — system diagram, request lifecycle, module tour, and a guide to swapping in a real data source — see docs/architecture.md. What follows is the short version.

1. Agents

Three agents implement a common PricingAgent protocol (agents/base.py). Each is a self-contained async component with a single responsibility:

| Agent | Responsibility | Output |
| --- | --- | --- |
| `MarketRatesAgent` | Pull competitor rates for the location + date window, compute distribution stats | `comp_median`, `comp_p25`/`p75`, `comp_std`, `comp_count`, `occupancy_proxy` |
| `EventsAgent` | Detect local events (conferences, festivals, holidays) during the stay window | `has_event`, `event_magnitude`, `event_names` |
| `SeasonalityAgent` | Compute month-of-year seasonality index and YoY demand trend from history | `month_seasonality_index`, `yoy_trend` |
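The protocol itself lives in agents/base.py. A minimal sketch of what such a protocol and one conforming agent might look like — the method name `fetch`, the fixed per-month index, and the return values are illustrative, not the repo's actual implementation:

```python
import asyncio
from typing import Any, Protocol


class PricingAgent(Protocol):
    """Common async interface each agent implements (sketch of agents/base.py)."""

    name: str

    async def fetch(self, location: str, check_in: str, check_out: str) -> dict[str, Any]:
        """Return this agent's feature contributions for the stay window."""
        ...


class SeasonalityAgent:
    """Illustrative conforming agent: one responsibility, async, self-contained."""

    name = "seasonality"

    async def fetch(self, location: str, check_in: str, check_out: str) -> dict[str, Any]:
        month = int(check_in.split("-")[1])
        # Stand-in for a real history lookup: a fixed month-of-year index.
        index = [0.8, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.3, 1.1, 1.0, 0.9, 1.0][month - 1]
        return {"month_seasonality_index": index, "yoy_trend": 0.02}


result = asyncio.run(SeasonalityAgent().fetch("charleston", "2026-07-18", "2026-07-19"))
```

Because the contract is structural (a `Protocol`, not a base class), any object with a matching async `fetch` satisfies it — which is what keeps the agents independently testable.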

The Coordinator (agents/coordinator.py) fans out to all three agents via asyncio.gather with a 3-second per-agent timeout. Partial failures degrade gracefully — if the events agent times out, the system falls back to a documented default and still returns a price. Total market-data blackout (no competitor rates at all) surfaces as a 404 upstream.

2. Features

Every feature in the 19-dimensional vector has a hospitality-pricing rationale:

| Feature | Why it matters |
| --- | --- |
| `day_of_week`, `is_weekend` | Weekend premium is the strongest repeating signal in leisure markets |
| `month`, `month_seasonality_index` | Captures yearly demand curve (summer for beach, winter for ski) |
| `days_until_checkin` | Lead time — late bookings in a tight market command a premium |
| `stay_length` | Multi-night stays often earn a small discount |
| `comp_median`, `comp_p25`, `comp_p75` | Anchor to the market; p25/p75 capture spread |
| `comp_std`, `rate_dispersion` | Tight dispersion = consensus pricing → inelastic demand |
| `comp_count` | Signals market depth and confidence in the median |
| `occupancy_proxy` | Tight dispersion + high level → sold-out city |
| `has_event`, `event_magnitude` | Conference/festival weeks can 2× baseline demand |
| `yoy_trend` | Captures secular market shifts (e.g. post-COVID recovery) |
| `star_rating`, `room_capacity` | Property-level quality and size |
| `base_rate` | The operator's own anchor — the model recommends relative to this |
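The builder in features/ is a pure function. A sketch of the calendar-derived slice of the vector — the function name and the Fri/Sat weekend convention here are assumptions for illustration:

```python
from datetime import date


def calendar_features(check_in: date, check_out: date, booked_on: date) -> dict:
    """Illustrative slice of the feature builder: calendar-derived features only."""
    return {
        "day_of_week": check_in.weekday(),           # 0 = Monday
        "is_weekend": int(check_in.weekday() >= 4),  # Fri/Sat check-in (assumed convention)
        "month": check_in.month,
        "days_until_checkin": (check_in - booked_on).days,
        "stay_length": (check_out - check_in).days,
    }


# The Saturday stay from the quickstart example, booked 17 days out:
feats = calendar_features(date(2026, 7, 18), date(2026, 7, 19), date(2026, 7, 1))
```

Keeping this a pure function of the request and agent outputs is what makes the determinism and vector-order tests cheap to write.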

3. Model

XGBoost regressor, trained on three years of synthetic per-city history with per-property nightly rates as the label and lead-time data augmentation (each night emitted at 0, 7, and 30 days out). 600 trees max, depth 6, 5% learning rate, early stopping at 30 rounds on a 20% held-out set.

Full evaluation: docs/model_evaluation.md — holdout RMSE/MAPE on unseen market instances, five scenario sanity checks, and five directional monotonicity checks (including the deliberately inverted "winter > summer" check for the ski market to confirm the model isn't memorizing "summer = high"). Regenerate with python -m scripts.benchmark.

Confidence intervals are computed by evaluating the ensemble at 20 evenly spaced tree-count cutoffs and taking the 15th/85th percentiles of the resulting predictions — a lightweight alternative to quantile regression that gives the client a plausible range to show the operator.

4. API

| Endpoint | Purpose |
| --- | --- |
| `POST /price` | The main endpoint. Request: property + stay details. Response: recommended rate, CI, per-agent status, top feature contributions. |
| `GET /health` | Liveness + `model_loaded` flag. |
| `GET /model/info` | Training metrics and feature importances for the loaded artifact. |

Every response includes:

  • recommended_rate with a confidence_interval (the 15th–85th percentile band)
  • reasoning.market_context — what the competitor landscape looked like
  • reasoning.event_factors — which events drove a lift
  • reasoning.seasonality_factors — where in the yearly curve the stay falls
  • reasoning.top_feature_contributions — SHAP-style per-feature pull on this prediction
  • agents_consulted[].ok / latency_ms / error — full observability into the agent layer
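An illustrative response shape assembled from the fields above — every value here is made up for the example, and field nesting is an assumption:

```json
{
  "recommended_rate": 342.0,
  "confidence_interval": [318.0, 367.0],
  "reasoning": {
    "market_context": "...",
    "event_factors": "...",
    "seasonality_factors": "...",
    "top_feature_contributions": [{"feature": "comp_median", "contribution": 14.2}]
  },
  "agents_consulted": [
    {"agent": "market_rates", "ok": true, "latency_ms": 180, "error": null}
  ]
}
```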

Modeling the market

Because this ships with a synthetic provider by default, the generator is the interface the model is trained against — which means its realism matters. The goal isn't "numbers that look like prices" but a faithful reduction of the mechanics a revenue manager actually tracks. Concretely:

Competitor tiers. Each synthetic market is populated with a realistic mix of budget (~35%), midscale (~45%), and luxury (~20%) properties. Each tier has its own rate multiplier (0.62× / 1.00× / 1.85× of the market baseline), noise scale, event elasticity, promo frequency, and last-minute-booking sensitivity. A reviewer looking at a competitor snapshot sees a proper rate ladder — not 18 clones of the same hotel.

Market tightness. A latent per-(city, night) variable in (0, 1) drives the whole system. It's a logistic of seasonality + day-of-week + holiday + named-event signal. Tightness pushes baseline rates up, and simultaneously pushes within-tier noise down, so that high-demand nights show up as both higher rates and compressed p25–p75 bands.
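The tightness mechanism can be sketched in a few lines — the weights and coefficients below are illustrative, not the generator's actual values:

```python
import math


def tightness(seasonality, dow_signal, holiday, event_signal):
    """Latent per-(city, night) demand tightness in (0, 1): a logistic of the
    summed demand signals. Weights here are illustrative."""
    z = 1.5 * seasonality + 0.8 * dow_signal + 1.2 * holiday + 2.0 * event_signal
    return 1.0 / (1.0 + math.exp(-z))


def night_rate(baseline, t, noise_draw):
    """Tightness pushes the rate level up AND the within-tier noise down, so
    high-demand nights show higher rates with compressed p25-p75 bands."""
    rate = baseline * (1.0 + 0.6 * t)         # higher rates on tight nights
    noise_scale = 0.12 * (1.0 - 0.7 * t)      # compressed spread on tight nights
    return rate * (1.0 + noise_scale * noise_draw)


t_hi = tightness(1.0, 1.0, 1.0, 1.0)   # festival Saturday in peak season
t_lo = tightness(-1.0, -1.0, 0.0, 0.0) # off-season Tuesday
```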

Rate-ladder compression. On top of the tightness mechanism, tight markets pull the budget end of the ladder up disproportionately (_TIER_COMPRESSION): budget gets a +38% lift at tightness=1 while luxury is nearly unaffected. This is how real STAR-reported markets behave during compression — the $99 room disappears before the $500 suite does.

Lead-time curves. Applied at provider query time, not baked into history. The curve is flat beyond 30 days, mild inside 14, steep inside 7, and aggressive inside 48 hours. In soft markets it inverts into a modest walk-in discount (hotels rarely fire-sale — they protect ADR). Per-tier leadtime_sensitivity scales the deviation: budget barely moves, luxury moves most, matching how automated revenue management behaves at the top of the ladder.
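The shape of that curve, as a sketch — the breakpoints follow the text, but the deviation magnitudes and the soft-market discount factor are illustrative:

```python
def leadtime_multiplier(days_out: int, tight: bool, sensitivity: float = 1.0) -> float:
    """Lead-time rate adjustment applied at provider query time.

    Flat beyond 30 days, mild inside 14, steep inside 7, aggressive inside
    48 hours; in soft markets the premium inverts into a modest walk-in
    discount. `sensitivity` stands in for the per-tier leadtime_sensitivity.
    """
    if days_out > 30:
        dev = 0.0       # flat beyond 30 days
    elif days_out > 14:
        dev = 0.02      # barely moving yet
    elif days_out > 7:
        dev = 0.05      # mild inside 14
    elif days_out >= 2:
        dev = 0.12      # steep inside 7
    else:
        dev = 0.25      # aggressive inside 48 hours
    if not tight:
        dev = -0.4 * dev  # soft market: modest walk-in discount, no fire sale
    return 1.0 + sensitivity * dev
```

A low `sensitivity` (budget tier) keeps the multiplier near 1.0 at every lead time; a high one (luxury tier) swings hardest, matching the behavior described above.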

Convention-city DOW profile. Austin and Charleston carry is_convention=True on top of is_leisure=True, giving them a bimodal weekly pattern — Fri–Sat leisure peaks and Tue–Wed business peaks. Pure leisure (Aspen) gets a single Fri–Sat bump. Pure business cities would get the inverse.

Occupancy proxy — the signal, not the spread. The proxy is computed from the p25/median ratio, not the full p25–p75 band. Widening at the top of the ladder (luxury spiking during events) and widening at the bottom (a flash promo) look identical if you only look at dispersion — but they mean opposite things for supply. Compression is the canonical sold-out signature, so that's what we measure. Training and serving use the exact same formula (occupancy_proxy_from_market) so the feature distribution stays consistent.
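The compression measurement reduces to a ratio — a sketch of the idea behind occupancy_proxy_from_market; the repo's function is the single source of truth and its exact form may differ:

```python
def occupancy_proxy(p25: float, median: float) -> float:
    """How close the cheap end of the rate ladder sits to the middle.

    Approaches 1.0 as the bottom of the ladder compresses upward (the
    sold-out signature); stays low when the market is soft and the $99
    room is still on sale. Deliberately ignores p75, so a luxury spike
    or a flash promo at the top doesn't masquerade as compression.
    """
    return p25 / median


loose = occupancy_proxy(p25=120.0, median=240.0)  # soft market, wide bottom
tight = occupancy_proxy(p25=210.0, median=240.0)  # compressed, sold-out signature
```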

Location resolution. A single data/locations.py registry normalizes free-form inputs ("Austin, TX", "atx", "austin texas" → "austin") and enumerates known markets in the error payload when a caller sends something unresolvable. Every provider uses it, so swapping in a real data source inherits this layer for free.
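A sketch of how such a registry might work — the alias table, exception name, and normalization rules here are illustrative, not data/locations.py:

```python
import re

# Illustrative alias registry; the repo's data/locations.py is the real one.
_ALIASES = {
    "austin": {"austin", "atx", "austin tx", "austin texas"},
    "charleston": {"charleston", "charleston sc"},
}


class UnknownLocation(ValueError):
    """Carries the list of known markets so the API can enumerate them."""

    def __init__(self, raw, known):
        super().__init__(f"unknown location {raw!r}; known markets: {sorted(known)}")


def resolve(raw: str) -> str:
    """Normalize free-form location input to a canonical market key."""
    key = re.sub(r"[^a-z ]", "", raw.lower().replace(",", " "))
    key = " ".join(key.split())  # collapse repeated whitespace
    for canonical, aliases in _ALIASES.items():
        if key in aliases:
            return canonical
    raise UnknownLocation(raw, _ALIASES)
```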

Swapping in real data

The entire data surface is a three-method abstract class: data/providers/base.py. Any real source — SerpAPI's google_hotels engine, a Booking.com affiliate feed, Google Hotels, or your own warehouse — is a drop-in replacement for SyntheticProvider. See data/providers/serpapi.py for a stub showing the contract.

To go live with SerpAPI:

# api/main.py
from data.providers.serpapi import SerpAPIProvider
provider = SerpAPIProvider()  # instead of SyntheticProvider()

Nothing else in the repo needs to change.

Tests

pytest

Covered:

  • Synthetic generator produces realistic distributions (weekend > weekday in leisure cities).
  • Feature engineering is deterministic and preserves vector order.
  • Coordinator merges successful agent results, applies documented defaults on partial failure, enforces per-agent timeouts, and surfaces total market-data blackout.
  • API smoke test: /health comes up, /price returns a sensible rate with all three agents reporting, invalid date ranges get a 422, unknown locations get a 404.

Project structure

api/               FastAPI app, routes, pydantic schemas
agents/            PricingAgent protocol + coordinator + 3 concrete agents
data/              DataProvider ABC + synthetic provider + SerpAPI stub
features/          Feature-vector builder (pure function)
model/             Training entrypoint, inference wrapper, cached artifact
scripts/           Three-scenario demo harness
tests/             Smoke + unit tests

Docker

docker build -t pricing-engine .
docker run -p 8000:8000 pricing-engine

Roadmap

  • SHAP explanations returned verbatim in the response for full per-prediction attribution.
  • Contextual bandit on top of the regressor to A/B-test small rate perturbations and learn from realised bookings.
  • Rolling retrain — weekly cron that pulls the last 30 days of observed bookings and fine-tunes.
  • Multi-property pooling — share signal across properties in the same city to bootstrap new operators from day one.
  • Stayza integration — surface this engine as a premium feature in Stayza, the AI-native PMS I'm building for independent hospitality.

About

Built by Liam Davis, founder of Stayza — an AI-native property management platform for independent hotels and small-scale resorts.
