pricing-engine

An agent-orchestrated dynamic pricing engine for independent hospitality operators.

Boutique hotels and small resorts leave an estimated 15–30% of potential revenue on the table because they price rooms by gut feel. Enterprise revenue-management systems (Duetto, IDeaS) start at $1k+/month and are built for chains with a revenue team. This project closes the gap: a lightweight, explainable, API-first pricing service that a 10-room property can plug into.

Given a property and a stay window, it spins up a small system of cooperating agents to pull competitor rates, local events, and seasonality signals in parallel, assembles a 19-dimensional feature vector, and runs it through a trained XGBoost regressor to return an optimal nightly rate — with a confidence interval and the top drivers behind the recommendation.

POST /price
   │
   ▼
┌──────────────┐   asyncio.gather    ┌───────────────────────────────┐
│ Coordinator  │ ──────────────────▶ │  MarketRatesAgent             │
│              │                     │  EventsAgent                  │──▶ DataProvider
│              │ ◀────────────────── │  SeasonalityAgent             │    (synthetic | SerpAPI)
└──────┬───────┘   AgentResults      └───────────────────────────────┘
       │
       ▼
 FeatureBuilder ─▶ XGBoost.predict ─▶ { rate, CI, reasoning, drivers }

Quickstart

git clone https://github.com/liam-davis/pricing-engine
cd pricing-engine
pip install -r requirements.txt
uvicorn api.main:app --reload

On first boot, the API trains the XGBoost model against three years of synthetic market history (~20s) and caches the artifact to model/artifacts/. Subsequent boots load it directly.

In a second terminal:

python -m scripts.demo

Or hit the endpoint directly:

curl -s http://127.0.0.1:8000/price \
  -H 'content-type: application/json' \
  -d '{
    "property_id": "charleston_p99",
    "location": "charleston",
    "check_in":  "2026-07-18",
    "check_out": "2026-07-19",
    "room_type": "suite",
    "base_rate": 320.0,
    "star_rating": 4.3,
    "capacity": 2
  }' | jq

How it works

For a full walkthrough — system diagram, request lifecycle, module tour, and a guide to swapping in a real data source — see docs/architecture.md. What follows is the short version.

1. Agents

Three agents implement a common PricingAgent protocol (agents/base.py). Each is a self-contained async component with a single responsibility:

| Agent | Responsibility | Output |
| --- | --- | --- |
| `MarketRatesAgent` | Pull competitor rates for the location + date window, compute distribution stats | `comp_median`, `comp_p25`/`p75`, `comp_std`, `comp_count`, `occupancy_proxy` |
| `EventsAgent` | Detect local events (conferences, festivals, holidays) during the stay window | `has_event`, `event_magnitude`, `event_names` |
| `SeasonalityAgent` | Compute month-of-year seasonality index and YoY demand trend from history | `month_seasonality_index`, `yoy_trend` |
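The protocol itself lives in agents/base.py. A minimal sketch of what such a protocol and one conforming agent might look like — the method name `fetch`, the fixed per-month index, and the return values are illustrative, not the repo's actual implementation:

```python
import asyncio
from typing import Any, Protocol


class PricingAgent(Protocol):
    """Common async interface each agent implements (sketch of agents/base.py)."""

    name: str

    async def fetch(self, location: str, check_in: str, check_out: str) -> dict[str, Any]:
        """Return this agent's feature contributions for the stay window."""
        ...


class SeasonalityAgent:
    """Illustrative conforming agent: one responsibility, async, self-contained."""

    name = "seasonality"

    async def fetch(self, location: str, check_in: str, check_out: str) -> dict[str, Any]:
        month = int(check_in.split("-")[1])
        # Stand-in for a real history lookup: a fixed month-of-year index.
        index = [0.8, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.3, 1.1, 1.0, 0.9, 1.0][month - 1]
        return {"month_seasonality_index": index, "yoy_trend": 0.02}


result = asyncio.run(SeasonalityAgent().fetch("charleston", "2026-07-18", "2026-07-19"))
```

Because the contract is structural (a `Protocol`, not a base class), any object with a matching async `fetch` satisfies it — which is what keeps the agents independently testable.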

The Coordinator (agents/coordinator.py) fans out to all three agents via asyncio.gather with a 3-second per-agent timeout. Partial failures degrade gracefully — if the events agent times out, the system falls back to a documented default and still returns a price. Total market-data blackout (no competitor rates at all) surfaces as a 404 upstream.

2. Features

Every feature in the 19-dimensional vector has a hospitality-pricing rationale:

| Feature | Why it matters |
| --- | --- |
| `day_of_week`, `is_weekend` | Weekend premium is the strongest repeating signal in leisure markets |
| `month`, `month_seasonality_index` | Captures yearly demand curve (summer for beach, winter for ski) |
| `days_until_checkin` | Lead time — late bookings in a tight market command a premium |
| `stay_length` | Multi-night stays often earn a small discount |
| `comp_median`, `comp_p25`, `comp_p75` | Anchor to the market; p25/p75 capture spread |
| `comp_std`, `rate_dispersion` | Tight dispersion = consensus pricing → inelastic demand |
| `comp_count` | Signals market depth and confidence in the median |
| `occupancy_proxy` | Tight dispersion + high level → sold-out city |
| `has_event`, `event_magnitude` | Conference/festival weeks can 2× baseline demand |
| `yoy_trend` | Captures secular market shifts (e.g. post-COVID recovery) |
| `star_rating`, `room_capacity` | Property-level quality and size |
| `base_rate` | The operator's own anchor — the model recommends relative to this |
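The builder in features/ is a pure function. A sketch of the calendar-derived slice of the vector — the function name and the Fri/Sat weekend convention here are assumptions for illustration:

```python
from datetime import date


def calendar_features(check_in: date, check_out: date, booked_on: date) -> dict:
    """Illustrative slice of the feature builder: calendar-derived features only."""
    return {
        "day_of_week": check_in.weekday(),           # 0 = Monday
        "is_weekend": int(check_in.weekday() >= 4),  # Fri/Sat check-in (assumed convention)
        "month": check_in.month,
        "days_until_checkin": (check_in - booked_on).days,
        "stay_length": (check_out - check_in).days,
    }


# The Saturday stay from the quickstart example, booked 17 days out:
feats = calendar_features(date(2026, 7, 18), date(2026, 7, 19), date(2026, 7, 1))
```

Keeping this a pure function of the request and agent outputs is what makes the determinism and vector-order tests cheap to write.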

3. Model

XGBoost regressor, trained on three years of synthetic per-city history with per-property nightly rates as the label and lead-time data augmentation (each night emitted at 0, 7, and 30 days out). 600 trees max, depth 6, 5% learning rate, early stopping at 30 rounds on a 20% held-out set.

Full evaluation: docs/model_evaluation.md — holdout RMSE/MAPE on unseen market instances, five scenario sanity checks, and five directional monotonicity checks (including the deliberately inverted "winter > summer" check for the ski market to confirm the model isn't memorizing "summer = high"). Regenerate with python -m scripts.benchmark.

Confidence intervals are computed by evaluating the ensemble at 20 evenly spaced tree-count cutoffs and taking the 15th/85th percentiles of the resulting predictions — a lightweight alternative to quantile regression that gives the client a plausible range to show the operator.

4. API

| Endpoint | Purpose |
| --- | --- |
| `POST /price` | The main endpoint. Request: property + stay details. Response: recommended rate, CI, per-agent status, top feature contributions. |
| `GET /health` | Liveness + `model_loaded` flag. |
| `GET /model/info` | Training metrics and feature importances for the loaded artifact. |

Every response includes:

  • recommended_rate with a confidence_interval (the 15th–85th percentile band)
  • reasoning.market_context — what the competitor landscape looked like
  • reasoning.event_factors — which events drove a lift
  • reasoning.seasonality_factors — where in the yearly curve the stay falls
  • reasoning.top_feature_contributions — SHAP-style per-feature pull on this prediction
  • agents_consulted[].ok / latency_ms / error — full observability into the agent layer
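An illustrative response shape assembled from the fields above — every value here is made up for the example, and field nesting is an assumption:

```json
{
  "recommended_rate": 342.0,
  "confidence_interval": [318.0, 367.0],
  "reasoning": {
    "market_context": "...",
    "event_factors": "...",
    "seasonality_factors": "...",
    "top_feature_contributions": [{"feature": "comp_median", "contribution": 14.2}]
  },
  "agents_consulted": [
    {"agent": "market_rates", "ok": true, "latency_ms": 180, "error": null}
  ]
}
```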

Modeling the market

Because this ships with a synthetic provider by default, the generator is the interface the model is trained against — which means its realism matters. The goal isn't "numbers that look like prices" but a faithful reduction of the mechanics a revenue manager actually tracks. Concretely:

Competitor tiers. Each synthetic market is populated with a realistic mix of budget (~35%), midscale (~45%), and luxury (~20%) properties. Each tier has its own rate multiplier (0.62× / 1.00× / 1.85× of the market baseline), noise scale, event elasticity, promo frequency, and last-minute-booking sensitivity. A reviewer looking at a competitor snapshot sees a proper rate ladder — not 18 clones of the same hotel.

Market tightness. A latent per-(city, night) variable in (0, 1) drives the whole system. It's a logistic of seasonality + day-of-week + holiday + named-event signal. Tightness pushes baseline rates up, and simultaneously pushes within-tier noise down, so that high-demand nights show up as both higher rates and compressed p25–p75 bands.
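The tightness mechanism can be sketched in a few lines — the weights and coefficients below are illustrative, not the generator's actual values:

```python
import math


def tightness(seasonality, dow_signal, holiday, event_signal):
    """Latent per-(city, night) demand tightness in (0, 1): a logistic of the
    summed demand signals. Weights here are illustrative."""
    z = 1.5 * seasonality + 0.8 * dow_signal + 1.2 * holiday + 2.0 * event_signal
    return 1.0 / (1.0 + math.exp(-z))


def night_rate(baseline, t, noise_draw):
    """Tightness pushes the rate level up AND the within-tier noise down, so
    high-demand nights show higher rates with compressed p25-p75 bands."""
    rate = baseline * (1.0 + 0.6 * t)         # higher rates on tight nights
    noise_scale = 0.12 * (1.0 - 0.7 * t)      # compressed spread on tight nights
    return rate * (1.0 + noise_scale * noise_draw)


t_hi = tightness(1.0, 1.0, 1.0, 1.0)   # festival Saturday in peak season
t_lo = tightness(-1.0, -1.0, 0.0, 0.0) # off-season Tuesday
```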

Rate-ladder compression. On top of the tightness mechanism, tight markets pull the budget end of the ladder up disproportionately (_TIER_COMPRESSION): budget gets a +38% lift at tightness=1 while luxury is nearly unaffected. This is how real STAR-reported markets behave during compression — the $99 room disappears before the $500 suite does.

Lead-time curves. Applied at provider query time, not baked into history. The curve is flat beyond 30 days, mild inside 14, steep inside 7, and aggressive inside 48 hours. In soft markets it inverts into a modest walk-in discount (hotels rarely fire-sale — they protect ADR). Per-tier leadtime_sensitivity scales the deviation: budget barely moves, luxury moves most, matching how automated revenue management behaves at the top of the ladder.
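The shape of that curve, as a sketch — the breakpoints follow the text, but the deviation magnitudes and the soft-market discount factor are illustrative:

```python
def leadtime_multiplier(days_out: int, tight: bool, sensitivity: float = 1.0) -> float:
    """Lead-time rate adjustment applied at provider query time.

    Flat beyond 30 days, mild inside 14, steep inside 7, aggressive inside
    48 hours; in soft markets the premium inverts into a modest walk-in
    discount. `sensitivity` stands in for the per-tier leadtime_sensitivity.
    """
    if days_out > 30:
        dev = 0.0       # flat beyond 30 days
    elif days_out > 14:
        dev = 0.02      # barely moving yet
    elif days_out > 7:
        dev = 0.05      # mild inside 14
    elif days_out >= 2:
        dev = 0.12      # steep inside 7
    else:
        dev = 0.25      # aggressive inside 48 hours
    if not tight:
        dev = -0.4 * dev  # soft market: modest walk-in discount, no fire sale
    return 1.0 + sensitivity * dev
```

A low `sensitivity` (budget tier) keeps the multiplier near 1.0 at every lead time; a high one (luxury tier) swings hardest, matching the behavior described above.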

Convention-city DOW profile. Austin and Charleston carry is_convention=True on top of is_leisure=True, giving them a bimodal weekly pattern — Fri–Sat leisure peaks and Tue–Wed business peaks. Pure leisure (Aspen) gets a single Fri–Sat bump. Pure business cities would get the inverse.

Occupancy proxy — the signal, not the spread. The proxy is computed from the p25/median ratio, not the full p25–p75 band. Widening at the top of the ladder (luxury spiking during events) and widening at the bottom (a flash promo) look identical if you only look at dispersion — but they mean opposite things for supply. Compression is the canonical sold-out signature, so that's what we measure. Training and serving use the exact same formula (occupancy_proxy_from_market) so the feature distribution stays consistent.
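The compression measurement reduces to a ratio — a sketch of the idea behind occupancy_proxy_from_market; the repo's function is the single source of truth and its exact form may differ:

```python
def occupancy_proxy(p25: float, median: float) -> float:
    """How close the cheap end of the rate ladder sits to the middle.

    Approaches 1.0 as the bottom of the ladder compresses upward (the
    sold-out signature); stays low when the market is soft and the $99
    room is still on sale. Deliberately ignores p75, so a luxury spike
    or a flash promo at the top doesn't masquerade as compression.
    """
    return p25 / median


loose = occupancy_proxy(p25=120.0, median=240.0)  # soft market, wide bottom
tight = occupancy_proxy(p25=210.0, median=240.0)  # compressed, sold-out signature
```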

Location resolution. A single data/locations.py registry normalizes free-form inputs ("Austin, TX", "atx", "austin texas" → "austin") and enumerates known markets in the error payload when a caller sends something unresolvable. Every provider uses it, so swapping in a real data source inherits this layer for free.
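A sketch of how such a registry might work — the alias table, exception name, and normalization rules here are illustrative, not data/locations.py:

```python
import re

# Illustrative alias registry; the repo's data/locations.py is the real one.
_ALIASES = {
    "austin": {"austin", "atx", "austin tx", "austin texas"},
    "charleston": {"charleston", "charleston sc"},
}


class UnknownLocation(ValueError):
    """Carries the list of known markets so the API can enumerate them."""

    def __init__(self, raw, known):
        super().__init__(f"unknown location {raw!r}; known markets: {sorted(known)}")


def resolve(raw: str) -> str:
    """Normalize free-form location input to a canonical market key."""
    key = re.sub(r"[^a-z ]", "", raw.lower().replace(",", " "))
    key = " ".join(key.split())  # collapse repeated whitespace
    for canonical, aliases in _ALIASES.items():
        if key in aliases:
            return canonical
    raise UnknownLocation(raw, _ALIASES)
```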

Swapping in real data

The entire data surface is a three-method abstract class: data/providers/base.py. Any real source — SerpAPI's google_hotels engine, a Booking.com affiliate feed, Google Hotels, or your own warehouse — is a drop-in replacement for SyntheticProvider. See data/providers/serpapi.py for a stub showing the contract.

To go live with SerpAPI:

# api/main.py
from data.providers.serpapi import SerpAPIProvider
provider = SerpAPIProvider()  # instead of SyntheticProvider()

Nothing else in the repo needs to change.

Tests

pytest

Covered:

  • Synthetic generator produces realistic distributions (weekend > weekday in leisure cities).
  • Feature engineering is deterministic and preserves vector order.
  • Coordinator merges successful agent results, applies documented defaults on partial failure, enforces per-agent timeouts, and surfaces total market-data blackout.
  • API smoke test: /health comes up, /price returns a sensible rate with all three agents reporting, invalid date ranges get a 422, unknown locations get a 404.

Project structure

api/               FastAPI app, routes, pydantic schemas
agents/            PricingAgent protocol + coordinator + 3 concrete agents
data/              DataProvider ABC + synthetic provider + SerpAPI stub
features/          Feature-vector builder (pure function)
model/             Training entrypoint, inference wrapper, cached artifact
scripts/           Three-scenario demo harness
tests/             Smoke + unit tests

Docker

docker build -t pricing-engine .
docker run -p 8000:8000 pricing-engine

Roadmap

  • SHAP explanations returned verbatim in the response for full per-prediction attribution.
  • Contextual bandit on top of the regressor to A/B-test small rate perturbations and learn from realised bookings.
  • Rolling retrain — weekly cron that pulls the last 30 days of observed bookings and fine-tunes.
  • Multi-property pooling — share signal across properties in the same city to bootstrap new operators from day one.
  • Stayza integration — surface this engine as a premium feature in Stayza, the AI-native PMS I'm building for independent hospitality.

About

Built by Liam Davis, founder of Stayza — an AI-native property management platform for independent hotels and small-scale resorts.
