Portfolio demo, not a commercial product. Voyager is a technical demonstration of an agentic AI travel planning pattern, built as a portfolio piece to show engineering judgment, not to sell trips. No bookings happen. No payments flow. No user data is shared beyond the API calls required to fetch live search results. The business model, unit economics, and strategic weaknesses of the concept are analyzed honestly in the 2026-04-06 Criticism audit, which is published in this repo as a deliberate artifact, not hidden.
One-line stack: Express 5 + TypeScript; Next.js 15 + React 19; Postgres on Neon; Redis; Claude Sonnet 4 via Anthropic SDK; SerpApi (Google Flights + Google Hotels) and Google Places for live travel data.
I audited my own work on 2026-04-06 with a set of eight autonomous specialist reviewers (engineering, security, design, UX, marketing, financial, legal, criticism). The reports are in docs/audits/ and are worth reading if you are evaluating this as a demo.
- Engineering (CTO) -- architecture, test coverage, operational basics, bug-fix discipline retrospective
- Security (CISO) -- auth, LLM key handling, prompt injection surface, dependency CVEs
- Design (CDO) -- visual system, typography, components, motion, responsive
- UX (CXO) -- user story coverage table, destructive action guardrails, accessibility
- Marketing (CMO) -- positioning, competitive analysis, copy quality
- Financial (CFO) -- unit economics, spending caps, burn rate, blast radius
- Legal & Compliance -- missing documents checklist, DPA status, marketing claim substantiation
- Criticism (Devil's Advocate) -- the brutal truth about whether this thing should exist
Consolidated triage with severity and fix queue: docs/audits/2026-04-06-triage.md. Rolling log of deferred issues: ISSUES.md.
Quoting the Brutal Truth section: Voyager is a technically impressive demo of an agentic tool-use loop, wrapped around a product nobody needs and nobody will pay for as a standalone consumer service. The unit economics do not work without a booking path, the moat vs. ChatGPT-plus-browsing does not exist, and there are already a dozen well-funded agentic travel startups chasing the same space. This is not hidden; it is the honest assessment, published in the repo, because the point of the demo is to show strong engineering taste, not to pretend the business case is strong.
Voyager is a full-stack application that demonstrates agentic tool-use loops -- the capstone pattern in a portfolio of eight progressive AI applications. Users describe a trip in natural language (destination, dates, budget, preferences), and an AI agent autonomously searches live flight, hotel, and experience APIs, reasons about budget constraints between each step, and assembles a complete itinerary, all streamed to the frontend in real time.
Unlike simple chatbot wrappers, the agent makes 3 to 8 sequential tool calls per turn, examining results, adjusting strategy, and proactively suggesting alternatives when a selection exceeds the budget. This is the key differentiator: the AI is not just answering questions; it is acting on the user's behalf across multiple external systems.
The agent loop is the marquee feature, but it is not the part that took the most judgment. The decisions a less thoughtful engineer would skip:
Self-audit as a published artifact, not a private exercise. The 8 specialist audits in docs/audits/ were not commissioned by anyone. I ran them on myself and published the results, including the Criticism audit that argues the business case is weak. Saying so in the repo is the price of intellectual honesty; pretending otherwise would be the familiar portfolio pattern.
Test-first bug fixing, enforced by a commit hook. lefthook.yml blocks any fix:/bug:/hotfix: commit whose staged diff contains no test file. The 2026-04-06 retrospective measured a 68.6 percent violation rate across the first 51 fix commits even after the rule was codified in plain English; the only thing that changed behavior was promoting the warning to a blocking hook. Discipline lives in the toolchain, not in the wiki.
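The rule the hook enforces can be sketched as a pure function. This is a minimal illustration, not the script that lefthook.yml actually runs: the function name `commitAllowed` and both regular expressions are assumptions made for the example.

```typescript
// Hypothetical sketch of the test-first rule. The real hook script is not
// shown in this README, so the names and patterns below are assumptions.

// Matches "fix:", "bug:", "hotfix:", with an optional scope such as "fix(chat):".
const FIX_PREFIX = /^(fix|bug|hotfix)(\([^)]*\))?:/i;

// Matches files such as ChatBox.invariants.test.tsx or budget.spec.ts.
const TEST_FILE = /\.(test|spec)\.[cm]?[jt]sx?$/;

/** Returns true when the commit may proceed. */
function commitAllowed(message: string, stagedFiles: string[]): boolean {
  // Commits that are not bug fixes are outside the rule.
  if (!FIX_PREFIX.test(message.trim())) return true;
  // A fix commit must stage at least one test file.
  return stagedFiles.some((file) => TEST_FILE.test(file));
}
```

In a real hook, the staged file list would come from something like `git diff --cached --name-only`, and a `false` result would translate to a non-zero exit code.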
LLM output is treated as data, not as ground truth. calculate_remaining_budget ignores the agent-supplied costs when a trip context is available and recomputes from the DB (getActualCostsForTrip). The agent has been observed quoting cheaper numbers than the user actually selected; the DB query is the source of truth. The same posture drives the plan-card interest allowlist (SEC-03), the Zod-validated planConfirmation body with hard caps on payload size (SEC-04), and the tool-input allowlist that prevents prompt-coerced SerpApi scraping.
Data-model invariants over ad-hoc fixes. After a 9-commit fix storm in 85 minutes on the ChatBox surface, ChatBox.invariants.test.tsx was added as the only place to test the streaming-overlay merge rules. Every subsequent ChatBox fix extends that spec rather than spawning a new ad-hoc test, so symptom-level regressions are structurally impossible.
Cost containment in the toolchain. SerpApi has a 250-request monthly cap on the free tier; the quota counter (serpApiQuota.service.ts) blocks the loop at 200 to leave headroom, and every successful response is cached for 6 hours so the same flight search does not burn quota across reloads. The destinations JSON asset has a build-time accessSync smoke check because the production crash on 2026-04-04 was traced to tsc not transitively copying JSON, and that lesson was learned twice before the smoke check landed.
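A quota guard of this kind might look like the sketch below. It is an in-memory illustration only: the class name, the monthly reset logic, and the injectable clock are assumptions, and the real serpApiQuota.service.ts presumably persists its counter somewhere durable.

```typescript
// Hypothetical sketch of a quota guard. The 250 and 200 figures come from
// this README; everything else is an assumption for illustration.
const MONTHLY_CAP = 250;
const BLOCK_THRESHOLD = 200;

class QuotaCounter {
  private used = 0;
  private month: string;
  private readonly now: () => Date;

  // The clock is injectable so the monthly reset can be tested.
  constructor(now: () => Date = () => new Date()) {
    this.now = now;
    this.month = this.monthKey();
  }

  private monthKey(): string {
    const d = this.now();
    return `${d.getUTCFullYear()}-${String(d.getUTCMonth() + 1).padStart(2, "0")}`;
  }

  /** Records one request and returns true, or returns false once blocked. */
  tryConsume(): boolean {
    const current = this.monthKey();
    if (current !== this.month) {
      // A new calendar month starts a fresh quota window.
      this.month = current;
      this.used = 0;
    }
    if (this.used >= BLOCK_THRESHOLD) return false;
    this.used += 1;
    return true;
  }

  /** Requests deliberately left unused below the provider's hard cap. */
  get headroom(): number {
    return MONTHLY_CAP - BLOCK_THRESHOLD;
  }
}
```

Only cache misses would need to call `tryConsume()`, since a cached response costs no quota.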
The repo is meant to read like a system someone took seriously, not a portfolio piece tuned to look impressive on first scan.
Users interact through a chat interface. A message like "Plan a week in Barcelona for two, $4,000 budget, we love food and architecture" kicks off a multi-step agent loop that searches flights, hotels, and experiences without further prompting.
The heart of the application. When the user sends a message:
- The server builds a message array (system prompt + conversation history + current message) and sends it to Claude with five tool definitions.
- Claude responds with one or more `tool_use` blocks (e.g., `search_flights`).
- The server executes each tool, streams progress events to the frontend, and returns the results to Claude.
- Claude examines the results, reasons about what to do next, and either calls another tool or produces a final text response.
- The loop continues until Claude issues an `end_turn` stop reason or the 8-call safety limit is reached.
This produces a natural reasoning chain: search flights → calculate remaining budget → search hotels within budget → search experiences → present itinerary with cost breakdown.
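The control flow of the loop can be sketched without tying it to a particular SDK. In the sketch below the model call and the tool executor are injected functions, and every type name (`ModelTurn`, `ToolUse`, `CallModel`, and so on) is hypothetical. How Voyager words its response when the limit is hit is not documented here, so that message is also an assumption.

```typescript
// Hypothetical sketch of the agent loop. Types and names are assumptions;
// the real implementation talks to Claude through the Anthropic SDK.
type ToolUse = { id: string; name: string; input: Record<string, unknown> };
type ModelTurn = { stopReason: "tool_use" | "end_turn"; text: string; toolUses: ToolUse[] };
type Message = { role: "user" | "assistant" | "tool"; content: string };

type CallModel = (messages: Message[]) => Promise<ModelTurn>;
type ExecuteTool = (call: ToolUse) => Promise<unknown>;
type Emit = (event: "tool_start" | "tool_result" | "assistant", data: unknown) => void;

const MAX_TOOL_CALLS = 8;

async function runAgentLoop(
  messages: Message[],
  callModel: CallModel,
  executeTool: ExecuteTool,
  emit: Emit,
): Promise<{ text: string; toolCalls: number }> {
  let toolCalls = 0;
  while (true) {
    const turn = await callModel(messages);

    // The model is finished: forward its text and stop.
    if (turn.stopReason === "end_turn" || turn.toolUses.length === 0) {
      emit("assistant", { text: turn.text });
      return { text: turn.text, toolCalls };
    }

    messages.push({ role: "assistant", content: JSON.stringify(turn.toolUses) });

    for (const call of turn.toolUses) {
      // Safety limit: refuse to execute a ninth tool call.
      if (toolCalls >= MAX_TOOL_CALLS) {
        const text = "Tool-call limit reached; returning the plan so far.";
        emit("assistant", { text });
        return { text, toolCalls };
      }
      toolCalls += 1;
      emit("tool_start", { name: call.name, input: call.input });
      // The loop blocks here: the model needs the result before its next step.
      const result = await executeTool(call);
      emit("tool_result", { name: call.name });
      messages.push({ role: "tool", content: JSON.stringify({ id: call.id, result }) });
    }
  }
}
```

Injecting `callModel` and `executeTool` keeps the loop testable with stubs, which is one plausible way to cover the safety limit without spending API quota.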
The agent calls real APIs, not mocked data:
- SerpApi Google Flights -- origin/destination IATA codes, dates, cabin class. Returns best and alternative flight options with prices, airlines, times, and layover info.
- SerpApi Google Hotels -- city-based hotel search with check-in/out dates. Returns properties with star ratings, prices per night, and amenities.
- Google Places API (New) -- text search for experiences ("museums in Paris", "cooking classes in Barcelona"). Returns place names, ratings, price levels, and addresses.
Results are normalized to the top 5 options per category, sorted by price.
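A minimal sketch of that normalization step follows. The helper name `topByPrice` is hypothetical, and dropping options that have no usable price is an assumption; the README does not say how unpriced results are handled.

```typescript
// Hypothetical sketch: keep the cheapest `limit` options in a category.
// Dropping unpriced options is an assumption made for this example.
function topByPrice<T extends { price: number | null }>(options: T[], limit = 5): T[] {
  return options
    // filter() returns a new array, so the sort below does not mutate the input.
    .filter((o): o is T & { price: number } => typeof o.price === "number" && Number.isFinite(o.price))
    .sort((a, b) => a.price - b.price)
    .slice(0, limit);
}
```

For example, passing seven hotel results where one has `price: null` would return the five cheapest of the remaining six, in ascending price order.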
A dedicated calculate_remaining_budget tool performs server-side arithmetic (not LLM arithmetic, which is unreliable). After the agent selects flights, it calls this tool to see what remains for hotels and experiences. If a hotel selection would blow the budget, the agent proactively searches for cheaper alternatives or suggests trade-offs.
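The arithmetic itself is simple; the point is where it runs and whose numbers it trusts. The sketch below assumes hypothetical field names and takes the trusted costs as an optional argument, standing in for the values that the real tool recomputes from the database when a trip context is available.

```typescript
// Hypothetical sketch of the budget tool's arithmetic. Field names are
// assumptions, not Voyager's actual schema.
interface CategoryCosts {
  flights: number;
  hotels: number;
  experiences: number;
}

interface BudgetResult {
  totalBudget: number;
  spent: number;
  remaining: number;
  overBudget: boolean;
}

function calculateRemainingBudget(
  totalBudget: number,
  agentSuppliedCosts: CategoryCosts,
  trustedCosts?: CategoryCosts, // e.g. costs loaded from the database
): BudgetResult {
  // Prefer trusted costs: the model's numbers are treated as data, not truth.
  const costs = trustedCosts ?? agentSuppliedCosts;

  // Work in integer cents so repeated additions do not accumulate float drift.
  const toCents = (amount: number): number => Math.round(amount * 100);
  const spentCents = toCents(costs.flights) + toCents(costs.hotels) + toCents(costs.experiences);
  const remainingCents = toCents(totalBudget) - spentCents;

  return {
    totalBudget,
    spent: spentCents / 100,
    remaining: remainingCents / 100,
    overBudget: remainingCents < 0,
  };
}
```

With a $4,000 budget, agent-supplied flight costs of $900, and trusted costs of $1,200 for flights plus $1,400 for hotels, the result reports $2,600 spent and $1,400 remaining, because the trusted figures win.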
Each tool call emits Server-Sent Events to the frontend:
- `tool_start` -- tool name and input parameters (e.g., "Searching flights SFO → BCN...")
- `tool_result` -- execution result summary
- `assistant` -- final text response
The frontend renders animated progress indicators for each tool, giving users visibility into a process that typically takes 10–30 seconds.
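On the wire, each of these events is a small text frame in the Server-Sent Events format. The sketch below shows one way to serialize them; the `AgentEvent` payload fields are assumptions, while the `event:` / `data:` framing and the blank-line terminator follow the SSE format itself.

```typescript
// Hypothetical event shapes; only the three event names come from this README.
type AgentEvent =
  | { type: "tool_start"; tool: string; input: Record<string, unknown> }
  | { type: "tool_result"; tool: string; summary: string }
  | { type: "assistant"; text: string };

/** Serializes one event as an SSE frame: an event line, a data line, a blank line. */
function toSseFrame(event: AgentEvent): string {
  const { type, ...payload } = event;
  // JSON.stringify never emits raw newlines, so the payload fits on one data line.
  return `event: ${type}\ndata: ${JSON.stringify(payload)}\n\n`;
}
```

In an Express handler, each frame would typically be written with `res.write(...)` after setting the `Content-Type: text/event-stream` header, and the client would dispatch on the event name.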
Conversations persist in the database. Users can follow up with refinements:
- "Find cheaper flights"
- "Switch the hotel to something beachfront"
- "Add a cooking class"
- "What if we fly out a day later?"
The agent receives the full conversation history plus the current trip state (selected flights, hotels, experiences) injected into the system prompt, so it always knows what has been chosen and can modify the plan intelligently.
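Injecting the trip state amounts to rendering the current selections as text and appending them to the base prompt. The sketch below is an illustration under assumed shapes: `TripState`, `formatTripContext`, and `buildSystemPrompt` are hypothetical names, and the real formatter in server/src/prompts/ may lay the text out differently.

```typescript
// Hypothetical shapes and names; the real trip context formatter is not
// shown in this README.
interface SelectedItem {
  label: string;
  price: number;
}

interface TripState {
  destination: string;
  budget: number;
  flights: SelectedItem[];
  hotels: SelectedItem[];
  experiences: SelectedItem[];
}

function formatTripContext(trip: TripState): string {
  // Render one category, making an empty selection explicit for the model.
  const section = (title: string, items: SelectedItem[]): string =>
    items.length === 0
      ? `${title}: none selected`
      : `${title}:\n${items.map((i) => `  - ${i.label} ($${i.price})`).join("\n")}`;

  return [
    `Destination: ${trip.destination}`,
    `Total budget: $${trip.budget}`,
    section("Flights", trip.flights),
    section("Hotels", trip.hotels),
    section("Experiences", trip.experiences),
  ].join("\n");
}

function buildSystemPrompt(basePrompt: string, trip: TripState): string {
  return `${basePrompt}\n\n## Current trip state\n${formatTripContext(trip)}`;
}
```

Stating "none selected" explicitly, rather than omitting the category, is a design choice in this sketch: it tells the model that a category is still open instead of leaving it to infer that from silence.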
- Create trips with destination, dates, budget, traveler count, and preference sliders (luxury vs. budget, pace, interest categories).
- Save agent-selected flights, hotels, and experiences to the database.
- List all saved trips with status indicators (planning, saved, archived).
- Load a previous trip and continue the conversation where you left off.
- Delete trips you no longer need.
The frontend renders structured itinerary cards:
- FlightCard -- airline, departure/arrival times, flight number, price, cabin class
- HotelCard -- name, star rating, price per night, total price, check-in/out dates
- ExperienceCard -- name, category, rating, price level, estimated cost
- BudgetBreakdown -- visual breakdown of spending by category against total budget
- Session-based auth with HTTP-only cookies (not JWTs exposed to JavaScript)
- Passwords hashed with bcrypt (10 rounds)
- CSRF guard middleware
- Rate limiting: 100 req/15 min global, 10 req/15 min on auth endpoints
- Protected routes on both frontend (AuthGuard component) and backend (requireAuth middleware)
Every tool invocation is logged to a tool_call_log table:
- Tool name, input parameters, result, latency (ms), cache hit flag, error message
- Enables production-grade observability: "The agent called `search_flights` 3 times, 2 were cache hits, average latency 1.2s"
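A summary like the one quoted above can be derived from the logged rows with a small aggregation. The sketch below works on in-memory rows with assumed camelCase field names; in practice the same figures would more likely come from a SQL query against `tool_call_log`.

```typescript
// Hypothetical row shape; the column names of tool_call_log are assumptions.
interface ToolCallLogRow {
  toolName: string;
  latencyMs: number;
  cacheHit: boolean;
  errorMessage: string | null;
}

interface ToolSummary {
  calls: number;
  cacheHits: number;
  errors: number;
  avgLatencyMs: number;
}

function summarizeTool(rows: ToolCallLogRow[], toolName: string): ToolSummary {
  const matching = rows.filter((r) => r.toolName === toolName);
  const totalLatency = matching.reduce((sum, r) => sum + r.latencyMs, 0);
  return {
    calls: matching.length,
    cacheHits: matching.filter((r) => r.cacheHit).length,
    errors: matching.filter((r) => r.errorMessage !== null).length,
    // Guard against division by zero when the tool was never called.
    avgLatencyMs: matching.length === 0 ? 0 : totalLatency / matching.length,
  };
}
```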
voyager/
├── server/                     Express API (Railway)
│   ├── src/
│   │   ├── routes/             API route definitions
│   │   ├── handlers/           Request handlers
│   │   ├── services/           Agent loop, caching, SerpApi client
│   │   ├── tools/              5 tool definitions + executor
│   │   ├── prompts/            System prompt + trip context formatter
│   │   ├── repositories/       Database query layer
│   │   ├── middleware/         Auth, CSRF, rate limiting, logging
│   │   ├── schemas/            Zod validation schemas
│   │   └── config/             Environment, CORS
│   └── migrations/             10 node-pg-migrate files
│
└── web-client/                 Next.js frontend
    ├── src/
    │   ├── app/                App Router pages
    │   ├── components/         ChatBox, Itinerary cards, TripForm, etc.
    │   ├── context/            AuthContext
    │   ├── hooks/              useChat (SSE), useTripState
    │   ├── lib/                API fetch wrapper
    │   └── styles/             SCSS modules
    └── public/
| Layer | Technology | Deployment |
|---|---|---|
| Frontend | Next.js 15, React 19, TypeScript, SCSS, TanStack React Query v5 | -- |
| API Server | Express 5, TypeScript, Pino logging | Railway |
| Database | PostgreSQL (Neon), 10 tables, pgvector-ready | Neon |
| Cache | Redis (ioredis), 1-hour TTL | Railway |
| AI | Anthropic Claude API (claude-sonnet-4-20250514), tool use | Anthropic |
| Flight/Hotel Search | SerpApi (Google Flights + Google Hotels engines) | SerpApi |
| Experiences Search | Google Places API (New) -- Text Search | Google Cloud |
| Auth | Custom session-based (bcrypt + HTTP-only cookies) | -- |
| Testing | Vitest (unit/integration), Playwright (e2e) | -- |
| Dev Tools | ESLint, Prettier, Lefthook (git hooks), pnpm workspaces | -- |
- `users` -- email, password hash
- `sessions` -- token hash, expiry, linked to user
- `trips` -- destination, origin, dates, budget, travelers, preferences (JSONB), status enum
- `trip_flights` -- flight details, price, selected flag, raw data (JSONB)
- `trip_hotels` -- hotel details, star rating, pricing, selected flag
- `trip_experiences` -- place details, rating, category, estimated cost, selected flag
- `conversations` -- one per trip (1:1 relationship)
- `messages` -- role (user/assistant/tool), content, tool calls (JSONB), token count
- `api_cache` -- provider-scoped cache with TTL, request hash for deduplication
- `tool_call_log` -- observability: tool name, input/output, latency, cache hit, errors
- `user_preferences` -- key-value user settings
Two-tier caching maximizes API quota efficiency (SerpApi free tier: 250 searches/month):
- Hot cache (Redis) -- 1-hour TTL, normalized cache keys (sorted params, lowercase)
- Cold cache (PostgreSQL `api_cache`) -- persistent, provider-scoped, with expiry timestamps
Cache key normalization ensures that {origin: "SFO", destination: "BCN"} and {destination: "BCN", origin: "SFO"} produce the same key.
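The sketch below illustrates both ideas: a key function that is insensitive to parameter order and case, and a two-tier lookup that checks the hot tier before the cold one. The two `Map` instances stand in for Redis and the `api_cache` table, the class and function names are hypothetical, and promoting a cold hit back into the hot tier is an assumption rather than documented behavior.

```typescript
// Hypothetical sketch. Maps stand in for Redis (hot) and PostgreSQL (cold).
type Params = Record<string, string | number | boolean>;

function cacheKey(provider: string, params: Params): string {
  const pairs = Object.entries(params)
    .map(([k, v]): [string, string] => [k.toLowerCase(), String(v).trim().toLowerCase()])
    // Sorting by key makes the result independent of insertion order.
    .sort((a, b) => (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0));
  return `${provider.toLowerCase()}:${pairs.map(([k, v]) => `${k}=${v}`).join("&")}`;
}

interface CacheEntry {
  value: unknown;
  expiresAt: number; // epoch milliseconds
}

class TwoTierCache {
  private readonly hot = new Map<string, CacheEntry>();
  private readonly cold = new Map<string, CacheEntry>();
  private readonly hotTtlMs: number;
  private readonly coldTtlMs: number;

  constructor(hotTtlMs: number, coldTtlMs: number) {
    this.hotTtlMs = hotTtlMs;
    this.coldTtlMs = coldTtlMs;
  }

  set(key: string, value: unknown, now: number): void {
    this.hot.set(key, { value, expiresAt: now + this.hotTtlMs });
    this.cold.set(key, { value, expiresAt: now + this.coldTtlMs });
  }

  get(key: string, now: number): unknown | undefined {
    const hotEntry = this.hot.get(key);
    if (hotEntry && hotEntry.expiresAt > now) return hotEntry.value;

    const coldEntry = this.cold.get(key);
    if (coldEntry && coldEntry.expiresAt > now) {
      // Assumed behavior: promote the cold hit, never outliving the cold entry.
      const expiresAt = Math.min(coldEntry.expiresAt, now + this.hotTtlMs);
      this.hot.set(key, { value: coldEntry.value, expiresAt });
      return coldEntry.value;
    }
    return undefined; // miss in both tiers: the caller spends API quota
  }
}
```

Passing `now` explicitly keeps expiry logic deterministic under test; a production version would read the clock itself and use asynchronous Redis and database calls.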
This application was built following a three-phase approach:
1. Get the core agentic loop working end-to-end, deployed. A user sends a trip request, Claude searches flights, calculates budget, searches hotels, and returns an itinerary. SSE progress events stream to the frontend.
2. Add all five tools. Implement the 8-call safety limit. Full trip persistence (create, save, list, load). Conversation history with tool call logging. Itinerary display components. Multi-turn iteration.
3. Complete the frontend (forms, chat, cards, budget breakdown). Real-time progress indicators. Trip creation with preference sliders. Saved trips management. Integration tests with mocked APIs. Polish and ship.
- Synchronous tool execution -- The agent needs immediate results to reason about. BullMQ async processing (used in earlier apps) is wrong here; the loop must block on each tool call.
- Server-side budget calculation -- LLMs are unreliable at arithmetic. A dedicated tool ensures correct budget tracking.
- SerpApi over direct Amadeus/Google -- Simpler integration, normalized responses, adequate for a portfolio project.
- 8-call safety limit -- Prevents runaway agent loops from burning API quota or hanging.
- Full conversation memory -- Not summarized. The agent sees every prior message and tool result for maximum context.
- SSE over WebSockets -- Simpler, unidirectional (server to client), sufficient for progress streaming.
- PDF Itinerary Export -- Generate downloadable PDF itineraries with all flight, hotel, and experience details formatted for printing or sharing.
- Email Itinerary Sharing -- Send formatted itinerary summaries via Resend transactional email.
- BullMQ Cache Warming Worker -- Background worker that pre-warms cache for popular routes and destinations during off-peak hours.
- Streaming Text Responses -- Stream Claude's final text response token-by-token (currently sent as a complete block).
- Selection Persistence -- Let users click to select/deselect individual flights, hotels, and experiences from search results, persisting selections to the database.
- Trip Sharing & Collaboration -- Share trip links with travel companions. Multiple users can view and comment on the same itinerary.
- Price Alerts -- Monitor saved trips for price changes on flights and hotels. Notify users when prices drop.
- Calendar Integration -- Export itineraries to Google Calendar or Apple Calendar with proper event times and locations.
- Map Visualization -- Display hotels and experiences on an interactive map (Mapbox or Google Maps embed) for spatial planning.
- Preference Learning -- Track which suggestions users accept vs. reject. Use patterns to improve future recommendations.
- Multi-City Trip Planning -- Support complex itineraries with multiple stops (Paris → Barcelona → Rome).
- Group Trip Budget Splitting -- Split costs across travelers with per-person budget tracking.
- Real-Time Booking Integration -- Move from search-only to actual booking via affiliate links or direct API booking.
- RAG-Enhanced Destination Knowledge -- Ingest travel guides and reviews into a vector database (reusing patterns from App 4) for richer destination context.
- Mobile App -- React Native frontend sharing the same API backend.