Commit 78e2efb

docs: add setup and API guides and align scraper run actions
1 parent d48e708 commit 78e2efb

8 files changed, +505 -6 lines changed

README.md

Lines changed: 3 additions & 1 deletion
@@ -290,7 +290,7 @@ All three commands now resolve to the same root dev target.
 
 ### 🐳 Docker Deployment
 
-HeadlessX can be easily deployed using Docker Compose. See the [Docker Setup Guide](docs/docker_setup.md) for detailed instructions.
+HeadlessX can be easily deployed using Docker Compose. See the [Setup Guide](docs/setup-guide.md) for detailed instructions.
 
 ```bash
 # Start the application in detached mode
@@ -361,6 +361,8 @@ WEB_PORT=3000 pnpm --filter headlessx-web dev
 
 ## 🌐 API Endpoints
 
+For the full backend route inventory, see [docs/api-endpoints.md](docs/api-endpoints.md).
+
 ### Website Scraping APIs
 
 | Endpoint | Method | Description |

apps/web/src/components/playground/exa/config/ActionButtons.tsx

Lines changed: 1 addition & 1 deletion
@@ -14,7 +14,7 @@ export function ActionButtons({ disabled = false, isPending, onRun, onStop }: Ac
 type="button"
 onClick={onRun}
 disabled={disabled || isPending}
-className="inline-flex h-12 items-center justify-center rounded-2xl bg-slate-900 px-5 text-sm font-semibold text-white transition-colors hover:bg-slate-800 disabled:cursor-not-allowed disabled:bg-slate-200 disabled:text-slate-500"
+className="inline-flex h-12 items-center justify-center rounded-2xl bg-primary px-5 text-sm font-semibold text-primary-foreground transition-colors hover:bg-primary/90 disabled:cursor-not-allowed disabled:bg-slate-200 disabled:text-slate-500"
 >
 Run Search
 </button>

apps/web/src/components/playground/google-serp/ConfigurationPanel.tsx

Lines changed: 1 addition & 1 deletion
@@ -118,7 +118,7 @@ export function ConfigurationPanel({
 <button
 type="submit"
 disabled={!query.trim() || isLoading}
-className="inline-flex w-full items-center justify-center gap-2 rounded-2xl bg-slate-900 px-4 py-4 text-sm font-semibold text-white transition-colors hover:bg-slate-800 disabled:cursor-not-allowed disabled:opacity-60"
+className="inline-flex w-full items-center justify-center gap-2 rounded-2xl bg-primary px-4 py-4 text-sm font-semibold text-primary-foreground transition-colors hover:bg-primary/90 disabled:cursor-not-allowed disabled:opacity-60"
 >
 {isLoading ? (
 <>

apps/web/src/components/playground/tavily/config/ActionButtons.tsx

Lines changed: 1 addition & 1 deletion
@@ -28,7 +28,7 @@ export function ActionButtons({
 type="button"
 onClick={onRun}
 disabled={isPending || !hasQuery || !hasApiKey}
-className="inline-flex items-center justify-center gap-2 rounded-2xl bg-slate-900 px-4 py-4 text-sm font-semibold text-white transition-colors hover:bg-slate-800 disabled:cursor-not-allowed disabled:opacity-60"
+className="inline-flex items-center justify-center gap-2 rounded-2xl bg-primary px-4 py-4 text-sm font-semibold text-primary-foreground transition-colors hover:bg-primary/90 disabled:cursor-not-allowed disabled:opacity-60"
 >
 <HugeiconsIcon icon={ButtonIcon} className="h-4 w-4" />
 {buttonLabel}

apps/web/src/components/playground/website/config/ActionButtons.tsx

Lines changed: 1 addition & 1 deletion
@@ -29,7 +29,7 @@ export function ActionButtons({
 type="button"
 onClick={onRun}
 disabled={isPending || !hasUrl}
-className="inline-flex items-center justify-center gap-2 rounded-2xl bg-slate-900 px-4 py-4 text-sm font-semibold text-white transition-colors hover:bg-slate-800 disabled:cursor-not-allowed disabled:opacity-60"
+className="inline-flex items-center justify-center gap-2 rounded-2xl bg-primary px-4 py-4 text-sm font-semibold text-primary-foreground transition-colors hover:bg-primary/90 disabled:cursor-not-allowed disabled:opacity-60"
 >
 <HugeiconsIcon icon={tool === 'map' ? LinkSquare01Icon : SparklesIcon} className="h-4 w-4" />
 {buttonLabel}

apps/web/src/components/playground/youtube/config/ActionButtons.tsx

Lines changed: 1 addition & 1 deletion
@@ -14,7 +14,7 @@ export function ActionButtons({ disabled = false, isPending, onRun, onStop }: Ac
 type="button"
 onClick={onRun}
 disabled={disabled || isPending}
-className="inline-flex h-12 items-center justify-center rounded-2xl bg-slate-900 px-5 text-sm font-semibold text-white transition-colors hover:bg-slate-800 disabled:cursor-not-allowed disabled:bg-slate-200 disabled:text-slate-500"
+className="inline-flex h-12 items-center justify-center rounded-2xl bg-primary px-5 text-sm font-semibold text-primary-foreground transition-colors hover:bg-primary/90 disabled:cursor-not-allowed disabled:bg-slate-200 disabled:text-slate-500"
 >
 Extract
 </button>

docs/api-endpoints.md

Lines changed: 176 additions & 0 deletions
@@ -0,0 +1,176 @@
+# API Endpoints
+
+This document describes the backend HTTP surface for `apps/api` in HeadlessX.
+
+It is based on the current route tree mounted in `apps/api/src/app.ts`.
+
+## Backend System Summary
+
+- Runtime: Express 5 API with TypeScript
+- Persistence: PostgreSQL via Prisma
+- Auth: `x-api-key` guard on all non-health routes
+- Async jobs: BullMQ with Redis and a separate worker process
+- Browser scraping: Camoufox and Playwright services
+- External integrations: Tavily, Exa, yt-engine, HTML-to-Markdown service
+
+## Auth And Transport
+
+- Public route: `GET /api/health`
+- Protected routes: every other `/api/*` endpoint requires `x-api-key`
+- Internal dashboard traffic can use `DASHBOARD_INTERNAL_API_KEY`
+- SSE endpoints use `text/event-stream`
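
Seen from a client, the auth rules in this list could look roughly like the sketch below. It is illustrative only and not part of the commit: `buildRequest`, `isPublicRoute`, and `RequestPlan` are hypothetical names, and the base URL is whatever host the API happens to run on. Only the `x-api-key` header, the public `GET /api/health` route, and the `text/event-stream` media type come from the document.

```typescript
// Hypothetical description of an outgoing request to the HeadlessX API.
interface RequestPlan {
  url: string;
  method: string;
  headers: Record<string, string>;
}

// GET /api/health is the only route the document lists as public.
function isPublicRoute(method: string, path: string): boolean {
  return method.toUpperCase() === "GET" && path === "/api/health";
}

// Builds the URL and headers; attaches x-api-key on every protected route.
function buildRequest(
  baseUrl: string,
  method: string,
  path: string,
  apiKey?: string,
  wantsSse = false,
): RequestPlan {
  const headers: Record<string, string> = {};
  if (!isPublicRoute(method, path)) {
    if (!apiKey) {
      throw new Error(`x-api-key is required for ${method} ${path}`);
    }
    headers["x-api-key"] = apiKey;
  }
  if (wantsSse) {
    // SSE endpoints respond with text/event-stream.
    headers["Accept"] = "text/event-stream";
  }
  return {
    url: baseUrl.replace(/\/+$/, "") + path,
    method: method.toUpperCase(),
    headers,
  };
}
```

Failing early on a missing key keeps the error on the client side instead of relying on whatever status code the server's guard returns, which the document does not specify.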
+
+Common SSE event names in this backend:
+
+- `start`
+- `progress`
+- `result`
+- `error`
+- `done`
+
+Google SERP currently ends its stream with `end` instead of `done`.
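
Because the terminal event differs between streams, a client probably wants to treat both `done` and `end` as end-of-stream. The sketch below is a hedged illustration, not code from the repository: `parseSseEvents` and `isTerminalEvent` are hypothetical helpers, the parser assumes it receives complete event blocks (it does not buffer partial chunks), and the payload shape of each event is not documented here, so `data` is kept as a raw string.

```typescript
interface SseEvent {
  event: string; // event name, "message" when the block has no event field
  data: string;  // data lines joined with "\n", left unparsed
}

// Parses a buffer of complete SSE blocks (blocks are separated by a blank line).
function parseSseEvents(raw: string): SseEvent[] {
  const events: SseEvent[] = [];
  for (const block of raw.split(/\r?\n\r?\n/)) {
    let event = "message";
    const dataLines: string[] = [];
    for (const line of block.split(/\r?\n/)) {
      if (line === "" || line.startsWith(":")) continue; // blank line or comment
      const colon = line.indexOf(":");
      const field = colon === -1 ? line : line.slice(0, colon);
      let value = colon === -1 ? "" : line.slice(colon + 1);
      if (value.startsWith(" ")) value = value.slice(1); // strip one leading space
      if (field === "event") event = value;
      else if (field === "data") dataLines.push(value);
    }
    // As in the SSE format, a block without any data field is not dispatched.
    if (dataLines.length > 0) {
      events.push({ event, data: dataLines.join("\n") });
    }
  }
  return events;
}

// Most streams finish with `done`; Google SERP currently finishes with `end`.
function isTerminalEvent(name: string): boolean {
  return name === "done" || name === "end";
}
```

Whether `error` also closes the stream is not stated in the document, so the sketch leaves that decision to the caller.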
+
+## Dependency Notes
+
+| Area | Requirement |
+| --- | --- |
+| `/api/jobs/*` | Redis plus the queue worker |
+| `/api/website/crawl` | Redis plus the queue worker |
+| `/api/website/content` | Uses `HTML_TO_MARKDOWN_SERVICE_URL` when available, then falls back locally |
+| `/api/tavily/*` | `TAVILY_API_KEY` |
+| `/api/exa/*` | `EXA_API_KEY` |
+| `/api/youtube/*` | `YT_ENGINE_URL` |
+| most protected routes | PostgreSQL for API keys, logs, settings, proxies, and persisted data |
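
The rows that name a concrete environment variable lend themselves to a small preflight check. The following is a sketch under stated assumptions rather than anything the commit ships: `requiredEnvVar` and `missingEnvVar` are hypothetical helpers, and only the three variables named in the table are covered. The Redis and PostgreSQL rows are left out because the table does not name their variables.

```typescript
// Route prefix -> environment variable, copied from the dependency table.
const ENV_REQUIREMENTS: ReadonlyArray<[string, string]> = [
  ["/api/tavily", "TAVILY_API_KEY"],
  ["/api/exa", "EXA_API_KEY"],
  ["/api/youtube", "YT_ENGINE_URL"],
];

// Returns the variable a route depends on, or undefined if the table names none.
function requiredEnvVar(path: string): string | undefined {
  for (const [prefix, envVar] of ENV_REQUIREMENTS) {
    if (path === prefix || path.startsWith(prefix + "/")) {
      return envVar;
    }
  }
  return undefined;
}

// Returns the name of the variable that is required but unset or empty.
function missingEnvVar(
  path: string,
  env: Record<string, string | undefined>,
): string | undefined {
  const name = requiredEnvVar(path);
  return name !== undefined && !env[name] ? name : undefined;
}
```

Passing the environment as a plain record (for example `process.env` in Node) keeps the check easy to exercise without touching real configuration.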
+
+## Core Endpoints
+
+| Method | Path | Purpose |
+| --- | --- | --- |
+| `GET` | `/api/health` | Public health check and route summary |
+| `GET` | `/api/config` | Read current system settings |
+| `PATCH` | `/api/config` | Update system settings and restart browser runtime |
+| `GET` | `/api/dashboard/stats` | Read dashboard summary metrics |
+| `GET` | `/api/logs` | List paginated request logs |
+| `GET` | `/api/logs/stats` | Read aggregated request log stats |
+| `GET` | `/api/keys` | List API keys |
+| `POST` | `/api/keys` | Create API key |
+| `PATCH` | `/api/keys/:id/revoke` | Revoke API key |
+| `DELETE` | `/api/keys/:id` | Delete API key |
+
+## Proxy Endpoints
+
+| Method | Path | Purpose |
+| --- | --- | --- |
+| `GET` | `/api/proxies` | List all proxies |
+| `GET` | `/api/proxies/active` | List active proxies only |
+| `GET` | `/api/proxies/:id` | Read one proxy |
+| `POST` | `/api/proxies` | Create proxy |
+| `PATCH` | `/api/proxies/:id` | Update proxy |
+| `DELETE` | `/api/proxies/:id` | Delete proxy |
+| `POST` | `/api/proxies/:id/toggle` | Toggle active state |
+| `POST` | `/api/proxies/:id/test` | Test proxy connectivity |
+
+## Website Scraper Endpoints
+
+| Method | Path | Purpose | Notes |
+| --- | --- | --- | --- |
+| `POST` | `/api/website/scrape` | SSE website scrape | Primary streaming scrape route |
+| `POST` | `/api/website/stream` | SSE website scrape | Legacy alias of `/scrape` |
+| `POST` | `/api/website/map` | Discover links quickly | Non-streaming |
+| `POST` | `/api/website/map/stream` | Stream site discovery progress | SSE |
+| `POST` | `/api/website/crawl` | Queue-backed crawl job | Requires Redis and worker |
+| `POST` | `/api/website/html` | Fast HTML scrape | No JS rendering |
+| `POST` | `/api/website/html-js` | JS-rendered HTML scrape | Browser-rendered |
+| `POST` | `/api/website/content` | Markdown content extraction | Uses markdown service when configured |
+| `POST` | `/api/website/screenshot` | Full-page screenshot | Binary image result |
+
+## Google SERP Endpoints
+
+| Method | Path | Purpose | Notes |
+| --- | --- | --- | --- |
+| `POST` | `/api/google-serp/search` | Standard Google result scrape | JSON response |
+| `GET` | `/api/google-serp/stream` | Stream Google search progress | SSE, expects query params like `query` and optional `timeout` |
+| `GET` | `/api/google-serp/status` | Service status | Lightweight availability check |
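
Since the SERP stream is a `GET` route driven by query parameters, building its URL might look like the sketch below. `buildSerpStreamUrl` and the base URL are hypothetical; the `query` and `timeout` parameter names come from the table, while the unit of `timeout` is not documented and is passed through unchanged. The empty-query check mirrors the `!query.trim()` condition on the submit button in `ConfigurationPanel.tsx`.

```typescript
// Builds the URL for GET /api/google-serp/stream with its query parameters.
function buildSerpStreamUrl(
  baseUrl: string,
  query: string,
  timeout?: number,
): string {
  const trimmed = query.trim();
  if (trimmed === "") {
    throw new Error("query must not be empty");
  }
  let url =
    baseUrl.replace(/\/+$/, "") +
    "/api/google-serp/stream?query=" +
    encodeURIComponent(trimmed);
  if (timeout !== undefined) {
    // Unit is not stated in the document; the value is forwarded as-is.
    url += "&timeout=" + encodeURIComponent(String(timeout));
  }
  return url;
}
```

A browser `EventSource` cannot attach custom request headers, so a client that has to send `x-api-key` would likely read this stream with `fetch` and a streamed response body instead.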
+
+## Tavily Endpoints
+
+| Method | Path | Purpose |
+| --- | --- | --- |
+| `POST` | `/api/tavily/search` | Tavily search |
+| `POST` | `/api/tavily/research` | Start Tavily research workflow |
+| `GET` | `/api/tavily/research/:requestId` | Poll Tavily research result |
+| `GET` | `/api/tavily/status` | Tavily configuration and status |
+
+## Exa Endpoints
+
+| Method | Path | Purpose | Notes |
+| --- | --- | --- | --- |
+| `POST` | `/api/exa/search` | Standard Exa search | JSON response |
+| `POST` | `/api/exa/search/stream` | Stream Exa search progress | SSE |
+| `GET` | `/api/exa/status` | Exa configuration and status | Lightweight availability check |
+
+## YouTube Endpoints
+
+| Method | Path | Purpose | Notes |
+| --- | --- | --- | --- |
+| `POST` | `/api/youtube/info/stream` | Stream YouTube extract flow | SSE |
+| `POST` | `/api/youtube/info` | Extract YouTube metadata | JSON response |
+| `POST` | `/api/youtube/formats` | Extract available format inventory | JSON response |
+| `POST` | `/api/youtube/subtitles` | Extract subtitles and captions | JSON response |
+| `POST` | `/api/youtube/save/stream` | Stream temporary download packaging | SSE |
+| `POST` | `/api/youtube/save` | Create temporary downloadable archive | JSON response |
+| `GET` | `/api/youtube/download/:jobId` | Download generated zip | Proxies yt-engine artifact |
+| `DELETE` | `/api/youtube/download/:jobId` | Delete temporary saved artifact | Cleanup endpoint |
+| `GET` | `/api/youtube/status` | yt-engine status | Fails if `YT_ENGINE_URL` is missing or unavailable |
+
+## Queue Job Endpoints
+
+| Method | Path | Purpose | Notes |
+| --- | --- | --- | --- |
+| `GET` | `/api/jobs` | List queue jobs | Filtered via query params |
+| `GET` | `/api/jobs/metrics` | Read queue metrics | BullMQ-backed |
+| `POST` | `/api/jobs` | Create generic queue job | Supports multiple job types |
+| `POST` | `/api/jobs/scrape` | Enqueue scrape job | Async |
+| `POST` | `/api/jobs/crawl` | Enqueue crawl job | Async |
+| `POST` | `/api/jobs/extract` | Enqueue extract job | Async |
+| `POST` | `/api/jobs/index` | Enqueue index job | Async |
+| `GET` | `/api/jobs/active` | Read currently active job | Checks stream jobs first, then queue |
+| `GET` | `/api/jobs/:id` | Read job status/result | Works for stream and queue jobs |
+| `GET` | `/api/jobs/:id/stream` | Reconnect to job progress stream | SSE |
+| `POST` | `/api/jobs/:id/cancel` | Cancel running or queued job | Uses active job manager / queue cancellation |
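
The job routes in this table follow a regular shape, which a client could capture in a couple of path helpers. This is a sketch with hypothetical names (`JobType`, `enqueuePath`, `jobPaths`); request and response bodies are not documented here, so only paths are derived. One detail worth noting from the table itself: `GET /api/jobs/metrics` and `GET /api/jobs/active` have the same shape as `GET /api/jobs/:id`, so the helper refuses those two values as job ids rather than guessing how the server resolves them.

```typescript
// The four typed enqueue routes listed in the table.
type JobType = "scrape" | "crawl" | "extract" | "index";

function enqueuePath(type: JobType): string {
  return `/api/jobs/${type}`;
}

// Static GET segments that would collide with GET /api/jobs/:id.
const RESERVED_JOB_SEGMENTS = ["metrics", "active"];

interface JobPaths {
  status: string; // GET: read job status/result
  stream: string; // GET: reconnect to the SSE progress stream
  cancel: string; // POST: cancel a running or queued job
}

function jobPaths(id: string): JobPaths {
  if (id === "" || RESERVED_JOB_SEGMENTS.indexOf(id) !== -1) {
    throw new Error(`"${id}" cannot be used as a job id`);
  }
  const base = `/api/jobs/${encodeURIComponent(id)}`;
  return {
    status: base,
    stream: `${base}/stream`,
    cancel: `${base}/cancel`,
  };
}
```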
+
+## Legacy Compatibility Routes
+
+These routes are still mounted for backward compatibility.
+
+### `/api/v1`
+
+| Method | Path | Purpose |
+| --- | --- | --- |
+| `POST` | `/api/v1/html` | Legacy HTML scrape |
+| `POST` | `/api/v1/html-js` | Legacy JS HTML scrape |
+| `POST` | `/api/v1/content` | Legacy content extraction |
+| `POST` | `/api/v1/screenshot` | Legacy screenshot |
+| `GET` | `/api/v1/config` | Legacy config read |
+| `PATCH` | `/api/v1/config` | Legacy config update |
+| `GET` | `/api/v1/logs` | Legacy request logs |
+| `GET` | `/api/v1/api-keys` | Legacy API key list |
+| `POST` | `/api/v1/api-keys` | Legacy API key create |
+
+### `/api/v2`
+
+| Method | Path | Purpose |
+| --- | --- | --- |
+| `POST` | `/api/v2/html` | V2 HTML scrape |
+| `POST` | `/api/v2/html-js` | V2 JS HTML scrape |
+| `POST` | `/api/v2/content` | V2 content extraction |
+| `POST` | `/api/v2/screenshot` | V2 screenshot |
+| `GET` | `/api/v2/config` | V2 config read |
+| `PATCH` | `/api/v2/config` | V2 config update |
+
+## Operational Notes
+
+- The API and worker are separate processes. Queue-backed endpoints may return `503` when Redis is unavailable.
+- Configuration changes invalidate cached settings and restart the browser service.
+- Website Crawl is not an inline scrape. It is a queued workflow.
+- The web dashboard talks to this API using the internal dashboard key on server-side requests.
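
Given that queue-backed endpoints may answer `503` while Redis is down, a caller might wrap them in a bounded retry. The sketch below is one possible client-side policy, not something the commit prescribes: `shouldRetry`, `backoffMs`, `sendWithRetry`, and the delay values (500 ms base, 8 s cap, 4 attempts) are all assumptions. The request and the sleep function are injected so the helper stays independent of any HTTP library.

```typescript
// Only the status code matters for the retry decision.
interface MinimalResponse {
  status: number;
}

// 503 is the only status the document ties to a transient queue outage.
function shouldRetry(status: number): boolean {
  return status === 503;
}

// Exponential backoff with a cap; the numbers are illustrative defaults.
function backoffMs(attempt: number, baseMs = 500, capMs = 8000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Sends once, then retries while the response is 503, up to maxAttempts sends.
async function sendWithRetry<T extends MinimalResponse>(
  send: () => Promise<T>,
  sleep: (ms: number) => Promise<void>,
  maxAttempts = 4,
): Promise<T> {
  let response = await send();
  for (
    let attempt = 0;
    attempt < maxAttempts - 1 && shouldRetry(response.status);
    attempt++
  ) {
    await sleep(backoffMs(attempt));
    response = await send();
  }
  return response;
}
```

A typical `sleep` argument would be `(ms) => new Promise((resolve) => setTimeout(resolve, ms))`. Retrying a `POST` that enqueues a job is only safe if a `503` means the job was not accepted, which the document does not state, so that is worth confirming before relying on this for enqueue routes.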
