Self-host the open-source Firecrawl web-scraping API on Render in one click — API, Playwright worker, Postgres queue, RabbitMQ, and Redis, wired together over the private network.
Firecrawl (github.com/firecrawl/firecrawl) is the scrape/crawl/search API that powers Mendable's hosted product. This template deploys the AGPL-3.0 self-host edition on Render with the full backend stack pre-wired: NUQ Postgres queue (pg_cron-backed), RabbitMQ for pg_notify fan-out, Render Key Value for rate limiting, and a separate Chromium-backed Playwright worker. You get the same API surface as api.firecrawl.dev, running on infrastructure you own.
- Why deploy Firecrawl on Render
- Use cases
- What gets deployed
- Quickstart
- Configuration
- Cost breakdown
- Customization
- Operations
- Upgrading
- Troubleshooting
- FAQ
- Security
- Caveats and limitations
- Credits and license
- Whole stack wired for you: API, Playwright worker, Postgres, RabbitMQ, and Redis are connected over the private network with no copy-pasted connection strings.
- Secrets generated on first deploy: `BULL_AUTH_KEY` and `POSTGRES_PASSWORD` are `generateValue: true`, so the admin UI and the queue are never left with the upstream `CHANGEME` default.
- Stateful pieces live on disks, not ephemeral filesystems: Postgres data and RabbitMQ data survive redeploys and instance restarts.
- Tracks upstream images: services reference `ghcr.io/firecrawl/*:latest`, so each redeploy picks up the newest upstream digest. Pin to an explicit version when you want predictability (see below).
What you can build or run with this template:
- Self-hosted scrape API for compliance-heavy products: keep target URLs and scraped content inside your own VPC-equivalent.
- Backend for an LLM RAG pipeline: crawl a docs site once, then expose the markdown to your agents via Firecrawl's `/v2/crawl` and `/v2/scrape`.
- Drop-in replacement for `api.firecrawl.dev` during development: point `FIRECRAWL_API_URL` at your `*.onrender.com` host so SDK code paths exercise your fork.
- Behind-the-firewall scraping: combine with an outbound proxy via `PROXY_SERVER` to give the agent crawler a stable egress IP.
- MCP/agent tool host: pair with `firecrawl-mcp` for Claude, Cursor, or any MCP client.
```mermaid
flowchart LR
    user["End user / SDK / MCP"] -->|HTTPS| api["firecrawl-api<br/>(web, image)"]
    api -->|amqp| mq["firecrawl-rabbitmq<br/>(pserv, image)"]
    api -->|postgres| db[("firecrawl-postgres<br/>(pserv, nuq image)")]
    api -->|redis| kv[("firecrawl-redis<br/>(Key Value)")]
    api -->|http| pw["firecrawl-playwright<br/>(pserv, image)"]
    pw --> targets["Target sites"]
    db -.disk.-> pgdisk[/"postgres-data<br/>10 GB"/]
```
| Resource | Type | Plan | Purpose |
|---|---|---|---|
| `firecrawl-api` | Web service (Docker) | `pro` (4 GB) | API + in-process queue/scrape/extract/nuq workers. Thin Dockerfile wraps `ghcr.io/firecrawl/firecrawl:latest` so `entrypoint.sh` can build `NUQ_RABBITMQ_URL` and `PLAYWRIGHT_MICROSERVICE_URL` from `fromService.hostport` values. Public on `*.onrender.com`. |
| `firecrawl-playwright` | Private service (image) | `standard` (2 GB) | Chromium-backed scraper. Render assigns a suffixed slug (e.g. `firecrawl-playwright-vt4c:3000`); the Blueprint wires it via `fromService.hostport`. |
| `firecrawl-postgres` | Private service (image) | `standard` (2 GB) | Custom `nuq-postgres` image with `pg_cron` + the NUQ queue schema. Render assigns a suffixed slug (e.g. `firecrawl-postgres-ib8d:5432`); the Blueprint wires `POSTGRES_HOST` via `fromService.host`. |
| `firecrawl-rabbitmq` | Private service (image) | `starter` (512 MB) | RabbitMQ 3 broker for `pg_notify` fan-out. Intentionally ephemeral: the authoritative queue state is in Postgres, so RabbitMQ doesn't need a disk. |
| `firecrawl-redis` | Key Value | `starter` | Cache + rate-limit store. Wired via `fromService.connectionString`. |
| `postgres-data` | Disk on `firecrawl-postgres` | 10 GB SSD | Persists `/var/lib/postgresql/data`. |
Region: Oregon (oregon) — edit render.yaml if you need Frankfurt, Singapore, Ohio, or another region. All five resources must share a region for the private network to route between them.
- Click Deploy to Render.
- Authorize the Render GitHub App and pick a GitHub account to fork this template into. Render creates `<your-account>/firecrawl` for you.
- Render reads `render.yaml` and shows the apply screen. Leave the auto-generated secrets alone; paste an `OPENAI_API_KEY` if you want JSON extraction (optional), and skip the proxy/SearXNG fields unless you need them.
- Click Apply. The first deploy takes ~5–8 minutes: Postgres has to run `initdb` against the NUQ schema, Render has to pull four images, and the API harness waits for Postgres + RabbitMQ liveness before binding the port.
- When `firecrawl-api` is `live`, hit `https://<your-service>.onrender.com/v0/health/liveness` and expect `{"status":"ok"}`. Then try a real scrape:

  ```bash
  curl -X POST 'https://<your-service>.onrender.com/v2/scrape' \
    -H 'Content-Type: application/json' \
    -d '{"url": "https://firecrawl.dev"}'
  ```

  The self-hosted API does not require an `Authorization` header (see SELF_HOST.md).
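For clients that prefer code over curl, the same scrape call can be sketched in Python using only the standard library. This is a minimal illustration, not an official SDK: the function names `build_scrape_request` and `scrape` are made up for this example, and the base URL is whatever `*.onrender.com` host Render assigned to your service.

```python
import json
import urllib.request


def build_scrape_request(base_url: str, target_url: str) -> urllib.request.Request:
    """Build (but do not send) a POST to /v2/scrape, mirroring the curl call."""
    body = json.dumps({"url": target_url}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v2/scrape",
        data=body,
        # No Authorization header: the self-hosted API runs unauthenticated by default.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def scrape(base_url: str, target_url: str, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON response body."""
    request = build_scrape_request(base_url, target_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `scrape("https://<your-service>.onrender.com", "https://firecrawl.dev")` performs the network request; `build_scrape_request` is kept separate so the payload can be inspected without touching the network.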
You set these in the Render dashboard during the Apply step. None are strictly required for a working deploy — the template runs unauthenticated by default — but several unlock optional features.
| Env var | What it's for | How to get it |
|---|---|---|
| `OPENAI_API_KEY` | Enables AI features: JSON extraction on `/v2/scrape`, the `/v2/extract` endpoint, and structured `/v2/agent` output. Leave blank to disable. | platform.openai.com/api-keys |
| `PROXY_SERVER` | Outbound HTTP proxy URL (e.g. `http://1.2.3.4:8080`). Leave blank if you don't route scrapes through a proxy. | Your proxy provider |
| `PROXY_USERNAME` | Proxy basic-auth user. Leave blank for an unauthenticated proxy. | Your proxy provider |
| `PROXY_PASSWORD` | Proxy basic-auth password. Leave blank for an unauthenticated proxy. | Your proxy provider |
| `SEARXNG_ENDPOINT` | Self-hosted SearXNG URL to replace Google for `/v2/search`. Leave blank to use Google. | Your SearXNG host |
Render generates these on first deploy and stores them as service env vars. Do not rotate them later — rotating POSTGRES_PASSWORD requires resetting the database, and rotating BULL_AUTH_KEY invalidates the admin UI URL.
| Env var | Purpose |
|---|---|
| `POSTGRES_PASSWORD` | Postgres superuser password. Owned by `firecrawl-postgres`; the api reads it via `fromService.envVarKey`. |
| `BULL_AUTH_KEY` | Protects the queue admin UI at `https://<your-service>.onrender.com/admin/<BULL_AUTH_KEY>/queues`. |
The Blueprint wires these via fromDatabase, fromService, and literal values pointing at private DNS names. You never type them.
| Env var | Source |
|---|---|
| `REDIS_URL` | `firecrawl-redis.connectionString` |
| `REDIS_RATE_LIMIT_URL` | `firecrawl-redis.connectionString` |
| `POSTGRES_HOST` | Literal `firecrawl-postgres` (private DNS) |
| `POSTGRES_PORT` | Literal `5432` |
| `POSTGRES_DB` | `firecrawl-postgres.POSTGRES_DB` |
| `POSTGRES_USER` | `firecrawl-postgres.POSTGRES_USER` |
| `POSTGRES_PASSWORD` | `firecrawl-postgres.POSTGRES_PASSWORD` |
| `NUQ_RABBITMQ_URL` | Literal `amqp://firecrawl-rabbitmq:5672` |
| `PLAYWRIGHT_MICROSERVICE_URL` | Literal `http://firecrawl-playwright:3000/scrape` |
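The two URL-shaped variables are just a scheme and path wrapped around a private `host:port` pair. The sketch below shows that assembly in Python, purely as an illustration: the template's real logic lives in its `entrypoint.sh`, and the helper names here (`build_service_urls`, `_require_hostport`) are hypothetical.

```python
def _require_hostport(value: str) -> str:
    """Reject anything that is not shaped like host:port."""
    host, sep, port = value.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"expected host:port, got {value!r}")
    return value


def build_service_urls(rabbitmq_hostport: str, playwright_hostport: str) -> dict[str, str]:
    """Assemble the URLs the API container expects from private host:port pairs."""
    return {
        "NUQ_RABBITMQ_URL": f"amqp://{_require_hostport(rabbitmq_hostport)}",
        "PLAYWRIGHT_MICROSERVICE_URL": f"http://{_require_hostport(playwright_hostport)}/scrape",
    }
```

Passing `"firecrawl-rabbitmq:5672"` and `"firecrawl-playwright:3000"` yields the same two values listed in the table; a suffixed slug such as `firecrawl-playwright-vt4c:3000` goes through the same path.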
Common things people change after deploying. Override in the dashboard or edit render.yaml.
| Env var | Default | What it does |
|---|---|---|
| `NUQ_WORKER_COUNT` | `2` | Number of NUQ scrape worker child processes inside the api container. Upstream default is 5; reduced here to fit the 4 GB plan. |
| `NUM_WORKERS_PER_QUEUE` | `4` | Per-queue concurrency. Upstream default is 8. |
| `CRAWL_CONCURRENT_REQUESTS` | `5` | Concurrent in-flight scrapes per crawl job. Upstream default is 10. |
| `MAX_CONCURRENT_JOBS` | `3` | Cap on simultaneous crawl jobs. Upstream default is 5. |
| `BROWSER_POOL_SIZE` | `3` | Chromium contexts held warm in the Playwright service. Upstream default is 5. |
| `MAX_CPU` | `0.8` | Worker rejects new jobs above this CPU utilization. |
| `MAX_RAM` | `0.8` | Worker rejects new jobs above this memory utilization. |
| `LOGGING_LEVEL` | `info` | One of `debug`, `info`, `warn`, `error`. |
| `MAX_CONCURRENT_PAGES` | `5` | (Playwright service) max concurrent browser tabs. |
| `BLOCK_MEDIA` | `true` | (Playwright service) blocks images/video/fonts to cut bandwidth and RAM. |
Full upstream config reference: SELF_HOST.md.
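To make the `MAX_CPU` / `MAX_RAM` rows concrete, here is a small Python sketch of a threshold gate of the kind they describe. It is an assumption-laden illustration, not Firecrawl's actual worker code: it assumes utilization is expressed as a fraction between 0 and 1, and the function names are hypothetical.

```python
import os
from typing import Mapping, Optional


def read_threshold(name: str, default: float, env: Optional[Mapping[str, str]] = None) -> float:
    """Read a utilization threshold from the environment, falling back to a default."""
    source = os.environ if env is None else env
    raw = source.get(name, "")
    value = float(raw) if raw else default
    if not 0.0 < value <= 1.0:
        raise ValueError(f"{name} must be in (0, 1], got {value}")
    return value


def accepts_new_jobs(cpu: float, ram: float, env: Optional[Mapping[str, str]] = None) -> bool:
    """Return False when either utilization is above its configured threshold."""
    max_cpu = read_threshold("MAX_CPU", 0.8, env)  # 0.8 is the template default
    max_ram = read_threshold("MAX_RAM", 0.8, env)
    return cpu <= max_cpu and ram <= max_ram
```

With the defaults, a worker at 85% CPU would refuse new jobs; raising `MAX_CPU` to `0.9` lets the same worker accept them, at the cost of less headroom.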
Reference prices from render.com/pricing at time of writing. Confirm current pricing on the dashboard before launching.
| Resource | Plan | Approx. monthly cost |
|---|---|---|
| `firecrawl-api` (web, image) | `pro` (4 GB / 2 CPU) | $85 |
| `firecrawl-playwright` (pserv) | `standard` (2 GB / 1 CPU) | $25 |
| `firecrawl-postgres` (pserv) | `standard` (2 GB / 1 CPU) | $25 |
| `firecrawl-rabbitmq` (pserv) | `starter` (512 MB / 0.5 CPU) | $7 |
| `firecrawl-redis` (Key Value) | `starter` (256 MB) | $10 |
| `postgres-data` (disk) | 10 GB SSD | $2.50 |
| Total | | ~$155/month |
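The total is plain addition over the rows above. The Python sketch below reproduces it and makes it easy to re-run the sum after a plan change; the prices are the reference figures from the table, and `with_plan_change` is a hypothetical helper for this example.

```python
from decimal import Decimal

# Reference prices copied from the table; confirm current pricing before relying on them.
MONTHLY_COSTS = {
    "firecrawl-api": Decimal("85"),
    "firecrawl-playwright": Decimal("25"),
    "firecrawl-postgres": Decimal("25"),
    "firecrawl-rabbitmq": Decimal("7"),
    "firecrawl-redis": Decimal("10"),
    "postgres-data": Decimal("2.50"),
}


def monthly_total(costs: dict[str, Decimal]) -> Decimal:
    """Sum the per-resource monthly prices."""
    return sum(costs.values(), Decimal("0"))


def with_plan_change(costs: dict[str, Decimal], resource: str, price: Decimal) -> dict[str, Decimal]:
    """Return a copy of the cost table with one resource repriced."""
    if resource not in costs:
        raise KeyError(resource)
    return {**costs, resource: price}
```

The exact sum is $154.50, which the table rounds to ~$155. Repricing `firecrawl-api` at the $25 `standard` rate listed for the other services brings the sum to $94.50.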
Cheaper: drop firecrawl-api to standard (2 GB) and set NUQ_WORKER_COUNT=1. The API will still respond, but throughput drops. Drop firecrawl-playwright to starter only if you also set MAX_CONCURRENT_PAGES=1; Chromium needs the RAM.
Scale up: bump firecrawl-api to pro_plus (8 GB) and restore NUQ_WORKER_COUNT=5 / NUM_WORKERS_PER_QUEUE=8 to match upstream defaults. Add a second instance via numInstances on firecrawl-api (stateless), but leave firecrawl-postgres and firecrawl-rabbitmq at one instance because both own disks.
The template tracks :latest for all three Firecrawl images. To pin:
```yaml
# render.yaml
services:
  - type: web
    name: firecrawl-api
    image:
      url: ghcr.io/firecrawl/firecrawl@sha256:<digest>  # immutable
      # or, as an alternative to the digest:
      # url: ghcr.io/firecrawl/firecrawl:v2.10  # semver tag
```

Repeat for `firecrawl-playwright` and `firecrawl-postgres`. Always pin all three together: the NUQ schema and the harness version must match.
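A quick way to confirm nothing was left on `:latest` is to check each image reference before committing. The Python sketch below is a hypothetical pre-commit style helper, not part of the template; it treats a reference as pinned only if it ends in a `@sha256:` digest or a numeric version tag such as `:v2.10`.

```python
import re

# A full sha256 digest is 64 lowercase hex characters.
_DIGEST = re.compile(r"@sha256:[0-9a-f]{64}$")
# Numeric version tags such as :v2.10 or :2.10.1; floating tags like :latest do not match.
_VERSION_TAG = re.compile(r":v?\d+(\.\d+)*$")


def is_pinned(image_url: str) -> bool:
    """Return True when the reference carries a digest or a numeric version tag."""
    return bool(_DIGEST.search(image_url) or _VERSION_TAG.search(image_url))


def unpinned_images(image_urls: list[str]) -> list[str]:
    """List the references that still float (no tag, :latest, or another named tag)."""
    return [url for url in image_urls if not is_pinned(url)]
```

Feeding it the three `image.url` values from `render.yaml` should return an empty list once every service is pinned.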
In the Render dashboard, open firecrawl-api → Settings → Custom Domains → Add. Render issues a TLS certificate automatically once the CNAME resolves. Full DNS instructions: Render custom domains docs.
You can't swap in Render's managed Postgres; see the Caveats section. Firecrawl's NUQ queue requires the pg_cron extension and runs DDL and cron setup during initdb, and Render's managed Postgres supports neither. The custom nuq-postgres image is the only supported configuration today.
Self-hosted Firecrawl runs unauthenticated by default (see upstream SELF_HOST.md → API Keys for SDK Usage). To restrict access, put firecrawl-api behind:
- A Render IP allowlist on the service (Pro plan and up), or
- Cloudflare Access / Tailscale Funnel in front of the `*.onrender.com` URL, or
- Render's private network: change `type: web` to `type: pserv` on `firecrawl-api` so it becomes reachable only from other Render services.
```yaml
# render.yaml
- type: web
  name: firecrawl-api
  scaling:
    minInstances: 1
    maxInstances: 3
    targetCPUPercent: 70
```

The api container is stateless (all state lives in Postgres, RabbitMQ, and Redis), so horizontal scaling is safe. The harness inside each instance still spawns its own worker children; that's expected.
- Postgres: the `postgres-data` disk has automatic daily snapshots retained for 7 days on paid plans. Restore via dashboard. The NUQ queue is mostly ephemeral (completed jobs are deleted by `pg_cron` after 1 hour, failed jobs after 6 hours), so the disk holds mostly in-flight queue rows and isn't a long-term archive.
- RabbitMQ: disk snapshots also apply. RabbitMQ's persisted queues survive restarts.
- Application state: scraped content is returned in the API response, not persisted. If you need archival, store responses in your own object storage on the client side.
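Client-side archival can be as simple as writing each API response to durable storage keyed by the scraped URL. The Python sketch below uses a local directory as a stand-in for object storage; the function name, file layout, and hashing scheme are all assumptions for illustration, and the response is stored as-is without relying on any particular response shape.

```python
import hashlib
import json
from pathlib import Path


def archive_response(target_url: str, response: dict, directory: Path) -> Path:
    """Persist one scrape response as JSON, named by the SHA-256 of its URL."""
    directory.mkdir(parents=True, exist_ok=True)
    name = hashlib.sha256(target_url.encode("utf-8")).hexdigest() + ".json"
    path = directory / name
    record = {"url": target_url, "response": response}
    path.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return path
```

Hashing the URL gives a stable, filesystem-safe key, so re-scraping the same page overwrites the earlier copy; add a timestamp to the name if you need history instead.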
Each service exposes CPU, memory, request rate, and response time charts under Metrics in the dashboard. firecrawl-api has healthCheckPath: /v0/health/liveness, so Render will mark deploys unhealthy and roll back if the API can't bind a port within the health-check window.
firecrawl-api is stateless — set scaling.minInstances / scaling.maxInstances to add capacity. The other four services (Postgres, RabbitMQ, Playwright, Redis) are single-instance:
- Postgres and RabbitMQ own disks; multi-instance with the same disk isn't supported.
- The Playwright service is technically stateless but heavy on RAM — scale vertically (bigger plan) before scaling horizontally.
- Render Key Value is single-instance by design.
In the Render dashboard, open the service → Logs. Or via CLI:
```bash
render logs --resources <service-id> --tail
```

The API harness logs are colorized per worker (`api`, `worker`, `extract-worker`, `nuq-worker-N`, `nuq-prefetch-worker`, `nuq-reconciler`).
The template tracks `:latest`, but Render redeploys automatically when upstream publishes a new image only if you have `autoDeployTrigger` set to detect image digest changes. By default, you must trigger a manual deploy:
- Dashboard → `firecrawl-api` → Manual Deploy → Clear build cache & deploy.
- Repeat for `firecrawl-playwright` and `firecrawl-postgres`.
Upgrade all three images in the same window — the NUQ queue schema and the api harness expect compatible versions.
For automated upgrades, switch to explicit version tags and bump via PR. The Renovate Docker manager understands image.url references in render.yaml.
Watch the upstream Firecrawl releases before upgrading across major versions. Notable migrations to date:
- v2.0 (Sep 2024): endpoint path moved from `/v1/scrape` to `/v2/scrape`. Update your SDK to `firecrawl-py>=2.0` / `@mendable/firecrawl-js>=2.0`.
- NUQ queue migration: pre-NUQ Firecrawl (≤ v1.x) used a Bull queue on Redis only. This template ships the post-NUQ stack (Postgres + RabbitMQ + Redis). You can't downgrade to a pre-NUQ image without removing the `firecrawl-postgres` and `firecrawl-rabbitmq` services.
Image pull failures are rare because `ghcr.io/firecrawl/*` images are public. If one does happen:
- Check `https://github.com/firecrawl/firecrawl/pkgs/container/firecrawl` and confirm the tag exists.
- Hit "Clear build cache & deploy" in the dashboard. Render caches image digests aggressively.
- If pinning to a digest with `@sha256:`, double-check the digest is for the right architecture (Render runs `linux/amd64`).
The api harness waits for Postgres and RabbitMQ before binding port 3002. If either backend is slow to start, the api hits Render's health-check timeout. Fixes:
- Check `firecrawl-postgres` logs for `database system is ready to accept connections` and `firecrawl-rabbitmq` logs for `Server startup complete`. Both should appear in the first 60–90 seconds.
- If Postgres logs show `data directory "/var/lib/postgresql/data" has wrong ownership`, the disk was created out of order. Delete the disk and redeploy.
- If you downgraded `firecrawl-api` below `standard`, you'll see Node OOM crashes (`Reached heap limit Allocation failed - JavaScript heap out of memory`). Restore `standard` or higher; the template ships `pro`.
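The wait-for-backend behaviour described above boils down to retrying a TCP connect until a deadline. The Python sketch below shows that pattern for debugging from another service on the private network; it is not the harness's actual code, and `wait_for_port` is a hypothetical helper name. The 90-second default mirrors the startup window mentioned in the first bullet.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 90.0, interval: float = 1.0) -> bool:
    """Retry a TCP connect until it succeeds or the deadline passes."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on host:port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, `wait_for_port("firecrawl-postgres", 5432)` returns `True` once Postgres accepts connections. An open port only shows the listener is up, so the log lines above remain the authoritative readiness signal.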
The "Supabase client is not configured" error is harmless. Firecrawl uses Supabase for advanced logging on the cloud product; self-hosted instances don't configure it. The error appears in logs but does not affect scraping. See upstream SELF_HOST.md → Supabase client is not configured.
Self-hosted Firecrawl runs unauthenticated by default. This warning is expected. If you need auth, see Enable authentication above.
pg_cron requires shared_preload_libraries = 'pg_cron' and cron.database_name = 'postgres'. The nuq-postgres image sets both, but if you overrode POSTGRES_DB away from postgres in render.yaml, the cron extension cannot find its target database. Restore POSTGRES_DB: postgres or rebuild the image with a matching cron.database_name.
- Service-level logs: dashboard → Logs (or `render logs --resources <id> --tail`)
- Deploy-level logs: dashboard → Events → click the failed deploy
- Open an issue in this template repo for Render-specific bugs
- Open an issue upstream for Firecrawl application bugs
No, this template does not run on Render's free plan. Free web services sleep after 15 minutes of inactivity, which breaks the in-process queue workers, and 512 MB is far below Firecrawl's startup memory floor. The minimum supported plan is standard (2 GB) for firecrawl-api, and pro (4 GB) is what the template ships.
The NUQ Postgres schema is queue state only — there's no long-lived application data to migrate. Steps:
- Drain the source instance (stop accepting new requests, wait for in-flight crawls).
- Deploy this template.
- Cut over your client code by changing the API base URL.
If you want to preserve in-flight crawl IDs across the cutover, dump/restore the nuq.* tables from the source Postgres to the destination via render psql --resources firecrawl-postgres + pg_dump / pg_restore. Most users skip this.
Yes, you can bring your own Postgres or RabbitMQ. Remove the `firecrawl-postgres` and/or `firecrawl-rabbitmq` services from `render.yaml`, and replace the `fromService` references on `firecrawl-api` with literal `value:` strings pointing at your hosts. Constraints:
- Your Postgres must have the `pg_cron` extension, and you must run the `nuq.sql` schema by hand once.
- Your RabbitMQ must speak AMQP 0-9-1; anything from `rabbitmq:3.x` upward works.
If the postgres-data disk is deleted or wiped, firecrawl-postgres re-runs initdb on the next boot, which re-creates the NUQ schema. You lose in-flight queue jobs, but the api will keep working. Treat this as a nuclear reset for the queue.
Firecrawl's NUQ queue requires the pg_cron extension and runs DDL during initdb (apps/nuq-postgres/nuq.sql). Render's managed Postgres does not currently support pg_cron, and initdb scripts only run on a self-managed image. The custom nuq-postgres image is the only supported configuration today. Managed Postgres trades the customizability for daily backups, HA, and read replicas — when Firecrawl ships a pg_cron-free queue backend, this template will switch.
Render's private network is region-scoped — all five resources must share a region. For multi-region, deploy this stack independently in each region and put a global LB (Cloudflare, Fly Anycast, etc.) in front of the *.onrender.com URLs.
- Encryption at rest: Render disks (`postgres-data`, `rabbitmq-data`) are encrypted by Render's storage layer. Render Key Value is encrypted at rest.
- Encryption in transit: TLS terminates at the `*.onrender.com` hostname for `firecrawl-api`. The four private services (`firecrawl-playwright`, `firecrawl-postgres`, `firecrawl-rabbitmq`, `firecrawl-redis`) communicate over Render's private network without TLS, which is fine for service-to-service traffic inside the same workspace.
- Network exposure: `firecrawl-api` is the only public endpoint. The other four services are unreachable from the public internet.
- Secret rotation:
  - Safe to rotate: `OPENAI_API_KEY`, `PROXY_*`. Update in dashboard; no redeploy is required, and the new values apply on the next worker spawn.
  - Dangerous to rotate: `BULL_AUTH_KEY` (invalidates the admin UI URL; note the new one), `POSTGRES_PASSWORD` (requires recreating the Postgres role).
- Reporting vulnerabilities: template-specific bugs → this repo's issues. Firecrawl application bugs → upstream security policy.
- No managed Postgres. Render's managed Postgres lacks `pg_cron`, so the template runs `nuq-postgres` as a private service. You lose Render's daily logical backups, point-in-time recovery, and the managed HA tier. Disk snapshots (7-day retention) are the closest substitute.
- No managed RabbitMQ. Render doesn't offer managed AMQP. RabbitMQ runs as a single-instance pserv with a 1 GB disk. Lose the disk → lose any in-flight queue messages.
- Single-instance Postgres and RabbitMQ. Both services own disks; you can't run multiple instances. Vertical scaling only.
- Five resources × Oregon. The Blueprint deploys five resources in one region. Inter-region private networking is not available on Render today.
- `:latest` image tags by default. Upstream can push a breaking change at any time. Pin to immutable digests for production (Pin the upstream version).
- Standard plan is the floor for `firecrawl-api`. Downgrading to `starter` causes Node OOM during harness startup; the log signature is `Reached heap limit Allocation failed - JavaScript heap out of memory` followed by Render's port-scanner emitting `==> No open ports detected`.
- First image pull takes ~90 seconds per service. Subsequent deploys are <30s because Render caches the digest.
- No authentication out of the box. Self-hosted Firecrawl matches upstream's no-auth default. Anyone with the `*.onrender.com` URL can hit the API. Lock it down before exposing it broadly; see Enable authentication.
- AGPL-3.0. The upstream Firecrawl API is AGPL-3.0. If you offer this as a hosted service to third parties, you must publish your modifications under AGPL-3.0. The SDKs and this template are MIT.
- Upstream: firecrawl/firecrawl — AGPL-3.0
- Render template: MIT (see LICENSE)
- Template maintainer: render-examples
If this template helped you, give upstream a star.