A high-performance REST API service built with Playwright and TypeScript that extracts brand identity elements from any website.
Extracts the following brand assets from any URL:
- Logo - Discovers brand logos using multi-strategy heuristics (meta tags, common selectors, SVG detection)
- Tagline - Extracts primary heading or Open Graph title
- Description - Pulls meta descriptions and summary text
- Color Palette - Analyzes computed styles to extract brand colors (up to 8 colors)
- Typography - Detects fonts used across headings and body text (up to 4 fonts)
- Screenshot - Captures a viewport screenshot as a base64-encoded PNG
- Node.js >= 18.0.0
- npm or yarn
# Install dependencies
npm install
# Playwright will auto-install the Chromium browser

Create a .env file in the root directory:
PORT=3001
NODE_ENV=development
ALLOWED_ORIGINS=http://localhost:3000,https://your-production-domain.com

# Development mode (with hot reload)
npm run dev
# Build for production
npm run build
# Start production server
npm start

The service will start on http://localhost:3001 (or your configured PORT).

GET /health

Response:
{
"status": "ok",
"service": "brand-scraper-service",
"timestamp": "2025-12-26T13:33:21.000Z"
}

POST /api/scrape
Content-Type: application/json
{
"url": "https://example.com"
}

Response:
{
"success": true,
"data": {
"title": "Example Domain",
"logo": "https://example.com/logo.png",
"tagline": "Example Domain",
"description": "This domain is for use in illustrative examples...",
"colors": ["#1A73E8", "#34A853", "#FBBC04", "#EA4335"],
"fonts": ["Roboto", "Arial", "Helvetica"],
"screenshot": "..."
},
"meta": {
"scrapedAt": "2025-12-26T13:33:21.000Z",
"duration": "3247ms"
}
}

Error Response:
{
"error": "Scraping failed",
"message": "Navigation timeout exceeded",
"url": "https://example.com"
}

- Average scrape time: 2-5 seconds for simple sites
- Complex sites: 5-8 seconds
- Optimizations:
  - Headless browser mode
  - Network idle detection
  - Parallel extraction using `page.evaluate()`
  - Viewport-only screenshots
- Logo Discovery (Multi-priority):
  - Apple Touch Icon
  - Open Graph image
  - Common CSS selectors (`img[class*="logo"]`, etc.)
  - SVG detection
  - Favicon fallback
- Color Extraction:
  - Analyzes computed styles from key elements
  - Filters out generic colors (white, black, transparent)
  - Converts RGB/RGBA to HEX format
  - Returns top 8 unique colors
- Font Detection:
  - Queries computed `font-family` from typography elements
  - Extracts primary font from font stack
  - Deduplicates and returns top 4 fonts
- Screenshot:
  - Viewport-only capture (1280x800)
  - PNG format, base64 encoded
  - Optimized for speed
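
The color and font steps above can be sketched as pure post-processing functions applied to the raw computed-style strings. This is a minimal illustration, not the service's actual code: the function names are hypothetical, and it assumes "top" means first-seen order and that colors arrive in the `rgb(...)` / `rgba(...)` form that Chromium reports for computed styles.

```typescript
// Hypothetical helpers; names and ordering rules are assumptions, not the service's source.

// Colors treated as too generic to count as brand colors (white and black).
const GENERIC_COLORS: string[] = ["#FFFFFF", "#000000"];

/** Convert "rgb(26, 115, 232)" or "rgba(52, 168, 83, 0.5)" to "#RRGGBB"; null if unusable. */
function rgbToHex(color: string): string | null {
  const match = color.match(
    /^rgba?\(\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*(?:,\s*([\d.]+)\s*)?\)$/
  );
  if (!match) return null; // e.g. the keyword "transparent"
  // A zero alpha channel means the color is fully transparent.
  if (match[4] !== undefined && parseFloat(match[4]) === 0) return null;
  const hex = [match[1], match[2], match[3]]
    .map((channel) => {
      const value = Math.min(255, parseInt(channel, 10));
      return ("0" + value.toString(16)).slice(-2);
    })
    .join("");
  return "#" + hex.toUpperCase();
}

/** Keep the first `limit` unique, non-generic colors in first-seen order. */
function extractPalette(computedColors: string[], limit: number = 8): string[] {
  const palette: string[] = [];
  for (const raw of computedColors) {
    const hex = rgbToHex(raw);
    if (hex === null) continue;
    if (GENERIC_COLORS.indexOf(hex) !== -1 || palette.indexOf(hex) !== -1) continue;
    palette.push(hex);
    if (palette.length === limit) break;
  }
  return palette;
}

/** Take the first family from a font stack and strip surrounding quotes. */
function primaryFont(fontFamily: string): string | null {
  const first = (fontFamily.split(",")[0] || "").trim().replace(/^["']|["']$/g, "");
  return first.length > 0 ? first : null;
}

/** Deduplicate primary fonts and keep the first `limit` of them. */
function extractFonts(fontStacks: string[], limit: number = 4): string[] {
  const fonts: string[] = [];
  for (const stack of fontStacks) {
    const font = primaryFont(stack);
    if (font === null || fonts.indexOf(font) !== -1) continue;
    fonts.push(font);
    if (fonts.length === limit) break;
  }
  return fonts;
}
```

In a real scraper the input arrays would presumably come from `getComputedStyle` calls made inside `page.evaluate()`; keeping the conversion logic outside the browser context makes it straightforward to unit test.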
- Engine: Chromium (via Playwright)
- Mode: Headless
- Viewport: 1280x800
- Wait Strategy: Network idle with 45s timeout
- User Agent: Modern Chrome on Windows
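
The browser settings above map naturally onto Playwright option objects. The sketch below assumes they would be passed to `chromium.launch()`, `browser.newContext()`, and `page.goto()`; the constant names are hypothetical, and the user-agent string is only an illustrative placeholder for "Modern Chrome on Windows", not a value taken from the service.

```typescript
// Hypothetical option objects mirroring the settings listed above.

// Would be passed as chromium.launch(launchOptions).
const launchOptions = {
  headless: true,
};

// Would be passed as browser.newContext(contextOptions).
const contextOptions = {
  viewport: { width: 1280, height: 800 },
  // Placeholder user agent (assumption); the service's real string is not documented here.
  userAgent:
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 " +
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
};

// Would be passed as page.goto(url, navigationOptions).
const navigationOptions = {
  waitUntil: "networkidle" as const, // resolve once network activity has settled
  timeout: 45_000, // 45 seconds, in milliseconds
};
```

Keeping these values in one place makes it easier to adjust the timeout or viewport without touching the scraping logic.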
- CORS protection with configurable origins
- Request body size limit (10MB)
- URL validation before scraping
- Graceful error handling
- No data persistence
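
As an illustration of the URL validation step, the following sketch rejects anything that is not an absolute `http` or `https` URL before a scrape would start. The function name `validateTargetUrl` and the specific rules are assumptions; the service's own checks may be stricter or different.

```typescript
// Hypothetical validator; the rules here are assumptions, not the service's actual checks.

/** Parse and validate a scrape target, throwing an Error with a readable message on failure. */
function validateTargetUrl(input: unknown): URL {
  if (typeof input !== "string" || input.trim() === "") {
    throw new Error("url must be a non-empty string");
  }
  let parsed: URL;
  try {
    // The URL constructor throws on relative or malformed input.
    parsed = new URL(input.trim());
  } catch {
    throw new Error("url is not a valid absolute URL");
  }
  // Only web pages make sense for a browser-based scraper.
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    throw new Error("only http and https URLs are supported");
  }
  return parsed;
}
```

A request handler could call this on the `url` field of the request body and return an error response when it throws, so that no browser page is opened for invalid input.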
| Variable | Default | Description |
|---|---|---|
| `PORT` | `3001` | Server port |
| `NODE_ENV` | `development` | Environment mode |
| `ALLOWED_ORIGINS` | `*` | CORS allowed origins (comma-separated) |
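
One way these variables and their defaults could be read is sketched below. The `ServiceConfig` type, `loadConfig`, and `isOriginAllowed` are hypothetical names introduced for illustration; the sketch takes the environment as a plain object (for example `process.env`) so it can be exercised without a running server.

```typescript
// Hypothetical configuration loader; defaults follow the table above.

interface ServiceConfig {
  port: number;
  nodeEnv: string;
  allowedOrigins: string[];
}

/** Build a config from an environment-like object, applying the documented defaults. */
function loadConfig(env: Record<string, string | undefined>): ServiceConfig {
  const port = Number(env["PORT"] || "3001");
  // NaN % 1 is NaN, so this also rejects non-numeric input.
  if (port % 1 !== 0 || port < 1 || port > 65535) {
    throw new Error("PORT must be an integer between 1 and 65535");
  }
  const allowedOrigins = (env["ALLOWED_ORIGINS"] || "*")
    .split(",")
    .map((origin) => origin.trim())
    .filter((origin) => origin.length > 0);
  return {
    port,
    nodeEnv: env["NODE_ENV"] || "development",
    allowedOrigins,
  };
}

/** Return true when the request origin is permitted; "*" permits every origin. */
function isOriginAllowed(origin: string | undefined, allowedOrigins: string[]): boolean {
  if (allowedOrigins.indexOf("*") !== -1) return true;
  return origin !== undefined && allowedOrigins.indexOf(origin) !== -1;
}
```

In a Node.js entry point this might be called as `loadConfig(process.env)`, with the resulting `allowedOrigins` handed to whatever CORS middleware the server uses.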
If Chromium doesn't install automatically:
npx playwright install chromium

For large-scale scraping, increase the Node.js memory limit:
NODE_OPTIONS="--max-old-space-size=4096" npm start

License: MIT
Contributions are welcome! Please feel free to submit issues or pull requests.