SeaPortal

Fast content extraction for AI agents. HTTP-first, no browser required.

Install

# npm (recommended)
npm install -g seaportal

# Go
go install github.com/pinchtab/seaportal/cmd/seaportal@latest

Usage

seaportal https://pinchtab.com

# Options
seaportal --json https://pinchtab.com       # JSON output
seaportal --snapshot https://pinchtab.com   # Accessibility tree
seaportal --fast https://pinchtab.com       # Bail early if browser needed
seaportal --no-dedupe https://pinchtab.com  # Disable deduplication

# Subcommands
seaportal sitemap https://pinchtab.com/sitemap.xml  # Flatten a sitemap
seaportal feed https://pinchtab.com/feed.xml        # Parse RSS / Atom / JSON Feed
seaportal mcp                                       # Run as an MCP server over stdio

# Version
seaportal --version

The full flag list and subcommands are in the CLI reference. SeaPortal also runs as an MCP server (seaportal mcp), and the repository includes seabench, a benchmark/evaluation harness used as dev tooling.

Accessibility Snapshot

The --snapshot flag outputs a semantic accessibility tree — useful for AI agents that need to understand page structure and interact with elements:

seaportal --snapshot https://pinchtab.com

{
  "role": "document",
  "children": [
    {
      "role": "navigation",
      "name": "Main",
      "tag": "nav",
      "ref": "e1",
      "selector": "#main-nav",
      "depth": 0,
      "children": [
        {"role": "link", "name": "Home", "tag": "a", "ref": "e2", "selector": "a.nav-link", "depth": 1, "href": "/", "interactive": true}
      ]
    }
  ]
}

Each node includes:

role — Accessibility role (heading, link, button, textbox, etc.)
name — Accessible name (from aria-label, title, alt, or text)
tag — HTML tag name (div, a, button, etc.)
ref — Element reference (e1, e2...) for targeting
selector — CSS selector for the element
depth — Nesting depth in the tree
interactive — Whether the element can be clicked/typed
level — Heading level (1-6) for headings
href — Link target for links

Snapshot Options

# Filter to interactive elements only
seaportal --snapshot --filter=interactive https://example.com

# Compact text output (instead of JSON)
seaportal --snapshot --format=compact https://example.com

# Limit output size (approximate token count)
seaportal --snapshot --max-tokens=2000 https://example.com

# Combine options
seaportal --snapshot --filter=interactive --format=compact https://example.com

Compact format outputs a readable text tree:

document
  e1 navigation "Main" <nav> [interactive]
    e2 link "Home" <a> [interactive] href=/
    e3 link "Docs" <a> [interactive] href=/docs
  e4 main <main>
    e5 heading "Welcome" <h1> level=1
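If an agent consumes the compact form instead of JSON, each line can be split back into its parts. This is a rough sketch whose grammar is inferred only from the sample above (ref, role, optional quoted name, tag in angle brackets, then `[interactive]` and `key=value` tokens); real output may contain cases it does not handle, such as names with embedded quotes. `CompactLine` and `parseCompactLine` are hypothetical names.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// CompactLine holds the pieces of one compact-format line (illustrative type).
type CompactLine struct {
	Indent      int // number of leading whitespace characters
	Ref         string
	Role        string
	Name        string
	Tag         string
	Interactive bool
	Attrs       map[string]string // trailing key=value tokens such as href, level
}

// lineRE matches: indent, ref, role, optional "name", <tag>, remainder.
var lineRE = regexp.MustCompile(`^(\s*)(e\d+) (\S+)(?: "([^"]*)")? <([^>]+)>(.*)$`)

// parseCompactLine returns false for lines without a ref, such as the
// top-level "document" line.
func parseCompactLine(line string) (CompactLine, bool) {
	m := lineRE.FindStringSubmatch(line)
	if m == nil {
		return CompactLine{}, false
	}
	cl := CompactLine{
		Indent: len(m[1]), Ref: m[2], Role: m[3], Name: m[4], Tag: m[5],
		Attrs: map[string]string{},
	}
	for _, f := range strings.Fields(m[6]) {
		if f == "[interactive]" {
			cl.Interactive = true
		} else if k, v, ok := strings.Cut(f, "="); ok {
			cl.Attrs[k] = v
		}
	}
	return cl, true
}

// compact is the sample tree from above.
const compact = `document
  e1 navigation "Main" <nav> [interactive]
    e2 link "Home" <a> [interactive] href=/
    e3 link "Docs" <a> [interactive] href=/docs
  e4 main <main>
    e5 heading "Welcome" <h1> level=1`

func main() {
	for _, line := range strings.Split(compact, "\n") {
		if cl, ok := parseCompactLine(line); ok && cl.Attrs["href"] != "" {
			fmt.Printf("%s %q %s\n", cl.Ref, cl.Name, cl.Attrs["href"])
		}
	}
	// prints:
	// e2 "Home" /
	// e3 "Docs" /docs
}
```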

As a Library

The public package is the module root, github.com/pinchtab/seaportal:

import (
    "fmt"
    "log"

    "github.com/pinchtab/seaportal"
)

// Extract content
result := seaportal.FromURL("https://pinchtab.com")
fmt.Println(result.Content) // extracted Markdown

// With options (plain assignment: result is already declared above)
result = seaportal.FromURLWithOptions("https://pinchtab.com", seaportal.Options{
    Dedupe:   true,
    FastMode: true,
})

// Build an accessibility snapshot from an HTML string
snapshot, err := seaportal.BuildSnapshot(htmlString)
if err != nil {
    log.Fatal(err)
}

// Snapshot with options (filter, max tokens)
opts := seaportal.SnapshotOptions{
    FilterInteractive: true,
    MaxTokens:         2000,
}
snapshot, err = seaportal.BuildSnapshotWithOptions(htmlString, opts)
if err != nil {
    log.Fatal(err)
}

// Compact text output
fmt.Println(snapshot.ToCompact())

See the API reference for the full surface.

Features

Fast in its niche — Pure HTTP; on reachable static/SSR pages p50 ~1s, p95 ~2s (across the open web the tail is much longer)
Stealthy — Chrome TLS fingerprint, realistic headers
Smart — Readability extraction + Markdown conversion
Semantic — Accessibility tree for AI agents
Honest — Classifies pages, signals when browser is needed
Clean — Deduplicates repeated content blocks

Detection

Automatically detects:

Bot protection (Cloudflare, AWS WAF, DataDome, PerimeterX)
Captcha pages
Access denied / login walls
SPA / JavaScript-only content

Page Classification

| Class | Description |
|---|---|
| `static` | Pure HTML, high confidence |
| `ssr` | Server-rendered, good extraction |
| `hydrated` | SSR + JS enhancement, usually extractable |
| `spa` | JavaScript-only content, needs browser |
| `dynamic` | Heavy client-side rendering |
| `blocked` | Bot protection, captcha, access denied |

The quality float is an advisory soft signal, not a gate — clean server-rendered pages routinely score ~0 while extracting perfectly. Route on the page class and browser-recommendation signal (profile.decision / browserRecommended), not the raw quality value. See api.md and browser-discriminator.md.
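A caller-side routing function following that advice might look like the sketch below. It assumes the page class and the browser-recommendation flag have already been read out of the result as a plain string and bool; how they appear in the actual result is documented in api.md, and the `Route` type and `route` function are hypothetical. Note that the quality value is deliberately not an input.

```go
package main

import "fmt"

// Route is a hypothetical caller-side decision, not a SeaPortal type.
type Route string

const (
	UseExtraction Route = "use-extraction" // trust the HTTP-only result
	UseBrowser    Route = "use-browser"    // fail over to a real browser
	Blocked       Route = "blocked"        // bot protection, captcha, access denied
)

// route decides what to do with a page. The class strings come from the
// classification table; browserRecommended stands in for the
// browser-recommendation signal.
func route(class string, browserRecommended bool) Route {
	if browserRecommended {
		return UseBrowser
	}
	switch class {
	case "static", "ssr", "hydrated":
		return UseExtraction
	case "spa", "dynamic":
		return UseBrowser
	case "blocked":
		return Blocked
	default:
		// Unknown class: be conservative and use a browser.
		return UseBrowser
	}
}

func main() {
	fmt.Println(route("ssr", false))      // prints: use-extraction
	fmt.Println(route("hydrated", true))  // prints: use-browser
	fmt.Println(route("blocked", false))  // prints: blocked
}
```

Checking the recommendation flag before the class is one possible ordering; a caller that never wants to send blocked pages to a browser could test for `blocked` first.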

Reliability / what to expect

SeaPortal is a fast first-pass triage that fails over, not a universal fetcher. It wins on static and server-rendered pages and tells you when to reach for a browser instead of pretending every URL extracts.

Numbers below are a frozen snapshot of the committed live sweeps — full breakdown, dates, and git SHAs in the reliability reference:

| | Reachable, in-niche (static/SSR) | Across the open web (Tranco top-1000) |
|---|---|---|
| Latency (ok fetches) | p50 ~1s, p95 ~2s | p50 ~1.6s, p90 >10s, p95 ~15s |
| Success | ~94% ok | 40% ok (~53% after netting out the ~242 dead CDN/DNS infra hosts) |
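The ~53% figure can be reproduced from the other numbers in the open-web column. The sketch below assumes the 40% is measured over all 1,000 Tranco hosts (about 400 ok fetches) and that the ~242 infrastructure hosts are simply dropped from the denominator; `adjustedRate` is an illustrative helper.

```go
package main

import "fmt"

// adjustedRate returns the success rate after removing hosts that can
// never succeed (here: CDN/DNS infrastructure domains) from the total.
func adjustedRate(total, ok, excluded int) float64 {
	return float64(ok) / float64(total-excluded)
}

func main() {
	// 400 ok out of 1000 - 242 = 758 remaining hosts.
	fmt.Printf("%.1f%%\n", 100*adjustedRate(1000, 400, 242)) // prints: 52.8%
}
```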

What that means in practice:

In its niche it's fast and reliable.
Across the raw open web, ~1 in 3 hosts time out and ~1 in 4 error — many are CDN/DNS infrastructure domains (akamaiedge.net, cloudfront.net, …) that never serve HTML.
Treat extraction as triage: set --timeout and route on the browser-recommendation signal (profile.decision / browserRecommended), failing over to a real browser rather than assuming the happy path.

Regenerate any time with ./dev bench sweep (see the seabench reference).

What It Doesn't Do

JavaScript execution
Full browser rendering
Cookie/session management

For JS-heavy pages, render the page in a browser and pass the resulting HTML to seaportal.FromHTML().

Core vs. advanced surfaces

SeaPortal is, first, one thing: a fast, no-browser fetch-and-extract primitive that returns clean Markdown + an accessibility snapshot and tells you when a page needs a browser. That is the core, and everything in the value prop above describes it.

Layered on top are secondary, opt-in helpers — useful, but not the identity and off by default:

| Surface | What it is | Where |
|---|---|---|
| Chunking (`--chunk`) | Split Markdown into heading/sentence/window chunks for RAG | api.md |
| BM25 ranking (`--query`) | Score heading-bounded sections by relevance | api.md |
| Split output (`--split-*`) | Shard a large extraction across files | api.md |
| TEI-Lite XML (`--xml`) | Wrap a result as TEI-Lite for corpus tooling | api.md |
| Sitemaps & feeds | Flatten `sitemap.xml`, parse RSS/Atom/JSON Feed | api.md |
| `seabench` | Benchmark / capability harness; dev tooling, not shipped product | seabench.md |

If you only want the core, ignore all of the above: seaportal <url> and seaportal.FromURL(...) never touch them.

License

MIT
