
Marklift

URL → Clean Markdown — Fetch a webpage, extract the main content, and convert it to LLM-friendly Markdown. Built for agents and pipelines.

  • Fetches HTTP(S) URLs with configurable timeout and headers
  • Source types: website, twitter (via Nitter), and reddit — inferred from the URL when not specified. The Medium adapter has been removed for now.
  • Extracts article content with Mozilla Readability (or raw body)
  • Converts to Markdown with Turndown and custom rules
  • Optimizes for agents: normalizes spacing, dedupes links, strips tracking params, optional chunking
  • Typed API and CLI
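The link cleanup mentioned above (deduplication plus tracking-param stripping) can be sketched with Node's `URL` API. This is an illustration of the idea, not Marklift's actual implementation, and the list of stripped parameters is an assumption:

```typescript
// Sketch: strip common tracking params and dedupe links.
// The exact parameter list Marklift strips is an assumption here.
const TRACKING_PARAMS = ["utm_source", "utm_medium", "utm_campaign", "fbclid", "gclid"];

function cleanLink(href: string): string {
  const url = new URL(href);
  for (const p of TRACKING_PARAMS) url.searchParams.delete(p);
  return url.toString();
}

function dedupeLinks(hrefs: string[]): string[] {
  // Cleaning first means tracked and untracked copies collapse to one entry.
  return [...new Set(hrefs.map(cleanLink))].sort();
}
```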

Requirements: Node.js 18+


The Web Is Not LLM-Ready — Raw HTML is noisy, heavy with tracking junk, inconsistent across sites, and expensive in tokens.

Install

npm install marklift

Usage

Programmatic

import { urlToMarkdown } from "marklift";

// source is inferred from URL when omitted (twitter/x.com → twitter, reddit → reddit, else website)
const result = await urlToMarkdown("https://example.com/article", {
  timeout: 10_000,
});
const tweet = await urlToMarkdown("https://x.com/user/status/123"); // uses twitter adapter

console.log(result.title);
console.log(result.markdown);
console.log(result.wordCount, result.sections.length, result.links.length);

CLI

# Install globally to get the `marklift` command
npm install -g marklift

# Convert a URL to Markdown (prints to stdout). Source is inferred from URL.
marklift https://example.com
marklift https://x.com/user/status/123   # uses twitter adapter
marklift https://reddit.com/r/...         # uses reddit adapter

# Output full result as JSON
marklift https://example.com --json

# Options
marklift https://example.com --timeout 15000
marklift https://example.com --chunk-size 2000
marklift https://example.com --source website   # override inferred source

CLI options:

| Option | Description |
| --- | --- |
| `--source <website\|twitter\|reddit>` | Source adapter (default: inferred from URL). Override when needed. |
| `--timeout <ms>` | Request timeout in milliseconds (default: 15000) |
| `--chunk-size <n>` | Split markdown into chunks of ~n characters |
| `--json` | Output full result as JSON instead of markdown |

API

urlToMarkdown(url, options?)

Converts a URL to clean Markdown. Returns a Promise<MarkdownResult>.

Options:

| Option | Type | Description |
| --- | --- | --- |
| `source` | `"website" \| "twitter" \| "reddit"` | Source adapter. Default: inferred from URL (`twitter.com`/`x.com`/Nitter → twitter, `reddit.com` → reddit, else website). Override to force a specific adapter. |
| `timeout` | `number` | Request timeout in ms (default: 15000) |
| `headers` | `Record<string, string>` | Custom HTTP headers (e.g. `User-Agent`) |
| `chunkSize` | `number` | If set, `result.chunks` will contain token-safe chunks |
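The chunking behavior can be approximated as a split on paragraph boundaries. This simplified sketch is not the library's actual algorithm — Marklift's chunker also avoids splitting inside code blocks and tables, which this version does not:

```typescript
// Sketch: split markdown into ~chunkSize-character chunks on paragraph
// boundaries. Marklift's real chunker also keeps code blocks and tables
// intact; this simplified version does not.
function chunkMarkdown(markdown: string, chunkSize: number): string[] {
  const chunks: string[] = [];
  let current = "";
  for (const para of markdown.split(/\n\n+/)) {
    // Start a new chunk when appending this paragraph would exceed the budget.
    if (current && current.length + para.length + 2 > chunkSize) {
      chunks.push(current);
      current = para;
    } else {
      current = current ? current + "\n\n" + para : para;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```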

Result (MarkdownResult):

  • url — Original URL
  • title — Page title
  • description — Meta description (if present)
  • markdown — Full markdown with source-specific frontmatter (see below) + body
  • sections — { heading, content }[], split by heading (stable order)
  • links — Deduplicated links, sorted (tracking params stripped)
  • wordCount — Approximate word count
  • contentHash — SHA-256 of optimized markdown (stability checks)
  • metadata? — Structured metadata (OG, canonical, author, publishedAt, image, language)
  • chunks? — When chunkSize is set: { content, index, total }[] (no split inside code blocks or tables)
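The contentHash field can be recomputed for stability checks. Assuming it is the hex-encoded SHA-256 of the optimized markdown string (UTF-8), which matches the description above, a change check might look like:

```typescript
import { createHash } from "node:crypto";

// Recompute the hash of a markdown string to detect content changes
// between fetches. Assumes contentHash is hex-encoded SHA-256 of the
// UTF-8 markdown, as described above.
function markdownHash(markdown: string): string {
  return createHash("sha256").update(markdown, "utf8").digest("hex");
}

function hasChanged(markdown: string, previousHash: string): boolean {
  return markdownHash(markdown) !== previousHash;
}
```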

urlToMarkdownStream(url, options?)

Async generator that yields MarkdownChunk (meta, sections, links) as they are produced. Useful for streaming into an LLM or pipeline.

Markdown format (per source)

Each adapter outputs markdown with a frontmatter block (delimited by `---` lines) followed by the body.

Website (and reddit) — format type: website. Medium is not currently supported.

---
source: https://example.com/article
canonical: https://example.com/article
title: Example Article Title
description: Short meta description
author: John Doe
published_at: 2025-01-12
language: en
content_hash: <sha256>
word_count: 1243
---
# Title

Body content…

Twitter:

---
platform: twitter
source: https://twitter.com/username/status/1234567890
tweet_id: 1234567890
author:
  name: Author Name
published_at: 2025-01-10T18:22:00Z
language: en
content_hash: <sha256>
---
Body content…
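Consumers can separate the frontmatter block from the body with simple string handling. A minimal sketch, assuming the document starts with a `---`-delimited block as in the examples above:

```typescript
// Split a Marklift markdown result into its frontmatter block and body.
// Assumes the document starts with a "---" line, as in the examples above.
function splitFrontmatter(markdown: string): { frontmatter: string; body: string } {
  const match = markdown.match(/^---\n([\s\S]*?)\n---\n?/);
  if (!match) return { frontmatter: "", body: markdown };
  return { frontmatter: match[1], body: markdown.slice(match[0].length) };
}
```

The frontmatter is YAML-shaped, so it can be fed to any YAML parser if structured access to fields like `content_hash` is needed.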

Errors

  • InvalidUrlError — Invalid or non-HTTP(S) URL
  • FetchError — Network error, timeout, or non-2xx response
  • ParseError — Readability or parsing failure
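The kind of check behind InvalidUrlError can be sketched as follows — an illustration of the validation rule, not Marklift's actual code:

```typescript
// Sketch of the validation that would raise InvalidUrlError:
// the input must parse as a URL and use an HTTP(S) scheme.
function isHttpUrl(input: string): boolean {
  try {
    const url = new URL(input);
    return url.protocol === "http:" || url.protocol === "https:";
  } catch {
    // new URL() throws on strings that are not URLs at all.
    return false;
  }
}
```

In pipelines, catching FetchError separately from ParseError lets you retry transient network failures while skipping pages that simply cannot be parsed.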

Production note: The website and reddit adapters use a browser-like User-Agent by default, so requests from servers and datacenters receive full HTML. The Twitter adapter keeps the Marklift User-Agent so that Nitter works. Override either via the headers option if needed.


Example

import { urlToMarkdown, urlToMarkdownStream } from "marklift";

// One-shot (source inferred from URL)
const result = await urlToMarkdown("https://blog.example.com/post", {
  timeout: 10_000,
  chunkSize: 2000,
});
console.log(result.title, result.wordCount);
if (result.chunks) {
  for (const chunk of result.chunks) {
    // Send chunk to LLM, etc.
  }
}

// Streaming
for await (const chunk of urlToMarkdownStream(
  "https://blog.example.com/post"
)) {
  process.stdout.write(chunk.content);
}

Testing

npm test          # unit + E2E (E2E needs network)
npm run test:unit # unit only (no network)
npm run test:e2e  # E2E with real URLs only

Set SKIP_E2E=1 to skip E2E tests (e.g. in CI without network).


Contributing

Contributions are welcome. See CONTRIBUTING.md for setup, code style, and how to submit changes.


License

MIT
