22 | 22 |
23 | 23 | - 🧠 Custom built HTML to Markdown Convertor Optimized for LLMs (~50% fewer tokens) |
24 | 24 | - 🔍 Generates [Minimal](./packages/mdream/src/preset/minimal.ts) GitHub Flavored Markdown: Frontmatter, Nested & HTML markup support. |
25 | | -- ✂️ LangChain compatible [Markdown Text Splitter](#text-splitter) for single-pass chunking. |
| 25 | +- ✂️ LangChain compatible [Markdown Text Splitter](./packages/mdream/README.md#markdown-splitting) for single-pass chunking. |
26 | 26 | - 🚀 Ultra Fast: Stream 1.4MB of HTML to markdown in ~50ms. |
27 | 27 | - ⚡ Tiny: 6kB gzip, zero dependency core. |
28 | 28 | - ⚙️ Run anywhere: [CLI Crawler](#mdream-crawl), [Docker](#docker-usage), [GitHub Actions](#github-actions-integration), [Vite](#vite-integration), & more. |
@@ -329,71 +329,16 @@ const markdown = htmlToMarkdown('<h1>Hello World</h1>') |
329 | 329 | console.log(markdown) // # Hello World |
330 | 330 | ``` |
331 | 331 |
332 | | -See the [Mdream Package README](./packages/mdream/README.md) for complete documentation on API usage, streaming, presets, and the plugin system. |
| 332 | +**Core Functions:** |
| 333 | +- [htmlToMarkdown](./packages/mdream/README.md#api-usage) - Convert HTML to Markdown |
| 334 | +- [streamHtmlToMarkdown](./packages/mdream/README.md#api-usage) - Stream HTML to Markdown |
| 335 | +- [parseHtml](./packages/mdream/README.md#api-usage) - Parse HTML to AST |
333 | 336 |
334 | | -## Text Splitter |
335 | | - |
336 | | -Mdream includes a [LangChain](https://python.langchain.com/api_reference/text_splitters/markdown/langchain_text_splitters.markdown.ExperimentalMarkdownSyntaxTextSplitter.html) compatible Markdown splitter that runs efficiently in single pass. |
337 | | - |
338 | | -This provides significant performance improvements over traditional multi-pass splitters and allows |
339 | | -you to integrate with your custom Mdream plugins. |
340 | | - |
341 | | -```ts |
342 | | -import { htmlToMarkdownSplitChunks } from 'mdream/splitter' |
343 | | - |
344 | | -const chunks = await htmlToMarkdownSplitChunks('<h1>Hello World</h1><p>This is a paragraph.</p>', { |
345 | | - chunkSize: 1000, |
346 | | - chunkOverlap: 200, |
347 | | -}) |
348 | | -console.log(chunks) // Array of text chunks |
349 | | -``` |
350 | | - |
351 | | -See the [Text Splitter Documentation](./packages/mdream/docs/splitter.md) for complete usage and configuration. |
352 | | - |
353 | | -## Streaming llms.txt Generation |
354 | | - |
355 | | -Generate `llms.txt` and `llms-full.txt` files by streaming pages to disk without keeping full content in memory. Ideal for programmatic generation from crawlers or build systems. |
356 | | - |
357 | | -```ts |
358 | | -import { createLlmsTxtStream } from 'mdream/llms-txt' |
359 | | - |
360 | | -const stream = createLlmsTxtStream({ |
361 | | - siteName: 'My Docs', |
362 | | - description: 'Documentation site', |
363 | | - origin: 'https://example.com', |
364 | | - generateFull: true, |
365 | | - outputDir: './dist', |
366 | | -}) |
367 | | - |
368 | | -const writer = stream.getWriter() |
369 | | - |
370 | | -// Stream pages as they're processed |
371 | | -await writer.write({ |
372 | | - title: 'Home', |
373 | | - content: '# Welcome\n\nHome page content.', |
374 | | - url: '/', |
375 | | - metadata: { description: 'Welcome page' }, |
376 | | -}) |
377 | | - |
378 | | -await writer.write({ |
379 | | - title: 'About', |
380 | | - content: '# About\n\nAbout page content.', |
381 | | - url: '/about', |
382 | | -}) |
383 | | - |
384 | | -await writer.close() |
385 | | -``` |
386 | | - |
387 | | -**Options:** |
388 | | -- `siteName` - Site name for header (default: 'Site') |
389 | | -- `description` - Site description for header |
390 | | -- `origin` - Base URL to prepend to relative URLs |
391 | | -- `generateFull` - Generate llms-full.txt with complete content (default: false) |
392 | | -- `outputDir` - Directory to write files (default: process.cwd()) |
393 | | - |
394 | | -**Output:** |
395 | | -- `llms.txt` - List of pages with titles and descriptions |
396 | | -- `llms-full.txt` - Complete page content with frontmatter (if `generateFull: true`) |
| 337 | +**Utilities:** |
| 338 | +- [Presets](./packages/mdream/README.md#presets) - Pre-configured plugin combinations |
| 339 | +- [Plugin System](./packages/mdream/README.md#plugin-system) - Customize conversion behavior |
| 340 | +- [Markdown Splitting](./packages/mdream/README.md#markdown-splitting) - Split HTML into chunks |
| 341 | +- [llms.txt Generation](./packages/mdream/README.md#llmstxt-generation) - Generate llms.txt artifacts |
397 | 342 |
398 | 343 | ## Credits |
399 | 344 |