@@ -36,8 +36,10 @@ When you finish a task, always run `pnpm typecheck` to ensure that the code is t
 - Test all: `pnpm test`
 - Test single file: `pnpm test path/to/test.ts`
 - Test with pattern: `pnpm test -t "test pattern"`
-- Test GitHub markdown: `pnpm test:github`
-- Development: `pnpm dev:prepare`
+- Test folder: `pnpm test test/unit/plugins/`
+- Development build (stub): `pnpm dev:prepare`
+- Live test with real sites: `pnpm test:github:live`, `pnpm test:wiki:file`
+- Benchmarking: `pnpm bench:stream`, `pnpm bench:string`
 
 ## Code Style Guidelines
 - Indentation: 2 spaces
@@ -52,11 +54,25 @@ When you finish a task, always run `pnpm typecheck` to ensure that the code is t
 - Follow ESLint config based on @antfu/eslint-config
 
 ## Project Architecture
-- Core modules:
-  - `parser.ts`: Handles HTML parsing into a DOM-like structure
-  - `markdown.ts`: Transforms DOM nodes to Markdown
-  - `htmlStreamAdapter.ts`: Manages HTML streaming conversion
-  - `index.ts`: Main entry point with primary API functions
+
+### Core Architecture
+- `src/index.ts`: Main entry point with `syncHtmlToMarkdown` and `streamHtmlToMarkdown` APIs
+- `src/parser.ts`: Manual HTML parsing into DOM-like structure for performance
+- `src/markdown.ts`: DOM node to Markdown transformation logic
+- `src/stream.ts`: Streaming HTML processing with content-based buffering
+- `src/types.ts`: Core TypeScript interfaces for nodes, plugins, and state management
+
+### Plugin System
+- `src/pluggable/plugin.ts`: Plugin creation utilities and base interfaces
+- `src/plugins/`: Built-in plugins (filter, extraction, tailwind, readability, etc.)
+- `src/libs/query-selector.ts`: CSS selector parsing logic shared across plugins
+- Plugin hooks: `beforeNodeProcess`, `onNodeEnter`, `onNodeExit`, `processTextNode`
+
+### Key Concepts
+- **Node Types**: ElementNode (HTML elements) and TextNode (text content) with parent/child relationships
+- **Streaming Architecture**: Processes HTML incrementally using buffer regions and optimal chunk boundaries
+- **Plugin Pipeline**: Each plugin can intercept and transform content at different processing stages
+- **Memory Efficiency**: Immediate processing and callback patterns to avoid collecting large data structures
 
 ## Technical Details
 - Parser: Manual HTML parsing for performance, doesn't use browser DOM
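
Review note on the Plugin System and Plugin Pipeline lines added in this hunk: the sketch below illustrates how hooks with these four names could be threaded through a depth-first walk. Only the hook names (`beforeNodeProcess`, `onNodeEnter`, `onNodeExit`, `processTextNode`) come from the diff. The node shapes, the hook signatures, `walk`, and `renderSketch` are hypothetical and are not the types or functions in `src/types.ts` or `src/pluggable/plugin.ts`.

```typescript
// Hypothetical node shapes, loosely following the ElementNode/TextNode split
// named in the diff. These are NOT the project's real types.
interface SketchText { kind: "text"; value: string }
interface SketchElement { kind: "element"; tag: string; children: SketchNode[] }
type SketchNode = SketchText | SketchElement

// Hook names are taken from the diff; every signature here is an assumption.
interface SketchPlugin {
  beforeNodeProcess?: (node: SketchNode) => boolean // false skips the node and its subtree
  onNodeEnter?: (el: SketchElement) => string
  onNodeExit?: (el: SketchElement) => string
  processTextNode?: (text: SketchText) => string
}

// Depth-first walk that hands each output fragment to `emit` as soon as it is produced.
function walk(node: SketchNode, plugins: SketchPlugin[], emit: (s: string) => void): void {
  for (const p of plugins) {
    if (p.beforeNodeProcess && p.beforeNodeProcess(node) === false) return
  }
  if (node.kind === "text") {
    let value = node.value
    for (const p of plugins) {
      if (p.processTextNode) value = p.processTextNode({ kind: "text", value })
    }
    emit(value)
    return
  }
  for (const p of plugins) {
    if (p.onNodeEnter) emit(p.onNodeEnter(node))
  }
  for (const child of node.children) walk(child, plugins, emit)
  // Exit hooks run in reverse plugin order so that wrappers nest correctly.
  for (let i = plugins.length - 1; i >= 0; i--) {
    const exit = plugins[i]?.onNodeExit
    if (exit) emit(exit(node))
  }
}

// Convenience wrapper for the demo only; it joins fragments into one string.
function renderSketch(root: SketchNode, plugins: SketchPlugin[]): string {
  const parts: string[] = []
  walk(root, plugins, (s) => { parts.push(s) })
  return parts.join("")
}

const demoPlugins: SketchPlugin[] = [
  // Filter-style plugin: drop <script> subtrees before they are processed.
  { beforeNodeProcess: (n) => !(n.kind === "element" && n.tag === "script") },
  // Formatting plugin: wrap <strong> content in Markdown bold markers.
  {
    onNodeEnter: (el) => (el.tag === "strong" ? "**" : ""),
    onNodeExit: (el) => (el.tag === "strong" ? "**" : ""),
  },
  // Text plugin: collapse runs of whitespace to a single space.
  { processTextNode: (t) => t.value.replace(/\s+/g, " ") },
]

const demoTree: SketchNode = {
  kind: "element",
  tag: "p",
  children: [
    { kind: "text", value: "Hello   " },
    { kind: "element", tag: "strong", children: [{ kind: "text", value: "world" }] },
    { kind: "element", tag: "script", children: [{ kind: "text", value: "alert(1)" }] },
  ],
}

console.log(renderSketch(demoTree, demoPlugins)) // prints "Hello **world**"
```

Running exit hooks in reverse order is a design choice of this sketch, not something the diff states; it keeps paired markers balanced when more than one plugin wraps the same element.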
@@ -75,12 +91,25 @@ When you finish a task, always run `pnpm typecheck` to ensure that the code is t
   - `--chunk-size <size>`: Controls stream chunking (default: 4096)
   - `-v, --verbose`: Enables debug logging
 
-Always run tests after making changes to ensure backward compatibility.
+## CLI and Testing
+
+### CLI Usage
+- Processes HTML from stdin, outputs Markdown to stdout
+- Test with live sites: `curl -s https://example.com | node ./bin/mdream.mjs --origin https://example.com`
+- Key CLI options: `--origin <url>`, `-v/--verbose`, `--chunk-size <size>`
 
-## Docs
+### Testing Strategy
+- Comprehensive test coverage in `test/unit/` and `test/integration/`
+- Plugin tests in `test/unit/plugins/` - always add tests for new plugins
+- Real-world test fixtures in `test/fixtures/` (GitHub, Wikipedia HTML)
+- Template tests for complex HTML structures (navigation, tables, etc.)
+- Always run tests after making changes to ensure backward compatibility
 
-Please reference the following docs:
+## Plugin Development
 
-- @docs/plugin-api.md
-- @docs/plugins.md
-- @docs/plugin-api.md
+When creating new plugins:
+1. Use CSS selectors from `src/libs/query-selector.ts` for element matching
+2. Implement memory-efficient patterns (immediate callbacks vs. collecting data)
+3. Add comprehensive tests covering edge cases and real-world scenarios
+4. Follow existing plugin patterns in `src/plugins/` directory
+5. Export from `src/plugins.ts` for public API access
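
Review note on step 2 of the Plugin Development list ("immediate callbacks vs. collecting data"): the contrast can be sketched as follows. Everything named here (`SketchHeading`, `collectHeadings`, `forEachHeading`, `headings`) is hypothetical and only illustrates the pattern; none of it is taken from the project's plugins.

```typescript
// Hypothetical record type for the demo; not a type from the project.
interface SketchHeading { level: number; text: string }

// Collecting variant: every match is retained until the input is exhausted,
// so memory grows with the number of matches.
function collectHeadings(source: Iterable<SketchHeading>): SketchHeading[] {
  const all: SketchHeading[] = []
  for (const h of source) all.push(h)
  return all
}

// Callback variant: each match is handed to the caller immediately and nothing
// is stored here, so memory use does not depend on input size.
function forEachHeading(
  source: Iterable<SketchHeading>,
  onMatch: (h: SketchHeading) => void,
): void {
  for (const h of source) onMatch(h)
}

// Lazy demo input: headings are generated one at a time, cycling through levels 1 to 3.
function* headings(count: number): Generator<SketchHeading> {
  for (let i = 1; i <= count; i++) {
    yield { level: (i % 3) + 1, text: `Section ${i}` }
  }
}

// The caller keeps only the running summary it needs.
let seen = 0
let deepest = 0
forEachHeading(headings(1000), (h) => {
  seen++
  if (h.level > deepest) deepest = h.level
})
console.log(seen, deepest) // prints "1000 3"
```

The callback variant pairs naturally with streaming input, because a consumer can act on each match before the rest of the document has arrived.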