@principal-ai/markdown-search

High-performance full-text search for markdown documents using FlexSearch and Bun.

Features

  • 🚀 Fast Performance - Built on Bun runtime for blazing fast file operations
  • 🔍 Full-Text Search - Powered by FlexSearch for efficient indexing and searching
  • 📝 Markdown-Optimized - Understands markdown structure (sections, code blocks, tables, etc.)
  • 🎯 Flexible Searching - Filter by document type and language, with fuzzy matching
  • 💾 Persistent Indexes - Save and load search indexes for instant startup
  • 🔌 Extensible - Adapter pattern for different platforms (Node, VS Code, etc.)
  • 🏗️ TypeScript - Full TypeScript support with comprehensive types

Installation

bun add @principal-ai/markdown-search

Or with npm:

npm install @principal-ai/markdown-search

Quick Start

import { createSearchEngine } from '@principal-ai/markdown-search';

// Create a search engine instance
const searchEngine = createSearchEngine({
  rootPath: './docs',        // Directory to search
  storagePath: '.search',    // Where to store the index
  indexKey: 'my-docs'        // Name for this index
});

// Initialize and index files
await searchEngine.initialize();
await searchEngine.indexFiles();

// Search for content
const results = await searchEngine.search('your query');

results.forEach(result => {
  console.log(`${result.title} (${result.type})`);
  console.log(`Score: ${result.score}`);
  console.log(`File: ${result.fileName}`);
});

Advanced Usage

Custom Configuration

import { 
  SearchEngine, 
  NodeFileSystemAdapter, 
  NodeStorageAdapter,
  SearchEngineFactory 
} from '@principal-ai/markdown-search';

const searchEngine = new SearchEngine({
  fileSystem: new NodeFileSystemAdapter('./docs'),
  storage: new NodeStorageAdapter('.search-index'),
  searchEngine: SearchEngineFactory.create('flexsearch', {
    // FlexSearch options
    tokenize: 'forward',
    resolution: 9,
    depth: 3,
  })
});

Indexing with Progress

await searchEngine.indexFiles({
  onProgress: (progress) => {
    console.log(`${progress.phase}: ${progress.percentage}%`);
    if (progress.currentFile) {
      console.log(`Processing: ${progress.currentFile}`);
    }
  },
  batchSize: 10,
  indexChunks: true, // Index individual code blocks, tables, etc.
});
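
To make the batchSize and onProgress options concrete, here is a minimal, self-contained sketch of batch processing with progress reporting. It does not reproduce the package's internals: the Progress interface is modelled on the fields read in the callback above, and splitIntoBatches and indexInBatches are hypothetical helper names. It is synchronous for brevity, whereas a real indexer would await file reads.

```typescript
// Hypothetical progress shape, modelled on the fields read in the callback above.
interface Progress {
  phase: string;
  percentage: number;
  currentFile?: string;
}

// Split `items` into consecutive groups of at most `batchSize` elements.
function splitIntoBatches<T>(items: T[], batchSize: number): T[][] {
  if (batchSize < 1) throw new RangeError("batchSize must be at least 1");
  const batches: T[][] = [];
  for (let start = 0; start < items.length; start += batchSize) {
    batches.push(items.slice(start, start + batchSize));
  }
  return batches;
}

// Run `handle` on every file, batch by batch, reporting progress after each file.
// Returns the number of files handled.
function indexInBatches(
  files: string[],
  batchSize: number,
  handle: (file: string) => void,
  onProgress?: (progress: Progress) => void,
): number {
  let done = 0;
  for (const batch of splitIntoBatches(files, batchSize)) {
    for (const file of batch) {
      handle(file);
      done += 1;
      onProgress?.({
        phase: "indexing",
        percentage: Math.round((done / files.length) * 100),
        currentFile: file,
      });
    }
  }
  return done;
}
```

Smaller batches keep peak memory lower at the cost of more iterations; the right value depends on file sizes and is worth measuring rather than guessing.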

Search Options

const results = await searchEngine.search('query', {
  // Filter by document type
  types: ['section', 'code', 'table'],
  
  // Filter by programming language (for code blocks)
  languages: ['typescript', 'javascript'],
  
  // Fuzzy search threshold (0-1)
  fuzzyThreshold: 0.8,
  
  // Pagination
  limit: 10,
  offset: 0,
  
  // Search specific fields
  fields: ['content', 'title'],
  
  // Sort options
  sortBy: 'relevance',
  sortOrder: 'desc'
});
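
As an illustration of how the types, sortOrder, limit and offset options interact, the sketch below applies them to a plain array of hits. It is an assumption about the semantics, not the package's implementation: the Hit and PageOptions shapes, the selectPage helper, and the default limit of 10 are all hypothetical.

```typescript
// Hypothetical result shape; only the fields this sketch needs.
interface Hit {
  title: string;
  type: string;
  score: number;
}

// Hypothetical subset of the search options shown above.
interface PageOptions {
  types?: string[];
  limit?: number;
  offset?: number;
  sortOrder?: "asc" | "desc";
}

// Filter by type, sort by score, then slice out one page.
// The input array is left untouched because filter() returns a copy.
function selectPage(hits: Hit[], options: PageOptions = {}): Hit[] {
  const { types, limit = 10, offset = 0, sortOrder = "desc" } = options;
  const direction = sortOrder === "desc" ? -1 : 1;
  return hits
    .filter((hit) => !types || types.includes(hit.type))
    .sort((a, b) => direction * (a.score - b.score))
    .slice(offset, offset + limit);
}
```

Under this reading, filtering happens before pagination, so offset counts only the hits that passed the type filter.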

Document Types

The search engine understands different types of markdown content:

  • document - Entire markdown file
  • section - Document sections (based on headings)
  • code - Code blocks with language detection
  • mermaid - Mermaid diagrams
  • table - Markdown tables
  • heading - Individual headings
  • paragraph - Regular text paragraphs
  • list - List items
  • blockquote - Quoted text
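
To show what splitting a file into typed chunks can look like, here is a deliberately small line-based sketch that recognizes only three of the types above: heading, code and paragraph. It is not the package's parser; chunkMarkdown and the Chunk shape are hypothetical, and tables, lists, blockquotes and nested structures are ignored.

```typescript
type ChunkType = "heading" | "code" | "paragraph";

interface Chunk {
  type: ChunkType;
  content: string;
  language?: string; // only set for fenced code blocks
}

// The three-backtick fence marker, built this way to keep the listing readable.
const FENCE = "`".repeat(3);

// Split markdown into heading, code and paragraph chunks, line by line.
// An unterminated fence at the end of the input is dropped in this sketch.
function chunkMarkdown(markdown: string): Chunk[] {
  const chunks: Chunk[] = [];
  let paragraph: string[] = [];
  let code: string[] | null = null; // non-null while inside a fenced block
  let language = "";

  const flushParagraph = (): void => {
    if (paragraph.length > 0) {
      chunks.push({ type: "paragraph", content: paragraph.join("\n") });
      paragraph = [];
    }
  };

  for (const line of markdown.split("\n")) {
    if (code !== null) {
      if (line.startsWith(FENCE)) {
        chunks.push({ type: "code", content: code.join("\n"), language });
        code = null;
      } else {
        code.push(line);
      }
    } else if (line.startsWith(FENCE)) {
      flushParagraph();
      language = line.slice(FENCE.length).trim();
      code = [];
    } else if (/^#{1,6}\s/.test(line)) {
      flushParagraph();
      chunks.push({ type: "heading", content: line.replace(/^#{1,6}\s+/, "").trim() });
    } else if (line.trim() === "") {
      flushParagraph();
    } else {
      paragraph.push(line);
    }
  }
  flushParagraph();
  return chunks;
}
```

Indexing chunks separately is what makes a filter such as types: ['code'] possible, since each chunk carries its own type and, for code, its language.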

Updating the Index

// Update specific files
await searchEngine.updateFiles([
  '/path/to/file1.md',
  '/path/to/file2.md'
]);

// Clear and rebuild index
await searchEngine.clearIndex();
await searchEngine.indexFiles();

Index Management

// Check if index exists
const hasIndex = await searchEngine.hasIndex();

// Get index statistics
const stats = await searchEngine.getStats();
console.log(`Total files: ${stats.totalFiles}`);
console.log(`Total documents: ${stats.totalDocuments}`);

// Export/Import index for backup
const indexData = await searchEngine.getSearchAdapter().exportIndex();
// ... save indexData somewhere ...

// Later, import it back
await searchEngine.getSearchAdapter().importIndex(indexData);

Platform Support

Node.js/Bun (Default)

The package includes built-in adapters for Node.js and Bun environments:

  • NodeFileSystemAdapter - File system operations using Bun's fast APIs
  • NodeStorageAdapter - File-based storage for indexes

VS Code Extension

The package also ships adapters for use inside VS Code extensions:

import { 
  VSCodeFileSystemAdapter, 
  VSCodeStorageAdapter 
} from '@principal-ai/markdown-search/adapters';

Custom Adapters

You can create custom adapters for other platforms:

class MyCustomFileSystemAdapter implements SearchFileSystemAdapter {
  async findMarkdownFiles(options?: FindOptions): Promise<FileInfo[]> {
    // Your implementation
  }
  
  async readFile(path: string): Promise<string> {
    // Your implementation
  }
  
  // ... other required methods
}
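
As a fuller illustration of the adapter pattern, the sketch below serves files from memory, which can be handy in unit tests. It is self-contained and therefore declares its own reduced types: MinimalFileSystemAdapter models only the two methods shown above, and the fields of FileInfo and FindOptions are assumptions, not the package's actual definitions. A real adapter would have to implement the full SearchFileSystemAdapter interface.

```typescript
// Reduced, hypothetical stand-ins for the package's types.
interface FileInfo {
  path: string;
  size: number;
}

interface FindOptions {
  extensions?: string[];
}

interface MinimalFileSystemAdapter {
  findMarkdownFiles(options?: FindOptions): Promise<FileInfo[]>;
  readFile(path: string): Promise<string>;
}

// Serves files from a Map of path -> content instead of the disk.
class InMemoryFileSystemAdapter implements MinimalFileSystemAdapter {
  private readonly files: Map<string, string>;

  constructor(files: Map<string, string>) {
    this.files = files;
  }

  // List every stored file whose path ends with one of the given extensions.
  async findMarkdownFiles(options?: FindOptions): Promise<FileInfo[]> {
    const extensions = options?.extensions ?? [".md"];
    return Array.from(this.files.entries())
      .filter(([path]) => extensions.some((ext) => path.endsWith(ext)))
      .map(([path, content]) => ({ path, size: content.length }));
  }

  // Return the stored content, or reject if the path is unknown.
  async readFile(path: string): Promise<string> {
    const content = this.files.get(path);
    if (content === undefined) throw new Error(`File not found: ${path}`);
    return content;
  }
}
```

Because the engine only talks to the adapter interface, swapping the disk for a Map like this requires no change to the indexing or search code.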

API Reference

SearchEngine

The main class for searching markdown documents.

Constructor

new SearchEngine(config: SearchEngineConfig, indexKey?: string)

Methods

  • initialize(): Promise<void> - Initialize the search engine
  • indexFiles(options?: IndexingOptions): Promise<IndexResult> - Index all markdown files
  • search(query: string, options?: SearchOptions): Promise<SearchResult[]> - Search the index
  • updateFiles(paths: string[], options?: IndexingOptions): Promise<IndexResult> - Update specific files
  • clearIndex(): Promise<void> - Clear the entire index
  • hasIndex(): Promise<boolean> - Check if index exists
  • getStats(): Promise<SearchIndexStats | null> - Get index statistics

Types

See the types.ts file for all available TypeScript types.

Examples

Check the examples directory for more usage examples.

Performance

The package is optimized for performance:

  • Bun Runtime: Leverages Bun's fast file I/O operations
  • Batch Processing: Indexes files in configurable batches
  • Incremental Updates: Re-indexes only changed files
  • Persistent Indexes: Loads pre-built indexes instantly
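
One way to decide which files need re-indexing is to compare modification times recorded at the last run with the current ones. The sketch below shows that idea only; it is not how the package detects changes. The Snapshot and ChangeSet types and the diffSnapshots helper are hypothetical.

```typescript
// Hypothetical snapshot: file path -> last-modified time in milliseconds.
type Snapshot = Map<string, number>;

interface ChangeSet {
  changed: string[]; // new files, or files modified since the last run
  removed: string[]; // files that no longer exist
}

// Compare the snapshot taken at the last indexing run with the current one.
function diffSnapshots(previous: Snapshot, current: Snapshot): ChangeSet {
  const changed: string[] = [];
  const removed: string[] = [];
  current.forEach((mtime, path) => {
    if (previous.get(path) !== mtime) changed.push(path);
  });
  previous.forEach((_mtime, path) => {
    if (!current.has(path)) removed.push(path);
  });
  return { changed, removed };
}
```

The changed list could then be handed to updateFiles so that untouched files are skipped. Modification times can be unreliable across copies or checkouts, so a content hash is a sturdier, if slower, alternative.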

Development

# Install dependencies
bun install

# Run tests
bun test

# Build
bun run build

# Type checking
bun run typecheck

# Format code
bun run format

License

MIT

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Credits

Built by the A24Z Team as part of the markdown tooling ecosystem.
