Skip to content

Commit ab20218

Browse files
committed
feat: add support to more languages
1 parent c80e37d commit ab20218

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

88 files changed

+8466
-50
lines changed

CLAUDE.md

Lines changed: 178 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,178 @@
1+
# CLAUDE.md
2+
3+
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
4+
5+
## Project Overview
6+
7+
ts-syntax-highlighter is a blazing-fast, TypeScript-native syntax highlighter with comprehensive grammar support for 6 modern web languages (JavaScript/JSX, TypeScript/TSX, HTML, CSS, JSON, STX). Built for performance with both synchronous and asynchronous tokenization modes.
8+
9+
## Monorepo Structure
10+
11+
This is a Bun-based monorepo with workspaces:
12+
- `packages/ts-syntax-highlighter/` - Main syntax highlighter library
13+
- `packages/benchmarks/` - Performance benchmarking suite
14+
- `docs/` - VitePress documentation
15+
16+
Within `packages/ts-syntax-highlighter/`:
17+
- `src/` - Source code with modular architecture
18+
- `test/` - Test files (416 tests total)
19+
- `bin/` - CLI implementation
20+
- `examples/` - Usage examples
21+
22+
## Development Commands
23+
24+
```bash
25+
# Build the library (uses build.ts with bun-plugin-dtsx)
26+
bun run build
27+
28+
# Run all tests (uses Bun's native test runner)
29+
bun test
30+
31+
# Type checking
32+
bun run typecheck
33+
34+
# Linting
35+
bun run lint
36+
bun run lint:fix
37+
38+
# Run benchmarks
39+
bun run bench
40+
41+
# Documentation
42+
bun run dev:docs
43+
bun run build:docs
44+
```
45+
46+
### Testing Commands
47+
48+
Run all tests:
49+
```bash
50+
bun test
51+
```
52+
53+
Run a specific test file:
54+
```bash
55+
bun test packages/ts-syntax-highlighter/test/tokenizer.test.ts
56+
```
57+
58+
Run tests matching a pattern:
59+
```bash
60+
bun test --test-name-pattern "should tokenize"
61+
```
62+
63+
## Architecture
64+
65+
### Core Components
66+
67+
1. **Tokenizer** (`src/tokenizer.ts`, `src/fast-tokenizer.ts`)
68+
- Main tokenization engine with sync and async modes
69+
- Uses grammar patterns to identify tokens
70+
- Fast mode optimized for performance with minimal overhead
71+
- Standard mode includes full token metadata (scopes, offsets, etc.)
72+
73+
2. **Highlighter** (`src/highlighter.ts`)
74+
- High-level API orchestrating tokenization and rendering
75+
- Manages language/theme loading
76+
- Optional caching system for highlighted code
77+
- Entry point: `createHighlighter(options)` factory function
78+
79+
3. **Renderer** (`src/renderer.ts`)
80+
- Converts tokens to HTML/ANSI output
81+
- Handles line numbers, highlighting, annotations
82+
- Supports transformers for custom token/line processing
83+
84+
4. **Grammars** (`src/grammars/`)
85+
- Language-specific tokenization rules
86+
- Each grammar defines patterns, scopes, and keyword tables
87+
- Languages: javascript.ts, typescript.ts, html.ts, css.ts, json.ts, stx.ts
88+
- TextMate-inspired grammar format with optimizations
89+
90+
5. **Themes** (`src/themes/`)
91+
- Color schemes for rendering
92+
- VSCode-compatible theme format
93+
- Supports light/dark modes via `src/dual-theme.ts`
94+
95+
6. **Plugins** (`src/plugins.ts`)
96+
- Extensibility system for custom languages/themes/transformers
97+
98+
7. **Transformers** (`src/transformers.ts`)
99+
- Token and line transformers for advanced features
100+
- Line highlighting, dimming, blurring, focus modes
101+
- Added/removed line indicators for diffs
102+
103+
### Key Design Patterns
104+
105+
- **Dual-mode tokenization**: Async (fast) and sync modes for different use cases
106+
- **Grammar-based**: TextMate-style patterns with keyword optimization
107+
- **Plugin architecture**: Extensible for custom languages and transformers
108+
- **Streaming support**: `src/streaming.ts` for large files
109+
- **Performance profiling**: `src/profiler.ts` for optimization
110+
- **Language detection**: `src/detect.ts` for automatic language identification
111+
112+
## TypeScript Conventions
113+
114+
- Use TypeScript for all code; **prefer interfaces over types**
115+
- **Avoid enums**; use maps or `as const` instead
116+
- Avoid usage of `any`
117+
- Use functional and declarative patterns; avoid classes unless needed
118+
- Use descriptive variable names with auxiliary verbs (e.g., `isLoading`, `hasError`)
119+
- If browser and Node implementations are both valid, prefer browser version
120+
121+
## Testing Guidelines
122+
123+
- Write tests for all functions, types, and interfaces
124+
- Use Bun's native test modules: `import { describe, expect, it } from 'bun:test'`
125+
- Aim for 100% test coverage
126+
- Test files mirror source structure in `test/` directory
127+
- Include accurate examples in docblocks
128+
129+
Example test structure:
130+
```typescript
131+
import { describe, expect, it } from 'bun:test'
132+
import { functionToTest } from '../src/module'
133+
134+
describe('Feature Name', () => {
135+
it('should do something specific', () => {
136+
expect(functionToTest('input')).toBe('expected')
137+
})
138+
})
139+
```
140+
141+
## Code Style
142+
143+
- Write concise, technical TypeScript with accurate docblock examples
144+
- If Bun native modules are available, use them
145+
- Prefer iteration and modularization over duplication
146+
- Use proper JSDoc comments for functions, types, interfaces
147+
- Ensure logging is comprehensive for debugging
148+
- Reuse eslint-ignore comments where present
149+
150+
## Build System
151+
152+
- Uses Bun.build with `bun-plugin-dtsx` for type definitions
153+
- Build configuration in `packages/ts-syntax-highlighter/build.ts`
154+
- Outputs to `dist/` with minification
155+
- CLI compilation for multiple platforms via compile scripts
156+
- Type definitions generated alongside JavaScript output
157+
158+
## Token Flow
159+
160+
1. **Input**: Code string + language ID
161+
2. **Grammar Loading**: Get grammar for language from `src/grammars/`
162+
3. **Tokenization**: Apply patterns line-by-line, generating tokens with scopes
163+
4. **Transformation** (optional): Apply token/line transformers
164+
5. **Rendering**: Convert tokens to HTML/ANSI with theme colors
165+
6. **Output**: Rendered code with metadata
166+
167+
## Performance Considerations
168+
169+
- Async mode is ~40% faster than sync mode
170+
- Grammar patterns ordered by frequency for optimization
171+
- Pre-compiled regex with minimal backtracking
172+
- Memory efficient: ~3x source code size
173+
- Caching available for repeated highlighting
174+
- Streaming mode for large files to avoid memory spikes
175+
176+
## CLI Usage
177+
178+
The package includes a CLI tool (bin/cli.ts) compiled to standalone binaries for multiple platforms.

README.md

Lines changed: 72 additions & 49 deletions
Original file line numberDiff line numberDiff line change
@@ -13,7 +13,7 @@ A blazing-fast, TypeScript-native syntax highlighter with comprehensive grammar
1313
## Features
1414

1515
-**Blazing Fast** - Highly optimized tokenization with async and sync modes for maximum performance
16-
- 🎨 **6 Languages** - JavaScript/JSX, TypeScript/TSX, HTML, CSS, JSON, and STX
16+
- 🎨 **48 Languages** - Comprehensive support for web, system, and specialized languages
1717
- 🔥 **Modern Syntax** - Full support for ES2024+, BigInt, numeric separators, optional chaining, and more
1818
- ⚛️ **JSX/TSX Support** - Complete React and TypeScript JSX highlighting
1919
- 🎯 **CSS4 Features** - Modern color functions _(hwb, lab, lch, oklab, oklch)_, container queries, CSS layers
@@ -54,59 +54,73 @@ function add(a: number, b: number): number {
5454

5555
## Supported Languages
5656

57-
### JavaScript/JSX
58-
59-
- ES2024+ features (BigInt, numeric separators, optional chaining, nullish coalescing)
60-
- JSX elements and expressions
61-
- Template literals with expressions
62-
- Regex literals with all flags
63-
- Async/await, generators
64-
- Modern operators: `?.`, `??`, `?.[]`, `?.()`
65-
66-
### TypeScript/TSX
67-
68-
- All JavaScript features plus:
69-
- Type annotations and assertions
70-
- Interfaces, types, enums
71-
- Generics and type parameters
72-
- TypeScript-specific operators: `is`, `keyof`, `infer`
73-
- TSX (TypeScript + JSX)
74-
- Utility types
75-
76-
### HTML
77-
78-
- HTML5 elements
79-
- Data attributes (`data-*`)
80-
- ARIA attributes (`aria-*`)
81-
- Event handlers (`onclick`, `onload`, etc.)
82-
- HTML entities
83-
- DOCTYPE declarations
84-
85-
### CSS
86-
87-
- Modern color functions: `hwb()`, `lab()`, `lch()`, `oklab()`, `oklch()`, `color()`
88-
- Math functions: `calc()`, `min()`, `max()`, `clamp()`, `round()`, `abs()`, `sign()`
89-
- Trigonometric: `sin()`, `cos()`, `tan()`, `asin()`, `acos()`, `atan()`
90-
- Gradients: `linear-gradient()`, `radial-gradient()`, `conic-gradient()`
91-
- At-rules: `@media`, `@keyframes`, `@supports`, `@container`, `@layer`, `@property`
92-
- CSS custom properties (variables): `--custom-property`, `var()`
93-
94-
### JSON
95-
96-
- Objects and arrays
97-
- Strings with proper escape sequences
98-
- Numbers (including scientific notation)
99-
- Booleans and null
100-
- Invalid escape detection
57+
ts-syntax-highlighter supports **48 languages** across web development, systems programming, data formats, and specialized domains:
58+
59+
### Web & Frontend (10 languages)
60+
- **JavaScript/JSX** - ES2024+, async/await, template literals, JSX
61+
- **TypeScript/TSX** - Type annotations, generics, interfaces, TSX
62+
- **HTML** - HTML5, data/aria attributes, event handlers
63+
- **CSS** - Modern features, custom properties, CSS4 functions
64+
- **SCSS/Sass** - Variables, nesting, mixins, functions
65+
- **Vue** - Single-file components, directives, templates
66+
- **GraphQL** - Queries, mutations, fragments, directives
67+
- **Markdown** - Headings, emphasis, code blocks, tables, links
68+
- **XML** - Elements, attributes, CDATA, namespaces
69+
- **LaTeX** - Commands, environments, math mode
70+
71+
### System & Compiled Languages (10 languages)
72+
- **C** - Preprocessor, pointers, functions
73+
- **C++** - Classes, templates, namespaces, modern C++
74+
- **Rust** - Ownership, traits, macros, lifetimes
75+
- **Go** - Goroutines, interfaces, defer
76+
- **C#** - LINQ, async/await, properties, attributes
77+
- **Java** - Classes, generics, annotations
78+
- **Swift** - Optionals, protocols, closures
79+
- **Kotlin** - Null safety, coroutines, data classes
80+
- **Dart** - Async, mixins, null safety
81+
82+
### Scripting & Dynamic Languages (5 languages)
83+
- **Python** - Functions, classes, decorators, f-strings
84+
- **Ruby** - Blocks, symbols, string interpolation
85+
- **PHP** - Variables, functions, classes, HTML embedding
86+
- **Lua** - Tables, functions, metatables
87+
- **R** - Vectors, data frames, statistical functions
88+
89+
### Data & Configuration (8 languages)
90+
- **JSON** - Objects, arrays, proper escaping
91+
- **JSONC** - JSON with comments
92+
- **JSON5** - Unquoted keys, trailing commas, comments
93+
- **YAML** - Keys, nested structures, anchors
94+
- **TOML** - Tables, arrays, data types
95+
- **CSV** - Comma-separated values
96+
- **IDL** - Interface definitions
97+
- **Protobuf** - Messages, services, enums
98+
99+
### Shell & DevOps (7 languages)
100+
- **Bash/Shell** - Variables, pipes, functions, control flow
101+
- **PowerShell** - Cmdlets, pipelines, parameters
102+
- **Dockerfile** - Instructions, multi-stage builds
103+
- **Makefile** - Targets, variables, directives
104+
- **Terraform/HCL** - Resources, variables, interpolation
105+
- **Nginx** - Directives, server blocks, locations
106+
- **CMD/Batch** - Commands, variables, control flow
107+
108+
### Specialized & Other (8 languages)
109+
- **SQL** - Queries, DDL, DML, functions
110+
- **Diff/Patch** - File headers, hunks, added/removed lines
111+
- **RegExp** - Character classes, groups, quantifiers
112+
- **BNF** - Grammar notation
113+
- **ABNF** - Augmented BNF
114+
- **Solidity** - Smart contracts, events, modifiers
115+
- **Log** - Timestamps, log levels, stack traces
116+
- **Text/Plain** - Plain text (no highlighting)
101117

102118
### STX
103-
104119
- Blade-like templating syntax
105120
- 50+ directives
106121
- Components, layouts, includes
107122
- Control flow, loops
108123
- Authentication, authorization
109-
- And much more
110124

111125
## Performance
112126

@@ -151,14 +165,23 @@ bun run bench
151165
```typescript
152166
import { Tokenizer } from 'ts-syntax-highlighter'
153167

154-
// Create tokenizer for a specific language
155-
const tokenizer = new Tokenizer('javascript' | 'typescript' | 'html' | 'css' | 'json' | 'stx')
168+
// Create tokenizer for a specific language (48 languages supported)
169+
const tokenizer = new Tokenizer('javascript') // or any of the 48 supported languages
156170

157171
// Async tokenization (faster, recommended)
158172
const tokens = await tokenizer.tokenizeAsync(code: string)
159173

160174
// Sync tokenization
161175
const tokens = tokenizer.tokenize(code: string)
176+
177+
// Supported languages:
178+
// 'javascript', 'typescript', 'html', 'css', 'json', 'stx',
179+
// 'bash', 'markdown', 'yaml', 'jsonc', 'diff', 'python',
180+
// 'php', 'java', 'c', 'cpp', 'rust', 'csharp', 'dockerfile',
181+
// 'ruby', 'go', 'sql', 'idl', 'text', 'json5', 'vue', 'toml',
182+
// 'scss', 'kotlin', 'swift', 'dart', 'r', 'graphql', 'powershell',
183+
// 'makefile', 'terraform', 'bnf', 'regexp', 'lua', 'cmd', 'abnf',
184+
// 'csv', 'log', 'nginx', 'xml', 'protobuf', 'solidity', 'latex'
162185
```
163186

164187
### Language Detection

0 commit comments

Comments
 (0)