Commit f4b75b6
perf(formatter): pre-allocate enough space for the FormatElement buffer (#15422)
# VecBuffer Capacity Analysis
## Overview
This document explains the empirical analysis behind the initial capacity allocated for the formatter's `VecBuffer`.
## Data Source
Analysis of **4,891 files** from the **VSCode repository**, collected during formatter test runs, measuring:
- Source text length (input)
- Formatted document length (output buffer requirement)

All ratios below are formatted document length divided by source text length.
The VSCode repository provides a comprehensive real-world dataset with diverse JavaScript/TypeScript patterns, file sizes, and coding styles, making it an ideal benchmark for formatter capacity optimization.
## Key Findings
### Overall Statistics
| Metric | Value | Interpretation |
|--------|-------|----------------|
| **Median ratio** | 0.194 (19.4%) | Half of files need ≤19.4% of source length |
| **Average ratio** | 0.189 (18.9%) | Mean required buffer size relative to source length |
| **75th percentile** | 0.254 (25.4%) | 75% of files need ≤25.4% |
| **90th percentile** | 0.314 (31.4%) | 90% of files need ≤31.4% |
| **95th percentile** | 0.355 (35.5%) | 95% of files need ≤35.5% |
| **99th percentile** | 0.477 (47.7%) | 99% of files need ≤47.7% |
| **Max observed** | 0.947 (94.7%) | Extreme outlier case |
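The summary statistics can be reproduced from per-file measurements with a few lines of Rust. This is a minimal sketch, assuming a hypothetical `Sample` record per file and the nearest-rank percentile method; the analysis does not state which percentile definition it used, so results may differ slightly from the table.

```rust
/// One measured file (hypothetical record type for illustration).
struct Sample {
    source_len: usize,   // source text length in bytes
    document_len: usize, // formatted document length
}

/// Ratio of formatted document length to source text length.
fn ratio(s: &Sample) -> f64 {
    s.document_len as f64 / s.source_len as f64
}

/// Collects the ratios of all non-empty files, sorted ascending.
fn sorted_ratios(samples: &[Sample]) -> Vec<f64> {
    let mut ratios: Vec<f64> = samples
        .iter()
        .filter(|s| s.source_len > 0) // skip empty files to avoid division by zero
        .map(ratio)
        .collect();
    ratios.sort_by(|a, b| a.total_cmp(b));
    ratios
}

/// Nearest-rank percentile (`p` in 0..=100) of an ascending-sorted slice.
fn percentile(sorted: &[f64], p: f64) -> f64 {
    assert!(!sorted.is_empty(), "percentile of empty data");
    let rank = ((p / 100.0) * sorted.len() as f64).ceil() as usize;
    // Ranks are 1-based; clamp so p = 0 maps to the smallest value.
    sorted[rank.clamp(1, sorted.len()) - 1]
}
```

With an even number of files, the nearest-rank median picks the lower of the two middle values rather than averaging them.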
### Buffer Requirements by File Size
| File Size Range | Files | Median | 95th Percentile | 99th Percentile | Example (95th) |
|-----------------|-------|--------|-----------------|-----------------|----------------|
| **< 1KB** | 277 | 0.126 | **0.300** | 0.779 | 500B → 150B |
| **1KB - 5KB** | 1,772 | 0.190 | **0.360** | 0.462 | 3KB → 1.08KB |
| **5KB - 10KB** | 1,002 | 0.206 | **0.377** | 0.454 | 7.5KB → 2.83KB |
| **10KB - 50KB** | 1,628 | 0.202 | **0.346** | 0.482 | 30KB → 10.38KB |
| **> 50KB** | 212 | 0.193 | **0.302** | 0.348 | 100KB → 30.2KB |
**Key Insight**: The 95th percentile ranges from 0.30 to 0.38 across all file sizes, showing consistent behavior regardless of file size.
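Grouping files by size is a simple range match. A sketch follows; the `SizeBucket` enum and `group_ratios` helper are hypothetical, and it assumes 1 KB = 1024 bytes with half-open ranges, since the analysis specifies neither.

```rust
use std::collections::HashMap;

/// Size ranges from the table above (assumed half-open: [lower, upper)).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum SizeBucket {
    Under1K,
    K1To5,
    K5To10,
    K10To50,
    Over50K,
}

fn bucket_for(source_len: usize) -> SizeBucket {
    const KB: usize = 1024; // assumption: binary kilobytes
    match source_len {
        n if n < KB => SizeBucket::Under1K,
        n if n < 5 * KB => SizeBucket::K1To5,
        n if n < 10 * KB => SizeBucket::K5To10,
        n if n < 50 * KB => SizeBucket::K10To50,
        _ => SizeBucket::Over50K,
    }
}

/// Groups (source_len, ratio) pairs by size range; each group is sorted
/// ascending so per-range percentiles can be read off directly.
fn group_ratios(samples: &[(usize, f64)]) -> HashMap<SizeBucket, Vec<f64>> {
    let mut groups: HashMap<SizeBucket, Vec<f64>> = HashMap::new();
    for &(len, r) in samples {
        groups.entry(bucket_for(len)).or_default().push(r);
    }
    for ratios in groups.values_mut() {
        ratios.sort_by(|a, b| a.total_cmp(b));
    }
    groups
}
```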
## New Implementation
### Chosen Formula
```rust
let capacity = (context.source_text().len() * 2) / 5; // 0.4 multiplier
```
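A minimal sketch of how the formula might be wrapped, using a plain `Vec` as a stand-in for `VecBuffer` (its real constructor is not shown in this commit); `initial_capacity` and `new_buffer` are hypothetical names. Because integer division truncates, `len * 2 / 5` equals floor(0.4 × len) without any floating-point conversion.

```rust
/// 40% of the source length, computed in integer arithmetic.
/// A `&str` is at most `isize::MAX` bytes long, so `len * 2` cannot overflow `usize`.
fn initial_capacity(source_len: usize) -> usize {
    (source_len * 2) / 5
}

/// Hypothetical stand-in for the formatter's buffer: a `Vec` pre-sized
/// from the source text. `T` is a placeholder for the element type.
fn new_buffer<T>(source_text: &str) -> Vec<T> {
    Vec::with_capacity(initial_capacity(source_text.len()))
}
```

For example, a 3,000-byte source reserves room for 1,200 elements up front.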
### How 0.4 Was Derived
1. **Identified worst-case 95th percentile**: 0.377 (5KB-10KB files)
2. **Added safety margin**: 0.377 → 0.40
3. **Verified universal coverage**:
   - All size ranges have a 95th percentile ≤ 0.377
   - Since 0.4 > 0.377, the multiplier covers at least 95% of files in every size range
4. **Chose clean fraction**: `2/5` for efficient integer arithmetic
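The derivation steps can be checked mechanically against the per-range table. This sketch hard-codes the 95th-percentile values from that table; the constant and helper names are hypothetical.

```rust
/// Per-range 95th percentiles copied from the "Buffer Requirements by File Size" table.
const P95_BY_BUCKET: [(&str, f64); 5] = [
    ("< 1KB", 0.300),
    ("1KB - 5KB", 0.360),
    ("5KB - 10KB", 0.377),
    ("10KB - 50KB", 0.346),
    ("> 50KB", 0.302),
];

/// Step 1: the worst-case (largest) per-range 95th percentile.
fn worst_case_p95() -> (&'static str, f64) {
    P95_BY_BUCKET
        .iter()
        .copied()
        .max_by(|a, b| a.1.total_cmp(&b.1))
        .expect("table is non-empty")
}

/// Step 3: a multiplier covers every range if it is at least that range's p95.
fn covers_all_buckets(multiplier: f64) -> bool {
    P95_BY_BUCKET.iter().all(|&(_, p95)| p95 <= multiplier)
}
```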
### Benefits
| Aspect | Improvement |
|--------|-------------|
| **Small files** | ~3.3x memory reduction (from 133% to 40% of source length) |
| **Large files** | Slight increase (from 33% to 40%, +21%) |
| **Coverage** | 95%+ files avoid reallocation |
| **Code simplicity** | No branching needed |
| **Universality** | Single formula for all file sizes |
## Performance Characteristics
- **Memory efficiency**: Allocates roughly 2x the median need (40% vs 19.4%)
- **Reallocation rate**: <5% of files will need buffer growth
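- **Safety margin**: about 6% headroom above the worst-case per-range 95th percentile (0.400 vs 0.377) and about 13% above the overall 95th percentile (0.400 vs 0.355)
- **Trade-off**: Accepts reallocations for the roughly 5% of files that outgrow the initial capacity, in exchange for saving memory on the other 95%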
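The reallocation trade-off can be observed with a plain `Vec`: it only grows once the number of pushed elements exceeds the reserved capacity. A sketch, assuming `Vec<u64>` as a placeholder for the real element type; `count_growths` is a hypothetical helper.

```rust
/// Pushes `needed` placeholder elements into a Vec pre-sized to
/// `initial_capacity` and counts how often the capacity changed,
/// i.e. how many times the Vec had to grow (reallocate).
fn count_growths(initial_capacity: usize, needed: usize) -> usize {
    let mut buf: Vec<u64> = Vec::with_capacity(initial_capacity);
    let mut last_cap = buf.capacity();
    let mut growths = 0;
    for i in 0..needed {
        buf.push(i as u64);
        if buf.capacity() != last_cap {
            growths += 1;
            last_cap = buf.capacity();
        }
    }
    growths
}
```

For a 1,000-byte file the formula reserves 400 elements: a median-sized document (~194 elements) never grows, while a 99th-percentile one (~477 elements) triggers at least one growth.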
## Validation
The formula was validated across:
- 277 tiny files (<1KB)
- 1,772 small files (1-5KB)
- 1,002 medium files (5-10KB)
- 1,628 large files (10-50KB)
- 212 very large files (>50KB)
All size ranges showed consistent 95th percentile requirements between 0.30 and 0.38, supporting a single 0.4 multiplier for all file sizes.
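Validation amounts to applying the formula to each measured file and counting how many outgrow it. A minimal sketch, assuming the measurements are available as hypothetical `(source_len, formatted_document_len)` pairs; `reallocation_rate` is not part of the formatter.

```rust
/// Fraction of files whose formatted document exceeds `len * 2 / 5`,
/// i.e. the files that would need at least one buffer reallocation.
fn reallocation_rate(samples: &[(usize, usize)]) -> f64 {
    if samples.is_empty() {
        return 0.0;
    }
    let overflowing = samples
        .iter()
        .filter(|&&(src, doc)| doc > (src * 2) / 5)
        .count();
    overflowing as f64 / samples.len() as f64
}
```

Running the same function over each size range separately gives the per-range rates.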
## Conclusion
The **0.4 multiplier** (`capacity = source_len * 2 / 5`) provides the best balance between:
- Memory efficiency (~70% savings vs the old 133% small-file allocation)
- Performance (95%+ hit rate without reallocation)
- Code simplicity (no conditional logic)
- Universal applicability (works for all file sizes)
This is a data-driven optimization based on real-world formatter usage across thousands of files from the VSCode repository, representing production-grade JavaScript/TypeScript code patterns.

1 parent ee035b4
1 file changed, +7 -1 lines changed