Add Borsh Internals & Performance Optimization Guide #3

@rz1989s

Problem Statement

Context7 Benchmark Impact: Q7 scored 82/100, Q1 needs performance content

Q7 Feedback (82/100):

"Lacks explicit explanation of the internal mechanisms LUMOS uses to ensure Borsh compatibility (field ordering guarantees, discriminant assignments for enums, serialization algorithm specifics)"

Q1 Feedback (82/100):

"Minimal coverage of performance implications for deeply nested structures or large vectors"

Critical Gaps:

  • No Borsh compatibility mechanism documentation
  • No field ordering guarantee explanations
  • No discriminant assignment rules
  • No performance optimization guides

Proposed Solution

Add comprehensive documentation explaining Borsh internals and performance optimization.

1. New Page: docs/internals/borsh.md

---
title: Borsh Internals
description: How LUMOS ensures Borsh serialization compatibility
---

# Borsh Serialization Internals

## Overview

LUMOS uses [Borsh](https://borsh.io) for deterministic serialization. This guide explains how LUMOS ensures byte-for-byte compatibility between Rust and TypeScript.

## Field Ordering Guarantees

### Struct Field Order

LUMOS preserves **declaration order** during serialization:

```rust
struct Player {
    wallet: PublicKey,  // Serialized first (bytes 0-31)
    level: u16,         // Serialized second (bytes 32-33)
    score: u64,         // Serialized third (bytes 34-41)
}
```

Binary layout:

[32 bytes: wallet][2 bytes: level][8 bytes: score]
Total: 42 bytes

Key guarantee: Fields serialize in the exact order they appear in the schema.
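
The layout above can be reproduced by hand. The following is a minimal sketch, not LUMOS's generated code: `encode_player` is a hypothetical helper that uses only the standard library and assumes the wallet is available as a raw 32-byte array.

```rust
/// Hypothetical hand-rolled encoder (illustration only, not LUMOS output).
/// Writes the fields in declaration order: wallet, level, score.
fn encode_player(wallet: &[u8; 32], level: u16, score: u64) -> Vec<u8> {
    let mut out = Vec::with_capacity(42);
    out.extend_from_slice(wallet);               // bytes 0..32
    out.extend_from_slice(&level.to_le_bytes()); // bytes 32..34, little-endian
    out.extend_from_slice(&score.to_le_bytes()); // bytes 34..42, little-endian
    out
}
```

For `level = 10` and `score = 1000`, bytes 32..34 are `0A 00` and bytes 34..42 begin with `E8 03`.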

Why Order Matters

Changing field order breaks deserialization:

// ❌ BREAKING: Order changed
struct Player {
    level: u16,    // Now at bytes 0-1
    wallet: PublicKey,  // Now at bytes 2-33
    score: u64,    // Now at bytes 34-41
}

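Result: Old data can no longer be deserialized correctly with the new schema. Because the total size is still 42 bytes, decoding may even succeed silently while yielding wrong field values.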
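
To make the failure mode concrete, here is a small sketch that reads the same 42 bytes under both layouts. Both decoders are hypothetical hand-written helpers (not LUMOS output) and return only `level` and `score` for brevity.

```rust
/// Reads `level` and `score` assuming the ORIGINAL order: wallet, level, score.
fn decode_original(b: &[u8; 42]) -> (u16, u64) {
    let level = u16::from_le_bytes([b[32], b[33]]);
    let mut score = [0u8; 8];
    score.copy_from_slice(&b[34..42]);
    (level, u64::from_le_bytes(score))
}

/// Reads the same bytes assuming the REORDERED layout: level, wallet, score.
fn decode_reordered(b: &[u8; 42]) -> (u16, u64) {
    // These two bytes are really the start of the wallet key.
    let level = u16::from_le_bytes([b[0], b[1]]);
    let mut score = [0u8; 8];
    score.copy_from_slice(&b[34..42]);
    (level, u64::from_le_bytes(score))
}
```

Neither decoder reports an error. The reordered one simply reports the first two wallet bytes as the level, which is why this kind of break is easy to miss.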

Enum Discriminant Assignment

Sequential Assignment

LUMOS assigns discriminants sequentially (0, 1, 2...):

enum GameState {
    Active,     // discriminant: 0
    Paused,     // discriminant: 1
    Finished,   // discriminant: 2
}

Binary format:

  • GameState::Active → 0x00
  • GameState::Paused → 0x01
  • GameState::Finished → 0x02
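
The mapping can be sketched in plain Rust. This is an illustration under the assumption that the discriminant is a single byte equal to the variant's position; `discriminant` and `from_discriminant` are hypothetical helper names, not part of LUMOS or Borsh.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum GameState {
    Active,
    Paused,
    Finished,
}

/// For a fieldless enum without explicit values, `as u8` follows
/// declaration order: 0, 1, 2.
fn discriminant(state: GameState) -> u8 {
    state as u8
}

/// Inverse mapping; unknown bytes are rejected rather than guessed.
fn from_discriminant(byte: u8) -> Option<GameState> {
    match byte {
        0 => Some(GameState::Active),
        1 => Some(GameState::Paused),
        2 => Some(GameState::Finished),
        _ => None,
    }
}
```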

Discriminant Stability

Adding variants at the end maintains compatibility:

// v1
enum State {
    Active,   // 0
    Paused,   // 1
}

// v2 (backward compatible)
enum State {
    Active,   // 0 (unchanged)
    Paused,   // 1 (unchanged)
    Finished, // 2 (new)
}

Inserting variants breaks compatibility:

// ❌ BREAKING: Inserted variant
enum State {
    Active,   // 0 (unchanged)
    Finished, // 1 (NEW - shifts Paused!)
    Paused,   // 2 (WAS 1 - BREAKING!)
}
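
The effect of an inserted variant can be shown with two hand-written decoders. This is a sketch only: `StateV1`, `StateBroken`, and both functions are hypothetical names for the v1 schema and the broken schema above.

```rust
#[derive(Debug, PartialEq)]
enum StateV1 {
    Active,
    Paused,
}

#[derive(Debug, PartialEq)]
enum StateBroken {
    Active,
    Finished, // inserted in the middle
    Paused,
}

/// Decoder for the v1 schema.
fn decode_v1(byte: u8) -> Option<StateV1> {
    match byte {
        0 => Some(StateV1::Active),
        1 => Some(StateV1::Paused),
        _ => None,
    }
}

/// Decoder for the schema with the inserted variant.
fn decode_broken(byte: u8) -> Option<StateBroken> {
    match byte {
        0 => Some(StateBroken::Active),
        1 => Some(StateBroken::Finished),
        2 => Some(StateBroken::Paused),
        _ => None,
    }
}
```

A stored byte `0x01` means `Paused` under v1 but decodes as `Finished` under the broken schema, with no error raised.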

Type Encoding Rules

Primitive Types (Little-Endian)

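| Type | Size | Example | Binary (hex) |
|------|------|---------|--------------|
| u8 | 1 byte | 255 | FF |
| u16 | 2 bytes | 1000 | E8 03 |
| u32 | 4 bytes | 1000000 | 40 42 0F 00 |
| u64 | 8 bytes | 1000000000 | 00 CA 9A 3B 00 00 00 00 |
| u128 | 16 bytes | Large value | Little-endian |
| bool | 1 byte | true / false | 01 / 00 |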
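
The hex values in the table can be re-derived with the standard library's `to_le_bytes`. The sketch below does not use the Borsh crate; `hex` and `example_rows` are hypothetical helper names.

```rust
/// Formats bytes as space-separated uppercase hex, e.g. [0xE8, 0x03] -> "E8 03".
fn hex(bytes: &[u8]) -> String {
    bytes
        .iter()
        .map(|b| format!("{:02X}", b))
        .collect::<Vec<_>>()
        .join(" ")
}

/// Re-derives the example values using little-endian conversions.
fn example_rows() -> Vec<String> {
    vec![
        hex(&255u8.to_le_bytes()),
        hex(&1000u16.to_le_bytes()),
        hex(&1_000_000u32.to_le_bytes()),
        hex(&1_000_000_000u64.to_le_bytes()),
        hex(&[true as u8]),
        hex(&[false as u8]),
    ]
}
```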

Strings

Format: [4-byte length][UTF-8 bytes]

"hello" →
  [0x05 0x00 0x00 0x00]  // length: 5
  [0x68 0x65 0x6C 0x6C 0x6F]  // "hello"
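
A minimal sketch of this format, written against the standard library only. `encode_string` is a hypothetical helper, not the Borsh crate's API.

```rust
/// Encodes a string as a 4-byte little-endian length followed by its UTF-8 bytes.
fn encode_string(s: &str) -> Vec<u8> {
    let mut out = Vec::with_capacity(4 + s.len());
    // `len()` is the UTF-8 byte length, not the number of characters.
    out.extend_from_slice(&(s.len() as u32).to_le_bytes());
    out.extend_from_slice(s.as_bytes());
    out
}
```

Note that the prefix counts bytes: a two-byte character such as `é` produces a length of 2.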

Vectors

Format: [4-byte length][element₁][element₂]...[elementₙ]

Vec<u16>([10, 20, 30]) →
  [0x03 0x00 0x00 0x00]  // length: 3
  [0x0A 0x00]  // 10
  [0x14 0x00]  // 20
  [0x1E 0x00]  // 30
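
Reading the format back is a useful check of the layout. The decoder below is a sketch for `Vec<u16>` only, using a hypothetical helper name; it rejects input whose body does not match the declared length.

```rust
/// Decodes `[4-byte LE length][u16 LE elements...]`, or returns None on malformed input.
fn decode_vec_u16(bytes: &[u8]) -> Option<Vec<u16>> {
    if bytes.len() < 4 {
        return None;
    }
    let len = u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]) as usize;
    let body = &bytes[4..];
    // Each u16 occupies exactly two bytes.
    if body.len() != len.checked_mul(2)? {
        return None;
    }
    Some(
        body.chunks_exact(2)
            .map(|c| u16::from_le_bytes([c[0], c[1]]))
            .collect(),
    )
}
```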

Options

Format: [1-byte discriminant][value if Some]

None → 0x00
Some(value) → 0x01 [serialized value]
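
As a sketch for `Option<u16>` (the helper name is hypothetical and only the standard library is used):

```rust
/// Encodes None as a single 0x00 byte and Some(v) as 0x01 followed by v in little-endian.
fn encode_option_u16(value: Option<u16>) -> Vec<u8> {
    match value {
        None => vec![0x00],
        Some(v) => {
            let mut out = vec![0x01];
            out.extend_from_slice(&v.to_le_bytes());
            out
        }
    }
}
```

The encoded size therefore depends on the value: 1 byte for `None`, 3 bytes for `Some` of a `u16`.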

PublicKey (Solana)

Fixed 32 bytes (no length prefix):

Pubkey → [32 bytes of public key data]

Rust ↔ TypeScript Compatibility

Type Mapping

LUMOS ensures these types serialize identically:

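| LUMOS | Rust | TypeScript | Borsh Encoding |
|-------|------|------------|----------------|
| u64 | u64 | number (exact only up to 2^53 − 1) | 8 bytes LE |
| u128 | u128 | bigint | 16 bytes LE |
| PublicKey | Pubkey | PublicKey | 32 bytes |
| String | String | string | Length prefix + UTF-8 |
| Vec<T> | Vec<T> | T[] | Length prefix + elements |
| Option<T> | Option<T> | T \| undefined | Discriminant + value |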

Generated Schemas Match

Rust:

#[derive(BorshSerialize, BorshDeserialize)]
pub struct Player {
    pub wallet: Pubkey,
    pub level: u16,
    pub score: u64,
}

TypeScript:

export const PlayerSchema = {
  struct: {
    wallet: { array: { type: 'u8', len: 32 } },
    level: 'u16',
    score: 'u64',
  },
};

Binary output: Identical byte-for-byte.

Anchor Integration

Account Discriminator

Anchor adds an 8-byte discriminator prefix to every account:

[8-byte discriminator][borsh-serialized data]

Discriminator calculation (the first 8 bytes of a SHA-256 hash):

let discriminator = sha256("account:PlayerAccount")[..8];

LUMOS handles this automatically when using the #[account] attribute.

Instruction Discriminators

Instructions follow the same pattern, with the discriminator derived from sha256("global:<instruction_name>")[..8]:

[8-byte discriminator][borsh-serialized args]
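
When decoding such data by hand, the prefix has to be checked and stripped before the Borsh payload is read. The sketch below shows only that step; both function names are hypothetical, and the expected discriminator is passed in by the caller rather than computed here, since hashing needs an external crate.

```rust
/// Splits data into its 8-byte discriminator and the remaining payload.
fn split_discriminator(data: &[u8]) -> Option<([u8; 8], &[u8])> {
    if data.len() < 8 {
        return None;
    }
    let mut disc = [0u8; 8];
    disc.copy_from_slice(&data[..8]);
    Some((disc, &data[8..]))
}

/// Returns the payload only if the prefix matches the expected discriminator.
fn payload_if_matches<'a>(data: &'a [u8], expected: &[u8; 8]) -> Option<&'a [u8]> {
    let (disc, payload) = split_discriminator(data)?;
    if &disc == expected {
        Some(payload)
    } else {
        None
    }
}
```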

Compatibility Verification

Round-Trip Testing

A round-trip test like the following confirms that the generated Rust code serializes and deserializes consistently:

#[test]
fn test_borsh_roundtrip() {
    let original = Player {
        wallet: Pubkey::new_unique(),
        level: 10,
        score: 1000,
    };
    
    // Serialize
    let bytes = original.try_to_vec().unwrap();
    
    // Deserialize
    let deserialized = Player::try_from_slice(&bytes).unwrap();
    
    // Requires Player to also derive PartialEq and Debug.
    assert_eq!(original, deserialized);
}

Cross-Language Testing

// Serialize in TypeScript
const player = { wallet: new PublicKey(...), level: 10, score: 1000 };
const bytes = borsh.serialize(PlayerSchema, player);

// Deserialize in Rust (should match exactly)
let player = Player::try_from_slice(&bytes).unwrap();
assert_eq!(player.level, 10);

2. New Page: docs/guide/performance.md
---
title: Performance Optimization
description: Optimize LUMOS schemas for performance and cost
---

# Performance Optimization

## Account Size Optimization

### Minimize Account Size

Smaller accounts = lower rent costs:

```rust
// ❌ Wasteful (16 bytes)
struct Player {
    level: u64,   // Max level is 100
    health: u64,  // Max health is 1000
}

// ✅ Optimized (3 bytes)
struct Player {
    level: u8,    // 0-255 is sufficient
    health: u16,  // 0-65,535 is sufficient
}

// Savings: 13 bytes = 81% reduction
```

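Rent calculation (rent-exempt minimum at Solana's default rate of 6,960 lamports per byte, which also covers a fixed 128-byte per-account overhead):

3 bytes:  (128 + 3) × 6,960 = 911,760 lamports (~0.00091176 SOL)
16 bytes: (128 + 16) × 6,960 = 1,002,240 lamports (~0.00100224 SOL)
Savings: 90,480 lamports (~0.00009 SOL, about 9%) per account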
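
For estimating rent in code, the following sketch may help. The constants are Solana's default rent parameters (3,480 lamports per byte-year, a 2-year exemption threshold, and a 128-byte per-account overhead) and are hard-coded here as assumptions; against a live cluster, the `getMinimumBalanceForRentExemption` RPC method is the authoritative source.

```rust
// Assumed default rent parameters; a cluster may be configured differently.
const LAMPORTS_PER_BYTE_YEAR: u64 = 3_480;
const EXEMPTION_YEARS: u64 = 2;
const ACCOUNT_STORAGE_OVERHEAD: u64 = 128;

/// Lamports required for an account with `data_len` bytes of data to be rent-exempt.
fn rent_exempt_minimum(data_len: u64) -> u64 {
    (ACCOUNT_STORAGE_OVERHEAD + data_len) * LAMPORTS_PER_BYTE_YEAR * EXEMPTION_YEARS
}
```

Because of the fixed overhead, trimming a few bytes from a small account reduces rent by a much smaller percentage than it reduces the data size.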

Use Bit Flags for Booleans

// ❌ Multiple booleans (4 bytes)
struct Permissions {
    can_read: bool,
    can_write: bool,
    can_delete: bool,
    can_admin: bool,
}

// ✅ Bit flags (1 byte)
struct Permissions {
    flags: u8,  // 8 flags in 1 byte
}

impl Permissions {
    const READ: u8 = 1 << 0;
    const WRITE: u8 = 1 << 1;
    const DELETE: u8 = 1 << 2;
    const ADMIN: u8 = 1 << 3;
    
    pub fn can_read(&self) -> bool {
        self.flags & Self::READ != 0
    }
}
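
Setting and clearing flags follows the same masking idea. The sketch below is a standalone illustration with a hypothetical `Flags` newtype, not LUMOS-generated code.

```rust
#[derive(Debug, Default, Clone, Copy, PartialEq)]
struct Flags(u8);

impl Flags {
    const READ: u8 = 1 << 0;
    const WRITE: u8 = 1 << 1;
    const DELETE: u8 = 1 << 2;
    const ADMIN: u8 = 1 << 3;

    /// Turns on every bit in `mask`.
    fn set(&mut self, mask: u8) {
        self.0 |= mask;
    }

    /// Turns off every bit in `mask`.
    fn clear(&mut self, mask: u8) {
        self.0 &= !mask;
    }

    /// True if any bit in `mask` is set.
    fn has(&self, mask: u8) -> bool {
        self.0 & mask != 0
    }
}
```

The trade-off is readability: the serialized form is a single opaque byte, so clients need the same constants to interpret it.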

Flatten Nested Structures

// ❌ Deeply nested
struct Player {
    info: PlayerInfo,
}
struct PlayerInfo {
    stats: PlayerStats,
}
struct PlayerStats {
    level: u16,
    health: u16,
}

// ✅ Flattened
struct Player {
    level: u16,
    health: u16,
}

Serialization Performance

Avoid Large Vectors in Hot Paths

// ❌ Unbounded vector
struct Player {
    inventory: Vec<Item>,  // Could be 1000+ items
}

// ✅ Fixed-size array or pagination
struct Player {
    inventory: [Option<Item>; 20],  // Max 20 items
}

// Or separate account for large collections
struct Inventory {
    owner: Pubkey,
    items: Vec<Item>,
}
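
A quick way to reason about vector growth is to compute its serialized size up front. The sketch below assumes fixed-size items and the 4-byte length prefix described for Borsh vectors; the helper names and the byte budget are hypothetical.

```rust
const LEN_PREFIX: usize = 4;

/// Serialized size of a vector holding `count` items of `item_size` bytes each.
fn vec_serialized_size(count: usize, item_size: usize) -> usize {
    LEN_PREFIX + count * item_size
}

/// Largest number of items that fits in `budget` bytes, prefix included.
fn max_items_within(budget: usize, item_size: usize) -> usize {
    assert!(item_size > 0, "item_size must be non-zero");
    budget.saturating_sub(LEN_PREFIX) / item_size
}
```

For example, 1,000 items of 32 bytes take 32,004 bytes, while a 1,024-byte budget holds at most 31 such items.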

Benchmark Serialization

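// Off-chain benchmark (for example in a test):
// std::time is not available inside an on-chain program.
use std::time::Instant;

let start = Instant::now();
let bytes = player.try_to_vec()?;
let duration = start.elapsed();

println!("Serialization took: {:?} for {} bytes", duration, bytes.len());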

Client-Side Performance

Batch RPC Calls

// ❌ N network calls
for (const address of addresses) {
  const account = await connection.getAccountInfo(address);
  // process account
}

// ✅ 1 network call
const accounts = await connection.getMultipleAccountsInfo(addresses);
for (const account of accounts) {
  // process account
}

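Performance (example figures): 100 accounts: 5000ms → 150ms (33x faster). A single getMultipleAccountsInfo request accepts at most 100 addresses, so larger sets must be split into batches.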
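
Splitting a large address list into request-sized batches is a small, generic step. The sketch below shows it for any slice; `batches` is a hypothetical helper, and the limit of 100 per request is an assumption based on the standard RPC cap for multi-account lookups.

```rust
/// Splits `items` into consecutive batches of at most `max_per_request` elements.
fn batches<T>(items: &[T], max_per_request: usize) -> Vec<&[T]> {
    assert!(max_per_request > 0, "batch size must be non-zero");
    items.chunks(max_per_request).collect()
}
```

With 250 addresses and a limit of 100, this yields three batches of 100, 100, and 50, so three network calls instead of 250.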

Cache Deserialized Data

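interface CachedAccount {
  account: PlayerAccount;
  timestamp: number; // Date.now() when the entry was stored
}

class AccountCache {
  private cache = new Map<string, CachedAccount>();
  private TTL = 5000; // 5 seconds

  get(address: PublicKey): PlayerAccount | null {
    const key = address.toBase58();
    const entry = this.cache.get(key);
    if (!entry) return null;

    // Evict entries older than the TTL
    if (Date.now() - entry.timestamp > this.TTL) {
      this.cache.delete(key);
      return null;
    }

    return entry.account;
  }

  set(address: PublicKey, account: PlayerAccount): void {
    this.cache.set(address.toBase58(), { account, timestamp: Date.now() });
  }
}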

Use WebSockets for Live Data

// ❌ Polling (wasteful)
setInterval(async () => {
  const account = await fetchAccount(address);
  updateUI(account);
}, 1000);

// ✅ WebSocket subscription
connection.onAccountChange(address, (accountInfo) => {
  const account = deserialize(accountInfo.data);
  updateUI(account);
});

Deeply Nested Structures

Performance Impact

// Deep nesting = slower serialization
struct Game {
    world: World,  // Level 1
}
struct World {
    regions: Vec<Region>,  // Level 2
}
struct Region {
    zones: Vec<Zone>,  // Level 3
}
struct Zone {
    entities: Vec<Entity>,  // Level 4
}

Impact:

  • Slower serialization/deserialization
  • Higher memory usage
  • More complex validation
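
Part of the cost is structural: every nested `Vec` carries its own 4-byte length prefix. The sketch below estimates the serialized size of the `World` hierarchy above, assuming fixed-size entities and zones that contain nothing but their entity list; the function name and input shape are hypothetical.

```rust
/// Each inner Vec is one region; each number is the entity count of one zone.
fn world_serialized_size(regions: &[Vec<usize>], entity_size: usize) -> usize {
    const LEN_PREFIX: usize = 4;
    let mut total = LEN_PREFIX; // World.regions length
    for zones in regions {
        total += LEN_PREFIX; // Region.zones length
        for &entities in zones {
            total += LEN_PREFIX + entities * entity_size; // Zone.entities
        }
    }
    total
}
```

Two regions with two zones of three 10-byte entities each come to 148 bytes, of which 28 bytes are length prefixes.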

Optimization: Use References

// ✅ Flatten with references
struct Game {
    world_id: Pubkey,  // Reference to World account
}

struct World {
    region_ids: Vec<Pubkey>,  // References to Region accounts
}

// Each account is independent, smaller, faster

Account Size Calculator

# Calculate account size
lumos size schemas/player.lumos --type PlayerAccount

# Output:
# PlayerAccount:
#   Fixed: 42 bytes
#   Variable: 4 + name.len() + 4 + (inventory.len() * 32)
#   
#   Examples:
#     Empty: 50 bytes → rent: 0.00123888 SOL
#     10 items: 370 bytes → rent: 0.00346608 SOL
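
The size formula in the sample output can be sketched as a function. The constants mirror the example (42 fixed bytes, 32 bytes per inventory item) and the function name is hypothetical; this is not the implementation of the proposed `lumos size` command.

```rust
const FIXED_BYTES: usize = 42;
const ITEM_BYTES: usize = 32;
const LEN_PREFIX: usize = 4;

/// Fixed part + length-prefixed name + length-prefixed inventory.
fn player_account_size(name_len: usize, inventory_len: usize) -> usize {
    FIXED_BYTES + LEN_PREFIX + name_len + LEN_PREFIX + inventory_len * ITEM_BYTES
}
```

An empty name and empty inventory give 50 bytes; adding 10 items gives 370 bytes.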

3. New Page: docs/reference/type-encoding.md
---
title: Type Encoding Reference
description: Binary encoding format for all LUMOS types
---

# Type Encoding Reference

Complete reference for Borsh binary encoding of LUMOS types.

## Encoding Tables

[Include comprehensive tables from lumos core issue #82]

## Binary Layout Diagrams

[Include visual diagrams showing memory layout]

## Size Calculation

[Include formulas for calculating account sizes]

4. Update Navigation

sidebar: {
  '/guide/': [
    {
      text: 'Advanced',
      items: [
        { text: 'Performance', link: '/guide/performance' },
      ]
    }
  ],
  '/internals/': [
    {
      text: 'Internals',
      items: [
        { text: 'Borsh Serialization', link: '/internals/borsh' },
      ]
    }
  ],
  '/reference/': [
    { text: 'Type Encoding', link: '/reference/type-encoding' },
  ]
}

Acceptance Criteria

  • docs/internals/borsh.md created with complete Borsh explanation
  • docs/guide/performance.md created with optimization patterns
  • docs/reference/type-encoding.md created with encoding tables
  • Binary layout diagrams included
  • Navigation updated
  • Cross-links added to related pages
  • Target: Q7: 82→92 (+10), Q1: +3 (performance)

Impact

Context7 Benchmark:

  • Q7: 82 → 92 (+10 points)
  • Q1: 82 → 85 (+3 points - performance content)

Overall Score: 88.0 → 89.3 (+1.3 points)

User Value:

  • Deep understanding of serialization
  • Cost optimization techniques
  • Performance best practices
  • Binary format reference

Related

  • Context7 Benchmark Questions 7, 1
  • Borsh specification
  • Solana rent economics
  • lumos core issue #82 (similar content)

Priority Justification

🟢 MEDIUM - Moderate score (82), valuable for advanced users, complements core repo documentation
