tryparse

Multi-strategy parser for messy, real-world data. Built to handle LLM responses with broken JSON, markdown wrappers, type mismatches, and inconsistent formatting.

Quick Start

[dependencies]
tryparse = "0.3"
serde = { version = "1.0", features = ["derive"] }

use tryparse::parse;
use serde::Deserialize;

#[derive(Deserialize, Debug)]
struct User {
    name: String,
    age: u32,
}

fn main() {
    // Handles markdown wrappers, trailing commas, unquoted keys, type coercion
    let messy_input = r#"
    Here's your data:
    ```json
    {
      name: "Alice",
      age: "30",
    }
    ```
    "#;

    let user: User = parse(messy_input).unwrap();
    println!("{:?}", user); // User { name: "Alice", age: 30 }
}

Core Features

With `serde::Deserialize`

Basic type coercion works out of the box:

use tryparse::parse;
use serde::Deserialize;

#[derive(Deserialize)]
struct Data {
    count: i64,      // "42" → 42
    price: f64,      // "3.14" → 3.14
    active: bool,    // "true" → true
    tags: Vec<String>, // "tag" → ["tag"]
}

let data: Data = parse(r#"{"count": "42", "price": "3.14", "active": "true", "tags": "tag"}"#).unwrap();

With `LlmDeserialize` (derive feature)

Advanced features require the derive feature:

[dependencies]
tryparse = { version = "0.3", features = ["derive"] }
tryparse-derive = "0.3"

Fuzzy field matching - Handles different naming conventions:

use tryparse::parse_llm;
use tryparse_derive::LlmDeserialize;

#[derive(Debug, LlmDeserialize)]
struct Config {
    user_name: String,  // Matches: userName, UserName, user-name, user.name, USER_NAME
    max_count: i64,
}

let data: Config = parse_llm(r#"{"userName": "Alice", "maxCount": 30}"#).unwrap();

Enum fuzzy matching - Case-insensitive, partial matches:

#[derive(Debug, LlmDeserialize)]
enum Status {
    InProgress,  // Matches: "in_progress", "in-progress", "inprogress", "in progress"
    Completed,   // Matches: "complete", "COMPLETED", "done"
    Cancelled,
}

Union types - Automatically picks the best variant:

#[derive(Debug, LlmDeserialize)]
#[llm(union)]
enum Value {
    Number(i64),
    Text(String),
    List(Vec<String>),
}

// Parses as Number(42)
let v1: Value = parse_llm("42").unwrap();

// Parses as Text("hello")
let v2: Value = parse_llm(r#""hello""#).unwrap();

// Parses as List(...)
let v3: Value = parse_llm(r#"["a", "b"]"#).unwrap();

Implied key - Single-field structs unwrap values:

#[derive(Debug, LlmDeserialize)]
struct Wrapper {
    data: String,
}

// Direct string wraps into the single field
let w: Wrapper = parse_llm(r#""hello world""#).unwrap();
assert_eq!(w.data, "hello world");

API Reference

Basic Parsing

// Parse with serde::Deserialize
fn parse<T: DeserializeOwned>(input: &str) -> Result<T>

// Parse with serde::Deserialize, get all candidates
fn parse_with_candidates<T: DeserializeOwned>(input: &str) -> Result<(T, Vec<FlexValue>)>

// Parse with custom parser configuration
fn parse_with_parser<T: DeserializeOwned>(input: &str, parser: &FlexibleParser) -> Result<T>

Advanced Parsing (requires `derive` feature)

// Parse with LlmDeserialize trait (fuzzy matching, unions, etc.)
fn parse_llm<T: LlmDeserialize>(input: &str) -> Result<T>

// Parse with LlmDeserialize, get all candidates
fn parse_llm_with_candidates<T: LlmDeserialize>(input: &str) -> Result<(T, Vec<FlexValue>)>

Utilities

// Score a candidate (lower is better)
fn score_candidate(candidate: &FlexValue) -> u32

// Rank candidates by score
fn rank_candidates(candidates: &mut [FlexValue])

// Get the best candidate
fn best_candidate(candidates: &[FlexValue]) -> Option<&FlexValue>

How It Works

1. Multi-Stage Parsing Pipeline

Input String
    ↓
┌──────────────────────────────────┐
│ Pre-Processing                   │
│ • Remove BOM, zero-width chars   │
│ • Fix excessive nesting (>50)    │
│ • Normalize backslashes          │
└────────────┬─────────────────────┘
             ↓
┌──────────────────────────────────┐
│ Strategy Execution (parallel)    │
│ • DirectJson        (priority 1) │
│ • Markdown          (priority 2) │
│ • YAML              (priority 15)│
│ • JsonFixer         (priority 20)│
│ • Heuristic         (priority 30)│
│                                  │
│ → Produces Vec<FlexValue>        │
└────────────┬─────────────────────┘
             ↓
┌──────────────────────────────────┐
│ Scoring & Ranking                │
│ • Base score by source           │
│ • Transformation penalties       │
│ • Confidence adjustment          │
│ • Sort ascending (best first)    │
└────────────┬─────────────────────┘
             ↓
┌──────────────────────────────────┐
│ Deserialization                  │
│ • Try candidates in order        │
│ • Apply type coercion            │
│ • Track transformations          │
│ • Return first success           │
└──────────────────────────────────┘

2. Parsing Strategies

Strategy	Priority	Description
DirectJson	1	Direct `serde_json::from_str()`. Fastest path for valid JSON.
Markdown	2	Extracts from markdown code blocks. Scores by keywords, position, size.
YAML	15	Parses YAML, converts to JSON. Requires `yaml` feature.
JsonFixer	20	Fixes common JSON errors (see below).
Heuristic	30	Pattern-based extraction from prose. Last resort.

3. JSON Fixes Applied

The JsonFixer strategy handles:

Trailing commas: {"a": 1,} → {"a": 1}
Unquoted keys: {name: "x"} → {"name": "x"}
Single quotes: {'a': 1} → {"a": 1}
Missing commas: {"a":1 "b":2} → {"a":1,"b":2}
Unclosed braces/brackets: {"a": 1 → {"a": 1}
Comments: {"a": 1 /* comment */} → {"a": 1}
Smart quotes: {"a": "value"} → {"a": "value"}
Double-escaped JSON: "{\"a\":1}" → {"a":1}
Template literals: {`key`: "value"} → {"key": "value"}
Hex numbers: {"a": 0xFF} → {"a": 255}
Unescaped newlines in strings
JavaScript functions: Removed entirely

4. Type Coercion

Applied during deserialization (works with both Deserialize and LlmDeserialize):

From	To	Example
String	Number	`"42"` → `42`
String	Bool	`"true"` → `true`
Number	String	`42` → `"42"`
Float	Int	`42.0` → `42`
Single	Array	`"item"` → `["item"]`

5. Field Matching (LlmDeserialize only)

Normalizes field names to snake_case and matches case-insensitively:

Struct Field	Matches JSON Keys
`user_name`	`userName`, `UserName`, `user-name`, `user.name`, `USER_NAME`, `username`
`max_count`	`maxCount`, `MaxCount`, `max-count`, `max.count`, `MAX_COUNT`

Note: Does not handle acronyms perfectly. XMLParser becomes x_m_l_parser not xml_parser.

6. Scoring System

Base Scores (by source):

Direct JSON: 0
Markdown: 10
YAML: 15
Fixed JSON: 20 + (5 × number of fixes)
Heuristic: 50

Transformation Penalties:

String→Number: +2
Float→Int: +3
Field rename: +4
Single→Array: +5
Default inserted: +50

Confidence Modifier:

Each transformation reduces confidence by 5%
Final score += (1.0 - confidence) × 100

Lower scores win. Direct JSON with no coercion scores 0 (best possible).

Examples

Handling Markdown Responses

let llm_output = r#"
Sure! Here's the user data:

```json
{
  "name": "Alice",
  "age": 30,
  "email": "alice@example.com"
}

Let me know if you need anything else! "#;

#[derive(Deserialize, Debug)] struct User { name: String, age: i64, email: String, }

let user: User = parse(llm_output).unwrap();


### Inspecting Parse Candidates

```rust
use tryparse::{parse_with_candidates, scoring::score_candidate};

let (result, candidates) = parse_with_candidates::<User>(messy_input).unwrap();

println!("Best result: {:?}", result);
println!("\nAll candidates:");
for (i, candidate) in candidates.iter().enumerate() {
    println!("  {}: {:?} (score: {})",
        i,
        candidate.source(),
        score_candidate(candidate)
    );

    // Inspect transformations
    for t in candidate.transformations() {
        println!("    - {:?}", t);
    }
}

Custom Parser Configuration

use tryparse::parser::{FlexibleParser, strategies::*};

let parser = FlexibleParser::new()
    .with_strategy(DirectJsonStrategy)
    .with_strategy(MarkdownStrategy::new())
    .with_strategy(JsonFixerStrategy::new());

let data: User = parse_with_parser(input, &parser).unwrap();

Complex Nested Structures

use std::collections::HashMap;

#[derive(Debug, LlmDeserialize)]
struct Project {
    name: String,
    owner: User,
    status: Status,
    tags: Vec<String>,
    metadata: HashMap<String, String>,
}

let project: Project = parse_llm(complex_json).unwrap();

Union Types with Scoring

#[derive(Debug, LlmDeserialize)]
#[llm(union)]
enum Response {
    Success { data: User },
    Error { message: String, code: i64 },
    Pending { estimated_time: i64 },
}

// Automatically picks the variant that best matches the structure
let response: Response = parse_llm(api_response).unwrap();

match response {
    Response::Success { data } => println!("User: {:?}", data),
    Response::Error { message, code } => eprintln!("Error {}: {}", code, message),
    Response::Pending { estimated_time } => println!("Wait {}s", estimated_time),
}

Feature Flags

# Default: includes markdown and yaml
[dependencies]
tryparse = "0.3"

# Minimal build (core JSON parsing only)
tryparse = { version = "0.3", default-features = false }

# With derive macros for LlmDeserialize
tryparse = { version = "0.3", features = ["derive"] }

# All features
tryparse = { version = "0.3", features = ["derive", "markdown", "yaml"] }

Available features:

markdown (default) - Markdown code block extraction
yaml (default) - YAML parsing support
derive - Derive macro for LlmDeserialize (fuzzy field/enum matching, union types)

Testing

# Unit tests (227 tests in lib)
cargo test --lib

# All tests (lib + integration + doc tests)
cargo test --all-features

# Minimal build tests
cargo test --no-default-features

# Run specific test
cargo test --test integration_test -- --nocapture

Performance Considerations

Parsing is synchronous: No async/await support
Memory overhead: Tracks all parsing candidates and transformations
Strategy execution: Some strategies run in parallel
Regex compilation: Expensive regexes are compiled lazily and cached
Best for: <1MB inputs, occasional parsing (not high-frequency loops)

Optimizations:

Direct JSON (valid JSON) takes the fastest path
Failed strategies short-circuit early
Scoring is lazy (only computed when needed)
Copy semantics for small types (enums, etc.)

Debugging

Enable detailed logging:

env_logger::init();
std::env::set_var("RUST_LOG", "tryparse=debug");

let result = parse::<User>(input);

Inspect what went wrong:

match parse::<User>(input) {
    Ok(user) => println!("Success: {:?}", user),
    Err(e) => {
        eprintln!("Parse failed: {:?}", e);

        // Try getting candidates to see what was parsed
        if let Ok((_, candidates)) = parse_with_candidates::<User>(input) {
            for c in candidates {
                eprintln!("Candidate: {:?}", c.value);
            }
        }
    }
}

Known Limitations

Synchronous only - No async parsing
No streaming - Requires complete input string
Memory overhead - Tracks all candidates and transformations
Acronym handling - XMLParser → x_m_l_parser (not xml_parser)
Best-effort parsing - May produce unexpected results on ambiguous input
No custom deserializers - Can't implement custom Deserialize logic for fields

Contributing

Requirements:

Rust 1.85.0+
Run cargo fmt before committing
Pass cargo clippy --all-targets --all-features
All tests must pass: cargo test --all-features

Pull request checklist:

Clear description of what and why
Tests for new functionality
Update README if API changes
No clippy warnings
All existing tests pass

License

Apache-2.0

Credits

Parsing algorithms inspired by BAML's Schema-Aligned Parsing.

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
.github/workflows		.github/workflows
tryparse-derive		tryparse-derive
tryparse		tryparse
.gitignore		.gitignore
Cargo.lock		Cargo.lock
Cargo.toml		Cargo.toml
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

tryparse

Quick Start

Core Features

With `serde::Deserialize`

With `LlmDeserialize` (derive feature)

API Reference

Basic Parsing

Advanced Parsing (requires `derive` feature)

Utilities

How It Works

1. Multi-Stage Parsing Pipeline

2. Parsing Strategies

3. JSON Fixes Applied

4. Type Coercion

5. Field Matching (LlmDeserialize only)

6. Scoring System

Examples

Handling Markdown Responses

Custom Parser Configuration

Complex Nested Structures

Union Types with Scoring

Feature Flags

Testing

Performance Considerations

Debugging

Known Limitations

Contributing

License

Credits

About

Uh oh!

Releases

Packages

Languages

License

microagents/tryparse

Folders and files

Latest commit

History

Repository files navigation

tryparse

Quick Start

Core Features

With serde::Deserialize

With LlmDeserialize (derive feature)

API Reference

Basic Parsing

Advanced Parsing (requires derive feature)

Utilities

How It Works

1. Multi-Stage Parsing Pipeline

2. Parsing Strategies

3. JSON Fixes Applied

4. Type Coercion

5. Field Matching (LlmDeserialize only)

6. Scoring System

Examples

Handling Markdown Responses

Custom Parser Configuration

Complex Nested Structures

Union Types with Scoring

Feature Flags

Testing

Performance Considerations

Debugging

Known Limitations

Contributing

License

Credits

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

With `serde::Deserialize`

With `LlmDeserialize` (derive feature)

Advanced Parsing (requires `derive` feature)

Packages