Multi-strategy parser for messy, real-world data. Built to handle LLM responses with broken JSON, markdown wrappers, type mismatches, and inconsistent formatting.
```toml
[dependencies]
tryparse = "0.3"
serde = { version = "1.0", features = ["derive"] }
```

````rust
use tryparse::parse;
use serde::Deserialize;

#[derive(Deserialize, Debug)]
struct User {
    name: String,
    age: u32,
}

fn main() {
    // Handles markdown wrappers, trailing commas, unquoted keys, type coercion
    let messy_input = r#"
    Here's your data:
    ```json
    {
        name: "Alice",
        age: "30",
    }
    ```
    "#;

    let user: User = parse(messy_input).unwrap();
    println!("{:?}", user); // User { name: "Alice", age: 30 }
}
````

Basic type coercion works out of the box:
```rust
use tryparse::parse;
use serde::Deserialize;

#[derive(Deserialize)]
struct Data {
    count: i64,        // "42" → 42
    price: f64,        // "3.14" → 3.14
    active: bool,      // "true" → true
    tags: Vec<String>, // "tag" → ["tag"]
}

let data: Data = parse(r#"{"count": "42", "price": "3.14", "active": "true", "tags": "tag"}"#).unwrap();
```

Advanced features require the `derive` feature:

```toml
[dependencies]
tryparse = { version = "0.3", features = ["derive"] }
tryparse-derive = "0.3"
```

**Fuzzy field matching** - Handles different naming conventions:
```rust
use tryparse::parse_llm;
use tryparse_derive::LlmDeserialize;

#[derive(Debug, LlmDeserialize)]
struct Config {
    user_name: String, // Matches: userName, UserName, user-name, user.name, USER_NAME
    max_count: i64,
}

let data: Config = parse_llm(r#"{"userName": "Alice", "maxCount": 30}"#).unwrap();
```

**Enum fuzzy matching** - Case-insensitive, partial matches:
```rust
#[derive(Debug, LlmDeserialize)]
enum Status {
    InProgress, // Matches: "in_progress", "in-progress", "inprogress", "in progress"
    Completed,  // Matches: "complete", "COMPLETED", "done"
    Cancelled,
}
```
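A quick usage sketch, assuming a bare `LlmDeserialize` enum can be parsed on its own (as the union example below does):

```rust
use tryparse::parse_llm;

// "in-progress" fuzzily matches InProgress; "done" maps to Completed
let a: Status = parse_llm(r#""in-progress""#).unwrap();
let b: Status = parse_llm(r#""done""#).unwrap();
```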
**Union types** - Automatically picks the best variant:

```rust
#[derive(Debug, LlmDeserialize)]
#[llm(union)]
enum Value {
    Number(i64),
    Text(String),
    List(Vec<String>),
}

// Parses as Number(42)
let v1: Value = parse_llm("42").unwrap();

// Parses as Text("hello")
let v2: Value = parse_llm(r#""hello""#).unwrap();

// Parses as List(...)
let v3: Value = parse_llm(r#"["a", "b"]"#).unwrap();
```

**Implied key** - Single-field structs unwrap values:
```rust
#[derive(Debug, LlmDeserialize)]
struct Wrapper {
    data: String,
}

// Direct string wraps into the single field
let w: Wrapper = parse_llm(r#""hello world""#).unwrap();
assert_eq!(w.data, "hello world");
```

```rust
// Parse with serde::Deserialize
fn parse<T: DeserializeOwned>(input: &str) -> Result<T>

// Parse with serde::Deserialize, get all candidates
fn parse_with_candidates<T: DeserializeOwned>(input: &str) -> Result<(T, Vec<FlexValue>)>

// Parse with custom parser configuration
fn parse_with_parser<T: DeserializeOwned>(input: &str, parser: &FlexibleParser) -> Result<T>
```

```rust
// Parse with LlmDeserialize trait (fuzzy matching, unions, etc.)
fn parse_llm<T: LlmDeserialize>(input: &str) -> Result<T>

// Parse with LlmDeserialize, get all candidates
fn parse_llm_with_candidates<T: LlmDeserialize>(input: &str) -> Result<(T, Vec<FlexValue>)>
```

```rust
// Score a candidate (lower is better)
fn score_candidate(candidate: &FlexValue) -> u32

// Rank candidates by score
fn rank_candidates(candidates: &mut [FlexValue])

// Get the best candidate
fn best_candidate(candidates: &[FlexValue]) -> Option<&FlexValue>
```
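A minimal sketch combining the candidate and scoring helpers, reusing the `User` struct from the quick start (the `tryparse::scoring` paths for `rank_candidates` and `best_candidate` are assumed, mirroring the `score_candidate` import used later):

```rust
use tryparse::parse_with_candidates;
use tryparse::scoring::{best_candidate, rank_candidates}; // assumed module path

let (user, mut candidates) = parse_with_candidates::<User>(r#"{name: "Alice", age: "30",}"#).unwrap();

// Sort candidates in place (best first), then inspect the winner.
rank_candidates(&mut candidates);
if let Some(best) = best_candidate(&candidates) {
    println!("parsed {:?} from {:?}", user, best.source());
}
```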
```text
          Input String
               ↓
┌──────────────────────────────────┐
│          Pre-Processing          │
│ • Remove BOM, zero-width chars   │
│ • Fix excessive nesting (>50)    │
│ • Normalize backslashes          │
└────────────┬─────────────────────┘
             ↓
┌──────────────────────────────────┐
│  Strategy Execution (parallel)   │
│  • DirectJson  (priority 1)      │
│  • Markdown    (priority 2)      │
│  • YAML        (priority 15)     │
│  • JsonFixer   (priority 20)     │
│  • Heuristic   (priority 30)     │
│                                  │
│  → Produces Vec<FlexValue>       │
└────────────┬─────────────────────┘
             ↓
┌──────────────────────────────────┐
│        Scoring & Ranking         │
│  • Base score by source          │
│  • Transformation penalties      │
│  • Confidence adjustment         │
│  • Sort ascending (best first)   │
└────────────┬─────────────────────┘
             ↓
┌──────────────────────────────────┐
│         Deserialization          │
│  • Try candidates in order       │
│  • Apply type coercion           │
│  • Track transformations         │
│  • Return first success          │
└──────────────────────────────────┘
```
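A rough end-to-end sketch of that pipeline (the exact number of candidates depends on which strategies fire):

```rust
use tryparse::parse_with_candidates;
use serde::Deserialize;

#[derive(Deserialize, Debug)]
struct User {
    name: String,
    age: u32,
}

// BOM and a zero-width space are stripped in pre-processing, the Markdown
// strategy extracts the fenced block, and scoring picks the cheapest candidate.
let input = "\u{FEFF}\u{200B}Here you go:\n```json\n{\"name\": \"Alice\", \"age\": 30}\n```";

let (user, candidates) = parse_with_candidates::<User>(input).unwrap();
println!("{:?} via {} candidate(s)", user, candidates.len());
```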
| Strategy | Priority | Description |
|---|---|---|
| DirectJson | 1 | Direct `serde_json::from_str()`. Fastest path for valid JSON. |
| Markdown | 2 | Extracts from markdown code blocks. Scores by keywords, position, size. |
| YAML | 15 | Parses YAML, converts to JSON. Requires the `yaml` feature. |
| JsonFixer | 20 | Fixes common JSON errors (see below). |
| Heuristic | 30 | Pattern-based extraction from prose. Last resort. |
The JsonFixer strategy handles the following (see the sketch after this list):

- Trailing commas: `{"a": 1,}` → `{"a": 1}`
- Unquoted keys: `{name: "x"}` → `{"name": "x"}`
- Single quotes: `{'a': 1}` → `{"a": 1}`
- Missing commas: `{"a":1 "b":2}` → `{"a":1,"b":2}`
- Unclosed braces/brackets: `{"a": 1` → `{"a": 1}`
- Comments: `{"a": 1 /* comment */}` → `{"a": 1}`
- Smart quotes: `{"a": “value”}` → `{"a": "value"}`
- Double-escaped JSON: `"{\"a\":1}"` → `{"a":1}`
- Template literals: `` {`key`: "value"} `` → `{"key": "value"}`
- Hex numbers: `{"a": 0xFF}` → `{"a": 255}`
- Unescaped newlines in strings
- JavaScript functions: removed entirely
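A small sketch of several of these fixes composing on a single input (illustrative, not verbatim library output):

```rust
use tryparse::parse;
use serde::Deserialize;

#[derive(Deserialize, Debug)]
struct Point {
    x: i64,
    y: i64,
}

// Unquoted key, single quotes, a comment, and a trailing comma,
// all of which the JsonFixer strategy is meant to repair.
let p: Point = parse("{x: 1, 'y': 2, /* note */ }").unwrap();
assert_eq!((p.x, p.y), (1, 2));
```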
Applied during deserialization (works with both `Deserialize` and `LlmDeserialize`); a sketch follows the table:
| From | To | Example |
|---|---|---|
| String | Number | "42" → 42 |
| String | Bool | "true" → true |
| Number | String | 42 → "42" |
| Float | Int | 42.0 → 42 |
| Single | Array | "item" → ["item"] |
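For example, the Number → String and Float → Int rows suggest a struct like this (hypothetical `Order` type, in the same spirit as the coercion example above):

```rust
use tryparse::parse;
use serde::Deserialize;

#[derive(Deserialize, Debug)]
struct Order {
    id: String, // 42 → "42"  (Number → String)
    qty: i64,   // 3.0 → 3    (Float → Int)
}

let order: Order = parse(r#"{"id": 42, "qty": 3.0}"#).unwrap();
```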
Normalizes field names to snake_case and matches case-insensitively:

| Struct Field | Matches JSON Keys |
|---|---|
| `user_name` | `userName`, `UserName`, `user-name`, `user.name`, `USER_NAME`, `username` |
| `max_count` | `maxCount`, `MaxCount`, `max-count`, `max.count`, `MAX_COUNT` |

Note: Acronyms are not handled perfectly. `XMLParser` normalizes to `x_m_l_parser`, not `xml_parser`.
**Base Scores** (by source):

- Direct JSON: 0
- Markdown: 10
- YAML: 15
- Fixed JSON: 20 + (5 × number of fixes)
- Heuristic: 50

**Transformation Penalties:**

- String→Number: +2
- Float→Int: +3
- Field rename: +4
- Single→Array: +5
- Default inserted: +50

**Confidence Modifier:**

- Each transformation reduces confidence by 5%
- Final score += `(1.0 - confidence) × 100`

Lower scores win. Direct JSON with no coercion scores 0 (best possible); a worked example follows.
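As a worked example under these rules: a candidate recovered by JsonFixer with two fixes starts at 20 + (5 × 2) = 30; one String→Number coercion adds +2; and if that coercion is the only tracked transformation, confidence drops to 0.95, adding (1.0 − 0.95) × 100 = 5, for a final score of 37. (How fixes and transformations interact in the count is an illustration here, not verbatim library behavior.)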
For example, a complete LLM response with surrounding prose parses directly:

````rust
let llm_output = r#"
Sure! Here's the user data:
```json
{
    "name": "Alice",
    "age": 30,
    "email": "alice@example.com"
}
```
Let me know if you need anything else!
"#;

#[derive(Deserialize, Debug)]
struct User {
    name: String,
    age: i64,
    email: String,
}

let user: User = parse(llm_output).unwrap();
````
### Inspecting Parse Candidates
```rust
use tryparse::{parse_with_candidates, scoring::score_candidate};

let (result, candidates) = parse_with_candidates::<User>(messy_input).unwrap();

println!("Best result: {:?}", result);
println!("\nAll candidates:");
for (i, candidate) in candidates.iter().enumerate() {
    println!("  {}: {:?} (score: {})",
        i,
        candidate.source(),
        score_candidate(candidate)
    );

    // Inspect transformations
    for t in candidate.transformations() {
        println!("    - {:?}", t);
    }
}
```
A custom parser can be configured with only the strategies you want:

```rust
use tryparse::parser::{FlexibleParser, strategies::*};

let parser = FlexibleParser::new()
    .with_strategy(DirectJsonStrategy)
    .with_strategy(MarkdownStrategy::new())
    .with_strategy(JsonFixerStrategy::new());

let data: User = parse_with_parser(input, &parser).unwrap();
```
`LlmDeserialize` also works on nested structures:

```rust
use std::collections::HashMap;

#[derive(Debug, LlmDeserialize)]
struct Project {
    name: String,
    owner: User,
    status: Status,
    tags: Vec<String>,
    metadata: HashMap<String, String>,
}

let project: Project = parse_llm(complex_json).unwrap();
```
Union enums also work with struct variants:

```rust
#[derive(Debug, LlmDeserialize)]
#[llm(union)]
enum Response {
    Success { data: User },
    Error { message: String, code: i64 },
    Pending { estimated_time: i64 },
}

// Automatically picks the variant that best matches the structure
let response: Response = parse_llm(api_response).unwrap();

match response {
    Response::Success { data } => println!("User: {:?}", data),
    Response::Error { message, code } => eprintln!("Error {}: {}", code, message),
    Response::Pending { estimated_time } => println!("Wait {}s", estimated_time),
}
```
```toml
# Default: includes markdown and yaml
[dependencies]
tryparse = "0.3"

# Minimal build (core JSON parsing only)
tryparse = { version = "0.3", default-features = false }

# With derive macros for LlmDeserialize
tryparse = { version = "0.3", features = ["derive"] }

# All features
tryparse = { version = "0.3", features = ["derive", "markdown", "yaml"] }
```

Available features:

- `markdown` (default) - Markdown code block extraction
- `yaml` (default) - YAML parsing support
- `derive` - Derive macro for `LlmDeserialize` (fuzzy field/enum matching, union types)
```bash
# Unit tests (227 tests in lib)
cargo test --lib

# All tests (lib + integration + doc tests)
cargo test --all-features

# Minimal build tests
cargo test --no-default-features

# Run specific test
cargo test --test integration_test -- --nocapture
```

- Parsing is synchronous: No async/await support
- Memory overhead: Tracks all parsing candidates and transformations
- Strategy execution: Some strategies run in parallel
- Regex compilation: Expensive regexes are compiled lazily and cached
- Best for: <1MB inputs, occasional parsing (not high-frequency loops)
Optimizations:
- Direct JSON (valid JSON) takes the fastest path
- Failed strategies short-circuit early
- Scoring is lazy (only computed when needed)
- Copy semantics for small types (enums, etc.)
Enable detailed logging:
```rust
// RUST_LOG must be set before the logger is initialized
std::env::set_var("RUST_LOG", "tryparse=debug");
env_logger::init();

let result = parse::<User>(input);
```

Inspect what went wrong:
```rust
match parse::<User>(input) {
    Ok(user) => println!("Success: {:?}", user),
    Err(e) => {
        eprintln!("Parse failed: {:?}", e);

        // Try getting candidates to see what was parsed
        if let Ok((_, candidates)) = parse_with_candidates::<User>(input) {
            for c in candidates {
                eprintln!("Candidate: {:?}", c.value);
            }
        }
    }
}
```

- Synchronous only - No async parsing
- No streaming - Requires complete input string
- Memory overhead - Tracks all candidates and transformations
- Acronym handling - `XMLParser` → `x_m_l_parser` (not `xml_parser`)
- Best-effort parsing - May produce unexpected results on ambiguous input
- No custom deserializers - Can't implement custom `Deserialize` logic for fields
Requirements:
- Rust 1.85.0+
- Run `cargo fmt` before committing
- Pass `cargo clippy --all-targets --all-features`
- All tests must pass: `cargo test --all-features`
Pull request checklist:
- Clear description of what and why
- Tests for new functionality
- Update README if API changes
- No clippy warnings
- All existing tests pass
Apache-2.0
Parsing algorithms inspired by BAML's Schema-Aligned Parsing.