Unescaped quotes in _directive_to_source corrupt BQL source #143

@martin-c

Description

rustfava: unescaped quotes in _directive_to_source corrupt BQL source

Every BQL query — including the implicit ones Fava runs for /statistics/
and the saved-query sidebar — fails with

Query parse error: parse error: unexpected input.

whenever the ledger contains a transaction whose payee or narration includes
a literal " (correctly escaped with \" in the source file, as the
beancount grammar requires).

Why

rustfava.rustledger.query is the adapter that "replaces beanquery": it
runs BQL by handing source text to rledger over the WASM bridge instead of
walking already-parsed Directive objects. Before each query it
re-serializes the entries through _entries_to_source /
_directive_to_source. That regenerator wraps user strings with naive
f-strings:

header = f'{date} {flag} "{payee}" "{narration}"'

" and \ inside payee / narration are not escaped. A single
transaction with an inner " therefore round-trips to invalid beancount
(three quoted runs instead of two), rledger rejects the whole regenerated
source, and every subsequent query — not just ones touching that
transaction — surfaces the same generic parse error.

The same hole exists for event, note, document, query, and
custom directives.
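A minimal sketch of the failure mode, assuming the parsed narration holds the already-unescaped value (a bare " with no backslash). The variable values are stand-ins taken from the repro ledger; only the f-string shape comes from the snippet above.

```python
# Stand-in values; in rustfava these would come from the parsed transaction.
date, flag = "2024-02-12", "*"
payee = "Amazon"
narration = 'Roloc 2" Disc Pad Assembly'  # parsed value: the \" is now a bare "

# Same shape as the naive header construction quoted above.
header = f'{date} {flag} "{payee}" "{narration}"'
print(header)

# Balanced string literals need an even number of quotes; this line has five.
print(header.count('"'))
```

The inner quote closes the narration early, which is why the line stops being valid beancount.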

This is rustfava-specific: upstream Fava executes queries via beanquery
directly on the parsed entry objects, with no re-serialization step.

Repro

repro.py

ledger.beancount (one transaction with an escaped inner quote, parses
cleanly with rledger check and bean-check):

option "title" "Quote-escaping repro"
option "operating_currency" "USD"

2024-01-01 open Assets:Bank          USD
2024-01-01 open Expenses:Hardware    USD

2024-02-12 * "Amazon" "Roloc 2\" Disc Pad Assembly"
  Assets:Bank        -10.00 USD
  Expenses:Hardware   10.00 USD

repro.py loads it with rustfava's loader, dumps what
_entries_to_source produces, then runs a trivial SELECT.

$ ~/.local/share/uv/tools/rustfava/bin/python3 repro.py
loaded 3 entries from ledger.beancount, 0 load errors

--- regenerated source rustfava feeds to rledger ---
2024-01-01 open Assets:Bank USD
2024-01-01 open Expenses:Hardware USD
2024-02-12 * "Amazon" "Roloc 2" Disc Pad Assembly"
  Assets:Bank  -10.00 USD
  Expenses:Hardware  10.00 USD
--- end regenerated source ---

running a trivial query …
  FAIL: ParseError: parse error: unexpected input

Note line 3 of the regenerated source: the inner " was emitted verbatim
instead of being escaped to \", so the narration string ends early after
Roloc 2 and the line is no longer valid beancount.
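To make the difference between the two lines concrete, here is a small scanner sketch. It is a simplified model (strings are delimited by unescaped double quotes, and a backslash escapes the next character), not rledger's or beancount's actual lexer; `scan_strings` is a hypothetical name.

```python
def scan_strings(line: str) -> tuple[list[str], bool]:
    """Return (string literals found, True if the line ends inside a string)."""
    literals: list[str] = []
    buf: list[str] = []
    in_string = False
    i = 0
    while i < len(line):
        ch = line[i]
        if not in_string:
            if ch == '"':          # opening quote
                in_string, buf = True, []
        elif ch == "\\" and i + 1 < len(line):
            buf.append(line[i + 1])  # keep the escaped character
            i += 1
        elif ch == '"':            # closing quote
            literals.append("".join(buf))
            in_string = False
        else:
            buf.append(ch)
        i += 1
    return literals, in_string

original = r'2024-02-12 * "Amazon" "Roloc 2\" Disc Pad Assembly"'
regenerated = '2024-02-12 * "Amazon" "Roloc 2" Disc Pad Assembly"'
print(scan_strings(original))     # two literals, line ends outside a string
print(scan_strings(regenerated))  # narration cut short, trailing quote left open
```

Under this model the original line yields two complete literals, while the regenerated one yields a truncated narration and an unterminated string at the end of the line.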

Expected

running a trivial query …
  OK: 2 rows
    ('Assets:Bank', {'USD': Decimal('-10.00')})
    ('Expenses:Hardware', {'USD': Decimal('10.00')})

Suggested fix (small)

Add a small helper and use it for every user-supplied string in
_directive_to_source:

def _bean_str(s: str) -> str:
    """Quote a string for safe inclusion in beancount source.

    Escapes the only two characters that have meaning inside a beancount
    string literal: backslash and double quote.
    """
    s = (s or "").replace("\\", "\\\\").replace('"', '\\"')
    return f'"{s}"'

Call sites that need updating (all in
rustfava/rustledger/query.py:_directive_to_source):

| directive | from | to |
| --- | --- | --- |
| transaction | f'... "{payee}" "{narration}"' | f'... {_bean_str(payee)} {_bean_str(narration)}' |
| transaction | f'... "{narration}"' | f'... {_bean_str(narration)}' |
| event | f'{date} event "{event_type}" "{desc}"' | f'{date} event {_bean_str(event_type)} {_bean_str(desc)}' |
| note | f'{date} note {account} "{comment}"' | f'{date} note {account} {_bean_str(comment)}' |
| document | f'{date} document {account} "{filename}"' | f'{date} document {account} {_bean_str(filename)}' |
| query | f'{date} query "{name}" "{query_string}"' | f'{date} query {_bean_str(name)} {_bean_str(query_string)}' |
| custom | " ".join(f'"{v}"' for v in values) + header | " ".join(_bean_str(v) for v in values) + _bean_str(custom_type) |

If you prefer keeping escaping out of the call sites entirely, a
string.Formatter subclass with a !q conversion centralizes the rule
behind a custom conversion flag:

import string

class _BeanFormatter(string.Formatter):
    def convert_field(self, value, conversion):
        if conversion == "q":
            s = "" if value is None else str(value)
            s = s.replace("\\", "\\\\").replace('"', '\\"')
            return f'"{s}"'
        return super().convert_field(value, conversion)

_bf = _BeanFormatter().format
header = _bf('{} {} {!q} {!q}', date, flag, payee, narration)

Both produce identical, rledger-valid output for the repro above; pick
whichever fits the project's style.
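The equivalence claim can be checked with a short script. This is a sketch that copies both variants from above and feeds them the repro transaction's values; nothing here touches rustfava or rledger.

```python
import string

def _bean_str(s: str) -> str:
    s = (s or "").replace("\\", "\\\\").replace('"', '\\"')
    return f'"{s}"'

class _BeanFormatter(string.Formatter):
    def convert_field(self, value, conversion):
        if conversion == "q":  # custom conversion: quote and escape
            s = "" if value is None else str(value)
            s = s.replace("\\", "\\\\").replace('"', '\\"')
            return f'"{s}"'
        return super().convert_field(value, conversion)

_bf = _BeanFormatter().format

date, flag = "2024-02-12", "*"
payee, narration = "Amazon", 'Roloc 2" Disc Pad Assembly'

via_helper = f"{date} {flag} {_bean_str(payee)} {_bean_str(narration)}"
via_formatter = _bf("{} {} {!q} {!q}", date, flag, payee, narration)

assert via_helper == via_formatter
print(via_formatter)
```

One difference worth knowing when choosing: the formatter variant passes non-string values through str(), while _bean_str as written expects a string or None.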

Suggested fix (deeper)

"Forgot to escape one character" is the symptom. The root cause is that
rustfava maintains a second serializer (_entries_to_source /
_directive_to_source) when beancount already has a canonical,
battle-tested one in beancount.parser.printer.EntryPrinter /
format_entry(). The duplicate serializer doesn't just miss escaping; it
also drops postings with cost basis (@@, {...}), metadata, links,
tags, balance tolerances, and a few other corners. Patching the escape
rule fixes the immediate crash but leaves the broader correctness gap.

EntryPrinter already dispatches by class name
(getattr(self, type(entry).__name__)), so the existing
beancount.parser.printer.format_entry() would work, except that
rustfava's RLTransaction / RLOpen / … do not carry the names
Transaction / Open / …, so the name lookup finds no matching method.
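The lookup miss can be reproduced without beancount installed. The sketch below uses a toy printer that copies only the quoted dispatch expression; `ToyPrinter`, `Open`, and `RLOpen` are stand-ins, not the real classes.

```python
import io
from typing import NamedTuple

class Open(NamedTuple):    # stand-in for beancount's Open
    date: str
    account: str

class RLOpen(NamedTuple):  # stand-in for rustfava's type: same fields, other name
    date: str
    account: str

class ToyPrinter:
    """Toy model of name-based dispatch, not beancount's EntryPrinter."""

    def __call__(self, entry) -> str:
        method = getattr(self, type(entry).__name__)  # looked up by class name
        out = io.StringIO()
        method(entry, out)
        return out.getvalue()

    def Open(self, entry, out) -> None:
        out.write(f"{entry.date} open {entry.account}\n")

printer = ToyPrinter()
print(printer(Open("2024-01-01", "Assets:Bank")), end="")

try:
    printer(RLOpen("2024-01-01", "Assets:Bank"))
except AttributeError as exc:
    print("dispatch miss:", exc)
```

The second call fails even though the two tuples have identical fields, because only the class name takes part in the lookup.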

Two minimally-invasive ways to use the canonical printer instead:

  1. Register the RL* types. Subclass EntryPrinter once and alias the
    methods:

    class _RLEntryPrinter(EntryPrinter):
        RLTransaction = EntryPrinter.Transaction
        RLOpen        = EntryPrinter.Open
        RLClose       = EntryPrinter.Close
        RLBalance     = EntryPrinter.Balance
        RLPrice       = EntryPrinter.Price
        RLCommodity   = EntryPrinter.Commodity
        RLEvent       = EntryPrinter.Event
        RLNote        = EntryPrinter.Note
        RLDocument    = EntryPrinter.Document
        RLPad         = EntryPrinter.Pad
        RLQuery       = EntryPrinter.Query
        RLCustom      = EntryPrinter.Custom

    …and replace the body of _entries_to_source with a loop that calls
    one _RLEntryPrinter instance on each entry and joins the results
    (EntryPrinter.__call__ renders a single directive, not a list). This
    assumes the RL* namedtuples expose the same field names as
    beancount's; where they don't, the alias becomes a small adapter
    method instead of a plain assignment.

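  2. Make RL* types subclasses of beancount's NamedTuple-based
    directives. Because the printer looks methods up by
    type(entry).__name__, the subclasses would also need to report the
    beancount class names (Transaction, Open, …); registering them as
    virtual subclasses would not change the name and so would not be
    enough on its own. With matching names, the stock format_entry()
    works untouched.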
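Both options can be tried in miniature against a toy name-dispatching printer. Everything below is a stand-in (`ToyPrinter`, `RLToyPrinter`, `RLOpenSub`, `entries_to_source` are hypothetical names); it shows the aliasing pattern and the per-entry join, and says nothing about how the real RL* fields line up with beancount's.

```python
import io
from typing import NamedTuple

class Open(NamedTuple):      # stand-in for beancount's Open
    date: str
    account: str

class RLOpen(NamedTuple):    # stand-in for a rustfava type with matching fields
    date: str
    account: str

class RLOpenSub(Open):       # option 2 in miniature: a subclass with a new name
    pass

class ToyPrinter:
    """Toy name-dispatching printer, not beancount's EntryPrinter."""

    def __call__(self, entry) -> str:
        out = io.StringIO()
        getattr(self, type(entry).__name__)(entry, out)
        return out.getvalue()

    def Open(self, entry, out) -> None:
        out.write(f"{entry.date} open {entry.account}\n")

class RLToyPrinter(ToyPrinter):
    RLOpen = ToyPrinter.Open  # option 1: alias the existing method

def entries_to_source(entries) -> str:
    printer = RLToyPrinter()
    # The printer renders one entry per call, so join the pieces.
    return "".join(printer(entry) for entry in entries)

print(entries_to_source([RLOpen("2024-01-01", "Assets:Bank")]), end="")

# In this toy, subclassing alone leaves the lookup name unchanged.
print(hasattr(ToyPrinter(), RLOpenSub.__name__))
```

In this toy the alias is a one-line change per directive type, whereas the subclass route additionally needs the lookup name to match.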

Either path replaces the entire ~90-line _directive_to_source with
something that's an order of magnitude smaller and inherits every escape
and edge case the beancount maintainers have already encountered.

Environment

| component | version |
| --- | --- |
| rustfava | main (f43a15b/40d0e8c/8d54804, 2026-01-24) through v1.30.12 |
| rledger | 0.15.0 |
| Python | 3.14.5 |
| OS | macOS 15.6.1 (Darwin arm64) |

Rustfava Version

v1.30.12, main

Python Version

3.14.5

Operating System

macOS
