query-time serializer silently drops beancount features #144

@martin-c

Description

rustfava: query-time serializer silently drops beancount features

Companion to issue 143. That one fixes the visible crash; this one is the silent half of the same root cause.

When rustfava executes BQL it re-serializes parsed entries to source text
(rustfava.rustledger.query._entries_to_source /
_directive_to_source) and hands that text to rledger. That serializer
handles only a subset of each directive's fields. Everything else (tags,
links, metadata, posting flags, cost basis {...}, per-posting prices
@/@@, booking methods, balance tolerances, and so on) is dropped without
warning.

The result is silent data corruption. No exception, no error in the
Fava sidebar; BQL just returns wrong answers against a degraded copy of
the ledger.

Repro

repro.py

ledger.beancount parses cleanly with rledger check. Run

$ ~/.local/share/uv/tools/rustfava/bin/python3 repro.py

…and you get a unified diff between the original ledger and the source
rustfava feeds rledger, followed by five BQL queries that all return the
wrong thing.
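
The comparison step needs nothing beyond the standard library. The following is a minimal sketch, not the attached repro.py: `regenerated` stands in for whatever `_entries_to_source` returns, both strings are hypothetical excerpts of the ledger, and `source_diff` is a helper name invented for illustration.

```python
import difflib

# Hypothetical excerpts: `original` is what the user wrote, `regenerated`
# stands in for the text the serializer hands to rledger.
original = (
    '2024-07-04 * "Marriott" "Vegas stay" #vacation-2024 ^trip-vegas\n'
    "  Assets:US:Bank  -250.00 USD\n"
    "  Expenses:Hotel  250.00 USD\n"
)
regenerated = (
    '2024-07-04 * "Marriott" "Vegas stay"\n'
    "  Assets:US:Bank  -250.00 USD\n"
    "  Expenses:Hotel  250.00 USD\n"
)


def source_diff(before: str, after: str) -> list[str]:
    """Return unified-diff lines between two beancount source strings."""
    return list(
        difflib.unified_diff(
            before.splitlines(keepends=True),
            after.splitlines(keepends=True),
            fromfile="ledger.beancount (original)",
            tofile="rustfava regenerated source",
        )
    )


diff_lines = source_diff(original, regenerated)
print("".join(diff_lines), end="")
```

An empty diff means the round trip preserved the source text; any `-` line carrying a tag, link, or annotation that has no matching `+` line is a dropped feature.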

Diff: original vs. regenerated

--- ledger.beancount (original)
+++ rustfava regenerated source
@@ -1,35 +1,16 @@
-2024-01-01 open Assets:US:Bank          USD
-2024-01-01 open Assets:US:Brokerage     "STRICT"
-2024-01-01 open Assets:DE:Bank          EUR
-2024-01-01 open Equity:Opening          USD,EUR
+2024-01-01 open Assets:US:Bank USD
+2024-01-01 open Assets:US:Brokerage
+2024-01-01 open Assets:DE:Bank EUR
+2024-01-01 open Equity:Opening USD EUR
 2024-01-01 commodity AAPL
-  name: "Apple Inc."
-  asset-class: "stock"
-
-2024-02-15 * "Wise" "USD->EUR transfer" #fx-2024 ^transfer-batch-12
-  category: "international"
-  ! Assets:US:Bank          -1000.00 USD
-    confidence: "high"
-    Assets:DE:Bank            900.00 EUR @@ 1000.00 USD
+2024-02-15 * "Wise" "USD->EUR transfer"
+  Assets:US:Bank  -1000.00 USD
+  Assets:DE:Bank  900.00 EUR
 2024-03-20 * "Schwab" "Buy 10 AAPL"
-  Assets:US:Brokerage       10 AAPL {170.50 USD, 2024-03-20}
+  Assets:US:Brokerage  10 AAPL
   Assets:US:Bank  -1705.00 USD
-
-2024-07-04 * "Marriott" "Vegas stay" #vacation-2024 ^trip-vegas
+2024-07-04 * "Marriott" "Vegas stay"
   Assets:US:Bank  -250.00 USD
   Expenses:Hotel  250.00 USD
-
-2024-12-31 balance Assets:DE:Bank   900.00 ~ 0.05 EUR
+2024-12-31 balance Assets:DE:Bank 900.00 EUR

Queries that silently lie

| query | expected | observed |
|---|---|---|
| SELECT date, narration WHERE 'vacation-2024' IN tags | 1 row: the Marriott stay | 0 rows; tag dropped |
| SELECT date, narration WHERE 'trip-vegas' IN links | 1 row: the Marriott stay | 0 rows; link dropped |
| SELECT date, narration WHERE META('category') = 'international' | 1 row: the Wise transfer | 0 rows; metadata dropped |
| SELECT account, units(position), cost(position) WHERE account ~ 'Brokerage' | 10 AAPL with cost basis 170.50 USD / 2024-03-20 | row returned, but cost(position) is None |
| SELECT sum(convert(position, 'USD')) WHERE date = 2024-02-15 | ≈ 0 USD: the @@ price makes the txn balance | {'USD': -1000.00, 'EUR': 900.00}; @@ dropped, postings no longer balance in USD |

The full output (including verbatim Python rows) is reproducible with
repro.py and shown at the bottom of this report.

Why each one matters

  • @@ total price is the worst. In the Wise transfer the @@ 1000.00 USD is what makes the cross-currency posting balance. After
    regeneration, the transaction has -1000 USD reconciled against
    +900 EUR with no price annotation — that's unbalanced in money
    terms. Sum-after-conversion queries silently use the price database (or
    no price at all) instead of the explicit per-transaction rate the user
    wrote down.

  • Cost basis {price, date} is how beancount tracks "I bought 10
    shares at $170.50 on 2024-03-20." Drop it and capital-gains math
    returns nonsense. For anyone using beancount for taxes, this is a
    showstopper.

  • Tags and links are the primary way users filter in Fava. #tax-2024,
    #vacation-2024, ^invoice-42 — they all stop matching after
    regeneration.

  • Directive and posting metadata is how plugins, importers and Fava
    extensions add structured context (payee-id:, category:, ML
    classifier confidence, …). Dropped on regen, invisible to BQL.

  • Per-posting flag ! marks "this single leg needs attention." Drop
    it and the leg looks reconciled.

  • Balance tolerance ~ is essential for FX accounts where rounding
    makes exact balances impossible. Drop it and assertions flip red on a
    three-cent discrepancy.

  • open booking method ("STRICT", "FIFO", "LIFO", etc.)
    controls how lots are matched at sale time. Drop it and capital-gains
    reporting changes silently.
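
The balancing arithmetic behind the first two bullets can be checked by hand. The sketch below assumes beancount's documented weight rules: a posting with a cost weighs units times per-unit cost, and a posting with an @@ total price weighs that total, signed like the units. `Posting`, `weight`, and `residual` are hypothetical names for illustration, not rustfava or beancount types.

```python
from collections import defaultdict
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass
class Posting:  # hypothetical stand-in, not a beancount/rustfava type
    units: Decimal
    currency: str
    cost: Optional[tuple[Decimal, str]] = None         # per-unit cost {...}
    total_price: Optional[tuple[Decimal, str]] = None  # @@ annotation


def weight(p: Posting) -> tuple[Decimal, str]:
    """Balancing weight of a posting: cost first, then price, else units."""
    if p.cost is not None:
        number, currency = p.cost
        return p.units * number, currency
    if p.total_price is not None:
        number, currency = p.total_price
        return number.copy_sign(p.units), currency
    return p.units, p.currency


def residual(postings: list[Posting]) -> dict[str, Decimal]:
    """Sum of weights per currency, with zero totals removed."""
    totals: dict[str, Decimal] = defaultdict(Decimal)
    for p in postings:
        number, currency = weight(p)
        totals[currency] += number
    return {c: n for c, n in totals.items() if n != 0}


D = Decimal
wise_original = [
    Posting(D("-1000.00"), "USD"),
    Posting(D("900.00"), "EUR", total_price=(D("1000.00"), "USD")),
]
wise_regenerated = [  # same transaction after the @@ annotation is lost
    Posting(D("-1000.00"), "USD"),
    Posting(D("900.00"), "EUR"),
]
aapl_original = [
    Posting(D("10"), "AAPL", cost=(D("170.50"), "USD")),
    Posting(D("-1705.00"), "USD"),
]

print(residual(wise_original))     # empty: the legs cancel in USD
print(residual(wise_regenerated))  # USD and EUR legs no longer cancel
print(residual(aapl_original))     # empty: 10 x 170.50 offsets 1705.00
```

The regenerated Wise transaction leaves the same -1000.00 USD / 900.00 EUR residual that the fx-convert query reports.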

Relationship to the escape bug

The escape bug filed as issue 143 is the visible
manifestation of the same underlying design issue: rustfava maintains a
duplicate, hand-written serializer for beancount source. The escape bug
takes the whole regenerator down with parse error: unexpected input;
this one lets the regenerator succeed but corrupts the data on the way
through.

A _bean_str() style patch fixes the crash but leaves the silent
correctness gap. Closing the gap by adding handlers for every dropped
field is possible but reinvents work beancount has already done. The
sustainable fix is the same one suggested at the bottom of the sibling
report: stop maintaining a second serializer.

Suggested fix

Delegate to beancount.parser.printer.EntryPrinter. It already handles
every field listed above and is the code path behind beancount's own
printer.format_entry() and printer.print_entries().

EntryPrinter dispatches by class name
(getattr(self, type(entry).__name__)), so the existing
format_entry() would work — except rustfava's RLTransaction /
RLOpen / … aren't subclasses of beancount's Transaction / Open / …,
so the dispatch table misses them. Two minimal ways to fix that:

  1. Register the RL* types by subclassing EntryPrinter once and
    aliasing the methods (or writing tiny adapters where field names
    diverge):

    from beancount.parser.printer import EntryPrinter

    class _RLEntryPrinter(EntryPrinter):
        # EntryPrinter looks up a method named after the entry's class,
        # so expose each stock handler under the matching RL* name.
        RLTransaction = EntryPrinter.Transaction
        RLOpen        = EntryPrinter.Open
        RLClose       = EntryPrinter.Close
        RLBalance     = EntryPrinter.Balance
        RLPrice       = EntryPrinter.Price
        RLCommodity   = EntryPrinter.Commodity
        RLEvent       = EntryPrinter.Event
        RLNote        = EntryPrinter.Note
        RLDocument    = EntryPrinter.Document
        RLPad         = EntryPrinter.Pad
        RLQuery       = EntryPrinter.Query
        RLCustom      = EntryPrinter.Custom

    # One shared instance; calling it returns one entry as source text.
    _printer = _RLEntryPrinter()

    def _entries_to_source(entries):
        """Render entries back to beancount source via the stock printer."""
        return "".join(_printer(e) for e in entries)
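
  2. Make RL* types subclasses of beancount's NamedTuple-based
    directives, or register them as virtual subclasses. Then the stock
    format_entry() works untouched, provided the resulting classes still
    report the beancount class names (Transaction, Open, …), since
    dispatch is by type(entry).__name__.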

Either path replaces ~90 lines of _directive_to_source with something
an order of magnitude smaller, and inherits every escape rule and edge
case the beancount maintainers have already worked out.
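
The dispatch miss and the aliasing fix can be reproduced without beancount installed. The toy below mirrors the name-based lookup described above. `ToyPrinter`, `RLToyPrinter`, and the two-field `Open` / `RLOpen` tuples are hypothetical stand-ins, not the real classes, and the sketch assumes the RL* types expose the same attribute names as their beancount counterparts.

```python
import io
from typing import NamedTuple


class Open(NamedTuple):    # stand-in for beancount's Open directive
    date: str
    account: str


class RLOpen(NamedTuple):  # stand-in for rustfava's type: same fields
    date: str
    account: str


class ToyPrinter:
    """Dispatches on type(entry).__name__, like the lookup described above."""

    def __call__(self, entry) -> str:
        out = io.StringIO()
        getattr(self, type(entry).__name__)(entry, out)
        return out.getvalue()

    def Open(self, entry, out) -> None:
        out.write(f"{entry.date} open {entry.account}\n")


class RLToyPrinter(ToyPrinter):
    # Alias: a lookup of "RLOpen" now lands on the existing Open handler.
    RLOpen = ToyPrinter.Open


entry = RLOpen("2024-01-01", "Assets:US:Bank")

try:
    ToyPrinter()(entry)
except AttributeError as exc:
    # The stock printer has no method named "RLOpen".
    print("stock printer:", exc)

print(RLToyPrinter()(entry), end="")
```

The alias only works where field names line up; a directive whose RL* fields diverge would need a small adapter method instead of a plain assignment.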

Repro output (verbatim)

loaded 10 entries from ledger.beancount, 0 load errors

  query: tags-filter
  SQL:   SELECT date, narration WHERE 'vacation-2024' IN tags
  expected: 1 row: the 2024-07-04 Marriott transaction
  observed: 0 row(s)

  query: links-filter
  SQL:   SELECT date, narration WHERE 'trip-vegas' IN links
  expected: 1 row: the 2024-07-04 Marriott transaction
  observed: 0 row(s)

  query: metadata-filter
  SQL:   SELECT date, narration WHERE META('category') = 'international'
  expected: 1 row: the 2024-02-15 Wise transfer
  observed: 0 row(s)

  query: cost-basis
  SQL:   SELECT account, units(position), cost(position) WHERE account ~ 'Brokerage'
  expected: 10 AAPL @ 170.50 USD cost basis
  observed: 1 row(s)
            ('Assets:US:Brokerage', RLAmount(number=Decimal('10'), currency='AAPL'), None)

  query: fx-convert
  SQL:   SELECT sum(convert(position, 'USD')) WHERE date = 2024-02-15
  expected: ≈ 0 USD — the @@ price makes the transaction balance
  observed: 1 row(s)
            ({'USD': Decimal('-1000.00'), 'EUR': Decimal('900.00')},)

Environment

| component | version |
|---|---|
| rustfava | main (f43a15b/40d0e8c/8d54804, 2026-01-24) through v1.30.12 |
| rledger | 0.15.0 |
| Python | 3.14.5 |
| OS | macOS 15.6.1 (Darwin arm64) |

Rustfava Version

1.30.12, main

Python Version

3.14.5

Operating System

macOS
