Description
rustfava: query-time serializer silently drops beancount features
Companion to issue 143. That one covers the visible crash; this one is the silent half of the same root cause.
When rustfava executes BQL it re-serializes parsed entries to source text
and hands that text to rledger
(rustfava.rustledger.query._entries_to_source /
_directive_to_source). That serializer only handles a subset of each
directive's fields. The rest — tags, links, metadata, posting flags,
cost basis {...}, per-posting prices @/@@, booking methods, balance
tolerances, and so on — are dropped on the floor.
The result is silent data corruption. No exception, no error in the
Fava sidebar; BQL just returns wrong answers against a degraded copy of
the ledger.
Repro
repro.py
ledger.beancount parses cleanly with rledger check. Run
$ ~/.local/share/uv/tools/rustfava/bin/python3 repro.py
…and you get a unified diff between the original ledger and the source
rustfava feeds rledger, followed by five BQL queries that all return the
wrong thing.
Diff: original vs. regenerated
--- ledger.beancount (original)
+++ rustfava regenerated source
@@ -1,35 +1,16 @@
-2024-01-01 open Assets:US:Bank USD
-2024-01-01 open Assets:US:Brokerage "STRICT"
-2024-01-01 open Assets:DE:Bank EUR
-2024-01-01 open Equity:Opening USD,EUR
+2024-01-01 open Assets:US:Bank USD
+2024-01-01 open Assets:US:Brokerage
+2024-01-01 open Assets:DE:Bank EUR
+2024-01-01 open Equity:Opening USD EUR
2024-01-01 commodity AAPL
- name: "Apple Inc."
- asset-class: "stock"
-
-2024-02-15 * "Wise" "USD->EUR transfer" #fx-2024 ^transfer-batch-12
- category: "international"
- ! Assets:US:Bank -1000.00 USD
- confidence: "high"
- Assets:DE:Bank 900.00 EUR @@ 1000.00 USD
+2024-02-15 * "Wise" "USD->EUR transfer"
+ Assets:US:Bank -1000.00 USD
+ Assets:DE:Bank 900.00 EUR
2024-03-20 * "Schwab" "Buy 10 AAPL"
- Assets:US:Brokerage 10 AAPL {170.50 USD, 2024-03-20}
+ Assets:US:Brokerage 10 AAPL
Assets:US:Bank -1705.00 USD
-
-2024-07-04 * "Marriott" "Vegas stay" #vacation-2024 ^trip-vegas
+2024-07-04 * "Marriott" "Vegas stay"
Assets:US:Bank -250.00 USD
Expenses:Hotel 250.00 USD
-
-2024-12-31 balance Assets:DE:Bank 900.00 ~ 0.05 EUR
+2024-12-31 balance Assets:DE:Bank 900.00 EUR
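
repro.py itself is not reproduced in this report. As a rough sketch of the diff step only, a unified diff like the one above can be produced with the standard-library difflib module. The helper name ledger_diff and the two sample strings are illustrative assumptions; how the regenerated text is obtained from rustfava is deliberately left out.

```python
import difflib


def ledger_diff(original_text, regenerated_text):
    """Return a unified diff between the original ledger and the regenerated source.

    Hypothetical helper for illustration; not part of rustfava or repro.py.
    """
    return "".join(
        difflib.unified_diff(
            original_text.splitlines(keepends=True),
            regenerated_text.splitlines(keepends=True),
            fromfile="ledger.beancount (original)",
            tofile="rustfava regenerated source",
        )
    )


# Sample input: one transaction header before and after regeneration.
original = '2024-07-04 * "Marriott" "Vegas stay" #vacation-2024 ^trip-vegas\n'
regenerated = '2024-07-04 * "Marriott" "Vegas stay"\n'

print(ledger_diff(original, regenerated))
```

Any line that appears with a leading - but has no equivalent + line carrying the same fields is a candidate for a dropped feature.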
Queries that silently lie
| query | expected | observed |
| --- | --- | --- |
| SELECT date, narration WHERE 'vacation-2024' IN tags | 1 row: the Marriott stay | 0 rows (tag dropped) |
| SELECT date, narration WHERE 'trip-vegas' IN links | 1 row: the Marriott stay | 0 rows (link dropped) |
| SELECT date, narration WHERE META('category') = 'international' | 1 row: the Wise transfer | 0 rows (metadata dropped) |
| SELECT account, units(position), cost(position) WHERE account ~ 'Brokerage' | 10 AAPL with cost basis 170.50 USD / 2024-03-20 | row returned, but cost(position) is None |
| SELECT sum(convert(position, 'USD')) WHERE date = 2024-02-15 | ≈ 0 USD (the @@ price makes the txn balance) | {'USD': -1000.00, 'EUR': 900.00} (@@ dropped, postings no longer balance in USD) |
The full output (including verbatim Python rows) is reproducible with
repro.py and shown at the bottom of this report.
Why each one matters
- @@ total price is the worst. In the Wise transfer, the @@ 1000.00 USD
  is what makes the cross-currency posting balance. After regeneration,
  the transaction has -1000 USD reconciled against +900 EUR with no
  price annotation, which is unbalanced in money terms.
  Sum-after-conversion queries silently use the price database (or no
  price at all) instead of the explicit per-transaction rate the user
  wrote down.
- Cost basis {price, date} is how beancount tracks "I bought 10 shares
  at $170.50 on 2024-03-20." Drop it and capital-gains math returns
  nonsense. For anyone using beancount for taxes, this is a
  showstopper.
- Tags and links are the primary way users filter in Fava. #tax-2024,
  #vacation-2024, ^invoice-42 all stop matching after regeneration.
- Directive and posting metadata is how plugins, importers and Fava
  extensions add structured context (payee-id:, category:, ML
  classifier confidence, …). Dropped on regen, invisible to BQL.
- Per-posting flag ! marks "this single leg needs attention." Drop it
  and the leg looks reconciled.
- Balance tolerance ~ is essential for FX accounts where rounding makes
  exact balances impossible. Drop it and assertions flip red on a
  three-cent discrepancy.
- open booking method ("STRICT", "FIFO", "LIFO", etc.) controls how
  lots are matched at sale time. Drop it and capital-gains reporting
  changes silently.
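
To make the @@ point concrete, here is a small sketch of the balancing arithmetic for the Wise transfer using plain Decimal values. It does not call beancount, rustfava or rledger; the weight and residual helpers are hypothetical stand-ins for the rule that a posting with a total price contributes that total price, rather than its own units, to the transaction balance.

```python
from collections import defaultdict
from decimal import Decimal


def weight(units, currency, total_price=None):
    """Return the (number, currency) a posting contributes to the balance check.

    Hypothetical helper: with an @@ total price the posting weighs in at the
    price amount (sign follows the units); without one it weighs in as itself.
    """
    if total_price is None:
        return units, currency
    price_number, price_currency = total_price
    sign = 1 if units >= 0 else -1
    return sign * price_number, price_currency


def residual(postings):
    """Sum posting weights per currency and keep only non-zero leftovers."""
    totals = defaultdict(Decimal)
    for units, currency, total_price in postings:
        number, cur = weight(units, currency, total_price)
        totals[cur] += number
    return {cur: n for cur, n in totals.items() if n != 0}


# The Wise transfer as written in the ledger (with @@ 1000.00 USD) ...
original = [
    (Decimal("-1000.00"), "USD", None),
    (Decimal("900.00"), "EUR", (Decimal("1000.00"), "USD")),
]
# ... and as regenerated, with the price annotation dropped.
regenerated = [
    (Decimal("-1000.00"), "USD", None),
    (Decimal("900.00"), "EUR", None),
]

print(residual(original))     # {}
print(residual(regenerated))  # {'USD': Decimal('-1000.00'), 'EUR': Decimal('900.00')}
```

The second result has the same shape as the observed output of the fx-convert query: two unrelated currency totals instead of a zero residual.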
Relationship to the escape bug
The escape bug filed as
issue 143 is the visible
manifestation of the same underlying design issue: rustfava maintains a
duplicate, hand-written serializer for beancount source. The escape bug
takes the whole regenerator down with parse error: unexpected input;
this one lets the regenerator succeed but corrupts the data on the way
through.
A _bean_str() style patch fixes the crash but leaves the silent
correctness gap. Closing the gap by adding handlers for every dropped
field is possible but reinvents work beancount has already done. The
sustainable fix is the same one suggested at the bottom of the sibling
report: stop maintaining a second serializer.
Suggested fix
Delegate to beancount.parser.printer.EntryPrinter. It already handles
every field listed above correctly and is the same code path
bean-format uses.
EntryPrinter dispatches by class name
(getattr(self, type(entry).__name__)), so the existing
format_entry() would work, except that rustfava's RLTransaction /
RLOpen / … don't carry the same class names as beancount's
Transaction / Open / …, so the lookup misses them. Two minimal ways
to fix that:
- Register the RL* types by subclassing EntryPrinter once and
  aliasing the methods (or writing tiny adapters where field names
  diverge):
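from beancount.parser.printer import EntryPrinter


class _RLEntryPrinter(EntryPrinter):
    # EntryPrinter looks up a method named after the entry's class, so each
    # RL* name is aliased to the stock method for the matching directive.
    # Aliasing assumes the RL* types expose the same field names; where they
    # diverge, a small adapter method is needed instead.
    RLTransaction = EntryPrinter.Transaction
    RLOpen = EntryPrinter.Open
    RLClose = EntryPrinter.Close
    RLBalance = EntryPrinter.Balance
    RLPrice = EntryPrinter.Price
    RLCommodity = EntryPrinter.Commodity
    RLEvent = EntryPrinter.Event
    RLNote = EntryPrinter.Note
    RLDocument = EntryPrinter.Document
    RLPad = EntryPrinter.Pad
    RLQuery = EntryPrinter.Query
    RLCustom = EntryPrinter.Custom


_printer = _RLEntryPrinter()


def _entries_to_source(entries):
    # Calling the printer on an entry returns its beancount source text.
    return "".join(_printer(e) for e in entries)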
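- Make RL* types subclasses of beancount's NamedTuple-based
  directives. Because the dispatch goes through type(entry).__name__,
  the subclasses would also need to report beancount's class names
  (Transaction, Open, …) for the stock format_entry() to work
  untouched; registering them as virtual subclasses does not change
  that name lookup.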
Either path replaces ~90 lines of _directive_to_source with something
an order of magnitude smaller, and inherits every escape rule and edge
case the beancount maintainers have already worked out.
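
The name-based dispatch described above can be illustrated without beancount installed. The sketch below uses toy stand-in classes (ToyPrinter, Transaction, RLTransaction and RLSubTransaction are all hypothetical names, not the real beancount or rustfava types) and assumes only the getattr(self, type(entry).__name__) lookup quoted earlier. It suggests that a differently named class misses the lookup whether or not it subclasses the original, and that aliasing the method name closes the gap.

```python
class Transaction:
    """Stand-in for a beancount directive (hypothetical, not the real class)."""

    def __init__(self, narration):
        self.narration = narration


class RLTransaction:
    """Stand-in for a rustfava type: same fields, different class name."""

    def __init__(self, narration):
        self.narration = narration


class RLSubTransaction(Transaction):
    """Subclass of the stand-in directive; its __name__ is still its own."""


class ToyPrinter:
    """Dispatches on the entry's class name, like the lookup quoted above."""

    def __call__(self, entry):
        method = getattr(self, type(entry).__name__)
        return method(entry)

    def Transaction(self, entry):
        return f'* "{entry.narration}"\n'


class ToyRLPrinter(ToyPrinter):
    # Alias the RL* names to the existing method, as in option 1.
    RLTransaction = ToyPrinter.Transaction
    RLSubTransaction = ToyPrinter.Transaction


def try_print(printer, entry):
    """Return the printed text, or None if the name lookup misses."""
    try:
        return printer(entry)
    except AttributeError:
        return None


print(try_print(ToyPrinter(), Transaction("Vegas stay")))         # printed
print(try_print(ToyPrinter(), RLTransaction("Vegas stay")))       # None
print(try_print(ToyPrinter(), RLSubTransaction("Vegas stay")))    # None
print(try_print(ToyRLPrinter(), RLTransaction("Vegas stay")))     # printed
```

Whether the real RL* types can be aliased directly or need adapters depends on how closely their field names match beancount's, which this toy example does not model.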
Repro output (verbatim)
loaded 10 entries from ledger.beancount, 0 load errors
query: tags-filter
SQL: SELECT date, narration WHERE 'vacation-2024' IN tags
expected: 1 row: the 2024-07-04 Marriott transaction
observed: 0 row(s)
query: links-filter
SQL: SELECT date, narration WHERE 'trip-vegas' IN links
expected: 1 row: the 2024-07-04 Marriott transaction
observed: 0 row(s)
query: metadata-filter
SQL: SELECT date, narration WHERE META('category') = 'international'
expected: 1 row: the 2024-02-15 Wise transfer
observed: 0 row(s)
query: cost-basis
SQL: SELECT account, units(position), cost(position) WHERE account ~ 'Brokerage'
expected: 10 AAPL @ 170.50 USD cost basis
observed: 1 row(s)
('Assets:US:Brokerage', RLAmount(number=Decimal('10'), currency='AAPL'), None)
query: fx-convert
SQL: SELECT sum(convert(position, 'USD')) WHERE date = 2024-02-15
expected: ≈ 0 USD — the @@ price makes the transaction balance
observed: 1 row(s)
({'USD': Decimal('-1000.00'), 'EUR': Decimal('900.00')},)
Environment
| component | version |
| --- | --- |
| rustfava | main (f43a15b/40d0e8c/8d54804, 2026-01-24) through v1.30.12 |
| rledger | 0.15.0 |
| Python | 3.14.5 |
| OS | macOS 15.6.1 (Darwin arm64) |
Steps to Reproduce
ledger.beancount parses cleanly with rledger check. Run
$ ~/.local/share/uv/tools/rustfava/bin/python3 repro.py
…and you get a unified diff between the original ledger and the source
rustfava feeds rledger, followed by five BQL queries that all return the
wrong thing.
Rustfava Version
1.30.12, main
Python Version
3.14.5
Operating System
macOS
Additional Context
No response