clarify amount parsing/rendering/commodity directives #793
Here are some notes and actions aimed at improving amount parsing, rendering, and commodity directives, from this mail thread which spun off from #698. Other issues that might be affected:
(Here's a shorter summary of the points below.)
Problems with amount parsing/rendering in current master:
clarify how commodity directives can control both input and output
parse decimal separators more carefully
simplify D directive if it gives any trouble
I haven't been participating more in this discussion because I know I have no experience designing syntax and have little experience with hledger. But since there aren't yet any other comments, it might be worth saying something, even if only to summarize your points.
If I understand correctly, the major proposed short-term change to hledger's parser is to associate a unique decimal separator with every commodity, so that the numerical value of commoditized amounts is unambiguous. (Output styles and user documentation are addressed as well, but I will focus on the parser, since that is the only part of hledger I know anything about.)
The decimal separator for a commodity will be determined by a commodity directive, if present; otherwise, it will be determined by the first encountered amount of that commodity. In either case, the example amount must contain a separator that can be interpreted as a decimal separator (where we will assume that a separator is a decimal separator wherever we can), or else an error will be thrown (?).
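To make sure I am reading the rule correctly, here is a rough Python sketch of it (not hledger code; `infer_decimal_mark` and `decimal_mark_for` are hypothetical names, and the inputs are simplified to plain digit strings with `.` and `,` as the only possible separators):

```python
import re
from typing import Dict, List, Optional, Tuple

def infer_decimal_mark(number: str) -> Optional[str]:
    """Guess the decimal mark from one example number, or None if there is none."""
    marks = re.findall(r"[.,]", number)
    if not marks:
        return None              # e.g. "1000": no separator at all
    if len(set(marks)) == 2:
        return marks[-1]         # e.g. "1.000,00": the last mark is the decimal one
    if len(marks) > 1:
        return None              # e.g. "1,000,000": only digit group separators
    return marks[0]              # a lone mark: assume it is a decimal mark

def decimal_mark_for(commodity: str,
                     directives: Dict[str, str],
                     amounts: List[Tuple[str, str]]) -> str:
    """Use the commodity directive's example if present, else the first amount seen."""
    example = directives.get(commodity)
    if example is None:
        example = next((n for c, n in amounts if c == commodity), None)
    mark = infer_decimal_mark(example) if example is not None else None
    if mark is None:
        raise ValueError(f"cannot determine decimal separator for {commodity!r}")
    return mark
```

With this reading, a directive example of `1.000,00` for EUR gives `,` even if the first EUR amount in the journal is written `5.5`, and a commodity whose only example is `1000` is rejected.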
The medium-term objectives seem to be refinements of the main idea of explicitly declaring a decimal separator.
It looks like the focus on the decimal separator, as opposed to e.g. the digit group separators, is sound since the decimal separator is the only separator that affects the interpretation of an amount.
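A small illustration of why only the decimal separator matters, written as a Python sketch rather than anything from hledger (`parse_number` is a hypothetical helper, and it assumes `.` and `,` are the only candidate marks):

```python
from decimal import Decimal

def parse_number(text: str, decimal_mark: str) -> Decimal:
    """Parse a number once the decimal mark is known; the other mark is a group separator."""
    group_mark = "," if decimal_mark == "." else "."
    # Group separators carry no value, so they can simply be dropped.
    cleaned = text.replace(group_mark, "").replace(" ", "")
    return Decimal(cleaned.replace(decimal_mark, "."))

# The same text yields different values depending on the decimal mark:
thousand = parse_number("1,000", ".")   # comma treated as a group separator
one = parse_number("1,000", ",")        # comma treated as the decimal mark
```

Here `thousand` equals 1000 and `one` equals 1, while moving or removing group separators never changes the value.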
Given that we want to throw errors when encountering decimal separator inconsistency, the proposed short-term changes seem minimal, which is good. No new syntax is introduced, for instance.
This is a breaking change for users that have mixed amount styles for a single commodity in a single journal, but perhaps this is not common practice?
Also, are there other feasible ways to disambiguate the parsing of amounts? I can't imagine an alternative to the proposed change that would be better for maintaining backwards and sideways compatibility. One alternative I can imagine, one that would be less sideways compatible, would be to have the user choose an amount parsing style from a number of pre-set styles. In particular, these styles would determine the decimal separator, or the absence thereof. The idea is that, since there are only a finite number of conventions for numbers or monetary amounts, we might be able to write specialized parsers for each one.
If we are indeed able to implement parsers for all (or at least most?) conventional systems, I imagine it would be convenient for the user to simply choose one of them; rather than learning the rules for specification, they would only need to learn the name of the system they already know. I also think it could be beneficial to restrict the syntax for amounts to that of conventional systems, in particular for the purpose of exporting or otherwise communicating hledger data. Furthermore, by using specialized parsers, we might be able to more cleanly handle "exotic" systems (e.g. checking that the Japanese digit group separators in issue #796 appear in a certain order and are not repeated). For backwards compatibility, maybe we could retain the current amount parser (or the currently proposed amount parser?) as the default parser.
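As a sketch of what I mean by pre-set styles, in Python for brevity (the style names `en`, `de` and `plain` are made up for illustration and are not existing hledger options; each style fixes a group separator and a decimal separator and is checked strictly):

```python
import re
from decimal import Decimal

# Hypothetical style table: name -> (group separator, decimal separator).
STYLES = {
    "en": (",", "."),     # 1,234.50
    "de": (".", ","),     # 1.234,50
    "plain": ("", "."),   # 1234.50, no digit grouping
}

def parse_with_style(text: str, style: str) -> Decimal:
    """Parse text under one named style, rejecting anything that does not fit it."""
    group, dec = STYLES[style]
    d = re.escape(dec)
    if group:
        g = re.escape(group)
        # Either properly grouped digits (groups of three) or ungrouped digits.
        integer = rf"(?:\d{{1,3}}(?:{g}\d{{3}})+|\d+)"
    else:
        integer = r"\d+"
    if not re.fullmatch(rf"{integer}(?:{d}\d+)?", text):
        raise ValueError(f"{text!r} is not a valid {style!r} number")
    if group:
        text = text.replace(group, "")
    return Decimal(text.replace(dec, "."))
```

A malformed amount such as `1,23.4` is rejected under `en` instead of being quietly read as some other number, which is the kind of strictness a specialized parser could also apply to the ordering rules of more exotic systems.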
Thanks for the input @awjchen. I would say it this way: Currently the input decimal separator is detected or guessed for each individual amount. This is too loose, allowing numbers to be quietly misparsed. As a workaround we allow commodity directives to specify it unambiguously, but this has several problems due to unclear semantics and scope. Overall the current setup causes confusion and bug reports. The proposal aims to make this more intuitive and robust by parsing decimal separators more strictly and tightening up the semantics:
1. Input decimal separator will be detected or guessed within each file, and for each commodity in that file. (We don't really need or want it to vary across commodities, but given current syntax it's easier to allow that.)
2. While parsing the amounts in a file, any inconsistency with the file-wide input decimal separator, or any guessing required, will be reported as a warning or error.
3. Commodity directives will have clearer semantics (first one sets output decimal separator, most recent one in parse stream sets input decimal separator) and scope (affecting subsequent entries in the current file only).
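Points 1 and 2 could look roughly like the following Python sketch. This is an illustration under simplifying assumptions, not the planned implementation: `classify` and `check_file` are hypothetical names, amounts arrive as `(line, commodity, number)` tuples, and a lone mark followed by exactly three digits is treated as the only case that needs guessing.

```python
import re
from typing import List, Optional, Tuple

def classify(number: str) -> Tuple[Optional[str], bool]:
    """Return (decimal mark implied by this number, whether that was a guess)."""
    marks = re.findall(r"[.,]", number)
    if not marks:
        return None, False
    if len(set(marks)) == 2:
        return marks[-1], False                        # "1,000.00": last mark is decimal
    if len(marks) > 1:
        return ("." if marks[0] == "," else ","), False  # "1,000,000": mark is a group separator
    digits_after = len(number) - number.rindex(marks[0]) - 1
    return marks[0], digits_after == 3                 # "1,000" could be either: a guess

def check_file(amounts: List[Tuple[int, str, str]]) -> List[str]:
    """Detect one decimal mark per commodity in a file; report guesses and conflicts."""
    expected = {}
    warnings = []
    for line, commodity, number in amounts:
        mark, guessed = classify(number)
        if mark is None:
            continue
        if commodity not in expected:
            if guessed:
                warnings.append(f"line {line}: guessed decimal mark {mark!r} for {commodity}")
            expected[commodity] = mark
        elif not guessed and mark != expected[commodity]:
            warnings.append(
                f"line {line}: {number!r} conflicts with decimal mark "
                f"{expected[commodity]!r} for {commodity}")
    return warnings

warnings = check_file([
    (1, "$", "1,000.00"),   # establishes "." for $
    (2, "$", "2,500"),      # ambiguous alone, resolved by the file-wide mark
    (3, "$", "3,50"),       # comma can only be a decimal mark here: conflict
    (4, "EUR", "1,000"),    # first EUR amount, and it has to be guessed
])
```

In this example two messages are produced, one for the conflict on line 3 and one for the guess on line 4; whether each should be a warning or a hard error is left open, as in point 2.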
Another example of confusing behavior, via @bradyt. Not yet understood.
[Opened as #1091].
Related: currently it seems not possible for hledger to display a decimal mark different from the one in the journal. Eg, you can't print reports with a
The multi-commodity-directive behaviour proposed above (first one sets input decimal mark, most recent sets output decimal mark) would allow it, as would a dedicated