Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

clarify amount parsing/rendering/commodity directives #793

Open
simonmichael opened this issue Jun 1, 2018 · 6 comments
Open

clarify amount parsing/rendering/commodity directives #793

simonmichael opened this issue Jun 1, 2018 · 6 comments
Labels
A-WISH Some kind of improvement request, hare-brained proposal, or plea. journal The journal file format, and its features. needs:code To unblock: needs code/code updates needs:docs To unblock: needs corresponding documentation or doc updates needs:tests To unblock: needs more automated tests or test updates

Comments

@simonmichael
Copy link
Owner

simonmichael commented Jun 1, 2018

Here are some notes and actions aimed at improving amount parsing, rendering, and commodity directives, from this mail thread which spun off from #698. Other issues that might be affected:
#489 #561 #688

(Here's a shorter summary of the points below.)

problems

Problems with amount parsing/rendering in current master:

  • numbers are parsed loosely, accepting any of two decimal separators
  • we don't warn about inconsistent choice of decimal separator across amounts
  • ambiguous-separator amounts can be silently misparsed (a single digit group separator is interpreted as a decimal separator)
  • commodity directives, the recommended solution, have unclear semantics
    • they are used for: declaring commodity symbols, resolving input decimal separator ambiguity, controlling output style
    • the directive's scope for each of of these is non-obvious. Should subfiles be affected ? other files ? transactions in the same file preceding the directive ?
  • commodity directives used to resolve the input decimal separator also limit output style
    • they force digit group separators to appear in output
    • they force the output decimal separator to match the input
  • D directives do all that commodity directives do and more, adding complexity
  • allowing commodities to have different input decimal separators within a file is excessive flexibility
  • allowing commodities to have different output decimal separator within a report is excessive flexibility
  • lack of clarity makes this annoying to learn, costly to support; ongoing issue reports

goals

  • be convenient and intuitive
  • be i18n-aware
  • detect and report errors, avoid guessing
  • don't require learning detailed rules
  • in every situation, do something that's sensible at least in hindsight
  • keep as much backward and sideways compatibility as possible
  • end confusion and bug reports about basic number parsing and rendering

short term

clarify how commodity directives can control both input and output

  • a directive for a commodity sets its input decimal point for the rest of the current file, exclusive of subfiles
  • a directive for a commodity declares its symbol's validity for the rest of the current file, exclusive of subfiles
  • the first directive for a commodity across all files sets its output style in the report

parse decimal separators more carefully

  • commodities and input decimal separators are always declared together, and per file
  • at the start of each file (and each included file), no commodities or input decimal separators are known
  • parsing a commodity directive
    • declares the commodity and its input decimal separator for the current file
    • the directive's amount must have a separator
    • if the separator is ambiguous is it assumed to be the decimal separator. Could display a warning.
    • if this is the first directive for this commodity among all files: also declares the commodity's output style for the report
  • parsing a definite-separator amount whose commodity is not yet declared, has the same effect as a commodity directive. Could display a warning. In a future strict mode, this will raise an error.
  • parsing an ambiguous-separator amount whose commodity is not yet declared, has the same effect as a commodity directive and also displays a warning (one per file)
  • parsing an amount whose decimal separator is inconsistent with the one declared for its commodity, raises an error

clarify docs

  • "use commodity directives in each file to help parse it correctly, declaring commodity symbols and the decimal separator used"
  • "use commodity directives in the first/uppermost file to help control each commodity's output style, declaring the symbol position, digit groups, decimal separator, and number of decimal places"

simplify D directive if it gives any trouble

  • D only specifies a default symbol, eg: D $ . The old syntax is accepted for backwards compatibility but only the symbol matters.

medium term

  • add a decimal or decimal-separator directive to set that once per file. "decimal ,"
  • add an alternate simpler form of commodity directive that just defines a symbol. "commodity $". And perhaps allow declaring multiple symbols with one directive.
  • use system locale to choose a default output decimal separator
  • add an --amount-style command line option that overrides output style, for individual commodities or all commodities.
  • add a strict mode that does more error checking
@simonmichael simonmichael added A-WISH Some kind of improvement request, hare-brained proposal, or plea. journal The journal file format, and its features. help wanted needs:docs To unblock: needs corresponding documentation or doc updates needs:tests To unblock: needs more automated tests or test updates needs:code To unblock: needs code/code updates labels Jun 1, 2018
@awjchen
Copy link
Collaborator

awjchen commented Jun 10, 2018

I haven't been participating more in this discussion because I know I have no experience designing syntax and have little experience with hledger. But since there aren't yet any other comments, it might be worth saying something, even if only to summarize your points.

Summary

If I understand correctly, the major proposed short-term change to hledger's parser are to associate a unique decimal separator to every commodity so that the numerical value of commoditized amounts is unambiguous. (Output styles and user documentation are addressed as well, but I will focus on the parser, since that is the only part of hledger I know anything about.)

The decimal separator for a commodity will be determined by a commodity directive, if present; otherwise, it the decimal separator for a commodity will be determined by the first encountered amount of that commodity. In either case, the example amount must contain a separator that can be interpreted as a decimal separator (where we will assume that a separator is a decimal separator wherever we can), or else an error will be thrown (?).

The medium-term objectives seem to be refinements of the main idea of explicitly declaring a decimal separator.

Comments

It looks like the focus on the decimal separator, as opposed to e.g. the digit group separators, is sound since the decimal separator is the only separator that affects the interpretation of an amount.

Given that we want to throw errors when encountering decimal separator inconsistency, the proposed short-term changes seem minimal, which is good. No new syntax is introduced, for instance.

This is a breaking change for users that have mixed amount styles for a single commodity in a single journal, but perhaps this is not common practice?

Alternatives

Also, are there other feasible alternatives to disambiguating the parsing of amounts? I can't imagine an alternative to the proposed change that would be better for maintaining backwards and sideways compatibility. One alternative I can imagine, one that would be less sideways compatible, would be to have the user choose amount parsing styles from a number of pre-set styles. In particular, these styles would determine the decimal separator, or the absence thereof. The idea is that, since there are only a finite number of conventions for numbers or monetary amounts, we might be able to write specialized parsers for each one.

If we are indeed able to implement parsers for all (or at least most?) conventional systems, I imagine it would be convenient for the user to simply choose one of them; rather than learning the rules for specification, they would only need to learn the name of the system they already know. I also think it could be beneficial to restrict the syntax for amounts to that of conventional systems, in particular for the purpose of exporting or otherwise communicating hledger data. Furthermore, by using specialized parsers, we might be able to more cleanly handle "exotic" systems (e.g. checking that the Japanese digit group separators in issue #796 appear in a certain order and are not repeated). For backwards compatibility, maybe we could retain the current amount parser (or the currently proposed amount parser?) as the default parser.

@simonmichael
Copy link
Owner Author

simonmichael commented Jun 11, 2018 via email

@alerque
Copy link
Collaborator

alerque commented Jun 20, 2019

Cross linking related comment on symbols in currencies.

Repository owner deleted a comment from awchen Jul 11, 2019
@simonmichael
Copy link
Owner Author

simonmichael commented Sep 25, 2019

Another example of confusing behavior, via @bradyt. Not yet understood.

; problems with amount parsing/display
; cf https://github.com/simonmichael/hledger/issues/793

2019/09/24
    a                  2,000.00
    b                  1,000
    c

2019/09/26
    (d)             2000,00

comment

$ hledger print
2019/09/24
    a       2,000,000
    b           1,000
    c

2019/09/26
    (d)       2,000,000

$ hledger -f a.j reg a amt:'2000'
2019/09/24                   a                    2,000,000     2,000,000

[Opened as #1091].

simonmichael added a commit that referenced this issue Sep 28, 2019
And if they did, the stats command would now throw an error.

Changed:
journalApplyCommodityStyles
journalInferCommodityStyles
commodityStylesFromAmounts
@simonmichael
Copy link
Owner Author

Related: currently it seems not possible for hledger to display a decimal mark different from the one in the journal. Eg, you can't print reports with a . decimal point from this journal:

2020-01-01
    (a)         $1,23

The multi-commodity-directive behaviour proposed above (first one sets input decimal mark, most recent sets output decimal mark) would allow it, as would a dedicated decimal or decimal-mark directive.

@simonmichael
Copy link
Owner Author

https://hledger.org/csv.html#decimal-mark rule added in master, for CSV files. We might want to add it to journal format also.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-WISH Some kind of improvement request, hare-brained proposal, or plea. journal The journal file format, and its features. needs:code To unblock: needs code/code updates needs:docs To unblock: needs corresponding documentation or doc updates needs:tests To unblock: needs more automated tests or test updates
Projects
None yet
Development

No branches or pull requests

3 participants