Balance Assignments and accounts resetting #438

Merged
merged 13 commits into from Dec 10, 2016

Projects

None yet

3 participants

@johannesgerer
Contributor
johannesgerer commented Dec 7, 2016 edited

This is my first version (untested). I will be testing it tomorrow.

Be invited to start reviewing it now.

See #129 #288 #157

@johannesgerer
Contributor

It compiles now and passes all tests (including new ones).

Notes: It mimics ledger's behavior in ignoring temporal order and strictly using parse order for both the transactions and the postings.

Core functionality in concisely implemented in Hledger.Data.Journal.inferOrCheckPosting taking only 7 lines.

@johannesgerer johannesgerer changed the title from Balance Assignments and accounts resetting (preview) to Balance Assignments and accounts resetting Dec 7, 2016
@simonmichael
Owner

Great!

First comment, I'm sorry but abandoning temporal order won't work. Ledger's parse-order behaviour is easier to implement but more confusing for users, hledger and Beancount have it right here.

@johannesgerer
Contributor

I understand. Temporal order is nice to have. I implemented it, with the restriction, that assignments (i.e. postings with assertions but no amount) are only allowed in transactions where postings have no pdate defined. The more general problem is much more complex.

Time complexity should not worse than before.

@simonmichael
Owner

Nice! Quick thoughts:

Would you mind squashing these into one or more logical patches for easy review.

There's a typo you probably noticed already (cf travis).

Your restriction sounds quite reasonable.

How necessary is unsafeFreeze ? Someone (me) may overlook it and cause a bug in future.

Could you add haddocks for a few more of the new top-level functions ? One sentence is fine. It may seem redundant but I find it quite helpful.

@johannesgerer
Contributor

Ok. unsafeFreeze is gone.

I have added haddock to everything.

And cleaned it up in 5 separate commits.

@simonmichael

Overall I like this a lot. I think it is not easy code for future hledger hackers to understand, eg at handleObject I gave up (for now). But I'm not sure much can be done about it; the feature is inherently complex. It seems to me you've done a good job encapsulating everything in a principled and compact way, and perhaps we will find small improvements in future making it more transparent. And your tests give confidence. I'll do some additional testing of my own next. Great work @johannesgerer.

@@ -43,7 +43,7 @@ hledger print -f personal.journal -f business.journal -f alias.journal -f person
# 3. files can be of different formats
-hledger print -f personal.journal -f a.timeclock -f b.timedot
+hledger print -f personal.journal -f ../journal/a.timeclock -f ../journal/b.timedot
@simonmichael
simonmichael Dec 8, 2016 Owner

I usually run these tests with "make test", which enables shelltestrunner's --execdir flag which sets the working directory. How are you running them ? I'm not sure if adding this relative path makes them more robust or more fragile.

@johannesgerer
johannesgerer Dec 8, 2016 Contributor

They failed with make test. This change is required, because these journal are in a different directory. (And the way, they were relative even before the change.)

@simonmichael
simonmichael Dec 9, 2016 edited Owner

Doh! I have uncommitted symlinks. Thanks.

@@ -463,8 +470,8 @@ journalApplyAliases aliases j@Journal{jtxns=ts} =
-- check balance assertions.
journalFinalise :: ClockTime -> FilePath -> Text -> Bool -> ParsedJournal -> Either String Journal
journalFinalise t path txt assrt j@Journal{jfiles=fs} = do
- (journalNumberAndTieTransactions <$>
- (journalBalanceTransactions $
+ (journalTieTransactions <$>
@simonmichael
simonmichael Dec 8, 2016 Owner

Can you explain more why numbering transactions is no longer needed ? Is it not used for eg the transaction ids in print -O csv ?

@johannesgerer
johannesgerer Dec 8, 2016 Contributor

The numbering now is performed in journalBalanceTransactions, which uses it to get back the parsing order after iterating over time ordered postings.

hledger-lib/Hledger/Data/Types.hs
@@ -198,7 +198,7 @@ data Posting = Posting {
pcomment :: Text, -- ^ this posting's comment lines, as a single non-indented multi-line string
ptype :: PostingType,
ptags :: [Tag], -- ^ tag names and values, extracted from the comment
- pbalanceassertion :: Maybe MixedAmount, -- ^ optional: the expected balance in the account after this posting
+ pbalanceassertion :: Maybe Amount, -- ^ optional: the expected balance in the account after this posting
@simonmichael
simonmichael Dec 8, 2016 Owner

IIRC the doc should be changed to "the expected balance in this commodity in the account after this posting". I can fix it later.

@johannesgerer
johannesgerer Dec 8, 2016 Contributor

i fixed it.

@@ -136,3 +136,140 @@ hledger -f - stats
# >>> /Transactions/
# >>>2
# >>>=0
+
@simonmichael
simonmichael Dec 8, 2016 Owner

Tests! These make me very happy :)

@simonmichael
simonmichael Dec 8, 2016 edited Owner

Would you like to add at least a rough draft of the user docs for this ? In the journal format manual next to balance assertions, I guess. Or I can do it if you prefer.

@johannesgerer
johannesgerer Dec 8, 2016 Contributor

I think it would be beneficial, I you changed the docs, because you see the whole picture and this also helps check again, if everything works as expected.

@simonmichael
simonmichael Dec 9, 2016 Owner

Ok will do.

@simonmichael
Owner

This assertion fails with the new code:

1/1
    a      $1
    b      -1 zorkmids

1/2
    a     $-1 = $0
    b
@johannesgerer
Contributor

Ok, thanks for spotting this! I did not handle prices correctly.

It now removes all prices before checking any balances. It this ok?

The test case is now included in the tests and works!

@johannesgerer
Contributor
johannesgerer commented Dec 9, 2016 edited

What can I do to make it more accessible?

handleObject receives different objects ordered by date. It can handle either a single posting (from an already balanced transaction without assigments) or a whole transaction with assignments (which is required to no posting with pdate set.).

For a single posting, there is not much to be done. Only add its amount to its account and check the assertion, if there is one. This functionality is provided by addAmountAndCheckBalance.

For a whole transaction, it loops over all postings, and performs addAmountAndCheckBalance, if there is an amount. If there is no amount, the amount is inferred by the assertion or left empty if there is no assertion. Then, the transaction is balanced, the amount added to the balance (all in balanceTransactionUpdate) and the resulting transaction with no missing amounts is stores in the array, for later retrieval.

@simonmichael
Owner

Thanks for the fix. Re code clarity, I would look at names first, I often find more informative names for things after a little time passes. Here's a few top-of-the-head ideas:

journalBalanceTransactions' -> journalBalanceTransactionsStatefully, journalProcessAssignmentsAndAssertions, ... ? By the way the presence of the boolean assrt parameter makes me wonder if checking assertions can be separated out and done as an outer step after processing assignments, or not ?

BalanceState -> BalanceStateM, BalancingComputation or something, hinting that it's a monad not a state value ?

toDated, "This converts a transaction into a list of objects" - actually it returns a computation generating that list, right ? toDated -> findAssertionDependencies ?

handleObject -> ? I can't come up with a clear name for this, which is usually a bad sign. It seems to do two fairly different jobs, doesn't it ? Maybe it can be refactored. And/or add your helpful comment above to it.

After names feel good, I would look at code, to see if the hardest functions could have an easier to understand implementation.

But you have good haddocks in place and the code doesn't affect other modules, so none of this will delay merging.

@simonmichael
Owner

Here's some benchmarking, of the balance command with the 10k sample file (10k transactions, 0 assertions, 0 assignments):

$ quickbench -N 3 -w hledger,hledger-assignments 'hledger -f data/10000x1000x10.journal balance' 'hledger -f data/10000x1000x10.journal balance --ignore-assertions'
Running 2 tests 1 times with 2 executables at 2016-12-09 13:31:00 PST:

Best times 1:
+-----------------------------------------------------------++---------+---------------------+
|                                                           || hledger | hledger-assignments |
+===========================================================++=========+=====================+
| -f data/10000x1000x10.journal balance                     ||    1.82 |                1.90 |
| -f data/10000x1000x10.journal balance --ignore-assertions ||    1.69 |                1.90 |
+-----------------------------------------------------------++---------+---------------------+

Best times 2:
+-----------------------------------------------------------++---------+---------------------+
|                                                           || hledger | hledger-assignments |
+===========================================================++=========+=====================+
| -f data/10000x1000x10.journal balance                     ||    1.84 |                1.90 |
| -f data/10000x1000x10.journal balance --ignore-assertions ||    1.74 |                1.89 |
+-----------------------------------------------------------++---------+---------------------+

Best times 3:
+-----------------------------------------------------------++---------+---------------------+
|                                                           || hledger | hledger-assignments |
+===========================================================++=========+=====================+
| -f data/10000x1000x10.journal balance                     ||    1.81 |                2.00 |
| -f data/10000x1000x10.journal balance --ignore-assertions ||    1.75 |                1.91 |
+-----------------------------------------------------------++---------+---------------------+

And of my personal journal (1k transactions, 140 assertions, 0 assignments):

$ quickbench -N 3 -w hledger,hledger-assignments 'hledger balance' 'hledger balance --ignore-assertions'Running 2 tests 1 times with 2 executables at 2016-12-09 13:32:26 PST:

Best times 1:
+-----------------------------++---------+---------------------+
|                             || hledger | hledger-assignments |
+=============================++=========+=====================+
| balance                     ||    0.49 |                0.47 |
| balance --ignore-assertions ||    0.43 |                0.48 |
+-----------------------------++---------+---------------------+

Best times 2:
+-----------------------------++---------+---------------------+
|                             || hledger | hledger-assignments |
+=============================++=========+=====================+
| balance                     ||    0.42 |                0.48 |
| balance --ignore-assertions ||    0.40 |                0.42 |
+-----------------------------++---------+---------------------+

Best times 3:
+-----------------------------++---------+---------------------+
|                             || hledger | hledger-assignments |
+=============================++=========+=====================+
| balance                     ||    0.41 |                0.45 |
| balance --ignore-assertions ||    0.38 |                0.44 |
+-----------------------------++---------+---------------------+

Here's another run including the 100k sample file:

Best times:
+--------------------------------------------++---------+---------------------+
|                                            || hledger | hledger-assignments |
+============================================++=========+=====================+
| -f data/100x100x10.journal print           ||    0.06 |                0.06 |
| -f data/1000x1000x10.journal print         ||    0.31 |                0.32 |
| -f data/10000x1000x10.journal print        ||    2.84 |                3.04 |
| -f data/100000x1000x10.journal print       ||   29.63 |               33.51 |
| -f data/100000x1000x10.journal print ff    ||   17.48 |               18.22 |
| -f data/100x100x10.journal register        ||    0.07 |                0.06 |
| -f data/1000x1000x10.journal register      ||    0.36 |                0.36 |
| -f data/10000x1000x10.journal register     ||    3.41 |                3.49 |
| -f data/100000x1000x10.journal register    ||   35.97 |               35.96 |
| -f data/100000x1000x10.journal register ff ||   17.88 |               20.36 |
| -f data/100x100x10.journal balance         ||    0.06 |                0.07 |
| -f data/1000x1000x10.journal balance       ||    0.31 |                0.28 |
| -f data/10000x1000x10.journal balance      ||    1.97 |                2.05 |
| -f data/100000x1000x10.journal balance     ||   19.96 |               20.69 |
| -f data/100000x1000x10.journal balance ff  ||   18.39 |               20.37 |
+--------------------------------------------++---------+---------------------+

I think these numbers are saying the new code brings a slight performance penalty, whether you use assertions or not. With luck it can be optimised further, if not I think it's not a blocker.

@simonmichael
Owner

I should say, I slightly mistrust that last run. I'll post another one shortly.

@simonmichael
Owner
$ quickbench -w hledger,hledger-assignments
Running 15 tests 1 times with 2 executables at 2016-12-09 13:44:16 PST:

Best times:
+--------------------------------------------++---------+---------------------+
|                                            || hledger | hledger-assignments |
+============================================++=========+=====================+
| -f data/100x100x10.journal print           ||    0.07 |                0.05 |
| -f data/1000x1000x10.journal print         ||    0.33 |                0.32 |
| -f data/10000x1000x10.journal print        ||    2.88 |                3.06 |
| -f data/100000x1000x10.journal print       ||   30.72 |               32.19 |
| -f data/100000x1000x10.journal print ff    ||   17.47 |               18.48 |
| -f data/100x100x10.journal register        ||    0.08 |                0.06 |
| -f data/1000x1000x10.journal register      ||    0.38 |                0.34 |
| -f data/10000x1000x10.journal register     ||    3.51 |                3.64 |
| -f data/100000x1000x10.journal register    ||   36.02 |               36.35 |
| -f data/100000x1000x10.journal register ff ||   18.00 |               19.01 |
| -f data/100x100x10.journal balance         ||    0.05 |                0.05 |
| -f data/1000x1000x10.journal balance       ||    0.25 |                0.29 |
| -f data/10000x1000x10.journal balance      ||    1.83 |                2.07 |
| -f data/100000x1000x10.journal balance     ||   18.15 |               18.76 |
| -f data/100000x1000x10.journal balance ff  ||   16.50 |               18.00 |
+--------------------------------------------++---------+---------------------+
@johannesgerer
Contributor
johannesgerer commented Dec 9, 2016 edited

Peformance: I am a bit surprised that the penalty is that is that small, considering that other than in the original version, the whole list of transactions needs to be created in an array and then copied. But I would need to see some profiling to say more.

Assertion check in extra step: The assignments require basically the same loop over temporally sorted objects, which implies that not doing checks there also, would be suboptimal.

Names: I changed some names. Feel free to change anything you want.

Easier implementation: The current implementation looks to me as a direct translation of the verbal description of the code, where every sentence is one line/statement. Plus, my Haskell style might be different from yours. I would mind if you changed it, I do not hang on to code I have written, I am only interested in the feature.

Most important: Do you think you could push the merged version to hackage? I need it for another project of mine I want to push (https://github.com/johannesgerer/buchhaltung).

@johannesgerer
Contributor
johannesgerer commented Dec 9, 2016 edited

PS: Performance: Other than the use of ST there has not been any performance optimization at the expense of simplicity.

The point is, that even without assertions checks, the balances need to be built, in order to be able to infer the assigned amounts. You could add a case to journalBalanceTransactions that simply returns fmap balanceTransaction $ jtxns j if there are no assertions in the whole journal. But this is probably premature optimization.

@simonmichael
Owner

I'll merge this in the next day or two, and it can be improved if needed.

Unfortunately for buchhaltung (which I'm looking forward to see) I think it will be a little longer until it's on hackage. I aim to release 1.1 this month, but I'd like to get a few more things in. I'll try to make it soon.

@simonmichael
Owner
simonmichael commented Dec 10, 2016 edited

PS this only works for installing from source, but you could provide a stack.yaml which can pull dependencies from a branch on github as well as hackage. (For buchhaltung, I mean.)

@johannesgerer
Contributor

What about 1.0.3, as the new functionality is a superset of the existing one?

@simonmichael
Owner
hledger-lib/Hledger/Read.hs
@@ -126,8 +127,11 @@ defaultJournalPath = do
--
readJournalFiles :: Maybe StorageFormat -> Maybe FilePath -> Bool -> [PrefixedFilePath] -> IO (Either String Journal)
readJournalFiles mformat mrulesfile assrt prefixedfiles = do
- (either Left (Right . mconcat) . sequence)
+ (right mconcat1 . sequence)
@ony
ony Dec 10, 2016 Collaborator

Why can't we use mconcat here?

As I understand the only difference between mconcat1 and mconcat is that custom implementation avoids doing mappend (last x) mempty and starts with mappend (last (init x)) (last x).
If there is some issues with that maybe we should re-consider type Journal.

@johannesgerer
Contributor

@ony Yes, this is already merged: #437

@simonmichael simonmichael merged commit 45401e5 into simonmichael:master Dec 10, 2016

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
@simonmichael
Owner

Documented and merged. Thanks @johannesgerer !

@mstksg mstksg added a commit to mstksg/hledger that referenced this pull request Feb 3, 2017
@johannesgerer @mstksg johannesgerer + mstksg Balance Assignments and accounts resetting (#438)
* Changed behavior of `readJournalFiles` to be identical to `readJournalFile` for singleton lists

* Balance Assertions have to be simple Amounts

* Add 'isAssignment' and 'assignmentPostings' to Hledger.Data.Posting and Transaction

* Implemented 'balanceTransactionUpdate', a more general version of 'balanceTransaction' that takes an update function

* Fixed test cases.

* Implemented balance assignment ("resetting a balance")

* Add assertions to show function

* updated the comments

* numbering is not needed in journalCheckBalanceAssertions

* remove prices before balance checks

* rename functions
345ef32
@mstksg mstksg added a commit to mstksg/hledger that referenced this pull request Feb 3, 2017
@simonmichael @mstksg + mstksg lib: fix typo breaking tests (#438) cfcc521
@mstksg mstksg added a commit to mstksg/hledger that referenced this pull request Feb 3, 2017
@simonmichael @mstksg + mstksg doc: document balance assignments (#438) bc41023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment