New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(WIP) allow more postings in csv-imported transactions #630

Closed
wants to merge 3 commits into
base: master
from

Conversation

Projects
None yet
2 participants
@MatthiasKauer
Contributor

MatthiasKauer commented Oct 19, 2017

Refers to #627

@simonmichael

This comment has been minimized.

Show comment
Hide comment
@simonmichael

simonmichael Oct 19, 2017

Owner

@MatthiasKauer could you amend this commit to fix the failing test and re-push it (force-push to this same PR is fine).

Owner

simonmichael commented Oct 19, 2017

@MatthiasKauer could you amend this commit to fix the failing test and re-push it (force-push to this same PR is fine).

@simonmichael

This comment has been minimized.

Show comment
Hide comment
@simonmichael

simonmichael Oct 19, 2017

Owner

PS also rebase against latest master if possible.

Owner

simonmichael commented Oct 19, 2017

PS also rebase against latest master if possible.

@MatthiasKauer

This comment has been minimized.

Show comment
Hide comment
@MatthiasKauer

MatthiasKauer Oct 23, 2017

Contributor

I have taken a look and I'm not sure how this case should be supported in the future.

My idea would be to remove support for "amount-in" and "amount-out" entirely.
The csv file from the test

10/2009/09,Flubber Co,50,
11/2009/09,Flubber Co,,50

could then be parsed by rules

account1 Assets:MyAccount
date %1
date-format %d/%Y/%m
description %2
amount1 %3
amount2 %4
currency $

First of all, are you okay with such a breaking change?

If so, could you give me a pointer to the implementation?
As I understand it now, getAmount returns a MixedAmount or throws an error.
I need to allow some such errors to occur for some of the amounts so amount2 isn't mandatory.
Having neither amount1 nor amount2 should constitute an error though, so the group of amounts needs to be checked.
Once this is done, I need to create postings with empty amount such that hledger later infers the amount. Is this possible or are you enforcing the amount? If it is enforced can I somehow use the same deduction function hledger uses to calculate the missing amount here. With values in amt1, amt2, amt3, and the existence of amt4, I would then infer the value of amt4.

If postings without amount are allowed, I would simply output the following, however:

2009/09/10 Flubber Co
    Assets:MyAccount             $50
    income:unknown

2009/09/11 Flubber Co
    Assets:MyAccount
    expenses:unknown        $50
Contributor

MatthiasKauer commented Oct 23, 2017

I have taken a look and I'm not sure how this case should be supported in the future.

My idea would be to remove support for "amount-in" and "amount-out" entirely.
The csv file from the test

10/2009/09,Flubber Co,50,
11/2009/09,Flubber Co,,50

could then be parsed by rules

account1 Assets:MyAccount
date %1
date-format %d/%Y/%m
description %2
amount1 %3
amount2 %4
currency $

First of all, are you okay with such a breaking change?

If so, could you give me a pointer to the implementation?
As I understand it now, getAmount returns a MixedAmount or throws an error.
I need to allow some such errors to occur for some of the amounts so amount2 isn't mandatory.
Having neither amount1 nor amount2 should constitute an error though, so the group of amounts needs to be checked.
Once this is done, I need to create postings with empty amount such that hledger later infers the amount. Is this possible or are you enforcing the amount? If it is enforced can I somehow use the same deduction function hledger uses to calculate the missing amount here. With values in amt1, amt2, amt3, and the existence of amt4, I would then infer the value of amt4.

If postings without amount are allowed, I would simply output the following, however:

2009/09/10 Flubber Co
    Assets:MyAccount             $50
    income:unknown

2009/09/11 Flubber Co
    Assets:MyAccount
    expenses:unknown        $50
@MatthiasKauer

This comment has been minimized.

Show comment
Hide comment
@MatthiasKauer

MatthiasKauer Oct 23, 2017

Contributor

I have played with the REPL some more now and also read the docs for Amount
and MixedAmount.
It appears that an empty Mixed amount can be created via Mixed [] but it prints 0 and I'm not sure how it would print in a posting yet.
An alternative I have found is encoding the error in a maybe.

either (\x -> Nothing) (Just . Mixed . (:[])) $ runParser (evalStateT (amountp <* eof) mempty) "" $ pack "USD -10"
<interactive>:49:10: warning: [-Wunused-matches]
    Defined but not used: ‘x’
Just USD -10

This loses the error message, however, and either should be just as flexible as maybe anyway, right?

New plan: Parse all amount values into a list [Either (ParseError Char MPErr) Amount]. Ensure that the number of "right" values is at least "number of specified accounts - 1". If so, calculate the implied amount somehow. If not, output an error containing all individual parse errors.

What do you think?

Contributor

MatthiasKauer commented Oct 23, 2017

I have played with the REPL some more now and also read the docs for Amount
and MixedAmount.
It appears that an empty Mixed amount can be created via Mixed [] but it prints 0 and I'm not sure how it would print in a posting yet.
An alternative I have found is encoding the error in a maybe.

either (\x -> Nothing) (Just . Mixed . (:[])) $ runParser (evalStateT (amountp <* eof) mempty) "" $ pack "USD -10"
<interactive>:49:10: warning: [-Wunused-matches]
    Defined but not used: ‘x’
Just USD -10

This loses the error message, however, and either should be just as flexible as maybe anyway, right?

New plan: Parse all amount values into a list [Either (ParseError Char MPErr) Amount]. Ensure that the number of "right" values is at least "number of specified accounts - 1". If so, calculate the implied amount somehow. If not, output an error containing all individual parse errors.

What do you think?

@simonmichael

This comment has been minimized.

Show comment
Hide comment
@simonmichael

simonmichael Oct 23, 2017

Owner

Making amount-in/amount-out unnecessary sounds nice! I'd be fine with that if there's another way to do it and things are still robust. Amount inferring is done by balanceTransaction and journalBalanceTransactions. Note how fearsomely complicated these are, partly to support balance assignment. Perhaps it's simpler when reading CSV, or perhaps not. Experimentation needed for sure.

Owner

simonmichael commented Oct 23, 2017

Making amount-in/amount-out unnecessary sounds nice! I'd be fine with that if there's another way to do it and things are still robust. Amount inferring is done by balanceTransaction and journalBalanceTransactions. Note how fearsomely complicated these are, partly to support balance assignment. Perhaps it's simpler when reading CSV, or perhaps not. Experimentation needed for sure.

MatthiasKauer added some commits Oct 16, 2017

transformed to (Maybe Account, Either Amount)
Accounts can be specified or not.
Amounts may be successfully parsed or not.
Currently not further differentiating whether they are even specified in
the first place. This should go hand in hand with the account spec.
Adjust failing tests
"amount" doesn't exist anymore so tests have to be adjusted.
3 tests still fail because:

- balance assignments are not working
- postings do not arrive in the correct order (which is also the problem
  for balance assignments)
@MatthiasKauer

This comment has been minimized.

Show comment
Hide comment
@MatthiasKauer

MatthiasKauer Oct 25, 2017

Contributor

Although far from finished, I have made further progress with this.
I can now handle the situation where amount1 and amount2 (previously amount-in/out) are specified.

Current problems:

  • postings are not in the correct order. I intend to fix that by creating a type for (Account, Amount, Account Idx) instead of the pairs (Account, Amount) I now use and sorting by Idx later on.
  • Error messages are not there yet. This will hopefully not be too hard.
  • Rule validation should be adjusted
  • Docs should be adjusted
  • my usage of costOfMixedAmount (-amt))) should be tested

Do you have further advice? Anything I am not thinking about?

Contributor

MatthiasKauer commented Oct 25, 2017

Although far from finished, I have made further progress with this.
I can now handle the situation where amount1 and amount2 (previously amount-in/out) are specified.

Current problems:

  • postings are not in the correct order. I intend to fix that by creating a type for (Account, Amount, Account Idx) instead of the pairs (Account, Amount) I now use and sorting by Idx later on.
  • Error messages are not there yet. This will hopefully not be too hard.
  • Rule validation should be adjusted
  • Docs should be adjusted
  • my usage of costOfMixedAmount (-amt))) should be tested

Do you have further advice? Anything I am not thinking about?

@MatthiasKauer

This comment has been minimized.

Show comment
Hide comment
@MatthiasKauer

MatthiasKauer Oct 31, 2017

Contributor

@simonmichael Should I continue with the points I discussed above?

Contributor

MatthiasKauer commented Oct 31, 2017

@simonmichael Should I continue with the points I discussed above?

@simonmichael

This comment has been minimized.

Show comment
Hide comment
@simonmichael

simonmichael Nov 1, 2017

Owner
Owner

simonmichael commented Nov 1, 2017

@simonmichael

This comment has been minimized.

Show comment
Hide comment
@simonmichael

simonmichael Nov 1, 2017

Owner

Do these commits both increase the number of postings that can be generated (allowing 2, 3 or 4), and also remove the need for amount-in/amount-out ? Are these things necessarily connected ?

Owner

simonmichael commented Nov 1, 2017

Do these commits both increase the number of postings that can be generated (allowing 2, 3 or 4), and also remove the need for amount-in/amount-out ? Are these things necessarily connected ?

@simonmichael

This comment has been minimized.

Show comment
Hide comment
@simonmichael

simonmichael Nov 1, 2017

Owner

I'm not sure what the "transformed to (Maybe Account, Either Amount)" commit is for, either from the code changes or the description ?
Why must the "amount" keyword be dropped ? We don't really want to break all existing CSV rules.

Owner

simonmichael commented Nov 1, 2017

I'm not sure what the "transformed to (Maybe Account, Either Amount)" commit is for, either from the code changes or the description ?
Why must the "amount" keyword be dropped ? We don't really want to break all existing CSV rules.

@MatthiasKauer

This comment has been minimized.

Show comment
Hide comment
@MatthiasKauer

MatthiasKauer Nov 2, 2017

Contributor

Thanks for getting back to me. Let me address your comments.

Do these commits both increase the number of postings that can be generated (allowing 2, 3 or 4), and also remove the need for amount-in/amount-out ? Are these things necessarily connected ?

Increasing the number of postings is the intention. amount-in/amount-out have been replaced with amount1/amount2 because I found it tricky to handle them in the presence of amount2.

Why must the "amount" keyword be dropped?

Fair point. We should just treat amount as an alias of amount1, shouldn't we?

Contributor

MatthiasKauer commented Nov 2, 2017

Thanks for getting back to me. Let me address your comments.

Do these commits both increase the number of postings that can be generated (allowing 2, 3 or 4), and also remove the need for amount-in/amount-out ? Are these things necessarily connected ?

Increasing the number of postings is the intention. amount-in/amount-out have been replaced with amount1/amount2 because I found it tricky to handle them in the presence of amount2.

Why must the "amount" keyword be dropped?

Fair point. We should just treat amount as an alias of amount1, shouldn't we?

@MatthiasKauer

This comment has been minimized.

Show comment
Hide comment
@MatthiasKauer

MatthiasKauer Nov 2, 2017

Contributor

I'm not sure what the "transformed to (Maybe Account, Either Amount)" commit is for, either from the code changes or the description ?

Here I am trying to deal with the fact that account1, 2, 3 and amount1, 2, 3 may or may not be specified in the rules file and/or not be present in the csv file. If an account is not specified, we don't need an amount for it but maybe we should warn if someone specifies amount3 without account3.
A missing amount, however, has to be supported to have the functionality that amount-in/amount-out fulfilled previously. In this commit, I am trying to calculate the one missing amount which is implied by the others.

I don't know a priori which of the amounts would be missing so I worked on this list of Maybe and Either pairs.
This is probably not idiomatic at all given my limited Haskell experience.

Contributor

MatthiasKauer commented Nov 2, 2017

I'm not sure what the "transformed to (Maybe Account, Either Amount)" commit is for, either from the code changes or the description ?

Here I am trying to deal with the fact that account1, 2, 3 and amount1, 2, 3 may or may not be specified in the rules file and/or not be present in the csv file. If an account is not specified, we don't need an amount for it but maybe we should warn if someone specifies amount3 without account3.
A missing amount, however, has to be supported to have the functionality that amount-in/amount-out fulfilled previously. In this commit, I am trying to calculate the one missing amount which is implied by the others.

I don't know a priori which of the amounts would be missing so I worked on this list of Maybe and Either pairs.
This is probably not idiomatic at all given my limited Haskell experience.

@simonmichael simonmichael added the csv label Nov 5, 2017

@simonmichael

This comment has been minimized.

Show comment
Hide comment
@simonmichael

simonmichael Nov 5, 2017

Owner

Thanks @MatthiasKauer. With your comments and a second read-through I understand better. I know this is WIP; it breaks some tests as you know, it doesn't tackle docs yet, and I find it still a bit too tricky to know what's going on. Perhaps these commits would be better combined into one, with more of the old stuff cleaned out. Or we could find some self-complete, semantically meaningful, but smaller steps. I will see if I can come up with anything, unless you beat me to it.

Owner

simonmichael commented Nov 5, 2017

Thanks @MatthiasKauer. With your comments and a second read-through I understand better. I know this is WIP; it breaks some tests as you know, it doesn't tackle docs yet, and I find it still a bit too tricky to know what's going on. Perhaps these commits would be better combined into one, with more of the old stuff cleaned out. Or we could find some self-complete, semantically meaningful, but smaller steps. I will see if I can come up with anything, unless you beat me to it.

@MatthiasKauer

This comment has been minimized.

Show comment
Hide comment
@MatthiasKauer

MatthiasKauer Nov 6, 2017

Contributor

Thank you for taking another look at this @simonmichael
I realize that I have been somewhat lax with my commit messages here. I apologize and will try to include more information in the next commits or the accompanying messages here.

Regarding squashing commits: I would do that just before merging. Now, the separation still helps me to understand my thought process along the way and lets me back out to intermediate states.
You can see the overall diff on github too, no?

As I mentioned above, I think the main functional challenge is currently that postings are not in the correct order (meaning account3 may appear before account1).
After that, there are many fixes on the boundary: error reporting, docs, testing.

We could accept that postings are not in order but we don't have to. I believe it can be fixed by creating a triple type for (Account, Amount, Account Idx) instead of the pairs (Account, Amount) I now use and sorting by Idx later on. Is that a good approach?

Although I won't (be able to) stop you from implementing the changes yourself, I am quite interested in bringing this to a conclusion. Should I continue with what I suggested? Or should we break this up into smaller steps like: 1) replace amount-in/out with amount(1)/amount2 functionality, 2) allow additional accounts with amounts that must be present (cannot be implied) 3) allow implied amounts in all accounts

What do you think?

Contributor

MatthiasKauer commented Nov 6, 2017

Thank you for taking another look at this @simonmichael
I realize that I have been somewhat lax with my commit messages here. I apologize and will try to include more information in the next commits or the accompanying messages here.

Regarding squashing commits: I would do that just before merging. Now, the separation still helps me to understand my thought process along the way and lets me back out to intermediate states.
You can see the overall diff on github too, no?

As I mentioned above, I think the main functional challenge is currently that postings are not in the correct order (meaning account3 may appear before account1).
After that, there are many fixes on the boundary: error reporting, docs, testing.

We could accept that postings are not in order but we don't have to. I believe it can be fixed by creating a triple type for (Account, Amount, Account Idx) instead of the pairs (Account, Amount) I now use and sorting by Idx later on. Is that a good approach?

Although I won't (be able to) stop you from implementing the changes yourself, I am quite interested in bringing this to a conclusion. Should I continue with what I suggested? Or should we break this up into smaller steps like: 1) replace amount-in/out with amount(1)/amount2 functionality, 2) allow additional accounts with amounts that must be present (cannot be implied) 3) allow implied amounts in all accounts

What do you think?

@ghost

This comment has been minimized.

Show comment
Hide comment
@ghost

ghost Nov 6, 2017

ghost commented Nov 6, 2017

@simonmichael

This comment has been minimized.

Show comment
Hide comment
@simonmichael

simonmichael Nov 6, 2017

Owner

Cleanups now deployed at http://hledger.org/csv.html - just expanded the Description so far. I am done for today. Happy to comment on end-user doc ideas/drafts.

I won't have much time for code review in the coming week. As a simplifying constraint, let's say I'll prioritise reviewing "merge-ready" commits - rebased against master, free of regressions, clear etc.

Owner

simonmichael commented Nov 6, 2017

Cleanups now deployed at http://hledger.org/csv.html - just expanded the Description so far. I am done for today. Happy to comment on end-user doc ideas/drafts.

I won't have much time for code review in the coming week. As a simplifying constraint, let's say I'll prioritise reviewing "merge-ready" commits - rebased against master, free of regressions, clear etc.

@simonmichael

This comment has been minimized.

Show comment
Hide comment
@simonmichael

simonmichael Nov 6, 2017

Owner

PS and please excuse me for getting a bit more "efficient" with this, but I can't afford more. If you can attract more eyeballs/reviewers to this topic, that might be another way to get more input.

Owner

simonmichael commented Nov 6, 2017

PS and please excuse me for getting a bit more "efficient" with this, but I can't afford more. If you can attract more eyeballs/reviewers to this topic, that might be another way to get more input.

@simonmichael

This comment has been minimized.

Show comment
Hide comment
@simonmichael

simonmichael Nov 6, 2017

Owner

Design thoughts posted, back on #627.

Owner

simonmichael commented Nov 6, 2017

Design thoughts posted, back on #627.

@simonmichael

This comment has been minimized.

Show comment
Hide comment
@simonmichael

simonmichael Mar 25, 2018

Owner

Closing, see #627.

Owner

simonmichael commented Mar 25, 2018

Closing, see #627.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment