The print-unique command needs to allow field selection for comparison #1046

Open
alerque opened this issue Jun 11, 2019 · 5 comments
Labels
A-WISH Some kind of improvement request, hare-brained proposal, or plea. needs:code To unblock: needs code/code updates needs:docs To unblock: needs corresponding documentation or doc updates print-unique

Comments

@alerque
Collaborator

alerque commented Jun 11, 2019

Related to #943 but not quite the same...

The print-unique command is of limited use without some way to configure what it actually compares. At the moment it only looks at part of the description: the text proper, without the code that other places treat as part of the description. In my use case, processing imported transactions, I'm actually looking for unique codes (or, in one case, a combination of code + description). I can also imagine wanting to match on the amount to get unique transactions, not just unique payees.

What fields are compared should be configurable.

@alerque alerque added the A-WISH Some kind of improvement request, hare-brained proposal, or plea. label Jun 11, 2019
@simonmichael
Owner

simonmichael commented Jun 12, 2019

Interesting idea, though I'm not totally clear on the real-world use cases for more powerful uniqueness checking. If you feel this is valuable enough, would you like to try mocking up the UI and docs here?

@simonmichael simonmichael added needs:design To unblock: needs more thought/planning, leading to a spec/plan needs:docs To unblock: needs corresponding documentation or doc updates needs:mockup/screenshot To unblock: needs a rough mockup, eg in plain text, or a screenshot needs:value-proposition To unblock: needs clearer justification, review of benefits vs costs labels Jun 12, 2019
@alerque
Collaborator Author

alerque commented Jun 12, 2019

It's possible that I'm using the wrong tool, but here is my scenario. The only digital format I can get out of my bank is an XLS sheet of the last N transactions. I can download these whenever I want, but the result is inevitably duplicate transactions. I'm converting the XLS to CSV, then importing to Ledger. Here is a sample set of problem entries from the resulting ledger:

2018/01/19 (2018-01-19-08.52.24.534905) INT-EFTEMR-0240334-802 AİDAT EKİM, KASIM, ARALIK 2017
    Para Transferi                  ₺2880.00
    Assets:TRY:Caleb:Garanti       ₺-2880.00

2018/01/19 (2018-01-19-08.52.24.534905) INT-EFTEMR-KOMİSYON+BSMV TAHSİLATI-802 AİDAT EKİM, KASIM, AR
    Expenses:Fees:Banking              ₺4,90
    Assets:TRY:Caleb:Garanti          ₺-4,90

2018/10/04 (2018-10-04-09.00.51.446265) INT-EFTEMR-0510319-802 AİDAT EKİM, KASIM, ARALIK 2018
    Para Transferi                  ₺3060.00
    Assets:TRY:Caleb:Garanti       ₺-3060.00

2018/10/04 (2018-10-04-09.00.51.446265) INT-EFTEMR-0510319-802 AİDAT EKİM, KASIM, ARALIK 2018
    Para Transferi                  ₺3060.00
    Assets:TRY:Caleb:Garanti       ₺-3060.00

2018/10/04 (2018-10-04-09.00.51.446265) INT-EFTEMR-KOMİSYON+BSMV TAHSİLATI-802 AİDAT EKİM, KASIM, AR
    Expenses:Fees:Banking              ₺5,50
    Assets:TRY:Caleb:Garanti          ₺-5,50

Note that there are three wire transfers here, but only two of them are unique (with "unique" codes). Deduplicating these works pretty easily with print-unique because even though the code is ignored, the description line is different.

Then there are two entries for fees associated with the two wire transfers. Normally these would be duplicated too, but in this case the fee was added later: the download I first imported didn't have the fee, and a later download did.

Deduplicating the fee transactions is harder. The description line should have been unique (because I happened to include the year in the memo), but the XLS only has truncated values, so the fees from the two different years show the same description line. They have different codes, but print-unique doesn't include the code in the comparison. Using only the code wouldn't work either, because each fee has the same code as the transfer it is associated with.

The awkward result is that hledger print-unique removes a transaction that was actually unique, just because its description happened to be truncated:

2018/01/19 (2018-01-19-08.52.24.534905) INT-EFTEMR-0240334-802 AİDAT EKİM, KASIM, ARALIK 2017
    Para Transferi                  ₺2880,00
    Assets:TRY:Caleb:Garanti       ₺-2880,00

2018/01/19 (2018-01-19-08.52.24.534905) INT-EFTEMR-KOMİSYON+BSMV TAHSİLATI-802 AİDAT EKİM, KASIM, AR
    Expenses:Fees:Banking              ₺4,90
    Assets:TRY:Caleb:Garanti          ₺-4,90

2018/10/04 (2018-10-04-09.00.51.446265) INT-EFTEMR-0510319-802 AİDAT EKİM, KASIM, ARALIK 2018
    Para Transferi                  ₺3060,00
    Assets:TRY:Caleb:Garanti       ₺-3060,00
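
To make the two failure modes concrete, here is a minimal Python sketch that models the five sample transactions above as (date, code, description) tuples. It is not hledger's implementation; the `dedupe` helper and the tuple layout are assumptions made for illustration. It only mimics "keep the first transaction for each distinct key", once keyed on the description and once keyed on the code.

```python
# The five sample transactions from this comment as (date, code, description).
FEE = "INT-EFTEMR-KOMİSYON+BSMV TAHSİLATI-802 AİDAT EKİM, KASIM, AR"
TRANSFER_2017 = "INT-EFTEMR-0240334-802 AİDAT EKİM, KASIM, ARALIK 2017"
TRANSFER_2018 = "INT-EFTEMR-0510319-802 AİDAT EKİM, KASIM, ARALIK 2018"
TXNS = [
    ("2018/01/19", "2018-01-19-08.52.24.534905", TRANSFER_2017),
    ("2018/01/19", "2018-01-19-08.52.24.534905", FEE),
    ("2018/10/04", "2018-10-04-09.00.51.446265", TRANSFER_2018),
    ("2018/10/04", "2018-10-04-09.00.51.446265", TRANSFER_2018),
    ("2018/10/04", "2018-10-04-09.00.51.446265", FEE),
]


def dedupe(txns, key):
    """Keep the first transaction for each distinct key(txn). Hypothetical helper."""
    seen, kept = set(), []
    for txn in txns:
        k = key(txn)
        if k not in seen:
            seen.add(k)
            kept.append(txn)
    return kept


# Description only: the second fee is dropped because its truncated
# description matches the first fee.
by_description = dedupe(TXNS, key=lambda t: t[2])

# Code only: both fees are dropped because each shares a code with its transfer.
by_code = dedupe(TXNS, key=lambda t: t[1])

print(len(by_description))  # 3
print(len(by_code))         # 2
```

Neither key on its own keeps all four distinct transactions, which is why a combination of fields is needed.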

Ideally I would be able to use hledger print-unique --fields date,code,description so that a transaction is dropped only when its date, code, and description all match an earlier transaction, and get the following result:

2018/01/19 (2018-01-19-08.52.24.534905) INT-EFTEMR-0240334-802 AİDAT EKİM, KASIM, ARALIK 2017
    Para Transferi                  ₺2880.00
    Assets:TRY:Caleb:Garanti       ₺-2880.00

2018/01/19 (2018-01-19-08.52.24.534905) INT-EFTEMR-KOMİSYON+BSMV TAHSİLATI-802 AİDAT EKİM, KASIM, AR
    Expenses:Fees:Banking              ₺4,90
    Assets:TRY:Caleb:Garanti          ₺-4,90

2018/10/04 (2018-10-04-09.00.51.446265) INT-EFTEMR-0510319-802 AİDAT EKİM, KASIM, ARALIK 2018
    Para Transferi                  ₺3060.00
    Assets:TRY:Caleb:Garanti       ₺-3060.00

2018/10/04 (2018-10-04-09.00.51.446265) INT-EFTEMR-KOMİSYON+BSMV TAHSİLATI-802 AİDAT EKİM, KASIM, AR
    Expenses:Fees:Banking              ₺5,50
    Assets:TRY:Caleb:Garanti          ₺-5,50

(And yes, my bank's exports are inconsistent in their use of number formatting! I do clean that up in the next step by filtering the ledger through a print with an explicit commodity format declaration.)
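
The proposed `--fields date,code,description` behavior could be sketched roughly as follows. This is Python rather than hledger code, and the `Txn` tuple, `parse_fields`, and `print_unique` names are all hypothetical; the option itself is only a proposal in this issue. The sketch parses the comma-separated field list and keeps the first transaction for each distinct combination of those fields.

```python
from collections import namedtuple

# Hypothetical, simplified transaction record (postings omitted).
Txn = namedtuple("Txn", "date code description")


def parse_fields(spec):
    """Turn a spec such as 'date,code,description' into a list of field names."""
    names = [name.strip() for name in spec.split(",") if name.strip()]
    unknown = [name for name in names if name not in Txn._fields]
    if unknown:
        raise ValueError("unknown field(s): " + ", ".join(unknown))
    return names


def print_unique(txns, spec):
    """Keep the first transaction for each distinct combination of the fields."""
    fields = parse_fields(spec)
    seen, kept = set(), []
    for txn in txns:
        key = tuple(getattr(txn, field) for field in fields)
        if key not in seen:
            seen.add(key)
            kept.append(txn)
    return kept


FEE = "INT-EFTEMR-KOMİSYON+BSMV TAHSİLATI-802 AİDAT EKİM, KASIM, AR"
TRANSFER_2017 = "INT-EFTEMR-0240334-802 AİDAT EKİM, KASIM, ARALIK 2017"
TRANSFER_2018 = "INT-EFTEMR-0510319-802 AİDAT EKİM, KASIM, ARALIK 2018"
journal = [
    Txn("2018/01/19", "2018-01-19-08.52.24.534905", TRANSFER_2017),
    Txn("2018/01/19", "2018-01-19-08.52.24.534905", FEE),
    Txn("2018/10/04", "2018-10-04-09.00.51.446265", TRANSFER_2018),
    Txn("2018/10/04", "2018-10-04-09.00.51.446265", TRANSFER_2018),
    Txn("2018/10/04", "2018-10-04-09.00.51.446265", FEE),
]

unique = print_unique(journal, "date,code,description")
print(len(unique))  # 4
```

Only the repeated 2018 transfer is dropped; both fee transactions survive because their codes and dates differ even though their truncated descriptions are identical.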

@simonmichael
Owner

simonmichael commented Jun 12, 2019

Great example, thanks. Though I have to admit I'm still not really clear. The current print-unique was used for something I don't remember. print-unique --fields=FIELDS (with some default set of fields) sounds good.

Except, if we can avoid options it's always better. Why not always check all fields?

@alerque
Collaborator Author

alerque commented Jun 12, 2019

In my use case, checking all fields would be more useful than the current behavior, BUT I could imagine a use case for not being so strict. For a journal that has been imported and then modified, it would be nice to still be able to flush out duplicates even if comments have been added or categories tweaked.
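
A small Python sketch of that trade-off, under stated assumptions: the dict layout, the `dedupe` helper, and the hand edits (the `Expenses:Dues` account and the `reviewed` comment) are all made up for illustration. It compares one imported transaction against a hand-edited copy of itself, first on every field and then on date, code, and description only.

```python
def dedupe(txns, fields):
    """Keep the first transaction for each distinct combination of `fields`."""
    seen, kept = set(), []
    for txn in txns:
        key = tuple(txn[field] for field in fields)
        if key not in seen:
            seen.add(key)
            kept.append(txn)
    return kept


imported = {
    "date": "2018/10/04",
    "code": "2018-10-04-09.00.51.446265",
    "description": "INT-EFTEMR-0510319-802 AİDAT EKİM, KASIM, ARALIK 2018",
    "account": "Para Transferi",
    "comment": "",
}
# Hypothetical hand edits made after an earlier import.
edited = dict(imported, account="Expenses:Dues", comment="reviewed")

# The edited copy is already in the journal; the raw one arrives in a new download.
journal = [edited, imported]

strict = dedupe(journal, list(imported))                  # compare every field
loose = dedupe(journal, ["date", "code", "description"])  # compare a subset

print(len(strict), len(loose))  # 2 1
```

Comparing every field treats the edited and freshly imported copies as different transactions, while the narrower field list still recognizes them as duplicates and keeps the edited one.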

As it stands I'm a little unsure about what use case it does currently work for.

@simonmichael
Owner

So I guess:

@simonmichael simonmichael added help wanted needs:code To unblock: needs code/code updates and removed needs:design To unblock: needs more thought/planning, leading to a spec/plan needs:mockup/screenshot To unblock: needs a rough mockup, eg in plain text, or a screenshot needs:value-proposition To unblock: needs clearer justification, review of benefits vs costs labels Jun 12, 2019