Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

expr: OR queries with date: are not handled correctly #2178

Closed
redneb opened this issue Feb 27, 2024 · 15 comments
Closed

expr: OR queries with date: are not handled correctly #2178

redneb opened this issue Feb 27, 2024 · 15 comments
Labels
A-BUG Something wrong, confusing or sub-standard in the software, docs, or user experience. impact4 Affects more than a few users. queries Filtering options, query arguments.. severity2 Minor to moderate usability/doc bug, reasonably easy to avoid or tolerate.

Comments

@redneb
Copy link

redneb commented Feb 27, 2024

I am trying to use the balance command with an expr: query that includes a date: term, but I am getting incorrect results.

Here's a complete but very minimal example. I have the following journal file (with just one transaction):

2023-12-22
	expense:food            10.00
	asset:wallet

and I am running the following command:

$ hledger balance expr:'(date:2024-01 AND acct:expense:food) OR (date:2023-12 AND acct:expense:drinks)'
               10.00  expense:food
--------------------
               10.00  

As you can see, balance takes into account the transaction that occurred on 2023-12-22 (which is the only transaction in this minimal example), even though it does not match with the date term which is date:2024-01 (i.e. January of 2024). Interestingly enough, if I remove the second operand of OR, which should have been inconsequential as it is a sub-expression that doesn't match with anything, the problem goes away:

$ hledger balance expr:'(date:2024-01 AND acct:expense:food)'
--------------------
                   0  

This is counterintuitive: mathematically speaking, <subexpr1> is superset of <subexpr1> OR <subexpr2>, so removing terms from the disjunction should not make the balance smaller, it should only increase it.

Note that this problem does not affect print, as print seems to pickup the correct transactions. But it does affect register and possibly other commands.

Finally:

$ hledger --version
hledger 1.32.3, linux-x86_64
@simonmichael
Copy link
Owner

Thanks for the report. Is this the same as #2177, which came in just ahead of yours ?

@simonmichael simonmichael added the queries Filtering options, query arguments.. label Feb 27, 2024
@simonmichael simonmichael added the A-BUG Something wrong, confusing or sub-standard in the software, docs, or user experience. label Feb 27, 2024
@redneb
Copy link
Author

redneb commented Feb 28, 2024

Thanks for the report. Is this the same as #2177, which came in just ahead of yours ?

Quite likely.

@johanvanl
Copy link

Thanks for the report. Is this the same as #2177, which came in just ahead of yours ?

It is the exact same issue as #2177

I have simplified my example even further so it just includes the date.

@simonmichael
Copy link
Owner

I'll close this in favour of #2177.

@simonmichael
Copy link
Owner

Fixed in master.

@redneb
Copy link
Author

redneb commented Mar 2, 2024

Hey @simonmichael! I just compiled hledger from master and I am getting the same problematic behavior I described in the example above:

$ hledger balance 'expr:(date:2024-01 AND acct:expense:food) OR (date:2023-12 AND acct:expense:drinks)'
               10.00  expense:food
--------------------
               10.00  

This is with:

$ hledger --version
hledger 1.32.99-gcb0b054df-20240301, linux-x86_64

which is a version that includes 3ca208a. So it is likely that this is a different bug than #2177. Can you reopen this?

@simonmichael simonmichael reopened this Mar 2, 2024
@simonmichael
Copy link
Owner

That's weird, I took care to test with your example above. Apologies..

@simonmichael
Copy link
Owner

You're right, there's more to this one that I missed. Thanks for the heads-up.

@simonmichael simonmichael changed the title balance does not handle date: correctly in expr: queries some expr: queries are not handled correctly Mar 2, 2024
@simonmichael simonmichael added severity2 Minor to moderate usability/doc bug, reasonably easy to avoid or tolerate. impact4 Affects more than a few users. labels Mar 2, 2024
@simonmichael
Copy link
Owner

Seeking your insights once again @chrislemaire, if you remember this stuff.
In postingsReport > matchedPostingsBeforeAndDuring > journalValueAndFilterPostingsWith, we can see the boolean query getting muddled as the postings report is identifying postings before and during the report period:

$ stack exec -- hledger -f 2178.j reg expr:'(date:2023 AND drinks) OR (date:2024 AND food)' --debug 4
q:
 Or
   [ And
     [ Date DateSpan 2023
     , Acct
       ( RegexpCI "drinks" )
     ]
   , And
     [ Date DateSpan 2024
     , Acct
       ( RegexpCI "food" )
     ]
   ]
dateq:
 Or
   [ Date DateSpan 2023
   , Date DateSpan 2024
   ]
requestedspan: DateSpan 2023-01-01..2024-12-31
journalspan: DateSpan 2023-12-22
pricespan:  DateSpan ..
requestedspan': DateSpan 2023-01-01..2024-12-31
intervalspans: [ DateSpan 2023-01-01..2024-12-31 ]
reportspan: DateSpan 2023-01-01..2024-12-31
beforeandduringq:
 And
   [ Or
     [ Acct
       ( RegexpCI "drinks" )
     , Acct
       ( RegexpCI "food" )
     ]
   , Date DateSpan ..2024-12-31
   ]
amtsymq:    Any
reportq:
 And
   [ Date DateSpan ..2024-12-31
   , Or
     [ Acct
       ( RegexpCI "drinks" )
     , Acct
       ( RegexpCI "food" )
     ]
   ]
beforestartq: Date DateSpan ..2022-12-31
postingsReport items:
...

@simonmichael
Copy link
Owner

simonmichael commented Apr 8, 2024

Since expr: queries were added, it's possible for a query (with OR) to specify multiple different date periods.
This is problematic for report semantics in several ways. For example,
expr:'(date:2023 AND drinks) OR (date:2024 AND food)' produces two disjoint result sets, and
expr:'date:feb or date:may or date:nov' produces three disjoint report periods with holes between them.
Can all of our reports handle holes properly, calculate historical starting balances properly, etc ?

Even though the similar issue #2177 seemed to be resolved by a fix, I think that more generally across all of our reports we really can't handle this without further thought. At least, I think holes will not always be handled correctly, and I don't know how we can calculate historical starting balances correctly when multiple starting dates are possible.

This is a release blocker, and I think in the short term, we must simply disallow OR-ing of date periods.

Any thoughts on this pro or con, and on what the semantics should be if/when we support this ?

@simonmichael simonmichael changed the title some expr: queries are not handled correctly expr: OR queries with date: are not handled correctly Apr 8, 2024
@redneb
Copy link
Author

redneb commented Apr 9, 2024

I agree that there are some issues with the semantics of date: in disjunctive queries. However, it seems these ambiguities do not uniformly affect all commands. Commands such as aregister, activity, balance, print, and register, for instance, can work with such queries without ambiguity. They straightforwardly report transactions or tally up balances from a specified subset of transactions, a behavior likely to align with user expectations without causing surprises.

Conversely, for commands like balancesheet and incomestatement, this would indeed be problematic. For these commands, perhaps making the use of date: within expr: queries, regardless of being disjunctive or not, impermissible would be better. Users would then need to rely on flags such as -p, -b, -e, etc., to define a clearly unambiguous report period.

I believe disjunctive queries involving dates, particularly with commands like balance, have legitimate and valuable use cases. It would be a shame if this functionality were not supported merely due to complications arising in other contexts.

@redneb
Copy link
Author

redneb commented Apr 9, 2024

Here's an idea that could potentially work for all report types: For any given query, we first determine the "date envelope" period, which is the convex hull of all dates and periods mentioned across all query terms. Then, we generate a report for this period, but with a twist: we disregard transactions that do not match the query, including any date terms.

However, I've not thoroughly analyzed this idea, so there may be flaws I haven't considered.

Edit: On second thought, I am not sure that I like this idea.

@simonmichael
Copy link
Owner

I agree in principle that many reports could do something useful with disjoint date periods. But I think pretty much all of them have modes that would give non-intuitive/broken results, as we've seen above and in #2177. Each report would need testing, design and enhancement work. Also keep in mind that certain kinds of queries working for only some reports/report modes would be a confusing UX, we prefer consistency where possible. Finally this might be a cool power feature, but it's not a common need. So I don't have time for it myself, but if anyone would like to work on it, I can help test. A good first step would be a survey of current reports and their modes and quick analysis of the impact of OR'd date periods.

@simonmichael
Copy link
Owner

simonmichael commented Apr 9, 2024

Related: https://lemmy.world/post/14088295 shows one workaround for producing an income statement with disjoint subperiods (the first N days of each month; solution: extract just the transactions in those periods to a temp journal).

@simonmichael
Copy link
Owner

5be3ee9 disallowed date: in OR expressions. Moving forward with this fix for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-BUG Something wrong, confusing or sub-standard in the software, docs, or user experience. impact4 Affects more than a few users. queries Filtering options, query arguments.. severity2 Minor to moderate usability/doc bug, reasonably easy to avoid or tolerate.
Projects
None yet
Development

No branches or pull requests

3 participants