Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

txn-filters: prototype implementation of transaction filters. PR is for review of design and for comments. #78

Merged
merged 13 commits into from
Dec 18, 2016

Conversation

jaa127
Copy link
Contributor

@jaa127 jaa127 commented Jan 31, 2016

This is prototype of stackable transaction filter system. Those filters
operate on AST level and can be glued together with filter stacks.
With filter stacks it is possible to define different semantics for combined filters,
for example AND or OR filter stacks, or define some totally different logic.

  • base/main/../Ast: TxnFilters and TxnFilterStacks
    • trait TransactionFilter
      => EndDateTxnFilter, BeginDateTxnFilter, PayeeTxnFilter
    • trait TxnFilterStack
      => ANDTxnFilterStack
  • base/main/../Process: Processor.process: use TxnFilters
  • base/main/../App: Demo usage of Filters
  • base/test/../ProcessorTest, ComplexProcessorTest: TxnFilters = None

It would be nice if you could review this design. If design is ok, I will refine PR with actual command line switches and tests, docs etc.

This is prototype of stackable transaction filter system. Those filters
operate on AST level and can be glued together with filter stacks.
With filter stacks it is possible to define different semantics for
combined filters, for example AND or OR filter stacks,
or define some totally different logic.

 - base/main/../Ast: TxnFilters and TxnFilterStacks
     - trait TransactionFilter
         => EndDateTxnFilter, BeginDateTxnFilter, PayeeTxnFilter
     - trait TxnFilterStack
         => ANDTxnFilterStack
 - base/main/../Process: Processor.process: use TxnFilters
 - base/main/../App: Demo usage of Filters
 - base/test/../ProcessorTest, ComplexProcessorTest: TxnFilters = None
@hrj
Copy link
Owner

hrj commented Jan 31, 2016

Filtering would be useful; so this is welcome.

However, who is this intended for? Will the user specify filters in the config file, or is it for programmers that directly use the API?

If former, then we don't really need a hierarchy of filter classes; they would only be useful for programmers. Am I right?

@jaa127
Copy link
Contributor Author

jaa127 commented Jan 31, 2016

These should be useful and handy way to implement filters, which user selects, so I don't think these are usefull only for programmers.

Original idea was that user will specify filters on from the command line or from GUI, and there could be several different filter from which to choose: Time, account names, annotations, and user can freely combine these.

For example usecase could be you would like to investigate some special transactions and filtering semantic could be period of time, and name of some accounts (all transactions touching those accounts) and certain payee (and so on).

So idea is that filters will be applied dynamically based user's selection and we can support different usecases which user builds from CLI or from GUI. For that reason there are those different types, and that stacking system.

E.g. ANDTxnFilter stack will be build based on users selections, and whole thing can be filtered on one go.

Also I think this should not be supported from conf-file, or at least there should be big notice on reports, which states that reports are based on filtered transactions.

It's too easy to forget a filter to config file and "lose" some transactions ...

@hrj
Copy link
Owner

hrj commented Jan 31, 2016

(Feel free to disagree with me and push back)

These should be useful and handy way to implement filters, which user selects, so I don't think these are usefull only for programmers.

Well, what you are essentially building is a combinator library for filtering expressions. If the filtering feature is for end-users (who are not expressing the filters in Scala), then the combinators aren't that useful; they are an extra layer of abstraction. A simple function that returns a boolean is sufficient.

So idea is that filters will be applied dynamically based user's selection and we can support different usecases which user builds from CLI or from GUI. For that reason there are those different types, and that stacking system.

In that case, what do you think about making these filters report-level rather than AST-level? It is rare for txns to be enabled / disabled dynamically; what one typically wants is to (a) drill down into details of a report, or (b) get a larger picture and hide the details (without changing the result).

Some of this is already supported in GUI, but it needs a proper filtering framework like this and support in the CLI as well.

Also I think this should not be supported from conf-file, or at least there should be big notice on reports, which states that reports are based on filtered transactions. It's too easy to forget a filter to config file and "lose" some transactions ...

I totally agree about the warning.

@jaa127
Copy link
Contributor Author

jaa127 commented Jan 31, 2016

Well, what you are essentially building is a combinator library for filtering expressions. If the filtering feature is for end-users (who are not expressing the filters in Scala), then the combinators aren't that useful; they are an extra layer of abstraction. A simple function that returns a boolean is sufficient.

I will think about that, it could be that I am missing something about how it would be possible to chain multiple filters dynamically in Scala.

In that case, what do you think about making these filters report-level rather than AST-level? It is rare for txns to be enabled / disabled dynamically; what one typically wants is to (a) drill down into details of a report, or (b) get a larger picture and hide the details (without changing the result).

That's interesting opton and I thought about that. First of, personally I have usecases where I run balance over some criteria, and it's nice to see resulting data related to only that criteria. It could be payee, annotation or tag for example. With that feature you could generate for example profit-loss report for some specific event or work.

If filtering is done on AST level, then this kind of reporting is really easy to do. Also, with AST level filtering it could be possible to create "streams" where you are watching accounts and transactions which are matching only certain conditions. And if there is huge pool of transactions readily on AST-form (e.g. on some reporting server), then it would be nice to do AST level filtering on that pool.

On the other hand, there are some other kind of questions which can not be answered on AST level, for example "list all transaction where debit/credit total is over XXX ", "at which point this account tripped certain level" etc. So there is need also for those other, report level filters.

This is still prototype/demo level implementation, but it's functional
and it is possible to play with system (from command line).

 - base/main/../Ast: mechanisms (description, xmlDescription)
     to get filter descriptions.

 - base/main/../Config: Handle cli arguments with getCompleteSettings

 - base/main/../Report: xmlExport, xmlJournalExport, xmlBalanceExport:
     Print notice/info about filters.

 - cli/main/../App: Use filters, and print notice.

 - gui/main/../UI: Use filteres, NO NOTICE atm.
@jaa127
Copy link
Contributor Author

jaa127 commented Feb 7, 2016

Hi, this is "proto v2" of txn-filters. It's possible to play with filters from CLI interfcase --filters and one or more filter specifiers: begin,end,payee

--filter begin=ISO-DATE end=ISO-DATE payee=REGEX

It could have been possible to implement combined filters with function compositions, but on that route it would have been more difficult implement notice/warning functionality for reports, exports etc? If there is some better way to implement it, it could be fixed fairly localized way in future.

If this is way to go, then there are still missing some features:

  • Txn annotation filters
  • Account name filteres
  • Filter definitions from conf-file
  • Tests

This adds filter support with configuration files, and adds initial,
basic documentation about filters.
 - base/src/main/../Ast: AnnotationTxnFilter and AccountTxnFilter
 - base/src/main/../Config: CLI/Conf switches for those
When exporting in ledger format, warning was missing "ACTIVE FILTER"
text.
Test cases and variations are listend on readme.md.
All test cases use filter-targets.ledger as an input.

At the moment following filters are tested:
 - begin
 - end
 - annotation
 - payee
 - account
@jaa127
Copy link
Contributor Author

jaa127 commented Feb 9, 2016

Hello,

One big thing which is missing is that there isn't any warning on GUI if there are active filters affecting results.

There are also some things which should be fixed (scallop will exit with wrong arguments), but those are not related to these changes. Also behaviour with unknown filter names could be better - now it will just throw runtime error. Other than that, this should be start shaping as functional filter system.

This commit implements transactions filter descriptions on info tab
and notice also on statusbar on bottom.

 - base/src/main/Ast: New helper which is shared with cli and gui
    FilterStackHelper.getFilterWarnings
 - cli/src/main/App: Use new helper
 - gui/src/main/UI: Filter description on info tab and on status bar
@jaa127
Copy link
Contributor Author

jaa127 commented Feb 10, 2016

(521fca8) Implements GUI warnings, so that missing part of functionality is now implemented.


val txnFilters =
if (cliConf.filters.isEmpty) {
None
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you create an AcceptAllFilter, you could avoid the extra layer of Option[Filter] throughout the code and just use Filter everywhere.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second thoughts, the current code is better because you can issue warnings about filters more easily.

@hrj
Copy link
Owner

hrj commented Feb 12, 2016

Overall, this is awesome!

I think the begin and end filters should be renamed to after and before. There is a bit of ambiguity about inclusiveness. Does the filter "include" the given date or not? With after and before it is less ambiguous: it will be exclusive. In normal conversation, if I say "after 12 February 2016" it means all dates that follow 12th Feb, leaving 12th Feb itself.

And, I would like to see a proper expression grammar for filters, that can be used in command line as well as config files. For example, I would like to specify filters like this:

((after=2016-1-1 AND before=2016-3-1) OR name = /.*Bank.*/

It should be easy to add support for such a grammar using parser combinators. And you can directly map to the filter case classes in the combinator itself.

@hrj
Copy link
Owner

hrj commented Feb 12, 2016

Note to myself: Check if txn filtering happens before or after the constraints check (eod constraints for example).

@jaa127
Copy link
Contributor Author

jaa127 commented Feb 14, 2016

Overall, this is awesome!

Cool, that's nice to hear and thank you for review and comments!

I think the begin and end filters should be renamed to after and before. There is a bit of ambiguity about inclusiveness. Does the filter "include" the given date or not? With after and before it is less ambiguous: it will be exclusive.

I agree about ambiquity of terms, but disagree after and before, at least as sole time filters. Reason is that it will make an odd UI from usability perspective (I will try to explain that later). One obvious solution would be write better and more clear description of the nature of filter, or one option would be to implement both inclusive and exclusive filters, because there might be goog use case for before.

Typical usecase is probably some natural timespan or start of timespan. Eg. Jan 2016 or Y2015
If exlusive filters (e.g. ]a,b,c,d[ => (c,d)) are only filters, then e.g. Jan 2016 would be defined: after=2015-12-31 before=2016-02-01. It's not clear from UI point of view , that it is actually Jan 2016.
If this is written with inclusive filters: begin=2016-01-01 end=2016-01-31 it's more obvious that it's Jan 2016. Same thing with years: after=2014-12-31 before=2016-01-01 vs. begin=2015-01-01 end=2015-12-31

It's getting even more interesting with March, eg. March 2016.
after=2016-02-28 before=2016-04-01 except that 2016 is leap year, and there is 29th day in Feb, so above is wrong and not very clear way. With inclusive filters that would have been begin=2016-03-01 end=2016-03-31

For removing abiquity of how many days there are on each month, before would be handy:
Feb 2016: begin=2016-02-01 before=2016-03-01 or April (with 30 days) begin=2016-04-01 before=2016-05-01.

Summary

  • "time span starting filter" (begin?) should be inclusive e.g [a..
  • there is a good use case for exclusive "time span ending filter" (before or end?) e.g ..d[
  • I would like to drop after filter, because I don't see other use for it other than to remove ambiquity, and it makes odd UX for filters, imho.

Actually inclusive begin and exclusive end would be 1:1 with ledger-cli. This would also make really nice UI for truncated dates: Y2015: begin=2015 end=2016

The before is nice because it has clear meaning about exclusiveness, but on the other hand end is more common term in this context. So I am inclining towards inclusive begin and exclusive end with good and clear explanation. What do you think?

And, I would like to see a proper expression grammar for filters, that can be used in command line as well as config files.

Yes, that would be nice and handy in some cases. However, I would like to first finish these basic filters first and leave that expression grammar for some later time. It could be implemented backwards compatible way, which is also quite natural from UX point:

  • if there is no operator, then AND is implied
  • or if implied AND is too lax, then it could be that expression filter definition must start with parenthesis?

That way it would be possible to support "simple" filter definitions, and more complex expression definitions on same time. What do you think, is that ok plan?

Changed `end` to be exclusive and renamed AccountTxnFilter to
AccountNameTxnFilter. Improved documentation.

 - base/src/main/../Ast:
    Renamed AccountTxnFilter -> AccountNameTxnFilter
    Changed end from inclusive to exclusive
 - doc/abandon.md: document all filters, add example and link to
   testcases.
 - testCases/sclT0005-filters: Adjust end dates and ref-files.
@jaa127
Copy link
Contributor Author

jaa127 commented Feb 22, 2016

Hello,

Please see (b1bad32), this commit changed end from inclusive to exclusive and renamed AccountTxnFilter to AccoundNameTxnFilter. There is also new issue #81 for "expression filters". There is also enhanced documentation about filters.

If you like, we can think something else for begin and end, but I think current form is good as it is. For full explanation please see: #78 (comment).

If there isn't anything else, then this should be in state to be merged.

@hrj
Copy link
Owner

hrj commented Mar 19, 2016

So I am inclining towards inclusive begin and exclusive end with good and clear explanation. What do you think?

It would be highly confusing if one of them is exclusive and the other inclusive.

I strongly prefer not relying on explanations. The more straightforward the better. How about the following filters:

  • after and before : exclusive
  • onOrAfter and onOrBefore : inclusive

The on-or-after style of specifying dates is common in my part of the world. Even if someone is not familiar with it, this wording doesn't require any explanations. It is a little verbose but I prefer verbosity over ambiguity.

It could be implemented backwards compatible way, which is also quite natural from UX point:

  • if there is no operator, then AND is implied
  • or if implied AND is too lax, then it could be that expression filter definition must start with parenthesis?

Here again, I would strongly prefer a bit of verbosity instead of implied behaviour. What might seem natural to one person might be confusing to another.

Let's go for filter expressions!

@jaa127
Copy link
Contributor Author

jaa127 commented Mar 23, 2016

Hi,

names could be onOrAfter and before very well.

However, more I have thought about it, more I think begin of span (left condition) should be inclusive and end of span (right condition) exclusive. There are several reasons for that:

  • Looping over multiple periods of time is trivial when end condition is not included in the set
  • It is always well defined what these conditions means, and it will not depend on accuracy (eg. after=2014 is that 2014-01-02 or 2015-01-01? When before=2015 is naturally well defined (it's all time where t < 2015-01-01T00:00:00). Same thing with onOrAfter, it is naturally well defined and it doesn't depend on defined accuracy.
  • Supporting multiple different ways to define begin and end conditions will complicate UI and code
  • It is consistent with ledger-cli

Let's go for filter expressions!

These would be cool and nice, however there are more urgent features (personally for me), which I am working on at the moment (groupby, strict account validation, and probably some commodity conversion functionality). So I would like to postpone that part of filter expression. If there is need for you, I don't mind if you implement this. I also believe that implied AND on that case is not too confusing.

@coveralls
Copy link

Coverage Status

Coverage increased (+2.3%) to 66.567% when pulling 6f247e5 on jaa127:txn-filters into 7b30a7f on hrj:master.

@jaa127
Copy link
Contributor Author

jaa127 commented Mar 23, 2016

Thanks for fixing coveralls! I merged the master (no renaming yet), so now there is clean table to move forward when we have decided what to do with this filter stuff.

@coveralls
Copy link

Coverage Status

Coverage increased (+1.5%) to 68.463% when pulling bffea3e on jaa127:txn-filters into 3ded91d on hrj:master.

@coveralls
Copy link

Coverage Status

Coverage increased (+1.5%) to 68.463% when pulling ea92dd9 on jaa127:txn-filters into 3ded91d on hrj:master.

@coveralls
Copy link

Coverage Status

Coverage increased (+2.0%) to 68.905% when pulling 24c873f on jaa127:txn-filters into 3ded91d on hrj:master.

@jaa127
Copy link
Contributor Author

jaa127 commented Dec 9, 2016

Hello,
Time selectors has been renamed to be less confusing, so I hope this could be merged as it is now. I think the rest of the ideas and further enhancements could be implemented later. Also if we like to change time-selectors e.g. more in line with #103 that can also be done after merge.

This is getting to be quite big changeset, and it would be nice to not have this lingering in branch and conflicting with other future changes.

@hrj
Copy link
Owner

hrj commented Dec 9, 2016

Hey @jaa127 !

I appreciate your efforts here and looking forward for more.

I didn't find time to review your latest changes yet; will try to do this weekend.

However, I feel it is better to address all points before the merge. If you get the time, please go ahead and address them now. Else, I will do them myself as soon as I get free time.

cheers,
~hrj

@jaa127
Copy link
Contributor Author

jaa127 commented Dec 10, 2016

Hi,
It would be nice if you have time to take look of this PR. This PR does basic filtering, and as such should be "ready". If there is need for more complex filtering expressions or query language, that could be implemented backward compatible way later on. Personally I think filtering expressions and more complex queries are better to be done on separate branch / PR.

Then there is also question of how that query language should be implemented in abandon, and how much could/should be lifted to external engine. For example Abandon could do AST processing, and then feed each transaction to in-memory, transient SQLite database. SQLite supports Common Table Expressions (with recursive) and hence you can make really complex expressions leveraging those recursive queries. So IMHO, I think it doesn't make sense to progress too far with these filters in Abandon alone and without support some query engine.

@hrj hrj merged commit 887a0f9 into hrj:master Dec 18, 2016
@hrj
Copy link
Owner

hrj commented Dec 18, 2016

I did a quick review and the code looks good. 👍

Merging with the hope that the UI changes are done soon in a follow up PR.

Thanks @jaa127

@hrj
Copy link
Owner

hrj commented Dec 18, 2016

For example Abandon could do AST processing, and then feed each transaction to in-memory, transient SQLite database. SQLite supports Common Table Expressions (with recursive) and hence you can make really complex expressions leveraging those recursive queries.

Another simpler solution is to allow scripting via Scala.

@jaa127 jaa127 deleted the txn-filters branch December 19, 2016 20:20
@jaa127 jaa127 restored the txn-filters branch December 23, 2016 09:41
@jaa127 jaa127 deleted the txn-filters branch December 23, 2016 09:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants