Intelligent import autofill #26
Comments
[not-tagged:"suggestion" tagged:"feature" bulk edit command] |
There's a problem with the way I planned this. Smart replace should also cover account transfers, which rules out the edit-in-import-table concept. I haven't thought this through yet, but the first idea to come to mind would be to remember values at import for each imported transaction, so that we can identify them later and make correct replacements. |
|
|
The CSV lists from my bank actually show a unique transaction ID. If this the feature you're contemplating, you could:
I guess such a simple check does the trick for a personal finance app. Regexp replace for transfer targets would be just awesome. Ideally it would predict my success rate by telling me how many previously assigned transfers to my "groceries" account match the super fancy expression I am typing (and my failure rate, the matches with transfers assigned to other accounts). Really like your work. Keep it up! |
The transaction ID is used for automatically "binding" transactions to existing ones during imports. This feature comes from OFX import where it's not possible to have the same transaction ID from two different accounts. I agree that in the case of CSV imports, it could also be used to connect transfer accounts. However, I'm afraid this new feature could bring all sorts of new complications. I'd have to think about it. Meanwhile, I created #233 for this feature. |
Another similar piece of software I recently reviewed let's you set up or learn rules, and then apply them by selecting transactions and pressing the auto fill button. This way it happens after the import step, so may avoid some of the difficulties. Could be problematic if it's not easy to tell which transactions need auto filling (maybe up to user's own process or workflow, or a tag on transactions that's set once they've been autofilled, or a "last import" view like in iPhoto, etc for the most recently imported data) |
I'd like to take a stab at this ticket. This and #166 are probably the most important tickets for my own use. I do all my management once a month or I let it go for a couple of months and I'm curious about where I am, and then update if I have a need. Every time I start again, I have a small script that collects up my transactions (e-statements from my credit cards and banks), eliminates double transactions, renames things based on an occurrence file, etc. So general pre-processing. If it doesn't work out, I look at the problem, roll back, make a change to the pre-processing, and try again. It would be good to be able to integrate this kind of work into moneyguru, or make the process "plugin-accessible". I'm going to read through, experiment a little, and post some ideas. |
Awesome! as always, don't hesitate to ask me if you need any guidance. This is a long standing ticket which I always wanted to implement but I've always seen it as too big to fit my will (as it was with the amount drawing thing). I'm glad to see you're not afraid of big tickets :) |
I started to write a wiki page for the feature here. It's just started, and my initial thoughts are an outlook style set of rules. If [Column Name] [is/is not] [equal to/contains/matches] [value/regular expression] Then Some of the easier rules will be added automatically when the user edits a meaningless name (MEANINGLESS CRAP BLAH ** BLAH to groceries) which would create the rule If [Account Name] [is] [equal to] [MEANINGLESS CRAP BLAH ** BLAH] then [map to account name] [groceries]. The user could create a group or set of these rules, and save them much like csv layouts are saved. This would also be a very good place to provide a plugin interface that plugin authors could write there own custom logic for importing. Again, I'll provide more of this in a formal way with the wiki page. |
It's a great writeup and questions that arise from your proposition are kind of similar to the discussion in #13 (which is much older than 2013. these dates are due to me having to move issues to github): Do we really want to include that much "UI complexity" into the app? The thing is that, like with calculated transactions, implementing the actual matching rules is rather simple. The complex part comes at the UI level and that complexity comes down not only on the developer, but also the user. I don't know about you, but I find designing mail filters in a mail client is tiresome. In #13, we tried to find if there wasn't some cleverer way of solving the problem (and then we never implemented anything :) ). I suspect that if the feature in this ticket is implemented with a good old matching rules editor, it will be less used than if it's implemented in some cleverer way. In the wiki, we mention cases of variable values, such as To prevent mis-autofill, maybe that autofilling procedures could involve some kind of confirmation dialog where we show replacements that are about to take place and then allow the user to cancel some of them. We could then "teach" the algo to work better. Also, very early in moneyguru development, we were asking ourselves "what if moneyguru was a personal finance manager for developers?". I say that because we could make it so that default autofill rules could be enhanced by plugins. Code is much more expressive than select boxes in a row. These are just ideas I throw around to fuel your thoughts. |
(now that I re-read stuff from #13, I realize that there's a lot of context missing and it's hard to make sense of it. it wasn't a very good idea to reference it here... nevermind that.) |
Yes, I'm not done with the wiki (I was just editing it as you appended this). It's basically my thoughts as I go along figuring out how things currently work, and finally get to the implementation. My thought is that I'd get this document from a fluid state to a more defined one. My other thought that the first few iterations would be to find a target in the core GUI code that would be a good target for a well designed component that implements a defined interface that could be used for a plugin. If we did implement an learning heuristic or an "e-mail rules" style UI interface, the back-end would use this component style with well defined interface, making plugins extensible. This would be very much like the Trac style of writing plugins. For instance, provide an I don't have something concrete in mind for interfaces yet. But whatever features we choose to implement, we'd do so first defining an abstract interface that plugin authors could come behind us and change our silly mistakes without having to learn all the cross-platform toolkit communication shtuff which is where I am buried in reading currently :). I do really appreciate the feedback though. I don't expect you to read through the whole thing all the time, I'll let you know when it's finished and you can read it if you'd like. It's more just a sounding board for my own thoughts instead of diving into code changes up front. |
+1 . Plugins came way too late in moneyguru's design, but if we can think about them when implementing new features, it can only be good. The relationship between UI layers and plugins is a tricky problem, however. |
First crack at abstracting some current import action "fixing" logic into plugins. Created an ImportActionPlugin in core.plugin, and used that as a base class to create a one-class-per-action hierarchy. There is two way communication between the actions and the import winodw (hence the use of Broadcaster/Listener classes). From what I can see, all previous functionality is retained (though I still have to sweep through the tests).
If you have a chance to quickly glance over the above commit, I've taken the "Fix Import" logic and abstracted it into plugins. The base class for import actions is All functionality should still be retained, I've tested manually with some test import files. I have to fix all the tests (but not change their intention!), to confirm that for certain though. This is just a first pass attempt at getting a plugin API going, but it isn't hard to see even someone with limited programming experience extend these individual actions by dropping in a quick file in the plugins directory. from core.plugin import ImportActionPlugin
class MySwitcherAction(ImportActionPlugin):
NAME = "My Switcher Plugin"
ACTION_NAME = "Switch with Durdy!"
def perform_action(self, selected_pane, panes, transactions):
"If Hurdy in the description, replace with Durdy and throw him a quick 20."
# or match against regex's whatever as comments indicate earlier
for txn in transactions:
txn.description = txn.description.replace("Hurdy", "Durdy")
txn.amount += 20 More work to follow. |
Sweet. Very elegant start, congratulations! The only problem I see is with plugin subclasses living in the core code, but as you already mention in this ticket, a plugin overhaul is needed to bring this ticket to completion (something I totally agree with). So this isn't really a problem because these plugins will be moved elsewhere before the end. Side note: the example you just gave is not a very good one because History note: I once implemented a new transaction model ( #16 ) that would allow this code to work, but users didn't like it and I reverted the feature back to typical splits :( |
Yes, the plugin architecture. How do you want to handle recommendations about that? It goes to a higher level then this ticket. New ticket? Maybe wait a little until concrete suggestions build up? I don't know about your point mixing plugin and core code. I think Trac's pretty vibrant plugin eco-system of plugins came from them implementing interfaces and then immediately using those interfaces in their core code. See, for example: https://github.com/edgewall/trac/blob/trunk/trac/ticket/api.py They declare a bunch of interfaces, and then use them in their components. The purpose of the interfaces is to define some publicly accessible methods for communicating with components. The purpose of using a Aside from that, the plugins that were defined are refactored core code. Originally, I had moved the specific bits into the plugin_examples folder, but I was continually deleting my local cache of them so they would be copied and regenerated. It's just a lot easier to have it live alongside the place it's used. The only difference between what was there and now we elevated it into another class, and we're just using the plugin architecture to make sure it's extensible. Our use of it in the core also ensures its, well... usable! On the |
Not all imported files can yield transactions with more than two splits, but QIF and OFX can. As for the plugin thing, I don't have a strong opinion about this and I trust that the end result will be elegant. Whatever you prefer will be fine by me. But yeah, we should create a new ticket for the plugin overhaul. I'll do that right away. |
Oh, I also wanted to comment about this section in your wiki page moneyGuru already has some form of limited auto-matching builtin. You might want to check the current implementation. Right now, it only does auto-matching when importing OFX file that have reference IDs in it (which are typically guaranteed to be unique). Then there's this whole "binding" mechanism with the little "lock" icon in between and all. Have you seen this? It might influence your analysis. (there are example OFX files with references in them in the test suite if you want to try it out) |
Fixed all tests, missed some logic regarding the filtering of panes to only those able to perform an action. Also, some of the date-names were not being set until after the "selected pane" update method was called. The user would not notice it, but the tests expected the name to be set more immediately.
I hadn't read it at the time of writing the wiki entry, and my only experience had been with the CSV files at the time. I've since read the documentation on importing into a new account, and I will be experimenting with the feature and reviewing the code before making any additional assumptions or changes. All tests now pass in the last commit. None of the tests had to be changed (which I guess is a testament to your high level testing). So at this point, I'm going to do some feature review as you stated, and after I have a solid grasp try to determine how the user experience might change... do we make the import table edit-able, how would users see before/after changes, etc. Then go from there. |
Attached are my initial thoughts as to how I'm going to proceed with this ticket form here. I'll update the wiki before progressing, but the "import pipline" (image attached) will look something like this:
So, in our bubble diagram, "Apply select action" (step 5) is already implemented (we still need to collect up the plugins from the plugin directory). "Always apply actions" (step 1) is also easily implemented. The only difference I might suggest is that these actions be able to be performed when selected by the user on single or multiple selected transactions as well as the entire pane or set of panes. Under "Hand edit data", the import table will have to accept edits. The binding/unbinding via drag and drop is as moneyguru currently acts, but I was thinking of adding a third state of "questionable bind" where the lock would be followed by a question mark. This would indicate to the user that the automated binding rules were applied, and they should review the result of the application of these rules closely before continuing. Finally, we have the intelligent import autofill, which will take a first crack based on the pre-processing steps and then allow the user to make their edits and record the changes. We'll want to provide a pane for configuration preferences (whatever they might be) or the ability to completely turn off the intelligent import autofill in case it acts outside of reason for some users. Again, it's behavior needs to be formally written up in the wiki proposal. See the attached image. |
I don't think I've ever thought out the import process as deeply, so I have questions rather than critiques.
Looks good to me. I haven't looked at your latest commit yet, I'll do so as soon as I have free time. |
Keep in mind that "always apply actions", "bind transactions", and the extended "select actions" will for the majority of users be the same as current functionality or no-ops. That is where your custom code will go. This concept pretty much rules out the UI layer complexity representing rules, replaced by creating dead simple plugin code. So with that in mind... Your first question... (1.) We have the intelligent import autofill which changes transaction attributes, and binding rules that bind based on transaction attributes. So the transaction attributes need to be changed before the binding rules take effect. And if the transaction attributes were incorrectly changed, the binding rules will need another try at the corrected transaction. Let's say I had an import of my credit card statement. I have a binding rule that says: amounts from checking to pay my credit card on the same date in the same amount should always bind to the similar one from my checking account. I only pay my cards from my checking account. My credit card statements represents these transfers as "Date Posted, Payment - Thank you, (transaction ID), -400.23". I don't see it bind right away because the transfer isn't correct. I change it from "Payment - Thank you" to "Checking". This edit needs to go back to the intelligent import autofill so it remembers that change for the future. The binding rules now have the correct account "Checking" for the transfer, and bind to the transaction from my bank checking account statement that already existed from a prior input. Note that now that I read it, is this binding? To me, it means the other side of a transaction between two accounts as well as previously imported transactions from a file with an overlapping date range as the one currently being imported. I think that's correct. Your second question... (2.) So we want to provide this as an option for plugin developers. So I could see someone developing an "always apply" plugin that the plugin developer tells users to go to their data directory and edit the CSV file where the first column is original descriptions and the right is replacements. This might not work for every case, but it's the first example I could think of where I would want to have some kind of file. When I do the intelligent import autofill, I'd expect that I'd like to take a hard look at customizations available there as well. But it's for whatever data that the plugin developer wants to store between application runs. Your third question... (3.) That's my sloppy writing... it's "overrides w/ user data" which is my shorthand for "overrides with user data". I don't know if you need to review the commit, but feel free. It was just a clean up of the original commit and making the tests work. I did change the per-index name replacements to a total refresh of the names for all actions per your one comment. I plan on going through your reviews that are cosmetic or "good design or code style" and implement them together to minimize your time. |
Started some experimentation on making the import table editable by assigning each pane it's own document object (with a few minor tweaks). This also gives import action plugins a document object through which they can make their changes before it is progressed onto the working document.
This will likely be the last commit on this prototype branch before composing the demonstration and choosing a way forward. Added an ``ExampleRuleAutofill`` plugin which shows feasibility for a statistics based plugin.
I'm freezing the issue_26 prototype branch of my fork where it is warts and all, and will be using it to construct a short video (~5-10 minute) demonstration of the highlights. Feel free to pull it down on your end and experiment under the caveat that it's alpha and backed by minimal if any new-feature tests (though the current test suite passes) and applicable only to the Qt end. My plan is to present the new features in the video demo and discuss the prototype implementation, and we can negotiate which features and implementation details are a "go" or "no-go" and break them into individual chunks. I'll provide a suggested list in a yes/no format so you don't need to spend a lot of time on that. After we're agreed about what works and won't work, I'll create a new branch off of develop and use a test-first migration of the prototype code in a feature-by-feature path that we have negotiated. For example, converting the existing swaps into import action plugins first, then one new plugin at a time, ensure the Cocoa side matches functionality, next plugin, etc. |
I edited the wiki page to include a breakdown of what's required feature by feature. Note that I still owe you and will do a video, but I wanted to create the document for my own purposes and to organize my thoughts. The first feature is blocking, and would need to be done for each subsequent feature. But each subsequent feature can be implemented independently of the others.
The first would have to be a "yes" to proceed with the others, but the remaining can be a "yes"/"no". I'd prefer to implement each, but won't do it if you don't think there is a prospect of it being merged or want me to focus on one over the others then I'll work within that. |
I don't have much to add, it's a good plan. Of course, if you're up to the task, I have no objection to you implementing the bulk of it. The tricky part will be the As I wrote earlier, I'm more than willing to help you out, so if you end up stranded, let me know. Also, of course, there's a prospect of your code being merged. These new features in your proposal look really interesting. As for the UUID proposal, I'm not very warm to it. I see it as a serialization tool more than anything else and I wouldn't add it to a plugin API without having considered alternatives first. However, after having read your wiki page and looked at the code in
|
|
Ok, then let's go for a UUID, but let's not expose it directly and simply override |
Will do.
|
Starting with some tests for import actions as part of the first feature addition described in the wiki.
I prototyped the equality overrides between entries I think it will work out well. I've also been doing a refactor of The intent is that the |
I'm sorry to say we might be on a false start regarding the The approach you seem to have taken is to rename
|
("false start" may be too strong a term. bubbling up features to |
Well, the refactor is pretty much done right now (though I'm open to redoing it). See the next comment with the video + gist. This is the way that I see the code base, or at least my impression of it. The It also has things the user cares about. Undo, redo, save points, communication with windows and views, autosaving, stopping of edition, and querying for global scope. The attempt at the refactor was to separate these purposes. In the import window, there isn't anyway to undo/redo right now. Also if perform actions through the document class a modest number of times, it records every transaction change each time (which is a waste of resources in my mind). So you wouldn't want your So that was my mindset when I refactored. Also, I wanted to give some talking points as well, and I didn't have a clear direction as to what you specifically would want to see. I'm completely open to suggestions about alternative ways to refactor. Discussing some of your points:
|
I made a short video illustrating the benefits of just the base functionality. https://www.youtube.com/watch?v=Bhv6yL40quA&feature=youtu.be I also created a gist with the plugin which is showcased. |
Also if you wanted to draw down the issue_26_base branch, checkout document.py back to develop, and refactor to your ideal state I would be OK with that. Just make sure the ImportDocument class is similar to its current state and the new tests pass. |
I think I will, I'm rather uncomfortable with letting this much go into YAGNIThe base principle here is that You Ain't Gonna Need It, which is something that I try to follow and a method of development that is particulartly suitable with high level tests. I'm really afraid of over-engineering. However, I also recognise the new "unknown possibilities" principle introduced with plugins and I follow you on that, so I'm willing to step a bit on the YAGNI side. However, if I can prove that something cannot possibly be used by a plugin, I hope that we'll agree to keep it out of RepeaterThe In the case of If Schedules and budgetsMaybe that someday we will allow the import of schedules and budgets (but please, let it be in the scope of another ticket :) ), but for now, it's impossible. So even if a plugin added budgets to our PropertiesThe problem with properties is that they shouldn't live in the So maybe that we can keep some of those properties in Also, we should probably initialize Date rangeIf we don't use it, we might as well not have the code in there. It's also faster because even with the "All transaction" range, there's always an analysis of all transactions going on at each change to re-evaluate the bounds of the current date range (yes, even with all transactions, we need an upper bound, otherwise, we could have tables with an infitity of schedule spawns. It's Global scopeThat flag is there only for schedule spawn editing. If we don't have schedules in our Account exclusionEasy one, not possibly useful in any way in Next stepAs I offered earlier, I really don't mind doing it. I don't think I'll be able to finish it today, but I'll try not to slow you down too much. |
About the So yes, we'll end up subclassing |
Yes, so my plan is that I am going to close my modifications of We'll at the very least need We can postpone the I would like to get the import work wrapped up within the next month Feb timeframe? |
The role of `BaseDocument` is to be the minimal common denominator for `Document` and the upcoming `ImportDocument` from #26 `BaseDocument` only has transactions/entries related public methods and doesn't (yet) broadcast any notification. The extraction isn't 100% clean, as we can see in the comments I've added: I couldn't figure out a way to entirely break free of any schedule spawn related code. That's why we end up propagating `global_scope` everywhere. It's not very pretty, but it's the best I could come with. Tox is still happy. The next step will be to start merging this branch with `issue_26_base` from @brownnrl and see if everything happens happily.
Synchronized! I was finishing up my first commit as you commented. My plan was actually to merge my work with your I think that my commit referenced above satisfies the conditions in your previous comment. |
There's the docstring matter that I pushed away. In your branch, you added |
Looks good, I will try to merge and test on my end tomorrow if I have time, but definitely within the week. Can we start a PR for issue_26_base into hsoft/develop just after that? Or would you want to wait for the other additional features? |
Yeah, we can open a PR after that. We will end up merging it all, it's just that I prefer to review one chunk at a time. |
First pass at defining a simple test and some logic around the import autofill.
Time flies and I'd like to release a v2.9 soon (to test that big refactoring in the wild!), which would make this ticket out of scope for this release. I hope you don't mind @brownnrl. |
Sorry I've been MIA on this, I feel bad about it. Been really busy for the past 3 months selling our old home and buying a new one (which MG was very, very helpful with if not crucial for figuring out what we can afford based on our worth). Pretty much sucked up all my time outside of work. We're moved in, but I've been trying to get the home network set up / electrical / etc. When I can get some stable free time, hope to get back to committing soon. |
No problem at all, I understand completely and I'm not trying to pressure you! I just wanted to make sure that you were OK with me taking over some ticket that we had been reserving for you. |
No issues, I am not one to stand in the way of progress. I just got my home PC out of the box. Not the fun unboxing, the unboxing having previously boxed unboxing. :) So it may still be a bit before I'm back to MG contributions, hopefully soon, but probably not before you'll put out the 2.9 release. |
When importing, it's always the same. There are some meaningless descriptions and they are always replaced by the same thing. There should be way to do that automatically.
First, for this feature to work, it must be possible to edit descriptions, payee and checkno in the import table. The moneyGuru document would maintain a list of all replacement that were made. For example, if, in the import table, I change "MEANINGLESS CRAP BLAH ** BLAH" to "Groceries", moneyGuru will remember it.
Then, there will be the "Smart Replace" (or some cleverer name) button in the import window that, as soon as there's one value that can be replaced, will be enabled. When clicked, a menu with all possible replacements will pop up (there will also be a "Perform all" item). When the user clicks on an item, a replacement occurs.
These replacements will be saved with the moneyGuru file.
The text was updated successfully, but these errors were encountered: