New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue 26 base #420

Closed
wants to merge 22 commits into
base: develop
from

Conversation

Projects
None yet
2 participants
@brownnrl
Contributor

brownnrl commented Jan 20, 2015

Import plugins base functionality per the section ImportActionPlugin replacement of existing five actions.

Comment inline on any commit and I'll try to fix and address. I'll start in on the other items in the wiki, I think
the bayesian autofill since that should be segregated to a single class and file.

brownnrl added some commits Jan 13, 2015

(#26) Intelligent Import Autofill
Starting with some tests for import actions as part of the
first feature addition described in the wiki.
Base Document Refactoring
Was using function parameters to pass cached or calculated values to
base document that were also required by the subclass (instead of
recalculating those same values again).  Thought that I shouldn't
expose those parameters publically to avoid confusion about their
intent.
Base Document Refactoring
Through budgets, onto schedules next.
Base Document Refactoring
Completes base document refactor.
Base Document Refactor
Forgot to do a check prior to setting the ahead_months property.  Was
causing ``test_setting_prop_makes_doc_dirty`` to fail in
``core.tests.gui.docprops_view_test``.  Added check.
@hsoft

This comment has been minimized.

hsoft commented on core/tests/gui/import_window_test.py in b6c3d08 Jan 17, 2015

I'm a little puzzled by your usage of a float value here (-1.). moneyguru's amounts never deal with floats (internally, the amount is stored in int which represent the number of cents (or whatever subunit the currency has)). IIRC, it will be properly converted as soon as it's coerced with an Amount instance, but using a float introduce uncertainty as to how it will be processed.

Of course, you could be wanting to test that uncertainty. In that case, a comment would be appropriate.

This comment has been minimized.

Owner

brownnrl replied Jan 17, 2015

That's funny, I am working on this now, and I just finished fixing this test after I discovered it was failing. I looked at the Amount class a little more closely, and got educated.

I don't think it coerces, I can't += 1 there either (I get a TypeError' Amounts can only be compared with other amounts or zero'). Have to convert it to an amount after reading the currency.

Your comment is useful in retrospect. :)

@hsoft

This comment has been minimized.

hsoft commented on b6c3d08 Jan 17, 2015

Good stuff. I like the way you test.

@hsoft

This comment has been minimized.

hsoft commented on core/plugin.py in ead06be Jan 18, 2015

I don't think that this is needed. __init__() is going to be called to both superclasses automatically if you don't add an __init__() method yourself.

This comment has been minimized.

Owner

brownnrl replied Jan 20, 2015

I'm not sure... tested with 3.3 and 3.4...

https://gist.github.com/brownnrl/86bd07c89e718cf0bc5e

This comment has been minimized.

hsoft replied Jan 20, 2015

Some kind of misconception has slipped into my mind then, sorry.

This comment has been minimized.

Owner

brownnrl replied Jan 20, 2015

No worries, you had me convinced it was some sort of exception to the MRO. Thought I would test it to get clarity.

brownnrl added some commits Jan 20, 2015

Added the ``ImportDocument`` class, fixed some usage of ``BaseDocument``
centering around fewer available functions (though same functionality
existed through alternative calls).
(#418) Importing transactions does not update undo stack in Qt UI
I noticed that the ``transaction_imported`` callback (is that what this
is?) didn't match the message name (correct usage?) of
``'transactions_imported'``.

Right?  I'm not 100% on this, but making the two align seems to fix the
issue and all tests pass.
@hsoft

This comment has been minimized.

Owner

hsoft commented Feb 8, 2015

It's been a while since there's been activity here and I'm wondering: is there anything required on my part for this PR to continue going forward? Is it ready to merge?

@brownnrl

This comment has been minimized.

Contributor

brownnrl commented Feb 9, 2015

Sorry about that, I've had a bit of an uptick in work at my day job and some increase in my other job as well. That and I'm trying to finish up my instrument ticket on my pilot's license.

The time I did have to work on moneyguru I spent trying to get my own workflow coded using this PR's base functionality. It's actually worked out pretty well so far. Things like my credit card QIF download come "pre-categorized" so the transfer column is there with how the credit card company views my transactions, but never quite aligned with my own account naming scheme. But it's very helpful information.

So I have my plugin check what the selected target account is, and apply some of the logic based on that to categorize each transfer. It's worked out pretty well with just this PR.

For my own case, it would be nice to be able to edit the fields to do the mop up work when there just isn't enough information.

I know the majority of users won't make their own plugin, so the autofill is important. I find the problem interesting too, so I will be working on it. I'll have a couple more free hours this week to throw at it.

Sorry for the delay. It's up to you if you want to accept the PR into develop. I think it's been pretty solid for my own use case. I was going to submit the other functionality in separate PRs assuming this one was going to be merged. You could just as well pull it down from my fork and keep it as a separate issue_26_base branch, and I'll submit the PRs for the other features against that branch.

Or wait until the other features are complete. Up to you.

@hsoft

This comment has been minimized.

Owner

hsoft commented Feb 10, 2015

To be sure: there is absolutely no hurry. I just wanted to be sure that I wasn't unknowingly blocking something.

The plan, IIRC, was to merge your code in develop in small chunks, so if this chunk is, in your opinion, ready to merge, I'll gladly proceed with my complete review. Is it the case?

@brownnrl

This comment has been minimized.

Contributor

brownnrl commented Feb 10, 2015

Yes, in my opinion, this base chunk is ready for review and merge into develop if you like the feature sets from that video and the prototype on issue_26. The remaining features take advantage of this code, but there shouldn't be any overlap. Any changes made to the code from this PR will be a result of your recommendations or (hopefully minimal number of) bugs that you find.

@hsoft

This comment has been minimized.

Owner

hsoft commented Feb 14, 2015

The UUID again

I began reviewing but I worry about the UUID thing again. I understand the problem that it solves, but it has a side effect: It changes the meaning of from_transaction() to something that is not intuitive.

Normally, when using this method, you expect to clone the transaction into a new, separate entity, but with UUID, it's not the case anymore because you end up with a transaction for which all your splits have a relation of equality (even if you change amounts and stuff) because they have the same UUID.

But then, when I grep the code for replicate() (which uses from_transaction()), a bird's eyes view of the result seem to indicate that everywhere it's used expect a behavior where we want to clone the UUID (undo, import), and other cases where I'm not sure of the implications (schedules, budgets, I'll have to verify this).

That would explain why all tests still pass, but we still end up with a method that is kind of a misnomer and I think we'll have to change its name/semantic/something.

Thinking...

@hsoft

This comment has been minimized.

Owner

hsoft commented Feb 14, 2015

If, instead of adding UUIDs, we changed Transaction.set_splits() so that it tries to re-use split instances in simple cases, for example, when the split count is equal? Wouldn't that solve most of our problems?

@brownnrl

This comment has been minimized.

Contributor

brownnrl commented Feb 14, 2015

Hmm, re-use of split instances would probably work.

I didn't add any conditional checks like the one you are describing to from_transaction or set_splits, because I wasn't sure what those conditional checks should be. The question at what point is your Split no longer the same and require new Split instances? When you change the number of splits in the parent transaction? When you change it's value? When you change the account?

I guess it all depends on what kinds of things can store old Split instances and have desire to store relational information that isn't included as members of the Split. Currently, that is your import table relating two Split instances together. Later, it could be a plugin. But, as you've said, KISS and YAGNI.

Practically speaking, the effect of this change would be that if a import plugin changed the number of splits resident in a transaction, then the binding between an existing and imported entry would be broken because the import table would no longer have a working hash / equality check to the new entry. That's probably acceptable.

Would you discuss it further, me try the change on this branch, or would you like to clone and change?

@hsoft

This comment has been minimized.

Owner

hsoft commented Feb 15, 2015

I'm still unsure about what to do. I'll try to implement something tomorrow and we'll then be able to compare the two solutions.

@hsoft

This comment has been minimized.

Owner

hsoft commented Feb 15, 2015

I hoped to be able to come up with a test that would show a regression introduced by the UUID thing, to no avail, because there is no regression: We use Split.__eq__() absolutely nowhere (except, of course, in the specific places where you started using your specific implementation).

However, I know that it's going to bite us later, for example in Document.duplicate_transaction() where replicate() is used with the "old" assumptions. It's just that no bug is introduced because we never use split equality outside the import_window realm. But if we ever do, bang! we'll have a confusing bug in our hands.

I hope I don't look irrationally picky with this. I'll go ahead and implement my set_splits() thing without supporting tests.

@hsoft

This comment has been minimized.

Owner

hsoft commented Feb 15, 2015

Oh, but I also realize, as I removed the UUID implementation, that the tests still pass. I'll add a test for that (that is, to test that a plugin calling change_transaction() with a replicate doesn't break import bindings).

Add test to the new import plugin system
This is to strengten PR #420. We would previously not have a failing
test if we removed the `Split.__eq__()` addition of this PR.

I had to make a few tweaks around to make this all work.
self._match_entries[import_key] = new_match
def _convert_matches(self):

This comment has been minimized.

@hsoft

hsoft Feb 22, 2015

Owner

We already talked about the subject and concluded that lack of comments was simply a temporary situation, but this part of the code would also be a good candidate for comments.

This comment has been minimized.

@hsoft

hsoft Feb 22, 2015

Owner

To comment further, I know that this method is the result of a battle you had with the code to make code-that-is-yet-to-review work. I grasp roughly what it does and, although I find it too complex for my taste, I know that I don't have the full picture yet. So we'll pull it in.

self.matches.append([match_entry.existing, entry])
else:
[import_entry] = [e for e in import_entries
if self._get_matching_key(e) == match_entry.imported]

This comment has been minimized.

@hsoft

hsoft Feb 22, 2015

Owner

Is _get_matching_key() still relevant? I feel that it needlessly complexify many parts of the code.

self.matches.append([entry, other])
self.matches += [[None, entry] for entry in to_import]
continue
#if not e.reconciled?

This comment has been minimized.

@hsoft

hsoft Feb 22, 2015

Owner

That no test is broken by this isn't surprising and kind of touches the issue brought forward by #389: that it doesn't make much sense to show only unreconciled transactions on the left side.

That an unreconciled entry ends up on the left side due to the lack of condition here could only happen if it was first matched automatically and then manually unbound by the user. And in that case, we don't really want to hide that existing entry...

# there are ramifications here to think about in terms of expected behavior.
# old behavior is to store accounts without importing segregated by their respective
# loader account lists...
# self.import_document.clear()

This comment has been minimized.

@hsoft

hsoft Feb 22, 2015

Owner

I'm not sure that I understand the comment here. The current behavior looks correct to me, as it re-cooks existing import documents and add a new import_document along with relevant AccountPane if we've just loaded something to import. Is this something you wanted to discuss with me?

This comment has been minimized.

@brownnrl

brownnrl Feb 22, 2015

Contributor

More cruft from the prototype to "review ready" branch. Not as review ready as it could be. :)

I think at this point I didn't know the scope of the import document, that is to say, whether it should encompass all accounts and their associated panes that are imported (including those imported from different files or cancelled between sessions), just one account per pane, other thoughts along those lines. I had just one universal import document that I cleared between imports. That was when I wrote that comment.

After all alternatives were eliminated for one reason or another as I got to understand how the process worked, I made one import document per new loader.

@hsoft

This comment has been minimized.

Owner

hsoft commented Feb 22, 2015

I'm finished with my review and my general opinion is favorable (once we address specific points I've mentioned inline). I really like where we're going with this.

As I wrote inline, the core of the matching algorithm is a bit... coarse to my taste, but it's good enough to merge. I'm sure we'll refine it as we go.

@hsoft

This comment has been minimized.

Owner

hsoft commented Feb 22, 2015

(and to be sure, I'm not questioning your ability with the word "coarse", it's just the first word that comes to mind. What you undertook is a huge task, coarseness is a necessary collateral)

@brownnrl

This comment has been minimized.

Contributor

brownnrl commented Feb 22, 2015

Coarse is a very good word for it. I was shooting for functional in a learn-the-system as you go approach. This branch isn't necessarily concerned with import matching, I have another plugin defined that matches based on equality of attributes with decreasing weight as attributes differ (some attributes more important than others) which after implementation should satisfy #389.

That code is still back in the issue_26 prototype waiting to be brought forward after this is stable. My plan is to include some different plugins that make different recommendations (some more "aggressive" than others) through the matching interface as part of testing. That will allow refactoring and clean up of this code.

But this kind of lays ground work. I will go through and clean up with your recommendations, add documentation, and add comments (especially to the matching and recommendation system).

I'll let you know when I'm done. I am going on vacation starting Wednesday through Monday/Tuesday, so you might not see a lot on it this week.

(#420) Issue 26 base PR review edits.
* Added ``tr()`` calls to plugin and action names
* Made ``TransactionSelectionMixin`` a superclass of ``TransactionTableBase``
* Updated some documentation of ``TransactionSelectionMixin``
* Removed unused (and misssspelled) ``_match_probabilties`` from ``AccountPane``
* Removed ``EntryMatch`` tuple, replaced tuple returns in matching plugins with
  ``EntryMatch`` instances.
* Last bullet point required adding an ``__iter__`` method to ``EntryMatch``.
* Removed unnecessary ``_get_match_key`` method
@brownnrl

This comment has been minimized.

Contributor

brownnrl commented Mar 7, 2015

Still have some work to do in code commenting and documentation.

(#420) Issue #26 base feature implementation.
Added a good bit of comments and documentation to the import_window file.
@brownnrl

This comment has been minimized.

Contributor

brownnrl commented Mar 15, 2015

So is the previous commit what you require in terms of commenting? Do you need more?

I'd like to start putting some more time toward doing the classifier in the other branch... I will come back additional commenting and documentation if you feel the need.

brownnrl added some commits Mar 15, 2015

Merge branch 'develop' into issue_26_base
Conflicts:
	core/model/entry.py
	core/tests/base.py
	core/tests/gui/import_window_test.py
@hsoft

This comment has been minimized.

Owner

hsoft commented Mar 17, 2015

The commenting is more than enough. I'd even say that it's spurious in some instances, but I'm not going to start getting picky about that :)

I have to say I'm a bit afraid of the complexity that the import window unit is garnering, but I'm sure we'll be able to elegantly refactor this once your whole PR is merged in.

@hsoft

This comment has been minimized.

hsoft commented on core/plugin.py in 4a8cfed Mar 17, 2015

So you didn't go for a namedtuple approach after all? Is there a specific reason?

This comment has been minimized.

Owner

brownnrl replied Mar 19, 2015

I think I forgot the direction we wanted to go from our conversation (I think the comments were hidden as I pushed changes, though I know you can make them visible). I guess I had it in my head we were going to class based approach. In retrospect, it was probably a poor decision since I even had to add a __iter__ method to the class where I just cast to a tuple anyway to make it compatible with existing code. I'll have to change it back when I return to this section.

@hsoft

This comment has been minimized.

Owner

hsoft commented Mar 28, 2015

I have a hard time following the status of this PR. Do you consider it ready to merge @brownnrl?

@brownnrl

This comment has been minimized.

Contributor

brownnrl commented Mar 28, 2015

From my point of view, yes. So we are clear, this PR does two things.

  1. Allows plugins of a certain class to recommend matching existing entries to imported ones, with the highest scoring entries (determined by the plugins) winning out.
  2. Allows plugins of a certain class to extend the "Fix" actions in the import window, and gives the plugins access to the underlying accounts being imported so they can perform those actions.

A branch I am working on now will use that functionality to create a plugin that will change import transfers based on existing account entries. I am currently experimenting with the best approach for that, using two of my own accounts (a checking account first and I will be working on a credit card account next) with about 2+ years of data from 2013-2015 as a basis. Descriptions are for the most part unmodified from the original. At first I was using a Naive Bayes text classifier (exact code borrowed from here), but the results weren't very good. Each account entry description was treated as a document, and the vocabulary was built from those descriptions. But the large number of account entries and vocabulary compared with the small number of occurrences of words per entry (only 3-5) really created a lot of noise that made for poor results.

I switched over to filtering out certain words, ones occurring in a large percentage of entries or only occurring once or twice, and then using the remaining as features per account. This has yielded much more favorable results, so I think I'm moving in the right direction. As of right now, I'm just experimenting with various methods in an ipython notebook on my exported base data trying to come up with the best and simplest approach.

When I finalize on an approach, I'll incorporate it into a plugin using this PR's functionality.

@hsoft

This comment has been minimized.

Owner

hsoft commented Mar 28, 2015

Alright, let's do it then! I'm currently in the process of merging the PR. There's a couple of minor adjustments I'll make afterwards though:

  • minor pep8 violations
  • the namedtuple thing, which was never done
  • Use a proper constant (instead of the old True) in tests for apply in perform_swap()
  • Remove SwapType definition (which doesn't make sense anymore) from actual code and define it in test code instead.
  • Add proper docstrings to BaseDocument
  • Make sure that the dev doc still look good afterwards

I hope you don't mind these changes.

Speaking of the dev docs, I was wondering: how do you find it? Is it adequate? Did it help you learn about the code? I spent a lot of time writing it and I'm wondering if it actually adds value :)

@hsoft

This comment has been minimized.

Owner

hsoft commented Mar 28, 2015

Oh, there's also this comment that I want to make for the record: Although this is essentially a refactoring change, it does change one behavior: the import table is now re-sorted right after a swap, as we can see in the changes made to the existing import tests.

I don't think that this behavior was intended, but rather a hard-to-avoid consequence of the new matching process.

I'm not sure how I feel about this new behavior. It does make sense to keep the table sorted at all times, but it does make the date swap more confusing to follow. However, with the perspective of allowing plugins to add more mass actions that would possibly alter the dates, it's probably sounder to keep that list sorted.

This comment is there so we don't think later that this was unknowingly introduced behavior.

@hsoft

This comment has been minimized.

Owner

hsoft commented Mar 28, 2015

Houston, we have a problem. It looks like 4821d40 which adds a new test was mistakenly reverted. I think it was from the merge at 596cd8f.

Investigating...

hsoft added a commit that referenced this pull request Mar 28, 2015

Merge commit 'a6a034122b45662bfcc277452cf53920e4be0e03' (PR #420) int…
…o develop

Conflicts:
	core/model/entry.py
	core/tests/base.py
	core/tests/gui/import_window_test.py

hsoft added a commit that referenced this pull request Mar 28, 2015

hsoft added a commit that referenced this pull request Mar 28, 2015

@hsoft

This comment has been minimized.

Owner

hsoft commented Mar 28, 2015

I solved the problem by merging the branch 2 commits prior (before you tried to merge with develop)

hsoft added a commit that referenced this pull request Mar 28, 2015

import_window: Remove SwapType
We keep it around in tests units for legacy purpose, but it's not in the
core code anymore.

ref #420

hsoft added a commit that referenced this pull request Mar 28, 2015

@hsoft

This comment has been minimized.

Owner

hsoft commented Mar 28, 2015

Well, I'm finished here. Thanks again for this excellent work! It was quite a feat to manage to pull this out, considering how it affects many parts of the code at once.

This PR will technically show as unmerged, but that's just because I had to merge 2 commits prior.

@hsoft hsoft closed this Mar 28, 2015

@hsoft

This comment has been minimized.

Owner

hsoft commented Jul 5, 2015

In the context of #432, I found a couple of problems with the new import system, which were unfortunately not caught by the tests. I'm in the process of enhancing the tests and fixing those problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment