[WIP, HELP NEEDED] Fix many issues #35
This patch makes the dict that maps each mail hash to its messages store only the mail file paths. Then, for clusters of duplicate mails, we load the messages into RAM before investigating and possibly removing duplicates.
Also fix diff function argument types
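A minimal sketch of the idea in this commit message, not the project's actual code: the index holds only file paths per hash, and full messages are parsed again only for hashes that have more than one path. The function names and the choice of headers fed to the hash (Message-ID and Subject) are assumptions made for illustration.

```python
import email
import hashlib
from collections import defaultdict

# Hypothetical header selection; the real project hashes its own set of headers.
HASHED_HEADERS = ("Message-ID", "Subject")


def load_message(path):
    """Parse one mail file from disk into an email.message.Message."""
    with open(path, "rb") as handle:
        return email.message_from_binary_file(handle)


def mail_hash(path):
    """Hash a few headers; the parsed message is discarded on return."""
    message = load_message(path)
    canonical = "\n".join(
        "{}: {}".format(name, message.get(name, "")) for name in HASHED_HEADERS
    )
    return hashlib.sha1(canonical.encode("utf-8", "replace")).hexdigest()


def index_by_hash(paths):
    """Map each hash to the list of file paths sharing it (paths only)."""
    index = defaultdict(list)
    for path in paths:
        index[mail_hash(path)].append(path)
    return index


def duplicate_clusters(index):
    """Yield (hash, [(path, message), ...]) for hashes with 2+ files.

    Messages are loaded here, one cluster at a time, so only the mails
    of the cluster being inspected sit in RAM.
    """
    for digest, paths in index.items():
        if len(paths) < 2:
            continue
        yield digest, [(path, load_message(path)) for path in paths]
```

Because `duplicate_clusters` is a generator, each cluster's messages can be garbage-collected once the caller moves on to the next one.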
Current coverage is 61.02% (diff: 66.66%)
@@            develop    #35   diff @@
==========================================
  Files             3      3
  Lines           304    313     +9
  Methods           0      0
  Messages          0      0
  Branches         67     70     +3
==========================================
+ Hits             98    191    +93
+ Misses          194     91   -103
- Partials         12     31    +19
For the encoding challenge, I suggest we try decoding with |
Also, it would appear that unlike the rest of python3, the |
Turns out this actually helped RAM usage. This reverts commit 42db5f4.
Thanks for the PR! I'm going to review it soon. Do you plan to commit some more stuff or is it ready to merge? Would you be up for adding a couple of unit tests? Even one or two would be great. |
Ah, it appears to be working for me, but some review and testing especially of the encoding handling would be good. I'll take a look at tests. |
There is another issue. I have vim automatically running pylint, and it detects an undefined variable. Clearly I haven't reached this code path, and I can't tell what the variable should be. Could you take a look at this line please? |
Tracked the This really makes me sad. There is nothing to prevent anybody from pushing mistakes. Maildir-deduplicate is really helpful for a lot of people, but the code quality is not very good, essentially because of its patchy history. If you plan to rely on it for important data, I'll be really happy to merge some unit tests. Even a 10% increase in coverage would be a great leap forward. |
Working on a super basic test. Don't have heaps of time, but should at least test maildir handling. |
date_only was added in commit c87818c but was made obsolete in commit e847797. Thanks to @kdmurray91 for pointing this out: #35 (comment)
@kdmurray91 That's the spirit! If you can only bootstrap the unit-tests of the project, it would be easier, little by little, to improve the whole thing. As for the |
This PR is good to merge IMHO. Just waiting for the last unit-test commit. I'll probably do a proper release on PyPI after this one is merged. |
The test I have written is really basic, and could do with extending if you have a little time (it's bedtime in Australia). Things to do:
I'm happy for you to merge & release this as you see fit. There can always be more PRs. |
Thanks @kdmurray91 ! Just waiting for the tests to pass. Then I'll merge it right away. |
Looks like maildirs are expected to have their |
Ah, of course. git won't keep them as they are empty. Doh. |
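One possible workaround for test fixtures, sketched under the assumption that the empty directories in question are the standard Maildir subdirectories `cur`, `new` and `tmp`: recreate them at test setup instead of relying on git to store them. The helper name is hypothetical.

```python
import os

# The three subdirectories defined by the Maildir format.
MAILDIR_SUBDIRS = ("cur", "new", "tmp")


def ensure_maildir_layout(path):
    """Create any missing Maildir subdirectory under `path`.

    Safe to call repeatedly: existing directories are left untouched.
    """
    for name in MAILDIR_SUBDIRS:
        os.makedirs(os.path.join(path, name), exist_ok=True)
    return path
```

Calling this from the test's setup step keeps the checked-in fixtures usable even when git has dropped the empty directories. Another common option is to commit a placeholder file in each directory.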
Fix incoming |
Looks like I've screwed up the silencing of the logs. If you can easily work this out, go ahead and fix this, then merge (I've added you as a collaborator on my repo). Otherwise I'll do it tomorrow. |
I'll merge the PR and try to fix the issue. |
Thanks mate! |
Hi,
This started as a fix for #19, but has grown somewhat.
So far I have:
Message objects) while detecting possible duplicate mails. The Message objects are loaded from disk while processing clusters of mail files that have the same hash.
cmp issue described in Python 3 - TypeError: 'cmp' is an invalid keyword argument for this function #34
Cheers,
Kevin
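For context on the `cmp` error referenced in #34: Python 3 removed the `cmp` keyword from `sorted()` and `list.sort()`, and `functools.cmp_to_key` is the standard way to keep an old-style comparison function. The sketch below only illustrates that general technique; the sample data and the `compare_by_size` function are made up and are not taken from this project.

```python
from functools import cmp_to_key


def compare_by_size(left, right):
    """Old-style comparison: negative, zero or positive result."""
    return (left[1] > right[1]) - (left[1] < right[1])


# Hypothetical (name, size) pairs standing in for mail files.
mails = [("a", 300), ("b", 120), ("c", 300)]

# Python 2 spelling, which raises TypeError on Python 3:
#     sorted(mails, cmp=compare_by_size)
# Python 3 spelling:
ordered = sorted(mails, key=cmp_to_key(compare_by_size))
```

Where the comparison reduces to a single attribute, a plain `key=` function (here `key=lambda mail: mail[1]`) is usually simpler than wrapping a comparison function.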