[WIP] Modernise ISBG #63

duk3luk3 · 2016-12-28T12:56:10Z

Following up from #61, this is my first stab at modernising ISBG. It's very rough and probably buggy, I have only gotten it to a state where it doesn't syntax error when running. It is intended to have identical behaviour to current ISBG version plus my #61. This needs to be ensured by doing a lot of testing.

It then needs to be cleaned up.

TODO:

Turn ISBG into a class
~~[ ] Use argparse instead of docopt~~
Turn print output into programmatically usable return values / attributes
Use logging module for debug output
Fix up setup.py
Tune up sa-unwrap.py (Turn into proper module / add to setup.py / add docs) or remove it

Rationale for changes:

argparse is part of stdlib, so we could get rid of the docopt dependency. I've also found it perfectly usable in the past and don't see the advantage of docopt.
ISBG should be usable as a library, that include return information
logging module gives us flexible options for debugging / logging output

Feedback very welcome!

Works analogously to `--learnthendestroy`

Due to introduction of --learnunflagged number of messages in learning boxes determined from return of mailbox select command is no longer accurate. So this was changed to be determined from the result of the IMAP SEARCH command.

Logging is good, but whole mail bodies are frequently unwanted. This commit introduces a new --verbose-mails argument and changes the behaviour of `assertok` to only print the first 100 chars of any imap results if not set.

baldurmen · 2016-12-29T22:48:27Z

Hi! I'll take a look at this soon.

In the meanwhile, please don't work on changing docopt to argparse. I actually did the argparse to doctopt conversion a while ago.

I really like docopt. It's very clean (around 500 lines of code) and works really well. My main argument for using it is that it manages cli interface, help menus and tons of stuff for you automatically. All these things have to be done manually with argparse and it's a hassle I prefer not to deal with.

I understand it's not part of the standard python libs, but all modern linux distros should have docopt packages. For the corner cases, there is always the option of adding docopt.py manually somewhere.

duk3luk3 · 2016-12-30T08:03:57Z

OK - that's fine, to discuss things like this is why I opened the PR in this state.

As you'll see I'm sanitizing the options that come out of docopts quite heavily in the new parse_args function to have an easier time working with them in the do_isbg function. What's the preferred way to specify default argument values? I really wanted to get rid of all the if opts["foo"] is not None switches everywhere in the code.

duk3luk3 · 2017-01-02T12:46:28Z

With commit 8a93157, the following call should now work:

./isbg.py --dryrun --verbose --imaphost <host> --imapuser <user> --spaminbox <spaminbox>

I've added --dryrun option to disable / mock all IMAP write operations and Spamassassin calls, as well as limit the number of messages processed to 5 (with 1 "detected" as spam) for testing purposes.

Add --dryrun switches to learning modes (for --learnspambox / --learnhambox) and fixup

duk3luk3 · 2017-01-05T09:29:02Z

I fixed up --learnspambox / --learnhambox in 55839e3, so the following now works:

./isbg.py --dryrun --verbose --imaphost <host> --imapuser <user> --teachonly --learnunflagged --learnhambox "INBOX.Spam.Learn-Ham"

My next step will be installing SA locally and do a less "dry" test.

Script for unwrapping spamassassin reports with the original mail embedded.

Patches up everything with python2 equivalents to new things in python3. Mainly string/bytes handling.

duk3luk3 · 2017-01-11T14:19:37Z

Now contains sa-unwrap.py that extracts email parts with Content-Type: message/rfc822; x-spam-type=original from an e-mail, i.e. the original e-mails from spamassassin spam reports.

I made this script because I need it for my usecase where I need to be able to safely handle spam mails encapsulated in spamassassin spam reports. I've experimented with spamassassin (and spamc/sa-learn) a bit and have not found any conclusive evidence that it treats these emails properly.

If you don't like it I'll happily remove it from the PR, but for now it's in because my ongoing work is on this branch.

This updates the architecture and layout of the python scripts to create proper modules and updates setup.py to make everything installable.

Pulls out spamlearning and hamlearning into a single function with unified logic.

Pulls out the main spamassassin processing into a member function.

duk3luk3 · 2017-01-13T10:20:56Z

More work:

1fd43cd reorganizes the repo to create a proper distribution.
e39e685 and 06cad66 perform some major refactoring, pulling out the main sa-learn and spamassassin processing steps into member functions

Forgot a `self` referring to an exitcode

`isbg.py` now tries to import and use `sa_unwrap.py` to unwrap mails that are SA reports with the original spam mail embedded. This only works if the `isbg` module is loaded properly. To stay compatible with users calling `isbg.py` directly, `unwrap` is replaced by a dummy no-op function if imprting `sa_unwrap` fails.

baldurmen · 2017-01-16T20:35:04Z

i'll be doing a first batch of merge+review this Thursday (promised :D)

duk3luk3 · 2017-01-17T07:22:04Z

Great! Please note that this is definitely not in a release-ready state. But I'll test and fixup the most recent changes I made today.

duk3luk3 · 2017-01-17T09:01:43Z

I pushed a more complete version of the option setter functions in 4f44457.

For now I've kept everything so that just grabbing isbg.py and running it should work. Is that a desired feature? It complicates some things.

(EDIT: Had to push a new commit to fix some typos)

Option setter functions for using isbg programmatically. Includes fixes to some option parsing that were spotted while working on the setter functions. WIP - Untested

baldurmen · 2017-01-17T20:52:32Z

No need to make anything working atm. I'm going to create a separate dev branch. I'm just going to review your code a little and comment on it.

* Set clean default settings in __init__ * Do not use sys.exit on successful run if exitcodes option not set * Changes for py3 compat, especially string encoding - mostly untested

duk3luk3 · 2017-01-18T08:58:39Z

8b814f4 is the result of trying to import isbg and use it programmatically from a python 3 script. It contains clean setting of default options, removes the sys.exit from the end of the script if exitcodes isn't set, and contains some changes to string handling and other things for python 3. It also fixes some typos from the last commit.

It's pretty ugly and most likely fails in all code paths except the ones I've tried, which is mainly the script posted below and one or two manual calls like noted in comments above.

This is right about the point where I should invest some effort in some form of automated testing...

Here is the script I use for calling isbg programmatically:
(commented out is the old popen-based usage)

#!/usr/bin/python3

"""
Uses ISBG https://github.com/isbg/isbg (gepatcht: https://github.com/duk3luk3/isbg/tree/learnthenflag )
to learn spam from mailstores.

Fetches unflagged mails from imap servers, pushes them into spamc for learning, and then optionally flags them.

Parameters:
  * 'doflag': Flag mails after processing

Prerequisites: 
  * python(2)-docopt
  * python(3)-yaml
  * settings file in /var/lib/spamassassin/imaplearn.yaml
  * log dir in /var/log/spamlearning that is writeable for the user running this script

A configuration dict must be created for every IMAP Server to be used.

Template:

IMAP: [
    {
    # IMAP Server
    'server': 'mailstore.example.com',
    # IMAP Username Admin Postfix
    'username': '*admin',
    # IMAP Admin Password
    'password': 'XXXXX',
    # Default spambox / hambox mailbox
    'default-spambox': 'INBOX.Spam.Learn-Spam',
    'default-hambox': 'INBOX.Spam.Learn-Ham',
    'users': [
        # users tuple: Name, Spam Mbox, Ham Mbox (null => use default, false => disabled)
        ['joe', null, null],
        ['hank', 'INBOX.spam.learn_spam', 'INBOX.spam.learn_ham'],
    ]
    },
    {
    'server': 'mailstore2.example.com',
    'username': '*admin',
    'password': 'XXXXXX',
    'default-spambox': 'Spam/Learn-Spam',
    'default-hambox': 'Spam/Learn-Ham',
    'users': [
        ['joe', 'Spamlearning/Learn-Spam', False],
    ]
    },
]
  
"""

import os
import sys
import subprocess
import logging

from isbg.isbg import ISBG

import yaml
with open('/var/lib/spamassassin/imaplearn.yaml') as settfile:
    IMAP = yaml.load(settfile)['IMAP']

logger = logging.getLogger('spamlearn_mailstore')
logger.setLevel(logging.DEBUG)

fh = logging.FileHandler('/var/log/spamlearning/spamlearn_mailstore.log')
fh.setLevel(logging.DEBUG)

ch = logging.StreamHandler()
ch.setLevel(logging.INFO)

formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
fh.setFormatter(formatter)

logger.addHandler(ch)
logger.addHandler(fh)

#ISBG='/opt/hostrepo/maildmz/isbg.py'

DOFLAG = 'doflag' in sys.argv



for imap in IMAP:
    server = imap['server']
    username = imap['username']
    password = imap['password']
    default_spambox = imap['default-spambox']
    default_hambox = imap['default-hambox']
    users = imap['users']
    for user, spambox, hambox in users:
        imap_username = user + username
        imap_spambox = spambox if spambox is not None else default_spambox
        imap_hambox = hambox if hambox is not None else default_hambox
        try:
            boxparams = []
            if imap_spambox:
                boxparams = boxparams + [ '--learnspambox', imap_spambox ]
            if imap_hambox:
                boxparams = boxparams + [ '--learnhambox', imap_hambox ]

            if len(boxparams) > 0:
                logger.info('Running ISBG spamlearning for user {} on mailstore {}'.format(user, server))

                isbg = ISBG()
                isbg.set_imap_opts(
                    imaphost=server,
                    imapport=993,
                    imapuser=imap_username,
                    imappasswd=password,
                    nossl=False
                )
                isbg.set_reporting_opts(
                    imaplist=False,
                    nostats=False,
                    noreport=False,
                    exitcodes=False,
                    verbose=True,
                    verbose_mails=False
                )
                isbg.set_mailboxes(
                    inbox="INBOX",
                    spaminbox="INBOX.spam",
                    learnspambox=imap_spambox,
                    learnhambox=imap_hambox
                )
                isbg.set_processing_opts(
                    dryrun=False,
                    maxsize=isbg.maxsize,
                    teachonly=True,
                    spamc=True,
                    gmail=False
                )
                isbg.set_learning_opts(
                    learnunflagged=True,
                    learnthendestroy=False,
                    learnthenflag=DOFLAG
                )

                isbg.do_isbg()
                
#                isbg_args = [
#                    ISBG,
#                    '--imaphost', server,
#                    '--imapuser', imap_username,
#                    '--teachonly', '--spamc', '--verbose', '--learnunflagged',
#                    '--noninteractive'
#                    ] + boxparams
#
#                if DOFLAG:
#                    isbg_args.append('--learnthenflag')
#
#                logger.debug('ISBG params (minus password): {}'.format(isbg_args))
#
#                isbg_args = isbg_args + [
#                    '--imappasswd', password,
#                ]
#                isbg = subprocess.Popen(
#                    isbg_args,
#    #                stdout = sys.stderr,
#    #                stderr = sys.stderr,
#                    stdout = subprocess.PIPE,
#                    stderr = subprocess.STDOUT,
#                    stdin = subprocess.PIPE,
#                    )
##                isbg_out, isbg_err = isbg.communicate((password + '\n').encode())
##                logger.info('ISBG output:\n>>>\n{}\n<<<'.format(isbg_out.decode().strip()))
#                isbg_out = isbg.stdout.read().decode()
#                logger.debug('ISBG output:\n===\n{}\n==='.format(isbg_out.strip()))
#                for line in isbg_out.split('\n'):
#                    if line.endswith('learnt'):
#                        logger.info(line)
            else:
                logger.warn('Skipped ISBG spamlearning for user {} on mailstore {} because no mailboxes'.format(user, server))
        except:
            logger.error('Error running isbg!', exc_info=True)

Replace all print calls by logging statements. A log handler prints all log messages to stderr.

Fixes two small issues.

Setting the log handler that logs everything to stderr only in the `isbg_run` method meant for when isbg is run as command / top level script. When isbg is run as a library, the caller is responsible for setting up log output. If we add a log handler this can lead to messages being duplicated.

* Add separate pastuid files for learning folders * Clean up functionality

Add UIDVALIDITY to uid trackfile to make sure we use pastuids properly

duk3luk3 · 2017-01-20T08:12:46Z

I added lots of features because I can't help myself.

* Use sa_unwrap properly * Log all exceptions

baldurmen · 2017-01-23T22:19:09Z

I thought I'd have more time to look at this in depth, but it looks like I don't :(

So here I merge, and I'll look at this eventually. Don't stop! Please open more specific PR on v.2.0 and I'll try to look and merge quickly.

Oh, and please join our IRC channel and our new mailing list!

duk3luk3 added 5 commits December 6, 2016 10:32

Add --learnthenflag

bfe6611

Works analogously to `--learnthendestroy`

Add '--learnunflagged'

45a578c

Correct h_tolearn and s_tolearn values

f78e277

Due to introduction of --learnunflagged number of messages in learning boxes determined from return of mailbox select command is no longer accurate. So this was changed to be determined from the result of the IMAP SEARCH command.

Introduce extra verbose flag to print mail bodies

92f52dc

Logging is good, but whole mail bodies are frequently unwanted. This commit introduces a new --verbose-mails argument and changes the behaviour of `assertok` to only print the first 100 chars of any imap results if not set.

Rewrite isbg into class

db03da0

duk3luk3 mentioned this pull request Jan 2, 2017

getmessage / assertok logic error #66

Closed

Touch-Up to get minimal "happy path" running

8a93157

Fixup learning mode

55839e3

Add --dryrun switches to learning modes (for --learnspambox / --learnhambox) and fixup

duk3luk3 added 2 commits January 11, 2017 14:26

Add sa-unwrap.py

291da3a

Script for unwrapping spamassassin reports with the original mail embedded.

Make sa-unwrap.py py2.7-compatible

6b1a7c4

Patches up everything with python2 equivalents to new things in python3. Mainly string/bytes handling.

Create a proper distribution

1fd43cd

This updates the architecture and layout of the python scripts to create proper modules and updates setup.py to make everything installable.

duk3luk3 force-pushed the modernise branch 2 times, most recently from 40baf02 to 1fd43cd Compare January 12, 2017 09:52

duk3luk3 added 2 commits January 13, 2017 08:17

Refactor spamlearning

e39e685

Pulls out spamlearning and hamlearning into a single function with unified logic.

Refactor spamassassin processing

06cad66

Pulls out the main spamassassin processing into a member function.

duk3luk3 added 2 commits January 13, 2017 11:29

Fix typo

45a951e

Forgot a `self` referring to an exitcode

duk3luk3 force-pushed the modernise branch from 1c61fb4 to 2c3dfbd Compare January 16, 2017 09:35

duk3luk3 force-pushed the modernise branch from 8193a50 to 4f44457 Compare January 17, 2017 08:27

Make option setters

74d1617

Option setter functions for using isbg programmatically. Includes fixes to some option parsing that were spotted while working on the setter functions. WIP - Untested

duk3luk3 force-pushed the modernise branch from 4f44457 to 74d1617 Compare January 17, 2017 09:11

Clean default settings, less exitcodes, py3

8b814f4

* Set clean default settings in __init__ * Do not use sys.exit on successful run if exitcodes option not set * Changes for py3 compat, especially string encoding - mostly untested

duk3luk3 added 2 commits January 18, 2017 13:51

Logging

0b01eb4

Replace all print calls by logging statements. A log handler prints all log messages to stderr.

Decode sa output

e103d70

duk3luk3 force-pushed the modernise branch from 75109e7 to e103d70 Compare January 19, 2017 12:46

duk3luk3 added 2 commits January 19, 2017 13:56

Fix lock and learning boxes

358a219

Fixes two small issues.

duk3luk3 force-pushed the modernise branch 4 times, most recently from 4437d4c to 728a296 Compare January 20, 2017 07:35

Clean up pastuid handling

a64d406

* Add separate pastuid files for learning folders * Clean up functionality

duk3luk3 force-pushed the modernise branch from 728a296 to a64d406 Compare January 20, 2017 07:40

Track uidvalidity

ba2514b

Add UIDVALIDITY to uid trackfile to make sure we use pastuids properly

duk3luk3 force-pushed the modernise branch 2 times, most recently from 99bfd2b to ba2514b Compare January 20, 2017 08:08

baldurmen mentioned this pull request Jan 21, 2017

Add --learnthenflag #61

Closed

baldurmen changed the base branch from master to v.2.0 January 21, 2017 22:37

Fix sa_unwrap, more logging

747a05a

* Use sa_unwrap properly * Log all exceptions

baldurmen merged commit 373963e into isbg:v.2.0 Jan 23, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[WIP] Modernise ISBG #63

[WIP] Modernise ISBG #63

duk3luk3 commented Dec 28, 2016 •

edited

baldurmen commented Dec 29, 2016 •

edited

duk3luk3 commented Dec 30, 2016

duk3luk3 commented Jan 2, 2017

duk3luk3 commented Jan 5, 2017

duk3luk3 commented Jan 11, 2017

duk3luk3 commented Jan 13, 2017

baldurmen commented Jan 16, 2017

duk3luk3 commented Jan 17, 2017

duk3luk3 commented Jan 17, 2017 •

edited

baldurmen commented Jan 17, 2017

duk3luk3 commented Jan 18, 2017

duk3luk3 commented Jan 20, 2017

baldurmen commented Jan 23, 2017

[WIP] Modernise ISBG #63

[WIP] Modernise ISBG #63

Conversation

duk3luk3 commented Dec 28, 2016 • edited

baldurmen commented Dec 29, 2016 • edited

duk3luk3 commented Dec 30, 2016

duk3luk3 commented Jan 2, 2017

duk3luk3 commented Jan 5, 2017

duk3luk3 commented Jan 11, 2017

duk3luk3 commented Jan 13, 2017

baldurmen commented Jan 16, 2017

duk3luk3 commented Jan 17, 2017

duk3luk3 commented Jan 17, 2017 • edited

baldurmen commented Jan 17, 2017

duk3luk3 commented Jan 18, 2017

duk3luk3 commented Jan 20, 2017

baldurmen commented Jan 23, 2017

duk3luk3 commented Dec 28, 2016 •

edited

baldurmen commented Dec 29, 2016 •

edited

duk3luk3 commented Jan 17, 2017 •

edited