Conversation
Works analogously to `--learnthendestroy`
Due to introduction of --learnunflagged number of messages in learning boxes determined from return of mailbox select command is no longer accurate. So this was changed to be determined from the result of the IMAP SEARCH command.
Logging is good, but whole mail bodies are frequently unwanted. This commit introduces a new --verbose-mails argument and changes the behaviour of `assertok` to only print the first 100 chars of any imap results if not set.
Hi! I'll take a look at this soon. In the meanwhile, please don't work on changing docopt to argparse. I actually did the argparse to doctopt conversion a while ago. I really like docopt. It's very clean (around 500 lines of code) and works really well. My main argument for using it is that it manages cli interface, help menus and tons of stuff for you automatically. All these things have to be done manually with argparse and it's a hassle I prefer not to deal with. I understand it's not part of the standard python libs, but all modern linux distros should have docopt packages. For the corner cases, there is always the option of adding docopt.py manually somewhere. |
OK - that's fine, to discuss things like this is why I opened the PR in this state. As you'll see I'm sanitizing the options that come out of docopts quite heavily in the new |
With commit 8a93157, the following call should now work:
I've added |
Add --dryrun switches to learning modes (for --learnspambox / --learnhambox) and fixup
I fixed up
My next step will be installing SA locally and do a less "dry" test. |
Script for unwrapping spamassassin reports with the original mail embedded.
Patches up everything with python2 equivalents to new things in python3. Mainly string/bytes handling.
Now contains I made this script because I need it for my usecase where I need to be able to safely handle spam mails encapsulated in spamassassin spam reports. I've experimented with spamassassin (and spamc/sa-learn) a bit and have not found any conclusive evidence that it treats these emails properly. If you don't like it I'll happily remove it from the PR, but for now it's in because my ongoing work is on this branch. |
This updates the architecture and layout of the python scripts to create proper modules and updates setup.py to make everything installable.
40baf02
to
1fd43cd
Compare
Pulls out spamlearning and hamlearning into a single function with unified logic.
Pulls out the main spamassassin processing into a member function.
`isbg.py` now tries to import and use `sa_unwrap.py` to unwrap mails that are SA reports with the original spam mail embedded. This only works if the `isbg` module is loaded properly. To stay compatible with users calling `isbg.py` directly, `unwrap` is replaced by a dummy no-op function if imprting `sa_unwrap` fails.
i'll be doing a first batch of merge+review this Thursday (promised :D) |
Great! Please note that this is definitely not in a release-ready state. But I'll test and fixup the most recent changes I made today. |
I pushed a more complete version of the option setter functions in 4f44457. For now I've kept everything so that just grabbing (EDIT: Had to push a new commit to fix some typos) |
Option setter functions for using isbg programmatically. Includes fixes to some option parsing that were spotted while working on the setter functions. WIP - Untested
No need to make anything working atm. I'm going to create a separate dev branch. I'm just going to review your code a little and comment on it. |
* Set clean default settings in __init__ * Do not use sys.exit on successful run if exitcodes option not set * Changes for py3 compat, especially string encoding - mostly untested
8b814f4 is the result of trying to import isbg and use it programmatically from a python 3 script. It contains clean setting of default options, removes the It's pretty ugly and most likely fails in all code paths except the ones I've tried, which is mainly the script posted below and one or two manual calls like noted in comments above. This is right about the point where I should invest some effort in some form of automated testing... Here is the script I use for calling isbg programmatically: #!/usr/bin/python3
"""
Uses ISBG https://github.com/isbg/isbg (gepatcht: https://github.com/duk3luk3/isbg/tree/learnthenflag )
to learn spam from mailstores.
Fetches unflagged mails from imap servers, pushes them into spamc for learning, and then optionally flags them.
Parameters:
* 'doflag': Flag mails after processing
Prerequisites:
* python(2)-docopt
* python(3)-yaml
* settings file in /var/lib/spamassassin/imaplearn.yaml
* log dir in /var/log/spamlearning that is writeable for the user running this script
A configuration dict must be created for every IMAP Server to be used.
Template:
IMAP: [
{
# IMAP Server
'server': 'mailstore.example.com',
# IMAP Username Admin Postfix
'username': '*admin',
# IMAP Admin Password
'password': 'XXXXX',
# Default spambox / hambox mailbox
'default-spambox': 'INBOX.Spam.Learn-Spam',
'default-hambox': 'INBOX.Spam.Learn-Ham',
'users': [
# users tuple: Name, Spam Mbox, Ham Mbox (null => use default, false => disabled)
['joe', null, null],
['hank', 'INBOX.spam.learn_spam', 'INBOX.spam.learn_ham'],
]
},
{
'server': 'mailstore2.example.com',
'username': '*admin',
'password': 'XXXXXX',
'default-spambox': 'Spam/Learn-Spam',
'default-hambox': 'Spam/Learn-Ham',
'users': [
['joe', 'Spamlearning/Learn-Spam', False],
]
},
]
"""
import os
import sys
import subprocess
import logging
from isbg.isbg import ISBG
import yaml
with open('/var/lib/spamassassin/imaplearn.yaml') as settfile:
IMAP = yaml.load(settfile)['IMAP']
logger = logging.getLogger('spamlearn_mailstore')
logger.setLevel(logging.DEBUG)
fh = logging.FileHandler('/var/log/spamlearning/spamlearn_mailstore.log')
fh.setLevel(logging.DEBUG)
ch = logging.StreamHandler()
ch.setLevel(logging.INFO)
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
fh.setFormatter(formatter)
logger.addHandler(ch)
logger.addHandler(fh)
#ISBG='/opt/hostrepo/maildmz/isbg.py'
DOFLAG = 'doflag' in sys.argv
for imap in IMAP:
server = imap['server']
username = imap['username']
password = imap['password']
default_spambox = imap['default-spambox']
default_hambox = imap['default-hambox']
users = imap['users']
for user, spambox, hambox in users:
imap_username = user + username
imap_spambox = spambox if spambox is not None else default_spambox
imap_hambox = hambox if hambox is not None else default_hambox
try:
boxparams = []
if imap_spambox:
boxparams = boxparams + [ '--learnspambox', imap_spambox ]
if imap_hambox:
boxparams = boxparams + [ '--learnhambox', imap_hambox ]
if len(boxparams) > 0:
logger.info('Running ISBG spamlearning for user {} on mailstore {}'.format(user, server))
isbg = ISBG()
isbg.set_imap_opts(
imaphost=server,
imapport=993,
imapuser=imap_username,
imappasswd=password,
nossl=False
)
isbg.set_reporting_opts(
imaplist=False,
nostats=False,
noreport=False,
exitcodes=False,
verbose=True,
verbose_mails=False
)
isbg.set_mailboxes(
inbox="INBOX",
spaminbox="INBOX.spam",
learnspambox=imap_spambox,
learnhambox=imap_hambox
)
isbg.set_processing_opts(
dryrun=False,
maxsize=isbg.maxsize,
teachonly=True,
spamc=True,
gmail=False
)
isbg.set_learning_opts(
learnunflagged=True,
learnthendestroy=False,
learnthenflag=DOFLAG
)
isbg.do_isbg()
# isbg_args = [
# ISBG,
# '--imaphost', server,
# '--imapuser', imap_username,
# '--teachonly', '--spamc', '--verbose', '--learnunflagged',
# '--noninteractive'
# ] + boxparams
#
# if DOFLAG:
# isbg_args.append('--learnthenflag')
#
# logger.debug('ISBG params (minus password): {}'.format(isbg_args))
#
# isbg_args = isbg_args + [
# '--imappasswd', password,
# ]
# isbg = subprocess.Popen(
# isbg_args,
# # stdout = sys.stderr,
# # stderr = sys.stderr,
# stdout = subprocess.PIPE,
# stderr = subprocess.STDOUT,
# stdin = subprocess.PIPE,
# )
## isbg_out, isbg_err = isbg.communicate((password + '\n').encode())
## logger.info('ISBG output:\n>>>\n{}\n<<<'.format(isbg_out.decode().strip()))
# isbg_out = isbg.stdout.read().decode()
# logger.debug('ISBG output:\n===\n{}\n==='.format(isbg_out.strip()))
# for line in isbg_out.split('\n'):
# if line.endswith('learnt'):
# logger.info(line)
else:
logger.warn('Skipped ISBG spamlearning for user {} on mailstore {} because no mailboxes'.format(user, server))
except:
logger.error('Error running isbg!', exc_info=True) |
Fixes two small issues.
Setting the log handler that logs everything to stderr only in the `isbg_run` method meant for when isbg is run as command / top level script. When isbg is run as a library, the caller is responsible for setting up log output. If we add a log handler this can lead to messages being duplicated.
4437d4c
to
728a296
Compare
* Add separate pastuid files for learning folders * Clean up functionality
Add UIDVALIDITY to uid trackfile to make sure we use pastuids properly
99bfd2b
to
ba2514b
Compare
I added lots of features because I can't help myself. |
* Use sa_unwrap properly * Log all exceptions
I thought I'd have more time to look at this in depth, but it looks like I don't :( So here I merge, and I'll look at this eventually. Don't stop! Please open more specific PR on v.2.0 and I'll try to look and merge quickly. Oh, and please join our IRC channel and our new mailing list! |
Following up from #61, this is my first stab at modernising ISBG. It's very rough and probably buggy, I have only gotten it to a state where it doesn't syntax error when running. It is intended to have identical behaviour to current ISBG version plus my #61. This needs to be ensured by doing a lot of testing.
It then needs to be cleaned up.
TODO:
[ ] Use argparse instead of docoptprint
output into programmatically usable return values / attributeslogging
module for debug outputsetup.py
sa-unwrap.py
(Turn into proper module / add tosetup.py
/ add docs) or remove itRationale for changes:
logging
module gives us flexible options for debugging / logging outputFeedback very welcome!