Spamassassin #10

Closed
wants to merge 70 commits into
from

Projects

None yet

3 participants

@msimerson
Member

There are several commits in this branch that all revolve around a single theme, spamassassin and spam plugin tests. The tests exposed a number of areas ripe for improvement. I went a picking.

msimerson added some commits Apr 8, 2012
@msimerson msimerson POD corrections, additional tests, plugin consistency
on files in plugins dir:
  fixed a number of POD errors

  formatted some # comments into POD

  removed bare 1;  (these are plugins, not perl modules)
    most instances of this were copy/pasted from a previous plugin that had it

  set line 1 to be the standard perl shebang: #!/usr/bin/perl
    1. it's a way that automated test modules detect the file as perl
    2. it's a standard way for editors to detect the file is perl
    3. it's "the perl way" as the Larry suggested long ago
    4. half the files had it that way.
    5. anyone that doesn't like it now has a consistent pattern they can search/replace

  removed instances of # vim ts=N ...
    they weren't consistent, many didn't match .perltidyrc

  on modules that failed perl -c tests, added 'use Qpsmtpd::Constants;'
06e38c5
@msimerson msimerson doc fix: changed $TRACE_LEVEL to $TraceLevel 41f4c7d
@msimerson msimerson updated URL to new github repo 71668eb
@msimerson msimerson applied greylisting NFSLock patch
Issue #1 on Google issue tracker. The patch was 'accepted' by Ask in 2007, but never applied.
324fcd0
@abh

If we're going to standardize, how about #!/usr/bin/env perl? (Though I guess in the plugins it's more for editors to detect than anything).

Owner

I agree with Matt. #!/usr/bin/env causes more problems than not, and most of the plugins aren't built to be standalone programs anyway. We're jumping through this hoop mostly for editor autodetection -- these files are mostly not +x anyway. Today, some files have emacs specific headers, others have vi footers -- the one thing that works for both is /#!.*perl/ -- so we could just use #!perl and it would be consistent and work.

Owner

Debates about which is the best shebang to use have gone on for years. Rather than debate further, can we agree that having the standard perl shebang in place is better than the current hodge podge?

Except these aren't stand alone scripts. I'm kind of fond of #!perl (for this case), as it gets you the syntax highlighting without making it look like a script.

Owner

My preference is anything that's consistent, behaves well with test suites, and that my editor (vim) recognizes. The standard shebang is definitely better. #!perl is just as good for consistency and editor recognition. If qpsmtpd has a future on CPAN, then using #!perl in the plugins is likely to cause some interesting test failures.

I have a great idea so that we can all get what we want. Merge in my commit as is. Then, Ask uses sed or a perl one liner to change all the plugin shebangs to #!/usr/bin/env perl and commits it. Then Robert uses sed or a perl one liner and changes them all to #!perl and commits that.

Owner

Committing that change before this one creates a lot of needless busy work for me. It's more than a little disheartening.

abh replied Apr 29, 2012

Sorry about that. I was going to say that it should be easy to get your git repository "in sync" again, but I realize the problem (the busy work) is that this commit had a bunch of different cleanups mixed in.

One thing I often use (virtually for every commit actually) is to use 'git add -p' to stage the commit; that way I can take partial changes from each file to make it easier to make the commits "per change" rather than "per file", if that makes sense.

Give me a second and I will fix it so you can easily rebase your branch.

abh replied Apr 29, 2012

Matt, can you try rebasing your branch off the upstream master now?

I put these changes in there now:

smtpd@dbaa9db

Probably do a 'git rebase -i HEAD~10' and delete this commit from your branch first as it'll otherwise conflict.

abh replied Apr 29, 2012

Actually, if you just do

git rebase upstream/master (or smtpd/master or whatever you call the upstream remote)

and then when it conflicts on this commit, do:

git rebase --skip

then it should take you a bit further.

The next conflict is about the plugins/check_badmailfrom_patterns / plugins/check_badmailfrom change which is easy to resolve. I put up my version if your rebased branch on https://github.com/abh/qpsmtpd/tree/matt

@abh

Moving the use line makes sense, but better move all of them then, no?

Owner

What I really want here is lazy loading of both of those dependencies, but I'm not sure how best to accomplish that. Suggestions are welcome, and I'll follow up with another patch.

abh replied Apr 8, 2012

If you're trying to get lazy loading then you want to use require and an explicit import for the Digest module (or just use Digest::HMAC_MD5::hmac_md5_hex() explicitly). I think using require would be different than how it's done most everywhere else in the code base though.

Owner
msimerson added some commits Apr 8, 2012
@msimerson msimerson suppress log error when $user unset
test for and return earlier when a null sender is encountered.
Prevents using an undefined variable.
b50513b
@msimerson msimerson bump RAM from 50 to 75MB
necessary on my FreeBSD 8 amd64 system. I'm guessing higher requirements
will be the norm on 64 bit systems.
00d139a
@msimerson msimerson Merge branch 'master' of git://github.com/smtpd/qpsmtpd a5dd62c
@msimerson msimerson imported check_qmail_deliverable
requires Qmail::Deliverable, which is a modern replacement for the
apparently popular but no longer available check_deliver plugin.
c1a16f3
@msimerson msimerson refactored, added per-user SA prefs, X-header fixes
* refactored for ease of maintenance
* added support for per-user SpamAssassin preferences
* updated get_spam_score so that score=N.N works (as well as hits=N.N)
* rewrote the X-Spam-* header additions so that SA generated headers are
  not discarded. Admin can alter SA headers with add_header in their SA
  config. Subverting their changes there is unexpected. Making them read
  code to figure out why is an unnecessary hurdle.
* added assemble_message, so we can calc content size which spamd wants
8d643cc
@msimerson msimerson improve grammar, update logging instructions
updated instructions for setting loglevel to use config/loglevel instead
of editing lib/Qpsmtpd to set $TraceLevel
9a86351
@msimerson msimerson whitespace, copyright bump, simplify logic df1c6ba
@msimerson msimerson fixed POD formatting 30d3c63
@msimerson msimerson added comments to logging config files 5a6340e
@msimerson msimerson added tls comments to config/plugins 9944424
@msimerson msimerson added RAM notes to run file d1c6b59
@msimerson msimerson fixed spelling error, added spf code to notes d35faad
@msimerson msimerson added option to skip SA when user is authenticated e51fb74
@msimerson msimerson added dspam plugin d488521
@msimerson msimerson added 'reject agree' option, check for spamassassin note
When 'reject agree'  is set, dspam will only reject emails if both SpamAssassin and dspam agree that the message is spam.

dspam requires a fair bit of training. It seems to have a fairly high initial false positive rate, and this option allows dspam to reject most spam while allowing it's own FP hits through.

also cleaned up some whitespace and moved header parsing into their own methods.

SpamAssassin checks for presense of spamass note before retrieving the header and parsing it.
e593c60
@msimerson msimerson SpamAssassing plugin now saves results in a note 2896fc6
@msimerson msimerson make sure $hook is defined before printing it
This prevents error messages about $hook being undefined in the logs
71e0490
@msimerson msimerson reduce the logging from 2 to 1 lines per recipient d9bd5ad
@msimerson msimerson changed two log calls from INFO to DEBUG (more appropriate) 1f1b353
@msimerson msimerson don't print GeoIP country if not defined
If we don't get a result from the lookup, all we know is that we didn't get a result. Maybe an error, maybe the IP not in the database.
32c4396
@msimerson msimerson prepend auth_flat plugin name to $note
Makes it much easier to figure out where that log entry came from.
a6d5cd4
@msimerson msimerson changed spamassassing plugin note to plugin name bc06d4f
@msimerson msimerson merged check_badmailfrom_patterns into check_badmailfrom 85e31eb
@msimerson msimerson fixed POD type s/te/the/ 5627bf5
@msimerson msimerson refactored p0f plugin, added p0f v3 support 50e38fc
@msimerson msimerson rewrote auth plugins so log messages are consistently prefixed
most plugins had plugin name in the messages somehow. I just made them more consistent by prefixing the message with the plugin name and shortened message.
ef7d53b
@rspier

This boilerplate feels unnecessary if we're making this part of the core.

Owner

good point. I'll fix that right now.

@rspier

missing h?

Owner

yes, that is corrected in a subsequent commit

There's some git magic you can do to merge those into one commit.

Owner

Aye, but that's only if I haven't pushed the commits upstream. Since I've already done that, altering those past commits is a no no. Unless you know something I don't.

@rspier
rspier commented on run in d1c6b59 Apr 29, 2012

This is really any 64bit platform... but we've already increased the ram to 75MB. Is this comment still meaningful?

@rspier
rspier commented on run in d1c6b59 Apr 29, 2012

Maybe better to replace this with something generic? "Some modules/plugins may require more RAM" or something.

Owner

For the person setting up qpsmtpd, a generic message isn't much more helpful than saying nothing at all. The point of adding a comment in the run file is to provide some clues as to where a reasonable starting point is, and the largest factors affecting it. Knowing that the domainkeys plugin requires an extra 40MB of RAM is very helpful. Especially because everything appears to be running along swimmingly, until I noticed the failures in the logs when domainkeys signed messages arrived. Not exactly a graceful failure, and it was far too easy to overlook. The 75MB limit I committed earlier was sufficient on non-signed messages. To get signed messages, I need at least 100MB on my server and 60MB without domain key signatures.

I think any other plugins that impose a RAM footprint that heavy should also be listed in there as well.

@rspier

I don't know who Alan is, but lets use a fake address?

Owner

Oops, a client of mine. I can fix that easy enough in another commit, but I can't imagine how to make that disappear from github history without a lot of pain. Ideas?

I think this is where feature branches and squashed commits come in. You can also do fun things with interactive rebase. Then github forgets about what it knows now, but you may need to do a ff push. This is the edge of my git knowledge.

Owner

I just read about squashed commits, and the instructions have warnings like, "if you’ve already pushed your commits to a central repository, you shouldn’t change previous commits retroactively."

Owner

Are you sure? I have squashed the commits in my local repo and rebased to master. But there's nothing to push to github as "everything is up-to-date". The only references I've found to squashing and github says, "Remember, if you squash commits you’ve already pushed to GitHub, you won’t be able to push that same branch again."

Owner
Owner
abh replied Apr 29, 2012

If it says everything is up to date, then I don't think the history is different. It'd give a different rejection message (without -f) if the branch was different.

A quick way to test would be to make a new branch and push that to github:

git checkout -b master2
git push -u origin master2:master2
git checkout master

Or you can see what commits are different between github and your current HEAD/master:

git fetch origin
git shortlog origin/master...HEAD

(three dots mean show logs "both ways")

Ask

abh replied Apr 29, 2012

I pulled it in (fixed up as one commit).

msimerson added some commits Apr 29, 2012
@msimerson msimerson remove boilerplate from check_qmail_deliverable 41014f7
@msimerson msimerson anonymise dspam plugin note 2af7cb3
@msimerson msimerson added pod DESC to dont_require_anglebrackets dc6c846
@msimerson msimerson added dspam plugin
anonymise dspam plugin note
37bb6c0
@msimerson msimerson added 'reject agree' option, check for spamassassin note
When 'reject agree'  is set, dspam will only reject emails if both SpamAssassin and dspam agree that the message is spam.

dspam requires a fair bit of training. It seems to have a fairly high initial false positive rate, and this option allows dspam to reject most spam while allowing it's own FP hits through.

also cleaned up some whitespace and moved header parsing into their own methods.

SpamAssassin checks for presense of spamass note before retrieving the header and parsing it.
9974f0a
@msimerson msimerson SpamAssassing plugin now saves results in a note 9451e55
@msimerson msimerson make sure $hook is defined before printing it
This prevents error messages about $hook being undefined in the logs
bafd99f
@msimerson msimerson reduce the logging from 2 to 1 lines per recipient eda4937
@msimerson msimerson changed two log calls from INFO to DEBUG (more appropriate) 0bf8b7e
@msimerson msimerson don't print GeoIP country if not defined
If we don't get a result from the lookup, all we know is that we didn't get a result. Maybe an error, maybe the IP not in the database.
25ce760
@msimerson msimerson prepend auth_flat plugin name to $note
Makes it much easier to figure out where that log entry came from.
24ed765
@msimerson msimerson changed spamassassing plugin note to plugin name e9498b1
@msimerson msimerson merged check_badmailfrom_patterns into check_badmailfrom b962456
@msimerson msimerson fixed POD type s/te/the/ 90d73fe
@msimerson msimerson refactored p0f plugin, added p0f v3 support 7a5a5f9
@msimerson msimerson rewrote auth plugins so log messages are consistently prefixed
most plugins had plugin name in the messages somehow. I just made them more consistent by prefixing the message with the plugin name and shortened message.
6bd258e
@msimerson msimerson remove boilerplate from check_qmail_deliverable c431d35
@msimerson msimerson added pod DESC to dont_require_anglebrackets b7ca6dc
@msimerson msimerson Merge branch 'master' of github.com:msimerson/qpsmtpd 1304497
@abh
abh commented on d1c6b59 Apr 29, 2012

I'd suggest just increasing the default to 150MB and be done with it.

Thoughts?

Owner

works for me. That should be sufficient to keep folks from running into silent errors in anything resembling a "normal" config.

@abh
abh commented on df1c6ba Apr 29, 2012

I picked this into upstream; with the t corrected. :-)

@abh
abh commented on c1a16f3 Apr 29, 2012

We support plugins with Full::Proper::Names, no? So Qmail::Deliverable should distribute the qpsmtpd plugin as Qmail::Deliverable::Qpsmtpd or some such, and then that can just be configured.

abh replied Apr 29, 2012

(in other words, the better fix would be a patch to Juerd).

Owner

Why wasn't check_delivery every imported? It was on the TODO and now its gone.

Why do we want to extend qpsmtpd via a plugin that's delivered via a mechanism other than a file in the plugins directory?

abh replied Apr 29, 2012

Why not distribute "extra" plugins via CPAN? If I could do it over, all non-essential plugins should have been CPAN modules and we'd have made that work well.

Owner

I see no reason why not. But that's not the way it happened. I'm trying to gain a little bit of insight, and hopefully avoid wasting time and frustration in the future.

abh replied Apr 29, 2012

It was hard to move in that direction when we didn't start that way, and because each extra plugin often is just a few dozen lines. Seems crazy to make a CPAN distribution for that, right? However in this case there already is a (required) CPAN distribution. It'd be much saner to maintain the plugin code with the other required bits.

I don't believe we actually support Plugin::Modules right now. We treat the plugins as text, read them in, add some boilerplate, and then eval them to load them. But I do like the idea of plugins as modules -- but I think it would need some glue. Look around line 160 of Plugin.pm for the relevant code.

abh replied Apr 29, 2012

_load_plugin in Qpsmtpd.pm is what has support for it; not sure if it still works though but it ought to.

@abh
abh commented on 2af7cb3 Apr 29, 2012

I'll change this to user@example.com.

@abh
abh commented on 50e38fc Apr 29, 2012

picked into upstream.

msimerson added some commits Apr 29, 2012
@msimerson msimerson merged with smtpd/master 7579862
@msimerson msimerson Merge remote-tracking branch 'smtpd/master' f7c9b08
@msimerson msimerson fix grammar in checkpassword comment e61c41d
@msimerson msimerson improved POD language describing vpopmail auth methods 48a185e
@msimerson msimerson detect mysql connection failure in auth_vpopmail_sql
moved 'use ...' up their rightful place
8183ee7
@msimerson msimerson auth_vpopmaild: improved logging, disable cram 27d43bb
@msimerson msimerson only run tests if QPSMTPD_DEVELOPER env is set 18d30d9
@msimerson msimerson converted check_relay comments to POD 0b4f0f5
@msimerson msimerson dspam plugin, added 'reject agree' option, and more
only reject emails if both SpamAssassin and DSPAM agree it's spam (until dspam is trained, SA has a much lower FP rate)
refined POD
added MySQL InnoDB POD
cleaned up trailing whitespace
rewrite X-DSPAM-Result header (in case of preexisting one)
only retrieve first line of dspam output (efficiency gain)
05e2c02
@msimerson msimerson geoip records country name (in addition to code)
Helpful for humans who haven't memorized all the 2 char ISO country codes
1d9f6a8
@msimerson msimerson added initial author attribution of SPF plugin 95e5fc8
@msimerson msimerson spamassassin plugin, use per-user threshold for header rewriting
also, don't perform an lc(leave_old_headers) unless that arg is actually defined
c61efc1
@msimerson msimerson fixed occassional Auth error messages due to undefined $user
and fixed a typo
b4af028
@msimerson msimerson added p0f upgrade notes to Changes 91138b6
@msimerson msimerson spamassassin plugin tests and updates
parsing of the X-Spam-Status header now uses split instead of a regexp. Mining a collection of mail headers revealed that the regexp was not as robust as it could be. Now the parsing should work for everyone, regardless of how they configure SpamAssassin.

the spamassassin note is now a hashref (was : delimited fields). More data is now available, its more easily accessible (no unpacking required), and its extensible so it won't require future modification. I just added the spamassassin note feature so it's unlikely this change will hurt anyone.

Added logging for many more error and status conditions.

renamed reject_threshold option to 'reject', to be consistent with other plugins that have a 'reject' option.
e91fab5
@msimerson msimerson tests for the spamassassin plugin 1e7b70d
@msimerson msimerson dspam plugin updated for SA note parsing
spamassassin note is good enough now that dspam has no need to parse headers. Yay.
121a481
@msimerson msimerson dspam plugin tests
much better logging of disposition
replaced header parsing regexp with split. faster and more robust.
store the dspam results in a note
quiet some of the debug logging
59bdce5
@msimerson msimerson added compatibility with previous config syntax 833c862
@rspier
Member
rspier commented May 1, 2012

Is this pull request supposed to still have 68 commits in it?

@msimerson
Member

I've learned something about pull requests. Pull requests are against a branch. And they aren't against a branch at the moment of the pull request. So long as the pull request remains open, all new commits get added to it.

I have taken your advice and started using git branches. I have some spamassassin & dspam plugin changes in the spamassassin branch. I've got another for logging related changes. And other for the reworked require_resolvable_fromhost plugin. It seems that's not just a way to use git, it's The Way.

@rspier
Member
rspier commented May 1, 2012

What does that mean about this pull request?

@rspier
Member
rspier commented May 1, 2012

If you rebase this against the upstream master, then it should only have the differences between the qpsmtpd/master head and the spamassasin changes you're making? And hopefully that's still pushable? Not sure. I may be giving bad advice.

@msimerson
Member

Well now, that's interesting. I made those commits to the 'spamassassin' branch. Apparently when I merge them into my master branch, they become part of the master pull request. Oh dear. I have much to learn about git and github. Here's what github is saying, "You can add more commits to this pull request by pushing to the spamassassin branch on msimerson/qpsmtpd"

@msimerson
Member

I'm not sure either. I've had very poor luck with rebasing. It seems there's something fundamental about it that I just haven't grasped yet.

@rspier
Member
rspier commented May 1, 2012

Maybe Ask will chime in. I've found that rebasing early and often helps
keep things clean, but I know it can also cause troubles if you do it after
you push.

On Mon, Apr 30, 2012 at 10:42 PM, Matt Simerson <
reply@reply.github.com

wrote:

I'm not sure either. I've had very poor luck with rebasing. It seems
there's something fundamental about it that I just haven't grasped yet.


Reply to this email directly or view it on GitHub:
#10 (comment)

@msimerson msimerson closed this May 5, 2012
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment