Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mail collector #40

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

Lacrymology
Copy link
Contributor

The beginning of a collector for the mail spool. For now collects only one vhost, and only the count of emails of each user, and an aggregated total. I will open an issue to discuss some things to add to it

config_help.update({
'spool_path': ('Path to mail spool that contains files to be '
'analyzed.'),
'mailbox_prefix': ('Prefix to add to the metric name. Use in case '
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why wouldn't they just set path to be mail.blah ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I fixed that in our internal version. Will push the changes next week

@Lacrymology
Copy link
Contributor Author

I rebased onto current master and changed the way prefixes are handled.

@kormoc : the reason why just setting name isn't enough for multiple-domain setups is that the single collector config should handle all the domains. I think this way is right, and it supports not using the prefixes at all for single-vhost cases.

@sbrynen
Copy link

sbrynen commented Jan 6, 2015

This should probably be called the filecount collector, given what it does. All it's doing it counting the number of real files in a directory tree. If you used maildir this would count messages, if you used mbox it would count the number of mailboxes.

However, I can definitely see the use for this elsewhere outside of mail (number of files in a cache directory, number of log files etc etc); whereas you can just change the dir and prefix if you want to make it specific for mail.

@mzupan
Copy link
Member

mzupan commented Jan 6, 2015

I agree with @sbrynen

With the new 4.0 release and separate processes I can see value in having a generic file counting collector that you can have multiple instances per machine if needed.

@Lacrymology
Copy link
Contributor Author

what? no it doesn't.. mbox opens a mailbox file and len(mbox) counts the amount of emails inside that file. Look:

In [1]: import mailbox

In [2]: mbox = mailbox.mbox('/var/mail/vhosts/robotinfra.com/tomas')

In [3]: len(mbox)
Out[3]: 1730

[12:45:34] root@integration-tomas:~ ls -l /var/mail/vhosts/robotinfra.com/tomas 
-rw------- 1 dovecot-agent mail 6490449 Dec 22 23:50 /var/mail/vhosts/robotinfra.com/tomas

@Lacrymology
Copy link
Contributor Author

I do agree that this should be modified to be running several instances rather than the quite silly prefixes hack, though. I didn't know of this new possibility (and I still don't quite know how to set it up)

@sbrynen
Copy link

sbrynen commented Jan 6, 2015

Sorry, I was thinking at it from a maildir point of view (where every single email is a separate file); but what you have still has specific uses as a generic counter against files.

Right now the counter is opening a file, scanning it, finding all the (completely) blank lines and adding one. The same logic would be highly useful for all sorts of things.

@Lacrymology
Copy link
Contributor Author

I guess, but it is using a library that reads a specific and standard email format and other email-specific formats could be added.

I didn't know that's what the mailbox standard defined as a new email, and while you're right about counting blank lines being useful.. I don't know, I have some feelings assuming some things about the mail format.

if that's what you want, all you really need to do is: grep '^$' -rc /var/mail/$user, or in python:

with open(uname) as f:
    count = f.readlines.count(os.linesep)

but I think your own example proves that this is more useful as a generic email counter:

[MailCollector]
format = maildir

###

mboxpath = '/var/mail/%s' % uname
if self.config['format'] == 'maildir':
    mbox = mailbox.Maildir(mboxpath)
elif self.config['format'] == 'mailbox':
    mbox = mailbox.mbox(mboxpath)

self.report(metricname, len(mbox))

This not counting that you can add more useful metrics to this collector, and logic to parse the different inboxes, imap folders, etcetera that I didn't need or couldn't really think of right now. You're right that all or most of these other metrics could be created from combining other more generic collectors, though. I don't know what's preferable, in this case, ease and clarity for the user who wants to count emails, or flexibility at the cost of a more convoluted configuration

@Lacrymology
Copy link
Contributor Author

I'm all for flexibility > usability, generally, but for a user to think of using the "blank lines counter" collector in order to count emails would require them to know the mailbox format details

@bmhatfield
Copy link
Contributor

@sbrynen @mzupan I think I fall a little bit more towards @Lacrymology - generalizing overly specific collectors is a good goal, but the collector in question doesn't feel like that case. There's a mailbox module in Python, and reviewing the code, this feels like a nice "ready to go" collector, IMO.

@mattrobenolt - I'm thinking of merging this, as it's sat around for a while. Do you want to weigh in before I do?

@jaingaurav
Copy link
Member

Hey @Lacrymology, any chance you can add some test cases before I merge this in?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants