Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mail tracker #43

Merged
merged 18 commits into from
Mar 11, 2019
Merged

Mail tracker #43

merged 18 commits into from
Mar 11, 2019

Conversation

multun
Copy link
Contributor

@multun multun commented Dec 19, 2018

A prototype mail tracker implementation, along with an IMAP worker.

/!\ The default is to queue and delete unseen emails, don't run this on your own mailbox /!\

The worker can be started this way :

python -m todo.mail_worker --type imap \
    --group 'group name' --task-slug mail-queue \
    --host imap.example.com --user test@example.com --password <(echo foobar123)

TODO:

  • a formatting pattern for the name of newly created tasks
  • task merging / comment moving (in case a comment gets added to the wrong task)
  • tests :'(

Copy link
Owner

@shacker shacker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lots of good ideas here. Some significant changes requested here, and we've discussed in the issue whether the EMail model is required at all. Some feedback provided here to keep the process rolling.

@@ -0,0 +1,196 @@
import email
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this worker should be written as a django management command rather than as a standalone python script that has to import django manually. I am thinking of a similar arrangement with a task runner called django-q, where you start the process ./manage.py qcluster in a separate terminal, or as a separate process on a server). That arrangement is just more django-like, and has the benefit of having access to the rest of the framework.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds like a good idea



def imap_connect(config):
conn = imaplib.IMAP4_SSL(host=config.host, port=config.port)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a management command, this can be reconfigured to pull from the existing django email settings (or custom ones if needed).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aren't django settings for sending emails ?
what would be the format of this ? as we can't start all workers that would be configured in the django config, is it actually a good idea ?

Copy link
Contributor Author

@multun multun Dec 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alright, maybe a Dict[worker_name, worker_settings] would do the trick

Copy link
Owner

@shacker shacker Dec 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aren't django settings for sending emails ?

The native ones are - this would be a new setting you'd introduce into django-todo. But the code would still pull the actual credentials from django settings file or env vars. So:

TODO_EMAIL_SERVER = [fqdn]
TODO_EMAIL_BOX= "a@b.com"
TODO_EMAIL_PASS = "XYZ"
...

as we can't start all workers that would be configured in the django config

The worker would not start automatically when django starts - it has to be started manually or during deploy. But it's still nicer for consistency not to have non-Django scripts sitting around. This should be a good citizen of the framework.

logger.warning('missing "Subject" header, ignoring message')
return

logger.info('received message:\t[Subject: %s]\t[Message-ID: %s]\t[To: %s]\t[From: %s]',
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use .format() or better, f-strings. This python 2.x-style positional token substitution is hard to read.

Copy link
Contributor Author

@multun multun Dec 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hate it too, but it's all the logging module supports for delayed formatting
PS: I'll obviously change that, I don't think performance matters as much as readable code :)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh interesting, I did not realize that! Yeah not sure we need to be concerned about that performance detail, but good to know.


# find the most relevant task to add a comment on.
# among tasks in the selected task list, find the task having the
# most email comments in the
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is unfinished.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whoopsie, thanks


related_messages = message.get('references', '').split()

# find the most relevant task to add a comment on.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most relevant? No, we must be certain. The subject line must contain the task ID, or a token that can be positively matched. And tests should prove that a comment can never be added to the wrong task.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see, we look for the message_id in the thread, so in a sense this is a "best match without possibility of failure" - do I understand that correctly? In that case, cool.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah that's it

logger.info('created comment: %s', repr(comment))


def main(args=None):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would move into the handle method of a management command.


def main(args=None):
config = _parser().parse_args(args=args)
if config.password_file is not None:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just pull from django settings - no need for a separate settings file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's no settings file. it's all command line, except the password : adding it at a command line parameter would enable anybody to get it once the worker is started.

the user will need to start the worker process outside django anyway, so what's the benefit of using django settings ?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a Django project, we can assume that all settings are accessed via django.conf.settings . But here you've introduced:

        with open(config.password_file, 'rb') as pass_file:
            config.password = pass_file.read().decode()

So now we have a Django project with most settings come in via django.conf and and others through your custom config file, which is an extra setup and maintenance hassle for no reason. Your custom config file should not exist.

todo/models.py Outdated
@@ -60,19 +62,60 @@ class Meta:
ordering = ["priority"]


class EMail(models.Model):
# this attribute is there to enable having this weird unique_together constraint.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Convert comments to docstring if we keep this (though we discussed not needing the EMail model after all?)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to me that the unique_together provides an improvement in mitigating this attack, but does not prevent it, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well it provides the required level of isolation at database level.
the rest is up to the worker

todo/models.py Outdated
class Comment(models.Model):
"""
Not using Django's built-in comments because we want to be able to save
a comment and change task details at the same time. Rolling our own since it's easy.
"""
author = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.CASCADE)
# author and email are mutualy exclusive. If the comment was produced by an email, users make no sense
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems we should check during import whether an email matches a known user, and if so, assign the comment to the known user. We would only assign email here if we don't find a match.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking out loud ... we could even go a step further and register a new user internally for each new unknown email. Then at some point in the workflow potentially provide the option to register them fully, and they would find their history of emails intact. Not sure we need that for first release, but could be an area where thinking ahead pays off.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry but this sounds like a terrible idea. sometimes just having a django user is a privilege.
furthermore, emails still aren't really authenticated. this feature would enable anybody spoofing the source mail address to retrieve the full message history of an unauthenticated user.

It seems we should check during import whether an email matches a known user, and if so, assign the comment to the known user. We would only assign email here if we don't find a match.

Well that would enable anybody to make any user comment an existing issue, as long as the email of the user and a message-id from the thread is known, which doesn't sound acceptable.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I addressed this in my recent comment on the issue - I think the right way to go for initiating a new ticket is for an authenticated user to start it from a form (which we already have). Then we have a message ID and all subsequent communication can hang off of that. That will be more secure and seems like standard operating procedure for support systems these days.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It shouldn't be hard to implement based on this initial work. It should just be about disallowing task creation + adding an initial_message_id attribute to the task model.

However, well, I wouldn't use this feature :p

todo/models.py Show resolved Hide resolved
@multun multun force-pushed the mail_tracker branch 2 times, most recently from 0866fbb to c610876 Compare December 21, 2018 22:24
@multun
Copy link
Contributor Author

multun commented Dec 21, 2018

@shacker it should be much better now: no more separate email model, a django management command, and everything in the settings

@multun multun changed the title [DO NOT MERGE] Mail tracker Mail tracker Dec 21, 2018
Copy link
Owner

@shacker shacker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking really good @multun . Thanks for all the hard work on this! Some fairly minor change requests here. Beyond that, can you add a section on "Accepting tickets and responses via email" to the main README? (full documentation on using the feature). I admit the existing tests in todo are scant, but I do want to be firm about requiring them in todo 2.x . Do you think you can come up with a couple-three tests?

todo/mail/producers/imap.py Show resolved Hide resolved
todo/models.py Show resolved Hide resolved
task = models.ForeignKey(Task, on_delete=models.CASCADE)
date = models.DateTimeField(default=datetime.datetime.now)
email_from = models.CharField(max_length=320, blank=True, null=True)
email_message_id = models.TextField(blank=True, null=True)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

blank=True is OK but we should never set null=True on text fields in Django (though I admit I've wrestled with cases where that advice doesn't work out great, and the current models in todo already violate this recommendation).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well ok, will do

todo/models.py Show resolved Hide resolved
todo/models.py Show resolved Hide resolved
@shacker
Copy link
Owner

shacker commented Dec 22, 2018

Just went to run the worker locally for the first time, with ./manage.py mail_worker myworker but it depends on the existence of settings.TODO_MAIL_TRACKERS, which looks like it needs to be a dictionary. I assume that will be documented, but for now can you provide an example of how this setting should be configured?

@multun
Copy link
Contributor Author

multun commented Dec 23, 2018

Whoops sorry about that:

from todo.mail.producers import imap_producer
from todo.mail.consumers import tracker_consumer

TODO_MAIL_TRACKERS = {
    'test_tracker': {
        'producer': imap_producer(
            host='imap.example.com',
            user='user@example.com',
            password='password',
        ),
        'consumer': tracker_consumer(
            group='mail queuers',
            task_list_slug='mail-queue',
            priority=1,
        )
    }
}

LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'handlers': {
        'console': {
            'class': 'logging.StreamHandler',
        },
    },
    'loggers': {
        '': {
            'handlers': ['console'],
            'level': 'DEBUG',
        },
    },
}

The mail tracker uses the logging module, so it needs to be configured inside django.

@multun
Copy link
Contributor Author

multun commented Dec 31, 2018

@shacker alright, I've got an idea : how about setting up email notifications so that answering there links the responses to the task ?

That way, it would make sense to enable disabling task creation when unknown emails are received.

It would also be nice to add a button sending an email notification, so that users that didn't get notified for the task are still able to attach emails to the task.

For reference, here's how the mail headers from jira notifications look like:

Message-ID: <JIRA.{issue_id}.{created_ms}.{sequence_id}.{current_ms}@Atlassian.JIRA>
In-Reply-To: <JIRA.{issue_id}.{created_ms}@Atlassian.JIRA>
References: <JIRA.{issue_id}.{created_ms}@Atlassian.JIRA> <JIRA.{issue_id}.{created_ms}@net-services>

The message-id is unique per message, but can just be ignored, as the thing yhat enables referencing the thread is the content of the first references header / the content of the in-reply-to header

@shacker
Copy link
Owner

shacker commented Jan 7, 2019

how about setting up email notifications so that answering there links the responses to the task ?

+1 from me!

@multun multun force-pushed the mail_tracker branch 8 times, most recently from 036b04d to 3908a9c Compare February 12, 2019 16:58
@multun
Copy link
Contributor Author

multun commented Feb 12, 2019

@shacker This branch should be ready now !

It also brings some UI improvements. I can't really move these to another branch as they also involve mail tracking (I can still work something out if you insist)

Thanks for your review and advices !

@multun
Copy link
Contributor Author

multun commented Mar 3, 2019

@shacker do you think you'll have time to review this anytime soon ?

Copy link
Owner

@shacker shacker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay. This is huge, thanks - looking really good. A handful of questions and comments, and then I'll do a full local trial and we'll be good to go.

todo/models.py Show resolved Hide resolved
todo/models.py Show resolved Hide resolved
todo/models.py Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
README.md Show resolved Hide resolved
todo/models.py Show resolved Hide resolved
todo/models.py Show resolved Hide resolved
todo/utils.py Outdated Show resolved Hide resolved
todo/utils.py Outdated Show resolved Hide resolved
todo/utils.py Outdated Show resolved Hide resolved
Signed-off-by: Victor "multun" Collod <victor.collod@prologin.org>
A mail worker is a long running application. And sometimes, the IMAP server
just hangs for hours for no apparent reason. imaplib doesn't enable setting
a timeout, and setting it globally seems fine.
@multun
Copy link
Contributor Author

multun commented Mar 8, 2019

@shacker I think it should be ready now. As this branch brings a bunch of substantial changes (which should not affect too much existing installations, but I wouldn't be too confident about that), I would suggest adding a new major release.

Thanks again for your review !

Copy link
Owner

@shacker shacker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@multun Thank you! We are good to go. This has been a huge effort and the largest contribution to todo, ever. Will be great to see some momentum around this feature. Thanks and cheers.

@shacker shacker merged commit c7ad961 into shacker:master Mar 11, 2019
@shacker
Copy link
Owner

shacker commented Mar 11, 2019

@multun Oops, I jumped the gun and merged a bit too soon. I am getting an error on startup:

(gtd-ieZ1wq35) [shacker@LMAC111704 gtd]$ mp check
Traceback (most recent call last):
  File "./manage.py", line 10, in <module>
    execute_from_command_line(sys.argv)
  File "/Users/shacker/.local/share/virtualenvs/gtd-ieZ1wq35/lib/python3.6/site-packages/django/core/management/__init__.py", line 381, in execute_from_command_line
    utility.execute()
  File "/Users/shacker/.local/share/virtualenvs/gtd-ieZ1wq35/lib/python3.6/site-packages/django/core/management/__init__.py", line 375, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/Users/shacker/.local/share/virtualenvs/gtd-ieZ1wq35/lib/python3.6/site-packages/django/core/management/base.py", line 316, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/Users/shacker/.local/share/virtualenvs/gtd-ieZ1wq35/lib/python3.6/site-packages/django/core/management/base.py", line 353, in execute
    output = self.handle(*args, **options)
  File "/Users/shacker/.local/share/virtualenvs/gtd-ieZ1wq35/lib/python3.6/site-packages/django/core/management/commands/check.py", line 65, in handle
    fail_level=getattr(checks, options['fail_level']),
  File "/Users/shacker/.local/share/virtualenvs/gtd-ieZ1wq35/lib/python3.6/site-packages/django/core/management/base.py", line 379, in check
    include_deployment_checks=include_deployment_checks,
  File "/Users/shacker/.local/share/virtualenvs/gtd-ieZ1wq35/lib/python3.6/site-packages/django/core/management/base.py", line 366, in _run_checks
    return checks.run_checks(**kwargs)
  File "/Users/shacker/.local/share/virtualenvs/gtd-ieZ1wq35/lib/python3.6/site-packages/django/core/checks/registry.py", line 74, in run_checks
    "with the checks registry must return a list." % check)
AssertionError: The function <function dal_check at 0x10617b2f0> did not return a list. All functions registered with the checks registry must return a list.

I'd investigate and fix myself but the traceback does not point to an obvious place in the code. Thought you might have a quick fix handy?

@shacker
Copy link
Owner

shacker commented Mar 11, 2019

NM I figured it out and posted a fix.

575e264

@shacker
Copy link
Owner

shacker commented Mar 11, 2019

Though this makes me realize that HAS_AUTOCOMPLETE should probably be renamed TODO_HAS_AUTOCOMPLETE and be mentioned in the documentation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants