Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pipeline parallelism #101

Closed
foxcpp opened this issue Jun 29, 2019 · 7 comments · Fixed by #108
Closed

Pipeline parallelism #101

foxcpp opened this issue Jun 29, 2019 · 7 comments · Fixed by #108
Assignees
Labels
performance Issue regarding suboptional performance. pipeline Related to message pipeline code. rfc Request For Comments (ongoing discussion / research needed).

Comments

@foxcpp
Copy link
Owner

foxcpp commented Jun 29, 2019

Most pipeline steps do network I/O (DNS, mostly) so sequential execution is very slow. But serial execution of pipeline directives is actually its feature: It is very flexible but still simple model, a minimal domain-specific language, roughly speaking. I think we want to keep it. Instead, add a meta-directive "parallel" to indicate that some block should be parallelized (think of pre-delivery filters/checks). What do you think? Or, perhaps it is possible to redesign it from scratch with parallelism in mind, I'm not sure how to do it though.

@foxcpp foxcpp added performance Issue regarding suboptional performance. pipeline Related to message pipeline code. rfc Request For Comments (ongoing discussion / research needed). labels Jun 29, 2019
@emersion
Copy link
Collaborator

Bleh, I'm not a fan of having users tell which steps are parallel.

@foxcpp
Copy link
Owner Author

foxcpp commented Jul 17, 2019

So I'm thinking about replacing the "scriptable" pipeline with fixed groups of checks.

If more advanced handling is required - it should be done by external programs.

Basically, a group of checks to run on MAIL FROM, RCPT TO, DATA and ability to select delivery targets based on the message recipient.

If maddy is about simplicity - it's all it should do. This is way simpler than a domain-specific language.

I will prepare more complete description of my proposal when I will get home (in 10 days).

@emersion
Copy link
Collaborator

+1, looks like a good solution to me

@foxcpp
Copy link
Owner Author

foxcpp commented Jul 29, 2019

Goals

  • Replace unnecessarily complicated (and inefficient) "pipeline" module.
  • Implement atomic delivery to multiple targets. This is important to avoid duplicate messages on partial failure.
  • Check and return errors to clients as early as possible (if the recipient doesn't exist - RCPT TO should fail).
  • Run all actions that require I/O in parallel where possible.

DeliveryTarget interface

type DeliveryTarget interface {
  Start(ctx *DeliveryContext, mailFrom string) (Delivery, error)
}
type Delivery interface {
  AddRcpt(to string) (string, error)
  Body(r io.Reader) error

  Abort() error
  Commit() error
}

There is a problem - "Commit" still can partially fail when multiple targets are used. Though, it is assumed to be rare enough to be ignored. DeliveryTarget implementation should verify the possibility of delivery before Commit.

Dispatcher module

Decides the set of targets that should handle the message and all checks that should be executed.
Implements DeliveryTarget interface.

Dispatching

Each message recipient is associated with one or more delivery targets.

Recipients that don't have any targets associated after dispatching decisions are handled by default directive in config (usually, it will contain reject directive for SMTP and queue + remote for Submission).

Per-recipient dispatching is done using the destination directive which is similar to the destination step in the current implementation. Duplicate domains are not allowed and order does not matter. I leave regexp support out (for now) due to certain possible issues.

destination example.org example.com {
  deliver_to sql
}
# Takes priority over the block above.
destination foobar@example.org {
  deliver_to remote {
  forward_to smtp://127.0.0.1:26
  }
}
# Duplicate block for example.org. Invalid.
destination example.org {
}

Per-sender dispatching is done using the source directive that works the same way. source can contain a nested destination block but the opposite is not true for simplicity reasons.

Checks

All checks are executed in parallel. Additional checks can be added on the per-sender or per-recipient basis. If one check fails - others are canceled using context.Context.

Due to the per-message "internal" state requirements (see the milter protocol for the example) and general nature of "generic" checks like hypothetical milter module I decided to drop check groups idea and just put them together. Convenience wrapper like FunctionFilter can be added to remove boilerplate from simple checks.

type Check interface {
  NewMessage(ctx *DeliveryContext) (MessageCheck, error)
}

type BodyBuffer interface {
  Open() (io.ReadCloser, error)
  // TODO: Make modifications to the body possible.
}

type MessageCheck interface {
  // DeliveryContext contains all necessary information, hence no arguments.
  CheckConnection(ctx context.Context) error
  CheckSender(ctx context.Context, mailFrom string) error
  CheckRcpt(ctx context.Context, rcptTo string) error

  // Side note: Header field is removed from the DeliveryContext structure.
  CheckBody(ctx context.Context, headerLock *sync.RWMutex, header textproto.Header, body BodyBuffer) error

  // Called when message processing ends even if any of the check functions failed.
  Close() error
}

Modificators

Executed after checks.
Have roughly the same interface except that they can rewrite recipient and sender information freely.

They are executed serially to make the final effect predictable.

Config example

smtp ... {
  target dispatcher {
      checks {
          require_tls
          check_source_rdns
          verify_dkim
      }
      modify {
          alias_sql {
              driver ...
              dsn ...
          }
          # Plus-address notation.
          rewrite "(.*)\+.+@example.com" "$1@example.com"
      }
  
      destination example.com {
          deliver_to local
      }
  
      default {
          reject
      }
  }
}
submission ... {
  target dispatcher {
      checks {
          require_tls
          require_auth_sender
      }
  
      destination example.com {
          deliver_to local
      }
  
      default {
          modify {
              dkim_sign
          }
          deliver_to queue { ... }
      }
  }
}

@foxcpp foxcpp added this to the 0.1 milestone Jul 30, 2019
@foxcpp
Copy link
Owner Author

foxcpp commented Jul 30, 2019

Probably it is worth simply inlining configuration for the dispatcher logic into SMTP/Submission modules.

submission ... {
    check {
        require_tls
        require_auth_sender
    }
  
    destination example.com {
        deliver_to local
    }
  
    default {
        modify {
            dkim_sign
        }
        deliver_to queue { ... }
    }
}

instead of

submission ... {
  target dispatcher {
      checks {
          require_tls
          require_auth_sender
      }
  
      destination example.com {
          deliver_to local
      }
  
      default {
          modify {
              dkim_sign
          }
          deliver_to queue { ... }
      }
  }
}

@foxcpp
Copy link
Owner Author

foxcpp commented Jul 30, 2019

Another improvement. default is not global but per-dispatching decision level. I.e.:

source example.org {
  ...
}
source example.com {
  ...
  destination example.org {
    ...
  }
  # Messages not matched by dest. blocks are handled according to this block.
  default {
    ...
  }
}
# Messages not matched by source blocks are handled according to this block.
default {
  destination ... {
    ...
  }
  default {
    # not matched by any dest. or src. block
  }
}

If block doesn't contain any dispatching directives (source, destination, default) its contents are interpreted as if they were inside an default block. I.e.:

source ... {
  ...
}
default {
  reject
}

is equivalent to

source ... {
  ...
}
default { # default source
  default { # default recipient
    reject
  }
}
source example.org {
  destination ... {
    ...
  }
  default { # default recipient handler when message comes from example.org
    reject
  }
}
default { # default source & recipient handler for non-example.org senders
  reject 
}

is equivalent to

source example.org {
  destination ... {
    ...
  }
  default { # default recipient handler when message comes from example.org
    reject
  }
}
default { # default source handler
  default { # default recipient handler for non-example.org senders
    reject 
  }
}

@foxcpp
Copy link
Owner Author

foxcpp commented Jul 30, 2019

And oh, I forgot that we have custom Resolver interface, then check interface should be adjusted as follows:

type Check interface {
  NewMessage(ctx *DeliveryContext, resolver Resolver) (MessageCheck, error)
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Issue regarding suboptional performance. pipeline Related to message pipeline code. rfc Request For Comments (ongoing discussion / research needed).
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants