Pipeline parallelism #101

foxcpp · 2019-06-29T13:57:31Z

Most pipeline steps do network I/O (DNS, mostly) so sequential execution is very slow. But serial execution of pipeline directives is actually its feature: It is very flexible but still simple model, a minimal domain-specific language, roughly speaking. I think we want to keep it. Instead, add a meta-directive "parallel" to indicate that some block should be parallelized (think of pre-delivery filters/checks). What do you think? Or, perhaps it is possible to redesign it from scratch with parallelism in mind, I'm not sure how to do it though.

emersion · 2019-06-29T14:02:05Z

Bleh, I'm not a fan of having users tell which steps are parallel.

foxcpp · 2019-07-17T18:28:45Z

So I'm thinking about replacing the "scriptable" pipeline with fixed groups of checks.

If more advanced handling is required - it should be done by external programs.

Basically, a group of checks to run on MAIL FROM, RCPT TO, DATA and ability to select delivery targets based on the message recipient.

If maddy is about simplicity - it's all it should do. This is way simpler than a domain-specific language.

I will prepare more complete description of my proposal when I will get home (in 10 days).

emersion · 2019-07-17T18:29:58Z

+1, looks like a good solution to me

foxcpp · 2019-07-29T14:00:25Z

Goals

Replace unnecessarily complicated (and inefficient) "pipeline" module.
Implement atomic delivery to multiple targets. This is important to avoid duplicate messages on partial failure.
Check and return errors to clients as early as possible (if the recipient doesn't exist - RCPT TO should fail).
Run all actions that require I/O in parallel where possible.

DeliveryTarget interface

type DeliveryTarget interface {
  Start(ctx *DeliveryContext, mailFrom string) (Delivery, error)
}
type Delivery interface {
  AddRcpt(to string) (string, error)
  Body(r io.Reader) error

  Abort() error
  Commit() error
}

There is a problem - "Commit" still can partially fail when multiple targets are used. Though, it is assumed to be rare enough to be ignored. DeliveryTarget implementation should verify the possibility of delivery before Commit.

Dispatcher module

Decides the set of targets that should handle the message and all checks that should be executed.
Implements DeliveryTarget interface.

Dispatching

Each message recipient is associated with one or more delivery targets.

Recipients that don't have any targets associated after dispatching decisions are handled by default directive in config (usually, it will contain reject directive for SMTP and queue + remote for Submission).

Per-recipient dispatching is done using the destination directive which is similar to the destination step in the current implementation. Duplicate domains are not allowed and order does not matter. I leave regexp support out (for now) due to certain possible issues.

destination example.org example.com {
  deliver_to sql
}
# Takes priority over the block above.
destination foobar@example.org {
  deliver_to remote {
  forward_to smtp://127.0.0.1:26
  }
}
# Duplicate block for example.org. Invalid.
destination example.org {
}

Per-sender dispatching is done using the source directive that works the same way. source can contain a nested destination block but the opposite is not true for simplicity reasons.

Checks

All checks are executed in parallel. Additional checks can be added on the per-sender or per-recipient basis. If one check fails - others are canceled using context.Context.

Due to the per-message "internal" state requirements (see the milter protocol for the example) and general nature of "generic" checks like hypothetical milter module I decided to drop check groups idea and just put them together. Convenience wrapper like FunctionFilter can be added to remove boilerplate from simple checks.

type Check interface {
  NewMessage(ctx *DeliveryContext) (MessageCheck, error)
}

type BodyBuffer interface {
  Open() (io.ReadCloser, error)
  // TODO: Make modifications to the body possible.
}

type MessageCheck interface {
  // DeliveryContext contains all necessary information, hence no arguments.
  CheckConnection(ctx context.Context) error
  CheckSender(ctx context.Context, mailFrom string) error
  CheckRcpt(ctx context.Context, rcptTo string) error

  // Side note: Header field is removed from the DeliveryContext structure.
  CheckBody(ctx context.Context, headerLock *sync.RWMutex, header textproto.Header, body BodyBuffer) error

  // Called when message processing ends even if any of the check functions failed.
  Close() error
}

Modificators

Executed after checks.
Have roughly the same interface except that they can rewrite recipient and sender information freely.

They are executed serially to make the final effect predictable.

Config example

smtp ... {
  target dispatcher {
      checks {
          require_tls
          check_source_rdns
          verify_dkim
      }
      modify {
          alias_sql {
              driver ...
              dsn ...
          }
          # Plus-address notation.
          rewrite "(.*)\+.+@example.com" "$1@example.com"
      }
  
      destination example.com {
          deliver_to local
      }
  
      default {
          reject
      }
  }
}
submission ... {
  target dispatcher {
      checks {
          require_tls
          require_auth_sender
      }
  
      destination example.com {
          deliver_to local
      }
  
      default {
          modify {
              dkim_sign
          }
          deliver_to queue { ... }
      }
  }
}

foxcpp · 2019-07-30T13:11:55Z

Probably it is worth simply inlining configuration for the dispatcher logic into SMTP/Submission modules.

submission ... {
    check {
        require_tls
        require_auth_sender
    }
  
    destination example.com {
        deliver_to local
    }
  
    default {
        modify {
            dkim_sign
        }
        deliver_to queue { ... }
    }
}

instead of

submission ... {
  target dispatcher {
      checks {
          require_tls
          require_auth_sender
      }
  
      destination example.com {
          deliver_to local
      }
  
      default {
          modify {
              dkim_sign
          }
          deliver_to queue { ... }
      }
  }
}

foxcpp · 2019-07-30T14:27:56Z

Another improvement. default is not global but per-dispatching decision level. I.e.:

source example.org {
  ...
}
source example.com {
  ...
  destination example.org {
    ...
  }
  # Messages not matched by dest. blocks are handled according to this block.
  default {
    ...
  }
}
# Messages not matched by source blocks are handled according to this block.
default {
  destination ... {
    ...
  }
  default {
    # not matched by any dest. or src. block
  }
}

If block doesn't contain any dispatching directives (source, destination, default) its contents are interpreted as if they were inside an default block. I.e.:

source ... {
  ...
}
default {
  reject
}

is equivalent to

source ... {
  ...
}
default { # default source
  default { # default recipient
    reject
  }
}

source example.org {
  destination ... {
    ...
  }
  default { # default recipient handler when message comes from example.org
    reject
  }
}
default { # default source & recipient handler for non-example.org senders
  reject 
}

is equivalent to

source example.org {
  destination ... {
    ...
  }
  default { # default recipient handler when message comes from example.org
    reject
  }
}
default { # default source handler
  default { # default recipient handler for non-example.org senders
    reject 
  }
}

foxcpp · 2019-07-30T18:12:53Z

And oh, I forgot that we have custom Resolver interface, then check interface should be adjusted as follows:

type Check interface {
  NewMessage(ctx *DeliveryContext, resolver Resolver) (MessageCheck, error)
}

foxcpp added performance Issue regarding suboptional performance. pipeline Related to message pipeline code. rfc Request For Comments (ongoing discussion / research needed). labels Jun 29, 2019

foxcpp added this to the 0.1 milestone Jul 30, 2019

foxcpp mentioned this issue Jul 31, 2019

Performance of the pipeline code #76

Closed

foxcpp self-assigned this Jul 31, 2019

This was referenced Aug 2, 2019

Pre-DATA checks on envelope #54

Closed

New message dispatcher #107

Closed

foxcpp mentioned this issue Aug 24, 2019

New delivery interfaces and new message dispatcher #108

Merged

13 tasks

foxcpp closed this as completed in #108 Aug 25, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Pipeline parallelism #101

Pipeline parallelism #101

foxcpp commented Jun 29, 2019

emersion commented Jun 29, 2019

foxcpp commented Jul 17, 2019

emersion commented Jul 17, 2019

foxcpp commented Jul 29, 2019

foxcpp commented Jul 30, 2019

foxcpp commented Jul 30, 2019 •

edited

foxcpp commented Jul 30, 2019

Pipeline parallelism #101

Pipeline parallelism #101

Comments

foxcpp commented Jun 29, 2019

emersion commented Jun 29, 2019

foxcpp commented Jul 17, 2019

emersion commented Jul 17, 2019

foxcpp commented Jul 29, 2019

Goals

DeliveryTarget interface

Dispatcher module

Dispatching

Checks

Modificators

Config example

foxcpp commented Jul 30, 2019

foxcpp commented Jul 30, 2019 • edited

foxcpp commented Jul 30, 2019

foxcpp commented Jul 30, 2019 •

edited