Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Question: custom dispatch connection close #2445

Open
sshymko opened this issue Mar 22, 2019 · 4 comments
Open

Question: custom dispatch connection close #2445

sshymko opened this issue Mar 22, 2019 · 4 comments

Comments

@sshymko
Copy link
Contributor

sshymko commented Mar 22, 2019

Imagine a custom dispatch function based on an HTTP request message, such as request path, host, or cookies, etc. It would extract a request identifier from the request message and resolve it to a worker ID via the standard formula crc32($requestId) % $server->setting['worker_num'].

Naive implementation of dispatch by URL path:

$dispatcher = function (\Swoole\Server $server, $fd, $type, $data) {
    if (preg_match('/^GET\s+(?<path>[^\s]+)/', $data, $matches)) {
        $requestId = crc32($matches['path']);
    } else {
        $requestId = $fd - 1;
    }
    $workerId = $requestId % $server->setting['worker_num'];
    return $workerId;
};

$server = new \Swoole\Http\Server('127.0.0.1', 8080);
$server->set(['dispatch_func' => $dispatcher]);
$server->on('request', function ($request, $response) use ($server) {
    $response->header('Content-Type', 'text/plain');
    $response->end("Served by worker {$server->worker_id}\n");
});
$server->start();

There's a problem with the straightforward implementation - the dispatcher function is called 2 times for a single request:

  1. Data fetch $type == 10
  2. Connection close $type == 4

Unlike the 1st call, the 2nd call does not have access to the request data, so it cannot perform the same worker ID resolution as the 1st call to dispatch to the same worker process.

Revised implementation caching result of the 1st call:

class UrlPathDispatcher
{
    protected $dispatchMap = [];

    public function __invoke(\Swoole\Server $server, $fd, $type, $data)
    {
        if (isset($this->dispatchMap[$fd])) {
            $workerId = $this->dispatchMap[$fd];
        } else {
            $workerId = $this->dispatch($server, $fd, $type, $data);
            $this->dispatchMap[$fd] = $workerId;
        }
        if ($type == 4/*connection close*/) {
            unset($this->dispatchMap[$fd]);
        }
        return $workerId;
    }

    protected function dispatch(\Swoole\Server $server, $fd, $type, $data)
    {
        if (preg_match('/^GET\s+(?<path>[^\s]+)/', $data, $matches)) {
            $requestId = crc32($matches['path']);
        } else {
            $requestId = $fd - 1;
        }
        $workerId = $requestId % $server->setting['worker_num'];
        return $workerId;
    }
}

$dispatcher = new UrlPathDispatcher();

The implementation has become considerably more complicated and it now requires to maintain the state, so it cannot be a simple function anymore. It would be great to avoid overcomplicating the implementation of dispatchers by freeing them from handling the connection closure.

Conceptual questions:

  1. Do dispatch functions need to resolve all calls of the same request to the same worker ID?
  2. What would happen if the dispatch function returns different worker ID for connection close?
  3. Is the revised implementation above conceptually correct, particularly use of $fd as cache key?
  4. What is the recommended implementation approach?

Thanks in advance for responding to this ticket which is a question rather than an issue.

@sshymko sshymko changed the title Question: custom dispatch function behavior on connection close Question: custom dispatch function closing connection Mar 22, 2019
@sshymko sshymko changed the title Question: custom dispatch function closing connection Question: custom dispatch connection close Mar 22, 2019
@windrunner414
Copy link
Contributor

It is recommended to use the second dispatcher, otherwise problems may occur

@sshymko
Copy link
Contributor Author

sshymko commented Mar 22, 2019

The approach of caching the dispatch results by connection ID $fd demonstrated above does NOT even allow to implement an algorithm as simple as Round-Robin. Multiple requests of the same connection will end up being routed to the same worker ID rather than all workers sequentially. There seems to be nothing else besides $fd that can be used as a cache key.

The best implementation of the Round-Robin I could come up with does not seem reliable when the next request comes in before the previous one finishes and closes the connection:
https://github.com/upscalesoftware/swoole-dispatch/blob/master/src/RoundRobin.php

@sshymko
Copy link
Contributor Author

sshymko commented Mar 24, 2019

The fallback dispatch by the Client ID ($fd - 1) % $server->setting['worker_num'] is emulation of the built-in fixed client mode dispatch_mode = 2. The problem of duplicating the dispatch modes is raised in #2451.

@sshymko
Copy link
Contributor Author

sshymko commented Mar 24, 2019

Hm, multiple requests can be handled through the same persistent connection. So, there will be multiple calls to the dispatch function with request data ($type == 10) before the connection close call ($type == 4). All those requests can be dispatched to different workers, such as via Round-Robin, cannot they?

@twose, @matyhtf,
Does it even matter what worker performs the connection close? It appears that any worker can close the connection correctly. Is the complicated dispatch logic shown above unnecessary?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants