
[HttpClient] Never process more than 300 requests at the same time #38690

Closed
nicolas-grekas wants to merge 1 commit into symfony:5.x from nicolas-grekas:hc-queue

Conversation

@nicolas-grekas (Member) commented Oct 23, 2020

| Q             | A
| ------------- | ---
| Branch?       | 5.x
| Bug fix?      | no
| New feature?  | no
| Deprecations? | no
| Tickets       | -
| License       | MIT
| Doc PR        | -

Almost one year ago, I wrote this patch to put curl handles on an internal queue when more than 300 requests are run concurrently.

I have a reproduction script from @Seldaek that mirrors packagist instances, making roughly a million requests in the process.

The script can chunk the requests into batches, e.g. completing 500 requests, then 500 more, and so on until the end.

Batching is required because curl (or the network) otherwise hangs when concurrency goes high (>5k).

I wrote this patch hoping to remove the need for batching requests in smaller chunks. It works, but at some point it still hangs.

I'm therefore submitting this patch for the record, as it would need more work to be made bulletproof in that situation. Since batching works well enough, this might not be needed after all.
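
For context, the queueing idea can be sketched at the raw curl_multi level like this (an illustration of the concept only, not the actual patch; all names are made up):

<?php

// Keep at most 300 curl handles registered with the multi handle, and
// park any extra handles in a queue until a slot frees up.
const MAX_CONCURRENCY = 300;

function startRequest($mh, $ch, \SplQueue $queue, int &$active): void
{
    if ($active >= MAX_CONCURRENCY) {
        $queue->enqueue($ch); // too many in flight: defer this handle

        return;
    }
    curl_multi_add_handle($mh, $ch);
    ++$active;
}

function onRequestFinished($mh, $ch, \SplQueue $queue, int &$active): void
{
    curl_multi_remove_handle($mh, $ch);
    --$active;

    if (!$queue->isEmpty()) { // a slot freed up: start a parked handle
        curl_multi_add_handle($mh, $queue->dequeue());
        ++$active;
    }
}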

/cc @fancyweb @jderusse FYI since you're interested in high throughput uses of HttpClient.

Note that AmpHttpClient isn't able to handle the load here either. /cc @kelunik

@nicolas-grekas (Member, Author)

Also for the record, here is the script I use to run the requests.
It is extracted from https://github.com/composer/mirror
I run it from the root of the symfony/symfony repository.
Check the XXX notes in the source to adjust the load of the test.
Make sure you have a good network connection to really see what's going on :)

<?php

use Symfony\Component\HttpClient\AmpHttpClient;
use Symfony\Component\HttpClient\CurlHttpClient;
use Symfony\Component\HttpClient\NativeHttpClient;
use Symfony\Component\HttpClient\Exception\TransportException;

ini_set('memory_limit', '3G');
ini_set('display_errors', true);
ini_set('error_reporting', -1);
ini_set('date.timezone', 'UTC');

require __DIR__ . '/vendor/autoload.php';

set_error_handler(function ($errno, $errstr, $errfile, $errline) {
    throw new \ErrorException($errstr, $errno, E_ERROR, $errfile, $errline);
});

class Mirror {
    private $client;
    private $url = 'https://repo.packagist.org';
    private $hostname = 'repo.packagist.org';

    public function sync()
    {
#        $this->client = new AmpHttpClient();
        $this->client = new CurlHttpClient();
#        $this->client = new NativeHttpClient();

        $rootResp = $this->download('/packages.json');

        //if ($rootResp->getHeaders()['content-encoding'][0] !== 'gzip') {
        //    throw new \Exception('Expected gzip encoded responses, something is off');
        //}
        $rootData = $rootResp->getContent();

        $rootJson = json_decode($rootData, true);
        if (null === $rootJson) {
            throw new \Exception('Invalid JSON received for file /packages.json: '.$rootData);
        }

        $requests = [];
        $listingsToWrite = [];
        foreach ($rootJson['provider-includes'] as $listing => $opts) {
            $listing = str_replace('%hash%', $opts['sha256'], $listing);
            $listingResp = $this->download('/'.$listing);

            $listingData = $listingResp->getContent();
            if (hash('sha256', $listingData) !== $opts['sha256']) {
                throw new \Exception('Invalid hash received for file /'.$listing);
            }

            $listingJson = json_decode($listingData, true);
            if (null === $listingJson) {
                throw new \Exception('Invalid JSON received for file /'.$listing.': '.$listingData);
            }

            foreach ($listingJson['providers'] as $pkg => $opts) {
                $provPath = '/p/'.$pkg.'$'.$opts['sha256'].'.json';
                $provAltPath = '/p/'.$pkg.'.json';

                $provPathV2 = '/p2/'.$pkg.'.json';
                $provPathV2Dev = '/p2/'.$pkg.'~dev.json';

                $userData = [$provPath, $provAltPath, $opts['sha256']];
                $requests[] = ['GET', $this->url.$provPath, ['user_data' => $userData]];

                $headers = rand(0,1) === 0 ? ['If-Modified-Since' => gmdate('D, d M Y H:i:s T')] : [];
                $userData = [$provPathV2, null, null];
                $requests[] = ['GET', $this->url.$provPathV2, ['user_data' => $userData, 'headers' => $headers]];

                $headers = rand(0,1) === 0 ? ['If-Modified-Since' => gmdate('D, d M Y H:i:s T')] : [];
                $userData = [$provPathV2Dev, null, null];
                $requests[] = ['GET', $this->url.$provPathV2Dev, ['user_data' => $userData, 'headers' => $headers]];

                if (\count($requests) > 10000) { #XXX change here the max num of requests to do
                    break 2;
                }
            }

            $listingsToWrite['/'.$listing] = [$listingData, strtotime($listingResp->getHeaders()['last-modified'][0])];
        }

        $hasFailedRequests = false;
        echo count($requests), "\n";

        while ($requests && !$hasFailedRequests) {
            $responses = [];
            foreach (array_splice($requests, 0, 1000) as $i => $req) { #XXX change here the max num of requests per batch
                $responses[] = $this->client->request(...$req);
            }

            $i = 0;
            foreach ($this->client->stream($responses) as $response => $chunk) {
                try {
                    // TODO isLast here can probably be dropped after we upgrade to symfony 4.3.5 with https://github.com/symfony/symfony/pull/33643
                    if ($chunk->isFirst()/* || $chunk->isLast()*/) {
                        if ($response->getStatusCode() === 304) {
                            echo '3';
                            $response->cancel();
                            continue;
                        }

                        if ($response->getStatusCode() === 404) {
                            $userData = $response->getInfo('user_data');

                            // provider v2
                            if (null === $userData[1]) {
                                echo '4';
                                $response->cancel();
                                continue;
                            }
                        }
                    }

                    if ($chunk->isLast()) {
                        echo '.';
                        $userData = $response->getInfo('user_data');

                        if (null === $userData[1]) { // provider v2
                            $providerData = $response->getContent();
                            if (null === json_decode($providerData, true)) {
                                // abort all responses to avoid triggering any other exception, then throw
                                array_map(function ($r) { $r->cancel(); }, $responses);
                                echo 'X1';
                                dump($response);
                                dump(\strlen($providerData));
                                throw new \Exception('Invalid JSON received for file '.$userData[0]);
                            }

                        } else { // provider v1
                            $providerData = $response->getContent();
                            if (null === json_decode($providerData, true)) {
                                // abort all responses to avoid triggering any other exception, then throw
                                array_map(function ($r) { $r->cancel(); }, $responses);
                                echo 'X2';
                                throw new \Exception('Invalid JSON received for file '.$userData[0]);
                            }
                            if (hash('sha256', $providerData) !== $userData[2]) {
                                // abort all responses to avoid triggering any other exception, then throw
                                array_map(function ($r) { $r->cancel(); }, $responses);
                                echo 'X3';
                                throw new \Exception('Invalid hash received for file '.$userData[0]);
                            }
                        }
                    }
                } catch (\Throwable $e) {
                    dump($e->getMessage());
                }
            }
        }

        if ($hasFailedRequests) {
            return false;
        }

        return true;
    }

    private function download($file)
    {
        try {
            $resp = $this->client->request('GET', $this->url.$file);
            // trigger throws if needed
            $resp->getContent();
        } catch (TransportException $e) {
            // retry once
            usleep(10000);
            $resp = $this->client->request('GET', $this->url.$file);
            // trigger throws if needed
            $resp->getContent();
        }

        if ($resp->getStatusCode() >= 300) {
            throw new \RuntimeException('Failed to fetch '.$file.' => '.$resp->getStatusCode().' '.$resp->getContent());
        }

        return $resp;
    }
}

try {
    $mirror = new Mirror();
    echo ($mirror->sync() ? 'SUCCESS' : 'FAIL').PHP_EOL;
} catch (\Throwable $e) {
    echo 'Failed at '.date('Y-m-d H:i:s');
    throw $e;
}

@nicolas-grekas (Member, Author)

I'm closing as explained. Feel free to take over in any way.

nicolas-grekas deleted the hc-queue branch October 23, 2020 13:46
@kelunik (Contributor) commented Oct 23, 2020

@nicolas-grekas I can't reproduce the hanging with Curl locally; maybe I need a better connection? I've increased the request numbers, but still no hanging.

As I don't want to make too many requests, I have replaced the inner concurrent requests with https://localhost:1338 (using HTTP/2; otherwise the default limit of 6 connections per host applies), but there's no hanging there either. With AmpHttpClient I could reproduce some hanging, caused by amphp/http-server detecting the requests as a flooding attack, but AmpHttpClient also makes many more concurrent requests than CurlHttpClient. Do you know where that limit is configured for Curl?

@Seldaek (Member) commented Oct 24, 2020

@kelunik if you hit localhost you get unlimited bandwidth, which might not result in hanging in the same pattern, I guess. Feel free to hit packagist with the original script; especially during the weekend we have lots of bandwidth to spare.

@kelunik (Contributor) commented Oct 24, 2020

@Seldaek Thanks, that's good to hear. I can reproduce the hanging with Curl. Amp runs into an out-of-memory error, but seems to run "fine" if I increase the memory limit to 5G. I've increased the request count and batch size to 100k. It's interesting that both Curl and Amp are limited by CPU time at that point.

Amp spends 22.5% in the GC and around 10% in URL parsing.

@kelunik (Contributor) commented Oct 25, 2020

@Seldaek @nicolas-grekas Thanks for the ping, there's some strange server behavior that uncovered a couple of bugs in amphp/http-client. I have released v4.5.1 to fix these.

Here's what happens: the server closes the HTTP/2 connection after 1000 requests with a GOAWAY frame. Amp immediately closed the connection in that case, while it should have shut it down gracefully, accepting no new requests on the connection but still awaiting pending responses.

However, the server doesn't always send a GOAWAY frame after 1000 requests. It sometimes keeps the connection open but stops responding. Amp didn't detect such behavior and retried on the same connection after timeouts. It now pings on transfer timeouts and shuts the connection down if the PONG takes too long. This results in some hanging until the timeout is reached and a new connection is established, but I guess that's acceptable for such server behavior.

After many requests, the script gets slower. One thing I noticed is that the server then closes the connection after about 600 requests, which might be some flood detection / prevention.

I've run my tests with the script ported to use amphp/http-client directly, to see the exact exceptions, and with a concurrency limit of 64 instead of batching, which reduces memory usage a lot. If requests are batched, the request objects of a whole batch have to be kept in memory, instead of only the array of requests still to make.
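
The difference can also be sketched with symfony/http-client: instead of batching, keep a fixed-size window of in-flight responses and refill it as responses complete. A rough sketch (error handling omitted), assuming $client and a prepared $requests array of [method, url, options] tuples as in the script above:

$window = 64;     // max in-flight requests; only these objects live in memory
$responses = [];

while ($requests || $responses) {
    // top up the window from the cheap array of pending request tuples
    while ($requests && \count($responses) < $window) {
        $responses[] = $client->request(...array_shift($requests));
    }

    foreach ($client->stream($responses) as $response => $chunk) {
        if ($chunk->isLast()) {
            // drop the finished response, then go refill the window
            $responses = array_filter($responses, fn ($r) => $r !== $response);
            break;
        }
    }
}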

Your script based on symfony/http-client still runs into timeouts for me; might that be because you're not using the transfer timeout option? I know that the inactivity timeout is disabled by Symfony. This doesn't trigger the timeout detection then, resulting in timed-out responses potentially reusing the same connection again.

@nicolas-grekas (Member, Author)

Thanks for investigating; I'm happy this helped identify some bugs.

This doesn't trigger the timeout detection then, resulting in timed-out responses potentially reusing the same connection again.

The issue with setInactivityTimeout() is that it assumes the timeout is bound to a request. It's not in Symfony's HttpClient: when streaming, the consumer can decide how long to wait between chunks, and that time can be shorter or longer than the timeout set when making the request.

We could call setInactivityTimeout() with the timeout option. But that would mean that, when streaming, people wouldn't be able to wait longer than this timeout, while today they can if they want to.
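
To illustrate the point with symfony/http-client's API: the idle timeout is chosen per stream() call by the consumer, and idleness surfaces as a timeout chunk rather than an error. A minimal sketch, assuming $client and $responses already exist:

foreach ($client->stream($responses, 5.0) as $response => $chunk) {
    if ($chunk->isTimeout()) {
        // nothing arrived within these 5 seconds; the consumer is free to
        // keep waiting, do other work first, or cancel - nothing is torn down
        continue;
    }
    if ($chunk->isLast()) {
        // the response completed within the consumer-chosen idle windows
    }
}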

What we could do is always reset the h2 stream when any timeout is raised. How can we do that from the Symfony side?

@kelunik (Contributor) commented Oct 28, 2020

@nicolas-grekas This is a good call. I think the inactivity timeout shouldn't apply if the client is busy, but rather only if the receive buffer is empty and no new data is received.

fabpot added a commit that referenced this pull request Nov 19, 2020
[HttpClient] don't fallback to HTTP/1.1 when HTTP/2 streams break (nicolas-grekas)

This PR was merged into the 4.4 branch.

Discussion
----------

[HttpClient] don't fallback to HTTP/1.1 when HTTP/2 streams break

| Q             | A
| ------------- | ---
| Branch?       | 4.4
| Bug fix?      | yes
| New feature?  | no
| Deprecations? | no
| Tickets       | Fix composer/composer#9481
| License       | MIT
| Doc PR        | -

With this change, I don't reproduce the failures that I describe in composer/composer#9481 when running the script in #38690

Apparently curl has an issue when both h1.1 and h2 connections are open to the same host.

Instead of switching to HTTP/1.1 when retrying requests that failed because of an HTTP/2 stream error, I propose to close the http/2 connection when issuing a retry.

With this change, running the mirroring script of packagist works like a charm.

No need to investigate your mirrors @Seldaek, this was definitely a data corruption issue.
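
For illustration, the idea corresponds at the raw curl level to retrying on a guaranteed-fresh connection while staying on HTTP/2, along these lines (a standalone sketch, not the actual CurlHttpClient code; the URL is just an example):

$ch = curl_init('https://repo.packagist.org/packages.json');
curl_setopt($ch, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_2_0); // keep h2, don't downgrade
curl_setopt($ch, CURLOPT_FRESH_CONNECT, true); // force a new connection for the retry
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$body = curl_exec($ch);
curl_close($ch);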

Commits
-------

0c92bc5 [HttpClient] don't fallback to HTTP/1.1 when HTTP/2 streams break
@kelunik (Contributor) commented Nov 22, 2020

@nicolas-grekas Thanks again for the ping. It turns out there were a couple more bugs:

  • The client didn't count discarded DATA frames against the stream window, potentially resulting in hanging connections.
  • There was a potential for HPACK state synchronization issues with async event listeners.
  • Frames received after a GOAWAY frame weren't processed, because the parser stopped after parsing a single GOAWAY frame.

These have now been fixed in the latest releases.

@kelunik (Contributor) commented Nov 29, 2020

@nicolas-grekas Another week, another update:

  • The inactivity timeout now applies on HTTP/2 as it should, as of v4.5.4.
  • Reliable shutdown was broken with Nginx as well; that was fixed on Nginx's side in July 2020.

@nicolas-grekas (Member, Author)

Perfect, thanks for the update; this apparently triggered many improvements :)
