Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[HttpClient][Contracts] introduce component and related contracts #30413

Merged
merged 3 commits into from Mar 7, 2019

Conversation

@nicolas-grekas
Copy link
Member

nicolas-grekas commented Mar 1, 2019

Q A
Branch? master
Bug fix? no
New feature? yes
BC breaks? no
Deprecations? no
Tests pass? yes
Fixed tickets #28628
License MIT
Doc PR -

This PR introduces new HttpClient contracts and
component. It makes no compromises between DX, performance, and design.
Its surface should be very simple to use, while still flexible enough
to cover most advanced use cases thanks to streaming+laziness.

Common existing HTTP clients for PHP rely on PSR-7, which is complex
and orthogonal to the way Symfony is designed. More reasons we need
this in core are the package principles: if we want to be able to keep our
BC+deprecation promises, we have to build on more stable and more
abstract dependencies than Symfony itself. And we need an HTTP client
for e.g. Symfony Mailer or #27738.

The existing state-of-the-art puts a quite high bar in terms of features we must
support if we want any adoption. The code in this PR aims at implementing an
even better HTTP client for PHP than existing ones, with more (useful) features
and a better architecture. What a pitch :)

Two full implementations are provided:

  • NativeHttpClient is based on the native "http" stream wrapper.
    It's the most portable one but relies on a blocking fopen().
  • CurlHttpClient relies on the curl extension. It supports full
    concurrency and HTTP/2, including server push.

Here are some examples that work with both clients.

For simple cases, all the methods on responses are synchronous:

$client = new NativeHttpClient();

$response = $client->get('https://google.com');

$statusCode = $response->getStatusCode();
$headers = $response->getHeaders();
$content = $response->getContent();

By default, clients follow redirects. On 3xx, 4xx or 5xx, the getHeaders() and getContent() methods throw an exception, unless their $throw argument is set to false.
This is part of the "failsafe" design of the component. Another example of this
failsafe property is that broken dechunk or gzip streams always trigger an exception,
unlike most other HTTP clients who can silently ignore the situations.

An array of options allows adjusting the behavior when sending requests.
They are documented in HttpClientInterface.

When several responses are 1) first requested in batch, 2) then accessed
via any of their public methods, requests are done concurrently while
waiting for one.

For more advanced use cases, when streaming is needed:

Streaming the request body is possible via the "body" request option.
Streaming the response content is done via client's stream() method:

$client = new CurlHttpClient();

$response = $client->request('GET', 'http://...');

$output = fopen('output.file', 'w');

foreach ($client->stream($response) as $chunk) {
    fwrite($output, $chunk->getContent());
}

The stream() method also works with multiple responses:

$client = new CurlHttpClient();
$pool = [];

for ($i = 0; $i < 379; ++$i) {
    $uri = "https://http2.akamai.com/demo/tile-$i.png";
    $pool[] = $client->get($uri);
}

$chunks = $client->stream($pool);

foreach ($chunks as $response => $chunk) {
    // $chunk is a ChunkInterface object
    if ($chunk->isLast()) {
        $content = $response->getContent();
    }
}

The stream() method accepts a second $timeout argument: responses that
are inactive for longer than the timeout will emit an empty chunk to signal
it. Providing 0 as timeout allows monitoring responses in a non-blocking way.

Implemented:

  • flexible contracts for HTTP clients
  • fopen() + curl-based clients with close feature parity
  • gzip compression enabled when possible
  • streaming multiple responses concurrently
  • base_uri option for scoped clients
  • progress callback with detailed info and able to cancel the request
  • more flexible options for precise behavior control
  • flexible timeout management allowing e.g. server sent events
  • public key pinning
  • auto proxy configuration via env vars
  • transparent IDN support
  • HttpClient::create() factory
  • extensive error handling, e.g. on broken dechunk/gzip streams
  • time stats, primary_ip and other info inspired from curl_getinfo()
  • transparent HTTP/2-push support with authority validation
  • Psr18Client for integration with libs relying on PSR-18
  • free from memory leaks by avoiding circular references
  • fixed handling of redirects when using the fopen-based client
  • DNS cache pre-population with resolve option

Help wanted (can be done after merge):

  • FrameworkBundle integration: autowireable alias + semantic configuration for default options
  • add TraceableHttpClient and integrate with the profiler
  • logger integration
  • add a mock client

More ideas:

  • record/replay like CsaGuzzleBundle
  • use raw sockets instead of the HTTP stream wrapper
  • cookie_jar option
  • HTTP/HSTS cache
  • using the symfony CLI binary to test ssl-related options, HTTP/2-push, etc.
  • add "auto" mode to the "buffer" option, based on the content-type? or array of content-types to buffer
  • etc.
@gnugat

This comment has been minimized.

Copy link
Contributor

gnugat commented Mar 1, 2019

It seems like a shame for HttpClientInterface not to extend PSR-18 ClientInterface, and go for a Psr18Client adapter instead, but that seems consistent with the rest of the codebase so that sounds fine.

A pity HttpClientInterface defines send(string $method, string $url, iterable $options): ResponseInterface rather than send(RequestInterface $request): ResponseInterface. That design choice seems strange when compared to the HttpKernelInterface approach.

Finally it seems odd to define a new ResponseInterface rather than use the HttpFoundation one, but that's arguable (HttpFoundation might be more Server oriented, defining a new Response allows for an optimal client usage).

Sounds like this component was tailored for Symfony's internal use only (to be used by the Mail component), and is being made available for anyone who would want to use it, but given the choice between this component and Guzzle I think Guzzle would still be the go to solution.

@Taluu
Copy link
Contributor

Taluu left a comment

For the options, why not use the OptionsResolver component ?

], static function ($v) { return null !== $v; }),
'socket' => [
'bindto' => $options['bindto'],
'tcp_nodelay' => true,

This comment has been minimized.

@joelwurtz

joelwurtz Mar 1, 2019

Contributor

IMO this option should be set to false, true is very useful on small request, but on medium to large stream (like http) we want to avoid congestion by default.

This comment has been minimized.

@nicolas-grekas

nicolas-grekas Mar 1, 2019

Author Member

Both fopen and curl already buffer as much as they can before writing to the network. This means the TCP Nagle algorithm is not needed and only degrades performance.

@nicolas-grekas nicolas-grekas force-pushed the nicolas-grekas:contracts-http-stream branch from 5c49831 to 5b07802 Mar 1, 2019

@nicolas-grekas

This comment has been minimized.

Copy link
Member Author

nicolas-grekas commented Mar 1, 2019

@fancyweb thanks, comments addressed.

why not use the OptionsResolver component

@Taluu because that would add a dependency and we prefer keeping the component standalone.

Sounds like this component was tailored for Symfony's internal use only

@gnugat not the case at all. The component is standalone and relies on decoupled interfaces. Your choice to use Guzzle. I'm going to replace it personnaly - in "Symfony apps" or "non-Symfony" ones.

@nicolas-grekas nicolas-grekas force-pushed the nicolas-grekas:contracts-http-stream branch 4 times, most recently from d9622bf to 49fa0b4 Mar 1, 2019

@nicolas-grekas nicolas-grekas force-pushed the nicolas-grekas:contracts-http-stream branch from 3d84105 to 53ebfae Mar 5, 2019

@joelwurtz

This comment has been minimized.

Copy link
Contributor

joelwurtz commented Mar 5, 2019

I'm trying to image how someone would write a decoration around the HttpClientInterface to add some behaviors, so let's start by an example with a simple log decoration system.

A naive approach could looks like:

class HttpClientLogger implements HttpClientInterface
{
   private const LOG_BUFFER_SIZE = 1024; //1kb of body in logs: can be an option but not the point here

    private $logger;

    private $client;

    public function __construct(HttpClientInterface $client, LoggerInterface $logger)
    {
        $this->logger = $logger;
        $this->client = $client;
    }

    public function request(string $method, string $url, array $options = []): ResponseInterface
    {
        $response = $this->client->request($method, $url, $options);

        $this->logger("$method request on $uri", ['response' => [
            'status_code' => $response->getStatusCode(),
            'body' => substr($response->getContent(), 0, self::LOG_BUFFER_SIZE),
        ]]);

        return $response;
    }

    public function stream($responses, float $timeout = null): ResponseStreamInterface
    {
        return $this->client->stream($responses, $timeout);
    }
}

This decorator is valid but breaks a lot of promises given by the HttpClientInterface:

  • It breaks laziness for the ResponseInterface (so no more async)
  • It breaks the promise of giving exception if the user don't use the getStatusCode (but was already discuted before so will not extends on that subject)
  • Also it may break consumer application if he was using non buffered response stream (as it will throw an exception when getting the content)

So let's try to make it work on async by adding a new decorator:

class HttpClientLogger implements HttpClientInterface
{
    public const LOG_BUFFER_SIZE = 1024; //1kb of body in logs: can be an option but not the point here

    private $logger;

    private $client;

    public function __construct(HttpClientInterface $client, LoggerInterface $logger)
    {
        $this->logger = $logger;
        $this->client = $client;
    }

    public function request(string $method, string $url, array $options = []): ResponseInterface
    {
        return $this->client->request($method, $url, $options);
    }

    public function stream($responses, float $timeout = null): ResponseStreamInterface
    {
        $responseStream = $this->client->stream($responses, $timeout);

        return new ResponseStreamLogger($responseStream, $this->logger);
    }
}

class ResponseStreamLogger implements ResponseStreamInterface
{
    private $responseStream;
    private $logger;
    private $responsesBuffer = [];

    public function __construct(ResponseStreamInterface $responseStream, LoggerInterface $logger)
    {
        $this->responseStream = $responseStream;
        $this->logger = $logger;
    }

    public function key(): ResponseInterface
    {
        return $this->responseStream->key();
    }

    public function current(): ChunkInterface
    {
        $chunk = $this->responseStream->current();

        // Buffer body
        $response = $this->key();
        $responseKey = spl_object_hash($response);

        if (!array_key_exists($responseKey, $this->responsesBuffer)) {
            $this->responsesBuffer[$responseKey] = "";
        }

        if (strlen($this->responsesBuffer[$responseKey]) < HttpClientLogger::LOG_BUFFER_SIZE) {
            $this->responsesBuffer[$responseKey] .= $chunk->getContent();
        }

        // Last chunk receive, response should be complete so safe to use some methods here
        if ($chunk->isLast()) {
            $url = $response->getInfo('url');

            $this->logger("Request on $url", ['response' => [
                'status_code' => $response->getStatusCode(),
                'body' => $this->responsesBuffer[$responseKey],
            ]]);
        }

        return $this->responseStream->current();
    }

    public function next(): void
    {
        $this->responseStream->next();
    }

    public function rewind(): void
    {
        $this->responseStream->rewind();
    }

    public function valid(): bool
    {
        return $this->responseStream->valid();
    }
}

Here we introduce 2 decorators

  • One on the response stream that allow logging when response is complete
  • One on the http client so we can decorate the response stream

Async is no longer a problem since we only rely on the stream method and log once a response is complete.

But there are other problems here :

  • We don't have the method nor the original url (if there was a redirection) on the log message, but this can be simply fixed by adding those parameters to the info array of the ResponseInterface (should this be done in a decorator also ?)
  • If someone don't use stream method and only need a sync context, the logger will never be called.

Then the only solution to make it works in all context and having the original url and method is by adding a third decorator on the ResponseInterface:

class HttpClientLogger implements HttpClientInterface
{
    public const LOG_BUFFER_SIZE = 1024; //1kb of body in logs: can be an option but not the point here

    private $logger;

    private $client;

    public function __construct(HttpClientInterface $client, LoggerInterface $logger)
    {
        $this->logger = $logger;
        $this->client = $client;
    }

    public function request(string $method, string $url, array $options = []): ResponseInterface
    {
        return new ResponseLogger($this->client->request($method, $url, $options), $this->logger, $method, $url);
    }

    public function stream($responses, float $timeout = null): ResponseStreamInterface
    {
        $subResponses = [];
        $indexedResponses = [];

        foreach ($responses as $response) {
            if ($response instanceof ResponseLogger) {
                // Make sure we pass the decorated response to the underlying client
                $subResponses[] = $response->getDecoratedResponse();
                $indexedResponses[spl_object_hash($response->getDecoratedResponse())] = $response;
            } else {
                $subResponses[] = $response;
            }
        }

        $responseStream = $this->client->stream($responses, $timeout);

        return new ResponseStreamLogger($responseStream, $this->logger, $indexedResponses);
    }
}

class ResponseLogger implements ResponseInterface
{
    private $response;

    private $logger;

    private $requestMethod;

    private $requestUrl;

    private $hasLogged = false;

    public function __construct(ResponseInterface $response, LoggerInterface $logger, string $requestMethod, string $requestUrl)
    {
        $this->response = $response;
        $this->logger = $logger;
        $this->requestMethod = $requestMethod;
        $this->requestUrl = $requestUrl;
    }

    public function getDecoratedResponse()
    {
        return $this->response;
    }

    public function getStatusCode()
    {
        $responseCode = $this->response->getStatusCode();

        if (!$this->hasLogged) {
            // Body may not be retrieved or not wanted avoid logging it
            $this->logger("$this->requestMethod request on $this->requestUrl", ['response' => [
                'status_code' => $responseCode,
            ]]);

            $this->hasLogged = true;
        }

        return $responseCode;
    }

    public function getHeaders()
    {
        if (!$this->hasLogged) {
            // Body may not be retrieve or not wanted avoid logging it
            // However safe to get status code since it should be before headers or in the headers (for http2)
            $this->logger("$this->requestMethod request on $this->requestUrl", ['response' => [
                'status_code' => $this->response->getStatusCode(),
            ]]);

            $this->hasLogged = true;
        }

        return $this->response->getHeaders();
    }

    public function getContent()
    {
        $content = $this->response->getContent();

        if (!$this->hasLogged) {
            // Safe to get content
            $this->logger("$this->requestMethod request on $this->requestUrl", ['response' => [
                'status_code' => $this->response->getStatusCode(),
                'body' => substr($content, 0, HttpClientLogger::LOG_BUFFER_SIZE),
            ]]);

            $this->hasLogged = true;
        }

        return $content;
    }

    public function getInfo(string $type = null)
    {
        $info = $this->response->getInfo();
        $info['request_method'] = $this->requestMethod;
        $info['request_url'] = $this->requestUrl;

        if (null !== $type) {
            return $info[$type];
        }

        return $info;
    }
}

class ResponseStreamLogger implements ResponseStreamInterface
{
    private $responseStream;
    private $logger;
    private $responsesBuffer = [];
    private $indexedResponses;

    public function __construct(ResponseStreamInterface $responseStream, LoggerInterface $logger, $indexedResponses)
    {
        $this->responseStream = $responseStream;
        $this->logger = $logger;
        $this->indexedResponses = $indexedResponses;
    }

    public function key(): ResponseInterface
    {
        return $this->responseStream->key();
    }

    public function current(): ChunkInterface
    {
        $chunk = $this->responseStream->current();

        // Buffer body
        $response = $this->key();
        $responseKey = spl_object_hash($response);


        if (!array_key_exists($responseKey, $this->responsesBuffer)) {
            $this->responsesBuffer[$responseKey] = "";
        }

        if (strlen($this->responsesBuffer[$responseKey]) < HttpClientLogger::LOG_BUFFER_SIZE) {
            $this->responsesBuffer[$responseKey] .= $chunk->getContent();
        }

        // Everything is good about the reponse so we are safe here as status code should be set
        if ($chunk->isLast()) {
            // Get status code before getting the log response to avoid double log
            $statusCode = $response->getStatusCode();

            // Get the log response from the indexed response
            if (!$response instanceof ResponseInterface) {
                if (!array_key_exists($responseKey, $this->indexedResponses)) {
                    return $chunk;
                }

                $response = $this->indexedResponses[$responseKey];
            }

            $url = $response->getInfo('request_url');
            $method = $response->getInfo('request_method');

            $this->logger("$method request on $url", ['response' => [
                'status_code' => $statusCode,
                'body' => $this->responsesBuffer[$responseKey],
            ]]);
        }

        return $chunk;
    }

    public function next(): void
    {
        return $this->responseStream->next();
    }

    public function rewind(): void
    {
        return $this->responseStream->rewind();
    }

    public function valid(): bool
    {
        return $this->responseStream->valid();
    }
}

Even if the code gets longer, we now have a fully logger that sould work, with a small problem on the logging that may different given the first function used by the consumer when in sync context. (but would call it good enough for a first try)

Also had to add a lot of code to ensure that the correct response is passed to the underlying decorator
on the stream method, and getting back the log response on the decorator. (So we get the correct request method and url in log).

Any middleware / decorator that needs to act on the response will need a system like this and I find this rather complex. It's, however, straightforward when only acting on the request.

Maybe there is a simpler solution by using the on_progress callback or another method (would love to have a example if there is), if not the case this may need a new interface in the future to handle those kind of middleware: like an event system that tell when the response is complete / has headers ready / has status ready or an other system (don't have a real idea ATM), in order to avoid a lot of pitfalls

@nicolas-grekas

This comment has been minimized.

Copy link
Member Author

nicolas-grekas commented Mar 6, 2019

@joelwurtz wow very nice analysis :)

The naive decorator is indeed broken, authors must respect the laziness of responses. If that's on purpose, getting the status code should be done using getInfo('http_code'), but only after getHeaders() has been called. Not a use case we should further optimize to me.

In #30414, you can find a decorator that changes the type of responses: it replaces streaming of chunks by streaming of completed responses (see complete() method on ApiClientInterface.)
There is a wide range of alternative streaming decorators we could build also, e.g. streaming of SSE messages, streaming of JSONs, etc. That's not exactly your topic as such decorators don't implement HttpClientInterface.

About decorators that implement HttpClientInterface:

The 2nd one is broken also, because it doesn't wrap the methods of ResponseInterface as you spotted.

We don't have the method nor the original url (if there was a redirection) on the log message, but this can be simply fixed by adding those parameters to the info array of the ResponseInterface (should this be done in a decorator also ?)

Not sure: there are many trivial ways around. I'd prefer keeping info clean with actual info that one cannot reasonably get on their own (at least for the core info - you can extend it as you wish of course.)

Your 3rd try is the good one. There is one mistake: you should use getInfo('http_code') instead of getStatusCode(). Also, you could log it on $chunk->isFirst().

And I reached the same conclusion as yours!

Any middleware / decorator that needs to act on the response will need a system like this and I find this rather complex.
Maybe there is a simpler solution by using the on_progress callback

💯 : the progress callback would have worked for the need. The only thing that you don't get in the progress callback is the body. Should we pass ?string $chunk to the callback? That'd make it even more powerful.

It's, however, straightforward when only acting on the request.

On purpose: I think that's what most decorators will do; adjust the options of the request.

this may need a new interface in the future to handle those kinds of middleware: like an event system that tell when the response is complete / has headers ready / has status ready or an other system

💯 again! You're right about the difficulty of adding a full decorator. We could ship one in the component and make it hook-able via callbacks and/or some middleware interface.
That'd be a really interesting future step for the component!

@joelwurtz

This comment has been minimized.

Copy link
Contributor

joelwurtz commented Mar 6, 2019

Also, you could log it on $chunk->isFirst().

Depend if we went to log the body or not, but yeah we can log as soon as we have a max body size reached or a last chunk.

the progress callback would have worked for the need. The only thing that you don't get in the progress callback is the body. Should we pass ?string $chunk to the callback? That'd make it even more powerful.

Progress callback would indeed make the case if we had the ?string chunk. However you said that this callback is called at different step, i would imagine that some parameters have special value given the step (like -1 for the end ?). Having a constant in the callback would definitely help like:

function progress(string $step, int $streamedSize, int $totalSize, array $info, ?string $chunk);

public const DNS_RESOLVED_STEP  = 'dns_resolved_step';
public const UPLOAD_PROGRESS_STEP = 'upload_progress_step';
public const HEADER_RECEIVE_STEP = 'header_receive_step';
public const DOWNLOAD_PROGRESS_STEP = 'download_progress_step';
public const RESPONSE_COMPLETE_STEP = 'response_complete_step';

However if we want to also log on error, maybe the progress callback would still not work (since we would have no log on dns resolution failure, as progress callback will never be called)

And it was only an exemple on a logger. Another middleware may not work as expected with the progress callback, i would imagine a Retry middleware, which goals is to retry request when the remote server failed, to not be feasible using the on_progress callback (or would be very hacky). So yes definitely need another interface for middleware.

@nicolas-grekas

This comment has been minimized.

Copy link
Member Author

nicolas-grekas commented Mar 6, 2019

I'm super happy to tell you that I found a way to remove the state change when calling getStatusCode(). I "just" had to replace it by an argument to getHeaders/getContent. So here we are, their signature is now: getHeaders(bool $throw = true)/getContent(bool $throw = true). See last commit for the patch. No need for special gymnastic anymore!

@nicolas-grekas

This comment has been minimized.

Copy link
Member Author

nicolas-grekas commented Mar 6, 2019

@joelwurtz, unfortunately, there is no way to implement the steps you describe: e.g. curl has no concept like that and it would be extremely hard to recreate it. fopen also is missing a lot of steps but I reimplemented the major missing ones in fact.

@weaverryan
Copy link
Member

weaverryan left a comment

After playing with this for more than a day now, it's a 👍 from me! The experience is wonderful and now that we've solved a few issues (like removing the getStatusCode() mutation), the "lazy" responses also yield an excellent experience when you just want to make one, sync request.

A few notes:

A) Responses are lazy. It will be important to document/teach that. In practice, it simply means that if you want to wrap your code in a try-catch, you should wrap all of your logic that deals with the response, not just the $client->response() call.

B) Validation on the options is awesome and the options are reasonable easy to find because the defaults are stored on a constant that is documented above every request() method

C) If want to do multiple requests in parallel, that works really well. And by leveraging the user_data option, you can basically "label" each request so that you can handle the responses appropriately.

D) Streaming a single large file also works well - I streamed a large file directly to a local file without any isuses or memory bump.

E) About decoration - @joelwurtz did an awesome job challenging this. For basic cases like logging, the on_progress option can be leveraged to log the status code from the $info that's passed to that callback. More complex decorators are possible - they may not be super easy, due to the async of the responses, but they're possible. And if you're building a decorator for your app and you don't care about async, decorators are super easy.

So, I think this is ready!

@arnaud-lb

This comment has been minimized.

Copy link
Contributor

arnaud-lb commented Mar 7, 2019

Since I came here to say what I disliked (and what I liked as well), I also need to say it when those things have changed or if I changed my mind.

The component actually does a great job at hiding an asynchronous internal and exposing
a synchronous-friendly API.

With the removal of ChunkInterface::__toString() and the getStatusCode() mutation, all the easy traps I could think of have disappeared.

The fact that other requests are running when waiting for a single request (even when not explicitly multiplexing with HttpClientInterface::stream()) is something that I initially overlooked (read too fast, or this was not in the initial PR description, I don't know). With this, having lazy responses completely make sense. I could think of way to avoid them, but not without making the API more complex, or requiring more steps.

@fabpot

fabpot approved these changes Mar 7, 2019

@nicolas-grekas nicolas-grekas force-pushed the nicolas-grekas:contracts-http-stream branch from 4c6539e to f64f2ab Mar 7, 2019

@nicolas-grekas nicolas-grekas force-pushed the nicolas-grekas:contracts-http-stream branch from f64f2ab to fc83120 Mar 7, 2019

@fabpot

This comment has been minimized.

Copy link
Member

fabpot commented Mar 7, 2019

Thank you @nicolas-grekas.

@fabpot fabpot merged commit fc83120 into symfony:master Mar 7, 2019

3 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
fabbot.io Your code looks good.
Details

fabpot added a commit that referenced this pull request Mar 7, 2019

feature #30413 [HttpClient][Contracts] introduce component and relate…
…d contracts (nicolas-grekas)

This PR was squashed before being merged into the 4.3-dev branch (closes #30413).

Discussion
----------

[HttpClient][Contracts] introduce component and related contracts

| Q             | A
| ------------- | ---
| Branch?       | master
| Bug fix?      | no
| New feature?  | yes
| BC breaks?    | no
| Deprecations? | no
| Tests pass?   | yes
| Fixed tickets | #28628
| License       | MIT
| Doc PR        | -

This PR introduces new `HttpClient` contracts and
component. It makes no compromises between DX, performance, and design.
Its surface should be very simple to use, while still flexible enough
to cover most advanced use cases thanks to streaming+laziness.

Common existing HTTP clients for PHP rely on PSR-7, which is complex
and orthogonal to the way Symfony is designed. More reasons we need
this in core are the [package principles](https://en.wikipedia.org/wiki/Package_principles): if we want to be able to keep our
BC+deprecation promises, we have to build on more stable and more
abstract dependencies than Symfony itself. And we need an HTTP client
for e.g. Symfony Mailer or #27738.

The existing state-of-the-art puts a quite high bar in terms of features we must
support if we want any adoption. The code in this PR aims at implementing an
even better HTTP client for PHP than existing ones, with more (useful) features
and a better architecture. What a pitch :)

Two full implementations are provided:
 - `NativeHttpClient` is based on the native "http" stream wrapper.
   It's the most portable one but relies on a blocking `fopen()`.
 - `CurlHttpClient` relies on the curl extension. It supports full
   concurrency and HTTP/2, including server push.

Here are some examples that work with both clients.

For simple cases, all the methods on responses are synchronous:

```php
$client = new NativeHttpClient();

$response = $client->get('https://google.com');

$statusCode = $response->getStatusCode();
$headers = $response->getHeaders();
$content = $response->getContent();
```

By default, clients follow redirects. On `3xx`, `4xx` or `5xx`, the `getHeaders()` and `getContent()` methods throw an exception, unless their `$throw` argument is set to `false`.
This is part of the "failsafe" design of the component. Another example of this
failsafe property is that broken dechunk or gzip streams always trigger an exception,
unlike most other HTTP clients who can silently ignore the situations.

An array of options allows adjusting the behavior when sending requests.
They are documented in `HttpClientInterface`.

When several responses are 1) first requested in batch, 2) then accessed
via any of their public methods, requests are done concurrently while
waiting for one.

For more advanced use cases, when streaming is needed:

Streaming the request body is possible via the "body" request option.
Streaming the response content is done via client's `stream()` method:

```php
$client = new CurlHttpClient();

$response = $client->request('GET', 'http://...');

$output = fopen('output.file', 'w');

foreach ($client->stream($response) as $chunk) {
    fwrite($output, $chunk->getContent());
}
```

The `stream()` method also works with multiple responses:

```php
$client = new CurlHttpClient();
$pool = [];

for ($i = 0; $i < 379; ++$i) {
    $uri = "https://http2.akamai.com/demo/tile-$i.png";
    $pool[] = $client->get($uri);
}

$chunks = $client->stream($pool);

foreach ($chunks as $response => $chunk) {
    // $chunk is a ChunkInterface object
    if ($chunk->isLast()) {
        $content = $response->getContent();
    }
}
```

The `stream()` method accepts a second `$timeout` argument: responses that
are *inactive* for longer than the timeout will emit an empty chunk to signal
it. Providing `0` as timeout allows monitoring responses in a non-blocking way.

Implemented:
 - flexible contracts for HTTP clients
 - `fopen()` + `curl`-based clients with close feature parity
 - gzip compression enabled when possible
 - streaming multiple responses concurrently
 - `base_uri` option for scoped clients
 - progress callback with detailed info and able to cancel the request
 - more flexible options for precise behavior control
 - flexible timeout management allowing e.g. server sent events
 - public key pinning
 - auto proxy configuration via env vars
 - transparent IDN support
 - `HttpClient::create()` factory
 - extensive error handling, e.g. on broken dechunk/gzip streams
 - time stats, primary_ip and other info inspired from `curl_getinfo()`
 - transparent HTTP/2-push support with authority validation
 - `Psr18Client` for integration with libs relying on PSR-18
 - free from memory leaks by avoiding circular references
 - fixed handling of redirects when using the `fopen`-based client
 - DNS cache pre-population with `resolve` option

Help wanted (can be done after merge):
 - `FrameworkBundle` integration: autowireable alias + semantic configuration for default options
 - add `TraceableHttpClient` and integrate with the profiler
 - logger integration
 - add a mock client

More ideas:
 - record/replay like CsaGuzzleBundle
 - use raw sockets instead of the HTTP stream wrapper
 - `cookie_jar` option
 - HTTP/HSTS cache
 - using the symfony CLI binary to test ssl-related options, HTTP/2-push, etc.
 - add "auto" mode to the "buffer" option, based on the content-type? or array of content-types to buffer
 - *etc.*

Commits
-------

fc83120 [HttpClient] Add Psr18Client - aka a PSR-18 adapter
8610668 [HttpClient] introduce the component
d2d63a2 [Contracts] introduce HttpClient contracts

@nicolas-grekas nicolas-grekas deleted the nicolas-grekas:contracts-http-stream branch Mar 7, 2019

@nicolas-grekas

This comment has been minimized.

Copy link
Member Author

nicolas-grekas commented Mar 8, 2019

Thank you @fabpot and thank you to everyone involved. The discussion has been sometimes passionate, but I'm overall highly impressed by the technical level and quality of the reviews. This component is already the work of the community and not mine anymore. Thank you all!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.