using GuzzleHttp\Pool doesn't free up memory after request is fulfilled #1407

Open
drsect0r opened this Issue Feb 19, 2016 · 20 comments

drsect0r commented Feb 19, 2016

I need to download a lot of remote packages. I am using the Pool function according to the documentation here: http://docs.guzzlephp.org/en/latest/quickstart.html#making-a-request

I'm not sure how to free up the used memory. The packages are all small, but there are several tens of thousands of them. How can I properly unset/free $response? Neither unset() nor $response->getBody()->close() seems to do what I want.


use GuzzleHttp\Pool;
use GuzzleHttp\Client;
use GuzzleHttp\Psr7\Request;

require __DIR__ . '/vendor/autoload.php';

$client = new Client(['base_uri' => '...']);

$requests = function ()
{
    ...

    yield $index => new Request('GET', ...);
};

$pool = new Pool($client, $requests(), [
    'concurrency' => 50,
    'fulfilled' => function ($response, $index)
    {
        $content = $response->getBody()
                            ->getContents();

        file_put_contents('storage/' . $index, $content);

        print 'fulfilled index:' . $index . PHP_EOL;
    },
    'rejected' => function ($reason, $index)
    {
        print 'rejected index:' . $index . PHP_EOL;
    },
]);

$promise = $pool->promise();

$promise->wait();
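
One approach worth noting here: Guzzle's sink request option lets cURL write each response body straight to a file, so PHP never has to buffer it, and the Pool also accepts callables that return promises, which is how per-request options can be attached. A rough sketch along those lines (the $packageUris list is a placeholder for however the package list is actually built):

use GuzzleHttp\Pool;
use GuzzleHttp\Client;

require __DIR__ . '/vendor/autoload.php';

$client = new Client(['base_uri' => '...']);

// Placeholder: however the package URIs are produced, keyed by index.
$packageUris = [/* $index => $uri, ... */];

$requests = function () use ($client, $packageUris) {
    foreach ($packageUris as $index => $uri) {
        // Yield a callable instead of a Request so this transfer carries its
        // own 'sink' option: cURL writes the body straight to the file and
        // PHP never holds it in memory.
        yield $index => function () use ($client, $uri, $index) {
            return $client->getAsync($uri, ['sink' => 'storage/' . $index]);
        };
    }
};

$pool = new Pool($client, $requests(), [
    'concurrency' => 50,
    'fulfilled' => function ($response, $index) {
        print 'fulfilled index:' . $index . PHP_EOL;
    },
    'rejected' => function ($reason, $index) {
        print 'rejected index:' . $index . PHP_EOL;
    },
]);

$pool->promise()->wait();
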
mtdowling (Member) commented Mar 1, 2016

Can you ensure that garbage collection is enabled? Perhaps in your iterator, you can call gc_collect_cycles().
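
A minimal illustration of that suggestion, assuming the generator loops over some list of URIs ($uris here is a placeholder):

// Placeholder list of request URIs keyed by index.
$uris = [/* $index => $uri, ... */];

// Make sure the cycle collector is enabled, then trigger it periodically
// from inside the generator that feeds the Pool.
gc_enable();

$requests = function () use ($uris) {
    $count = 0;
    foreach ($uris as $index => $uri) {
        if (++$count % 100 === 0) {
            gc_collect_cycles(); // frees objects kept alive only by reference cycles
        }
        yield $index => new \GuzzleHttp\Psr7\Request('GET', $uri);
    }
};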

drsect0r commented Mar 7, 2016

@mtdowling Thanks for the suggestion! However, I don't see any noticeable drop in memory usage. I call gc_collect_cycles() after each request has finished.

mtdowling (Member) commented Mar 7, 2016

Hm, in that case we'll need more info. Can you provide a complete reproducible code sample, your PHP version, and your curl version?

mtdowling (Member) commented Mar 7, 2016

I was unable to reproduce this behavior on PHP 7.0.3. Here's an example I put together to try to reproduce (first run make start-server from a guzzle git checkout):

<?php

require 'vendor/autoload.php';

use GuzzleHttp\Pool;
use GuzzleHttp\Client;
use GuzzleHttp\Psr7\Request;
use GuzzleHttp\Tests\Server;

$client = new Client();
$perfUrl = Server::$url . 'guzzle-server/perf';
echo $perfUrl . "\n";

$requests = function () use ($perfUrl) {
    for ($i = 0; $i < 10000; $i++) {
        yield $i => new Request('GET', $perfUrl);
    }
};

$t = microtime(true);

$pool = new Pool($client, $requests(), [
    'concurrency' => 50,
    'fulfilled' => function ($response, $index) {
        $content = $response->getBody()->getContents();
        file_put_contents('/tmp/guzzle-test-' . $index, $content);
        echo memory_get_usage(true) . "\n";
    },
    'rejected' => function ($reason, $index) {
        print 'rejected index:' . $index . PHP_EOL;
    },
]);

$promise = $pool->promise();
$promise->wait();

echo (microtime(true) - $t) . "\n";
echo memory_get_usage(true) . "\n";

If you run the above code example, you'll see the memory footprint remain constant.

  1. When you say that the pool does not free up memory, how much memory are you talking about?
  2. Does the amount of memory rise with each iteration, or does it remain constant?
  3. In your generator example, you have a placeholder .... What's the actual code that was omitted?
  4. Can you provide suggestions on how I can edit my code example to reproduce the error you are describing?
drsect0r commented Mar 8, 2016

@mtdowling

  1. The familiar "Allowed memory size of ... bytes exhausted (tried to allocate ... bytes)". memory_limit is set to 1 GB, and I can see in htop that memory climbs rapidly.
  2. It remains constant for anywhere between 2 and 35 iterations, then steps up.
  3. I'm parsing about 100k JSON files, conducting sanity checks, and yielding requests based on the JSON data.
  4. I'm not sure. I have the impression that my code consumes more memory as more packages are downloaded. After a request is finished, is the request destroyed? (And the downloaded content?)

At the moment, I'm unable to provide a code example. I did put memory_get_usage() in my current code, with the following results:

php -q test.php
4194304
...
6291456
...
8388608
...
10485760
...
12582912
...
14680064
...
16777216
...
18874368
...
20971520
...
23068672
...
25165824
...
27262976
cat output.txt | uniq -c
      1 4194304
      3 6291456
      3 8388608
      6 10485760
      2 12582912
      9 14680064
      9 16777216
     21 18874368
      6 20971520
     32 23068672
     16 25165824
     35 27262976
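
An aside on reading these figures: memory_get_usage(true) reports memory reserved from the OS, which PHP 7 appears to grab in 2 MB chunks (hence the uniform steps above), so it understates per-request growth. The default memory_get_usage() reports the bytes actually in use and gives a finer-grained trace:

// Finer-grained than memory_get_usage(true): bytes currently in use by the
// script rather than whole chunks reserved from the OS.
echo memory_get_usage() . PHP_EOL;
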
mtdowling (Member) commented Mar 9, 2016

Interesting. I'll need a reproducible example in order to investigate this further.

vallieres commented Apr 13, 2016

I have a similar problem. I'm using Guzzle to loop through a product inventory (between 100 and 20,000 items) via API calls (limited to 100 items at a time). I only do GET requests; I quickly take the response as a string, parse it, and dump it to a file. However, peak memory usage keeps increasing.

I also tried using the request sink option (in case php://temp was keeping the request in memory), but total memory used easily reaches 25 MB after just 2-3 calls. I tried running the script with 20,000 items and it went past 540 MB of memory.

Is there a way to release the resources used by the client after a successful GET request?

mtdowling (Member) commented Apr 13, 2016

What version of cURL are you using? Perhaps you're affected by the same cURL memory leak as this user: aws/aws-sdk-php#957 (comment)
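
For anyone checking locally, the cURL version PHP is actually linked against can be read from curl_version(); a quick one-liner (assuming the CLI build uses the same curl extension as your web SAPI):

php -r 'echo "PHP " . PHP_VERSION . ", cURL " . curl_version()["version"] . PHP_EOL;'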

vallieres commented Apr 13, 2016

PHP 5.6.20
curl 7.43.0

I do not seem to have the same version.

drsect0r commented Apr 13, 2016

cURL 7.43.0
PHP 7.0.5

I also don't have the same version as in that post, but I do have the same version of cURL as @vallieres.

keltik85 commented Mar 16, 2017

I also have the same problem.

I initialize the requests iterator like this (in total it should 'yield' 40000+ Request objects, but it stops at approximately 8000):

$requests = function () {
    $result = $this->my_database->get_some_result();
    $start_time = microtime(true);
    $counter = 0;
    $result_num_rows = $result->num_rows;

    while ($row = $result->fetch_object()) {
        $counter++;
        if ($counter % 25 == 0) {
            $elapsed_time = microtime(true) - $start_time;
            // Progress report: memory in MB and estimated total runtime in hours.
            echo "Memory used after '" . $counter . "' requests: " .
                 (memory_get_usage() * 0.000001) . " MB, approximated runtime: " .
                 (((($result_num_rows / $counter) * $elapsed_time) / 60) / 60) . " h\n";
        }
        //echo "A Request was inited for id = " . $row->id . "\n";
        yield new Request('GET', $some_url . $row->id);
    }
};

And the pool like this:

$pool = new Pool($client, $requests(), [
    'concurrency' => 50,
    'fulfilled' => function ($response, $index) {
        // do nothing
    },
    'rejected' => function ($reason, $index) {
        // do nothing
    },
]);

// Initiate the transfers and create a promise
$promise = $pool->promise();

// Force the pool of requests to complete.
$promise->wait();

PMassoels commented Jul 14, 2017

I seem to have the same problem: on every call I make, memory increases. My script extracts 20,000 values from an Excel file and uploads them with the API. In the end everything is very slow, because every cycle adds 0.15 MB.

istrof commented Aug 18, 2017

Hey guys, I was experiencing the same issue. Every Guzzle request was adding 0.07-0.2 MB, so 10K requests were causing a memory crash.

Then I realized Guzzle was in dev mode, because Symfony commands run in dev mode by default. Adding "--env=prod" to the command line reduced 1 GB of consumed memory down to 50 MB :)

keltik85 commented Aug 18, 2017

Adding "--env=prod" to the command line reduced 1GB of consumed memory down to 50MB :)

If it's not in the docs, wiki, or readme, it should be, in bold and italics, or even with an example.

vallieres commented Aug 21, 2017

@keltik85 Is there any way to set the env when simply running it as a PHP script triggered from a webpage? I don't use Symfony.

alexeyshockov (Contributor) commented Aug 21, 2017

@vallieres, different environments are a Symfony feature and don't apply to "pure" PHP. You can check your framework, though; many have such a concept now (Laravel, Yii, ...).

pablolemosassis commented Aug 22, 2017

Same problem here as well; the CPU goes crazy when making lots of calls.
@alekseykuleshov we understand that environments are a framework feature, but how can we directly set Guzzle to production mode? There is no reference at all to a "dev mode" in the Guzzle docs.

bong0 commented Oct 9, 2017

I think the needs-more-information tag is obsolete by now, isn't it?

kgrosvenor commented May 8, 2018

I think I get a memory leak like this when repeating the same call from my client.

celloman commented Jun 26, 2018

So... how does one actually get Guzzle out of this mystical 'dev mode' to prevent the memory leak? We are seeing the same 0.15 MB leak per call... not great when dealing with large numbers of requests...
