Using `GuzzleHttp\Pool` with a huge number of requests eventually exhausts available memory #1932

Closed
garbetjie opened this Issue Oct 5, 2017 · 4 comments

garbetjie commented Oct 5, 2017

Q A
Bug? yes
New Feature? no
Version 6.3.0
PHP Version 7.1.9

Actual Behavior

Batching a large number of requests using GuzzleHttp\Pool causes increasing memory use, until the script runs out of memory. This happens regardless of the handler being used.

Expected Behavior

Memory use should rise initially, but then level out at some point, and all of the requests should be executed.

Steps to Reproduce

  1. Retrieve a list of all package names from https://packagist.org/packages/list.json, and extract just the package names.
  2. Use GuzzleHttp\Pool::batch to make requests to fetch the data for all of these packages.

See the gist https://gist.github.com/garbetjie/d9ef3eb95fc5db33316d4b6799ddc07a for screenshots and the source of a test script that demonstrates the problem; guzzle-pool-test.php is the file to run.
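
For reference, a minimal sketch of the kind of Pool::batch() usage that reproduces this is shown below (the actual test script is in the gist above; the concurrency value and the way the package list is fetched are placeholders):

<?php
require 'vendor/autoload.php';

use GuzzleHttp\Client;
use GuzzleHttp\Pool;
use GuzzleHttp\Psr7\Request;

$client = new Client(['base_uri' => 'https://packagist.org']);

// The full Packagist package list is very large.
$packageNames = json_decode(
    file_get_contents('https://packagist.org/packages/list.json'),
    true
)['packageNames'];

// Generator of PSR-7 requests, created lazily as the pool consumes them.
$requests = function () use ($packageNames) {
    foreach ($packageNames as $name) {
        yield new Request('GET', "/p/{$name}.json");
    }
};

// Pool::batch() waits for every request and returns an array holding every
// response (or exception), so memory grows with the number of requests.
$results = Pool::batch($client, $requests(), ['concurrency' => 25]);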

Contributor

alexeyshockov commented Oct 5, 2017

It's not a bug.

You are using Pool::batch(), which collects all the responses from the iterator into an array so it can return them as the result. That's why more and more memory is used; the behaviour is correct.

If you don't want to collect all the responses, you should use the each_limit() function. I slightly modified your script; take a look at it. It uses a constant amount of memory.
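
In outline, the each_limit() approach looks roughly like the following sketch (assuming Guzzle 6.x with guzzlehttp/promises; the concurrency value of 25 and the package list are placeholders):

<?php
require 'vendor/autoload.php';

use GuzzleHttp\Client;
use function GuzzleHttp\Promise\each_limit;

$client = new Client(['base_uri' => 'https://packagist.org']);
$packageNames = [/* ... package names ... */];

// Create the promises lazily, one per request, as the iterator is advanced.
$promises = (function () use ($client, $packageNames) {
    foreach ($packageNames as $name) {
        yield $client->getAsync("/p/{$name}.json");
    }
})();

// each_limit() keeps up to 25 requests pending and does not accumulate the
// responses, so memory use stays roughly constant.
each_limit($promises, 25, function ($response, $index) {
    // Handle each response here instead of collecting it.
})->wait();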

garbetjie commented Oct 5, 2017

Yes, @alexeyshockov. You are 100% correct in that Pool::batch() collects the responses - I didn't realize that. Thank you for pointing it out.

As a side note, I tried using your solution with each_limit, but it executed each request sequentially. I've updated my original gist with a new solution that still uses the pool, roughly along the lines of the sketch below (see https://gist.github.com/garbetjie/d9ef3eb95fc5db33316d4b6799ddc07a/revisions for the changes).
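
The updated approach, in outline (not the exact gist contents; the request generator and the concurrency of 25 are placeholders), constructs the Pool with fulfilled/rejected callbacks and waits on its promise, so each response is handled as it arrives instead of being collected into an array:

<?php
require 'vendor/autoload.php';

use GuzzleHttp\Client;
use GuzzleHttp\Pool;
use GuzzleHttp\Psr7\Request;

$client = new Client(['base_uri' => 'https://packagist.org']);
$packageNames = [/* ... package names ... */];

$requests = function () use ($packageNames) {
    foreach ($packageNames as $name) {
        yield new Request('GET', "/p/{$name}.json");
    }
};

$pool = new Pool($client, $requests(), [
    'concurrency' => 25,
    'fulfilled' => function ($response, $index) {
        // Process the response immediately; nothing is stored.
    },
    'rejected' => function ($reason, $index) {
        // Log or otherwise handle the failure.
    },
]);

$pool->promise()->wait();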

Thanks so much for the help. I'll close this issue now.

garbetjie closed this Oct 5, 2017

ekojs commented Oct 10, 2018

I have the same problem here when I use the script in singleton form: the requests eventually exhaust the available memory, even though the script is the same as @garbetjie's.

Please take a look at this: https://gist.github.com/ekojs/e112a89aaf1c342d3f06115b9e14a534

ekojs commented Oct 10, 2018

Sorry to bother; I've solved the problem by using yield, so the request parameters are generated lazily instead of being built up front:

$params = function ($packageNames) {
    foreach ($packageNames as $v) {
        yield array(
            'method' => 'GET',
            'url' => "https://packagist.org/p/{$v}.json",
            'headers' => [],
            'body' => null
        );
    }
};
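
Presumably the generator is consumed by turning each yielded parameter array into a request on demand, roughly like this (a sketch; the exact wiring is in the gist revision linked below):

use GuzzleHttp\Client;
use GuzzleHttp\Pool;
use GuzzleHttp\Psr7\Request;

$client = new Client();

// Build PSR-7 requests lazily from the yielded parameter arrays.
$requests = function () use ($params, $packageNames) {
    foreach ($params($packageNames) as $p) {
        yield new Request($p['method'], $p['url'], $p['headers'], $p['body']);
    }
};

$pool = new Pool($client, $requests(), ['concurrency' => 25]);
$pool->promise()->wait();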

Please take a look at rev 1: https://gist.github.com/ekojs/e112a89aaf1c342d3f06115b9e14a534/revisions
