This repository has been archived by the owner on Jun 27, 2022. It is now read-only.

Memory exhausted processing large streams #46

Merged
merged 3 commits into from
Dec 18, 2014

Conversation

nikosdion
Contributor

Executive summary

Memory exhaustion error when using ReadLimitEntityBody with streams larger than the available PHP memory.

Technical explanation

The __toString() method in Guzzle\Http\ReadLimitEntityBody tries to read the entire stream into memory and then uses substr() to select a subset. If the stream is larger than the available PHP memory, the script crashes with a memory exhaustion error.

This can be fixed by duplicating some of the code from Guzzle\Stream\Stream's __toString() method. This allows us to read into memory only the part of the stream defined by the $limit and $offset of the ReadLimitEntityBody instance. It's not as elegant or academically correct as the existing code, but it solves a serious practical shortcoming.
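The mechanism behind the fix is PHP's ability to read a bounded window of a stream without ever loading the whole thing. A minimal standalone sketch of that core call follows (the variable names mirroring $limit and $offset are illustrative; this is a demonstration of the technique, not the merged patch):

```php
<?php
// stream_get_contents($handle, $maxlength, $offset) reads only the
// requested slice of a stream, regardless of the stream's total size.
// This is what lets ReadLimitEntityBody avoid stringifying everything.
$handle = fopen('php://temp', 'r+');
fwrite($handle, str_repeat('A', 100) . str_repeat('B', 100));

$limit  = 10;  // analogous to ReadLimitEntityBody's $limit
$offset = 95;  // analogous to ReadLimitEntityBody's $offset

// Reads bytes 95..104 only: five 'A's followed by five 'B's.
$data = stream_get_contents($handle, $limit, $offset);
echo $data, "\n"; // AAAAABBBBB

fclose($handle);
```

Only the $limit-byte window is ever held in memory, so the peak memory use is bounded by the window size rather than the stream size.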

Use case

I hit this issue while using the Amazon SDK for PHP to perform a multipart upload of a 2 GB file to S3 using the AWS4 (v4) signature method. With the unmodified Guzzle 3 sources the upload resulted in an immediate PHP memory limit error. With the proposed change it completes successfully.

Backwards compatibility issues

No backwards compatibility issues are expected. The proposed change is enclosed in an if-block and runs only when $body implements StreamInterface. In all other cases the class works as a simple decorator (the prior behaviour).

@nikosdion
Contributor Author

No response in 25 days is not very polite. I assume you don't care about fixing very simple but highly disruptive bugs, so I'll save you some administrative burden and close this PR myself. Goodbye.

@nikosdion nikosdion closed this Dec 18, 2014
@GrahamCampbell
Member

@mtdowling Is not obligated to do anything.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.

@nikosdion A better idea would be to politely ping him. He probably just missed this.

@nikosdion
Contributor Author

Graham, I really do understand that nobody is obliged to do anything in a FOSS project. I didn't complain about my PR not being merged, I complained about not receiving any reply after a reasonable amount of time. I've already forked Guzzle, applied my patch and shipped my software. I only contributed this PR in the hope that I could help other people.

As you can see in my GitHub profile I am an active open source PHP developer. I've never let a PR slip more than 24 hours before responding. The projects I've contributed to respond within a few days at most if they're interested in my code. That's why 25 days seemed too much to me.

FWIW, copy-pasting the no-warranty clause of the license is an indirect way of telling people to buzz off. Obviously not your intention, and I do understand your frustration thinking that I'm just another askhole (yeah, I've read past issues; they sound very familiar in tone and quality, I feel your pain...). Letting me know who I should have pinged is enough and very useful indeed, so thank you very much for that!

PS: Thank you for your FOSS contributions and for your reply in this PR. It's very much appreciated!

@nikosdion nikosdion reopened this Dec 18, 2014
@@ -30,6 +32,21 @@ public function __construct(EntityBodyInterface $body, $limit, $offset = 0)
*/
public function __toString()
{
// If the body implements the StreamInterface use a memory optimised read
if ($this->body instanceof StreamInterface)
Member


I think $this->body is always a StreamInterface, so this is unnecessary.

@mtdowling
Member

Thanks for the PR and sorry for the delay in responding. I saw this a while ago and meant to respond, but I knew that I needed to do a code review... This caused me to procrastinate, which caused me to forget.

The change looks good, but I have some comments in the code review that need to be addressed.

@nikosdion
Contributor Author

Thank you for your reply! I have fixed the code style issues you pointed out. Please let me know if there's anything else I have to change in this PR.

}

$originalPos = $this->body->ftell();
$data = stream_get_contents($this->body->getStream(), $this->limit, $this->offset);
Member


Sorry I didn't notice this earlier, but instead of getting the stream resource from the underlying stream object, it would be better to read from the underlying stream object directly. By getting the stream resource, we break any decorator chains that a user may have configured. So basically this becomes: $data = $this->read($this->limit);. This should work because the read() function ensures that the maximum length of the stream is not exceeded.

Contributor Author


Fair enough. I added $this->body->seek($this->offset); before the read to make sure that the cursor is at the correct location before reading. As far as I can see, read() doesn't handle that. That's why I was getting the stream handle in the first place: my upload to Amazon S3 was corrupt because I kept reading the same 5 MB from the start of the file when I was just using read().
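Putting the review comments together, the revised read path looks roughly like this (a sketch assembled from the diff fragments and comments above, not a quote of the merged commit; the cursor-restore step is assumed from the earlier $originalPos fragment):

```php
// Revised read path: operate on the wrapped stream object directly, so
// that any decorator chain the user configured stays intact (no
// getStream() call to the raw resource).
$originalPos = $this->body->ftell();

// Position the cursor first; read() alone does not apply $offset,
// which is why the earlier version re-read the same bytes each time.
$this->body->seek($this->offset);

// ReadLimitEntityBody::read() caps the number of bytes at $limit.
$data = $this->read($this->limit);

// Restore the original position so the cast has no side effects.
$this->body->seek($originalPos);
```

The design point being settled here is that the decorator should only ever talk to $this->body through StreamInterface methods, never the underlying resource.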

@mtdowling mtdowling merged commit 5566516 into guzzle:master Dec 18, 2014
@mtdowling
Member

Thanks for the contribution. I had to make a minor tweak to account for negative limits, but it's merged in now.

@nikosdion
Contributor Author

Thank you very much!
