Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

When passed a filename, download a report in chunks #321

Merged
merged 4 commits into from
Jun 1, 2018

Conversation

toofishes
Copy link
Contributor

@toofishes toofishes commented Jun 1, 2018

Description

Rather than pulling an entire 6-8 GB report into a string in memory to
immediately write it out to a file, use the built-in feature of Net::HTTP to
retrieve the result in chunks and write them out as they are read from the
remote host. This prevents receiving NoMemoryError issues when downloading
huge reports.

While here, also add Travis CI testing for Ruby 2.5.

Motivation and Context

We're hitting out of memory errors trying to download huge reports using this gem.

How Has This Been Tested?

We have a set of VCR-based tests that invoke this gem, and specifically the download method. I was able to use this version of the download method and see our test suite pass successfully.

Unfortunately, we don't have tests for this method in this gem itself. It probably makes sense to do some sort of VCR recording or cassette, which I could attempt if you think that makes sense. EDIT: tests now included.

Types of changes

  • New feature (non-breaking change which adds functionality)

Checklist:

  • I have updated the documentation accordingly (if changes are required).
  • I have added tests to cover my changes.
  • All new and existing tests passed.

Rather than pulling an entire 6-8 GB report into a string in memory to
immediately write it out to a file, use the built-in feature of Net::HTTP to
retrieve the result in chunks and write them out as they are read from the
remote host. This prevents receiving `NoMemoryError` issues when downloading
huge reports.
Ruby 2.5 has been out for some time now, we should make sure tests work.
This adds three different tests for three different ways to call this method.
end

it 'downloads report with file object' do
tf = Tempfile.new
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like you have to specify a basename for backwards compatibility: https://blog.bigbinary.com/2017/04/04/ruby-2-4-has-default-basename-for-tempfile-create.html

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just pushed that change as I noticed the build had failed. All ruby versions green now.

- Security Console
body:
encoding: ASCII-8BIT
string: "<NexposeReport version=\"2.0\"></NexposeReport>"
Copy link
Contributor

@gschneider-r7 gschneider-r7 Jun 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should there be some dummy data here to ensure there are enough chunks to iterate? Does vcr actually simulate that properly?

This might be getting into "are you just testing the standard library now" territory, and if the answer to that is yes it's not worth worrying about too much as we would catch problems here in our internal automated testing anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My gut is the latter; I also don't know if VCR would even emulate that the same way an actual Net::HTTP call would work, or if it is always going to return the data in the yaml file here as a single chunk.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just spent a few minutes diving into the VCR and WebMock source code. I didn't see any obvious way to get this to be a multipart or chunked response in the VCR cassette, so I think we're just going to have to rely on the standard library and/or internal testing here.

@toofishes
Copy link
Contributor Author

I should point out that this is very similar to #248, but I attempted to leave the non-filename path alone as much as possible, and not add any block_given? logic.

@gschneider-r7
Copy link
Contributor

This looks good so once I get a chance to test this myself I'll probably merge and release. Thanks for doing this since I completely forgot about the past attempt. 😓

Copy link
Contributor

@gschneider-r7 gschneider-r7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worked as expected with a 400 MB PDF report.

@gschneider-r7 gschneider-r7 merged commit b44ca25 into rapid7:master Jun 1, 2018
@gschneider-r7
Copy link
Contributor

Version 7.2.1 is published to RubyGems.

@toofishes
Copy link
Contributor Author

Thank you, appreciate you getting this reviewed and merged so quickly!

@gschneider-r7
Copy link
Contributor

Just a note for anyone coming here later to find out if this change broke their scripts:

It seems like tempfile handling changed between Ruby 2.1.x and 2.2.x versions. If you use a tempfile and do not keep the file reference open after the report is downloaded it seems to get garbage collected immediately after the report completes (likely due to the scope change on the file block) which deletes the temp file from disk. As far as I can tell, when this happens, it means the tempfile is being used incorrectly (e.g. passing the path around instead of the file object). To avoid this just make sure any tempfile usage keeps the file object in scope until it's no longer needed.

For example, this won't work because the file won't exist when you try to open it:

report_file = Tempfile.new('report').path
nsc.download(report.uri, report_file)
File.open(report_file, 'rb') do |f|
  # do stuff with the file
end

Correct usage:

report_file = Tempfile.new('report')
nsc.download(report.uri, report_file) # you can also use report_file.path but it's redundant
File.open(report_file, 'rb') do |f|
  # do stuff with the file
end

This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants