file.managed memory usage with s3 sources #32916

Closed
giannello opened this issue Apr 28, 2016 · 9 comments
Labels
Bug (broken, incorrect, or confusing behavior) · File Servers · fixed-pls-verify (fix is linked, bug author to confirm fix) · P3 (Priority 3) · RIoT (Relates to integration with cloud providers, hypervisors, API-based services, etc.) · severity-medium (3rd level, incorrect or bad functionality, confusing and lacks a work around)
Comments

@giannello
Contributor

Description of Issue/Question

Trying to use file.managed to get a file from S3 fails if the file is too big for the amount of memory available in the minion.

The problem seems to be caused by this call https://github.com/saltstack/salt/blob/develop/salt/utils/s3.py#L128 where the request is initialized once and used in different ways depending on the method passed to query, so the whole response body ends up buffered in memory.

A way to avoid the issue might be to use streaming, as documented at http://docs.python-requests.org/en/latest/user/advanced/#body-content-workflow but that would require a bit of refactoring of the query method.
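
To illustrate the idea, here is a minimal sketch of such a streaming download using requests (stream=True plus iter_content). The helper name, its parameters, and the chunk size are made up for the example; this is not the actual salt.utils.s3.query code and not the eventual fix.

import requests

def _download_to_file(requesturl, headers, local_file, chunk_size=16384):
    # Hypothetical helper, not the real salt.utils.s3.query implementation.
    # stream=True defers fetching the body until it is iterated, so only one
    # chunk at a time is held in memory instead of the whole file.
    result = requests.get(requesturl, headers=headers, stream=True)
    result.raise_for_status()
    with open(local_file, 'wb') as out:
        for chunk in result.iter_content(chunk_size=chunk_size):
            if chunk:  # skip keep-alive chunks
                out.write(chunk)
    return local_file

With this approach the memory footprint stays roughly constant regardless of the object size, at the cost of writing directly to a file (or iterating over the chunks) instead of returning the full body.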

Versions Report

salt 2015.8.8.2 is definitely affected

@jfindlay jfindlay added the info-needed (waiting for more info) label Apr 29, 2016
@jfindlay jfindlay added this to the Blocked milestone Apr 29, 2016
@jfindlay
Contributor

@giannello, thanks for reporting. Can you post the specific error message you are getting? This may be similar to #31466 or #31506.

@giannello
Contributor Author

@jfindlay I'm trying to reproduce the error on a vagrant box, but it seems like the OOM killer kicks in before I can get the error from salt. I will need to reproduce the problem on my live systems.
I'm pretty sure it's not tornado-related, even though the issue is similar to the two referenced ones. Reading the code and checking the memory usage at runtime makes it clear that the downloaded file is held in memory, and IMHO that is already an issue: software shouldn't rely on available memory to download a file whose size is not known in advance.

I'll still try to reproduce the issue and provide a proper stack trace.

@jfindlay
Contributor

jfindlay commented May 9, 2016

@giannello, the issues I referenced were similar problems of downloading files into memory before writing them to disk. If you are seeing this on 2015.8.8.2, I wonder whether we have not yet fixed all cases of the problem. Any specific info you can give on how to reproduce or identify the problem would be great, even if you're not able to verify again.

@giannello
Contributor Author

How to reproduce

Tested on a 512MB vagrant box with Ubuntu 14.04; big_file is a 1.1G binary file.
Memory allocated by salt-call increases at a rate that depends on the download speed. As soon as there's no more memory available, the OOM killer kicks in.

root@dev:/tmp# salt-call state.single file.managed name=/tmp/big_file source=s3://my-bucket/big_file source_hash=s3://my-bucket/big_file.sha1
[INFO    ] Determining pillar cache
[INFO    ] Determining pillar cache
[INFO    ] Loading fresh modules for state activity
[INFO    ] Running state [/tmp/big_file] at time 07:31:28.732270
[INFO    ] Executing state file.managed for /tmp/big_file
[INFO    ] Starting new HTTPS connection (1): my-bucket.s3.amazonaws.com
[INFO    ] Starting new HTTPS connection (1): my-bucket.s3.amazonaws.com
Killed
root@dev:/tmp# salt-call s3.get my_bucket big_file local_file=/tmp/big_file
[INFO    ] Determining pillar cache
[INFO    ] Starting new HTTPS connection (1): my-bucket.s3.amazonaws.com
Killed
root@dev:/tmp#

@lomeroe
Contributor

lomeroe commented Jun 1, 2016

@giannello #33599 should fix this in develop. I'm hoping to get backport PRs submitted this week too, but if you could test out the develop fix, I would appreciate it.

@giannello
Contributor Author

Thanks @lomeroe - I'll test it after coming back from holidays.

@jfindlay jfindlay modified the milestones: Approved, Blocked Jun 1, 2016
@jfindlay jfindlay added the Bug, severity-medium, P3, RIoT, fixed-pls-verify, and File Servers labels and removed the info-needed (waiting for more info) label Jun 1, 2016
@jfindlay
Contributor

jfindlay commented Jun 1, 2016

@lomeroe, thanks for fixing this.

@jfindlay
Contributor

jfindlay commented Jun 1, 2016

@lomeroe, @giannello, @rallytime has backported this to 2015.8: #33681.

@rallytime
Contributor

Back-porting PRs have been submitted and merged. Closing.
