Change content app's working directory dynamically #1503
Attached issue: https://pulp.plan.io/issues/9000
8088ea7 to 6d1d871
pulpcore/content/handler.py
Outdated
@@ -753,6 +753,9 @@ def _save_artifact(self, download_result, remote_artifact):
        if update_content_artifact:
            content_artifact.artifact = artifact
            content_artifact.save()

        os.unlink(download_result.path)
Consider a repository that was synced with the on-demand policy. When the content app receives N simultaneous requests targeting the same content unit, it tries to download the unit N times. The first downloaded file is associated with an Artifact. The remaining downloaded files are not cleaned up and stay on disk after the method _save_artifact() exits.
I believe calling os.unlink() is not the best idea, but I was not able to come up with something better. Maybe it would be better to check whether download_result.path starts with settings.WORKING_DIRECTORY and only then call os.unlink() (just to make sure that we downloaded the file and are responsible for it)? I have no idea how the method _save_artifact() is used in plugins (if it is used at all) or how this change could affect their processes. Since settings.WORKING_DIRECTORY is the working directory for the content app, the aforesaid check might be reasonable.
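The guarded removal suggested in this comment could look roughly like the sketch below. The helper name `unlink_if_in_working_dir` and its parameters are hypothetical and not part of pulpcore; the working directory is passed in explicitly instead of being read from settings.WORKING_DIRECTORY so that the snippet runs on its own.

```python
import os
import tempfile
from pathlib import Path


def unlink_if_in_working_dir(path, working_dir):
    """Delete `path` only if it resolves to a location inside `working_dir`.

    Hypothetical helper for illustration. Returns True if a file was removed.
    """
    resolved = Path(path).resolve()
    root = Path(working_dir).resolve()
    # Compare path components instead of using str.startswith(), so that
    # "/var/lib/pulp/tmp2/x" is not treated as living inside "/var/lib/pulp/tmp".
    if root not in resolved.parents:
        return False
    try:
        os.unlink(path)
    except FileNotFoundError:
        # Another request may already have removed the file.
        return False
    return True
```

Comparing resolved paths is a slightly stricter variant of the "starts with" check mentioned above; whether that strictness is wanted here is a design choice for the maintainers.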
I'm not sure if this is the right place to put this os.unlink() call. I think the call should be after line 715: if that artifact.touch() call succeeds, the artifact is already present on the system, so the temporary file from the download will not be cleaned up by the artifact.save() code.
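As a rough illustration of the situation described here, the sketch below models "the artifact already exists, so this download is unused". It is a simplification under stated assumptions: the function `save_or_discard` is hypothetical, and a plain dict keyed by SHA-256 stands in for the Artifact table; it does not reproduce the real `_save_artifact()` logic.

```python
import hashlib
import os
import tempfile


def save_or_discard(download_path, artifacts):
    """Register `download_path` under its SHA-256, or delete it if already known.

    `artifacts` is a dict standing in for stored artifacts (an assumption made
    for this sketch). Returns the path of the file that backs the artifact.
    """
    with open(download_path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    existing = artifacts.get(digest)
    if existing is not None:
        # The artifact was saved while servicing another request, so this
        # download was not used and its temporary file can be removed.
        os.unlink(download_path)
        return existing
    artifacts[digest] = download_path
    return download_path
```

With two simultaneous downloads of the same content, the first call keeps its file and the second call deletes its own copy instead of leaving it on disk.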
os.unlick() 😲
19f9a1e to 4e8ebcf
49a9f8a to 5dad0e2
# The file needs to be unlinked because it was not used to create an artifact.
# The artifact must have already been saved while servicing another request for
# the same artifact.
os.unlink(download_result.path)
Personally, I don't think we should be manipulating the downloaded files like this. I think it's hard to fully know the side effects. Maybe I should look at the code more to understand this line's value.
Regarding preventing duplicate downloads: it's tough because, across multiple processes, it involves DB coordination, which is very likely not worth it. Even within one process there are other complications. For example, the headers_ready callback may need to fire for both downloads, but if we deduplicate them and the event has already occurred, the second one won't receive it. Overall, I put these kinds of optimizations in the "tricky and not worth it" category. That's my take; it could be wrong.
For example, what if the file wasn't used to create an Artifact but was consumed by some other third-party process, and this code is being called from a newly registered handler that just calls into this method? It's a contrived example, I know, but my general point is that I don't think we really know unlinking is right in 100% of cases.
@bmbouter If the unlinking of the file does not occur here it is going to remain in /var/lib/pulp/tmp. This line of code is only executed if the same file is requested at the same time and only one of the downloaded files is actually used to create the artifact.
The other option is to have a dedicated directory under /var/lib/pulp/tmp for each instance of the content app and then coordinate cleanup of those directories.
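The alternative mentioned here, a dedicated directory per content app instance with coordinated cleanup, might be sketched as follows. The context manager `instance_working_dir` and the `content-app-<pid>-` prefix are hypothetical names chosen for illustration, not existing pulpcore code.

```python
import os
import shutil
import tempfile
from contextlib import contextmanager


@contextmanager
def instance_working_dir(base_dir):
    """Create a per-process directory under `base_dir`, chdir into it, and
    remove it together with any leftover downloads on exit.

    Hypothetical sketch; `base_dir` would be something like the tmp
    directory discussed above.
    """
    os.makedirs(base_dir, exist_ok=True)
    path = tempfile.mkdtemp(prefix=f"content-app-{os.getpid()}-", dir=base_dir)
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield path
    finally:
        # Leave the directory before deleting it, then drop everything in it.
        os.chdir(previous)
        shutil.rmtree(path, ignore_errors=True)
```

This only cleans up on an orderly exit; files left behind by a crashed process would still need a separate sweep of the base directory.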
Thanks for explaining this. This makes sense to me now. +1 to this.
Thanks!
CHANGES/9000.bugfix
Outdated
@@ -0,0 +1 @@
Updated the content app's working directory to ``WORKING_DIRECTORY`` specified in ``settings.py``.
Since we're exposing this to the users, maybe it's worth explaining why this change was made, or how it fixed the underlying issue?
+1. Maybe "Fixed a bug where on-demand downloads would fill up /var/run/ by not deleting downloaded files"
I think downloaded files that were simply not made into artifacts still count.
LGTM once the changelog wording is adjusted.
As of this commit, content app is no longer storing temporary files in the /var/run/ directory. The temporary files were created during on-demand downloading and were not removed until, e.g., restarting pulp services. closes #9000
5dad0e2 to e4bd409