Change content app's working directory dynamically #1503

lubosmj · 2021-07-16T22:50:39Z

As of this commit, content app is no longer storing temporary files in the /var/run/ directory. The temporary files were created during on-demand downloading and were not removed until, e.g., restarting pulp services.

closes #9000

pulpbot · 2021-07-16T22:51:04Z

Attached issue: https://pulp.plan.io/issues/9000

lubosmj · 2021-07-17T13:37:54Z

pulpcore/content/handler.py

@@ -753,6 +753,9 @@ def _save_artifact(self, download_result, remote_artifact):
            if update_content_artifact:
                content_artifact.artifact = artifact
                content_artifact.save()
+
+                os.unlink(download_result.path)


Considering a repository that was synced with the on-demand policy, once content app receives N requests targeting the same content unit simultaneously, it tries to download the unit N times. The first downloaded file is associated with an Artifact. The rest of the downloaded files are not cleared up and reside on disk after exiting the method _save_artifact().

I believe calling os.unlink() is not the best idea, but I was not able to come up with something better. Maybe it would be better to check whether download_result.path starts with settings.WORKING_DIRECTORY and then call os.unlink() (just to ma ke sure that we downloaded a file and we will take care of it)? I have no idea how is the method _save_artifact() used in plugins (if it is even used) and how can this change affect their processes. Since settings.WORKING_DIRECTORY is a working directory for content app, the aforesaid check might be reasonable.

I'm not sure if this is the right place to put this os.unlink() call. I think the call should be after line 715 because if that artifact.touch() call succeeds then it means that the artifact is already present on the system and thus the temporary file from the download will not be cleaned up from the artifact.save() code.

os.unlick()

😲

lubosmj · 2021-07-19T16:29:40Z

I moved the os.unlink() call, as proposed by @gerrod3. Furthermore, I added an if-statement that decides whether a temporary file will be removed or not, based on its path.

EDIT: After discussing the changes with @dkliban, I removed the if-statement which checks the path of a temporary file.

dralley · 2021-07-21T18:24:59Z

pulpcore/content/handler.py

+                    # The file needs to be unlinked because it was not used to create an artifact.
+                    # The artifact must have already been saved while servicing another request for
+                    # the same artifact.
+                    os.unlink(download_result.path)


I think this is a good mechanism to have in place regardless, but should we also do anything to prevent the same artifact from being downloaded multiple times in parallel to begin with? @bmbouter @gerrod3

Personally I don't think we should be manipulating the downloaded files like this. It's hard to fully know the side-effects I think. Maybe I should look at the code more to understand this line's value.

Regarding preventing duplicate downloading, it's tough because across multiple processes that involves DB coordination which is very likely not worth it. Even for 1 process though there are other things for example, it could be significant for the headers_ready callback firing on both downloads but if we deduplicate them and the event already occured the second one won't receive it. Overall I put these kinds of optimizations in the tricky and not worth it category. That's my take, it could be wrong.

For example what if it wasn't used to create an Artifact, but consumed from some other 3rd party process and this code is being called from a newly registered handler that is just calling into this method. It's a contrived example I know, but my general point is, I don't think we really know unlinking is right in 100% of cases.

@bmbouter If the unlinking of the file does not occur here it is going to remain in /var/lib/pulp/tmp. This line of code is only executed if the same file is requested at the same time and only one of the downloaded files is actually used to create the artifact.

The other option is to have a dedicated directory under /var/lib/pulp/tmp for each instance of the content app and then coordinate cleanup of those directories.

Thanks for explaining this. This makes sense to me now. +1 to this.

bmbouter

Thanks!

dralley · 2021-07-21T20:31:12Z

@bmbouter @dkliban Will the directory-changing require any work on the SELinux policy side?

bmbouter · 2021-07-21T20:33:43Z

@bmbouter @dkliban Will the directory-changing require any work on the SELinux policy side?

I don't know, but probably? This is why we need to have SELinux enforcing added to the CI and our dev environments.

ipanova · 2021-07-22T10:01:39Z

CHANGES/9000.bugfix

@@ -0,0 +1 @@
+Updated the content app's working directory to ``WORKING_DIRECTORY`` specified in ``settings.py``.


since we're exposing this to the users maybe it's worth explaining why this change was made? or how this change fixed the underlying issue?

+1. Maybe "Fixed a bug where on-demand downloads would fill up /var/run/ by not deleting downloaded files"

@dralley this one is tricky. This PR does not primarily focus on deleting downloaded files (compared to fbe2d7f). But, in addition to changing the content app working directory, we delete only redundant files that will be never associated with artifacts. I am fine with your wording, though.

I think downloaded files that were simply not made into artifacts still counts

dralley

LGTM once the changelog wording is adjusted.

As of this commit, content app is no longer storing temporary files in the /var/run/ directory. The temporary files were created during on-demand downloading and were not removed until, e.g., restarting pulp services. closes #9000

lubosmj force-pushed the on-demnad-chdir-9000 branch 2 times, most recently from 8088ea7 to 6d1d871 Compare July 16, 2021 23:21

lubosmj commented Jul 17, 2021

View reviewed changes

lubosmj force-pushed the on-demnad-chdir-9000 branch 6 times, most recently from 19f9a1e to 4e8ebcf Compare July 19, 2021 16:25

lubosmj force-pushed the on-demnad-chdir-9000 branch 2 times, most recently from 49a9f8a to 5dad0e2 Compare July 21, 2021 18:19

dralley reviewed Jul 21, 2021

View reviewed changes

bmbouter approved these changes Jul 21, 2021

View reviewed changes

ipanova reviewed Jul 22, 2021

View reviewed changes

dralley approved these changes Jul 22, 2021

View reviewed changes

Change content app's working directory dynamically

e4bd409

As of this commit, content app is no longer storing temporary files in the /var/run/ directory. The temporary files were created during on-demand downloading and were not removed until, e.g., restarting pulp services. closes #9000

lubosmj force-pushed the on-demnad-chdir-9000 branch from 5dad0e2 to e4bd409 Compare July 22, 2021 22:35

dkliban approved these changes Jul 22, 2021

View reviewed changes

dkliban merged commit f8bd7f8 into pulp:master Jul 22, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Change content app's working directory dynamically #1503

Change content app's working directory dynamically #1503

lubosmj commented Jul 16, 2021

pulpbot commented Jul 16, 2021

lubosmj Jul 17, 2021

gerrod3 Jul 19, 2021 •

edited

Loading

daviddavis Jul 19, 2021

lubosmj commented Jul 19, 2021 •

edited

Loading

dralley Jul 21, 2021 •

edited

Loading

bmbouter Jul 21, 2021

bmbouter Jul 21, 2021

dkliban Jul 21, 2021

bmbouter Jul 21, 2021

bmbouter left a comment

dralley commented Jul 21, 2021

bmbouter commented Jul 21, 2021

ipanova Jul 22, 2021

dralley Jul 22, 2021

lubosmj Jul 22, 2021

dralley Jul 22, 2021

dralley left a comment

		@@ -0,0 +1 @@
		Updated the content app's working directory to ``WORKING_DIRECTORY`` specified in ``settings.py``.

Change content app's working directory dynamically #1503

Change content app's working directory dynamically #1503

Conversation

lubosmj commented Jul 16, 2021

pulpbot commented Jul 16, 2021

Choose a reason for hiding this comment

gerrod3 Jul 19, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

lubosmj commented Jul 19, 2021 • edited Loading

dralley Jul 21, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bmbouter left a comment

Choose a reason for hiding this comment

dralley commented Jul 21, 2021

bmbouter commented Jul 21, 2021

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dralley left a comment

Choose a reason for hiding this comment

gerrod3 Jul 19, 2021 •

edited

Loading

lubosmj commented Jul 19, 2021 •

edited

Loading

dralley Jul 21, 2021 •

edited

Loading