Use PulpTemporaryFile to store uploaded chunks #133

lubosmj · 2020-08-08T21:02:53Z

closes #7218

pulpbot · 2020-08-08T21:08:43Z

Attached issue: https://pulp.plan.io/issues/7218

mdellweg · 2020-08-10T06:30:37Z

pulp_container/app/migrations/0004_upload.py

-            },
-        ),
-    ]
+    operations = []


Please do not modify existing migrations.

I was getting errors when I removed the function generate_filename():

pulp_container/pulp_container/app/models.py

Lines 362 to 365 in 1380003

def generate_filename(instance, filename):
    """Method for generating upload file name"""
    filename = os.path.join(instance.upload_dir, str(instance.pk) + INCOMPLETE_EXT)
    return time.strftime(filename)

That is why I purged this migration. Do you propose to leave the function there?

In that case, I guess yes.

ipanova · 2020-08-10T13:55:17Z

CHANGES/7218.bugfix

@@ -0,0 +1,2 @@
+Refactored the registry's push API to not store uploaded chunks in /var/lib/pulp, but rather


I'd rather mark this as a feature than a bugfix.


mdellweg · 2020-08-12T09:20:57Z

I am quite confused by the size of this PR. Can you elaborate on why subclassing is necessary instead of "just" replacing the temporary_file with a pulp_temporary_file?
Did you fix additional issues with the registry upload api?

lubosmj · 2020-08-12T09:55:36Z

@mdellweg, the problem here is that you cannot open a file in the append mode in S3. Therefore, you should not just simply replace the instance of a temporary file. You may either delete and write an updated file back or create a bunch of files that will be later assembled into a single file. I decided to go for the latter.

So, after merging the proposed changes, the registry will work in the following way (assuming that a database server is accessible to multiple Pulp instances):

1. Create an Upload object (identifying one upload within a repository).
2. Save each uploaded chunk to the S3 storage as a temporary file.
3. When the upload finishes, build an artifact from the temporary files.
4. Delete temporary files and the Upload object.
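
The four steps above can be sketched with a toy object store that, like S3, only supports whole-object writes. This is a minimal illustration and not the code from this PR: `ObjectStore`, `Upload`, `save_chunk`, and `finish_upload` are hypothetical names, and an in-memory dict stands in for both S3 and the database.

```python
import uuid


class ObjectStore:
    """Toy stand-in for S3: whole-object put/get/delete, no append."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        self._objects[key] = bytes(data)

    def get(self, key):
        return self._objects[key]

    def delete(self, key):
        del self._objects[key]

    def keys(self):
        return sorted(self._objects)


class Upload:
    """Identifies one upload and remembers (offset, key) for every stored chunk."""

    def __init__(self):
        self.pk = uuid.uuid4().hex
        self.size = 0
        self.chunks = []


def save_chunk(store, upload, data):
    """Step 2: store the chunk as its own object instead of appending."""
    if not data:
        return  # nothing to store for an empty chunk
    key = f"tmp/{upload.pk}/{upload.size}"
    store.put(key, data)
    upload.chunks.append((upload.size, key))
    upload.size += len(data)


def finish_upload(store, upload):
    """Steps 3 and 4: assemble the blob in offset order, then clean up."""
    blob = b"".join(store.get(key) for _, key in sorted(upload.chunks))
    artifact_key = f"artifact/{upload.pk}"
    store.put(artifact_key, blob)  # one whole-object write, no append needed
    for _, key in upload.chunks:
        store.delete(key)
    upload.chunks.clear()
    return artifact_key


store = ObjectStore()
upload = Upload()  # step 1
for piece in (b"layer-", b"data-", b"end"):
    save_chunk(store, upload, piece)
artifact_key = finish_upload(store, upload)
```

After `finish_upload` returns, the store holds only the assembled object; the per-chunk temporary objects are gone.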

Speaking of additional issues with the upload API, I did not fix anything. It should work as it did in the past (let's say for now that it was really working). When I was working on this change, the code seemed to work because the tests passed. However, I had to make a few additional changes because podman behaved differently compared to docker.

mdellweg

Not a full review.
But I tried to explain why I think a lot of the logic here belongs in pulpcore.
It's all about not reinventing the wheel.

mdellweg · 2020-08-12T10:04:24Z

pulp_container/app/models.py

+    cumulative_size = models.BigIntegerField(default=0)
+
+
+class BlobTemporaryUpload(PulpTemporaryFile):


Should this be an UploadChunk then?

mdellweg · 2020-08-12T10:09:28Z

pulp_container/app/models.py

+class BlobTemporaryUpload(PulpTemporaryFile):
+    """
+    A model used for storing uploaded blob chunks in a temporary file.
+    """


Would it make sense (and maybe make the code easier) if you didn't inherit from PulpTemporaryFile, but had a one-to-one relation with cascaded delete here?
It would turn this model into a rich (with the additional offset) join table from Upload to PulpTemporaryFile.
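
To illustrate the suggestion above, here is a rough sketch of the one-to-one layout using SQLite rather than Django models. The table and column names (`temporary_file`, `upload`, `upload_chunk`, `chunk_offset`) are made up for the example and are not the pulpcore or pulp_container schema; the direction in which the cascade runs is also an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript(
    """
    CREATE TABLE temporary_file (id INTEGER PRIMARY KEY, path TEXT NOT NULL);
    CREATE TABLE upload (id INTEGER PRIMARY KEY);
    -- Join table from upload to temporary_file, enriched with the chunk offset.
    -- UNIQUE on temporary_file_id makes the relation one-to-one.
    CREATE TABLE upload_chunk (
        id INTEGER PRIMARY KEY,
        upload_id INTEGER NOT NULL REFERENCES upload(id) ON DELETE CASCADE,
        temporary_file_id INTEGER NOT NULL UNIQUE
            REFERENCES temporary_file(id) ON DELETE CASCADE,
        chunk_offset INTEGER NOT NULL
    );
    """
)

conn.execute("INSERT INTO upload (id) VALUES (1)")
for file_id, (path, offset) in enumerate([("tmp/a", 0), ("tmp/b", 4)], start=1):
    conn.execute("INSERT INTO temporary_file (id, path) VALUES (?, ?)", (file_id, path))
    conn.execute(
        "INSERT INTO upload_chunk (upload_id, temporary_file_id, chunk_offset) "
        "VALUES (1, ?, ?)",
        (file_id, offset),
    )

# Chunks of one upload, in the order they have to be concatenated.
ordered_paths = [
    row[0]
    for row in conn.execute(
        "SELECT f.path FROM upload_chunk c "
        "JOIN temporary_file f ON f.id = c.temporary_file_id "
        "WHERE c.upload_id = 1 ORDER BY c.chunk_offset"
    )
]

# Deleting a temporary file removes its chunk row through the cascade.
conn.execute("DELETE FROM temporary_file WHERE id = 1")
remaining_chunks = conn.execute("SELECT COUNT(*) FROM upload_chunk").fetchone()[0]
```

With this layout the chunk table owns only the offset and the two foreign keys, while storage concerns stay with the temporary-file table.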

mdellweg · 2020-08-12T10:14:46Z

pulp_container/app/__init__.py


 class PulpContainerPluginAppConfig(PulpPluginAppConfig):
    """Entry point for the container plugin."""

    name = "pulp_container.app"
    label = "container"
+
+    def ready(self):


I think this should be handled in pulpcore by using PulpTemporaryFile.

mdellweg · 2020-08-12T10:22:42Z

pulp_container/app/models.py

+        self._init_temporary_file(chunk)
+        self._update_upload_size(chunk, chunk_size)


Do those two methods not know chunk and upload from self?

No, not at the time of initialization.

mdellweg · 2020-08-12T10:24:04Z

pulp_container/app/models.py

+        with NamedTemporaryFile("ab") as temp_file:
+            while True:
+                subchunk = chunk.read(2000000)
+                if not subchunk:
+                    break
+                temp_file.write(subchunk)

+            temp_file.flush()
+
+            self.file = File(open(temp_file.name, "rb"))


I also expect this part to be handled by using PulpTemporaryFile.

In this case, we are dealing specifically with a chunk object which behaves differently compared to an ordinary file object. I think you need to read chunk in chunks, like so: chunk.read(X). In PulpTemporaryFile, we are dealing just with objects of the type File or PulpTemporaryUploadedFile.

This would require me to implement if/else checks for the object's type in PulpTemporaryFile, would it not?

Isn't the chunk just a stream of raw data that needs to be written to a file (on whatever storage) for later use?

Maybe we should have broken this change down into two steps:

1. Handle storage without append.
2. Move from local files to PTFile.

Yes, it is.

The tests for S3 would not pass because I would be trying to open a file in the append mode.

This is what I actually did.
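
The copy loop discussed in this thread treats the chunk as a raw stream and reads it in bounded pieces, so a large chunk is never held in memory at once. A minimal sketch of that pattern follows; `copy_stream` is an illustrative helper, and `io.BytesIO` stands in for the uploaded chunk object.

```python
import io
import tempfile


def copy_stream(source, destination, block_size=2_000_000):
    """Copy source to destination block by block; return the bytes written."""
    written = 0
    while True:
        block = source.read(block_size)
        if not block:
            break  # an empty read signals the end of the stream
        destination.write(block)
        written += len(block)
    destination.flush()
    return written


chunk = io.BytesIO(b"x" * 5_000_000)  # stand-in for an uploaded chunk
with tempfile.TemporaryFile() as temp_file:
    chunk_size = copy_stream(chunk, temp_file)
    temp_file.seek(0)
    stored_size = len(temp_file.read())
```

The standard library's `shutil.copyfileobj(source, destination, length)` performs the same loop; the explicit version is shown here only because it also returns the number of bytes written, which the upload needs for its running size.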

mdellweg · 2020-08-12T10:27:55Z

pulp_container/app/registry_api.py

+        chunks = models.BlobTemporaryUpload.objects.filter(upload=upload).order_by("offset")
+        chunks_files = map(lambda chunk: chunk.file, chunks)
+
+        with NamedTemporaryFile("ab") as temp_file:


Isn't there a "turn this PulpTemporaryFile into an artifact" primitive suitable for this?

No, since I am constructing an Artifact object from a bunch of PulpTemporaryFile objects, as I mentioned above.

Sorry, I thought you assembled all the chunks into a new PTFile, but then again you can just create the Artifact directly.

I cannot create an Artifact directly because it is not possible to open files in the append mode. I have to iterate through chunks, merge them, and then create an Artifact on a single write.

Yeah, that is what I meant. I first thought you'd "upload" that (locally) assembled file into a new PTFile to make an artifact from it. But that step is clearly unnecessary, because you can create the artifact directly from that assembled file.
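
The merge step described in this thread can be sketched as follows: the chunk files are sorted by offset, checked for gaps, and concatenated into one local file that can then be handed over in a single write. `merge_chunks` and the `(offset, path)` tuples are hypothetical; the code under review works with `BlobTemporaryUpload` rows and builds an Artifact, where this sketch merely returns the path of the merged file.

```python
import os
import shutil
import tempfile


def merge_chunks(chunks, target_dir):
    """Concatenate (offset, path) chunk files into one new file, in offset order."""
    expected_offset = 0
    fd, merged_path = tempfile.mkstemp(dir=target_dir, suffix=".merged")
    with os.fdopen(fd, "wb") as merged:
        for offset, path in sorted(chunks):
            if offset != expected_offset:
                raise ValueError(f"gap or overlap at offset {offset}")
            with open(path, "rb") as chunk_file:
                shutil.copyfileobj(chunk_file, merged)
            expected_offset += os.path.getsize(path)
    return merged_path


with tempfile.TemporaryDirectory() as workdir:
    chunks = []
    offset = 0
    for index, data in enumerate([b"abc", b"defg", b"hi"]):
        path = os.path.join(workdir, f"chunk-{index}")
        with open(path, "wb") as chunk_file:
            chunk_file.write(data)
        chunks.append((offset, path))
        offset += len(data)

    # Pass the chunks out of order on purpose; merge_chunks sorts them by offset.
    merged_path = merge_chunks(reversed(chunks), workdir)
    with open(merged_path, "rb") as merged:
        merged_bytes = merged.read()
```

Because the merged file is written locally, append-style writes are fine here; only the final transfer to the backing storage has to be a single write.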

lubosmj · 2020-08-13T17:37:09Z

I am putting this PR on hold due to the recent findings of errors in PulpTemporaryFile. The discussion about whether we want to handle the removal of temporary files in plugins or in pulpcore was moved here: pulp/pulpcore#844.

lubosmj · 2020-08-31T07:29:32Z

It looks like all the issues related to PulpTemporaryFile have been resolved. I will continue working on this PR soon.

lubosmj · 2020-09-16T17:56:21Z

The issue has been addressed in pulpcore. Refer to pulp/pulpcore#914.

lubosmj · 2020-10-09T15:18:43Z

I am closing this PR in favour of pulp/pulpcore#914. Additional changes (if any) will be made in a separate PR. The PR will reference the same issue number.

mdellweg · 2020-11-12T09:24:19Z

@lubosmj with pulp/pulpcore#914 merged, this can be picked up again, right?

lubosmj force-pushed the pulp-temporary-file-storage-7218 branch from d1b3b19 to 6857771 Compare August 8, 2020 21:06

pulpbot added the Needs Cherry Pick label Aug 8, 2020

lubosmj force-pushed the pulp-temporary-file-storage-7218 branch 2 times, most recently from 9bfe37a to b2e1821 Compare August 8, 2020 22:24

mdellweg reviewed Aug 10, 2020

ipanova removed the Needs Cherry Pick label Aug 10, 2020

lubosmj force-pushed the pulp-temporary-file-storage-7218 branch 3 times, most recently from 06cd286 to 43a2725 Compare August 10, 2020 12:44

ipanova reviewed Aug 10, 2020

Use PulpTemporaryFile to store uploaded chunks

f39a0c5

closes #7218

lubosmj force-pushed the pulp-temporary-file-storage-7218 branch from 43a2725 to f39a0c5 Compare August 10, 2020 17:40

mdellweg reviewed Aug 12, 2020

lubosmj marked this pull request as draft August 13, 2020 11:01

lubosmj closed this Oct 9, 2020
