Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use PulpTemporaryFile to store uploaded chunks #133

Closed

Conversation

lubosmj
Copy link
Member

@lubosmj lubosmj commented Aug 8, 2020

closes #7218

@lubosmj lubosmj force-pushed the pulp-temporary-file-storage-7218 branch from d1b3b19 to 6857771 Compare August 8, 2020 21:06
@pulpbot
Copy link
Member

pulpbot commented Aug 8, 2020

Attached issue: https://pulp.plan.io/issues/7218

@lubosmj lubosmj force-pushed the pulp-temporary-file-storage-7218 branch 2 times, most recently from 9bfe37a to b2e1821 Compare August 8, 2020 22:24
},
),
]
operations = []
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please do not modify existing migrations.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was getting errors when I removed the function generate_filename():

def generate_filename(instance, filename):
"""Method for generating upload file name"""
filename = os.path.join(instance.upload_dir, str(instance.pk) + INCOMPLETE_EXT)
return time.strftime(filename)
.
That is why I purged this migration. Do you propose to leave the function there?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case, i guess yes.

@lubosmj lubosmj force-pushed the pulp-temporary-file-storage-7218 branch 3 times, most recently from 06cd286 to 43a2725 Compare August 10, 2020 12:44
@@ -0,0 +1,2 @@
Refactored the registry's push API to not store uploaded chunks in /var/lib/pulp, but rather
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'd rather mark this a feature then bugfix

@lubosmj lubosmj force-pushed the pulp-temporary-file-storage-7218 branch from 43a2725 to f39a0c5 Compare August 10, 2020 17:40
@mdellweg
Copy link
Member

I am quite confused by the size of this PR. Can you elaborate, why subclassing is neccessary instead of "just" replacing the temporary_file with a pulp_temporary_file?
Did you fix additional issues with the registry upload api?

@lubosmj
Copy link
Member Author

lubosmj commented Aug 12, 2020

@mdellweg, the problem here is that you cannot open a file in the append mode in S3. Therefore, you should not just simply replace the instance of a temporary file. You may either delete and write an updated file back or create a bunch of files that will be later assembled into a single file. I decided to go for the latter.

So, after merging the proposed changes, the registry will work in the following way (assuming that a database server is accessible to multiple Pulp instances):

  1. Create an Upload object (identifying one upload within a repository).
  2. Save each uploaded chunk to the S3 storage as a temporary file.
  3. When the upload finishes, build an artifact from the temporary files.
  4. Delete temporary files and the Upload object.

Speaking of additional issues with the upload API, I did not fix anything. It should work like it was working in the past (let's say for now that it was really working). When I was working on this change, the code was seemingly working because the test passed. However, I have to made a few additional changes because podman behaved differently compared to docker.

Copy link
Member

@mdellweg mdellweg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a full review.
But i tried to express, why i think a lot of the logic here belongs in pulpcore.
It's all about do not reinvent the wheel.

cumulative_size = models.BigIntegerField(default=0)


class BlobTemporaryUpload(PulpTemporaryFile):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be an UploadChunk then?

class BlobTemporaryUpload(PulpTemporaryFile):
"""
A model used for storing uploaded blob chunks in a temporary file.
"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense (and maybe the code easier) if you didn't inherit from PulpTemporaryFile, but have a one-to-one relation with cascaded delete here?
It would render this model to be a rich (with the additional offset) join table from Upload to PulpTemporaryFile.


class PulpContainerPluginAppConfig(PulpPluginAppConfig):
"""Entry point for the container plugin."""

name = "pulp_container.app"
label = "container"

def ready(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, this should be handled in pulpcore by using PulpTemporaryFile.

Comment on lines +387 to +388
self._init_temporary_file(chunk)
self._update_upload_size(chunk, chunk_size)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do those two methods not know chunk and upload from self?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. Not in the time of initialization.

Comment on lines +392 to +401
with NamedTemporaryFile("ab") as temp_file:
while True:
subchunk = chunk.read(2000000)
if not subchunk:
break
temp_file.write(subchunk)

temp_file.flush()

self.file = File(open(temp_file.name, "rb"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also this part i expect to be handled by using PulpTemporaryFile.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, we are dealing specifically with a chunk object which behaves differently compared to an ordinary file object. I think you need to read chunk in chunks, like so: chunk.read(X). In PulpTemporaryFile, we are dealing just with objects of the type File or PulpTemporaryUploadedFile.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would require me to implement if else checks for the object's type in PulpTemporaryFile, would not?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't the chunk just a stream of raw data that needs to be written to a file (on whatever storage) for later use?

Maybe we should have broken this change down in two steps:

  1. Handle storage without append
  2. Move from local files to PTFile.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is.

  1. The tests for S3 would not pass because I will be trying to open a file in the append mode.
  2. This is what I actually did.

chunks = models.BlobTemporaryUpload.objects.filter(upload=upload).order_by("offset")
chunks_files = map(lambda chunk: chunk.file, chunks)

with NamedTemporaryFile("ab") as temp_file:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't there a "turn this PulpTemporaryFile into an artifact primitive suitable for this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, since I am constructing an Artifact object from a bunch of PulpTemporaryFile objects, like I did mention above.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, i thought you assembled all the chunks into a new PTFile, but then again you can just create the Artifact directly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cannot create an Artifact directly because it is not possible to open files in the append mode. I have to iterate through chunks, merge them, and then create an Artifact on a single write.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that is what i meant. I first thought you'd "upload" that (locally) assembled file into a new PTFile to make an artifact from it. But that step is clearly unnecessary, because you can create the artifact directly from that assembled file.

@lubosmj lubosmj marked this pull request as draft August 13, 2020 11:01
@lubosmj
Copy link
Member Author

lubosmj commented Aug 13, 2020

I am putting this PR on hold due to the recent findings of errors in PulpTemporaryFile. The discussion about whether we want to handle the removal of temporary files in plugins or in pulpcore was moved here: pulp/pulpcore#844.

@lubosmj
Copy link
Member Author

lubosmj commented Aug 31, 2020

It looks like all the issues related to PulpTemporaryFile have been resolved. I will continue working on this PR soon.

@lubosmj
Copy link
Member Author

lubosmj commented Sep 16, 2020

The issue has been addressed in pulpcore. Refer to pulp/pulpcore#914.

@lubosmj
Copy link
Member Author

lubosmj commented Oct 9, 2020

I am closing this PR in favour of pulp/pulpcore#914. Additional changes (if any) will be made in a separate PR. The PR will reference the same issue number.

@lubosmj lubosmj closed this Oct 9, 2020
@mdellweg
Copy link
Member

@lubosmj with pulp/pulpcore#914 merged, this can be picked up again, right?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants