Use PulpTemporaryFile to store uploaded chunks #133
(force-pushed from d1b3b19 to 6857771)
Attached issue: https://pulp.plan.io/issues/7218
(force-pushed from 9bfe37a to b2e1821)
```python
        },
    ),
]
operations = []
```
Please do not modify existing migrations.
I was getting errors when I removed the function `generate_filename()`:
pulp_container/pulp_container/app/models.py, lines 362 to 365 in 1380003:
```python
def generate_filename(instance, filename):
    """Method for generating upload file name"""
    filename = os.path.join(instance.upload_dir, str(instance.pk) + INCOMPLETE_EXT)
    return time.strftime(filename)
```
That is why I purged this migration. Do you propose to leave the function there?
In that case, I guess yes.
(force-pushed from 06cd286 to 43a2725)
CHANGES/7218.bugfix (outdated):

> Refactored the registry's push API to not store uploaded chunks in /var/lib/pulp, but rather
I'd rather mark this as a feature than a bugfix.
closes #7218
(force-pushed from 43a2725 to f39a0c5)
I am quite confused by the size of this PR. Can you elaborate on why subclassing is necessary instead of "just" replacing the temporary_file with a pulp_temporary_file?
@mdellweg, the problem here is that you cannot open a file in append mode on S3. Therefore, you should not simply replace the instance of a temporary file. You may either delete the file and write an updated one back, or create a bunch of files that will later be assembled into a single file. I decided to go for the latter. So, after merging the proposed changes, the registry will work in the following way (assuming that a database server is accessible to multiple Pulp instances):
Speaking of additional issues with the upload API, I did not fix anything. It should work like it did in the past (let's say for now that it was really working). When I was working on this change, the code was seemingly working because the test passed. However, I had to make a few additional changes because podman behaved differently compared to docker.
Not a full review.
But I tried to express why I think a lot of the logic here belongs in pulpcore.
It's all about not reinventing the wheel.
```python
cumulative_size = models.BigIntegerField(default=0)


class BlobTemporaryUpload(PulpTemporaryFile):
```
Should this be an `UploadChunk` then?
```python
class BlobTemporaryUpload(PulpTemporaryFile):
    """
    A model used for storing uploaded blob chunks in a temporary file.
    """
```
Would it make sense (and maybe make the code easier) if you didn't inherit from `PulpTemporaryFile`, but had a one-to-one relation with cascaded delete here? It would render this model a rich join table (with the additional `offset`) from `Upload` to `PulpTemporaryFile`.
```python
class PulpContainerPluginAppConfig(PulpPluginAppConfig):
    """Entry point for the container plugin."""

    name = "pulp_container.app"
    label = "container"

    def ready(self):
```
I think this should be handled in pulpcore by using `PulpTemporaryFile`.
```python
self._init_temporary_file(chunk)
self._update_upload_size(chunk, chunk_size)
```
Do those two methods not know `chunk` and `upload` from `self`?
No, not at the time of initialization.
```python
with NamedTemporaryFile("ab") as temp_file:
    while True:
        subchunk = chunk.read(2000000)
        if not subchunk:
            break
        temp_file.write(subchunk)

    temp_file.flush()

    self.file = File(open(temp_file.name, "rb"))
```
I also expect this part to be handled by using `PulpTemporaryFile`.
In this case, we are dealing specifically with a chunk object, which behaves differently from an ordinary file object. I think you need to read the chunk in subchunks, like so: `chunk.read(X)`. In `PulpTemporaryFile`, we are dealing just with objects of the type `File` or `PulpTemporaryUploadedFile`.
This would require me to implement if-else checks for the object's type in `PulpTemporaryFile`, would it not?
Isn't the chunk just a stream of raw data that needs to be written to a file (on whatever storage) for later use?
Maybe we should have broken this change down into two steps:

1. Handle storage without append.
2. Move from local files to PTFile.
Yes, it is.

1. The tests for S3 would not pass because I would be trying to open a file in append mode.
2. This is what I actually did.
```python
chunks = models.BlobTemporaryUpload.objects.filter(upload=upload).order_by("offset")
chunks_files = map(lambda chunk: chunk.file, chunks)

with NamedTemporaryFile("ab") as temp_file:
```
Isn't there a "turn this `PulpTemporaryFile` into an artifact" primitive suitable for this?
No, since I am constructing an Artifact object from a bunch of `PulpTemporaryFile` objects, as I mentioned above.
Sorry, I thought you assembled all the chunks into a new PTFile, but then again, you can just create the Artifact directly.
I cannot create an Artifact directly because it is not possible to open files in append mode. I have to iterate through the chunks, merge them, and then create an Artifact in a single write.
Yeah, that is what I meant. I first thought you'd "upload" that (locally) assembled file into a new PTFile to make an artifact from it. But that step is clearly unnecessary because you can create the artifact directly from that assembled file.
I am putting this PR on hold due to the recent findings of errors in …
It looks like all the issues related to …
The issue has been addressed in pulpcore. Refer to pulp/pulpcore#914.
I am closing this PR in favour of pulp/pulpcore#914. Additional changes (if any) will be made in a separate PR. The PR will reference the same issue number.
@lubosmj with pulp/pulpcore#914 merged, this can be picked up again, right?