
Added temporary storage via settings flag #39

Open
wants to merge 8 commits into master
Conversation

@GitRon commented Mar 28, 2018

If you use AWS S3, the default storage does not work: the chunks cannot be created and updated.

I added the possibility to activate a cross-platform temporary storage. I cannot use CHUNKED_UPLOAD_STORAGE_CLASS because I need to pass the location parameter to the storage, which is not possible with that setting.

For better backwards compatibility, I decided not to change the way CHUNKED_UPLOAD_STORAGE_CLASS works.
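
For illustration, a minimal sketch of what such a cross-platform temporary storage could look like (the names below are illustrative, not necessarily the exact code of this PR):

```python
# Illustrative sketch only, not the exact code of this PR: keep the chunks in a
# cross-platform local temp directory instead of the project's default storage.
import tempfile

from django.core.files.storage import FileSystemStorage

# FileSystemStorage needs `location` at construction time, which is why a plain
# dotted-path CHUNKED_UPLOAD_STORAGE_CLASS setting is not enough here.
TEMP_CHUNK_STORAGE = FileSystemStorage(location=tempfile.gettempdir())
```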

@GitRon (Author) commented Mar 28, 2018

@juliomalegria Please have a look at my PR and consider merging it. In combination with django-storages 1.6.6 and boto3, it works for Amazon S3.

@GitRon (Author) commented May 28, 2018

@juliomalegria Any chance of getting this into the master branch and releasing a new version?

@GitRon (Author) commented Aug 7, 2018

No update so far?

@jerinpetergeorge jerinpetergeorge added this to the Version 2.0 milestone Nov 30, 2019
@jerinpetergeorge jerinpetergeorge removed this from the Version 2.0 milestone Dec 15, 2019
Ronny Vedrilla added 2 commits December 16, 2019 14:20
# Conflicts:
#	README.rst
#	chunked_upload/settings.py
@GitRon (Author) commented Dec 16, 2019

@jerinpetergeorge I fixed the merge conflicts. What's your opinion on this topic?

@jerinpetergeorge (Collaborator) commented

I have tried the new changes on my local machine. Unfortunately, the file is not uploaded to S3.

Steps to reproduce

Part-1

  1. Download the demo project and supporting files from Here
  2. Extract the zip
  3. Create a virtual environment and install requirements.txt (our package is not included in requirements.txt since I've added the relevant modules to the demo)
  4. Go to django-chunked-upload-demo/settings.py and fill in the following keys (see the settings sketch after this comment),
    1. AWS_ACCESS_KEY_ID
    2. AWS_SECRET_ACCESS_KEY
    3. AWS_STORAGE_BUCKET_NAME
  5. Apply migrations
  6. Create a superuser
  7. Run the Django development server

Part-2 (checks whether the default S3 media/file upload is working)

  1. Go to the Django admin
  2. Create a new FileUploadTest instance by uploading a file
  3. Check the S3 bucket after instance creation [I was able to successfully upload the file to the S3 bucket]

Part-3 (checks the CHUNKED_UPLOAD_USE_TEMP_STORAGE setting)

  1. Go to the root URL
  2. Upload any file
  3. Check the S3 bucket after instance creation [the upload completed without any errors, but the file was not uploaded to the S3 bucket]

Did I miss something? @GitRon
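
For reference, the S3-related part of the demo settings (Part-1, step 4) might look roughly like the excerpt below; the key values are placeholders and the backend path assumes django-storages with boto3:

```python
# django-chunked-upload-demo/settings.py (excerpt; values are placeholders)
AWS_ACCESS_KEY_ID = "..."
AWS_SECRET_ACCESS_KEY = "..."
AWS_STORAGE_BUCKET_NAME = "my-test-bucket"

# Route Django's default file storage through S3 via django-storages/boto3.
DEFAULT_FILE_STORAGE = "storages.backends.s3boto3.S3Boto3Storage"

# The setting under test in Part-3 of these steps.
CHUNKED_UPLOAD_USE_TEMP_STORAGE = True
```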

@jerinpetergeorge (Collaborator) commented Dec 18, 2019

@GitRon
I think we should initiate the S3 upload using the upload_file() method when we receive the last chunk from the client. But if we try to upload the whole file to S3 at that moment, NGINX would probably close the connection to the browser client.

There is also another method called upload_part(); it seems we could use it to upload the chunks to the S3 bucket in real time.
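
A rough sketch of the upload_part() idea with the plain boto3 client (bucket and key names are made up, and this is not part of the PR):

```python
import boto3


def multipart_upload(chunks, bucket="my-bucket", key="uploads/big-file.bin"):
    """Stream already-received chunks to S3 as a multipart upload (sketch only)."""
    s3 = boto3.client("s3")

    # Start the multipart upload when the first chunk arrives.
    mpu = s3.create_multipart_upload(Bucket=bucket, Key=key)

    parts = []
    # Note: every part except the last must be at least 5 MB.
    for part_number, chunk in enumerate(chunks, start=1):
        resp = s3.upload_part(
            Bucket=bucket,
            Key=key,
            UploadId=mpu["UploadId"],
            PartNumber=part_number,
            Body=chunk,
        )
        parts.append({"PartNumber": part_number, "ETag": resp["ETag"]})

    # Finish once the last chunk has been received.
    s3.complete_multipart_upload(
        Bucket=bucket,
        Key=key,
        UploadId=mpu["UploadId"],
        MultipartUpload={"Parts": parts},
    )
```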

@GitRon (Author) commented Dec 19, 2019

@jerinpetergeorge I'm sorry, but I can't follow you. We are using boto3 as a connector together with its file storage class, so you don't have to worry about the upload at all.
The only thing you need to take care of is that you cannot write the chunks to the target location.

@jerinpetergeorge (Collaborator) commented

Oops 😣

Have you tried to reproduce the behavior that I mentioned here?

Anyway, let me ask you a few questions regarding this; they may help you understand what I am trying to say:

  1. What are the possible values of CHUNKED_UPLOAD_USE_TEMP_STORAGE?
  2. If CHUNKED_UPLOAD_USE_TEMP_STORAGE only supports boolean values (i.e. True and False),
    1. how/where should I specify my S3 target location?
    2. after the chunk upload to the temporary location on the server (the machine where the server is currently running) completes successfully, how does the file get uploaded to S3? Should I do it manually? Will Django handle it internally?

OR,

Could you provide a minimal example that demonstrates the S3 upload?
@GitRon

@GitRon (Author) commented Dec 20, 2019

Ok, I think we are not talking about the same thing here 😅

My changes are not about ENABLING cloud storage uploads. They are just a prerequisite.

If you are writing to a file storage like a hard drive, it does not matter where you park the chunks. On an object storage it does, because you cannot incrementally write to an object storage; it only accepts full files as objects.

I gave it a second thought, and maybe we should always write the chunks to the temp storage, without the settings flag. I think that's the main reason why there is a temp dir, right? To contain temporary data.

What do you think?

@jerinpetergeorge (Collaborator) commented

Storing the chunks in a temporary location and moving the file to the target location after completion is a good idea. I think we should implement this first, before the S3 thing.
Apart from that, if someone is using S3 as their default storage system, they would expect the file to be uploaded to the S3 bucket. So we should implement a full-fledged S3 upload capability here.

@GitRon (Author) commented Dec 20, 2019

Perfect, I'll change my PR then.

@GitRon (Author) commented Dec 20, 2019

@jerinpetergeorge Ok, the behavior is now as follows:

If you don't define a storage, it uses my temporary storage. But you can override it via your settings variable if you like.

I like this solution, good that we talked about it 👍
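
In settings terms, that fallback could look roughly like this (a sketch only; apart from CHUNKED_UPLOAD_STORAGE_CLASS, the names are illustrative):

```python
# Illustrative sketch of the fallback described above, not the PR's exact code.
import tempfile

from django.conf import settings
from django.core.files.storage import FileSystemStorage

storage_class = getattr(settings, "CHUNKED_UPLOAD_STORAGE_CLASS", None)
if storage_class is not None:
    # The project explicitly configured a storage class: instantiate and use it.
    CHUNK_STORAGE = storage_class()
else:
    # Default: park the chunks in a cross-platform temporary directory.
    CHUNK_STORAGE = FileSystemStorage(location=tempfile.gettempdir())
```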

@jerinpetergeorge (Collaborator) commented Dec 21, 2019

Great! But this is only half of the solution. We should move the completed file to CHUNKED_UPLOAD_PATH when a user uses the default settings, shouldn't we? @GitRon
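
Something along these lines could be the second half (a hedged sketch; the helper name and path layout are assumptions, and CHUNKED_UPLOAD_PATH would supply the real target directory):

```python
# Hypothetical sketch: once all chunks are assembled in the local temp storage,
# move the completed file into the project's default storage (e.g. S3).
import os

from django.core.files import File
from django.core.files.storage import default_storage


def promote_completed_upload(temp_path, upload_id, filename):
    """Copy the finished file from the temp dir to the default storage and clean up."""
    # Stand-in for whatever CHUNKED_UPLOAD_PATH resolves to.
    target_name = os.path.join("chunked_uploads", upload_id, filename)
    with open(temp_path, "rb") as fh:
        saved_name = default_storage.save(target_name, File(fh))
    os.remove(temp_path)
    return saved_name
```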

@GitRon (Author) commented Jan 2, 2020

@jerinpetergeorge Sounds good. But doesn't it already work like this? I only collect the chunks in the temp storage.

@GitRon (Author) commented Jan 18, 2021

@jerinpetergeorge Somebody asked for my fix on this issue. What's your status?

I pointed out, after thinking about what has happened, that we might want to avoid renaming variables etc., to keep it stable and backwards compatible. What do you think? 😃

@jerinpetergeorge (Collaborator) commented

Sorry for the long delay from my side. Thanks for your effort @GitRon, and please give me a couple of days to recollect the project context ( 😲 ) and review the PR, as I have completely slipped away from this.

@jerinpetergeorge (Collaborator) commented

@GitRon Thanks for the PR. But it is hard for me to review it since I have slipped away from the context, as I already said. Also, I don't think I can make time to work on it in the near future.

So, I would recommend that you ask @juliomalegria to make you a collaborator of this repo, or fork the project and maintain it yourself.

@GitRon (Author) commented Jan 26, 2021

@jerinpetergeorge @juliomalegria Sure, if you give me permission, I'd fix the bug people were complaining about. Would you be able to release a new version at that time?

@jerinpetergeorge (Collaborator) commented

I won't be able to do that 😞 You can contact @juliomalegria via his personal mail (juliomalegria@gmail.com, taken from his personal site). If he doesn't respond within a week, you had better fork the project.

FYI: There is a place called Jazzband, and I think we could move this project there, which would help it get more exposure. There are some guidelines to make it work (the Jazzband guidelines); please have a look.

@GitRon

@GitRon (Author) commented Jan 27, 2021

@jerinpetergeorge I know, you were just in CC. But good idea with the personal email address! Thx!

And Jazzband is a good idea as well!

@GitRon (Author) commented Feb 4, 2021

@jerinpetergeorge I wrote him. We'll see what happens.

@cr0mbly commented Feb 18, 2021

Any luck with this? I echo the use case; having chunks uploaded directly to S3 would be preferable.

@GitRon (Author) commented Feb 19, 2021

@cr0mbly I am not sure this is possible or even desirable. S3 is not a file system but an object storage, so you'd expect the full file as an object, not a bunch of chunks.

Furthermore: what happens if something goes sideways during the upload? Then you have to clean up the leftover chunks.

IMHO the local temp approach is the cleaner solution. 😃

@cr0mbly commented Feb 21, 2021

> @cr0mbly I am not sure if this is possible and wanted. S3 is not a file system but an object storage. So you'd expect the full file as an object, not some bunch of chunks.

Yep, thanks for this. I just put in some work of my own to integrate S3 as the default storage and came up against the fact that we can't append to an S3 object. This is fine for small files, as you can just keep re-uploading the file on the backend, but with larger files, loading and re-uploading the full object in memory becomes less and less performant.

Just for anyone else looking at this with an autoscaling setup: we left it as temp storage and used EFS to maintain persistence of the filesystem during chunk upload.
