
perf: Introduce threading to improve upload speed #46

Closed
wants to merge 1 commit into from

Conversation

@ziafazal (Collaborator) commented Oct 3, 2023

This PR introduces threading to handle file extraction and upload concurrently, which should improve the upload speed of SCORM packages and fix #42.
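As a rough illustration of the approach (not the XBlock's actual code: `upload_zip_concurrently` and the `save_file` callback are illustrative stand-ins for the storage backend's save call, and the setting name mirrors the `THREADPOOLEXECUTOR_MAX_WORKERS` knob in the diff):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from io import BytesIO
from zipfile import ZipFile

def upload_zip_concurrently(zip_path, save_file, max_workers=10):
    """Extract each member of a SCORM zip and upload it via save_file.

    save_file(name, fileobj) is a stand-in for the storage backend's
    save call; the function and parameter names are illustrative only.
    """
    # Read each member in the main thread (ZipFile handles are not
    # safe to share across threads), then upload the bytes in parallel.
    with ZipFile(zip_path) as zf:
        names = [n for n in zf.namelist() if not n.endswith("/")]
        payloads = {name: zf.read(name) for name in names}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(save_file, name, BytesIO(data))
            for name, data in payloads.items()
        ]
        for future in as_completed(futures):
            future.result()  # re-raise the first upload error, if any
    return names
```

With this shape, extraction stays sequential while the slow part (the per-file upload to S3) runs on the pool.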

@ziafazal ziafazal requested a review from regisb October 3, 2023 10:20
    else:
        self.zip_root_path = root_path

    max_workers = self.xblock_settings.get("THREADPOOLEXECUTOR_MAX_WORKERS", 10)
@regisb (Contributor):
Is unzipping each file one by one the part that is causing very slow response times?

I'd like to avoid resorting to multithreading, if possible. It's a little too complex for my taste. Could we unzip locally to a temporary directory before uploading the folder in one go?
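The temporary-directory alternative might look like the sketch below. Note that with Django's `Storage` API the final upload would still happen file by file, since there is no bulk "copy a directory" call; `save_file` is again a hypothetical stand-in for the backend's save method:

```python
import os
import tempfile
import zipfile

def extract_then_upload(zip_path, save_file):
    """Unzip everything into a scratch directory first, then walk the
    tree and hand each extracted file to the storage backend."""
    with tempfile.TemporaryDirectory() as tmpdir:
        with zipfile.ZipFile(zip_path) as zf:
            zf.extractall(tmpdir)
        for root, _dirs, files in os.walk(tmpdir):
            for fname in files:
                path = os.path.join(root, fname)
                # Key each upload by its path relative to the zip root.
                rel = os.path.relpath(path, tmpdir)
                with open(path, "rb") as fobj:
                    save_file(rel, fobj)
```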

@ziafazal (Collaborator, author) commented Oct 4, 2023:

Yes, the issue reporter mentioned slow upload speed over S3. While uploading a SCORM package we extract it and then upload/save each file one by one. Extracting (unzipping) the files does not appear to take much time, because the reporter said that speed on local storage is fine, which points to the upload to S3 as the cause of the slowness. These changes introduce parallel uploads to S3 to reduce upload time.

@regisb (Contributor) commented Oct 4, 2023:

Did you manage to reproduce the issue? I understand it would be a bit tricky to set up an S3 bucket just for that purpose, but I would like to be sure that we're addressing the right thing here. @dyudyunov, are you able to test this fix?

@dyudyunov:

Hi and thanks for mentioning me!

I'll try the fix ASAP (most probably next week).

@ziafazal (Collaborator, author):

@dyudyunov have you been able to test it?

@dyudyunov:

Upload time was close to 30 seconds according to the logs.

@regisb (Contributor):

Yay! Now I'm wondering whether we can find a solution that does not involve multithreading. Is there a native storage API that would support copying an entire directory, instead of copying data file by file?

@ziafazal (Collaborator, author):

It appears that django-storages>=1.14, which is used in the Quince release, uses threading to optimize uploading files to S3.

@regisb (Contributor):

So, do we still have to implement threading ourselves or is it handled natively by django-storages?

@ziafazal (Collaborator, author):

As discussed, the plan is to implement it ourselves for pre-Quince versions; in Quince we should remove our implementation and rely on the multithreading support of django-storages.
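Such a version gate could be sketched as below; `has_threaded_uploads` is a hypothetical helper name, with the 1.14 threshold taken from the discussion:

```python
def has_threaded_uploads(storages_version):
    """Return True when the given django-storages version string
    (e.g. "1.14.2") already parallelizes S3 uploads, so the XBlock
    could skip its own ThreadPoolExecutor. Assumes at least a
    "major.minor" version string."""
    major, minor = (int(part) for part in storages_version.split(".")[:2])
    return (major, minor) >= (1, 14)
```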

Commits:
- Introduce threading to concurrently handle file extract and upload
- chore: added TODO comment
@ziafazal (Collaborator, author) commented Nov 7, 2023:

@regisb could you please take a look at this one?

@Abdul-Muqadim-Arbisoft:
I recently uploaded a SCORM package, sized at 6.2 MB, and it took approximately 67 seconds to complete. My network speeds at the time were as follows: download speed at 14.5 Mbps and upload speed at 18.5 Mbps.

@regisb (Contributor) commented Feb 13, 2024:

@Abdul-Muqadim-Arbisoft thanks for the feedback. Which version of Tutor did you use? Did you also try this PR?

@Abdul-Muqadim-Arbisoft:

> @Abdul-Muqadim-Arbisoft thanks for the feedback. Which version of Tutor did you use? Did you also try this PR?

I used Tutor Quince, and the stats I provided above are for this PR.

@ziafazal (Collaborator, author):

@Abdul-Muqadim-Arbisoft this issue was reported with Tutor 16 or a prior version, so you have to test it with Tutor 16 or earlier.

@Abdul-Muqadim2000 commented Feb 20, 2024:

> @Abdul-Muqadim-Arbisoft this issue was reported with Tutor 16 or a prior version, so you have to test it with Tutor 16 or earlier.

@ziafazal I will test it today with tutor 16

@Abdul-Muqadim-Arbisoft commented Feb 27, 2024:

I tested it on Tutor 16.1.8 with S3 storage configured: uploading a 6.2 MB SCORM package took approximately 7 minutes 30 seconds. My network speeds at the time were 30 Mbps down and 19.3 Mbps up.

@DawoudSheraz:
Closing this PR as 1) the issue is not present on Quince, and 2) the changes/fix do not work entirely as expected.

Labels: none
Status: Won't fix
Successfully merging this pull request may close this issue:

- Abnormal S3 file upload speed of 30s/MB is causing timeouts
6 participants