Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature: Add ahead of time compression of the static files for x86_64 #4390

Merged
merged 3 commits into from Oct 20, 2023

Conversation

stumpylog
Copy link
Member

@stumpylog stumpylog commented Oct 17, 2023

Proposed change

This PR adds compression of the static files, using whitenoise. The static files are compressed once, on x86_64, then used throughout.

I considered making it more configurable, but then I think it would be easy for a user to change this and not run collectstatic, leaving them with no or wrong static files in place. Or a "release" vs "development" build, but that proved to become complicated quickly.

Perhaps in the future GitHub will have native arm64 runners, which would allow this to be revisited and simplified. Some improvements to whitenoise could be built in as well.

In some testing, this reduced main.js to about 50% size when using gzip and to about 40% size when using Brotli (requires HTTPs).

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Other (please explain):

Checklist:

  • I have read & agree with the contributing guidelines.
  • If applicable, I have included testing coverage for new code in this PR, for backend and / or front-end changes.
  • If applicable, I have tested my code for new features & regressions on both mobile & desktop devices, using the latest version of major browsers.
  • If applicable, I have checked that all tests pass, see documentation.
  • I have run all pre-commit hooks, see documentation.
  • I have made corresponding changes to the documentation as needed.
  • I have checked my modifications for any breaking changes.

@paperless-ngx-secretary paperless-ngx-secretary bot added backend non-trivial Requires approval by several team members labels Oct 17, 2023
@github-actions github-actions bot added the enhancement New feature label Oct 17, 2023
@shamoon shamoon added this to the v2.0.0 milestone Oct 17, 2023
@codecov
Copy link

codecov bot commented Oct 17, 2023

Codecov Report

Merging #4390 (2ec080a) into dev (9880f9e) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##              dev    #4390   +/-   ##
=======================================
  Coverage   95.89%   95.90%           
=======================================
  Files         359      359           
  Lines       13706    13709    +3     
  Branches     1094     1094           
=======================================
+ Hits        13144    13147    +3     
  Misses        558      558           
  Partials        4        4           
Flag Coverage Δ
backend 94.53% <100.00%> (+<0.01%) ⬆️
frontend 97.50% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
src/paperless/settings.py 89.52% <100.00%> (+0.10%) ⬆️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@stumpylog stumpylog changed the title Feature: Add ahead of time compression of the static files Feature: Add ahead of time compression of the static files for x86_64 Oct 17, 2023
@stumpylog stumpylog marked this pull request as ready for review October 17, 2023 18:00
@stumpylog stumpylog requested a review from a team as a code owner October 17, 2023 18:00
@stumpylog stumpylog force-pushed the feature-compressed-static-files branch from d362780 to 55d9006 Compare October 17, 2023 18:00
@shamoon
Copy link
Member

shamoon commented Oct 17, 2023

This seems useful (and patching PRs into other OSS projects seems to be a theme lately...), but pardon my ignorance in trying to test this out:

  • No issues running on either x86_64 or arm64 machines I have.
  • On both e.g. main.js seems to be the same size, and seems to be served with brotli even when testing dev on localhost vs this branch either locally or via https, assuming that's what this Content-Encoding: br means? Am I missing something here? Again my understanding of this stuff fairly limited.

Also, just curious if this change requires the css map to be added or did you just not like seeing the error log =)? No problem with that obv, just curious

@stumpylog
Copy link
Member Author

stumpylog commented Oct 17, 2023

  • The CSS map was required. I'm not sure why, but collectstatic failed without it being there.
  • Do you have PAPERLESS_ENABLE_COMPRESSION set to anything? That would be doing some "on the fly" compression, though I still think Brotli should require an HTTPS connection.
    • I definitely saw the difference when doing no compression on the fly with just gzip, but I didn't get the Brotli compression until I used HTTPs. I can grab some more data later.

@shamoon
Copy link
Member

shamoon commented Oct 17, 2023

I dont have PAPERLESS_ENABLE_COMPRESSION explicitly set, no.

So like here's dev with PAPERLESS_ENABLE_COMPRESSION=0:
Screenshot 2023-10-17 at 12 11 04 PM
Screenshot 2023-10-17 at 12 11 12 PM

dev with PAPERLESS_ENABLE_COMPRESSION left off:
Screenshot 2023-10-17 at 12 13 26 PM
Screenshot 2023-10-17 at 12 13 21 PM

and feature-compressed-static-files:
Screenshot 2023-10-17 at 12 14 23 PM
Screenshot 2023-10-17 at 12 14 30 PM

Obviously, I believe that this works =) just was trying to due diligence to test out. Sorry for the hassle.

@stumpylog
Copy link
Member Author

Very curiously different than what I recall seeing. I'll have to look again when I can. I recall main.js coming in at around 1MB and never being compressed for me, when using localhost or a hostname connection. Through my domain, Traefik is handling compression, so it doesn't count.

Now I'm curious about this.

@shamoon
Copy link
Member

shamoon commented Oct 17, 2023

Yea my domain is behind CF and Traefik so at first I was like "oh duh, its those" but I dont understand who/what/how/why is doing it over localhost (not https either). This stuff is really opaque to me 🤔

@stumpylog
Copy link
Member Author

With dev, when I set PAPERLESS_ENABLE_COMPRESSION: 0, I get no compression of main.js and a size of ~1.8 MB:
image

It does look like Brotli is allowed for localhost, so that's actually a good thing for this PR. I'm not sure how your dev was still returning a compressed response when it was disabled. Cache maybe?

When I have feature-compressed-static-files and PAPERLESS_ENABLE_COMPRESSION: 0, main.js is coming in Brotli encoded with ~379 kb:
image

As far as i can tell, this is working as intended, and it's doing the compression ahead of time, rather than per request, so I think it will be helpful (at least for x86_64 at the moment)

Copy link
Member

@shamoon shamoon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blah, sorry my bad, I think just user error on my part.

Thanks for the PR

@stumpylog stumpylog changed the title Feature: Add ahead of time compression of the static files for x86_64 Feature: Add ahead of time compression of the static files Oct 19, 2023
@stumpylog stumpylog force-pushed the feature-compressed-static-files branch from 72f420a to 252806a Compare October 19, 2023 15:25
@stumpylog
Copy link
Member Author

Alright, this should be the final form. I ended up finding some other improvements with the image here, so the resulting image actually shrinks, despite adding more files (in the form of the compressed files).

So the new flow of data is something like this:

  • Frontend is built with Node 20
    • this happens only for x86_64, because it's not dependent on the architecture
  • Backend and built frontend are copied to the compression stage
    • this also happens only for x86_64, for the same reasons
    • a minimal set of Python dependencies are installed
    • the static files are collected and post processed to produce the file, the .gz and the .br files
  • the whole folder (built frontend, backend, processed static files) are copied to the final image
    • the chown on the COPY reduces the image size by ~190MB

@shamoon
Copy link
Member

shamoon commented Oct 19, 2023

Appreciate the explanation. I do mostly understand the flow and overall goal but code is still mostly Greek to me. I think at this point you've earned trust with the project to do as you see fit. For my part, can confirm working from an end-user perspective. Thanks stumpy

@stumpylog stumpylog force-pushed the feature-compressed-static-files branch from 6470cfe to 83a6799 Compare October 20, 2023 00:37
@stumpylog
Copy link
Member Author

Weird, not sure why this keeps getting stuck. That's at least twice now, at different times in the build. I may have to go back to the simpler method

@stumpylog stumpylog force-pushed the feature-compressed-static-files branch from 83a6799 to 2ec080a Compare October 20, 2023 15:17
@stumpylog stumpylog requested a review from a team as a code owner October 20, 2023 15:17
@stumpylog stumpylog changed the title Feature: Add ahead of time compression of the static files Feature: Add ahead of time compression of the static files for x86_64 Oct 20, 2023
@stumpylog
Copy link
Member Author

Alright, I ended up going back to the simple method which doesn't add much time at all to the build. Even with the attempts earlier, the ARM64 build seemed to be taking longer and getting stuck in weird ways. The 1hr+ to compress the files under emulation just isn't worth it. Our ARM64 users can still get the on-the-fly compression from the middleware.

Once this finishes building and I double check it on both platforms, I'll merge this one and be done

@stumpylog stumpylog merged commit d480e91 into dev Oct 20, 2023
53 checks passed
@stumpylog stumpylog deleted the feature-compressed-static-files branch October 20, 2023 23:22
Copy link
Contributor

This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new discussion or issue for related concerns.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 20, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
backend enhancement New feature non-trivial Requires approval by several team members
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants