Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize build speed #144

Closed
4 of 7 tasks
appieschot opened this issue May 3, 2022 · 23 comments · Fixed by #621
Closed
4 of 7 tasks

Optimize build speed #144

appieschot opened this issue May 3, 2022 · 23 comments · Fixed by #621
Assignees
Labels
enhancement New feature or request

Comments

@appieschot
Copy link
Member

appieschot commented May 3, 2022

Our current implementation generates a Small, Medium and Large image size for each image in our blog. We should drop one of these sizes to improve the total blog size (1.7gb current release).

image

@appieschot appieschot added the enhancement New feature or request label May 3, 2022
@appieschot appieschot self-assigned this May 3, 2022
@appieschot
Copy link
Member Author

  • remark for self check file size naming (medium and large seem to be used interchangeable)

appieschot pushed a commit to appieschot/pnp-blog that referenced this issue May 6, 2022
appieschot pushed a commit to appieschot/pnp-blog that referenced this issue May 6, 2022
appieschot added a commit that referenced this issue May 6, 2022
Removed public folder and updated gitignore. Related to #144
appieschot added a commit to appieschot/pnp-blog that referenced this issue May 11, 2022
appieschot added a commit to appieschot/pnp-blog that referenced this issue May 11, 2022
appieschot added a commit to appieschot/pnp-blog that referenced this issue May 11, 2022
Removed public folder and updated gitignore. Related to pnp#144
@appieschot
Copy link
Member Author

Might not be needed if we use a dept of 1 to do a shallow clone (thanks to a tip from @waldekmastykarz). Will investigate after the holidays

@LuiseFreese
Copy link
Collaborator

Hey @appieschot you still willing to help on this one? Build times are painfully long 😭

appieschot added a commit to appieschot/pnp-blog that referenced this issue Sep 7, 2022
@appieschot
Copy link
Member Author

We have two things running:

  • Deploy to GitHub Pages; triggered on each PR as it is linked to our main branch
  • pages build and deployment: triggered by the github-pages[bot]

In both those workflows different things happen:

  • Our deploy to GH pages takes roughly 5 minutes to checkout all content (just improved with Implemented shallow build. Solves #144 #558 to 2:30 so it's something 🤷)
  • Our Hugo build takes around 12 minutes, we can speed that up by removing one of the image sizes as the majority of the hugo time is spend on those images' sizes. Will have a look at that (requires quite some work as we use those sizes everywhere and we do not want to screw up our look & feel)
  • We should also add the latest version of Hugo and see if the problems 0.101.0 had are solved (and perhaps improves the speed)
  • Zipping the output takes time (since the whole site with all images is 3.3gb that takes time; part of it will be solved by removing 1 of the image sizes. But we also have a few pptx files in our content that hit the 60mb per file..), they total 2gb of file size of our content. I feel that we can majorly improve if we would move those pptx files out of the content. @LuiseFreese how do you feel about that? Could we store the PPTX on OneDrive or SharePoint and generate links in our content? Quick guestimate is we can shave of 66% of our upload time just by deleting those files ..

Once those steps are done, we start the actual deployment if a PR is approved

  • it takes 23 minutes to upload those 3.3GB, assuming we remove the PPTX we can shave of 66% of that as well (still a guestimate as larger files upload a tad quicker then smaller files)

I would opt to fix the pptx, or convert all PPTX to PDF to drastically reduce the size. Happy to hear other input! 🦾

LuiseFreese pushed a commit that referenced this issue Sep 8, 2022
Implemented shalow build. Solves #144
@LuiseFreese
Copy link
Collaborator

hey there @appieschot , first - THANKS for your work - you rock 🎸

  • Please do feel free to run a test with the latest Hugo version - I assume we can switch back if it breaks again.
  • Maybe its a good idea to first try to convert all pptx to PDFs or to reduce image size in the pptx with this compress pictures feature in PowerPoint
    image

Sharing the PPTX via OneDrive/SharePoint would be the last resort I think

@appieschot
Copy link
Member Author

@LuiseFreese ill have a look if we can automate the pptx to pdf with some form of script; reducing image size feels way more effort as we would need to go to the master slide and remove un-used master slides as well. Printing to PDF most likely is quicker. Will report back (and updated initial issue with a checklist to keep track of status)

@appieschot
Copy link
Member Author

@LuiseFreese started working on resizing some of the largers pptx but they are not referenced in the content so far... Any other place they might be linked from?

appieschot added a commit to appieschot/pnp-blog that referenced this issue Sep 9, 2022
Swapped pptx for pdf; saved 20% of total size
LuiseFreese pushed a commit that referenced this issue Sep 14, 2022
Swapped pptx for pdf; saved 20% of total size
@LuiseFreese
Copy link
Collaborator

How will we proceed with this one? I can instruct Andrew Benson (who writes the community call posts) to save the PPTX as PDF and upload that.

@appieschot
Copy link
Member Author

Sounds like a great plan ;-). I'll do a few more PR's to do the swap out the other PPT's as well and then start with the images. Lets see if we can do a few iterations to improve!

@appieschot appieschot changed the title Optimize build speed by dropping one of the image sizes Optimize build speed Sep 14, 2022
appieschot added a commit that referenced this issue Sep 14, 2022
* [chore] cleared out m365 platform pptx
* [chore] saved all pptx > 10mb to pdf
@appieschot
Copy link
Member Author

image

Resizing does bring it down a bit ;-). Not there yet but the first step is taken 🦾

appieschot added a commit to appieschot/pnp-blog that referenced this issue Sep 14, 2022
appieschot added a commit to appieschot/pnp-blog that referenced this issue Sep 14, 2022
@appieschot
Copy link
Member Author

Ran a version with the --templateMetrics online (image in the initial issue), but results online differ quite a bit:

image

Will investigate if we can speed up some of those things by stripping some stuff.

LuiseFreese pushed a commit that referenced this issue Sep 26, 2022
* Updated hugo version. Partially solves #144

* [chore] Removed duplicate section check

* [chore] build improvements
- decreased image quality
- removed smalles image size to reduce total images
- cached set of partials
appieschot added a commit that referenced this issue Sep 26, 2022
appieschot added a commit that referenced this issue Sep 26, 2022
Revert "Updated hugo version. Partially solves #144 (#576)"

This reverts commit b94f820.
@andrewconnell
Copy link
Collaborator

Hey @appieschot, @LuiseFreese pinged me and asked if I wanted to assist. I've been doing a bunch of work on my Hugo sites lately & heavily leveraging the image processing stuff (which negatively impacted build times, but not too bad).

Interested in another set of eyes?

@appieschot
Copy link
Member Author

@andrewconnell would love to get some input. Agree that images are quite slow so a few observations I made so far:

  • Seems like images started breaking after a certain Hugo version
  • Feels like images are not cached in our build pipeline so everything gets generated evertyime we build
  • the single.html checks for a whole lot of sections and that slows the build quite a lot on our pipeline

So more than happy to learn what you think would be the best plan of attack :)

@andrewconnell
Copy link
Collaborator

Cool... couple of questions:

  • Is the goal to (a) optimize for build speed (b) at the expense of making the repo bigger?
  • Are there any options for externally hosting some larger assets that don't need to be processed, like a slide deck or ZIP?
  • What do you mean by images breaking? Two of my sites are on Hugo v0.104.x, both use the image processing capability and are working fine (making sure images in the content area are no wider than Xpx & dynamically generating the open graph images). Got a pointer to something specific?

Cards on the table: I have one Hugo site that has ~2k content pages & makes heavy use of Hugo's image processing. I favor a smaller repo over longer build times because my build times for deployment aren't a concern (build+deploy = ~16 minutes)... locally it takes MUCH less.

@appieschot
Copy link
Member Author

appieschot commented Sep 28, 2022

  • Is the goal to (a) optimize for build speed (b) at the expense of making the repo bigger?

I favor a smaller repo as the current upload of 3gb is quite long, but would assume that we could cache the results of the images to prevent the build process issues we have now; Hugo just stops working in the latest version after like 12m

  • Are there any options for externally hosting some larger assets that don't need to be processed, like a slide deck or ZIP?

For me those are on the table; we recently saved all >10mb slide decks to pdf to save over 1gb of repo storage. If we would offload some of that content we need to figure out where and involve some other ppl to get that working but I am all for that.

  • What do you mean by images breaking? Two of my sites are on Hugo v0.104.x, both use the image processing capability and are working fine (making sure images in the content area are no wider than Xpx & dynamically generating the open graph images). Got a pointer to something specific?

We are using version 0.100.2, anything after that is currently not building with timeouts. See some thread here: #424. I basicaly gave up figuring out why the latest hugo version would break. My feeling is that is has to do with _default/single.html and the section selection we do but I have not investigated further.

Cards on the table: I have one Hugo site that has ~2k content pages & makes heavy use of Hugo's image processing. I favor a smaller repo over longer build times because my build times for deployment aren't a concern (build+deploy = ~16 minutes)... locally it takes MUCH less.

Agree with this, but given the 30 to 45 minutes we have seen with the current release pipeline we would love to bring that down a bit. Since we have around the same amount of content pages in the blog I would be happy if we can hit those 16 minutes ish :)

@andrewconnell
Copy link
Collaborator

Cool... thanks for the detailed response. Lemme take a few stabs and see if I can't figure something out.

@andrewconnell
Copy link
Collaborator

How much flexibility do I/we have in making changes? Because... the first thing I looked at are huge files in the repo...

... and...

Not only do most of the archetypes contain an 8MB together-mode.gif sample animation, but those animated GIFs, which some are HUGE (3 over 20MB, 6 over 10MB), account for over 815MB in the repo.

... and every single one is processed using Hugo's image processing.

There are over 185 of these files in the repo, NOT including those that have been cached in the image processing. My question: are these really necessary? They are clearly having a significant impact on the build time.

I haven't gone through the trouble seeing what kind of an impact it will have, because it's going to require making a lot of edits to remove those images from content pages so the builds don't throw errors... but if we're open to purging these, I can see what kind of an impact it would have on the repo.

Personally, I can't see the value in keeping these around. I get using them in social media sharing, but in the content page?

Thoughts?

@appieschot
Copy link
Member Author

I won't mind purging them, @LuiseFreese how do you feel?

@LuiseFreese
Copy link
Collaborator

LuiseFreese commented Sep 28, 2022 via email

@andrewconnell
Copy link
Collaborator

Correction... it does NOT look like those files are being image processed... they're just contributing to the huge size of the repo. But it's not just the GIFs... there are a lot of large files. The last PPTX was 86MB. Moving these large files will impact the build times but not as much as I was implying.

@LuiseFreese said:

  1. Educate Andrew B. - who publishes the community call blog posts on how to reduce the size of a gif
  2. Keep the gif for 2 weeks, then replace by a link to the SoMe post (I can easily do that as a recurring clean up task)

If you're trying to reduce the build times AND the size of the repo, this won't help. You're adding a large file to the repo, then removing it, but it's still in history. So, anyone who clones the repo, unless if they do a shallow clone, will get the history including those large files.

When I cloned the repo, it was 6GB... which is quite large for a content site.

The bigger issue with these animated GIF's is they're hurting SEO for the page. Google penalizes these pages for a bad mobile practice because it's impossible for a user to avoid downloading & rendering an animated GIF. That's unlike the PPTX files which the user must manually download.

So this really comes down to two things... what's more important / priority on the site?

  1. SEO / reach of the page
  2. build time for the site
  3. the optimized size of the repo

IMHO, it's that order order. So... that means removing these 100% from the site (because the value of those pages is in the first period after publication, 2wks as you say @LuiseFreese). That makes the other two priorities moot.

IMHO, these should only be for social media marketing of the meeting. At most, include just a still frame, not an animated GIF, of the together mode.

@LuiseFreese
Copy link
Collaborator

Had a chat with @VesaJuvonen ...he agrees that we should optimize, but wonders, if there is really no way to reference the raw gifs and exclude them?

If there is no way, we will add a still image of the togethermode into the blog and add a link to the twitter/Linkedin post that shows the animated gif.

@andrewconnell
Copy link
Collaborator

@LuiseFreese said:

if there is really no way to reference the raw gifs and exclude them?

Possibly with some CSS trickery, you could keep the image from being loaded on mobile, but when Google indexes the page, it will see the gif and add strikes against the page. I use a service that crawls my sites weekly. I used to use animated GIF's for a few demos, until I realized how much of a negative impact it was having on the SEO of the page.

So... referencing my previous comment, WRT (1), I think these animated GIF's are more negative than positive on the site.

And that doesn't even factor in the issues WRT (2) & (3)... from the recent builds, just the deployment of the rendered site content takes 5m... so slashing these big files down will have a substantial impact. In the worst case, it's going to cut 20% of the total deployment size.

FWIW - the way I deal with this on my content sites is to put big files (ZIP's, animated GIF's, etc.) in a separate location (https://cdn.[andrewconnell.com|voitanos.io]) which is an Azure storage blob fronted by an Azure CDN with a custom domain. If I want to show a demo using an animated GIF, I have a sill frame and link to a popup that shows the animation. But this way, the user has to request the animation with manual action (addressing point (1)) and it's served up NOT from my site content & stored in my codebase, rather it's externalized (addressing points (2) & (3)).

I understand this thread to say "remove them", correct? I'll proceed with that & optimizing a lot more of the big files.

@LuiseFreese
Copy link
Collaborator

I understand this thread to say "remove them", correct? I'll proceed with that & optimizing a lot more of the big files.

yes.

andrewconnell added a commit to andrewconnell/blog that referenced this issue Oct 6, 2022
- remove image processing for unused images
  - current theme only supports max content width of ~675px, soooo...
  - removed processing for images never used (800px, 1200px, 1500px)
- refactored logic to determine featured image (ie: thumbnail)
  for to standardized partial (**get-featured-image.html**)
- improved responsive images
  - replaced prev 4x img processing (see above) w/ 2 default images
    (mobile/desktop) with ideal optimization (webp)
  - added fallback image for old browsers (jpeg)
  - replaced image reference with simplified `<picture>` element
- theme code formatting
- changed single post rendering to use `partialCache` for widgets
  - widgets (categories/tags/recent posts/social) aren't page unique,
    so switch partial to cache so only gen once to optimize build
  - updated `params.toml` with warning about personalized widget usage
- references pnp#144
andrewconnell added a commit to andrewconnell/blog that referenced this issue Oct 6, 2022
- switch to partial cache for page header
  - global for all pages so no need to regen for each page
- references pnp#144
LuiseFreese pushed a commit that referenced this issue Oct 7, 2022
* 🎨🐞🧼📦 optimize image processing for single posts

- remove image processing for unused images
  - current theme only supports max content width of ~675px, soooo...
  - removed processing for images never used (800px, 1200px, 1500px)
- refactored logic to determine featured image (ie: thumbnail)
  for to standardized partial (**get-featured-image.html**)
- improved responsive images
  - replaced prev 4x img processing (see above) w/ 2 default images
    (mobile/desktop) with ideal optimization (webp)
  - added fallback image for old browsers (jpeg)
  - replaced image reference with simplified `<picture>` element
- theme code formatting
- changed single post rendering to use `partialCache` for widgets
  - widgets (categories/tags/recent posts/social) aren't page unique,
    so switch partial to cache so only gen once to optimize build
  - updated `params.toml` with warning about personalized widget usage
- references #144

* 🐞🎨 optimize list builds

- switch to partial cache for page header
  - global for all pages so no need to regen for each page
- references #144

* 🎨🧼 improve code readability

- inconsistent line breaks & intendetation fixed

* 🎨 refactor resposive image generation to partial

- simplify code by refactoring resposive images

* 🎨 use reposive img partial in image render hook

- change the image rendering hook for content pages to use the new
  reusable reposive rendering hook

* 🎨 add Fill img render support for 'responsive-image'

- update partial `responsive-image` to support `Fill`
  in addition to `Resize` Hugo image processing method
- NOTE: see code comment in header of `responsive-image` partial
  for docs & usage guidance

* 🐞 update image render hook path resolution logic

- previously, image path resolution only worked if content article
  referenceed image as `images/file.ext` from content page; but if
  content page used the legitimate `./images/file.ext`, file wouldn't
  resolve
- result: responsive image resizing & processing wouldn't worked
  likely resulting in large web-unfriendly images on site
- this fix makes a 2nd attempt to strip the leading `./` from the path
  if the image wasn't previously resolved

* 📚 remove debug message & add code comments

* 📚 ensure go comments don't trigger linewrap in `srcset`

* 🎨 change recent-posts to use new image rendering

- change recent posts in margin to use new image rendering partial

* 🎨 change homepage to use new image rendering hooks

* 🎨 if orig image < resize option, use it

- previously, images were always resized, forcing some to get
  up-scaled
- now, if the original image width <= target resize size, the image
  processor uses that size
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants