all: Rework page store, add a dynacache to enable bigger data/content, and some general spring cleaning #11830

Closed
wants to merge 1 commit into from

@bep bep commented Dec 24, 2023

This PR is big. I could/should probably try to split it, but the pieces are in many ways related, and splitting it would be hard.

There are some breaking changes in this commit, see #11455.

Closes #11455
Closes #11549

What Changed

This fixes a set of bugs (see the issue list) and also pays down some technical debt accumulated over the years. We now finally build with Staticcheck enabled in the CI build.

The performance should be about the same for regular-sized Hugo sites, but it should perform and scale much better with larger data sets, as objects that use lots of memory (e.g. rendered Markdown, big JSON files read into maps with transform.Unmarshal etc.) will now get automatically garbage collected if needed. Performance on partial rebuilds when running the server in fast render mode should be the same, but the change detection should be much more accurate.

A summary list of new features:

  1. A new dependency tracker that covers (almost) all of Hugo's API and is used to do fine-grained partial rebuilds when running the server.
  2. A new and simpler tree document store which allows fast lookups and walking in all dimensions (e.g. language) concurrently.
  3. You can now configure an upper memory limit, allowing for much larger data sets and/or running on lower-specced PCs.

Memory Limit

Hugo will, by default, set aside a quarter of the total system memory, but you can set the limit via the OS environment variable HUGO_MEMORYLIMIT (in gigabytes). This is backed by a partitioned LRU cache used throughout Hugo; the cache gets dynamically resized in low-memory situations, allowing Go's garbage collector to free the memory.
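
To make the idea concrete, here is a minimal sketch, not Hugo's actual implementation. It assumes the total system memory is passed in by the caller, and it uses a single unpartitioned LRU that is not safe for concurrent use. The names memoryLimitBytes, lru, and Resize are hypothetical and chosen for illustration only.

```go
package main

import (
	"container/list"
	"fmt"
	"os"
	"strconv"
)

// memoryLimitBytes returns the limit from HUGO_MEMORYLIMIT (in gigabytes) or,
// if the variable is unset or invalid, a quarter of totalSystemBytes.
// Hypothetical helper; how Hugo reads the system total is not shown here.
func memoryLimitBytes(totalSystemBytes uint64) uint64 {
	if v := os.Getenv("HUGO_MEMORYLIMIT"); v != "" {
		if gb, err := strconv.ParseFloat(v, 64); err == nil && gb > 0 {
			return uint64(gb * (1 << 30))
		}
	}
	return totalSystemBytes / 4
}

type entry struct {
	key   string
	value any
}

// lru is a small least-recently-used cache whose capacity can be changed.
type lru struct {
	maxEntries int
	order      *list.List // front = most recently used
	items      map[string]*list.Element
}

func newLRU(maxEntries int) *lru {
	return &lru{maxEntries: maxEntries, order: list.New(), items: make(map[string]*list.Element)}
}

// Get returns the value for key and marks it as recently used.
func (c *lru) Get(key string) (any, bool) {
	if el, ok := c.items[key]; ok {
		c.order.MoveToFront(el)
		return el.Value.(*entry).value, true
	}
	return nil, false
}

// Set stores value under key, evicting old entries if over capacity.
func (c *lru) Set(key string, value any) {
	if el, ok := c.items[key]; ok {
		el.Value.(*entry).value = value
		c.order.MoveToFront(el)
		return
	}
	c.items[key] = c.order.PushFront(&entry{key: key, value: value})
	c.evict()
}

// Resize changes the capacity. Shrinking drops the least recently used
// entries so the garbage collector can reclaim them.
func (c *lru) Resize(maxEntries int) {
	c.maxEntries = maxEntries
	c.evict()
}

func (c *lru) evict() {
	for c.order.Len() > c.maxEntries {
		el := c.order.Back()
		c.order.Remove(el)
		delete(c.items, el.Value.(*entry).key)
	}
}

func (c *lru) Len() int { return c.order.Len() }

func main() {
	// Assume a machine with 16 GB of memory for the example.
	fmt.Println("limit in bytes:", memoryLimitBytes(16<<30))

	c := newLRU(3)
	c.Set("a", 1)
	c.Set("b", 2)
	c.Set("c", 3)
	c.Get("a")  // "a" becomes the most recently used entry
	c.Resize(2) // simulate a low-memory situation; "b" is evicted
	fmt.Println("entries after resize:", c.Len())
}
```

Shrinking the capacity only removes references; the memory itself is freed when Go's garbage collector next runs.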

Dependency Tracker

Hugo has had a rule-based, coarse-grained approach to server rebuilds that has mostly worked pretty well, but there have been some surprises (e.g. stale content). This is now revamped with a new dependency tracker that can quickly calculate the delta given a changed resource (e.g. a content file, template, JS file etc.). This handles transitive relations, e.g. $page -> js.Build -> JS import, or $page1.Content -> render hook -> site.GetPage -> $page2.Title, or $page1.Content -> shortcode -> partial -> site.RegularPages -> $page2.Content -> shortcode ..., and should also handle changes to aggregated values (e.g. site.Lastmod) effectively.

This covers all of Hugo's API with 2 known exceptions (a list that may not be fully exhaustive):

  1. Changes to files loaded with the template func os.ReadFile may not be handled correctly. We recommend loading resources with resources.Get.
  2. Changes to Hugo objects (e.g. Page) passed in the template context to lang.Translate may not be detected correctly. We recommend having simple i18n templates without too much data context passed in other than simple types such as strings and numbers.
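
The transitive part of the tracker can be illustrated with a small sketch. This is an assumption about the general technique (a reverse dependency graph walked breadth-first), not the code in this PR. The graph type and the addDependency and affected names are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// graph maps a resource to the things that directly depend on it.
type graph map[string][]string

// addDependency records that dependent uses dependency.
func (g graph) addDependency(dependent, dependency string) {
	g[dependency] = append(g[dependency], dependent)
}

// affected returns everything that transitively depends on changed,
// sorted for stable output. The changed resource itself is not included.
func (g graph) affected(changed string) []string {
	seen := map[string]bool{changed: true}
	queue := []string{changed}
	var out []string
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		for _, dep := range g[cur] {
			if !seen[dep] {
				seen[dep] = true
				out = append(out, dep)
				queue = append(queue, dep)
			}
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	g := graph{}
	// Simplified chain: page1.Content -> shortcode -> partial -> page2.Content
	g.addDependency("page1.Content", "shortcode")
	g.addDependency("shortcode", "partial")
	g.addDependency("partial", "page2.Content")
	// An unrelated page that must not be rebuilt.
	g.addDependency("page3.Content", "other")

	fmt.Println(g.affected("page2.Content"))
	// prints: [page1.Content partial shortcode]
}
```

The seen map keeps the walk from looping when two resources depend on each other.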

Document Store

Previously, somewhat simplified, we split the document store (where we store pages and resources) into one tree per language. This worked pretty well, but the structure made some operations harder than they needed to be. We have now restructured it into one Radix tree for all languages. Internally the language is considered to be a dimension of that tree, and the tree can be viewed in all dimensions concurrently. This makes some language-related operations simpler (e.g. finding translations is just a slice range), but the idea is that it should also be relatively inexpensive to add more dimensions if needed (e.g. role).
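
A rough sketch of the "language as a dimension" idea follows. It is an illustration under stated assumptions, not the store in this PR: a plain map stands in for the Radix tree, and the store and page types and the insert, get, and translations methods are hypothetical.

```go
package main

import "fmt"

type page struct {
	Lang  string
	Title string
}

// store keeps one node per path; each node holds one slot per language.
// A map stands in for the Radix tree to keep the sketch short.
type store struct {
	index map[string]int     // language -> slot in the dimension
	nodes map[string][]*page // path -> pages indexed by language slot
}

func newStore(langs ...string) *store {
	s := &store{index: make(map[string]int), nodes: make(map[string][]*page)}
	for i, l := range langs {
		s.index[l] = i
	}
	return s
}

// insert places p in the slot for its language; it reports false for an
// unknown language.
func (s *store) insert(path string, p *page) bool {
	i, ok := s.index[p.Lang]
	if !ok {
		return false
	}
	if s.nodes[path] == nil {
		s.nodes[path] = make([]*page, len(s.index))
	}
	s.nodes[path][i] = p
	return true
}

// get returns the page at path for one language, or nil if there is none.
func (s *store) get(path, lang string) *page {
	i, ok := s.index[lang]
	if !ok || s.nodes[path] == nil {
		return nil
	}
	return s.nodes[path][i]
}

// translations ranges over the language slots of a single node.
func (s *store) translations(path string) []*page {
	var out []*page
	for _, p := range s.nodes[path] {
		if p != nil {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	s := newStore("en", "nb")
	s.insert("/blog/post", &page{Lang: "en", Title: "Hello"})
	s.insert("/blog/post", &page{Lang: "nb", Title: "Hei"})
	for _, p := range s.translations("/blog/post") {
		fmt.Println(p.Lang, p.Title)
	}
}
```

Because all languages share one node per path, adding another dimension would, in this sketch, mean widening the per-node slots rather than maintaining another tree.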

Fixes

Fixes #10104
Fixes #10380
Fixes #10694
Fixes #11439
Fixes #11453
Fixes #11457
Fixes #11466
Fixes #11540
Fixes #11551
Fixes #11556
Fixes #11654
Fixes #11661
Fixes #11663
Fixes #11840
Fixes #11664
Fixes #11669
Fixes #11671
Fixes #11807
Fixes #11808
Fixes #11809
Fixes #11815
Fixes #7425
Fixes #7436
Fixes #7437
Fixes #7544
Fixes #7882
Fixes #8307
Fixes #8498
Fixes #8927
Fixes #9192
Fixes #9324
Fixes #9343

@bep bep force-pushed the feat/mem1 branch 23 times, most recently from 1930e02 to cc73dc1 Compare December 30, 2023 17:21
@bep bep force-pushed the feat/mem1 branch 7 times, most recently from af1256c to 2c9c0f7 Compare January 6, 2024 10:48
@bep bep force-pushed the feat/mem1 branch 6 times, most recently from 283b298 to 5f23443 Compare January 12, 2024 17:30
@bep bep force-pushed the feat/mem1 branch 9 times, most recently from 5129c30 to 2e2966b Compare January 15, 2024 17:52
@bep bep marked this pull request as ready for review January 15, 2024 19:03

bep commented Jan 15, 2024

@jmooring I have done a fair amount of manual testing of this (will spin up my Windows VM tomorrow), but I would appreciate it if you could take it for a spin on some of your sites and let me know how it works. I fully expect there to be some issues, but I'm pretty sure that this should fix much more than it breaks ...

@bep bep force-pushed the feat/mem1 branch 2 times, most recently from 90a03ec to a7a21ff Compare January 16, 2024 11:08

bep commented Jan 17, 2024

Replaced by #11894

@bep bep closed this Jan 17, 2024