Incremental regeneration #380

Closed
zanshin opened this Issue Aug 7, 2011 · 51 comments
@zanshin

Can some form of incremental regeneration (à la graysky@39ae8c7) be added to the core of Jekyll? This would greatly enhance the tool for sites (like mine) that have nearly 1800 entries and counting and take upwards of 5 minutes to generate.

@belkadan

I have a different implementation at https://github.com/belkadan/jekyll, too. It's helped plenty.

@sindresorhus

👍 This would be immensely useful.

@stereobooster

connected #118

@igrigorik

Any reason why this can't be merged? It would be rather useful; I'm stuck in the same wait loop here.

@matthiasbeyer

I would love it to be merged, too! My machine is not as high-end as yours, and with 350+ posts it takes up to 4 minutes right now.

I don't know if it is a plugin or Jekyll itself, but every user would benefit from a performance improvement!

@parkr
Jekyll member

We're working on a fix for this. Soon :)

@parkr
Jekyll member

As @tombell would point out, we need to regenerate the entire site every time anything changes – we just can't know how pages depend upon each other.

I'll wait to close this until I hear from @mojombo, though.

@parkr
Jekyll member

Talked to @mojombo – we're going to work on a solution for this that regenerates changed files and all the files that "depend" on that file. Not until after 1.0 though!

@jwebcat

@parkr any news on this? I would love this feature as well.

@mattr-
Jekyll member
@jwebcat

@mattr- I am new to Ruby. If I patch jekyll.rb with a diff from either repo above, will it break my install of Jekyll 1.0.0.beta2? I apologize if this is a dumb question. Thank you for the response 👍

@mattr-
Jekyll member
@caiogondim

That would really help a lot =)

@AlexanderEkdahl

Could you create a new topic branch for this feature? Opening it up for discussion would let others contribute and help. It would indeed be an awesome feature to have!

@parkr
Jekyll member

We're focused on shipping 1.0 at this point so not quite yet. Soon!

@parkr
Jekyll member

We don't even know if this is possible in practical terms. We have a plan, but we need to see how that plan maps to reality before we can really think about doing this.

@Jack000

+1. I wouldn't mind managing the dependency tree myself; I just need a way to generate a single post.

@maul-esel

@Jack000: 👍 that's what I thought. There are so many possible dependencies that they can't all be managed by Jekyll; instead, the user should take care of them.

@csaunders

So this looks like a good place to bring up the conversation I was having with @mattr- about caching/performance improvements.

I'm not sure if Liquid provides this functionality yet, but would it be useful to have something in Liquid that tells you a template's dependencies? We could build up a tree of dependencies, plus a lookup table of those files with some kind of file signature, which would let us quickly check whether a file needs to be regenerated.

@Jack000

I'm not sure Liquid is the right place to put the dependency tree. This seems like something that should go in a plugin.

As I understand it, the speed problem is caused by writing rather than reading. I can't speak to all use cases, but in my case what I need is something in Jekyll that allows writing a single post (like with the limit_posts flag) while still reading all posts in the plugin system. That way I could add a single new post and still populate related posts etc. in the sidebar using a plugin.

For me, progressive generation is the biggest issue with Jekyll. I operate a medium-sized site and it's taking 15 minutes to generate, which is becoming rather unsustainable.
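Jack000's "read every post, write only the new one" idea can be sketched in plain Ruby. The `Post` struct, the tag-based `related_posts` rule, and `build` are hypothetical stand-ins, not Jekyll classes; the point is only that related-post data comes from the full post list while output is produced for the selected paths alone.

```ruby
# Hypothetical model of a post; not Jekyll's Post class.
Post = Struct.new(:path, :tags, :body)

# "Related posts" here means other posts sharing at least one tag.
# It is computed from ALL posts, which is why every post must still be read.
def related_posts(post, all_posts)
  all_posts.select { |other| other.path != post.path && (other.tags & post.tags).any? }
end

def render(post, all_posts)
  related = related_posts(post, all_posts).map(&:path)
  "#{post.body}\nRelated: #{related.join(', ')}"
end

# Read phase covers every post; write phase is limited to `targets`.
# Returns path => rendered output for the targeted posts only.
def build(all_posts, targets)
  all_posts.select { |post| targets.include?(post.path) }
           .map { |post| [post.path, render(post, all_posts)] }
           .to_h
end

posts = [
  Post.new("_posts/a.md", ["ruby"], "A"),
  Post.new("_posts/b.md", ["ruby"], "B"),
  Post.new("_posts/c.md", ["go"], "C")
]
output = build(posts, ["_posts/a.md"])
```

Only `_posts/a.md` is rendered, yet its sidebar still lists `_posts/b.md`, because the related-post lookup saw the whole collection.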

@mattr-
Jekyll member
@Jack000

Our blog has 487 posts and takes 6:00 to generate.

I abused Jekyll a bit for the main site:
923 posts and maybe 100 or so static pages. It takes 18:04 to generate, but after testing it seems Jekyll is only responsible for 7:00 of that. I'm running v0.11.2.

There's a bit of culture shock coming from the WordPress/Drupal world, where you expect things to be instant. Since most of the time we're just adding one post, it seems a bit wasteful to regenerate the whole site every time.

@csaunders

A dependency tree is needed because Jekyll doesn't actually know anything about which files are required to generate a page, at least when it comes to Liquid tags such as {% include %}.

What I'm looking into is adding a way to query a Liquid template for its dependencies and check for changes at the Jekyll level.

Some aspects of Jekyll may not require changes, though, such as the headers, which are all within Jekyll's domain.
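As a rough illustration of what querying a template for its dependencies could return, the sketch below finds `{% include %}` tags with a regular expression. This is a deliberate simplification standing in for walking Liquid's parse tree: it only recognizes literal file names, skips dynamic includes such as `{% include {{ page.file }} %}`, and ignores `include_relative`. `included_files` and `dependency_table` are hypothetical helpers.

```ruby
# Matches {% include name %} and {%- include name %}; captures the literal name.
# A dynamic include ({% include {{ var }} %}) does not match because "{" is
# not in the captured character class.
INCLUDE_TAG = /\{%-?\s*include\s+([\w.\/-]+)/

# Literal include names in one template, in first-seen order, without repeats.
def included_files(source)
  source.scan(INCLUDE_TAG).flatten.uniq
end

# template path => [included files], i.e. a lookup table of direct dependencies.
def dependency_table(templates)
  templates.transform_values { |source| included_files(source) }
end

templates = {
  "page.html" => "{% include header.html %}<p>hi</p>{% include footer.html %}",
  "post.html" => "{% include header.html %}{% include {{ page.sidebar }} %}"
}
table = dependency_table(templates)
```

Because the dynamic `{{ page.sidebar }}` include is invisible to this approach, a real implementation would need either Liquid's parsed node tree or a conservative "rebuild everything" fallback for such pages.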

@Jack000

I was under the impression that by the time you get to the {% include %} part, that the file is already being written. If there's a way in liquid to tell jekyll "nothing's changed, no need to write this file", that'll work too.

I have no idea where the actual bottleneck is, so someone more familiar with the code might have better feedback.

I run Jekyll as a post-receive hook, and I get people making 20 little commits and wondering why their post hasn't shown up an hour later.

@mattr-
Jekyll member

I've thought about the dependency tree, but I'm hoping to avoid it because getting to it is a bit nasty.

@csaunders

If you ever want to chat about it, I'm hanging out in the #jekyll channel on freenode.

@csaunders

Internally, Liquid templates know about all their child tags/nodes. In my spare time I've been working on a patch that lets Liquid templates tell you which other templates they include.

With that information we can probably build a global lookup table that maps template.liquid => [dependencies.liquid] and build the dependency tree from there.

This is what I have in mind in terms of that dependency tree:

```ruby
lookup = {
  # This should be a reverse index?
  'template.liquid' => ['header.liquid', 'body.liquid', 'footer.liquid']
}
hashes = {
  'template.liquid' => 'ababababba',
  'header.liquid' => 'dcdcdcdcdcdc',
  'body.liquid' => 'efefefefef',
  'footer.liquid' => '0010010'
}
```

We'd keep a manifest of the last successful compilation containing hashes of all the files; once we find a change, we just need to find the files that depend on the changed file.
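The snippet's comment asks whether `lookup` should be a reverse index. One way to get there is to keep `lookup` as written and invert it at build time, then walk the inverted index outward from each changed file. The sketch below assumes manifests shaped like the `hashes` table (path => digest string); the helper names are hypothetical, and deleted files and non-Liquid dependencies are not handled.

```ruby
require "set"

# Paths whose digest is new or different since the previous manifest.
def changed_files(previous, current)
  current.keys.reject { |path| previous[path] == current[path] }
end

# Invert template => [dependencies] into dependency => [templates using it].
def reverse_index(lookup)
  index = Hash.new { |hash, key| hash[key] = [] }
  lookup.each do |template, deps|
    deps.each { |dep| index[dep] << template }
  end
  index
end

# Changed files plus everything that includes them, directly or transitively.
def files_to_regenerate(changed, lookup)
  index = reverse_index(lookup)
  dirty = Set.new
  queue = changed.dup
  until queue.empty?
    file = queue.shift
    next unless dirty.add?(file) # add? returns nil if already visited
    queue.concat(index[file])
  end
  dirty
end

lookup = {
  "template.liquid" => ["header.liquid", "body.liquid", "footer.liquid"],
  "page.liquid" => ["template.liquid"]
}
previous = { "template.liquid" => "ababababba", "header.liquid" => "dcdcdcdcdcdc",
             "body.liquid" => "efefefefef", "footer.liquid" => "0010010",
             "page.liquid" => "1234" }
current = previous.merge("header.liquid" => "9999")
changed = changed_files(previous, current)
dirty = files_to_regenerate(changed, lookup)
```

Changing only `header.liquid` marks `template.liquid` (direct includer) and `page.liquid` (which includes the template) as dirty, while `body.liquid` and `footer.liquid` are left alone.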

@parkr
Jekyll member

@mattr- Can you push up your branch? This will help our conversation here.

@jaybe-jekyll
Jekyll member

Some discussion from IRC:

High level summary:

  • In order to know, truly, everything that needs to be regenerated upon a change, Jekyll would need to be refactored to "keep track" of what's been built, what's changed, the impact of said changes, etc. This sounds daunting.

My initial thoughts:

  • As stated above, Jekyll being responsible for keeping track of implications and understanding impacts of changes seems daunting and out of scope.

Example: Changing "permalink: " in _config.yml could essentially mean every page on the site needs to be regenerated. Should Jekyll be responsible for tracking/understanding this?

Potential approach:

  • Perhaps consider an option for Jekyll, such as, --changed-only, that could be specified along with -w --changed-only, for example.

    • When Jekyll detects File A has changed, only File A is regenerated. Otherwise, the default behavior would be to regenerate the entire site, as that would ensure nothing is missed/overlooked.

This approach is somewhat "working around" what was originally asked/discussed, but it's a realistic approach that would provide a baby step towards the broader thinking/challenge.

@mattr-
Jekyll member

> In order to know, truly, everything that needs to be regenerated upon a change, Jekyll would need to be refactored to "keep track" of what's been built, what's changed, the impact of said changes, etc. This sounds daunting.

I think it's actually quite possible, but it requires a lot of decoupling so that the rendering steps are more functional in nature.

> As stated above, Jekyll being responsible for keeping track of implications and understanding impacts of changes seems daunting and out of scope.
> Example: Changing "permalink: " in _config.yml could essentially mean every page on the site needs to be regenerated. Should Jekyll be responsible for tracking/understanding this?

Yes. It should be, but at a higher level. Any time _config.yml changes, the whole site should be rebuilt, actually.

> Perhaps consider an option for Jekyll, such as, --changed-only, that could be specified along with -w --changed-only, for example.
> When Jekyll detects File A has changed, only File A is regenerated. Otherwise, the default behavior would be to regenerate the entire site, as that would ensure nothing is missed/overlooked.
> This approach is somewhat "working around" what was originally asked/discussed, but it's a realistic approach that would provide a baby step towards the broader thinking/challenge.

I would prefer not to see an additional flag for this but just have it happen automatically after the first complete build.
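mattr-'s rule (any `_config.yml` change means a full rebuild), combined with his preference for going incremental automatically after the first complete build instead of behind a flag, might look like the hypothetical `build_plan` below. It rebuilds only the changed files and does not follow dependencies, which is exactly the stale-links risk raised in the next comment.

```ruby
# Hypothetical decision helper, not a Jekyll API.
# changed:     paths reported by the file watcher
# all_files:   every source file in the site
# first_build: true when no complete build has happened yet
def build_plan(changed, all_files, first_build:)
  # A config change (e.g. permalink) can affect every page, so rebuild everything.
  return all_files if first_build || changed.include?("_config.yml")

  # Otherwise rebuild only the changed files that are still part of the site.
  changed & all_files
end

site = ["_config.yml", "index.html", "_posts/2013-11-01-hello.md"]
```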

@mattr-
Jekyll member

One of the things that I was investigating was whether or not Jekyll actually needed to have a large dependency tree of Liquid things. My thought was that it would be enough to just keep track of the information in the front matter of each post.

Either way, all of that is moot really. The codebase isn't really ready for this right now and I've been working on refactoring it so that it is. (#1572 is part of that but I need to go back and fix it up after the recent changes to load layouts from subdirectories.)
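The idea of tracking only front matter can be illustrated with the `layout` key: if each page records which layout it declares, a layout change marks just those pages. The regex-based front-matter reader and `pages_by_layout` are simplified, hypothetical helpers (Jekyll's own reader handles more cases), and nested layouts are ignored here.

```ruby
require "yaml"
require "date"

# YAML block between a leading "---" line and the next "---" line.
FRONT_MATTER = /\A---\s*\n(.*?)\n---\s*(\n|\z)/m

def front_matter(source)
  match = FRONT_MATTER.match(source)
  return {} unless match

  # Dates are common in post front matter, so permit Date/Time objects.
  YAML.safe_load(match[1], permitted_classes: [Date, Time]) || {}
end

# layout name => [pages declaring that layout]
def pages_by_layout(pages)
  pages.each_with_object(Hash.new { |hash, key| hash[key] = [] }) do |(path, source), acc|
    layout = front_matter(source)["layout"]
    acc[layout] << path if layout
  end
end

pages = {
  "about.md" => "---\nlayout: page\ntitle: About\n---\nHello",
  "_posts/2013-11-01-hi.md" => "---\nlayout: post\ndate: 2013-11-01\n---\nHi",
  "robots.txt" => "User-agent: *"
}
layouts = pages_by_layout(pages)
```

Editing `_layouts/post.html` would then mean regenerating only `layouts["post"]`; files without front matter, like `robots.txt`, never appear in the map.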

@jaybe-jekyll
Jekyll member

> I would prefer not to see an additional flag for this but just have it happen automatically after the first complete build.

My concern with making [potentially incomplete] builds the default would be the inevitable support rants:

> My site's links in associated posts/pages aren't being updated when it gets rebuilt!? What gives?

Defaulting to a full rebuild that is assured to regenerate everything properly (as Jekyll historically has) might mitigate such ongoing confusion.

@amitaibu

> Perhaps consider an option for Jekyll, such as, --changed-only, that could be specified along with -w --changed-only, for example.

I think it's indeed a more realistic approach. The default behaviour should be to re-render everything; for any blog site that would be great.

However, when we deal with bigger sites, we can't let content editors wait so long to see their outcome. I'd be happy to be given a way to render just the pages I want, maybe even jekyll build --files=foo,bar,_posts/* . A full render (which might take really long) I'll do later, according to my own business logic.
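The proposed `jekyll build --files=foo,bar,_posts/*` is not an existing Jekyll option; the sketch below only shows how such an argument could be turned into a file selection, using Ruby's `File.fnmatch?` with `File::FNM_PATHNAME` so that `*` does not cross directory boundaries. `select_files` is a hypothetical helper.

```ruby
# Keep the site files that match any comma-separated glob in files_arg.
def select_files(site_files, files_arg)
  patterns = files_arg.split(",").map(&:strip).reject(&:empty?)
  site_files.select do |path|
    # FNM_PATHNAME: "*" does not match "/", so "_posts/*" skips subdirectories.
    patterns.any? { |pattern| File.fnmatch?(pattern, path, File::FNM_PATHNAME) }
  end
end

site_files = ["foo", "bar", "about.md", "_posts/2013-11-01-a.md", "_posts/drafts/b.md"]
selected = select_files(site_files, "foo,bar,_posts/*")
```

With `FNM_PATHNAME`, `_posts/drafts/b.md` is excluded; a pattern like `_posts/**/*` together with that flag would be needed to reach nested files.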

@mattr- mattr- was assigned Nov 11, 2013
@mattr- mattr- referenced this issue Nov 27, 2013
Closed

Incremental regeneration #1761

1 of 8 tasks complete
@vjeux

Ping.

I'm blogging for the React website (http://facebook.github.io/react/), which is built with Jekyll, and it's becoming more and more of a pain. When we first launched the website it took a few seconds to pick up a change, but now that we have much more content it takes a minute to update every time I change a single line in a blog post.

I really don't care about regenerating the entire freaking website; I just want to regenerate the page I'm looking at. Would it be possible to mark the whole website as dirty and then, whenever I open a page, regenerate just that file?

Essentially, instead of phrasing the problem as "this file changed, what other files did it impact", phrase it as "I want to see this page: do the entire build process but skip everything that's not this file". I feel like this would be a lot more manageable for the use case of changing one file iteratively.

I don't know Jekyll well enough to tell whether it's possible, but I figured I would say it out loud and see if it's not too crazy.
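The "mark everything dirty, render on request" idea can be sketched as a small cache. `LazySite` and its renderer block are hypothetical: the block stands in for running the full build for one page. As written it ignores cross-page dependencies (an index page listing posts would not notice a new post), which is the trade-off accepted when iterating on a single file.

```ruby
require "set"

class LazySite
  attr_reader :render_count

  def initialize(paths, &renderer)
    @renderer = renderer
    @dirty = Set.new(paths) # everything starts out stale
    @cache = {}
    @render_count = 0
  end

  # Called by the file watcher: just flag the file, don't rebuild anything.
  def mark_dirty(path)
    @dirty << path
  end

  # Called when the browser requests a page: render only that page,
  # and only if it is stale. Unknown paths raise KeyError.
  def fetch(path)
    if @dirty.delete?(path)
      @cache[path] = @renderer.call(path)
      @render_count += 1
    end
    @cache.fetch(path)
  end
end

site = LazySite.new(["index.html", "blog/post.html"]) { |path| "<html>#{path}</html>" }
html = site.fetch("blog/post.html")
site.fetch("blog/post.html") # second request is served from the cache
```

In this usage `index.html` is never rendered at all, since nobody asked for it.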

@mattr-
Jekyll member
@migurski

+1 to all this; incremental regeneration would be immensely helpful for a ~1000 page site I’m considering Jekyll for.

@tuananh

+1 for this. It's the only thing that holds me back from using Jekyll. A personal blog/website is fine, but sites with 1000+ posts are not.

@phoet

👍 for a jekyll serve --watch --changed-only

That would come in really handy when testing out small changes. I suppose that is exactly what most people do on large sites.

It could also be nice to trigger a full regeneration through a signal, as is done with watchr and rspec.

@skadavan

+1 for this.

@zanshin
@jafskot

Relevant/related:

#2087

User-space Functional Logic Renderings with Globalization Capabilities

Overview of Idea, Concept

Provide core Jekyll functionality allowing the end-user designer to create, have access to, and take advantage of initialization-style logic (Liquid, Variables, Functions) of which outcome would then be globally available throughout the build and render phases without redundant reiteration.

@parkr parkr modified the milestone: 3.0, 2.0 Mar 16, 2014
@9peppe

As upload time is a bigger problem (for me) than generation time, I think there should be a way to avoid syncing the entire site on trivial changes. Any time Jekyll builds a page, it should diff the generated page against the one already in _site and only replace the _site version if they differ. (This solves a different problem, I know. :( )
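Replacing a file in `_site` only when the newly generated version differs could look like the hypothetical `write_if_changed` below. Leaving unchanged files untouched keeps their modification times intact, which is what mtime-based sync tools such as ftpsync rely on.

```ruby
require "fileutils"

# Write `content` to `path` only if the file is missing or its bytes differ.
# Returns true if the file was written, false if it was left untouched.
def write_if_changed(path, content)
  # Compare as raw bytes so encoding differences can't cause false mismatches.
  return false if File.exist?(path) && File.binread(path) == content.b

  FileUtils.mkdir_p(File.dirname(path))
  File.binwrite(path, content)
  true
end
```

The generator would call this in place of a plain `File.write` for every rendered page, so an unchanged page keeps its old timestamp and is skipped by the next sync.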

@jaybe-jekyll
Jekyll member

@9peppe if you are referring to the synchronization/upload of generated site contents to a remote location such as a web host, the rsync command, with switches/options such as --checksum, will help with incremental synchronization whether you're using Jekyll or merely synchronizing arbitrary data between machines.

Example

```shell
$ rsync --rsh='ssh -p 22222' -vv --checksum --recursive --update --keep-dirlinks --stats --human-readable --progress --itemize-changes _site/  user1@host.example.com:www/example.com/www/
```
@9peppe

@jaybe-jekyll yes, that's what I'm referring to. I'd like to use rsync, but I don't have SFTP access, only FTP[S], so I am constrained to tools like ftpsync, which only use file creation/modification times.

@parkr
Jekyll member

Tracking this in #3060, will try to get some version of this into 3.0.

@parkr parkr closed this Nov 5, 2014
@parkr
Jekyll member

Ehhh, just kidding. Sorry guys. This issue tracks incremental regeneration between Jekyll processes; #3060 tracks incremental regeneration within a single jekyll (build|serve) --watch command.
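The distinction matters for where change-detection state lives: within one `--watch` session it can stay in memory, but between separate Jekyll processes it has to be saved to disk. The sketch below persists a path => SHA-256 manifest as JSON; the file name `.incremental-manifest.json` and all helper names are hypothetical, and the manifest is meant to be saved only after a build succeeds, following the "last successful compilation" idea earlier in the thread.

```ruby
require "digest"
require "json"

# Hypothetical manifest location inside the site source directory.
MANIFEST = ".incremental-manifest.json"

def load_manifest(dir)
  path = File.join(dir, MANIFEST)
  File.exist?(path) ? JSON.parse(File.read(path)) : {}
end

def save_manifest(dir, manifest)
  File.write(File.join(dir, MANIFEST), JSON.generate(manifest))
end

# Returns [files changed since the last saved manifest, fresh manifest].
def stale_files(dir, paths)
  previous = load_manifest(dir)
  current = paths.map { |p| [p, Digest::SHA256.file(File.join(dir, p)).hexdigest] }.to_h
  [paths.reject { |p| previous[p] == current[p] }, current]
end

# Typical run (build is hypothetical): regenerate the stale files, then
# persist the manifest only if that succeeded.
#   stale, current = stale_files(site_dir, source_paths)
#   build(stale) && save_manifest(site_dir, current)
```

On the very first run there is no manifest, so every file counts as stale and the build is a full one.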

@parkr parkr reopened this Nov 5, 2014
@Naatan

That's a weird joke.

@alfredxing alfredxing referenced this issue Nov 17, 2014
Merged

Incremental regeneration #3116

4 of 4 tasks complete
@parkr
Jekyll member

#3116 solves this, at least as a first iteration. If you have time, I'd appreciate it if y'all could try building your sites with the current master and see how it works out for you.

@parkr
Jekyll member

Going to close this.

@parkr parkr closed this Apr 13, 2015