Incremental regeneration #3116

Merged
merged 17 commits into from Dec 22, 2014

Conversation

Projects
None yet
@alfredxing
Member

alfredxing commented Nov 17, 2014

Almost complete implementation of incremental regeneration (#380 and #3060) between as well as within a process. Currently handles file content changes and dependency changes (layouts and includes only). This implementation takes a few things from @mattr-'s implementation in #1761.

Here is what happens on each build:

  1. The Metadata class is initialized and reads in metadata from .jekyll-metadata (metadata is not read if the --clean flag is set).
  2. Jekyll performs reset, read, and generate as it did before.
  3. In the render step, all documents, pages, and posts are checked for modifications to itself or its dependencies. A document/page/post is also marked as modified if it has regenerate: true in its front-matter.
  4. cleanup is performed if the --clean flag is set.
  5. Modified files, as well as the new .jekyll-metadata, are written in write.

Remarks:

  • Although initial (no metadata present) build times went up a tiny bit, builds with metadata are super quick (here are the results from the jekyllrb.com site)
  • Dependencies in the form of referencing posts/pages/collections/data in Liquid with site.x are not resolved. (This is why I included the regenerate front-matter override.) Any ideas on how we can make this work?
  • The metadata file is currently YAML (loaded with SafeYAML). Is this optimal, or would another serialization method work better?

Todo:

  • Implement behaviour
  • Fix failing tests
  • Add tests (cucumber & unit)
  • Add more unit tests!
@parkr

This comment has been minimized.

Show comment
Hide comment
@parkr

parkr Nov 17, 2014

Member

Amazing work. 🙇

Member

parkr commented Nov 17, 2014

Amazing work. 🙇

@parkr

This comment has been minimized.

Show comment
Hide comment
@parkr

parkr Nov 17, 2014

Member

/cc @mattr- for his 👀 as well.

Member

parkr commented Nov 17, 2014

/cc @mattr- for his 👀 as well.

lib/jekyll/cleaner.rb
+
+ # Private: The metadata file storing dependency tree and build history
+ #
+ # Returns an Array with the metdata file as the only item

This comment has been minimized.

@mattr-

mattr- Nov 17, 2014

Member

typo for 'metadata' 😃

@mattr-

mattr- Nov 17, 2014

Member

typo for 'metadata' 😃

lib/jekyll/metadata.rb
+
+ # Initialize metadata store by reading YAML file,
+ # or an empty hash if file does not exist
+ @metadata = (File.file?(metadata_file) && !(site.config['clean'])) ? SafeYAML.load(File.read(metadata_file)) : {}

This comment has been minimized.

@mattr-

mattr- Nov 17, 2014

Member

Seems like this could use its own method.

@mattr-

mattr- Nov 17, 2014

Member

Seems like this could use its own method.

lib/jekyll/site.rb
@@ -289,13 +293,20 @@ def render
collections.each do |label, collection|
collection.docs.each do |document|
- document.output = Jekyll::Renderer.new(self, document).run
+ document.output = Jekyll::Renderer.new(self, document).run if (
+ @metadata.regenerate?(document.path) ||

This comment has been minimized.

@mattr-

mattr- Nov 17, 2014

Member

if we're gonna have the accessor, let's use the accessor here. 😃 Makes it easier to change later should we need to, and it's very low cost.

@mattr-

mattr- Nov 17, 2014

Member

if we're gonna have the accessor, let's use the accessor here. 😃 Makes it easier to change later should we need to, and it's very low cost.

@mattr-

This comment has been minimized.

Show comment
Hide comment
@mattr-

mattr- Nov 17, 2014

Member

Dependencies in the form of referencing posts/pages/collections/data in Liquid with site.x are not resolved. (This is why I included the regenerate front-matter override.) Any ideas on how we can make this work?

Yes, for Liquid 2.x at least. No idea about Liquid 3.x. Are we using Liquid 2.x or are we upgrading to Liquid 3.x soon?

Member

mattr- commented Nov 17, 2014

Dependencies in the form of referencing posts/pages/collections/data in Liquid with site.x are not resolved. (This is why I included the regenerate front-matter override.) Any ideas on how we can make this work?

Yes, for Liquid 2.x at least. No idea about Liquid 3.x. Are we using Liquid 2.x or are we upgrading to Liquid 3.x soon?

lib/jekyll/configuration.rb
@@ -17,7 +17,7 @@ class Configuration < Hash
# Handling Reading
'safe' => false,
'include' => ['.htaccess'],
- 'exclude' => [],
+ 'exclude' => ['.jekyll-metadata'],

This comment has been minimized.

@mattr-

mattr- Nov 17, 2014

Member

Since this is Jekyll internal, I'm wondering if the metadata file is something that we should always exclude, no matter what. In my mind, this implies that we would move it out of configuration.

@mattr-

mattr- Nov 17, 2014

Member

Since this is Jekyll internal, I'm wondering if the metadata file is something that we should always exclude, no matter what. In my mind, this implies that we would move it out of configuration.

lib/jekyll/command.rb
@@ -58,6 +58,7 @@ def add_build_options(c)
c.option 'unpublished', '--unpublished', 'Render posts that were marked as unpublished'
c.option 'quiet', '-q', '--quiet', 'Silence output.'
c.option 'verbose', '-V', '--verbose', 'Print verbose output.'
+ c.option 'clean', '-c', '--clean', 'Clean the site before rebuilding.'

This comment has been minimized.

@mattr-

mattr- Nov 17, 2014

Member

I had called this option --full-rebuild before. I thought it was more intention revealing from a UX perspective. Should we stick with --clean here or name it something else?

@mattr-

mattr- Nov 17, 2014

Member

I had called this option --full-rebuild before. I thought it was more intention revealing from a UX perspective. Should we stick with --clean here or name it something else?

This comment has been minimized.

@parkr

parkr Nov 17, 2014

Member

--no-cache?

@parkr

parkr Nov 17, 2014

Member

--no-cache?

This comment has been minimized.

@parkr

parkr Nov 17, 2014

Member

/cc @jglovier who said he'd love to help out with UX :)

@parkr

parkr Nov 17, 2014

Member

/cc @jglovier who said he'd love to help out with UX :)

This comment has been minimized.

@jglovier

jglovier Nov 17, 2014

Member

Sorry for lack of context, but could you help me understand what exactly it does before I offer an opinion? I've never used this flag.

@jglovier

jglovier Nov 17, 2014

Member

Sorry for lack of context, but could you help me understand what exactly it does before I offer an opinion? I've never used this flag.

This comment has been minimized.

@alfredxing

alfredxing Nov 17, 2014

Member

@jglovier It basically runs cleanup! (deletes the generated _site directory and deletes .jekyll-metadata) before building (well, actually, it runs before write but the effect is the same since we don't load metadata if this flag is set).

@alfredxing

alfredxing Nov 17, 2014

Member

@jglovier It basically runs cleanup! (deletes the generated _site directory and deletes .jekyll-metadata) before building (well, actually, it runs before write but the effect is the same since we don't load metadata if this flag is set).

This comment has been minimized.

@troyswanson

troyswanson Nov 17, 2014

Member

You've never used this flag because it's brand new. 😉

From what I can tell, it looks like this flag tells Jekyll to completely trash the destination directory prior to rebuilding the site, which negates the incremental regeneration stuff and forces a full rebuild every time. I'm with @mattr- that "clean" isn't very obvious.

I'm trying to think of something that might work better... 💭

@troyswanson

troyswanson Nov 17, 2014

Member

You've never used this flag because it's brand new. 😉

From what I can tell, it looks like this flag tells Jekyll to completely trash the destination directory prior to rebuilding the site, which negates the incremental regeneration stuff and forces a full rebuild every time. I'm with @mattr- that "clean" isn't very obvious.

I'm trying to think of something that might work better... 💭

This comment has been minimized.

@parkr

parkr Nov 17, 2014

Member

@jglovier The idea is that the --clean flag tells jekyll to do a full rebuild of the entire site, rather than doing an incremental build. 😄

@parkr

parkr Nov 17, 2014

Member

@jglovier The idea is that the --clean flag tells jekyll to do a full rebuild of the entire site, rather than doing an incremental build. 😄

This comment has been minimized.

@alfredxing

alfredxing Nov 17, 2014

Member

I chose clean quite arbitrarily, actually. I was thinking of something like --force-rebuild but got a bit lazy 😛

@alfredxing

alfredxing Nov 17, 2014

Member

I chose clean quite arbitrarily, actually. I was thinking of something like --force-rebuild but got a bit lazy 😛

This comment has been minimized.

@jglovier

jglovier Nov 17, 2014

Member

So my 2 cents is as follows: as for the flag name, it's less important than surfacing the documentation for what it does in a helpful way, and that it is even available.

But I agree that clean is rather ambiguous. Something like --clear or --fresh or even --full-rebuild might carry more of an accurate connotation.

@jglovier

jglovier Nov 17, 2014

Member

So my 2 cents is as follows: as for the flag name, it's less important than surfacing the documentation for what it does in a helpful way, and that it is even available.

But I agree that clean is rather ambiguous. Something like --clear or --fresh or even --full-rebuild might carry more of an accurate connotation.

@parkr

This comment has been minimized.

Show comment
Hide comment
@parkr

parkr Nov 17, 2014

Member

Are we using Liquid 2.x or are we upgrading to Liquid 3.x soon?

Upgrading to Liquid 3.x for sure.

Member

parkr commented Nov 17, 2014

Are we using Liquid 2.x or are we upgrading to Liquid 3.x soon?

Upgrading to Liquid 3.x for sure.

@alfredxing

This comment has been minimized.

Show comment
Hide comment
@alfredxing

alfredxing Nov 17, 2014

Member

@mattr- Thanks for the comments! I'll fix them up as suggested.

Member

alfredxing commented Nov 17, 2014

@mattr- Thanks for the comments! I'll fix them up as suggested.

@alfredxing

This comment has been minimized.

Show comment
Hide comment
@alfredxing

alfredxing Nov 22, 2014

Member

Finished fixing unit tests (quite a few were failing because of the no-render, no-write policy for unchanged files introduced by this), and implemented the suggestions. Now I just need to tackle the cucumber tests!

I also rebased onto master to maintain better merge compatibility.

Member

alfredxing commented Nov 22, 2014

Finished fixing unit tests (quite a few were failing because of the no-render, no-write policy for unchanged files introduced by this), and implemented the suggestions. Now I just need to tackle the cucumber tests!

I also rebased onto master to maintain better merge compatibility.

@parkr

This comment has been minimized.

Show comment
Hide comment
@parkr

parkr Nov 22, 2014

Member

You are a boss. Great work!

Member

parkr commented Nov 22, 2014

You are a boss. Great work!

@gjtorikian

This comment has been minimized.

Show comment
Hide comment
@gjtorikian

gjtorikian Nov 22, 2014

Member

Tossing all sorts of confetti here. Will work to get this GH-Pages compatible when it lands in a Jekyll version.

Member

gjtorikian commented Nov 22, 2014

Tossing all sorts of confetti here. Will work to get this GH-Pages compatible when it lands in a Jekyll version.

@alfredxing

This comment has been minimized.

Show comment
Hide comment
@alfredxing

alfredxing Nov 23, 2014

Member

All Cucumber tests pass on my machine, let's see what Travis thinks (might take a while since the queue seems to be quite long right now).

I also added a jekyll clean command for easy cleaning of output and metadata (kind of like --full-rebuild but without the build part).


@gjtorikian Would it be helpful if I added a configuration option to specify the location of the metadata file?

Member

alfredxing commented Nov 23, 2014

All Cucumber tests pass on my machine, let's see what Travis thinks (might take a while since the queue seems to be quite long right now).

I also added a jekyll clean command for easy cleaning of output and metadata (kind of like --full-rebuild but without the build part).


@gjtorikian Would it be helpful if I added a configuration option to specify the location of the metadata file?

@parkr

This comment has been minimized.

Show comment
Hide comment
@parkr

parkr Nov 23, 2014

Member

I also added a jekyll clean command for easy cleaning of output and metadata

Allegory to make clean? 👍

Member

parkr commented Nov 23, 2014

I also added a jekyll clean command for easy cleaning of output and metadata

Allegory to make clean? 👍

@alfredxing

This comment has been minimized.

Show comment
Hide comment
@alfredxing

alfredxing Nov 23, 2014

Member

@parkr Yeah, pretty much. I was going to use it in the Cucumber tests but I fixed it the right way in the end 😛

Member

alfredxing commented Nov 23, 2014

@parkr Yeah, pretty much. I was going to use it in the Cucumber tests but I fixed it the right way in the end 😛

@parkr

This comment has been minimized.

Show comment
Hide comment
@parkr

parkr Nov 23, 2014

Member

@gjtorikian Would it be helpful if I added a configuration option to specify the location of the metadata file?

GitHub Pages won't need this at all – most builds take < 4 seconds to complete, and the build environment is ephemeral like Travis. Maybe for Pages you can add a --no-cache flag or something?

That reminds me – what are we calling this? Cache? Incremental regeneration? Partial rebuild? I like option 1 the most, but I'm open to less-tech-savvy names. @jglovier / @gjtorikian, you're both very smart with the people. What might you call this feature?

Member

parkr commented Nov 23, 2014

@gjtorikian Would it be helpful if I added a configuration option to specify the location of the metadata file?

GitHub Pages won't need this at all – most builds take < 4 seconds to complete, and the build environment is ephemeral like Travis. Maybe for Pages you can add a --no-cache flag or something?

That reminds me – what are we calling this? Cache? Incremental regeneration? Partial rebuild? I like option 1 the most, but I'm open to less-tech-savvy names. @jglovier / @gjtorikian, you're both very smart with the people. What might you call this feature?

@alfredxing

This comment has been minimized.

Show comment
Hide comment
@alfredxing

alfredxing Nov 23, 2014

Member

For non-techies, I would probably call it "lazy regeneration" or "lazy rebuild" (similar to "lazy loading", since unchanged files aren't unnecessarily regenerated).

Member

alfredxing commented Nov 23, 2014

For non-techies, I would probably call it "lazy regeneration" or "lazy rebuild" (similar to "lazy loading", since unchanged files aren't unnecessarily regenerated).

@budparr

This comment has been minimized.

Show comment
Hide comment
@budparr

budparr Nov 23, 2014

Contributor

If you know that it exists you can handle any name for it, I think, though I'd say that the word cache is a thing I'm looking to avoid with Jekyll (if you've ever had to do layers of caching to get Wordpress to run decently), so the connotation may be bad for some. I vote for some variant of "partial build" since I think that speaks directly to what it's doing.

But really here to say a big Thanks! guys. Game changer for Jekyll and a great addition!

Contributor

budparr commented Nov 23, 2014

If you know that it exists you can handle any name for it, I think, though I'd say that the word cache is a thing I'm looking to avoid with Jekyll (if you've ever had to do layers of caching to get Wordpress to run decently), so the connotation may be bad for some. I vote for some variant of "partial build" since I think that speaks directly to what it's doing.

But really here to say a big Thanks! guys. Game changer for Jekyll and a great addition!

@alfredxing alfredxing closed this Nov 23, 2014

@alfredxing alfredxing reopened this Nov 23, 2014

lib/jekyll/layout.rb
@@ -26,6 +29,7 @@ def initialize(site, base, name)
@site = site
@base = base
@name = name
+ @path = Jekyll.sanitized_path(site.source, File.join(base, name))

This comment has been minimized.

@parkr

parkr Nov 23, 2014

Member

Let's use site.in_source_dir(base, name).

@parkr

parkr Nov 23, 2014

Member

Let's use site.in_source_dir(base, name).

This comment has been minimized.

@alfredxing

alfredxing Nov 23, 2014

Member

Yeah, that's much simpler. I'll change the rest as well.

@alfredxing

alfredxing Nov 23, 2014

Member

Yeah, that's much simpler. I'll change the rest as well.

@alfredxing

This comment has been minimized.

Show comment
Hide comment
@alfredxing

alfredxing Nov 23, 2014

Member

Oops, accidentally clicked the 'close' button. 😇

Member

alfredxing commented Nov 23, 2014

Oops, accidentally clicked the 'close' button. 😇

lib/jekyll/convertible.rb
+ # Add layout to dependency tree
+ site.metadata.add_dependency(
+ Jekyll.sanitized_path(site.source, path),
+ Jekyll.sanitized_path(site.source, layout.path)

This comment has been minimized.

@parkr

parkr Nov 23, 2014

Member

These should also use site.in_source_dir :)

@parkr

parkr Nov 23, 2014

Member

These should also use site.in_source_dir :)

lib/jekyll/metadata.rb
@@ -0,0 +1,108 @@
+require 'set'

This comment has been minimized.

@parkr

parkr Nov 23, 2014

Member

We usually don't require things in an individual file. Can this be put in lib/jekyll.rb? I think things get lost when they go into individual files (or they lay around after a refactor, requiring a module we don't need).

@parkr

parkr Nov 23, 2014

Member

We usually don't require things in an individual file. Can this be put in lib/jekyll.rb? I think things get lost when they go into individual files (or they lay around after a refactor, requiring a module we don't need).

lib/jekyll/metadata.rb
+ end
+
+ # Check path that exists in metadata
+ if (data = @metadata[path])

This comment has been minimized.

@parkr

parkr Nov 23, 2014

Member

You don't need the parentheses here. Also, can you use metadata instead of @metadata?

@parkr

parkr Nov 23, 2014

Member

You don't need the parentheses here. Also, can you use metadata instead of @metadata?

lib/jekyll/metadata.rb
+ return @cache[dependency] = @cache[path] = true
+ end
+ end
+ if data["mtime"] == File.mtime(path)

This comment has been minimized.

@parkr

parkr Nov 23, 2014

Member

Would you mind using .eql? here instead of ==?

@parkr

parkr Nov 23, 2014

Member

Would you mind using .eql? here instead of ==?

lib/jekyll/metadata.rb
+ #
+ # Returns the String path of the file.
+ def metadata_file
+ Jekyll.sanitized_path(site.source, '.jekyll-metadata')

This comment has been minimized.

@parkr

parkr Nov 23, 2014

Member

site.in_source_dir here, too, please.

@parkr

parkr Nov 23, 2014

Member

site.in_source_dir here, too, please.

lib/jekyll/metadata.rb
+ #
+ # Returns the read metadata.
+ def read_metadata
+ @metadata = (File.file?(metadata_file) && !(site.config['full_rebuild'])) ? SafeYAML.load(File.read(metadata_file)) : {}

This comment has been minimized.

@parkr

parkr Nov 23, 2014

Member

full_rebuild should be immortalized on this class, no? Seems round-about to get it via the site config so directly here.

@parkr

parkr Nov 23, 2014

Member

full_rebuild should be immortalized on this class, no? Seems round-about to get it via the site config so directly here.

lib/jekyll/site.rb
- document.output = Jekyll::Renderer.new(self, document).run
+ document.output = Jekyll::Renderer.new(self, document).run if (
+ metadata.regenerate?(document.path, document.write?) ||
+ document.data['regenerate']

This comment has been minimized.

@parkr

parkr Nov 23, 2014

Member

let's add a regenerate? method to Document and Convertible

@parkr

parkr Nov 23, 2014

Member

let's add a regenerate? method to Document and Convertible

lib/jekyll/site.rb
+ each_site_file { |item|
+ item.write(dest) if (
+ metadata.regenerate?(Jekyll.sanitized_path(source, item.path)) ||
+ (item.respond_to?(:data) && item.data['regenerate'])

This comment has been minimized.

@parkr

parkr Nov 23, 2014

Member

The #regenerate? would come in handy here

@parkr

parkr Nov 23, 2014

Member

The #regenerate? would come in handy here

lib/jekyll/site.rb
@@ -11,6 +11,7 @@ class Site
:gems, :plugin_manager
attr_accessor :converters, :generators
+ attr_accessor :metadata

This comment has been minimized.

@parkr

parkr Nov 23, 2014

Member

Should this be changeable directly? Why not attr_reader?

@parkr

parkr Nov 23, 2014

Member

Should this be changeable directly? Why not attr_reader?

@alfredxing

This comment has been minimized.

Show comment
Hide comment
@alfredxing

alfredxing Dec 6, 2014

Member

The Sass issue would be pretty difficult to fix (because of the Converter structure of not passing the filename to the converter). One idea would be to add the sass includes directory to every sass file, though I can't think of a good place to put this logic...

The second issue is solved by the newly added jekyll clean command, which removes the metadata file along with the _site directory.

Member

alfredxing commented Dec 6, 2014

The Sass issue would be pretty difficult to fix (because of the Converter structure of not passing the filename to the converter). One idea would be to add the sass includes directory to every sass file, though I can't think of a good place to put this logic...

The second issue is solved by the newly added jekyll clean command, which removes the metadata file along with the _site directory.

@parkr

This comment has been minimized.

Show comment
Hide comment
@parkr

parkr Dec 6, 2014

Member

Sass regeneration is also very quick, so we could also always ask sass to regenerate.

Member

parkr commented Dec 6, 2014

Sass regeneration is also very quick, so we could also always ask sass to regenerate.

@mglukhovsky

This comment has been minimized.

Show comment
Hide comment
@mglukhovsky

mglukhovsky Dec 6, 2014

@parkr -- that makes complete sense.

On our site, Sass generation takes less than a second (with fairly complicated stylesheets), but Jekyll takes around 15 seconds for a full build. 👍 for always regenerating Sass.

@parkr -- that makes complete sense.

On our site, Sass generation takes less than a second (with fairly complicated stylesheets), but Jekyll takes around 15 seconds for a full build. 👍 for always regenerating Sass.

@alfredxing

This comment has been minimized.

Show comment
Hide comment
@alfredxing

alfredxing Dec 6, 2014

Member

Implemented! I used asset_file? in general because those should all tend to be quick.

Member

alfredxing commented Dec 6, 2014

Implemented! I used asset_file? in general because those should all tend to be quick.

@parkr

This comment has been minimized.

Show comment
Hide comment
@parkr

parkr Dec 6, 2014

These should be indented 2 spaces, please:

def regenerate?
  asset_file? ||
    data['regenerate'] ||
    site.metadata.regenerate?(site.in_source_dir(relative_path))
end

Furthermore, why does the Convertible know whether it should be converted? This seems improper.

These should be indented 2 spaces, please:

def regenerate?
  asset_file? ||
    data['regenerate'] ||
    site.metadata.regenerate?(site.in_source_dir(relative_path))
end

Furthermore, why does the Convertible know whether it should be converted? This seems improper.

This comment has been minimized.

Show comment
Hide comment
@alfredxing

alfredxing Dec 6, 2014

Owner

Sure!

I see the Convertible class as more of a common interface between Post and Page (kind of like a Document, and going back to #3169) than something like Renderer. There's also a couple of other methods that don't fit into the convert part either, like published? and output_ext...

Owner

alfredxing replied Dec 6, 2014

Sure!

I see the Convertible class as more of a common interface between Post and Page (kind of like a Document, and going back to #3169) than something like Renderer. There's also a couple of other methods that don't fit into the convert part either, like published? and output_ext...

This comment has been minimized.

Show comment
Hide comment
@parkr

parkr Dec 6, 2014

It is totally a common interface, yes, but it definitely shouldn't be reaching into the site for its metadata instance to operate with. published? is a nice interface for saying !!data['published'], but it shouldn't interact with the Publisher (and ideally wouldn't even exist itself). The Publisher should be the only thing that can know if a given Page/Post/Document is published. The Page/Post/Document should have no clue whatsoever -- it's just a datastore.

parkr replied Dec 6, 2014

It is totally a common interface, yes, but it definitely shouldn't be reaching into the site for its metadata instance to operate with. published? is a nice interface for saying !!data['published'], but it shouldn't interact with the Publisher (and ideally wouldn't even exist itself). The Publisher should be the only thing that can know if a given Page/Post/Document is published. The Page/Post/Document should have no clue whatsoever -- it's just a datastore.

This comment has been minimized.

Show comment
Hide comment
@alfredxing

alfredxing Dec 6, 2014

Owner

Right... Would it be better if I moved the metadata.regenerate? stuff back into Site, and renamed the method force_regenerate?? I think there's one in Document as well.

Owner

alfredxing replied Dec 6, 2014

Right... Would it be better if I moved the metadata.regenerate? stuff back into Site, and renamed the method force_regenerate?? I think there's one in Document as well.

@parkr parkr self-assigned this Dec 6, 2014

@parkr

This comment has been minimized.

Show comment
Hide comment
@parkr

parkr Dec 7, 2014

Member

Looking great to me... @jekyll/core, want to take a look?

@alfredxing Should we encourage folks to add .jekyll-metadata to their .gitignore's?

Member

parkr commented Dec 7, 2014

Looking great to me... @jekyll/core, want to take a look?

@alfredxing Should we encourage folks to add .jekyll-metadata to their .gitignore's?

@envygeeks

This comment has been minimized.

Show comment
Hide comment
@envygeeks

envygeeks Dec 8, 2014

Contributor

I have a few questions:

  • Is it necessary to write the metadata to the disk? If so, why?
  • Can the logic in regenerate? be improved, it's not bad logic but it did take me a second to realize.
Contributor

envygeeks commented Dec 8, 2014

I have a few questions:

  • Is it necessary to write the metadata to the disk? If so, why?
  • Can the logic in regenerate? be improved, it's not bad logic but it did take me a second to realize.
@alfredxing

This comment has been minimized.

Show comment
Hide comment
@alfredxing

alfredxing Dec 8, 2014

Member

@envygeeks

  • I think so; there's no other way to keep the metadata persistently (between runs).
  • Which lines/parts did you have trouble with?

@parkr Having it in the Git repo would allow for faster build times for cloners, but on the other hand it's not really something we want completely exposed...

Member

alfredxing commented Dec 8, 2014

@envygeeks

  • I think so; there's no other way to keep the metadata persistently (between runs).
  • Which lines/parts did you have trouble with?

@parkr Having it in the Git repo would allow for faster build times for cloners, but on the other hand it's not really something we want completely exposed...

@envygeeks

This comment has been minimized.

Show comment
Hide comment
@envygeeks

envygeeks Dec 8, 2014

Contributor

@alfredxing

On the logic part, the entire method was rather confusing and could probably be organized in a way that combines it into a single simple if. I would short-circuit some of that (but I'd have to reread the entire class to get a firm grasp on it to see if it could be organized in a better way.)

On the metadata part, if that is the case then please do marshal it to a single file instead of a folder of files because meta data is small, but the file amount will be high, files are expensive and it will be cheaper in the long run to store it in a single file than thousands of smaller files. The cost of data storage and IO only benefits from large files (from my experience.)

That said, I must also imply that if it cannot keep track of itself between runs (and it's not a persistent process) that in and of itself adds a performance penalty that doesn't necessarily add a benefit to using this no? Unless I am completely misunderstanding but from what I'm to believe you would need to wrap around as a server with inotify/fswatch and track these file changes efficiently and proc off into a run. I don't fully understand the entirety of this code though as I've only gone over parts of it so I'll have a full look at it today or tomorrow and get a full grasp of it.

Contributor

envygeeks commented Dec 8, 2014

@alfredxing

On the logic part, the entire method was rather confusing and could probably be organized in a way that combines it into a single simple if. I would short-circuit some of that (but I'd have to reread the entire class to get a firm grasp on it to see if it could be organized in a better way.)

On the metadata part, if that is the case then please do marshal it to a single file instead of a folder of files because meta data is small, but the file amount will be high, files are expensive and it will be cheaper in the long run to store it in a single file than thousands of smaller files. The cost of data storage and IO only benefits from large files (from my experience.)

That said, I must also imply that if it cannot keep track of itself between runs (and it's not a persistent process) that in and of itself adds a performance penalty that doesn't necessarily add a benefit to using this no? Unless I am completely misunderstanding but from what I'm to believe you would need to wrap around as a server with inotify/fswatch and track these file changes efficiently and proc off into a run. I don't fully understand the entirety of this code though as I've only gone over parts of it so I'll have a full look at it today or tomorrow and get a full grasp of it.

@alfredxing

This comment has been minimized.

Show comment
Hide comment
@alfredxing

alfredxing Dec 8, 2014

Member

@envygeeks I'll see what I can do about the method.

The metadata is currently stored in a single YAML file, .jekyll-metadata. It does indeed add a slight performance penalty from the increased overhead, but this penalty is quite insignificant relative to the time otherwise needed to render Liquid pages/posts/documents.

Member

alfredxing commented Dec 8, 2014

@envygeeks I'll see what I can do about the method.

The metadata is currently stored in a single YAML file, .jekyll-metadata. It does indeed add a slight performance penalty from the increased overhead, but this penalty is quite insignificant relative to the time otherwise needed to render Liquid pages/posts/documents.

@alfredxing

This comment has been minimized.

Show comment
Hide comment
@alfredxing

alfredxing Dec 12, 2014

Member

@envygeeks Does something like this for regenerate? look better?

# Checks if a path should be regenerated
#
# Returns a boolean.
def regenerate?(path, skip_add = false)
  return true if disabled?

  # Check for path in cache
  if cache.has_key? path
    return cache[path]
  end

  # Check path that exists in metadata
  data = metadata[path]
  if data
    data["deps"].each do |dependency|
      if regenerate?(dependency)
        cache[dependency] = cache[path] = true
        return false
      end
    end
    if data["mtime"].eql? File.mtime(path)
      cache[path] = false
      return false
    else
      add(path) unless skip_add
      return true
    end
  end

  # Path does not exist in metadata, add it
  add(path) unless skip_add
  return true
end
Member

alfredxing commented Dec 12, 2014

@envygeeks Does something like this for regenerate? look better?

# Checks if a path should be regenerated
#
# Returns a boolean.
def regenerate?(path, skip_add = false)
  return true if disabled?

  # Check for path in cache
  if cache.has_key? path
    return cache[path]
  end

  # Check path that exists in metadata
  data = metadata[path]
  if data
    data["deps"].each do |dependency|
      if regenerate?(dependency)
        cache[dependency] = cache[path] = true
        return false
      end
    end
    if data["mtime"].eql? File.mtime(path)
      cache[path] = false
      return false
    else
      add(path) unless skip_add
      return true
    end
  end

  # Path does not exist in metadata, add it
  add(path) unless skip_add
  return true
end
@parkr

This comment has been minimized.

Show comment
Hide comment
@parkr

parkr Dec 12, 2014

Member

Man, that's a very complicated method...

Member

parkr commented Dec 12, 2014

Man, that's a very complicated method...

@envygeeks

This comment has been minimized.

Show comment
Hide comment
@envygeeks

envygeeks Dec 12, 2014

Contributor

That is a pretty big method with a lot of context switching (or branches whatever you prefer to call it) but there is also a slight question that I feel the need to ask because your contexts muddy the matter, why are you short-circuiting false if a dependency needs to regenerate?

Contributor

envygeeks commented Dec 12, 2014

That is a pretty big method with a lot of context switching (or branches whatever you prefer to call it) but there is also a slight question that I feel the need to ask because your contexts muddy the matter, why are you short-circuiting false if a dependency needs to regenerate?

@alfredxing

This comment has been minimized.

Show comment
Hide comment
@alfredxing

alfredxing Dec 12, 2014

Member

Yeah, it's quite a large function but I'm not really sure how (if it even should) to split it up. Maybe the middle data portion could be split off into a separate method?
The rewrite made it even longer since I took out assignment returns and a lot of the unclear condensed lines.

@envygeeks Thanks for catching that, it's a typo in my comment (originally it was return cache[dependency] = cache[path] = true.

Member

alfredxing commented Dec 12, 2014

Yeah, it's quite a large function but I'm not really sure how (if it even should) to split it up. Maybe the middle data portion could be split off into a separate method?
The rewrite made it even longer since I took out assignment returns and a lot of the unclear condensed lines.

@envygeeks Thanks for catching that, it's a typo in my comment (originally it was return cache[dependency] = cache[path] = true.

parkr added a commit that referenced this pull request Dec 22, 2014

@parkr parkr merged commit 7227ad4 into jekyll:master Dec 22, 2014

1 check passed

continuous-integration/travis-ci The Travis CI build passed
Details

parkr added a commit that referenced this pull request Dec 22, 2014

@gjtorikian

This comment has been minimized.

Show comment
Hide comment
@gjtorikian

gjtorikian Dec 23, 2014

Member

Hooray!

Member

gjtorikian commented Dec 23, 2014

Hooray!

@parkr parkr referenced this pull request Jan 10, 2015

Closed

Incremental regeneration #380

ahgittin added a commit to ahgittin/incubator-brooklyn that referenced this pull request Jan 15, 2015

docs - three minor liquid language things which break in v.next of li…
…quid

discovered when testing incremental regen in jekyll/jekyll#3116 --
means changes get picked up in 1s instead of 10s, using the next version of jekyll
(unfortunately not yet gemmed)
@ahgittin

This comment has been minimized.

Show comment
Hide comment
@ahgittin

ahgittin Jan 15, 2015

works well for us in brooklyn (https://github.com/apache/incubator-brooklyn/tree/master/docs ... modulo some liquid-induced changes apache/incubator-brooklyn#450 ).

specifics in case it's useful for people:

  • it only regens one page when we make an isolated change
  • regen time with several plugins and 100 pages is down from 10s to 1s (slightly surprised it's not down to 100ms though! 😛)
  • responds correctly to _layout and _includes changes by building dependent pages
  • serve -f triggers a full rebuild correctly

the last is very useful as we have menus built from a generator and incremental doesn't pick up all the dependencies there. but that's understandable, i like fast incremental as the default and -f is there when i need it.

great to have this!

works well for us in brooklyn (https://github.com/apache/incubator-brooklyn/tree/master/docs ... modulo some liquid-induced changes apache/incubator-brooklyn#450 ).

specifics in case it's useful for people:

  • it only regens one page when we make an isolated change
  • regen time with several plugins and 100 pages is down from 10s to 1s (slightly surprised it's not down to 100ms though! 😛)
  • responds correctly to _layout and _includes changes by building dependent pages
  • serve -f triggers a full rebuild correctly

the last is very useful as we have menus built from a generator and incremental doesn't pick up all the dependencies there. but that's understandable, i like fast incremental as the default and -f is there when i need it.

great to have this!

ahgittin added a commit to ahgittin/incubator-brooklyn that referenced this pull request Jan 15, 2015

docs - three minor liquid language things which break in v.next of li…
…quid

discovered when testing incremental regen in jekyll/jekyll#3116 --
means changes get picked up in 1s instead of 10s, using the next version of jekyll
(unfortunately not yet gemmed)

ahgittin added a commit to ahgittin/incubator-brooklyn that referenced this pull request Jan 15, 2015

docs - three minor liquid language things which break in v.next of li…
…quid

discovered when testing incremental regen in jekyll/jekyll#3116 --
means changes get picked up in 1s instead of 10s, using the next version of jekyll
(unfortunately not yet gemmed)
@crearc

This comment has been minimized.

Show comment
Hide comment
@crearc

crearc Jan 15, 2015

I've been testing it out. Couple of liquid 3.0 fixes, but no biggie. But I did run into an issue when trying to serve it up, any ideas? Here's the stack trace:

Configuration file: /home/eric/workspace/website/_config.yml
            Source: src
       Destination: /home/eric/workspace/website/_site
 Incremental build: enabled
      Generating... 
/home/eric/.rbenv/versions/2.0.0-p481/lib/ruby/gems/2.0.0/gems/jekyll-2.5.3/lib/jekyll/site.rb:322:in `block in write': undefined method `regenerate?' for #<Sprockets::StaticAsset:0x007f662353c5d8> (NoMethodError)
    from /home/eric/.rbenv/versions/2.0.0-p481/lib/ruby/gems/2.0.0/gems/jekyll-2.5.3/lib/jekyll/site.rb:481:in `block (2 levels) in each_site_file'
    from /home/eric/.rbenv/versions/2.0.0-p481/lib/ruby/gems/2.0.0/gems/jekyll-2.5.3/lib/jekyll/site.rb:480:in `each'
    from /home/eric/.rbenv/versions/2.0.0-p481/lib/ruby/gems/2.0.0/gems/jekyll-2.5.3/lib/jekyll/site.rb:480:in `block in each_site_file'
    from /home/eric/.rbenv/versions/2.0.0-p481/lib/ruby/gems/2.0.0/gems/jekyll-2.5.3/lib/jekyll/site.rb:479:in `each'
    from /home/eric/.rbenv/versions/2.0.0-p481/lib/ruby/gems/2.0.0/gems/jekyll-2.5.3/lib/jekyll/site.rb:479:in `each_site_file'
    from /home/eric/.rbenv/versions/2.0.0-p481/lib/ruby/gems/2.0.0/gems/jekyll-2.5.3/lib/jekyll/site.rb:321:in `write'
    from /home/eric/.rbenv/versions/2.0.0-p481/lib/ruby/gems/2.0.0/gems/jekyll-assets-0.9.2/lib/jekyll/assets_plugin/patches/site_patch.rb:61:in `__wrap_write'
    from /home/eric/.rbenv/versions/2.0.0-p481/lib/ruby/gems/2.0.0/gems/jekyll-2.5.3/lib/jekyll/site.rb:57:in `process'
    from /home/eric/.rbenv/versions/2.0.0-p481/lib/ruby/gems/2.0.0/gems/jekyll-2.5.3/lib/jekyll/command.rb:28:in `process_site'
    from /home/eric/.rbenv/versions/2.0.0-p481/lib/ruby/gems/2.0.0/gems/jekyll-2.5.3/lib/jekyll/commands/build.rb:58:in `build'
    from /home/eric/.rbenv/versions/2.0.0-p481/lib/ruby/gems/2.0.0/gems/jekyll-2.5.3/lib/jekyll/commands/build.rb:34:in `process'
    from /home/eric/.rbenv/versions/2.0.0-p481/lib/ruby/gems/2.0.0/gems/jekyll-2.5.3/lib/jekyll/commands/serve.rb:26:in `block (2 levels) in init_with_program'
    from /home/eric/.rbenv/versions/2.0.0-p481/lib/ruby/gems/2.0.0/gems/mercenary-0.3.5/lib/mercenary/command.rb:220:in `call'
    from /home/eric/.rbenv/versions/2.0.0-p481/lib/ruby/gems/2.0.0/gems/mercenary-0.3.5/lib/mercenary/command.rb:220:in `block in execute'
    from /home/eric/.rbenv/versions/2.0.0-p481/lib/ruby/gems/2.0.0/gems/mercenary-0.3.5/lib/mercenary/command.rb:220:in `each'
    from /home/eric/.rbenv/versions/2.0.0-p481/lib/ruby/gems/2.0.0/gems/mercenary-0.3.5/lib/mercenary/command.rb:220:in `execute'
    from /home/eric/.rbenv/versions/2.0.0-p481/lib/ruby/gems/2.0.0/gems/mercenary-0.3.5/lib/mercenary/program.rb:42:in `go'
    from /home/eric/.rbenv/versions/2.0.0-p481/lib/ruby/gems/2.0.0/gems/mercenary-0.3.5/lib/mercenary.rb:19:in `program'
    from /home/eric/.rbenv/versions/2.0.0-p481/lib/ruby/gems/2.0.0/gems/jekyll-2.5.3/bin/jekyll:17:in `<top (required)>'
    from /home/eric/.rbenv/versions/2.0.0-p481/bin/jekyll:23:in `load'
    from /home/eric/.rbenv/versions/2.0.0-p481/bin/jekyll:23:in `<main>'

crearc commented Jan 15, 2015

I've been testing it out. Couple of liquid 3.0 fixes, but no biggie. But I did run into an issue when trying to serve it up, any ideas? Here's the stack trace:

Configuration file: /home/eric/workspace/website/_config.yml
            Source: src
       Destination: /home/eric/workspace/website/_site
 Incremental build: enabled
      Generating... 
/home/eric/.rbenv/versions/2.0.0-p481/lib/ruby/gems/2.0.0/gems/jekyll-2.5.3/lib/jekyll/site.rb:322:in `block in write': undefined method `regenerate?' for #<Sprockets::StaticAsset:0x007f662353c5d8> (NoMethodError)
    from /home/eric/.rbenv/versions/2.0.0-p481/lib/ruby/gems/2.0.0/gems/jekyll-2.5.3/lib/jekyll/site.rb:481:in `block (2 levels) in each_site_file'
    from /home/eric/.rbenv/versions/2.0.0-p481/lib/ruby/gems/2.0.0/gems/jekyll-2.5.3/lib/jekyll/site.rb:480:in `each'
    from /home/eric/.rbenv/versions/2.0.0-p481/lib/ruby/gems/2.0.0/gems/jekyll-2.5.3/lib/jekyll/site.rb:480:in `block in each_site_file'
    from /home/eric/.rbenv/versions/2.0.0-p481/lib/ruby/gems/2.0.0/gems/jekyll-2.5.3/lib/jekyll/site.rb:479:in `each'
    from /home/eric/.rbenv/versions/2.0.0-p481/lib/ruby/gems/2.0.0/gems/jekyll-2.5.3/lib/jekyll/site.rb:479:in `each_site_file'
    from /home/eric/.rbenv/versions/2.0.0-p481/lib/ruby/gems/2.0.0/gems/jekyll-2.5.3/lib/jekyll/site.rb:321:in `write'
    from /home/eric/.rbenv/versions/2.0.0-p481/lib/ruby/gems/2.0.0/gems/jekyll-assets-0.9.2/lib/jekyll/assets_plugin/patches/site_patch.rb:61:in `__wrap_write'
    from /home/eric/.rbenv/versions/2.0.0-p481/lib/ruby/gems/2.0.0/gems/jekyll-2.5.3/lib/jekyll/site.rb:57:in `process'
    from /home/eric/.rbenv/versions/2.0.0-p481/lib/ruby/gems/2.0.0/gems/jekyll-2.5.3/lib/jekyll/command.rb:28:in `process_site'
    from /home/eric/.rbenv/versions/2.0.0-p481/lib/ruby/gems/2.0.0/gems/jekyll-2.5.3/lib/jekyll/commands/build.rb:58:in `build'
    from /home/eric/.rbenv/versions/2.0.0-p481/lib/ruby/gems/2.0.0/gems/jekyll-2.5.3/lib/jekyll/commands/build.rb:34:in `process'
    from /home/eric/.rbenv/versions/2.0.0-p481/lib/ruby/gems/2.0.0/gems/jekyll-2.5.3/lib/jekyll/commands/serve.rb:26:in `block (2 levels) in init_with_program'
    from /home/eric/.rbenv/versions/2.0.0-p481/lib/ruby/gems/2.0.0/gems/mercenary-0.3.5/lib/mercenary/command.rb:220:in `call'
    from /home/eric/.rbenv/versions/2.0.0-p481/lib/ruby/gems/2.0.0/gems/mercenary-0.3.5/lib/mercenary/command.rb:220:in `block in execute'
    from /home/eric/.rbenv/versions/2.0.0-p481/lib/ruby/gems/2.0.0/gems/mercenary-0.3.5/lib/mercenary/command.rb:220:in `each'
    from /home/eric/.rbenv/versions/2.0.0-p481/lib/ruby/gems/2.0.0/gems/mercenary-0.3.5/lib/mercenary/command.rb:220:in `execute'
    from /home/eric/.rbenv/versions/2.0.0-p481/lib/ruby/gems/2.0.0/gems/mercenary-0.3.5/lib/mercenary/program.rb:42:in `go'
    from /home/eric/.rbenv/versions/2.0.0-p481/lib/ruby/gems/2.0.0/gems/mercenary-0.3.5/lib/mercenary.rb:19:in `program'
    from /home/eric/.rbenv/versions/2.0.0-p481/lib/ruby/gems/2.0.0/gems/jekyll-2.5.3/bin/jekyll:17:in `<top (required)>'
    from /home/eric/.rbenv/versions/2.0.0-p481/bin/jekyll:23:in `load'
    from /home/eric/.rbenv/versions/2.0.0-p481/bin/jekyll:23:in `<main>'

ahgittin added a commit to ahgittin/incubator-brooklyn that referenced this pull request Jan 15, 2015

docs - three minor liquid language things which break in v.next of li…
…quid

discovered when testing incremental regen in jekyll/jekyll#3116 --
means changes get picked up in 1s instead of 10s, using the next version of jekyll
(unfortunately not yet gemmed)
@envygeeks

This comment has been minimized.

Show comment
Hide comment
@envygeeks

envygeeks Jan 15, 2015

Contributor

@crearc Please file an issue for that, I'd be happy to fix that today unless @alfredxing wants to.

Contributor

envygeeks commented Jan 15, 2015

@crearc Please file an issue for that, I'd be happy to fix that today unless @alfredxing wants to.

kevin1 added a commit to columbiafsae/website that referenced this pull request Jan 17, 2015

Deploy script now kind of does delta updates, while retaining safety
Two folders take turns being public_html. This allows us not to copy
most of the files, while still being safe to cancel the script at
any time. Users will never end up with a partially deployed site.

This feature won't be very useful until Jekyll stable has incremental
site regeneration, which is coming soon.

jekyll/jekyll#3116

@parkr parkr referenced this pull request Jan 18, 2015

Closed

3.0 RELEASE GAMEPLAN #3324

7 of 7 tasks complete

@mbland mbland referenced this pull request in 18F/hub Feb 18, 2015

Open

Improve build slowness #169

@fw42

This comment has been minimized.

Show comment
Hide comment
@fw42

fw42 May 11, 2015

Contributor

This is really great work, we are seeing huge improvements on our site. Thanks so much @alfredxing and everybody else who was involved.

Contributor

fw42 commented May 11, 2015

This is really great work, we are seeing huge improvements on our site. Thanks so much @alfredxing and everybody else who was involved.

@phiresky phiresky referenced this pull request in ML-KA/ML-KA.github.io Jun 15, 2015

Closed

Webpage System #5

@alfredxing alfredxing deleted the alfredxing:incremental branch Oct 26, 2015

@jekyll jekyll locked and limited conversation to collaborators Feb 27, 2017

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.