New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Glob scope path only if configured with a pattern #6692

Merged
merged 6 commits into from Jan 19, 2018

Conversation

Projects
None yet
7 participants
@ashmaroli
Member

ashmaroli commented Jan 11, 2018

Resolves #6691

Take the less expensive route unless the user configures the scope["path"] with a glob pattern. e.g. include

  • "content/*.pdf"
  • "content/**/*"

@ashmaroli ashmaroli requested a review from jekyll/performance Jan 11, 2018

@ashmaroli ashmaroli added the fix label Jan 11, 2018

@ashmaroli ashmaroli added this to the v3.7.1 milestone Jan 11, 2018

@Crunch09

👍👏 LGTM! Might be worth adding a test to check that Dir.glob is no longer called for pattern-less paths?
Also can you confirm that this speeds up https://github.com/mmistakes/front-matter-defaults.git again?

@DirtyF

This comment has been minimized.

Member

DirtyF commented Jan 11, 2018

On latest macOS with my macbook air and @mmistakes test repo:

3.6.2 3.7.0 Fix
16.279s 25.321s 15.913s
18.276s 30.145s 15.979s
17.406s 29.49s 16.09s
@ashmaroli

This comment has been minimized.

Member

ashmaroli commented Jan 11, 2018

Might be worth adding a test to check that

@Crunch09 That's a good idea to be extra sure.. Will see if I can come up with a simple check..

On latest macOS with my macbook air

@DirtyF Thank you very much for running the tests 💯

@mmistakes

This comment has been minimized.

mmistakes commented Jan 11, 2018

Tested the fix on the front-matter-defaults repo and my large personal site... both are back to building at 3.6.2 speeds.

😄

@@ -123,6 +128,13 @@ def path_is_subpath?(path, parent_path)
false
end
def glob_pattern?(path)
path.each_filename do |str|
return true if str =~ %r!\*+|\?|\[.*\]|{.*}!

This comment has been minimized.

@parkr

parkr Jan 11, 2018

Member

Does path.include?(“*”) not work for this case?

This comment has been minimized.

@ashmaroli

ashmaroli Jan 12, 2018

Member

I wanted to test the path component string for specific glob patterns. and regex seems better suited for it than multiple String#include? calls

This comment has been minimized.

@parkr

parkr Jan 14, 2018

Member

@ashmaroli Why do we need to test each filename separately? AFAIK, the presence of * in scope["path"] is enough to know it's a glob pattern.

This comment has been minimized.

@ashmaroli

ashmaroli Jan 14, 2018

Member

@parkr

defaults:
  - scope:
      path: "_puppies/arrears.?"
      type: "puppies"
    values:
      layout: arrears

is also a glob pattern to catch all kinds of files with basename arrears, if you ask me..

This comment has been minimized.

@parkr

parkr Jan 14, 2018

Member

I haven't seen or used ? before. Is that a Windows-ism?

Let's decide which glob patterns we should support and write individual tests using them so we know they work and are supported. The original PR only mentioned * which is why I mention that one only.

This comment has been minimized.

@parkr

parkr Jan 14, 2018

Member

FWIW, for your example, I would have used "_puppets/arrears.*".

scope_path = Pathname.new(scope_path).relative_path_from site_path
return true if path_is_subpath?(sanitized_path, scope_path)
if glob_pattern?(rel_scope_path)

This comment has been minimized.

@parkr

parkr Jan 14, 2018

Member

So instead of this, we do if scope["path"].to_s.include?("*"). Or you can change the implementation of glob_pattern? to just do path.to_s.include?("*"). What do you think?

@parkr

parkr requested changes Jan 14, 2018 edited

A simple unit test for glob_pattern? would be great if you don't mind. Using my suggestion:

should "match globs" do
  assert @site.frontmatter_defaults.glob_include?("content/*.pdf")
end

should "not match non-globs" do
  refute @site.frontmatter_defaults.glob_include?("layouts/")
end
@ashmaroli

This comment has been minimized.

Member

ashmaroli commented Jan 14, 2018

if you don't mind

@parkr I don't mind adding tests at all.. but just that I find it difficult to digest your suggestion above..
The absence of * in the scope["path"] doesn't mean it's not a glob pattern..

/cc @pathawks for your inputs.. (Thanks in advance..)

Regarding the unit test,
I won't be able to assert @site.frontmatter_defaults.glob_include?("content/*.pdf") because glob_include? is a private method.. ( yeah, I can use :send.. )

@ashmaroli ashmaroli requested a review from pathawks Jan 14, 2018

@DirtyF

This comment has been minimized.

Member

DirtyF commented Jan 14, 2018

Those use case seems already fine for most users, we never spoke of supporting all possible regexp patterns in path, did we?

"content/*.pdf"
"content/**/*"

Let's try to keep it simple here for now, and ship a quick fix.

@ashmaroli ashmaroli referenced this pull request Jan 15, 2018

Merged

Release v3.7.1 #6695

7 of 7 tasks complete
scope_path = Pathname.new(scope_path).relative_path_from site_path
Jekyll.logger.debug "Globbed Scope Path:", scope_path
return true if path_is_subpath?(sanitized_path, scope_path)
end
end
path_is_subpath?(sanitized_path, rel_scope_path)

This comment has been minimized.

@pathawks

pathawks Jan 17, 2018

Member

Could we make this an else for the previous block (if scope["path"].to_s.include?("*"))?
Multiple exits can get confusing.

This comment has been minimized.

@ashmaroli

ashmaroli Jan 18, 2018

Member

We can't make this an else branch because if diff-ln#113 evaluates to false even at the end of looping the globbed array, diff-ln#117 acts as a fallback..

That said, it does not make sense to me why that fallback's needed..

@parkr

parkr approved these changes Jan 18, 2018

Looks good to me!

<div class="note warning">
<h5>Globbing and Performance</h5>
<p>
Please note that globbing a path is a known performance-poison and is

This comment has been minimized.

@parkr

parkr Jan 18, 2018

Member

The poison sounds really terrifying! Can we rephrase?

Please note that front matter defaults with glob paths have known performance problems.
Windows users may notice especially high build times by adding globbing. We are tracking [in this issue](let's open an issue).

This comment has been minimized.

@ashmaroli

ashmaroli Jan 18, 2018

Member

We are tracking [in this issue](let's open an issue).

I believe this is an issue with Ruby for Windows itself as Ruby is not optimized for Windows.. There's not much one can do here IMO..

So.. I'll just re-phrase the "poison" bit to something more mellow..

@pathawks

👏🏼

@DirtyF

This comment has been minimized.

Member

DirtyF commented Jan 19, 2018

@jekyllbot: merge +bug

@jekyllbot jekyllbot merged commit 009d308 into jekyll:master Jan 19, 2018

3 checks passed

WIP ready for review
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

@jekyllbot jekyllbot added bug fix labels Jan 19, 2018

@ashmaroli

This comment has been minimized.

Member

ashmaroli commented Jan 23, 2018

@mmistakes If possible can you please provide the following feedback on this change using your front-matter-defaults repo.. and Jekyll's master branch..?

  1. Build times on Windows and Unix with jekyll build
  2. Build times on Windows and Unix with jekyll build --watch followed by a minor change to about.md
  3. Build times on Windows and Unix with jekyll serve followed by a minor change to about.md

Thanks in advance..

@mmistakes

This comment has been minimized.

mmistakes commented Jan 23, 2018

Sure thing @ashmaroli, here's what I'm getting on Windows.

3.6.2 3.7.0 jekyll/jekyll master
1. 20.098s 48.487s 20.729s
2. 20.721s 49.879s 21.192s
3. 20.977s 50.313s 21.518s
@ashmaroli

This comment has been minimized.

Member

ashmaroli commented Jan 23, 2018

here's what I'm getting

Right.. and the regeneration times..?

@mmistakes

This comment has been minimized.

mmistakes commented Jan 23, 2018

Those are the regeneration times. It wasn't any faster regenerating the page after about.md was modified. Still took ~20s... or in the case of 3.7.0 around 50.

@ashmaroli

This comment has been minimized.

Member

ashmaroli commented Jan 23, 2018

Okay buddy.. Thanks a lot..
On my system (Windows 7, Ruby 2.3, jekyll:master and minima:master)
the build times were:

  • initial-build: 45.412 seconds
  • regeneratn.: 254.069825 seconds

Something's wrong somewhere!!

@mmistakes

This comment has been minimized.

mmistakes commented Jan 23, 2018

Forgot to mention, I added the wdm gem to my Gemfile as instructed by this warning:

gem 'wdm', '>= 0.1.0' if Gem.win_platform?

Without it the regen speeds were much slower for some reason.

Without wdm:

  • initial-build: 20.914 seconds
  • regeneration: 37.999 seconds

With wdm:

  • initial-build: 20.624 seconds
  • regeneration: 21.493 seconds
@ashmaroli

This comment has been minimized.

Member

ashmaroli commented Jan 23, 2018

MIND=BLOWN !!
Two years of Jekyll on Windows and I finally realize the truth behind the suggestion to include gem 'wdm'..

Without wdm:

  • initial-build: 47.503 seconds
  • regeneratn.: 246.402432 seconds

With wdm:

  • initial-build: 45.334 seconds
  • regeneratn.: 48.469285 seconds

💯 👏 @mmistakes

@mmistakes

This comment has been minimized.

mmistakes commented Jan 23, 2018

@ashmaroli 😃

I've never really known what adding wdm did. I feel like at one point --watch stopped working for me reliably so I added it, but didn't know it helped performance until now.

@ashmaroli ashmaroli deleted the ashmaroli:glob-opt-in branch Jan 24, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment