Move EntryFilter to use Pathutil & fix `glob_include?` #4859

Merged
merged 1 commit into from May 12, 2016

Conversation

Projects
None yet
4 participants
@envygeeks
Contributor

envygeeks commented Apr 30, 2016

  • Allow users to filter directories by ending their path with "/"
  • Allow users to filter with a Regexp, some scenariors can really require it.
  • Use Pathutil#in_path? for Symlink verification, it real/expand.

This also requires some downstream work in "jekyll-watch" which at this time is
not very robust, it doesn't recognize the difference either, and should probably
start doing so (what I mean is detecting "/" and using the full path.)

@envygeeks

This comment has been minimized.

Show comment
Hide comment
@envygeeks

envygeeks Apr 30, 2016

Contributor

This starts the move from Pathname to Pathutil, so that we can get a slightly faster Pathname as Pathutil with the addition of encoded and normalized file read and writes (which should solve a lot of our problems with international users.) And a wrapped SafeYAML that moves between Psych#safe and SafeYAML.

Contributor

envygeeks commented Apr 30, 2016

This starts the move from Pathname to Pathutil, so that we can get a slightly faster Pathname as Pathutil with the addition of encoded and normalized file read and writes (which should solve a lot of our problems with international users.) And a wrapped SafeYAML that moves between Psych#safe and SafeYAML.

@parkr

This comment has been minimized.

Show comment
Hide comment
@parkr

parkr May 2, 2016

Member

I like this PR but it has style issues. Could things be cleaned up so that, e.g. newlines are not added unless it exceeds 90 characters wide? Empty comments should be removed, etc.

Member

parkr commented May 2, 2016

I like this PR but it has style issues. Could things be cleaned up so that, e.g. newlines are not added unless it exceeds 90 characters wide? Empty comments should be removed, etc.

@@ -36,4 +36,5 @@ Gem::Specification.new do |s|
s.add_runtime_dependency('rouge', '~> 1.7')
s.add_runtime_dependency('jekyll-sass-converter', '~> 1.0')
s.add_runtime_dependency('jekyll-watch', '~> 1.1')
+ s.add_runtime_dependency("pathutil", "~> 0.9")

This comment has been minimized.

@parkr

parkr May 2, 2016

Member

@jekyll/security Possibility for v3.2 is the addition of this gem as a dependency. Pathutil is a pure-Ruby implementation of a Pathname-like interface which has some extra features and also offers a speed increase. 👀 ok to you?

@parkr

parkr May 2, 2016

Member

@jekyll/security Possibility for v3.2 is the addition of this gem as a dependency. Pathutil is a pure-Ruby implementation of a Pathname-like interface which has some extra features and also offers a speed increase. 👀 ok to you?

This comment has been minimized.

@ptoomey3

ptoomey3 May 2, 2016

What is the main driver for using it? Is it performance, centralization of security logic, or something else? I haven't looked through these changes in depth, but, are you going to be leaning on some of the security provided by pathutil? For example, it seems that gem has some features like safe_copy and enforce_root. But, those features don't seem super well tested (ex. https://github.com/envygeeks/pathutil/blob/d03bd830f629dc820bdf791d06011a42e7be35a8/spec/tests/lib/pathutil_spec.rb#L1010-L1049). If you are going to lean on those features it would be nice for them to be much more rigorously tested.

@ptoomey3

ptoomey3 May 2, 2016

What is the main driver for using it? Is it performance, centralization of security logic, or something else? I haven't looked through these changes in depth, but, are you going to be leaning on some of the security provided by pathutil? For example, it seems that gem has some features like safe_copy and enforce_root. But, those features don't seem super well tested (ex. https://github.com/envygeeks/pathutil/blob/d03bd830f629dc820bdf791d06011a42e7be35a8/spec/tests/lib/pathutil_spec.rb#L1010-L1049). If you are going to lean on those features it would be nice for them to be much more rigorously tested.

This comment has been minimized.

@envygeeks

envygeeks May 2, 2016

Contributor

@ptoomey3 scroll down a few lines...

@envygeeks

envygeeks May 2, 2016

Contributor

@ptoomey3 scroll down a few lines...

This comment has been minimized.

@parkr

parkr May 2, 2016

Member

@ptoomey3 Yes, the plan would be to lean on those safe_* methods where possible so changes to Jekyll core could be focused more on features/bug fixes. It would be nice to know that wherever we read using Pathutil, we were safe.

@parkr

parkr May 2, 2016

Member

@ptoomey3 Yes, the plan would be to lean on those safe_* methods where possible so changes to Jekyll core could be focused more on features/bug fixes. It would be nice to know that wherever we read using Pathutil, we were safe.

This comment has been minimized.

@envygeeks

envygeeks May 2, 2016

Contributor

@ptoomey3 yes, we don't respond to some methods from Pathname like cleanpath (which Jekyll doesn't use) because we do all that internally where we can or let StdLib deal with most of it, but I'm happy to implement it if you like. For the most part, 90% of the methods that we have on that class are delegated, with the exception of a few:

  • #read diverges, it normalizes and encodes by default (this was designed for Jekyll.)
  • #write diverges, it normalizes and encoded by default (this was designed for Jekyll.)
  • We offer read_yaml and read_json both written with my own library in mind and Jekyll. Right now Jekyll relies heavily on SafeYAML, while YAML itself offers Yaml#safe so this wraps around both and looks for the latter while falling back to the former.
  • We offer >, <, >=, <=, =~ and others which check if it's below, above, or like and also allows for a Regexp match of the said, sometimes people want Regexp's, so we made it available.
  • Our glob is "safer", in that it will chdir before doing the glob to avoid any potential issues with globbing, this was a bug in Jekyll.
  • Some methods that Pathname had customized, we route to their real classes directly (those can all be found at the bottom of the class, this is where most of the methods come from, we don't directly do them, we actually delegate them out.)

On the #read, #write when a read comes in and it's Windows, we actually convert to Linux and then if it's Windows we convert back to Windows. We enforce UTF-8 by default, which should solve a lot of Jekyll problems, and we also allow the default encoding to be set so Jekyll's encoding can now be Pathutil's encoding. That specific part was designed explicitly for Jekyll and falls along the lines of what I had mentioned in the past "lets normalize everything and then normalize it back out."

@envygeeks

envygeeks May 2, 2016

Contributor

@ptoomey3 yes, we don't respond to some methods from Pathname like cleanpath (which Jekyll doesn't use) because we do all that internally where we can or let StdLib deal with most of it, but I'm happy to implement it if you like. For the most part, 90% of the methods that we have on that class are delegated, with the exception of a few:

  • #read diverges, it normalizes and encodes by default (this was designed for Jekyll.)
  • #write diverges, it normalizes and encoded by default (this was designed for Jekyll.)
  • We offer read_yaml and read_json both written with my own library in mind and Jekyll. Right now Jekyll relies heavily on SafeYAML, while YAML itself offers Yaml#safe so this wraps around both and looks for the latter while falling back to the former.
  • We offer >, <, >=, <=, =~ and others which check if it's below, above, or like and also allows for a Regexp match of the said, sometimes people want Regexp's, so we made it available.
  • Our glob is "safer", in that it will chdir before doing the glob to avoid any potential issues with globbing, this was a bug in Jekyll.
  • Some methods that Pathname had customized, we route to their real classes directly (those can all be found at the bottom of the class, this is where most of the methods come from, we don't directly do them, we actually delegate them out.)

On the #read, #write when a read comes in and it's Windows, we actually convert to Linux and then if it's Windows we convert back to Windows. We enforce UTF-8 by default, which should solve a lot of Jekyll problems, and we also allow the default encoding to be set so Jekyll's encoding can now be Pathutil's encoding. That specific part was designed explicitly for Jekyll and falls along the lines of what I had mentioned in the past "lets normalize everything and then normalize it back out."

This comment has been minimized.

@envygeeks

envygeeks May 2, 2016

Contributor

As far as all this discussion, I thought this was a security review, not a review of features Jekyll doesn't utilize and have no impact on Jekyll in total 😜

@envygeeks

envygeeks May 2, 2016

Contributor

As far as all this discussion, I thought this was a security review, not a review of features Jekyll doesn't utilize and have no impact on Jekyll in total 😜

This comment has been minimized.

@envygeeks

envygeeks May 2, 2016

Contributor

To add another note, #cleanpath is the only method we do not respond to, as we explicitly check that we respond to all methods but cleanpath which we explicitly ignored.

@envygeeks

envygeeks May 2, 2016

Contributor

To add another note, #cleanpath is the only method we do not respond to, as we explicitly check that we respond to all methods but cleanpath which we explicitly ignored.

This comment has been minimized.

@ptoomey3

ptoomey3 May 2, 2016

Hehe...I was just wanting to make sure I understood, as I was assuming that any non-explicitly implemted behavior would have been delegated to Pathname under the covers. Given that any behavioral change to Pathname could have security consequence, I wanted to be clear on what/how things are delegated. Thanks for the clarification.

@ptoomey3

ptoomey3 May 2, 2016

Hehe...I was just wanting to make sure I understood, as I was assuming that any non-explicitly implemted behavior would have been delegated to Pathname under the covers. Given that any behavioral change to Pathname could have security consequence, I wanted to be clear on what/how things are delegated. Thanks for the clarification.

This comment has been minimized.

@ptoomey3

ptoomey3 May 3, 2016

we respond to all methods but cleanpath which we explicitly ignored.

Too funny..I literally grabbed cleanpath randomly from the docs to test how the delegation worked 😄.

The overall approach seems sounds, though it would be good if someone else that is super familiar with Jekyll, and what issues we have run into before (such as @parkr), could give the gem a read through.

@ptoomey3

ptoomey3 May 3, 2016

we respond to all methods but cleanpath which we explicitly ignored.

Too funny..I literally grabbed cleanpath randomly from the docs to test how the delegation worked 😄.

The overall approach seems sounds, though it would be good if someone else that is super familiar with Jekyll, and what issues we have run into before (such as @parkr), could give the gem a read through.

@parkr parkr added the needs-work label May 2, 2016

Cleanup EntryFilter and make it far more robust.
* Allow users to filter directories by ending their path with "/"
* Allow users to filter with a Regexp, some scenariors can really require it.
* Use Pathutil#in_path? for Symlink verification, it real/expand.

This also requires some downstream work in "jekyll-watch" which at this time is
not very robust, it doesn't recognize the difference either, and should probably
start doing so (what I mean is detecting "/" and using the full path.)
- File.join(base_directory, entry)
+ File.join(
+ base_directory, entry
+ )

This comment has been minimized.

@parkr

parkr May 12, 2016

Member

These sorts of changes should not be included in pull requests like this. They muddy up the diff so it's harder to review.

@parkr

parkr May 12, 2016

Member

These sorts of changes should not be included in pull requests like this. They muddy up the diff so it's harder to review.

This comment has been minimized.

@envygeeks

envygeeks May 12, 2016

Contributor

Your argument is a facetious Catch-22 based on what you've expressed before about committing to master.

@envygeeks

envygeeks May 12, 2016

Contributor

Your argument is a facetious Catch-22 based on what you've expressed before about committing to master.

This comment has been minimized.

@parkr

parkr May 12, 2016

Member

@envygeeks My argument is: keep pull request diffs to content changes because it's easier to review. I don't think that's facetious at all – if you commit style changes to master, so be it, but don't include the style changes in pull requests if you can help it.

I'm going to fix up rubocop errors soon and add that to the CI pipeline so that style changes can be handled by Travis.

@parkr

parkr May 12, 2016

Member

@envygeeks My argument is: keep pull request diffs to content changes because it's easier to review. I don't think that's facetious at all – if you commit style changes to master, so be it, but don't include the style changes in pull requests if you can help it.

I'm going to fix up rubocop errors soon and add that to the CI pipeline so that style changes can be handled by Travis.

This comment has been minimized.

@envygeeks

envygeeks May 12, 2016

Contributor

I agree then! If we can make minor style changes in master, I'll remove any and all non-actual changes out of this!

@envygeeks

envygeeks May 12, 2016

Contributor

I agree then! If we can make minor style changes in master, I'll remove any and all non-actual changes out of this!

This comment has been minimized.

@parkr

parkr May 12, 2016

Member

@envygeeks Once we get Rubocop more stable and all the code working with it, then we can let CI determine style changes. I personally don't like the style you have here, so we'll have to hash out some of those differences in that PR.

@parkr

parkr May 12, 2016

Member

@envygeeks Once we get Rubocop more stable and all the code working with it, then we can let CI determine style changes. I personally don't like the style you have here, so we'll have to hash out some of those differences in that PR.

This comment has been minimized.

@envygeeks

envygeeks May 12, 2016

Contributor

Also @parkr if you want I've already got all that done, I just never pushed it up. If you want I'll push it.

@envygeeks

envygeeks May 12, 2016

Contributor

Also @parkr if you want I've already got all that done, I just never pushed it up. If you want I'll push it.

This comment has been minimized.

@envygeeks

envygeeks May 12, 2016

Contributor

Sorry, let me clarify, the Rubocop on Travis-CI is already done, I just never pushed it.

@envygeeks

envygeeks May 12, 2016

Contributor

Sorry, let me clarify, the Rubocop on Travis-CI is already done, I just never pushed it.

- glob_include?(site.include, entry)
+ glob_include?(site.include,
+ entry
+ )

This comment has been minimized.

@parkr

parkr May 12, 2016

Member

Which is it – both arguments on a new line or just the second? Let's get a rubocop rule in for this.

@parkr

parkr May 12, 2016

Member

Which is it – both arguments on a new line or just the second? Let's get a rubocop rule in for this.

@parkr parkr changed the title from Cleanup EntryFilter and make it far more robust. to Move EntryFilter to use Pathutil & fix `glob_include?` May 12, 2016

@parkr

This comment has been minimized.

Show comment
Hide comment
@parkr

parkr May 12, 2016

Member

@jekyllbot: merge +minor

Member

parkr commented May 12, 2016

@jekyllbot: merge +minor

@jekyllbot jekyllbot merged commit 1504864 into master May 12, 2016

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

@jekyllbot jekyllbot deleted the feature/allow-excludes-to-leave-the-insane-asylum branch May 12, 2016

@parkr parkr removed the needs-work label May 12, 2016

jekyllbot added a commit that referenced this pull request May 12, 2016

@parkr parkr added this to the 3.2 milestone May 12, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment