Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move EntryFilter to use Pathutil & fix glob_include? #4859

Merged
merged 1 commit into from
May 12, 2016

Conversation

envygeeks
Copy link
Contributor

  • Allow users to filter directories by ending their path with "/"
  • Allow users to filter with a Regexp, some scenariors can really require it.
  • Use Pathutil#in_path? for Symlink verification, it real/expand.

This also requires some downstream work in "jekyll-watch" which at this time is
not very robust, it doesn't recognize the difference either, and should probably
start doing so (what I mean is detecting "/" and using the full path.)

@envygeeks
Copy link
Contributor Author

This starts the move from Pathname to Pathutil, so that we can get a slightly faster Pathname as Pathutil with the addition of encoded and normalized file read and writes (which should solve a lot of our problems with international users.) And a wrapped SafeYAML that moves between Psych#safe and SafeYAML.

@parkr
Copy link
Member

parkr commented May 2, 2016

I like this PR but it has style issues. Could things be cleaned up so that, e.g. newlines are not added unless it exceeds 90 characters wide? Empty comments should be removed, etc.

@@ -36,4 +36,5 @@ Gem::Specification.new do |s|
s.add_runtime_dependency('rouge', '~> 1.7')
s.add_runtime_dependency('jekyll-sass-converter', '~> 1.0')
s.add_runtime_dependency('jekyll-watch', '~> 1.1')
s.add_runtime_dependency("pathutil", "~> 0.9")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jekyll/security Possibility for v3.2 is the addition of this gem as a dependency. Pathutil is a pure-Ruby implementation of a Pathname-like interface which has some extra features and also offers a speed increase. 👀 ok to you?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the main driver for using it? Is it performance, centralization of security logic, or something else? I haven't looked through these changes in depth, but, are you going to be leaning on some of the security provided by pathutil? For example, it seems that gem has some features like safe_copy and enforce_root. But, those features don't seem super well tested (ex. https://github.com/envygeeks/pathutil/blob/d03bd830f629dc820bdf791d06011a42e7be35a8/spec/tests/lib/pathutil_spec.rb#L1010-L1049). If you are going to lean on those features it would be nice for them to be much more rigorously tested.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ptoomey3 scroll down a few lines...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ptoomey3 Yes, the plan would be to lean on those safe_* methods where possible so changes to Jekyll core could be focused more on features/bug fixes. It would be nice to know that wherever we read using Pathutil, we were safe.

Copy link
Contributor Author

@envygeeks envygeeks May 2, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ptoomey3 yes, we don't respond to some methods from Pathname like cleanpath (which Jekyll doesn't use) because we do all that internally where we can or let StdLib deal with most of it, but I'm happy to implement it if you like. For the most part, 90% of the methods that we have on that class are delegated, with the exception of a few:

  • #read diverges, it normalizes and encodes by default (this was designed for Jekyll.)
  • #write diverges, it normalizes and encoded by default (this was designed for Jekyll.)
  • We offer read_yaml and read_json both written with my own library in mind and Jekyll. Right now Jekyll relies heavily on SafeYAML, while YAML itself offers Yaml#safe so this wraps around both and looks for the latter while falling back to the former.
  • We offer >, <, >=, <=, =~ and others which check if it's below, above, or like and also allows for a Regexp match of the said, sometimes people want Regexp's, so we made it available.
  • Our glob is "safer", in that it will chdir before doing the glob to avoid any potential issues with globbing, this was a bug in Jekyll.
  • Some methods that Pathname had customized, we route to their real classes directly (those can all be found at the bottom of the class, this is where most of the methods come from, we don't directly do them, we actually delegate them out.)

On the #read, #write when a read comes in and it's Windows, we actually convert to Linux and then if it's Windows we convert back to Windows. We enforce UTF-8 by default, which should solve a lot of Jekyll problems, and we also allow the default encoding to be set so Jekyll's encoding can now be Pathutil's encoding. That specific part was designed explicitly for Jekyll and falls along the lines of what I had mentioned in the past "lets normalize everything and then normalize it back out."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as all this discussion, I thought this was a security review, not a review of features Jekyll doesn't utilize and have no impact on Jekyll in total 😜

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To add another note, #cleanpath is the only method we do not respond to, as we explicitly check that we respond to all methods but cleanpath which we explicitly ignored.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hehe...I was just wanting to make sure I understood, as I was assuming that any non-explicitly implemted behavior would have been delegated to Pathname under the covers. Given that any behavioral change to Pathname could have security consequence, I wanted to be clear on what/how things are delegated. Thanks for the clarification.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we respond to all methods but cleanpath which we explicitly ignored.

Too funny..I literally grabbed cleanpath randomly from the docs to test how the delegation worked 😄.

The overall approach seems sounds, though it would be good if someone else that is super familiar with Jekyll, and what issues we have run into before (such as @parkr), could give the gem a read through.

* Allow users to filter directories by ending their path with "/"
* Allow users to filter with a Regexp, some scenariors can really require it.
* Use Pathutil#in_path? for Symlink verification, it real/expand.

This also requires some downstream work in "jekyll-watch" which at this time is
not very robust, it doesn't recognize the difference either, and should probably
start doing so (what I mean is detecting "/" and using the full path.)
@envygeeks envygeeks force-pushed the feature/allow-excludes-to-leave-the-insane-asylum branch from e5d8223 to 3751b47 Compare May 11, 2016 01:01
File.join(base_directory, entry)
File.join(
base_directory, entry
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These sorts of changes should not be included in pull requests like this. They muddy up the diff so it's harder to review.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your argument is a facetious Catch-22 based on what you've expressed before about committing to master.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@envygeeks My argument is: keep pull request diffs to content changes because it's easier to review. I don't think that's facetious at all – if you commit style changes to master, so be it, but don't include the style changes in pull requests if you can help it.

I'm going to fix up rubocop errors soon and add that to the CI pipeline so that style changes can be handled by Travis.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree then! If we can make minor style changes in master, I'll remove any and all non-actual changes out of this!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@envygeeks Once we get Rubocop more stable and all the code working with it, then we can let CI determine style changes. I personally don't like the style you have here, so we'll have to hash out some of those differences in that PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also @parkr if you want I've already got all that done, I just never pushed it up. If you want I'll push it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, let me clarify, the Rubocop on Travis-CI is already done, I just never pushed it.

@parkr parkr changed the title Cleanup EntryFilter and make it far more robust. Move EntryFilter to use Pathutil & fix glob_include? May 12, 2016
@parkr
Copy link
Member

parkr commented May 12, 2016

@jekyllbot: merge +minor

@jekyllbot jekyllbot merged commit 1504864 into master May 12, 2016
@jekyllbot jekyllbot deleted the feature/allow-excludes-to-leave-the-insane-asylum branch May 12, 2016 17:16
@parkr parkr removed the needs-work label May 12, 2016
jekyllbot added a commit that referenced this pull request May 12, 2016
@parkr parkr added this to the 3.2 milestone May 12, 2016
@jekyll jekyll locked and limited conversation to collaborators Jul 5, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants