-
-
Notifications
You must be signed in to change notification settings - Fork 10k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move EntryFilter to use Pathutil & fix glob_include?
#4859
Move EntryFilter to use Pathutil & fix glob_include?
#4859
Conversation
This starts the move from Pathname to Pathutil, so that we can get a slightly faster Pathname as Pathutil with the addition of encoded and normalized file read and writes (which should solve a lot of our problems with international users.) And a wrapped SafeYAML that moves between Psych#safe and SafeYAML. |
I like this PR but it has style issues. Could things be cleaned up so that, e.g. newlines are not added unless it exceeds 90 characters wide? Empty comments should be removed, etc. |
@@ -36,4 +36,5 @@ Gem::Specification.new do |s| | |||
s.add_runtime_dependency('rouge', '~> 1.7') | |||
s.add_runtime_dependency('jekyll-sass-converter', '~> 1.0') | |||
s.add_runtime_dependency('jekyll-watch', '~> 1.1') | |||
s.add_runtime_dependency("pathutil", "~> 0.9") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jekyll/security Possibility for v3.2 is the addition of this gem as a dependency. Pathutil is a pure-Ruby implementation of a Pathname-like interface which has some extra features and also offers a speed increase. 👀 ok to you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the main driver for using it? Is it performance, centralization of security logic, or something else? I haven't looked through these changes in depth, but, are you going to be leaning on some of the security provided by pathutil
? For example, it seems that gem has some features like safe_copy
and enforce_root
. But, those features don't seem super well tested (ex. https://github.com/envygeeks/pathutil/blob/d03bd830f629dc820bdf791d06011a42e7be35a8/spec/tests/lib/pathutil_spec.rb#L1010-L1049). If you are going to lean on those features it would be nice for them to be much more rigorously tested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ptoomey3 scroll down a few lines...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ptoomey3 the tests for those that method is more like: https://github.com/envygeeks/pathutil/blob/d03bd830f629dc820bdf791d06011a42e7be35a8/spec/tests/lib/pathutil_spec.rb#L962-L1174
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ptoomey3 Yes, the plan would be to lean on those safe_*
methods where possible so changes to Jekyll core could be focused more on features/bug fixes. It would be nice to know that wherever we read using Pathutil, we were safe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ptoomey3 yes, we don't respond to some methods from Pathname like cleanpath (which Jekyll doesn't use) because we do all that internally where we can or let StdLib deal with most of it, but I'm happy to implement it if you like. For the most part, 90% of the methods that we have on that class are delegated, with the exception of a few:
#read
diverges, it normalizes and encodes by default (this was designed for Jekyll.)#write
diverges, it normalizes and encoded by default (this was designed for Jekyll.)- We offer
read_yaml
andread_json
both written with my own library in mind and Jekyll. Right now Jekyll relies heavily on SafeYAML, while YAML itself offers Yaml#safe so this wraps around both and looks for the latter while falling back to the former. - We offer >, <, >=, <=, =~ and others which check if it's below, above, or like and also allows for a Regexp match of the said, sometimes people want Regexp's, so we made it available.
- Our glob is "safer", in that it will
chdir
before doing the glob to avoid any potential issues with globbing, this was a bug in Jekyll. - Some methods that Pathname had customized, we route to their real classes directly (those can all be found at the bottom of the class, this is where most of the methods come from, we don't directly do them, we actually delegate them out.)
On the #read
, #write
when a read comes in and it's Windows, we actually convert to Linux and then if it's Windows we convert back to Windows. We enforce UTF-8 by default, which should solve a lot of Jekyll problems, and we also allow the default encoding to be set so Jekyll's encoding can now be Pathutil's encoding. That specific part was designed explicitly for Jekyll and falls along the lines of what I had mentioned in the past "lets normalize everything and then normalize it back out."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as all this discussion, I thought this was a security review, not a review of features Jekyll doesn't utilize and have no impact on Jekyll in total 😜
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To add another note, #cleanpath
is the only method we do not respond to, as we explicitly check that we respond to all methods but cleanpath
which we explicitly ignored.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hehe...I was just wanting to make sure I understood, as I was assuming that any non-explicitly implemted behavior would have been delegated to Pathname
under the covers. Given that any behavioral change to Pathname
could have security consequence, I wanted to be clear on what/how things are delegated. Thanks for the clarification.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we respond to all methods but cleanpath which we explicitly ignored.
Too funny..I literally grabbed cleanpath
randomly from the docs to test how the delegation worked 😄.
The overall approach seems sounds, though it would be good if someone else that is super familiar with Jekyll, and what issues we have run into before (such as @parkr), could give the gem a read through.
* Allow users to filter directories by ending their path with "/" * Allow users to filter with a Regexp, some scenariors can really require it. * Use Pathutil#in_path? for Symlink verification, it real/expand. This also requires some downstream work in "jekyll-watch" which at this time is not very robust, it doesn't recognize the difference either, and should probably start doing so (what I mean is detecting "/" and using the full path.)
e5d8223
to
3751b47
Compare
File.join(base_directory, entry) | ||
File.join( | ||
base_directory, entry | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These sorts of changes should not be included in pull requests like this. They muddy up the diff so it's harder to review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your argument is a facetious Catch-22 based on what you've expressed before about committing to master.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@envygeeks My argument is: keep pull request diffs to content changes because it's easier to review. I don't think that's facetious at all – if you commit style changes to master, so be it, but don't include the style changes in pull requests if you can help it.
I'm going to fix up rubocop errors soon and add that to the CI pipeline so that style changes can be handled by Travis.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree then! If we can make minor style changes in master, I'll remove any and all non-actual changes out of this!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@envygeeks Once we get Rubocop more stable and all the code working with it, then we can let CI determine style changes. I personally don't like the style you have here, so we'll have to hash out some of those differences in that PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also @parkr if you want I've already got all that done, I just never pushed it up. If you want I'll push it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, let me clarify, the Rubocop on Travis-CI is already done, I just never pushed it.
glob_include?
@jekyllbot: merge +minor |
This also requires some downstream work in "jekyll-watch" which at this time is
not very robust, it doesn't recognize the difference either, and should probably
start doing so (what I mean is detecting "/" and using the full path.)