Do not shellescape URLs since it borks params #312
@sigmavirus24 Got a second to see if you agree with me that this has minimal security implications? |
Ugh. Our wkhtmltopdf source is 404ing. I'll fix that in another commit, but likely not tonight. This will wait on that. |
So... I'm trying to decide. I think we could mitigate this, but no matter what this will have security implications. Since we interpolate this into the command, we're going to have something like this PDFKit.new('https://google.com/search?q=pdfkit; do_something #') Which would be interpolated into:
Which is what @devn was trying to avoid by using shellescape. We can take solace in a few things. For one, we can probably use URI tools to normalize the URI we're provided. That said, I'm not sure if the uri library in Ruby does RFC 3986 normalization. I would guess that Addressable does. One of us would have to look into this though. |
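To make the trade-off concrete, here is a minimal sketch of what naive interpolation produces compared with shellescaping. The `wkhtmltopdf ... out.pdf` command line and both helper names are simplified stand-ins for illustration, not PDFKit's actual command builder.

```ruby
require 'shellwords'

# Hypothetical, simplified command builder (not PDFKit's real one).
# The URL is pasted into the command string as-is, so a shell would see
# ';' as a command separator and '#' as the start of a comment.
def naive_command(url)
  "wkhtmltopdf #{url} out.pdf"
end

# Shellwords.escape backslash-escapes every character outside a small
# safe set. That neutralizes ';' and '#', but it also rewrites '?', '='
# and '&', which is where query parameters live.
def escaped_command(url)
  "wkhtmltopdf #{Shellwords.escape(url)} out.pdf"
end

url = 'https://google.com/search?q=pdfkit; do_something #'
puts naive_command(url)
puts escaped_command(url)
```

The first line printed contains a second, attacker-controlled command; the second keeps the URL as a single shell word at the cost of backslashes in front of the query-string characters.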
Looks like the uri library would warn us of problematic URLs:
~ ❯❯❯ irb
1.9.3-p448 :001 > require 'uri'
=> true
1.9.3-p448 :002 > google = 'https://google.com/search?q=pdfkit; do_something # --args'
=> "https://google.com/search?q=pdfkit; do_something # --args"
1.9.3-p448 :003 > u = URI(google)
URI::InvalidURIError: bad URI(is not URI?): https://google.com/search?q=pdfkit; do_something # --args
from /usr/local/rvm/rubies/ruby-1.9.3-p448/lib/ruby/1.9.1/uri/common.rb:176:in `split'
from /usr/local/rvm/rubies/ruby-1.9.3-p448/lib/ruby/1.9.1/uri/common.rb:211:in `parse'
from /usr/local/rvm/rubies/ruby-1.9.3-p448/lib/ruby/1.9.1/uri/common.rb:747:in `parse'
from /usr/local/rvm/rubies/ruby-1.9.3-p448/lib/ruby/1.9.1/uri/common.rb:994:in `URI'
from (irb):3
from /usr/local/rvm/rubies/ruby-1.9.3-p448/bin/irb:16:in `<main>'
That says to me that if that barfs, so can we. |
Spaces are always encoded in a URI as either |
Interestingly, we do not consider that URL to be HTML, which means this change wouldn't affect that URL.
|
Hahaha, can I take back that moment of insanity? Totally wrong. License to code revoked! |
Okay, so, I'm cool with bubbling up an error on initialize if the source is not HTML and the URL is not parseable by URI. |
So even attempting to get the URL would be a problem. We should parse the URI first. Then we should bubble up an error immediately. Even attempting to get that URL would result in a shell injection vulnerability and a new CVE-ID |
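A minimal sketch of the "parse first, fail immediately" idea follows. The helper name `validate_source!` and the `/\Ahttp/` test for deciding what counts as a URL source are assumptions made for illustration; they are not taken from this PR.

```ruby
require 'uri'

# Assumption: a String starting with "http" is treated as a URL source;
# anything else (raw HTML, for example) is passed through untouched.
def validate_source!(source)
  return source unless source.is_a?(String) && source.match?(/\Ahttp/)

  URI.parse(source) # raises URI::InvalidURIError on an unparseable URL
  source
end

validate_source!('https://google.com/search?q=pdfkit') # returns the URL
validate_source!('<p>Some HTML; not a URL</p>')        # returned untouched

begin
  validate_source!('https://google.com/search?q=pdfkit; do_something # --args')
rescue URI::InvalidURIError => e
  puts "rejected before any command is built: #{e.message}"
end
```

Because the check runs before any command string exists, an unparseable URL never gets near the shell.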
Can I just take this moment to complain about how Python's standard library doesn't barf on a URL like that? |
Well, if we're on the subject of complaints, can I take a moment to complain that the implementation of Date.parse in JS (now and ES6) is browser dependent? |
In any case, fix should be ^^ |
Happy path URL and non-URL sources are implicitly tested in the rest of spec/source_spec.rb. |
Have I said lately how glad I am not to have to deal with JS in the browser anymore? I'd still like @wuest to review this if she has the chance. I'm also unsure if we should maybe raise our own error as a result. It might be a better user experience than having to do
require 'pdfkit'
require 'uri'
begin
  PDFKit.new(...)
rescue URI::InvalidURIError
  # do something else
end
Whereas someone would just have to do
require 'pdfkit'
begin
  PDFKit.new('..')
rescue PDFKit::InvalidURIError
  # ...
end |
Yeah, definitely, a second opinion would be great. I wonder if @wuest and I met at Ruby on Ales at a Mexican restaurant... but I have a bad memory. I'm fine with a new Maybe an argument for better documentation? |
I've been revamping documentation on three projects for a while now. Finished toolbelt.readthedocs.org, working on rfc3986.readthedocs.org and betamax.readthedocs.org and will be continuing work on github3.readthedocs.org too. Docs are important and we should probably try harder to have some. |
Yeah... Not sure that's a time commitment I'm able to make right now. :-/ But good to keep in mind. |
Just catching up, sorry for taking so long! Looking at the new test expected to fail, why is
2.2.2 :001 > require 'uri'
=> true
2.2.2 :002 > URI::escape('https://google.com/search?q=pdfkit; do_something # --args&do_other_thing', /[^\w\d\-_+\/:\.?=&%]/)
=> "https://google.com/search?q=pdfkit%3B%20do_something%20%23%20--args&do_other_thing"
2.2.2 :003 > URI(URI::escape('https://google.com/search?q=pdfkit; do_something # --args&do_other_thing', /[^\w\d\-_+\/:\.?=&%]/))
=> #<URI::HTTPS https://google.com/search?q=pdfkit%3B%20do_something%20%23%20--args&do_other_thing>
The problem that still remains is that
@cdwort I wasn't at Ruby on Ales, but perhaps we met at Madison+ Ruby last year? |
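For anyone trying the irb session above on a current Ruby: `URI::escape` was removed in Ruby 3.0. The sketch below reproduces the same escaping with `String#gsub` and the same "unsafe" character class; `escape_unsafe` and the `UNSAFE` constant are hypothetical names, and this is a stand-in for illustration, not the code in this PR.

```ruby
require 'uri'

# Same character class as in the irb session: anything outside it
# is percent-encoded.
UNSAFE = /[^\w\d\-_+\/:\.?=&%]/

# Percent-encodes each unsafe character byte by byte, so multi-byte
# characters become several %XX sequences.
def escape_unsafe(url)
  url.gsub(UNSAFE) { |char| char.bytes.map { |b| format('%%%02X', b) }.join }
end

raw = 'https://google.com/search?q=pdfkit; do_something # --args&do_other_thing'
escaped = escape_unsafe(raw)
puts escaped
puts URI.parse(escaped).class
```

Note that `&` stays unescaped here, exactly as in the session above, so the query-string separator survives.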
So That said, URLs aren't technically allowed to have |
Yeah, I guess I don't know enough about URLs. To my eye
does not look like a valid URL. That said, I'm happy not policing the URL - generally that is something that is under the user's control. Given that we support the middleware, however, and allow arbitrary URLs to be sent in, it seems worthwhile to have some protection. |
Here's a totally valid URL which doesn't TECHNICALLY require escaping in order to be considered "correct" by Do any modern servers actually have problems with |
Yeah, you're right, that URI needs some kind of escaping. My first instinct is to place the URL in
Which would translate to
=( |
So it would seem that the best way forward would be:
I'm sure there are even worse URLs that we could run into, but I think this will give us the flexibility that we need to fix the issue and some small bit of comfort in the fact that we're trying to protect the user's system. Do you agree @wuest? |
Looks good at a glance! |
Alright, so now passing, but here's a question: This is a breaking change for our users who ARE currently escaping their URLs:
Is that something we want to do? I'm guessing not. The only way I can think of to fix that is to check whether the decoded URL is the same as the URL passed in. That's a little ugly. |
Also, now that we are not shellescaping ampersands, are we going to run into issues like #207, where the ampersand sends the command to the background? |
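For context on the ampersand concern: `&`, `;` and `#` only have special meaning when a shell parses the command string. The sketch below passes the arguments as a list, in which case Ruby spawns the process directly without a shell. This is a different technique from what this PR does and is shown only to illustrate the mechanism; `echo_argument` is a hypothetical helper that uses the current Ruby interpreter as the child process so the example needs no external binary.

```ruby
require 'open3'
require 'rbconfig'

# Runs a child Ruby process that prints its first argument back.
# Because the command is given as separate arguments rather than one
# string, no shell is involved and metacharacters are passed literally.
def echo_argument(arg)
  stdout, _status = Open3.capture2(RbConfig.ruby, '-e', 'print ARGV[0]', arg)
  stdout
end

url = 'https://example.com/search?a=1&b=2; echo injected #'
puts echo_argument(url)
```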
Interestingly enough, WickedPDF doesn't bother with escaping any aspects of the shell command. |
@cdwort that logic would be convenient, but:
URI::decode("https://www.google.com/search?q='cat<dev/zero>/dev/null'")
# => "https://www.google.com/search?q='cat<dev/zero>/dev/null'"
URI("https://www.google.com/search?q='cat<dev/zero>/dev/null'")
URI::InvalidURIError: bad URI(is not URI?): https://www.google.com/search?q='cat<dev/zero>/dev/null'
from /usr/local/rvm/rubies/ruby-1.9.3-p448/lib/ruby/1.9.1/uri/common.rb:176:in `split'
...
That said, we can probably do
def url_needs_escaping?
  begin
    URI(@source)
  rescue URI::InvalidURIError
    return false
  end
  URI::decode(@source) == @source
end
Since
This is why I suggested:
As step 2 of fixing this. Although now it would be |
f971a77 fixes the issue you ran into (this is the solution to which @sigmavirus24 was referring). |
@wuest Okay, I had that solution and it gave me a junked up URL, but I think that was just outdated code. Thanks for setting that straight. |
Removed the ready label as I would like to cut a release before this goes out (whenever we think we're done with it). |
Confirmed that we do not support multiple URLs, which is a shame, but keeps this from being a breaking change. Release has also gone out, so no blockers at the moment. |
This needs to be rebased |
Alright, so after a long-fought battle I have a working-ish Windows environment in which to test this. It works in smoke tests! Obviously a quick smoke test isn't the best, but at least it's better than nothing. In any case, I no longer have concerns about Windows compatibility for this or #316. However, this will wait for #316 because those refactors will make merge conflicts much easier to handle. I'm traveling this weekend so I may not get to finishing that up until Sunday or early next week. Sorry for the delay on this. Any remaining concerns @sigmavirus24 besides need for a rebase? |
b8334f1 to 57b1da6
Rebased, passing and ready to go @sigmavirus24. |
I'll merge this tonight unless there are concerns. |
end

def url_needs_escaping?
  URI::decode(@source) == @source
So if we're quoting the URI, this still won't work.
1.9.3-p448 :001 > url = 'https://google.com/search?q="foobarbogus"'
=> "https://google.com/search?q=\"foobarbogus\""
1.9.3-p448 :002 > require 'uri'
=> true
1.9.3-p448 :003 > URI::decode(url)
=> "https://google.com/search?q=\"foobarbogus\""
That's a simple example that would allow someone to use something like
1.9.3-p448 :001 > url = 'https://google.com/search?q="useradd -p secrete hacked"'
=> "https://google.com/search?q=\"useradd -p secrete hacked\""
1.9.3-p448 :002 > require 'uri'
=> true
1.9.3-p448 :003 > URI::decode(url)
=> "https://google.com/search?q=\"useradd -p secrete hacked\""
We really need to first parse the URI to make sure it's valid before checking if the decoded value is the same as the source. If it's not valid then the decoded URI being equal to the original is just a worthless check.
I'm not opposed to that, and I'm out of my element here, so apologies if I sound like a broken record repeating this question again:
I thought we determined that we wanted to support urls like:
https://www.google.com/search?q='cat<dev/zero>/dev/null'
Maybe I misread that in the discussion? It's what this PR uses as the example of something that needs escaping, but that we want to allow. I'm just not sure how to do that while not supporting:
"https://google.com/search?q=\"useradd -p secrete hacked\""
For example:
[14] pry(main)> url = "https://www.google.com/search?q='cat<dev/zero>/dev/null'"
=> "https://www.google.com/search?q='cat<dev/zero>/dev/null'"
[15] pry(main)> bad_url = "https://google.com/search?q=\"useradd -p secrete hacked\""
=> "https://google.com/search?q=\"useradd -p secrete hacked\""
[16] pry(main)> URI::parse(url)
URI::InvalidURIError: bad URI(is not URI?): https://www.google.com/search?q='cat<dev/zero>/dev/null'
from /Users/aunger/.rvm/rubies/ruby-2.0.0-p598/lib/ruby/2.0.0/uri/common.rb:176:in `split'
[17] pry(main)> URI::parse(bad_url)
URI::InvalidURIError: bad URI(is not URI?): https://google.com/search?q="useradd -p secrete hacked"
from /Users/aunger/.rvm/rubies/ruby-2.0.0-p598/lib/ruby/2.0.0/uri/common.rb:176:in `split'
[18] pry(main)> URI::parse(URI::escape(url))
=> #<URI::HTTPS:0x007ffbc1baf530 URL:https://www.google.com/search?q='cat%3Cdev/zero%3E/dev/null'>
[19] pry(main)> URI::parse(URI::escape(bad_url))
=> #<URI::HTTPS:0x007ffbc1b76fc8 URL:https://google.com/search?q=%22useradd%20-p%20secrete%20hacked%22>
I've been reading this logic backwards. Ignore me. This looks fine. 😞
@sigmavirus24 Thanks for taking a look - I know you're busy. I'm very grateful for the peer review on this! |
Do not shellescape URLs since it borks params
For record keeping, this fixes #284 |
Fixes #166
Current Functionality (master):
New Functionality (this PR):
To accomplish this, this PR walks back some security measures implemented in #164.
I don't know enough about bash exploits, so forgive the poor examples below.
This would impact those using PDFKit who allow raw user input in the URL (e.g. www.mysite.com/echo /etc/passwd) or in the HTML (e.g. "<p>Your PDF Content: echo /etc/passwd</p>").
On the other hand, I think we need to support query parameters. So, unless someone sees a better solution, I'd say let's fix this.