New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add failing spec for parsing invalid query bytes #624
Conversation
I've inserted a middleware before I'm not sure how common a problem this is, but it is really easy to replicate. Without investigating the stack trace and headers it would appear to be an obscure encoding error. |
Hi, I'm not 100% sure how to handle this. I don't think invalid unicode character should make Rack blow up. At the very least, we should hand the data to the app somehow. Second, I'm not sure the test is 100% accurate. In the string literal will be tagged as the file's encoding, but I think (I'm not 100% sure) that the strings passed to this function come off the socket and will be tagged as ASCII-8BIT. So the test should probably be this: should "parse nested queries with invalid bytes correctly" do
x = "fo\255[]=1"
x.force_encoding Encoding::ASCII_8BIT
Rack::Utils.parse_nested_query(x)
end It still blows up, just not in the same place. Here is a potential fix: diff --git a/lib/rack/utils.rb b/lib/rack/utils.rb
index 43bbef3..d4b730a 100644
--- a/lib/rack/utils.rb
+++ b/lib/rack/utils.rb
@@ -91,7 +91,7 @@ module Rack
params = KeySpaceConstrainedParams.new
(qs || '').split(d ? /[#{d}] */n : DEFAULT_SEP).each do |p|
- k, v = p.split('=', 2).map { |s| unescape(s) }
+ k, v = p.split('=', 2).map { |s| unescape(s, Encoding::ASCII_8BIT) }
normalize_params(params, k, v)
end
@@ -107,9 +107,12 @@ module Rack
return if k.empty?
+ k.force_encoding Encoding::UTF_8 if k
+
if after == ""
params[k] = v
elsif after == "[]"
+ v.force_encoding Encoding::UTF_8 if v
params[k] ||= []
raise TypeError, "expected Array (got #{params[k].class.name}) for param `#{k}'" unless params[k].is_a?(Array)
params[k] << v We tag the strings with the encoding after they've been parsed. It won't blow up in Rack anymore. You'll be able to get the data, but the encoding may potentially be invalid: require 'rack/utils'
x = "fo\255[]=1"
x.force_encoding Encoding::ASCII_8BIT
z = Rack::Utils.parse_nested_query(x)
p z
z.each do |k,v|
p [k, k.encoding, k.valid_encoding?]
v.each do |v|
p [v, v.encoding, v.valid_encoding?]
end
end output:
It may be invalid, but at least the application has a chance to do something with the data. For example: require 'rack/utils'
x = "fo\255[]=1"
x.force_encoding Encoding::ASCII_8BIT
z = Rack::Utils.parse_nested_query(x)
p z
z.each do |k,v|
unless k.valid_encoding? # clean up invalid strings
k = k.dup.encode!(:invalid => :replace)
end
p [k, k.encoding, k.valid_encoding?]
v.each do |v|
p [v, v.encoding, v.valid_encoding?]
end
end Output:
Or the application could just choose to raise an exception. I think this is the way to go, but I'm open to suggestions! :-) |
So, how do I write an app which requires latin-1 encoded parameter strings? |
@chneukirchen We should probably allow people to pass an encoding to |
@tenderlove With the patch applied and the original test an
Changing the test over to force the encoding, as you proposed in turn throws a should "parse nested queries with invalid bytes correctly" do
x = "fo\255[]=1".force_encoding(Encoding::ASCII_8BIT)
Rack::Utils.parse_nested_query(x).should.equal("fo\xAD" => ["1"])
end The effected Rails app has a request spec that covers the scenario (and passes with the In the request spec the body is pure binary data, it is a portion of a |
After thinking about it for a while, I came to the same conclusion. If we don't know the encoding of the parameters, we can't parse it at all. |
@tenderlove I haven't forgotten about this. I'll take a stab at bypassing parsing altogether sometime today or tomorrow. |
fyi whitequark/rack-utf8_sanitizer#7 and note that Rack specifies
|
@@ -204,6 +204,10 @@ def kcodeu | |||
message.should.equal "expected Array (got String) for param `y'" | |||
end | |||
|
|||
should "parse nested queries with invalid bytes correctly" do | |||
Rack::Utils.parse_nested_query("fo\255[]=1") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this missing the test that it doesn't raise an ArgumentError. I recently wrote something like this for a Rails app I work on, though in rspec
# coding: utf-8
require 'spec_helper'
require_dependency Rails.root.join('lib/string_formatter').to_s
describe StringFormatter do
it '#hash_values_as_utf8' do
h = {
:foo => "Schäfbernt\xED",
'bar' => 1,
'baz' => :qux,
}.with_indifferent_access
expect {
h[:foo].present?
}.to raise_error(ArgumentError) {|e|
e.message == "invalid byte sequence in UTF-8"
}
StringFormatter.hash_values_as_utf8!(h)
expect(h[:foo].present?).to be_true
expect(h[:foo]).to eq("Schäfbernt")
expect(h[:bar]).to eq(1)
expect(h[:baz]).to eq(:qux)
end
end
and the impl
# coding: utf-8
module StringFormatter
module_function
def hash_values_as_utf8!(hash)
hash.each do |key, value|
next unless value.is_a?(String)
hash[key] = StringFormatter.as_utf8(value, '')
end
end
def as_utf8(str, replace = '_')
if str.respond_to?(:encode!)
if defined?(JRUBY_VERSION)
# don't blow up https://github.com/jruby/jruby/issues/375
str = as_ascii(str)
else
# Converting it to a higher higher character set (UTF-16) and then
# back (to UTF-8) ensures that you will strip away invalid or undefined byte sequences.
str.encode!(Encoding::UTF_16LE, :invalid => :replace, :undef => :replace, :replace => replace)
end
str.encode!(Encoding::UTF_8)
else
str
end
rescue => e
Airbrake.notify(e)
force_utf8_encoding(str)
end
# help out copy and pasting errors of good-looking email addresses
# by stripping out non-ASCII characters
def as_ascii(str)
if str.respond_to?(:to_ascii)
str.to_ascii
else
# avoids invalid multi-byte escape error
ascii_text = str.encode( Encoding::ASCII, invalid: :replace, undef: :replace, replace: '' )
# see http://www.ruby-forum.com/topic/183413
pattern = Regexp.new('[\x80-\xff]', nil, 'n')
ascii_text.gsub(pattern, '')
end
end
end
For 1.8, I have seen something like the below using BINARY aka ASCII-8BIT
# Encodes a string from encoding "from" to encoding "to" in
# a way that works for both ruby 1.8 and 1.9
def convert_string_encoding(to, from, str)
if "1.9".respond_to?(:force_encoding)
str = str.dup if str.frozen?
str.encode(to, from, :undef => :replace)
else
require 'iconv'
Iconv.conv(to, from, str)
end
end
RE: "Unfortunately I've seen this error thousands of times, and there is no way for me to stop the exception from happening inside my application's stack." @sorentwo Yes, you can write a middleware that peeks at the request body to see if it is indeed form data, or if it's arbitrary binary. A magic bytes scanner should get you most of the way there, and you can do this before anyone ever creates a Rack::Request. If you can classify the mime type from the request data, just correct the header and pass it on to the app, if you find form data that isn't valid form data, fire a 400. https://github.com/minad/mimemagic or https://github.com/blackwinter/ruby-filemagic should get you started. |
I should have followed this up (months ago), but that is exactly what I did. Our app only saw this for certain endpoints so my solution was a bit more ad-hoc. Clearly my efforts here aren't getting anywhere, and there are workarounds available, so I'm going to close this out. |
There's just not a lot we can do about invalid requests without breaking users apps. It's not valid for us, being a generic middle-of-the-stack library to enforce corrections on bad requests. Do you have some recommendation? |
No recommendations really. I was being serious that I wasn't getting anywhere, I hadn't touched the issue in nearly a year. With middleware it is possible to prevent this issue. Also, the tools available to handle encoding issues vary so much between Ruby versions that it seems easier to apply for a particular application. |
I think we should add this to the known-issues file. |
One of our mobile client's has an intermittent bug which causes it to send requests with binary uploads, but with the
Content-Type
header incorrectly set toapplication/x-www-form-urlencoded
. As a result whenRack::Utils
attempts to parse the query params it throws anArgumentError: invalid byte sequence in UTF-8
exception. Here is the relevant slice of the stack trace:I've looked into forcing an alternate encoding or swallowing the error, but there are a lot of different approaches possible, all with different support and performance considerations. Considering Rack supports Ruby 1.8 through 2.1 I would like some feedback on how you would approach the matter.
Some possible solutions I see are:
String.encode
orIconv
, depending on Ruby version.Unfortunately I've seen this error thousands of times, and there is no way for me to stop the exception from happening inside my application's stack.