[Ruby] Add Unicode Bypass Validation query, test and help file #12992

Merged
merged 23 commits into from
May 26, 2023
Commits
019b85b
Add Unicode Bypass Validation query, test and help file
Sim4n6 May 2, 2023
14ca20e
removed redundant imports
Sim4n6 May 3, 2023
1247403
Updated expected results file
Sim4n6 May 4, 2023
69ca49f
Deleted the UBV query change note.
Sim4n6 May 20, 2023
eb7e1de
Update ruby/ql/lib/codeql/ruby/experimental/UnicodeBypassValidationQu…
Sim4n6 May 20, 2023
8dcf139
Update ruby/ql/src/experimental/cwe-176/UnicodeBypassValidation.qhelp
Sim4n6 May 20, 2023
c3c65ca
Qhelp formatting
Sim4n6 May 20, 2023
c9c7179
Deleted the ugly flowchart.
Sim4n6 May 20, 2023
957023e
nfd and nfkd are considered
Sim4n6 May 20, 2023
7cd1fd4
CWE-179 and CWE-180 are included in metadata
Sim4n6 May 20, 2023
e345d7d
Update ruby/ql/src/experimental/cwe-176/examples/unicode_normalizatio…
Sim4n6 May 20, 2023
d11cb91
Use of CGI.escapeHTML() in test samples
Sim4n6 May 20, 2023
cc3cc1f
Merge branch 'ruby-UBV' of https://github.com/sim4n6/codeql-pun into …
Sim4n6 May 20, 2023
f5ff508
Updated qhelp for the use of html_escape()
Sim4n6 May 20, 2023
ad754f1
use of all normalization forms without the ":" prefix
Sim4n6 May 20, 2023
0a0a6dd
Replaced CGI.escapeHTML() with the html_escape()
Sim4n6 May 20, 2023
f7f0564
added one more test
Sim4n6 May 20, 2023
97e8e0b
Add String Manipulation Method Calls & CGI.escapeHTML() support
Sim4n6 May 21, 2023
90c174d
Updated the .expected file accordingly
Sim4n6 May 23, 2023
d772bb2
Added three more Unicode Normalization sinks
Sim4n6 May 25, 2023
7d68f6a
added ActiveSupport::Multibyte::Chars normalize() sink
Sim4n6 May 25, 2023
09c97ce
Added one more example to the qhelp
Sim4n6 May 25, 2023
52dd247
Removed redundant cast
Sim4n6 May 25, 2023
@@ -0,0 +1,29 @@
/**
* Provides default sources, sinks and sanitizers for detecting
* "Unicode transformation"
* vulnerabilities, as well as extension points for adding your own.
*/

private import ruby

/**
* Provides default sources, sinks and sanitizers for detecting
* "Unicode transformation"
* vulnerabilities, as well as extension points for adding your own.
*/
module UnicodeBypassValidation {
/**
* A data flow source for "Unicode transformation" vulnerabilities.
*/
abstract class Source extends DataFlow::Node { }

/**
* A data flow sink for "Unicode transformation" vulnerabilities.
*/
abstract class Sink extends DataFlow::Node { }

/**
* A sanitizer for "Unicode transformation" vulnerabilities.
*/
abstract class Sanitizer extends DataFlow::Node { }
}
@@ -0,0 +1,123 @@
/**
* Provides a taint-tracking configuration for detecting "Unicode transformation mishandling" vulnerabilities.
*/

private import ruby
private import codeql.ruby.dataflow.RemoteFlowSources
private import codeql.ruby.Concepts
private import codeql.ruby.TaintTracking
private import codeql.ruby.ApiGraphs
import UnicodeBypassValidationCustomizations::UnicodeBypassValidation

/** A state signifying that a logical validation has not been performed. */
class PreValidation extends DataFlow::FlowState {
PreValidation() { this = "PreValidation" }
}

/** A state signifying that a logical validation has been performed. */
class PostValidation extends DataFlow::FlowState {
PostValidation() { this = "PostValidation" }
}

/**
* A taint-tracking configuration for detecting "Unicode transformation mishandling" vulnerabilities.
*
* This configuration uses two flow states, `PreValidation` and `PostValidation`,
* to track the requirement that a logical validation has been performed before the Unicode Transformation.
*/
class Configuration extends TaintTracking::Configuration {
Configuration() { this = "UnicodeBypassValidation" }

override predicate isSource(DataFlow::Node source, DataFlow::FlowState state) {
source instanceof RemoteFlowSource and state instanceof PreValidation
}

override predicate isAdditionalTaintStep(
DataFlow::Node nodeFrom, DataFlow::FlowState stateFrom, DataFlow::Node nodeTo,
DataFlow::FlowState stateTo
) {
(
exists(Escaping escaping | nodeFrom = escaping.getAnInput() and nodeTo = escaping.getOutput())
or
exists(RegexExecution re | nodeFrom = re.getString() and nodeTo = re)
or
// String Manipulation Method Calls
// https://ruby-doc.org/core-2.7.0/String.html
exists(DataFlow::CallNode cn |
cn.getMethodName() =
[
[
"ljust", "lstrip", "succ", "next", "rjust", "capitalize", "chomp", "gsub", "chop",
"downcase", "swapcase", "upcase", "scrub", "slice", "squeeze", "strip", "sub",
"tr", "tr_s", "reverse"
] + ["", "!"], "concat", "dump", "each_line", "replace", "insert", "inspect", "lines",
"partition", "prepend", "rpartition", "scan", "split", "undump",
"unpack" + ["", "1"]
] and
nodeFrom = cn.getReceiver() and
nodeTo = cn
)
or
// String inspection and comparison method calls (the receiver itself stays tainted)
exists(DataFlow::CallNode cn |
cn.getMethodName() =
[
"casecmp" + ["", "?"], "center", "count", "each_char", "index", "rindex", "sum",
["delete", "delete_prefix", "delete_suffix"] + ["", "!"],
["start_with", "end_with", "eql", "include"] + ["?", "!"], "match" + ["", "?"],
] and
nodeFrom = cn.getReceiver() and
nodeTo = nodeFrom
)
or
exists(DataFlow::CallNode cn |
cn = API::getTopLevelMember("CGI").getAMethodCall("escapeHTML") and
nodeFrom = cn.getArgument(0) and
nodeTo = cn
)
) and
stateFrom instanceof PreValidation and
stateTo instanceof PostValidation
}

/* A Unicode transformation (Unicode normalization) is considered a sink when the normalization form used is NFC, NFKC, NFD or NFKD. */
override predicate isSink(DataFlow::Node sink, DataFlow::FlowState state) {
(
exists(DataFlow::CallNode cn |
cn.getMethodName() = "unicode_normalize" and
cn.getArgument(0).getConstantValue().getSymbol() = ["nfkc", "nfc", "nfkd", "nfd"] and
sink = cn.getReceiver()
)
or
// unicode_utils
exists(API::MethodAccessNode mac |
mac = API::getTopLevelMember("UnicodeUtils").getMethod(["nfkd", "nfc", "nfd", "nfkc"]) and
sink = mac.getParameter(0).asSink()
)
or
// eprun
exists(API::MethodAccessNode mac |
mac = API::getTopLevelMember("Eprun").getMethod("normalize") and
sink = mac.getParameter(0).asSink()
)
or
// unf
exists(API::MethodAccessNode mac |
mac = API::getTopLevelMember("UNF").getMember("Normalizer").getMethod("normalize") and
sink = mac.getParameter(0).asSink()
)
or
// ActiveSupport::Multibyte::Chars
exists(DataFlow::CallNode cn, DataFlow::CallNode n |
cn =
API::getTopLevelMember("ActiveSupport")
.getMember("Multibyte")
.getMember("Chars")
.getMethod("new")
.getCallNode() and
n = cn.getAMethodCall("normalize") and
sink = cn.getArgument(0)
)
) and
state instanceof PostValidation
}
}
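As a rough illustration of why the sink covers all four normalization forms, the Ruby sketch below normalizes two look-alike characters with the built-in `String#unicode_normalize`. The helper `introduces_lt?` is a hypothetical name used only for this illustration and is not part of the PR.

```ruby
# Hypothetical helper: true when normalizing `str` with `form` yields a
# literal "<" that was absent from the original string.
def introduces_lt?(str, form)
  !str.include?("<") && str.unicode_normalize(form).include?("<")
end

small_lt = "\uFE64" # SMALL LESS-THAN SIGN, compatibility-equivalent to "<"
not_lt   = "\u226E" # NOT LESS-THAN, canonically decomposes to "<" + U+0338

%i[nfc nfd nfkc nfkd].each do |form|
  puts "#{form}: U+FE64 -> #{introduces_lt?(small_lt, form)}, " \
       "U+226E -> #{introduces_lt?(not_lt, form)}"
end
```

The compatibility forms (NFKC, NFKD) rewrite U+FE64 into `<`, while the decomposition forms (NFD, NFKD) expose a `<` from U+226E, so a check that ran before either kind of normalization may no longer hold.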
50 changes: 50 additions & 0 deletions ruby/ql/src/experimental/cwe-176/UnicodeBypassValidation.qhelp
@@ -0,0 +1,50 @@
<!DOCTYPE qhelp PUBLIC "-//Semmle//qhelp//EN" "qhelp.dtd">
<qhelp>
<overview>
<p>Security checks can be bypassed by a Unicode transformation.</p>
<p>
If a Unicode transformation is performed after security checks or logical validation,
those checks can be bypassed because distinct Unicode characters may collide after
the transformation.
The validations of concern are character escaping, regex validation and string
verification.
</p>
</overview>
<recommendation>
<p> Perform Unicode normalization before any logical validation. </p>
</recommendation>
<example>

<p> The following example shows how the checks performed by <code>html_escape()</code>
are bypassed by a Unicode normalization applied afterwards.</p>
<p>For instance, the character U+FE64 (<code>﹤</code>) is not escaped by the
<code>html_escape()</code> function. The subsequent Unicode normalization (with the
NFKC form) then transforms it into U+003C (<code>&lt;</code>).</p>

<sample src="./examples/unicode_normalization.rb" />

</example>
<example>

<p> The next example shows how an early deletion of a character can be bypassed through a
Unicode character collision.</p>
<p>The character <code>&lt;</code> is expected to be removed from the string <code>s</code>.
However, a malicious user may supply its colliding Unicode character U+FE64
(<code>﹤</code>) instead. Because the NFKC normalization is applied after the deletion,
the resulting string contains the unintended character <code>&lt;</code>.</p>

<sample src="./examples/unicode_normalization2.rb" />

</example>
<references>
<li> Research study: <a
href="https://gosecure.github.io/presentations/2021-02-unicode-owasp-toronto/philippe_arteau_owasp_unicode_v4.pdf">
Unicode vulnerabilities that could bYte you
</a>
</li>
<li>
<a
href="https://gosecure.github.io/unicode-pentester-cheatsheet/">Unicode pentest
cheatsheet</a>. </li>
</references>
</qhelp>
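The qhelp also lists regex validation among the affected checks. A minimal sketch of that case, assuming a hypothetical `valid?` helper that rejects angle brackets (not something the PR defines), could look like this:

```ruby
# Hypothetical validator: rejects any input containing ASCII angle brackets.
def valid?(input)
  !input.match?(/[<>]/)
end

# U+FE64 and U+FE65 are the small less-than and greater-than signs.
payload = "\uFE64script\uFE65"

# Unsafe order: validate first, normalize afterwards.
late = valid?(payload) ? payload.unicode_normalize(:nfkc) : nil

# Recommended order: normalize first, then validate the normalized string.
normalized = payload.unicode_normalize(:nfkc)
early = valid?(normalized) ? normalized : nil

puts late.inspect
puts early.inspect
```

In the unsafe order the payload passes the regex and only becomes `<script>` afterwards. In the recommended order the regex sees the normalized string and rejects it.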
24 changes: 24 additions & 0 deletions ruby/ql/src/experimental/cwe-176/UnicodeBypassValidation.ql
@@ -0,0 +1,24 @@
/**
* @name Bypass Logical Validation Using Unicode Characters
* @description A Unicode transformation is applied to remote user-controlled data. The transformation is a Unicode normalization using the forms "NFC", "NFKC", "NFD" or "NFKD". Any security measure or logical validation performed before the Unicode transformation (escaping injection characters, validating with regex patterns or performing string-based checks) is **bypassable** by special Unicode characters.
* @kind path-problem
* @id rb/unicode-bypass-validation
* @precision high
* @problem.severity error
* @tags security
* experimental
* external/cwe/cwe-176
* external/cwe/cwe-179
* external/cwe/cwe-180
*/

import ruby
import codeql.ruby.experimental.UnicodeBypassValidationQuery
import DataFlow::PathGraph

from Configuration config, DataFlow::PathNode source, DataFlow::PathNode sink
where config.hasFlowPath(source, sink)
select sink.getNode(), source, sink,
"This $@ processes unsafely $@ and any logical validation in-between could be bypassed using special Unicode characters.",
sink.getNode(), "Unicode transformation (Unicode normalization)", source.getNode(),
"remote user-controlled data"
@@ -0,0 +1,10 @@
require "erb"

class UnicodeNormalizationHtMLSafeController < ActionController::Base
  def unicodeNormalize
    unicode_input = params[:unicode_input]
    unicode_html_safe = ERB::Util.html_escape(unicode_input)
    normalized_nfkc = unicode_html_safe.unicode_normalize(:nfkc) # $result=BAD
    normalized_nfc = unicode_html_safe.unicode_normalize(:nfc) # $result=BAD
  end
end
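The controller above needs Rails to run. A standalone sketch of the same ordering problem, using only `ERB::Util.html_escape` from the standard library and an input string chosen purely for illustration, might be:

```ruby
require "erb"

# Illustrative input using U+FE64 / U+FE65 instead of "<" / ">".
input = "\uFE64img src=x\uFE65"

# Order flagged by the query: escape first, normalize afterwards.
escaped_then_normalized = ERB::Util.html_escape(input).unicode_normalize(:nfkc)

# Recommended order: normalize first, then escape.
normalized_then_escaped = ERB::Util.html_escape(input.unicode_normalize(:nfkc))

puts escaped_then_normalized
puts normalized_then_escaped
```

With the flagged order the escaping step sees nothing to escape, so the later normalization produces raw angle brackets. With the recommended order the brackets exist by the time `html_escape` runs and are encoded.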
@@ -0,0 +1,2 @@
s = "﹤xss>"
puts s.delete("<").unicode_normalize(:nfkc).include?("<")
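For comparison, the sketch below places the late normalization from the sample next to the reordered variant suggested by the recommendation. The variable names `late` and `early` are illustrative only.

```ruby
s = "\uFE64xss>" # same string as the sample, written with an escape sequence

# Late normalization, as in the sample: delete runs before any "<" exists.
late = s.delete("<").unicode_normalize(:nfkc)

# Early normalization: "<" is materialized first, so delete can remove it.
early = s.unicode_normalize(:nfkc).delete("<")

puts late.include?("<")
puts early.include?("<")
```

Only the early variant ends up without the `<` character, which is the behavior the deletion was meant to guarantee.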
@@ -0,0 +1,75 @@
edges
| unicode_normalization.rb:7:5:7:17 | unicode_input | unicode_normalization.rb:8:23:8:35 | unicode_input |
| unicode_normalization.rb:7:5:7:17 | unicode_input | unicode_normalization.rb:9:22:9:34 | unicode_input |
| unicode_normalization.rb:7:21:7:26 | call to params | unicode_normalization.rb:7:21:7:42 | ...[...] |
| unicode_normalization.rb:7:21:7:42 | ...[...] | unicode_normalization.rb:7:5:7:17 | unicode_input |
| unicode_normalization.rb:15:5:15:17 | unicode_input | unicode_normalization.rb:16:27:16:39 | unicode_input |
| unicode_normalization.rb:15:5:15:17 | unicode_input | unicode_normalization.rb:16:27:16:39 | unicode_input |
| unicode_normalization.rb:15:21:15:26 | call to params | unicode_normalization.rb:15:21:15:42 | ...[...] |
| unicode_normalization.rb:15:21:15:26 | call to params | unicode_normalization.rb:15:21:15:42 | ...[...] |
| unicode_normalization.rb:15:21:15:42 | ...[...] | unicode_normalization.rb:15:5:15:17 | unicode_input |
| unicode_normalization.rb:15:21:15:42 | ...[...] | unicode_normalization.rb:15:5:15:17 | unicode_input |
| unicode_normalization.rb:16:5:16:23 | unicode_input_manip | unicode_normalization.rb:17:23:17:41 | unicode_input_manip |
| unicode_normalization.rb:16:5:16:23 | unicode_input_manip | unicode_normalization.rb:18:22:18:40 | unicode_input_manip |
| unicode_normalization.rb:16:27:16:39 | unicode_input | unicode_normalization.rb:16:27:16:59 | call to sub |
| unicode_normalization.rb:16:27:16:39 | unicode_input | unicode_normalization.rb:16:27:16:59 | call to sub |
| unicode_normalization.rb:16:27:16:59 | call to sub | unicode_normalization.rb:16:5:16:23 | unicode_input_manip |
| unicode_normalization.rb:24:5:24:17 | unicode_input | unicode_normalization.rb:25:37:25:49 | unicode_input |
| unicode_normalization.rb:24:21:24:26 | call to params | unicode_normalization.rb:24:21:24:42 | ...[...] |
| unicode_normalization.rb:24:21:24:42 | ...[...] | unicode_normalization.rb:24:5:24:17 | unicode_input |
| unicode_normalization.rb:25:5:25:21 | unicode_html_safe | unicode_normalization.rb:26:23:26:39 | unicode_html_safe |
| unicode_normalization.rb:25:5:25:21 | unicode_html_safe | unicode_normalization.rb:27:22:27:38 | unicode_html_safe |
| unicode_normalization.rb:25:25:25:50 | call to html_escape | unicode_normalization.rb:25:5:25:21 | unicode_html_safe |
| unicode_normalization.rb:25:37:25:49 | unicode_input | unicode_normalization.rb:25:25:25:50 | call to html_escape |
| unicode_normalization.rb:33:5:33:17 | unicode_input | unicode_normalization.rb:34:40:34:52 | unicode_input |
| unicode_normalization.rb:33:21:33:26 | call to params | unicode_normalization.rb:33:21:33:42 | ...[...] |
| unicode_normalization.rb:33:21:33:42 | ...[...] | unicode_normalization.rb:33:5:33:17 | unicode_input |
| unicode_normalization.rb:34:5:34:21 | unicode_html_safe | unicode_normalization.rb:35:23:35:39 | unicode_html_safe |
| unicode_normalization.rb:34:5:34:21 | unicode_html_safe | unicode_normalization.rb:36:22:36:38 | unicode_html_safe |
| unicode_normalization.rb:34:25:34:53 | call to escapeHTML | unicode_normalization.rb:34:25:34:63 | call to html_safe |
| unicode_normalization.rb:34:25:34:63 | call to html_safe | unicode_normalization.rb:34:5:34:21 | unicode_html_safe |
| unicode_normalization.rb:34:40:34:52 | unicode_input | unicode_normalization.rb:34:25:34:53 | call to escapeHTML |
nodes
| unicode_normalization.rb:7:5:7:17 | unicode_input | semmle.label | unicode_input |
| unicode_normalization.rb:7:21:7:26 | call to params | semmle.label | call to params |
| unicode_normalization.rb:7:21:7:42 | ...[...] | semmle.label | ...[...] |
| unicode_normalization.rb:8:23:8:35 | unicode_input | semmle.label | unicode_input |
| unicode_normalization.rb:9:22:9:34 | unicode_input | semmle.label | unicode_input |
| unicode_normalization.rb:15:5:15:17 | unicode_input | semmle.label | unicode_input |
| unicode_normalization.rb:15:5:15:17 | unicode_input | semmle.label | unicode_input |
| unicode_normalization.rb:15:21:15:26 | call to params | semmle.label | call to params |
| unicode_normalization.rb:15:21:15:42 | ...[...] | semmle.label | ...[...] |
| unicode_normalization.rb:15:21:15:42 | ...[...] | semmle.label | ...[...] |
| unicode_normalization.rb:16:5:16:23 | unicode_input_manip | semmle.label | unicode_input_manip |
| unicode_normalization.rb:16:27:16:39 | unicode_input | semmle.label | unicode_input |
| unicode_normalization.rb:16:27:16:39 | unicode_input | semmle.label | unicode_input |
| unicode_normalization.rb:16:27:16:59 | call to sub | semmle.label | call to sub |
| unicode_normalization.rb:17:23:17:41 | unicode_input_manip | semmle.label | unicode_input_manip |
| unicode_normalization.rb:18:22:18:40 | unicode_input_manip | semmle.label | unicode_input_manip |
| unicode_normalization.rb:24:5:24:17 | unicode_input | semmle.label | unicode_input |
| unicode_normalization.rb:24:21:24:26 | call to params | semmle.label | call to params |
| unicode_normalization.rb:24:21:24:42 | ...[...] | semmle.label | ...[...] |
| unicode_normalization.rb:25:5:25:21 | unicode_html_safe | semmle.label | unicode_html_safe |
| unicode_normalization.rb:25:25:25:50 | call to html_escape | semmle.label | call to html_escape |
| unicode_normalization.rb:25:37:25:49 | unicode_input | semmle.label | unicode_input |
| unicode_normalization.rb:26:23:26:39 | unicode_html_safe | semmle.label | unicode_html_safe |
| unicode_normalization.rb:27:22:27:38 | unicode_html_safe | semmle.label | unicode_html_safe |
| unicode_normalization.rb:33:5:33:17 | unicode_input | semmle.label | unicode_input |
| unicode_normalization.rb:33:21:33:26 | call to params | semmle.label | call to params |
| unicode_normalization.rb:33:21:33:42 | ...[...] | semmle.label | ...[...] |
| unicode_normalization.rb:34:5:34:21 | unicode_html_safe | semmle.label | unicode_html_safe |
| unicode_normalization.rb:34:25:34:53 | call to escapeHTML | semmle.label | call to escapeHTML |
| unicode_normalization.rb:34:25:34:63 | call to html_safe | semmle.label | call to html_safe |
| unicode_normalization.rb:34:40:34:52 | unicode_input | semmle.label | unicode_input |
| unicode_normalization.rb:35:23:35:39 | unicode_html_safe | semmle.label | unicode_html_safe |
| unicode_normalization.rb:36:22:36:38 | unicode_html_safe | semmle.label | unicode_html_safe |
subpaths
#select
| unicode_normalization.rb:8:23:8:35 | unicode_input | unicode_normalization.rb:7:21:7:26 | call to params | unicode_normalization.rb:8:23:8:35 | unicode_input | This $@ processes unsafely $@ and any logical validation in-between could be bypassed using special Unicode characters. | unicode_normalization.rb:8:23:8:35 | unicode_input | Unicode transformation (Unicode normalization) | unicode_normalization.rb:7:21:7:26 | call to params | remote user-controlled data |
| unicode_normalization.rb:9:22:9:34 | unicode_input | unicode_normalization.rb:7:21:7:26 | call to params | unicode_normalization.rb:9:22:9:34 | unicode_input | This $@ processes unsafely $@ and any logical validation in-between could be bypassed using special Unicode characters. | unicode_normalization.rb:9:22:9:34 | unicode_input | Unicode transformation (Unicode normalization) | unicode_normalization.rb:7:21:7:26 | call to params | remote user-controlled data |
| unicode_normalization.rb:17:23:17:41 | unicode_input_manip | unicode_normalization.rb:15:21:15:26 | call to params | unicode_normalization.rb:17:23:17:41 | unicode_input_manip | This $@ processes unsafely $@ and any logical validation in-between could be bypassed using special Unicode characters. | unicode_normalization.rb:17:23:17:41 | unicode_input_manip | Unicode transformation (Unicode normalization) | unicode_normalization.rb:15:21:15:26 | call to params | remote user-controlled data |
| unicode_normalization.rb:18:22:18:40 | unicode_input_manip | unicode_normalization.rb:15:21:15:26 | call to params | unicode_normalization.rb:18:22:18:40 | unicode_input_manip | This $@ processes unsafely $@ and any logical validation in-between could be bypassed using special Unicode characters. | unicode_normalization.rb:18:22:18:40 | unicode_input_manip | Unicode transformation (Unicode normalization) | unicode_normalization.rb:15:21:15:26 | call to params | remote user-controlled data |
| unicode_normalization.rb:26:23:26:39 | unicode_html_safe | unicode_normalization.rb:24:21:24:26 | call to params | unicode_normalization.rb:26:23:26:39 | unicode_html_safe | This $@ processes unsafely $@ and any logical validation in-between could be bypassed using special Unicode characters. | unicode_normalization.rb:26:23:26:39 | unicode_html_safe | Unicode transformation (Unicode normalization) | unicode_normalization.rb:24:21:24:26 | call to params | remote user-controlled data |
| unicode_normalization.rb:27:22:27:38 | unicode_html_safe | unicode_normalization.rb:24:21:24:26 | call to params | unicode_normalization.rb:27:22:27:38 | unicode_html_safe | This $@ processes unsafely $@ and any logical validation in-between could be bypassed using special Unicode characters. | unicode_normalization.rb:27:22:27:38 | unicode_html_safe | Unicode transformation (Unicode normalization) | unicode_normalization.rb:24:21:24:26 | call to params | remote user-controlled data |
| unicode_normalization.rb:35:23:35:39 | unicode_html_safe | unicode_normalization.rb:33:21:33:26 | call to params | unicode_normalization.rb:35:23:35:39 | unicode_html_safe | This $@ processes unsafely $@ and any logical validation in-between could be bypassed using special Unicode characters. | unicode_normalization.rb:35:23:35:39 | unicode_html_safe | Unicode transformation (Unicode normalization) | unicode_normalization.rb:33:21:33:26 | call to params | remote user-controlled data |
| unicode_normalization.rb:36:22:36:38 | unicode_html_safe | unicode_normalization.rb:33:21:33:26 | call to params | unicode_normalization.rb:36:22:36:38 | unicode_html_safe | This $@ processes unsafely $@ and any logical validation in-between could be bypassed using special Unicode characters. | unicode_normalization.rb:36:22:36:38 | unicode_html_safe | Unicode transformation (Unicode normalization) | unicode_normalization.rb:33:21:33:26 | call to params | remote user-controlled data |
@@ -0,0 +1 @@
experimental/cwe-176/UnicodeBypassValidation.ql