Merged
61 commits
7adc3c2
Upload ReDoS query, qhelp and tests
jorgectf Mar 18, 2021
bd3d2ec
Update to match consistent naming across languages
jorgectf Mar 18, 2021
afc4f51
Remove CWE references
jorgectf Mar 18, 2021
21f8135
Move to experimental folder
jorgectf Mar 18, 2021
6cc7144
Apply suggestions from code review
jorgectf Mar 18, 2021
63f708d
Apply suggestions
jorgectf Mar 18, 2021
5dae920
Edit filenames to match consistent naming
jorgectf Mar 18, 2021
f45307f
Apply rebase
jorgectf Mar 18, 2021
e4736d0
Typo
jorgectf Mar 18, 2021
a1b5cc3
Typo
jorgectf Mar 19, 2021
6d5a0f2
Limit Sanitizer to re.escape(arg)
jorgectf Mar 19, 2021
caaf543
Attempt to restructuring ReMethods and RegexExecution's modules
jorgectf Mar 19, 2021
3daec8e
Enclose Sinks and ReMethods in a module
jorgectf Mar 19, 2021
b207929
RegexExecution restructuring
jorgectf Mar 19, 2021
249e409
Change query ID
jorgectf Mar 22, 2021
b27b77c
Apply suggestions from code review
jorgectf Mar 22, 2021
0f20eeb
Apply suggestions
jorgectf Mar 24, 2021
444a15a
Polish imports
jorgectf Mar 24, 2021
28fdeba
Structure development
jorgectf Mar 24, 2021
a1a3c98
Undo main Concepts.qll change
jorgectf Mar 24, 2021
d61adcc
Take main Concepts.qll out of the PR
jorgectf Mar 24, 2021
b5ea41f
Fix CompiledRegex
jorgectf Mar 24, 2021
ce23db2
Move Sanitizer to ReEscapeCall
jorgectf Mar 24, 2021
ee1d2b6
Delete DirectRegex and CompiledRegex
jorgectf Mar 25, 2021
30554a1
Format
jorgectf Mar 25, 2021
3d990c5
Get back to ApiGraphs
jorgectf Mar 26, 2021
805f86a
Polish RegexEscape
jorgectf Mar 26, 2021
be09ffe
Create RegexEscape Range
jorgectf Mar 26, 2021
c127b10
Create re.compile().ReMethod test
jorgectf Mar 26, 2021
35f1c45
Change from Attribute to DataFlow::CallCfgNode in getRegexMethod()
jorgectf Mar 26, 2021
36cc7b5
Fix CompiledRegex
jorgectf Mar 26, 2021
53d61c4
Use custom Sink
jorgectf Mar 26, 2021
18ce257
Move RegexInjectionSink to query config (qll)
jorgectf Mar 26, 2021
e78e2ac
Get rid of (get)regexMethod
jorgectf Mar 26, 2021
a5850f4
Use getRegexModule to know used lib
jorgectf Mar 27, 2021
f751103
Fix Sink utilization in select
jorgectf Mar 27, 2021
66ee67a
Polished select statement
jorgectf Mar 27, 2021
c54f08f
Improve qhelp
jorgectf Mar 27, 2021
0e169ba
Format qhelp
jorgectf Mar 27, 2021
d49c23f
Improve tests' readability
jorgectf Mar 27, 2021
d4a89b2
Fix qhelp typo while converting to python's regex injection
jorgectf Mar 27, 2021
b672197
Improve code comments
jorgectf Mar 27, 2021
3655514
Fix ambiguity
jorgectf Mar 27, 2021
fc27c6c
Fix RegexExecution ambiguity
jorgectf Mar 27, 2021
03825a6
Add comment to Sink's predicates
jorgectf Mar 27, 2021
ec85ee4
Sink's predicate typo
jorgectf Mar 27, 2021
d401d18
Add .expected and qlref
jorgectf Mar 28, 2021
81d23c0
Move tests and qlref from /src to /test
jorgectf Mar 28, 2021
d968eea
Move expected to /test
jorgectf Mar 28, 2021
6a20a4d
Add newline to qhelp
jorgectf Mar 29, 2021
3fae3fd
Take ApiGraphs out of Concepts.qll
jorgectf Mar 30, 2021
05ee853
Remove wrong comment
jorgectf Apr 6, 2021
12ccd7e
Update .expected
jorgectf Apr 8, 2021
c432284
Polish qhelp
jorgectf Apr 8, 2021
c0c71c5
Apply suggestions from code review
jorgectf Apr 27, 2021
20b532e
Update to-cast sink's naming
jorgectf Apr 27, 2021
8a80098
Remove unused class variables
jorgectf Apr 27, 2021
21e01b8
Add code example in CompiledRegex
jorgectf Apr 27, 2021
213d011
Edit code example in CompiledRegex
jorgectf Apr 29, 2021
bd4b189
Polish documentation consistency
jorgectf Apr 29, 2021
78370cf
Update python/ql/src/experimental/semmle/python/frameworks/Stdlib.qll
yoff May 10, 2021
Original file line number Diff line number Diff line change
@@ -0,0 +1,45 @@
<!DOCTYPE qhelp PUBLIC
"-//Semmle//qhelp//EN"
"qhelp.dtd">
<qhelp>
<overview>
<p>
Constructing a regular expression with unsanitized user input is dangerous as a malicious user may
be able to modify the meaning of the expression. In particular, such a user may be able to provide
a regular expression fragment that takes exponential time in the worst case, and use that to
perform a Denial of Service attack.
</p>
</overview>

<recommendation>
<p>
Before embedding user input into a regular expression, use a sanitization function such as
<code>re.escape</code> to escape meta-characters that have a special meaning in regular
expression syntax.
</p>
</recommendation>

<example>
<p>
The following examples are based on a simple Flask web server environment.
</p>
<p>
The following example shows an HTTP request parameter that is used to construct a regular expression
without sanitizing it first:
</p>
<sample src="re_bad.py" />
<p>
Instead, the request parameter should be sanitized first, for example using the function
<code>re.escape</code>. This ensures that the user cannot insert characters which have a
special meaning in regular expressions.
</p>
<sample src="re_good.py" />
</example>

<references>
<li>OWASP: <a href="https://www.owasp.org/index.php/Regular_expression_Denial_of_Service_-_ReDoS">Regular expression Denial of Service - ReDoS</a>.</li>
<li>Wikipedia: <a href="https://en.wikipedia.org/wiki/ReDoS">ReDoS</a>.</li>
<li>Python docs: <a href="https://docs.python.org/3/library/re.html">re</a>.</li>
<li>SonarSource: <a href="https://rules.sonarsource.com/python/type/Vulnerability/RSPEC-2631">RSPEC-2631</a>.</li>
</references>
</qhelp>
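As a rough illustration of the qhelp text above (not part of the PR), the following sketch shows how the same input behaves when used directly as a pattern and when passed through `re.escape` first. The value of `user_input` and the sample strings are made up for the example.

```python
import re

# Hypothetical attacker-controlled value; any string with meta-characters would do.
user_input = ".*"

# Used as-is, the input is interpreted as a pattern and matches any text.
unescaped_hit = re.search(user_input, "secret data") is not None

# After re.escape, the meta-characters are matched literally.
escaped = re.escape(user_input)
escaped_miss = re.search(escaped, "secret data") is None
escaped_hit = re.search(escaped, "literal .* here") is not None

print(unescaped_hit, escaped_miss, escaped_hit)
```

The escaped pattern only matches text that contains the two characters `.*` verbatim, which is the behavior the recommendation relies on.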
29 changes: 29 additions & 0 deletions python/ql/src/experimental/Security/CWE-730/RegexInjection.ql
@@ -0,0 +1,29 @@
/**
 * @name Regular expression injection
 * @description User input should not be used in regular expressions without first being escaped,
 *              otherwise a malicious user may be able to inject an expression that could require
 *              exponential time on certain inputs.
 * @kind path-problem
 * @problem.severity error
 * @id py/regex-injection
 * @tags security
 *       external/cwe/cwe-730
 *       external/cwe/cwe-400
 */

// determine precision above
import python
import experimental.semmle.python.security.injection.RegexInjection
import DataFlow::PathGraph

from
  RegexInjectionFlowConfig config, DataFlow::PathNode source, DataFlow::PathNode sink,
  RegexInjectionSink regexInjectionSink, Attribute methodAttribute
where
  config.hasFlowPath(source, sink) and
  regexInjectionSink = sink.getNode() and
  methodAttribute = regexInjectionSink.getRegexMethod()
select sink.getNode(), source, sink,
  "$@ regular expression is constructed from a $@ and executed by $@.", sink.getNode(), "This",
  source.getNode(), "user-provided value", methodAttribute,
  regexInjectionSink.getRegexModule() + "." + methodAttribute.getName()
15 changes: 15 additions & 0 deletions python/ql/src/experimental/Security/CWE-730/re_bad.py
@@ -0,0 +1,15 @@
from flask import request, Flask
import re

app = Flask(__name__)  # the route decorators below need an application object
@app.route("/direct")
def direct():
    unsafe_pattern = request.args["pattern"]
    re.search(unsafe_pattern, "")


@app.route("/compile")
def compile():
    unsafe_pattern = request.args["pattern"]
    compiled_pattern = re.compile(unsafe_pattern)
    compiled_pattern.search("")
17 changes: 17 additions & 0 deletions python/ql/src/experimental/Security/CWE-730/re_good.py
@@ -0,0 +1,17 @@
from flask import request, Flask
import re

app = Flask(__name__)  # the route decorators below need an application object
@app.route("/direct")
def direct():
    unsafe_pattern = request.args['pattern']
    safe_pattern = re.escape(unsafe_pattern)
    re.search(safe_pattern, "")


@app.route("/compile")
def compile():
    unsafe_pattern = request.args['pattern']
    safe_pattern = re.escape(unsafe_pattern)
    compiled_pattern = re.compile(safe_pattern)
    compiled_pattern.search("")
67 changes: 67 additions & 0 deletions python/ql/src/experimental/semmle/python/Concepts.qll
@@ -13,3 +13,70 @@ private import semmle.python.dataflow.new.DataFlow
private import semmle.python.dataflow.new.RemoteFlowSources
private import semmle.python.dataflow.new.TaintTracking
private import experimental.semmle.python.Frameworks

/** Provides classes for modeling Regular Expression-related APIs. */
module RegexExecution {
  /**
   * A data-flow node that executes a regular expression.
   *
   * Extend this class to model new APIs. If you want to refine existing API models,
   * extend `RegexExecution` instead.
   */
  abstract class Range extends DataFlow::Node {
    /**
     * Gets the argument containing the executed expression.
     */
    abstract DataFlow::Node getRegexNode();

    /**
     * Gets the library used to execute the regular expression.
     */
    abstract string getRegexModule();
  }
}

/**
 * A data-flow node that executes a regular expression.
 *
 * Extend this class to refine existing API models. If you want to model new APIs,
 * extend `RegexExecution::Range` instead.
 */
class RegexExecution extends DataFlow::Node {
  RegexExecution::Range range;

  RegexExecution() { this = range }

  DataFlow::Node getRegexNode() { result = range.getRegexNode() }

  string getRegexModule() { result = range.getRegexModule() }
}

/** Provides classes for modeling Regular Expression escape-related APIs. */
module RegexEscape {
  /**
   * A data-flow node that escapes a regular expression.
   *
   * Extend this class to model new APIs. If you want to refine existing API models,
   * extend `RegexEscape` instead.
   */
  abstract class Range extends DataFlow::Node {
    /**
     * Gets the argument containing the escaped expression.
     */
    abstract DataFlow::Node getRegexNode();
  }
}

/**
 * A data-flow node that escapes a regular expression.
 *
 * Extend this class to refine existing API models. If you want to model new APIs,
 * extend `RegexEscape::Range` instead.
 */
class RegexEscape extends DataFlow::Node {
  RegexEscape::Range range;

  RegexEscape() { this = range }

  DataFlow::Node getRegexNode() { result = range.getRegexNode() }
}
89 changes: 89 additions & 0 deletions python/ql/src/experimental/semmle/python/frameworks/Stdlib.qll
@@ -9,3 +9,92 @@ private import semmle.python.dataflow.new.TaintTracking
private import semmle.python.dataflow.new.RemoteFlowSources
private import experimental.semmle.python.Concepts
private import semmle.python.ApiGraphs

/**
 * Provides models for Python's `re` library.
 *
 * See https://docs.python.org/3/library/re.html
 */
private module Re {
  /**
   * List of `re` methods immediately executing an expression.
   *
   * See https://docs.python.org/3/library/re.html#module-contents
   */
  private class RegexExecutionMethods extends string {
    RegexExecutionMethods() {
      this in ["match", "fullmatch", "search", "split", "findall", "finditer", "sub", "subn"]
    }
  }

  /**
   * A class to find `re` methods immediately executing an expression.
   *
   * See `RegexExecutionMethods`
   */
  private class DirectRegex extends DataFlow::CallCfgNode, RegexExecution::Range {
    DataFlow::Node regexNode;

    DirectRegex() {
      this = API::moduleImport("re").getMember(any(RegexExecutionMethods m)).getACall() and
      regexNode = this.getArg(0)
    }

    override DataFlow::Node getRegexNode() { result = regexNode }

    override string getRegexModule() { result = "re" }
  }

  /**
   * A class to find `re` methods executing an expression previously compiled by `re.compile`.
   *
   * Given the following example:
   *
   * ```py
   * pattern = re.compile(input)
   * pattern.match(s)
   * ```
   *
   * This class will identify that `re.compile` compiles `input` and afterwards
   * executes `re`'s `match`. As a result, `this` will refer to `pattern.match(s)`
   * and `this.getRegexNode()` will return the node for `input` (`re.compile`'s first argument).
   *
   * See `RegexExecutionMethods`
   *
   * See https://docs.python.org/3/library/re.html#regular-expression-objects
   */
  private class CompiledRegex extends DataFlow::CallCfgNode, RegexExecution::Range {
    DataFlow::Node regexNode;

    CompiledRegex() {
      exists(DataFlow::CallCfgNode patternCall, DataFlow::AttrRead reMethod |
        this.getFunction() = reMethod and
        patternCall = API::moduleImport("re").getMember("compile").getACall() and
        patternCall.flowsTo(reMethod.getObject()) and
        reMethod.getAttributeName() instanceof RegexExecutionMethods and
        regexNode = patternCall.getArg(0)
      )
    }

    override DataFlow::Node getRegexNode() { result = regexNode }

    override string getRegexModule() { result = "re" }
  }

  /**
   * A class to find `re` methods escaping an expression.
   *
   * See https://docs.python.org/3/library/re.html#re.escape
   */
  class ReEscape extends DataFlow::CallCfgNode, RegexEscape::Range {
    DataFlow::Node regexNode;

    ReEscape() {
      this = API::moduleImport("re").getMember("escape").getACall() and
      regexNode = this.getArg(0)
    }

    override DataFlow::Node getRegexNode() { result = regexNode }
  }
}
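To make the two modeled call shapes concrete, here is a small Python sketch (not part of the PR). It assumes only the standard `re` module and checks that each name listed in `RegexExecutionMethods` is available both as a module-level function, the shape `DirectRegex` looks for, and as a method on a compiled pattern, the shape `CompiledRegex` looks for. The constant name and sample strings are made up for the example.

```python
import re

# The method names listed in RegexExecutionMethods in the diff above.
REGEX_EXECUTION_METHODS = [
    "match", "fullmatch", "search", "split", "findall", "finditer", "sub", "subn",
]

compiled = re.compile(r"a+")

# Each name is a module-level function (the DirectRegex shape) ...
on_module = [n for n in REGEX_EXECUTION_METHODS if callable(getattr(re, n, None))]
# ... and also a method of a compiled pattern (the CompiledRegex shape).
on_pattern = [n for n in REGEX_EXECUTION_METHODS if callable(getattr(compiled, n, None))]

# In the direct shape the pattern is argument 0 of the executing call.
direct_result = re.findall(r"a+", "caaandy")
# In the compiled shape the pattern was argument 0 of re.compile.
compiled_result = compiled.findall("caaandy")
```

In both shapes the node reported by `getRegexNode()` is the pattern argument, which is why `CompiledRegex` walks back from the method call to the `re.compile` call.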
@@ -0,0 +1,53 @@
/**
 * Provides a taint-tracking configuration for detecting regular expression injection
 * vulnerabilities.
 */

import python
import experimental.semmle.python.Concepts
import semmle.python.dataflow.new.DataFlow
import semmle.python.dataflow.new.TaintTracking
import semmle.python.dataflow.new.RemoteFlowSources

/**
 * A class to find methods executing regular expressions.
 *
 * See `RegexExecution`
 */
class RegexInjectionSink extends DataFlow::Node {
  string regexModule;
  Attribute regexMethod;

  RegexInjectionSink() {
    exists(RegexExecution reExec |
      this = reExec.getRegexNode() and
      regexModule = reExec.getRegexModule() and
      regexMethod = reExec.(DataFlow::CallCfgNode).getFunction().asExpr().(Attribute)
    )
  }

  /**
   * Gets the library used to execute the regular expression.
   */
  string getRegexModule() { result = regexModule }

  /**
   * Gets the method used to execute the regular expression.
   */
  Attribute getRegexMethod() { result = regexMethod }
}

/**
 * A taint-tracking configuration for detecting regular expression injections.
 */
class RegexInjectionFlowConfig extends TaintTracking::Configuration {
  RegexInjectionFlowConfig() { this = "RegexInjectionFlowConfig" }

  override predicate isSource(DataFlow::Node source) { source instanceof RemoteFlowSource }

  override predicate isSink(DataFlow::Node sink) { sink instanceof RegexInjectionSink }

  override predicate isSanitizer(DataFlow::Node sanitizer) {
    sanitizer = any(RegexEscape reEscape).getRegexNode()
  }
}
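The source, sanitizer, and sink roles in the configuration above can be pictured with a loose runtime analogy in Python. This is only an illustration: CodeQL performs the analysis statically, and the `Tainted` class and the `escape` and `search` wrappers below are hypothetical names invented for this sketch, not part of the PR or of any library.

```python
import re


class Tainted(str):
    """Hypothetical marker type for strings that came from a remote source."""


def escape(value: str) -> str:
    # Sanitizer analogue: the escaped result is returned as a plain str.
    return str(re.escape(value))


def search(pattern: str, text: str):
    # Sink analogue: refuse patterns that are still marked as tainted.
    if isinstance(pattern, Tainted):
        raise ValueError("unescaped user input used as a regular expression")
    return re.search(pattern, text)


user_value = Tainted("a.c")  # source analogue

try:
    search(user_value, "abc")
    flagged = False
except ValueError:
    flagged = True

# Once escaped, the value is accepted and matches only the literal text.
safe_match = search(escape(user_value), "a.c") is not None
safe_miss = search(escape(user_value), "abc") is None
```

The first call corresponds to a path the query would report, while the escaped calls correspond to flows that the `isSanitizer` predicate is meant to cut off.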
@@ -0,0 +1,27 @@
edges
| re_bad.py:13:22:13:28 | ControlFlowNode for request | re_bad.py:13:22:13:33 | ControlFlowNode for Attribute |
| re_bad.py:13:22:13:33 | ControlFlowNode for Attribute | re_bad.py:13:22:13:44 | ControlFlowNode for Subscript |
| re_bad.py:13:22:13:44 | ControlFlowNode for Subscript | re_bad.py:14:15:14:28 | ControlFlowNode for unsafe_pattern |
| re_bad.py:24:22:24:28 | ControlFlowNode for request | re_bad.py:24:22:24:33 | ControlFlowNode for Attribute |
| re_bad.py:24:22:24:33 | ControlFlowNode for Attribute | re_bad.py:24:22:24:44 | ControlFlowNode for Subscript |
| re_bad.py:24:22:24:44 | ControlFlowNode for Subscript | re_bad.py:25:35:25:48 | ControlFlowNode for unsafe_pattern |
| re_bad.py:36:22:36:28 | ControlFlowNode for request | re_bad.py:36:22:36:33 | ControlFlowNode for Attribute |
| re_bad.py:36:22:36:33 | ControlFlowNode for Attribute | re_bad.py:36:22:36:44 | ControlFlowNode for Subscript |
| re_bad.py:36:22:36:44 | ControlFlowNode for Subscript | re_bad.py:37:16:37:29 | ControlFlowNode for unsafe_pattern |
nodes
| re_bad.py:13:22:13:28 | ControlFlowNode for request | semmle.label | ControlFlowNode for request |
| re_bad.py:13:22:13:33 | ControlFlowNode for Attribute | semmle.label | ControlFlowNode for Attribute |
| re_bad.py:13:22:13:44 | ControlFlowNode for Subscript | semmle.label | ControlFlowNode for Subscript |
| re_bad.py:14:15:14:28 | ControlFlowNode for unsafe_pattern | semmle.label | ControlFlowNode for unsafe_pattern |
| re_bad.py:24:22:24:28 | ControlFlowNode for request | semmle.label | ControlFlowNode for request |
| re_bad.py:24:22:24:33 | ControlFlowNode for Attribute | semmle.label | ControlFlowNode for Attribute |
| re_bad.py:24:22:24:44 | ControlFlowNode for Subscript | semmle.label | ControlFlowNode for Subscript |
| re_bad.py:25:35:25:48 | ControlFlowNode for unsafe_pattern | semmle.label | ControlFlowNode for unsafe_pattern |
| re_bad.py:36:22:36:28 | ControlFlowNode for request | semmle.label | ControlFlowNode for request |
| re_bad.py:36:22:36:33 | ControlFlowNode for Attribute | semmle.label | ControlFlowNode for Attribute |
| re_bad.py:36:22:36:44 | ControlFlowNode for Subscript | semmle.label | ControlFlowNode for Subscript |
| re_bad.py:37:16:37:29 | ControlFlowNode for unsafe_pattern | semmle.label | ControlFlowNode for unsafe_pattern |
#select
| re_bad.py:14:15:14:28 | ControlFlowNode for unsafe_pattern | re_bad.py:13:22:13:28 | ControlFlowNode for request | re_bad.py:14:15:14:28 | ControlFlowNode for unsafe_pattern | $@ regular expression is constructed from a $@ and executed by $@. | re_bad.py:14:15:14:28 | ControlFlowNode for unsafe_pattern | This | re_bad.py:13:22:13:28 | ControlFlowNode for request | user-provided value | re_bad.py:14:5:14:13 | Attribute | re.search |
| re_bad.py:25:35:25:48 | ControlFlowNode for unsafe_pattern | re_bad.py:24:22:24:28 | ControlFlowNode for request | re_bad.py:25:35:25:48 | ControlFlowNode for unsafe_pattern | $@ regular expression is constructed from a $@ and executed by $@. | re_bad.py:25:35:25:48 | ControlFlowNode for unsafe_pattern | This | re_bad.py:24:22:24:28 | ControlFlowNode for request | user-provided value | re_bad.py:26:5:26:27 | Attribute | re.search |
| re_bad.py:37:16:37:29 | ControlFlowNode for unsafe_pattern | re_bad.py:36:22:36:28 | ControlFlowNode for request | re_bad.py:37:16:37:29 | ControlFlowNode for unsafe_pattern | $@ regular expression is constructed from a $@ and executed by $@. | re_bad.py:37:16:37:29 | ControlFlowNode for unsafe_pattern | This | re_bad.py:36:22:36:28 | ControlFlowNode for request | user-provided value | re_bad.py:37:5:37:37 | Attribute | re.search |
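The locations in the expected output follow the `file:startLine:startColumn:endLine:endColumn` form, with 1-based, inclusive columns. The sketch below (not part of the PR) parses such a location and slices the matching source text. It assumes that lines 13 and 14 of the test file `re_bad.py` are indented with four spaces, which is consistent with the reported columns. The helper names are made up for the example.

```python
# Assumption: these mirror lines 13-14 of the test file re_bad.py,
# with four-space indentation inside the function body.
SOURCE_LINES = {
    13: '    unsafe_pattern = request.args["pattern"]',
    14: '    re.search(unsafe_pattern, "")',
}


def parse_location(location: str):
    """Split 'file:startLine:startCol:endLine:endCol' into its parts."""
    path, *numbers = location.rsplit(":", 4)
    start_line, start_col, end_line, end_col = map(int, numbers)
    return path, start_line, start_col, end_line, end_col


def text_at(location: str) -> str:
    """Return the source text of a single-line span (1-based, inclusive columns)."""
    _, start_line, start_col, end_line, end_col = parse_location(location)
    if start_line != end_line:
        raise ValueError("this sketch only handles single-line spans")
    return SOURCE_LINES[start_line][start_col - 1:end_col]


source_text = text_at("re_bad.py:13:22:13:28")   # the reported source
sink_text = text_at("re_bad.py:14:15:14:28")     # the reported sink
method_text = text_at("re_bad.py:14:5:14:13")    # the executing method
```

Read this way, the first `#select` row says that `request` flows into the `unsafe_pattern` argument of `re.search`.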
@@ -0,0 +1 @@
experimental/Security/CWE-730/RegexInjection.ql
@@ -0,0 +1,40 @@
from flask import request, Flask
import re

app = Flask(__name__)


@app.route("/direct")
def direct():
    """
    A RemoteFlowSource is used directly as re.search's pattern
    """

    unsafe_pattern = request.args["pattern"]
    re.search(unsafe_pattern, "")


@app.route("/compile")
def compile():
    """
    A RemoteFlowSource is used directly as re.compile's pattern
    which also executes .search()
    """

    unsafe_pattern = request.args["pattern"]
    compiled_pattern = re.compile(unsafe_pattern)
    compiled_pattern.search("")


@app.route("/compile_direct")
def compile_direct():
    """
    A RemoteFlowSource is used directly as re.compile's pattern
    which also executes .search() in the same line
    """

    unsafe_pattern = request.args["pattern"]
    re.compile(unsafe_pattern).search("")

# if __name__ == "__main__":
#     app.run(debug=True)