-
I am writing a query to find client-side prototype pollution vulnerabilities across Github projects based on this(https://github.com/github/codeql/blob/768e5190a1c9d40a4acc7143c461c3b114e7fd59/javascript/ql/src/Security/CWE-400/PrototypePollutionUtility.ql) query. import javascript
import DataFlow
import PathGraph
import semmle.javascript.DynamicPropertyAccess
/*
source = parameter splits forEach
sink = dynamic prop write(prop instanceof )
*/
/*
pairs = params.split('&');
*/
class AndSplit extends StringSplitCall{
AndSplit(){
getSeparator() = "&" and
getBaseString().getALocalSource() instanceof ParameterNode
}
}
/*
pairs.forEach(function (pair) {
*/
SourceNode step(SourceNode pred, SourceNode succ){
exists(SourceNode arr |
pred instanceof AndSplit and
succ = getAnEnumeratedArrayElement(arr)
and result = succ
)
}
/*
var parts = pair.split('='),
*/
class EqSplit extends StringSplitCall{
SourceNode pred;
SourceNode succ;
EqSplit(){
getSeparator() = "=" and
getBaseString().getALocalSource() = step(pred,succ)
}
}
/*
key = prep(parts.shift()),
*/
CallNode shift(DataFlow::CallNode call){
call.(MethodCallNode).getMethodName() = "shift" and
call.getReceiver().getALocalSource() instanceof EqSplit and
result = call
}
/*
if (key) {
parts = key.match(keyBreaker);
*/
predicate match(CallNode call){
exists(CallNode pred |
call.(MethodCallNode).getMethodName() = "match" and
call.getReceiver().getALocalSource().(CallNode).getAnArgument() = shift(pred)
)
or
exists(CallNode pred, MethodCallNode pop |
call.(MethodCallNode).getMethodName() = "match" and
call.getReceiver().getALocalSource().(CallNode).getAnArgument() = shift(pred) and
pop.getMethodName() = "pop" and
pop.getReceiver().getALocalSource() = call
)
}
from MethodCallNode pop, CallNode pred, CallNode call
where
call.(MethodCallNode).getMethodName() = "match" and
call.getReceiver().getALocalSource().(CallNode).getAnArgument() = shift(pred) and
pop.getMethodName() = "pop" and
pop.getReceiver().getALocalSource() = call
select call, call.getReceiver().getALocalSource(),pop,pred
/*
Property name originating from above pattern
*/
class LocPropName extends SourceNode{
SourceNode array;
LocPropName(){
match(array) and
this = getAnEnumeratedArrayElement(array)
}
SourceNode getArray() {result = array}
LocPropName getAnAlias(){ result.getArray() = getArray()}
} The above query is for finding the below prototype pollution pattern. /*https://www.npmjs.com/package/can-deparam*/
module.exports = namespace.deparam = function (params, valueDeserializer) {
valueDeserializer = valueDeserializer || idenity;
var data = {}, pairs, lastPart;
if (params && paramTest.test(params)) {
pairs = params.split('&');
pairs.forEach(function (pair) {
var parts = pair.split('='),
key = prep(parts.shift()),
value = prep(parts.join('=')),
current = data;
if (key) {
parts = key.match(keyBreaker);
for (var j = 0, l = parts.length - 1; j < l; j++) {
var currentName = parts[j],
nextName = parts[j + 1],
currentIsArray = isArrayLikeName(currentName) && current instanceof Array;
if (!current[currentName]) {
if(currentIsArray) {
current.push( isArrayLikeName(nextName) ? [] : {} );
} else {
// If what we are pointing to looks like an `array`
current[currentName] = isArrayLikeName(nextName) ? [] : {};
}
}
if(currentIsArray) {
current = current[current.length - 1];
} else {
current = current[currentName];
}
}
lastPart = parts.pop();
if ( isArrayLikeName(lastPart) ) {
current.push(valueDeserializer(value));
} else {
current[lastPart] = valueDeserializer(value);
}
}
});
}
return data;
}; The problem with this approach is that different modules parse So, Can we write a general query for this issue? Thanks |
Beta Was this translation helpful? Give feedback.
Replies: 2 comments 4 replies
-
Hi @msrkp 👋 First of all a huge thank you for collecting those prototype pollution examples in one place. 🙇♂️ I think you're right to use Part of the reason this query is so complicated is that it is one of the few data-flow queries that doesn't rely on a source of untrusted data. For example, it will warn about functions that recursively copy properties between objects, even if none of the objects can possibly be controlled by an attacker. We therefore need to be fairly sure that the code is actually doing what we think it's doing, which is why it's deliberately restricted to "deep merge" and "deep setter" functions. If I had seen your repo back when I wrote the query, I'd probably have tried to include a pattern for query-string parsers as well, but as you've observed, this is hard work as there's a seemingly endless variety of ways in which people do this. If you rely on a source of untrusted data, such as We're currently refactoring the prototype pollution queries and adding a new query in mix (for unrelated reasons). I've noticed that it's able to flag some of these examples, but only if there's an actual call to the query-string parser with a source of untrusted data. Simply having the function there is not enough, but adding a call like this is enough: deparam(window.location.search); I'm hoping to open a PR with the new query this week. I'll let you know here when it's up. Although I can't guarantee that it'll actually flag these examples, it might be an easier starting point. |
Beta Was this translation helpful? Give feedback.
-
@asgerf there is a nice collection by @BlackFan here. It could be used to exercise the queries and improve them. |
Beta Was this translation helpful? Give feedback.
Hi @msrkp 👋
First of all a huge thank you for collecting those prototype pollution examples in one place. 🙇♂️
Having a collection of real-world examples like this is incredibly useful.
I think you're right to use
PrototypePollutionUtility
as a starting point. However, it's one of the more complicated queries we have, so if you're new to CodeQL, I'm not sure this is good place to start learning.Part of the reason this query is so complicated is that it is one of the few data-flow queries that doesn't rely on a source of untrusted data. For example, it will warn about functions that recursively copy properties between objects, even if none of the objects can possibly be controlled by an a…