-
Notifications
You must be signed in to change notification settings - Fork 1.8k
JS: http to file access and file to http body taint flow #132
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JS: http to file access and file to http body taint flow #132
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your contribution, @denislevin!
I have a few suggestions and questions, but I realise you've marked this PR as work-in-progress, so feel free to ignore my comments for now.
We'd be happy to test your queries on a bunch of larger projects once you think it's ready; just let us know.
* @Description The query detects data flow from Http response to File Access. This can be an indication of a backdoor or direct file download from untrusted/insecure URI | ||
* @kind problem | ||
* @problem.severity warning | ||
* @id js/HttpToFileAccess |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Query ids need to be lower-case-with-dashes (cf. https://lgtm.com/help/ql/writing-queries/query-metadata), so perhaps js/arbitrary-file-upload
or something like that.
@@ -0,0 +1,14 @@ | |||
/** | |||
* @Name Http to File Access |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Metadata tags are case-sensitive, I believe, so this needs to be @name
(and @description
below). Also, could you expand the name to be slightly more descriptive.
@@ -0,0 +1,14 @@ | |||
/** | |||
* @Name Http to File Access | |||
* @Description The query detects data flow from Http response to File Access. This can be an indication of a backdoor or direct file download from untrusted/insecure URI |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Query descriptions generally follow the format "doing X is bad because Y", so perhaps something along the lines of "Writing data from an HTTP request directly to the file system allows arbitrary file upload and might indicate a backdoor."
@@ -0,0 +1,14 @@ | |||
/** | |||
* @Name File Access flows to Http POST/PUT |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comments as above.
} | ||
|
||
/** | ||
* A call to write corresponds to a pattern where file stream is open first, followed by 'read' or 'write' call |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The class doesn't seem to check for a call to open
or similar. Is that intentional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've split this class in two and now I can check for createReadStream and createWriteStream methods
// using 'request' library to make http 'POST' and 'PUT' requests with message body. | ||
private class RequestPostBody extends HTTP::RequestBody { | ||
RequestPostBody () { | ||
this = DataFlow::moduleImport("request").getAMemberCall("post").getArgument(1) or |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest you use DataFlow::moduleMember("request", "post").getACall()
instead to also handle ES2015-style destructuring imports like this:
import { post } from "request";
post(...)
|
||
module FileAccessToHttpDataFlow { | ||
/** | ||
* A data flow source for code injection vulnerabilities. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here and below: "code injection" seems to be copied from another query.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, thank you!
} | ||
|
||
/** Taint step propagating a tainted parameter or a variable to a wrapper object. */ | ||
private class ObjectTaintStep extends TaintTracking::AdditionalTaintStep { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding this taint step can lead to significant imprecision: if v
is tainted and you have an assignment o.p = v
, then o
will become tainted, so any (!) read of a property (not just p
) from o
will also be considered tainted. Do you have a concrete use case in mind that you want to model?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll also weigh in here that AdditionalTaintStep
should not be subclassed in a query-specific file; there is a performance penalty when doing so. That class is only for taint steps that are shared by all taint configurations.
Steps that are specific to a configuration should be added in isAdditionalTaintStep
(even if it's the only configuration used by the query).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am trying to flag data read from file to be passed to http post body. The concrete case looks like this:
// Build the post string from an object
var post_data = querystring.stringify({
'compilation_level' : 'ADVANCED_OPTIMIZATIONS',
'output_format': 'json',
'output_info': 'compiled_code',
'warning_level' : 'QUIET',
'js_code' : codestring
});
Obvoiusly, 'js_code' is just an example, the concrete property name is just something that the receiving server will understand.
If I know that post_data flows into httpclient.post method (a sink is an argument to that method), and source is a file access, I want the taint flow to be flagged.
So I want all objects to be broadly tainted if they contain data from file.
Seems like there is an AdditionalTaintStep for querystring.strigify, I don't see how this is different.
private class FileAccessArgumentAsSource extends Source { | ||
FileAccessArgumentAsSource() { | ||
exists(DataFlow::CallNode cn | | ||
cn instanceof FileSystemAccess and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This essentially breaks the abstraction provided by FileSystemAccess
by casting it to CallNode
. I'd suggest adding the required API on FileSystemAccess
instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This source is specifically dealing with FileSystemAccess that is also a CallNode. For these, the source is an argument.
The one below is FileSystemAccess that is a Parameter (as in a callback), or anything else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think Max's point was that file system accesses may have any number of arguments that aren't buffers getting filled with file data. res.sendFile(path)
from the Express library is one example of a file access, but it wouldn't make sense to treat path
a taint source. However, the method you need is clearly missing from FileSystemAccess
, and we're probably in a better position to add that ourselves. I'm fine with fixing this up after the PR lands.
Now apart from that, I don't think this actually has the intended effect. I'm guessing it is supposed to flag something like this:
let buffer = new Buffer(length);
fs.read(fd, buffer, 0, length, null, function(err) {
res.send(buffer); // flag this
});
However, tainting buffer
on line 2 won't actually affect buffer
on line 3. Tainting new Buffer()
would work, and you can accomplish this with getALocalSource()
:
this = cn.getAnArgument().getALocalSource()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added tests for buffered read, and they seem to taint buffer when it's passed as a parameter to fs function.
I agree I need some way to say what is tainted as a result of fs function call - which parameter, or even return value. Right now it's a bit of a mix since NodeJSLib defines NodeJSFileSystemAccess as any call on fs, and somewhere I saw a sink or source that defines any argument to any call as a sink (or source):
private class NodeJSFileSystemAccess extends FileSystemAccess, DataFlow::CallNode {
string methodName;
NodeJSFileSystemAccess() {
exists (string moduleName | this = DataFlow::moduleMember(moduleName, methodName).getACall() |
moduleName = "fs" or moduleName = "graceful-fs"
)
}
override DataFlow::Node getAPathArgument() {
exists (int i | fsFileParam(methodName, i) |
result = getArgument(i))
}
}
}
So perhaps FileSystemAccess needs "What's tainted" function, i.e. DataFlow::Node getTaintedDataNode(). I am not sure what implementation for Express will be - perhaps returning none()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the interface to FileSystemAccess
needs to be become richer. It should also let us distinguish reads and writes. We're happy to take care of this after the PR lands.
Note that getAPathArgument()
doesn't just allow any argument. It is determined by fsFileParam
which is based on how parameters are declared in the extern libraries.
/** A source is a file access callback parameter, as in fs.readFile('MyFile.json', 'utf-8', function (err, data) {}. */ | ||
private class FileAccessParamAsSource extends Source { | ||
FileAccessParamAsSource() { | ||
this instanceof FileSystemAccess |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems overly broad. Perhaps you'd want to restrict this to certain kinds of file system accesses?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I probably should restrict it to 'reads' - good point.
Guys, need to ask for help here. I don't know why the build is failing, and link with "Details" doesn't open for me (https://jenkins.internal.semmle.com/job/Language-Tests/job/JavaScript/3697/ not leading anywhere). |
Sorry about the hassle. It's a compilation error due to the PR going out of sync with master:
The internal class Could you try rebasing or merging in master? Renaming |
Could you please also remove the "Normalize line endings" commit? It touches files that shouldn't be modified here, and it seems to insert CR line endings which we are trying to avoid (#181). |
Thank you for the review and error root cause! I have fixed it and reverted "Normalize line endings" commit. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been going over @xiemaisi's review, and everything seems to have been addressed, except for one issue that still needs attention.
Now I realize the PR has been up for over two weeks now and if you'd rather just land the PR and be done with it, just let me know and we'll land it like this and we'll take care of the rest on our end.
private class FileAccessArgumentAsSource extends Source { | ||
FileAccessArgumentAsSource() { | ||
exists(DataFlow::CallNode cn | | ||
cn instanceof FileSystemAccess and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think Max's point was that file system accesses may have any number of arguments that aren't buffers getting filled with file data. res.sendFile(path)
from the Express library is one example of a file access, but it wouldn't make sense to treat path
a taint source. However, the method you need is clearly missing from FileSystemAccess
, and we're probably in a better position to add that ourselves. I'm fine with fixing this up after the PR lands.
Now apart from that, I don't think this actually has the intended effect. I'm guessing it is supposed to flag something like this:
let buffer = new Buffer(length);
fs.read(fd, buffer, 0, length, null, function(err) {
res.send(buffer); // flag this
});
However, tainting buffer
on line 2 won't actually affect buffer
on line 3. Tainting new Buffer()
would work, and you can accomplish this with getALocalSource()
:
this = cn.getAnArgument().getALocalSource()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added some comments inline, but again, we're happy to just merge it as-is and take it from here.
May I just ask you to squash your commits, so the line ending commit is removed from the history?
fs.stat(fileName, function (error, stats) { | ||
fs.open(fileName, "r", function (error, fd) { | ||
var buffer = new Buffer(stats.size); | ||
fs.read(fd, buffer, 0, buffer.length, null, function (error, bytesRead, buffer) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test unfortunately doesn't cover the case I mentioned. The buffer
callback parameter is shadowing the outer buffer
variable. The source of taint here is indeed the callback parameter, not the buffer passed to rs.read
.
So this is handled by FileAccessCallbackParam
and FileAccessParamAsSource
, not FileAccessArgumentAsSource
.
If the buffer
callback parameter is deleted, so buffer
on line 13 refers to the outer buffer
variable, the code should still be vulnerable but is not flagged because FileAccessArgumentAsSource
doesn't mark new Buffer()
as a source.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the feedback, I'll rework the test.
private class FileAccessArgumentAsSource extends Source { | ||
FileAccessArgumentAsSource() { | ||
exists(DataFlow::CallNode cn | | ||
cn instanceof FileSystemAccess and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the interface to FileSystemAccess
needs to be become richer. It should also let us distinguish reads and writes. We're happy to take care of this after the PR lands.
Note that getAPathArgument()
doesn't just allow any argument. It is determined by fsFileParam
which is based on how parameters are declared in the extern libraries.
Got my change refactored to take care of separating read from write file access where applicable. |
85aa40f
to
8152cef
Compare
Thanks for your latest updates, @denislevin! If you think this is more or less ready, we'd be happy to merge and perhaps fix up some minor things later on. For the third-party code snippets used in your tests, could you add a header comment like this one indicating where you got them from? |
Done! Yes, I think it's ready to merge. |
Please merge if you see it fit. |
Thank you for your contribution, @denislevin! |
Rename {Rescue,RescueExpr} to {RescueExpr,RescueModifierExpr}
Extract arrayOf-like calls
change branch name
change branch name
…cit-models PS: Extract source files found via `PSModulePath`
…cit-models-followup PS: github#132 follow-up
js: Http to File and File to Http request body taint flow.
An attempt to find code vulnerable to file overwrite with untrusted content or unauthorized arbitrary file upload (information disclosure).
Work in progress.