Skip to content

Conversation

denislevin
Copy link
Contributor

js: Http to File and File to Http request body taint flow.
An attempt to find code vulnerable to file overwrite with untrusted content or unauthorized arbitrary file upload (information disclosure).
Work in progress.

@denislevin denislevin requested a review from a team as a code owner August 31, 2018 21:30
@denislevin denislevin changed the title Denisl/js/http to file access test Javascript: http to file access and file to http body taint flow Aug 31, 2018
@denislevin denislevin changed the title Javascript: http to file access and file to http body taint flow Javascript: http to file access and file to http body taint flow (WIP) Aug 31, 2018
@denislevin denislevin changed the title Javascript: http to file access and file to http body taint flow (WIP) JS: http to file access and file to http body taint flow:WIP Aug 31, 2018
@xiemaisi xiemaisi added the JS label Sep 3, 2018
Copy link

@xiemaisi xiemaisi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your contribution, @denislevin!

I have a few suggestions and questions, but I realise you've marked this PR as work-in-progress, so feel free to ignore my comments for now.

We'd be happy to test your queries on a bunch of larger projects once you think it's ready; just let us know.

* @Description The query detects data flow from Http response to File Access. This can be an indication of a backdoor or direct file download from untrusted/insecure URI
* @kind problem
* @problem.severity warning
* @id js/HttpToFileAccess
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Query ids need to be lower-case-with-dashes (cf. https://lgtm.com/help/ql/writing-queries/query-metadata), so perhaps js/arbitrary-file-upload or something like that.

@@ -0,0 +1,14 @@
/**
* @Name Http to File Access
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Metadata tags are case-sensitive, I believe, so this needs to be @name (and @description below). Also, could you expand the name to be slightly more descriptive.

@@ -0,0 +1,14 @@
/**
* @Name Http to File Access
* @Description The query detects data flow from Http response to File Access. This can be an indication of a backdoor or direct file download from untrusted/insecure URI
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Query descriptions generally follow the format "doing X is bad because Y", so perhaps something along the lines of "Writing data from an HTTP request directly to the file system allows arbitrary file upload and might indicate a backdoor."

@@ -0,0 +1,14 @@
/**
* @Name File Access flows to Http POST/PUT
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comments as above.

}

/**
* A call to write corresponds to a pattern where file stream is open first, followed by 'read' or 'write' call
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The class doesn't seem to check for a call to open or similar. Is that intentional?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've split this class in two and now I can check for createReadStream and createWriteStream methods

// using 'request' library to make http 'POST' and 'PUT' requests with message body.
private class RequestPostBody extends HTTP::RequestBody {
RequestPostBody () {
this = DataFlow::moduleImport("request").getAMemberCall("post").getArgument(1) or
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest you use DataFlow::moduleMember("request", "post").getACall() instead to also handle ES2015-style destructuring imports like this:

import { post } from "request";
post(...)


module FileAccessToHttpDataFlow {
/**
* A data flow source for code injection vulnerabilities.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here and below: "code injection" seems to be copied from another query.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, thank you!

}

/** Taint step propagating a tainted parameter or a variable to a wrapper object. */
private class ObjectTaintStep extends TaintTracking::AdditionalTaintStep {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding this taint step can lead to significant imprecision: if v is tainted and you have an assignment o.p = v, then o will become tainted, so any (!) read of a property (not just p) from o will also be considered tainted. Do you have a concrete use case in mind that you want to model?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll also weigh in here that AdditionalTaintStep should not be subclassed in a query-specific file; there is a performance penalty when doing so. That class is only for taint steps that are shared by all taint configurations.

Steps that are specific to a configuration should be added in isAdditionalTaintStep (even if it's the only configuration used by the query).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am trying to flag data read from file to be passed to http post body. The concrete case looks like this:

// Build the post string from an object
var post_data = querystring.stringify({
'compilation_level' : 'ADVANCED_OPTIMIZATIONS',
'output_format': 'json',
'output_info': 'compiled_code',
'warning_level' : 'QUIET',
'js_code' : codestring
});

Obvoiusly, 'js_code' is just an example, the concrete property name is just something that the receiving server will understand.

If I know that post_data flows into httpclient.post method (a sink is an argument to that method), and source is a file access, I want the taint flow to be flagged.
So I want all objects to be broadly tainted if they contain data from file.
Seems like there is an AdditionalTaintStep for querystring.strigify, I don't see how this is different.

private class FileAccessArgumentAsSource extends Source {
FileAccessArgumentAsSource() {
exists(DataFlow::CallNode cn |
cn instanceof FileSystemAccess and
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This essentially breaks the abstraction provided by FileSystemAccess by casting it to CallNode. I'd suggest adding the required API on FileSystemAccess instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This source is specifically dealing with FileSystemAccess that is also a CallNode. For these, the source is an argument.
The one below is FileSystemAccess that is a Parameter (as in a callback), or anything else.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Max's point was that file system accesses may have any number of arguments that aren't buffers getting filled with file data. res.sendFile(path) from the Express library is one example of a file access, but it wouldn't make sense to treat path a taint source. However, the method you need is clearly missing from FileSystemAccess, and we're probably in a better position to add that ourselves. I'm fine with fixing this up after the PR lands.

Now apart from that, I don't think this actually has the intended effect. I'm guessing it is supposed to flag something like this:

let buffer = new Buffer(length);
fs.read(fd, buffer, 0, length, null, function(err) {
  res.send(buffer); // flag this
});

However, tainting buffer on line 2 won't actually affect buffer on line 3. Tainting new Buffer() would work, and you can accomplish this with getALocalSource():

this = cn.getAnArgument().getALocalSource()

Copy link
Contributor Author

@denislevin denislevin Sep 18, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added tests for buffered read, and they seem to taint buffer when it's passed as a parameter to fs function.

I agree I need some way to say what is tainted as a result of fs function call - which parameter, or even return value. Right now it's a bit of a mix since NodeJSLib defines NodeJSFileSystemAccess as any call on fs, and somewhere I saw a sink or source that defines any argument to any call as a sink (or source):

private class NodeJSFileSystemAccess extends FileSystemAccess, DataFlow::CallNode {
  string methodName;

NodeJSFileSystemAccess() {
  exists (string moduleName | this = DataFlow::moduleMember(moduleName, methodName).getACall() |
    moduleName = "fs" or moduleName = "graceful-fs"
  )
}

override DataFlow::Node getAPathArgument() {
  exists (int i | fsFileParam(methodName, i) |
    result = getArgument(i))
 }
}

}
So perhaps FileSystemAccess needs "What's tainted" function, i.e. DataFlow::Node getTaintedDataNode(). I am not sure what implementation for Express will be - perhaps returning none()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the interface to FileSystemAccess needs to be become richer. It should also let us distinguish reads and writes. We're happy to take care of this after the PR lands.

Note that getAPathArgument() doesn't just allow any argument. It is determined by fsFileParam which is based on how parameters are declared in the extern libraries.

/** A source is a file access callback parameter, as in fs.readFile('MyFile.json', 'utf-8', function (err, data) {}. */
private class FileAccessParamAsSource extends Source {
FileAccessParamAsSource() {
this instanceof FileSystemAccess
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems overly broad. Perhaps you'd want to restrict this to certain kinds of file system accesses?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I probably should restrict it to 'reads' - good point.

@denislevin
Copy link
Contributor Author

Guys, need to ask for help here. I don't know why the build is failing, and link with "Details" doesn't open for me (https://jenkins.internal.semmle.com/job/Language-Tests/job/JavaScript/3697/ not leading anywhere).
I suspect this is because my Windows git changed line endings on some of the tests to \r\n instead of \n.
'ql.editorconfig has this:
[.{ql,qll,qlref,dbscheme,qhelp}]
end_of_line = lf
`
so test files (
.cpp, *.cs, *.js, *.ts, etc. ) are not included.

@asger-semmle
Copy link
Contributor

Sorry about the hassle. It's a compilation error due to the PR going out of sync with master:

ERROR: Could not resolve type HttpRequest (ql/javascript/ql/src/semmle/javascript/frameworks/NodeJSLib.qll:665,14-25)

The internal class HttpRequest was subsumed by ClientRequest. The old name didn't make it clear if this was a request being sent or a request being received, which is an important distinction. We do try to be careful about breaking changes, but this was an internal class.

Could you try rebasing or merging in master? Renaming HttpRequest to NodeJSClientRequest should then do the trick.

@asger-semmle
Copy link
Contributor

Could you please also remove the "Normalize line endings" commit? It touches files that shouldn't be modified here, and it seems to insert CR line endings which we are trying to avoid (#181).

@denislevin denislevin changed the title JS: http to file access and file to http body taint flow:WIP JS: http to file access and file to http body taint flow Sep 14, 2018
@denislevin
Copy link
Contributor Author

Thank you for the review and error root cause! I have fixed it and reverted "Normalize line endings" commit.

Copy link
Contributor

@asger-semmle asger-semmle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been going over @xiemaisi's review, and everything seems to have been addressed, except for one issue that still needs attention.

Now I realize the PR has been up for over two weeks now and if you'd rather just land the PR and be done with it, just let me know and we'll land it like this and we'll take care of the rest on our end.

private class FileAccessArgumentAsSource extends Source {
FileAccessArgumentAsSource() {
exists(DataFlow::CallNode cn |
cn instanceof FileSystemAccess and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Max's point was that file system accesses may have any number of arguments that aren't buffers getting filled with file data. res.sendFile(path) from the Express library is one example of a file access, but it wouldn't make sense to treat path a taint source. However, the method you need is clearly missing from FileSystemAccess, and we're probably in a better position to add that ourselves. I'm fine with fixing this up after the PR lands.

Now apart from that, I don't think this actually has the intended effect. I'm guessing it is supposed to flag something like this:

let buffer = new Buffer(length);
fs.read(fd, buffer, 0, length, null, function(err) {
  res.send(buffer); // flag this
});

However, tainting buffer on line 2 won't actually affect buffer on line 3. Tainting new Buffer() would work, and you can accomplish this with getALocalSource():

this = cn.getAnArgument().getALocalSource()

Copy link
Contributor

@asger-semmle asger-semmle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added some comments inline, but again, we're happy to just merge it as-is and take it from here.

May I just ask you to squash your commits, so the line ending commit is removed from the history?

fs.stat(fileName, function (error, stats) {
fs.open(fileName, "r", function (error, fd) {
var buffer = new Buffer(stats.size);
fs.read(fd, buffer, 0, buffer.length, null, function (error, bytesRead, buffer) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test unfortunately doesn't cover the case I mentioned. The buffer callback parameter is shadowing the outer buffer variable. The source of taint here is indeed the callback parameter, not the buffer passed to rs.read.

So this is handled by FileAccessCallbackParam and FileAccessParamAsSource, not FileAccessArgumentAsSource.

If the buffer callback parameter is deleted, so buffer on line 13 refers to the outer buffer variable, the code should still be vulnerable but is not flagged because FileAccessArgumentAsSource doesn't mark new Buffer() as a source.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the feedback, I'll rework the test.

private class FileAccessArgumentAsSource extends Source {
FileAccessArgumentAsSource() {
exists(DataFlow::CallNode cn |
cn instanceof FileSystemAccess and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the interface to FileSystemAccess needs to be become richer. It should also let us distinguish reads and writes. We're happy to take care of this after the PR lands.

Note that getAPathArgument() doesn't just allow any argument. It is determined by fsFileParam which is based on how parameters are declared in the extern libraries.

@denislevin
Copy link
Contributor Author

Got my change refactored to take care of separating read from write file access where applicable.

@denislevin denislevin force-pushed the denisl/js/HttpToFileAccessTest branch from 85aa40f to 8152cef Compare September 21, 2018 23:48
@xiemaisi
Copy link

Thanks for your latest updates, @denislevin! If you think this is more or less ready, we'd be happy to merge and perhaps fix up some minor things later on.

For the third-party code snippets used in your tests, could you add a header comment like this one indicating where you got them from?

@denislevin
Copy link
Contributor Author

Done! Yes, I think it's ready to merge.

@denislevin
Copy link
Contributor Author

Please merge if you see it fit.
Thank you!
=Denis

@semmle-qlci semmle-qlci merged commit 98254e8 into github:master Oct 4, 2018
@xiemaisi
Copy link

xiemaisi commented Oct 4, 2018

Thank you for your contribution, @denislevin!

@ghost ghost mentioned this pull request Oct 10, 2018
aibaars added a commit that referenced this pull request Oct 14, 2021
Rename {Rescue,RescueExpr} to {RescueExpr,RescueModifierExpr}
smowton pushed a commit to smowton/codeql that referenced this pull request Dec 6, 2021
erik-krogh pushed a commit to erik-krogh/ql that referenced this pull request Dec 15, 2021
erik-krogh pushed a commit to erik-krogh/ql that referenced this pull request Dec 15, 2021
MathiasVP added a commit to MathiasVP/ql that referenced this pull request Aug 10, 2025
…cit-models

PS: Extract source files found via `PSModulePath`
MathiasVP added a commit to MathiasVP/ql that referenced this pull request Aug 10, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants