-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Java: Add modelling for Guava IO utilities #5310
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
"com.google.common.io;BaseEncoding;true;decodingSource;(CharSource);;Argument[0];ReturnValue;taint", | ||
"com.google.common.io;BaseEncoding;true;encode;(byte[]);;Argument[0];ReturnValue;taint", | ||
"com.google.common.io;BaseEncoding;true;encode;(byte[],int,int);;Argument[0];ReturnValue;taint", | ||
"com.google.common.io;BaseEncoding;true;withSeparator;(String,int);;Argument[0];ReturnValue;taint", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can taint be modelled backwards?
In that case should this also match encodingSink
and encodingStream
where the return value taints the argument?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a few special cases that model backwards taint steps, however there is no general mechanism for it, as it would likely lead to false positives from "time-travel flow", in cases like the following:
x = new StringWriter();
y = enc.encodingStream(x)
sink(x.toString());
y.write(taint());
If the writes to y
are propegated backwards to x
, then we get spurious flow to sink.
So, the question of whether to include things like encodingStream
(in general, wrappers around an output stream - of which there are several in the standard library that we don't model for simlar reasons) amounts to whether the additional effort to implement a special-case backwards flow step plus the potential for these kinds of FPs is worth the few additional results it could get; which I don't think it is.
"com.google.common.io;BaseEncoding;true;decode;(CharSequence);;Argument[-1];ReturnValue;taint", | ||
"com.google.common.io;BaseEncoding;true;decodingStream;(Reader);;Argument[-1];ReturnValue;taint", | ||
"com.google.common.io;BaseEncoding;true;decodingSource;(CharSource);;Argument[-1];ReturnValue;taint", | ||
"com.google.common.io;BaseEncoding;true;encode;(byte[]);;Argument[-1];ReturnValue;taint", | ||
"com.google.common.io;BaseEncoding;true;upperCase;();;Argument[-1];ReturnValue;taint", | ||
"com.google.common.io;BaseEncoding;true;lowerCase;();;Argument[-1];ReturnValue;taint", | ||
"com.google.common.io;BaseEncoding;true;withPadChar;(char);;Argument[-1];ReturnValue;taint", | ||
"com.google.common.io;BaseEncoding;true;omitPadding;();;Argument[-1];ReturnValue;taint", | ||
"com.google.common.io;BaseEncoding;true;encode;(byte[],int,int);;Argument[-1];ReturnValue;taint", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be good to add a comment why BaseEncoding
can be tainted. I was puzzled at first until I noticed that withSeparator
(and only that method) can taint the encoding object.
"com.google.common.io;ByteStreams;false;toByteArray;(InputStream);;Argument[0];ReturnValue;taint", | ||
"com.google.common.io;CharSource;true;asByteSource;(Charset);;Argument[-1];ReturnValue;taint", | ||
"com.google.common.io;CharSource;true;concat;;;Argument[0];ReturnValue;taint", | ||
"com.google.common.io;CharSource;true;copyTo;(Appendable);;Argument[-1];Argument[0];taint", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this also cover copyTo(CharSink)
? Someone could write their own CharSink
which acts like a StringWriter
, though this might be unlikely since it does not fit the intended use case of CharSink
very well.
"com.google.common.io;CharStreams;false;copy;(Readable,Appendable);;Argument[0];Argument[1];taint", | ||
"com.google.common.io;CharStreams;false;readLines;(Readable);;Argument[0];ReturnValue;taint", | ||
"com.google.common.io;CharStreams;false;toString;(Readable);;Argument[0];ReturnValue;taint", | ||
"com.google.common.io;Closer;true;register;;;Argument[0];ReturnValue;value", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should exception objects also be tracked for taint? E.g. when an exception message is created with a user-defined value and is then logged, it could allow log injection. Though in general any flow from user-defined data to an exception constructor is bad, so maybe it is not worth trying to check where the tainted exception flows afterwards?
"com.google.common.io;CharStreams;false;readLines;(Readable);;Argument[0];ReturnValue;taint", | ||
"com.google.common.io;CharStreams;false;toString;(Readable);;Argument[0];ReturnValue;taint", | ||
"com.google.common.io;Closer;true;register;;;Argument[0];ReturnValue;value", | ||
"com.google.common.io;Files;false;getFileExtension;(String);;Argument[0];ReturnValue;taint", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should all the methods which read or transfer the file content to in-memory structures (e.g. Appendable
) also be considered, when the user can chose the file (though this on its own is already bad)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is already a query for user controlled data flowing to a File
, so additional taint steps from that point are unlikely to be helpful. Additionally, a user controlling a file path doesn't necassarily mean they can control the file contents.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to cover com.google.common.io.Resources
methods taking a URL as argument?
Co-authored-by: Marcono1234 <Marcono1234@users.noreply.github.com>
As it is unlikely for a user-controlled resource URL to imply that its contents are user controlled, those methods don't seem like useful flow steps. |
Despite |
In fact, there is a query for calling |
Otherwise LGTM. |
Co-authored-by: Anders Schack-Mulligen <aschackmull@users.noreply.github.com>
No description provided.