-
Notifications
You must be signed in to change notification settings - Fork 76
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Wrong taint results for relational and conditional operators #1
Comments
Hi Bihuan, If you run this same exact code with the controlTrack flag set, you should see the output: In this case, there are branches in System.out.println that depend on the value of what we are printing (e.g. for deciding how to print numbers). So the line that prints e will cause e to be relevant to the control flow state, and the line that prints f will cause it to be relevant as well. If you DON'T print the values, then you will see the expected behavior. I've demonstrated this behavior in a test case here - Programming-Systems-Lab@f05efe6 |
Thanks, Jon. I did set the "controlTrack" flag, but unfortunately I used the old phosphor.jar in the master branch. |
Hi Jon, I encountered another problem. See the example below. public class Test {
public static void test(int x, int y, boolean b) {
Taint tg = null;
if (b) {
int g = x + y;
//System.out.println(g);
tg = MultiTainter.getTaint(g);
}
int h = 1;
Taint th = MultiTainter.getTaint(h);
System.out.println(tg);
System.out.println(th);
}
public static void main(String[] args) {
int x = -1;
int y = -1;
boolean b = true;
int xt = MultiTainter.taintedInt(x, "x");
int yt = MultiTainter.taintedInt(y, "y");
boolean bt = MultiTainter.taintedBoolean(b, "b");
test(xt, yt, bt);
}
} If I comment out System.out.println(g);, the output is: However, if I uncomment System.out.println(g);, the output becomes: Is this behavior expected? IMHO, the taint result for h in the second case should be null. |
Hi Bihuan, Let me clarify the semantics for how tags are applied in control flow. We apply the taint tag of the branch condition to all assignments that appear (through a simple post-dominator analysis) to be influenced by that branch. This analysis only considers the current method. So for example, in this case: if(b){
int g = 4; //g gets b's tag
}
else
{
int r = 2; //r gets b's tag
}
int x = 3; //x does not get b's tag It gets more complex when you call methods - because the state is carried over (persisting between methods). if(b){
someMethod();
}
...
public void someMethod()
{
this.var = 5; //when someMethod() is called by the code above, in the branch, this.var gets b's tag!
} Because we only remove a taint tag from the control flow marker when we are sure that it is no longer influencing the control flow, we can end up with marks accumulating that don't matter, for reasons like this: if(b){
//control flow now marked with b
return;//b still marks control flow at call site - and won't be removed
} This is what is happening in println - there is a branch on the value being printed, and no clear resolution to that branch. One way to see it is that we are being conservative here - if a function has multiple return statements, then the choice of which return is used may change the control flow in other methods. Tracking implicit flow is incredibly difficult to do both precisely and soundly. We are using the same semantics as DyTan. If you have any suggestions for improvements please let us know :) |
Thanks for the clarification, Jon. In my problem, I want to know the set of inputs that each branch depends on. My solution here is to (1) enable the data-flow taint analysis only, (2) identify the set of variables that each branch uses, and (3) instrument the calls to getTtaint(X) for the identified set of variables for each branch to get its taint result. |
I'm not sure how you are doing (2-3) above. |
Thanks for the advice, Jon. I have followed your suggested steps. I implemented BranchTaintTagFactory, which just stored the originally-popped taint tags via invoking the BranchTaintRecorder.add(X) method. import edu.columbia.cs.psl.phosphor.Configuration;
import edu.columbia.cs.psl.phosphor.TaintUtils;
import edu.columbia.cs.psl.phosphor.instrumenter.DataAndControlFlowTagFactory;
import edu.columbia.cs.psl.phosphor.instrumenter.LocalVariableManager;
import edu.columbia.cs.psl.phosphor.instrumenter.TaintPassingMV;
import edu.columbia.cs.psl.phosphor.org.objectweb.asm.Label;
import edu.columbia.cs.psl.phosphor.org.objectweb.asm.MethodVisitor;
import edu.columbia.cs.psl.phosphor.org.objectweb.asm.Opcodes;
import edu.columbia.cs.psl.phosphor.org.objectweb.asm.Type;
public class BranchTaintTagFactory extends DataAndControlFlowTagFactory {
@Override
public void jumpOp(int opcode, int branchStarting, Label label, MethodVisitor mv, LocalVariableManager lvs, TaintPassingMV ta) {
switch (opcode) {
case Opcodes.IFEQ:
case Opcodes.IFNE:
case Opcodes.IFLT:
case Opcodes.IFGE:
case Opcodes.IFGT:
case Opcodes.IFLE:
//top is val, taint
mv.visitInsn(SWAP);
//mv.visitInsn(POP);
mv.visitMethodInsn(Opcodes.INVOKESTATIC, Type.getInternalName(BranchTaintRecorder.class), "add", "(" + Configuration.TAINT_TAG_DESC + ")V", false);
mv.visitJumpInsn(opcode, label);
break;
case Opcodes.IF_ICMPEQ:
case Opcodes.IF_ICMPNE:
case Opcodes.IF_ICMPLT:
case Opcodes.IF_ICMPGE:
case Opcodes.IF_ICMPGT:
case Opcodes.IF_ICMPLE:
//top is val, taint, val, taint
mv.visitInsn(SWAP);
//mv.visitInsn(POP);
mv.visitMethodInsn(Opcodes.INVOKESTATIC, Type.getInternalName(BranchTaintRecorder.class), "add", "(" + Configuration.TAINT_TAG_DESC + ")V", false);
//top is val, val, taint
mv.visitInsn(DUP2_X1);
mv.visitInsn(POP2);
//mv.visitInsn(POP);
mv.visitMethodInsn(Opcodes.INVOKESTATIC, Type.getInternalName(BranchTaintRecorder.class), "add", "(" + Configuration.TAINT_TAG_DESC + ")V", false);
mv.visitJumpInsn(opcode, label);
break;
case Opcodes.IF_ACMPEQ:
case Opcodes.IF_ACMPNE:
// O1 (T1?) O2 (T2?)
Type typeOnStack = ta.getTopOfStackType();
if (typeOnStack.getSort() == Type.ARRAY && typeOnStack.getElementType().getSort() != Type.OBJECT) {
// O1 T1 O2 (T2?)
mv.visitInsn(SWAP);
//mv.visitInsn(POP);
mv.visitMethodInsn(Opcodes.INVOKESTATIC, Type.getInternalName(BranchTaintRecorder.class), "add", "(Ljava/lang/Object;)V", false);
}
//O1 O2 (T2?)
Type secondOnStack = ta.getStackTypeAtOffset(1);
if (secondOnStack.getSort() == Type.ARRAY && secondOnStack.getElementType().getSort() != Type.OBJECT) {
//O1 O2 T2
mv.visitInsn(DUP2_X1);
mv.visitInsn(POP2);
//mv.visitInsn(POP);
mv.visitMethodInsn(Opcodes.INVOKESTATIC, Type.getInternalName(BranchTaintRecorder.class), "add", "(Ljava/lang/Object;)V", false);
}
if(Configuration.WITH_UNBOX_ACMPEQ && (opcode == Opcodes.IF_ACMPEQ || opcode == Opcodes.IF_ACMPNE))
{
mv.visitMethodInsn(Opcodes.INVOKESTATIC, Type.getInternalName(TaintUtils.class), "ensureUnboxed", "(Ljava/lang/Object;)Ljava/lang/Object;", false);
mv.visitInsn(SWAP);
mv.visitMethodInsn(Opcodes.INVOKESTATIC, Type.getInternalName(TaintUtils.class), "ensureUnboxed", "(Ljava/lang/Object;)Ljava/lang/Object;", false);
mv.visitInsn(SWAP);
}
mv.visitJumpInsn(opcode, label);
break;
case Opcodes.IFNULL:
case Opcodes.IFNONNULL:
// O1 (T1?)
typeOnStack = ta.getTopOfStackType();
if (typeOnStack.getSort() == Type.ARRAY && typeOnStack.getElementType().getSort() != Type.OBJECT) {
//O1 T1
mv.visitInsn(SWAP);
//mv.visitInsn(POP);
mv.visitMethodInsn(Opcodes.INVOKESTATIC, Type.getInternalName(BranchTaintRecorder.class), "add", "(Ljava/lang/Object;)V", false);
}
mv.visitJumpInsn(opcode, label);
break;
case Opcodes.GOTO:
mv.visitJumpInsn(opcode, label);
break;
default:
throw new IllegalArgumentException("Unimplemented: " + opcode);
}
}
} import java.util.HashSet;
import edu.columbia.cs.psl.phosphor.Configuration;
import edu.columbia.cs.psl.phosphor.runtime.Taint;
import edu.columbia.cs.psl.phosphor.struct.Tainted;
import edu.columbia.cs.psl.phosphor.struct.TaintedWithIntTag;
import edu.columbia.cs.psl.phosphor.struct.TaintedWithObjTag;
public class BranchTaintRecorder {
private static HashSet<Integer> setInt;
private static HashSet<Taint> setObj;
public static void add(int tag) {
if (setInt == null) {
setInt = new HashSet<Integer>();
}
setInt.add(tag);
}
public static void add(Taint tag) {
if (setObj == null) {
setObj = new HashSet<Taint>();
}
setObj.add(tag);
}
public static void add(Object obj) {
if (obj instanceof Tainted) {
if (obj instanceof TaintedWithObjTag) {
add((Taint)(((TaintedWithObjTag) obj).getPHOSPHOR_TAG()));
} else if (obj instanceof TaintedWithIntTag) {
add(((TaintedWithIntTag) obj).getPHOSPHOR_TAG());
}
}
}
public static HashSet<?> getBranchTaint() {
if (Configuration.MULTI_TAINTING) {
return setObj;
} else {
return setInt;
}
}
} However, when testing on TestVariables, I got the StackOverflowError. Am I missing something here when extending DataAndControlFlowTagFactory? If possible, can you give some hints to solve the problem? Thank you. java.lang.StackOverflowError import edu.columbia.cs.psl.phosphor.runtime.Tainter;
public class TestVariables {
public static void test(Pair p) {
int tag1 = Tainter.getTaint(p);
int tag2 = Tainter.getTaint(p.getH());
int tag3 = Tainter.getTaint(p.getW());
System.out.println((tag1 & 1) + " " + (tag1 & 2) + " " + (tag1 & 4));
System.out.println((tag2 & 1) + " " + (tag2 & 2) + " " + (tag2 & 4));
System.out.println((tag3 & 1) + " " + (tag3 & 2) + " " + (tag3 & 4));
}
public static void main(String[] args) {
int x = Tainter.taintedInt(1, 1);
int y = Tainter.taintedInt(1, 2);
Pair p = new Pair(x, y);
Tainter.taintedObject(p, 4);
test(p);
}
}
class Pair {
private int h;
private int w;
public Pair(int h, int w) {
this.h = h;
this.w = w;
}
public int getH() {
return h;
}
public void setH(int h) {
this.h = h;
}
public int getW() {
return w;
}
public void setW(int w) {
this.w = w;
}
} |
When you are doing any kind of instrumentation like this, it's vital that your runtime code (in this case, Do you need it to be a hashset? Would a linked list suffice? If so, you can use class WrappedInt{
public int v;
} Or, if you really only care about the 31 distinct markers (and not what combination they were encountered in) then for this case, it's probably fastest and easiest just to use a single bit string to store which tags were encountered, like this: private static int setInts;
public void add(int tag)
{
setInts |= tag;
} |
Thanks for your patience and help, Jon. I use public class Test {
public static void main(String[] args) {
int x = Tainter.taintedInt(1, 1);
if (x > 0) {
int tag = Tainter.getTaint(x);
assert((tag & 1) == 1);
}
assert(BranchTaintRecorder.getBranchTaint() == null);
}
} Meanwhile, now it needs much more time than just popping those tags. Is this normal? BTW, is it possible to get the corresponding source line number of those jump bytecodes? |
Heh... There are some optimizations that will avoid loading taint tags onto the stack when we know that they are going to be popped right away (e.g. for You can look at the generated code by using
But it is applied at many other branches. I just added the instrument-time option This slowdown is probably what is to be expected: you are replacing EVERY SINGLE branch operation in the entire execution of the JVM with this big dynamic call. Are you interested in just targeting specific branches? Even if you are targeting all of them, you can also probably make it significantly faster like this: public static void add(int tag) {
if(tag==0) return;
/* 99.99999% of calls will probably have 0 tag,
* so why are we allocating a new wrappedint and putting it on a list?
*/
if (setInt == null) {
setInt = new LinkedList<WrappedInt>();
}
setInt.add(new WrappedInt(tag));
} I've added a callback for line number visitation. |
Thanks for adding the option, Jon. I added the public class Test {
public static void main(String[] args) {
int x = Tainter.taintedInt(1, 1);
if (x > 0) {
int tag = Tainter.getTaint(x);
assert((tag & 1) == 1);
}
assert(BranchTaintRecorder.getBranchTaint() == null);
//assert(BranchTaintRecorder.getAllInts() == 1);
}
} Then I used a single bit string to store the tags like this: private static int allInts = 0;
public static int getAllInts() {
return allInts;
} However, I got the following exception: Exception in thread "main" java.lang.NoSuchMethodError: edu.ntu.taint.BranchTaintRecorder.getAllInts$$PHOSPHORTAGGED(Ledu/columbia/cs/psl/phosphor/struct/TaintedIntWithIntTag;)Ledu/columbia/cs/psl/phosphor/struct/TaintedIntWithIntTag;
at phosphor.test.Test.main(Test.java:17) So I added the required method: public static TaintedIntWithIntTag getAllInts$$PHOSPHORTAGGED(TaintedIntWithIntTag ret) {
ret.taint = allInts;
ret.val = allInts;
return ret;
} Then I passed the second assertion in the above test case, which indicated that the tags were correctly stored. I don't understand why For the source line number issue, I want to get the line number in the source file where the encountered jump instruction corresponds to, and push this number onto the stack in |
Hi, Jon. Both problems are now solved. For the first problem, I can get the stored tags like this: private static LinkedList<BranchTaint> branchTaints;
public static LinkedList<BranchTaint> getBranchTaints() {
return branchTaints;
} But I still don't know why this version works while the previous one does not. Besides, if the runtime code needs to get private int tag;
public int getTag() {
return tag;
}
public TaintedIntWithIntTag getTag$$PHOSPHORTAGGED(TaintedIntWithIntTag ret) {
ret.val = tag;
ret.taint = tag;
return ret;
} For the second problem, it seems I totally misunderstand the mechanism of ASM... The parameter Thanks for your great help. |
Glad that you got it working. You need to provide the $$PHOSPHORTAGGED version because that's how the taint tags are passed back and forth for primitives (if you have any passed as arguments or returned). You'll see that we do the same thing in Tainter.java. Note that you probably want to say I am not sure exactly what the issue was with Am I correct in understanding that this is all working now, though? |
Thanks for the suggestions, Jon. All things are working now. |
It seems Phosphor produces wrong taint results for relational (e.g. >) and conditional (e.g., &&) operators.
Here is an example I used to test Phosphor.
The outputs are:
e = 0 -> Taint [lbl=x deps = []]
f = 1 -> Taint [lbl=y deps = []]
b1 = false -> Taint [lbl=null deps = [y ]]
b2 = true -> Taint [lbl=null deps = [x b ]]
b3 = true -> Taint [lbl=null deps = []]
Based on my understanding, the outputs should be:
e = 0 -> Taint [lbl=x deps = []]
f = 1 -> Taint [lbl=y deps = []]
b1 = false -> Taint [lbl=null deps = [x ]]
b2 = true -> Taint [lbl=null deps = [y ]]
b3 = true -> Taint [lbl=null deps = [x b ]]
Please help to confirm the problem.
Thanks.
The text was updated successfully, but these errors were encountered: