Interprocedural data flow analysis for Java programs #391

Closed
tangfy97 opened this issue Sep 29, 2021 · 12 comments

Comments

@tangfy97

Hi! First I want to thank you all for developing such a handy tool for flow analysis.

Since I saw you mention that FlowDroid is also able to compute data flows for Java programs, I was wondering whether there is any documentation on how to analyse the JAR file of a plain Java program on its own, without including an Android JAR. I saw that soot-infoflow implements the analysis for Java programs, but is there any documentation or guide on how to set up a pure Java analysis with it? Sorry for the naive question :)

Thanks a lot!

@canliture
Contributor

I think you can look at the constructor soot.jimple.infoflow.Infoflow#Infoflow():

```java
/**
 * Creates a new instance of the InfoFlow class for analyzing plain Java code
 * without any references to APKs or the Android SDK.
 */
public Infoflow() {
    super();
}
```

and Infoflow#computeInfoflow(String appPath, String libPath, IEntryPointCreator, ISourceSinkManager).

You can pass the path to your .jar file as appPath.

@tangfy97
Author

Thanks for your reply! This looks good, but do I still need to use libPath and pass an android.jar to it?

@StevenArzt
Member

You can have a look at the JUnit test cases inside the soot-infoflow project, in the package soot.jimple.infoflow.test.junit.

@tangfy97
Author

tangfy97 commented Oct 4, 2021

Thanks! The test cases are very helpful. Still, for pure Java programs packaged as JAR files, should the entry point simply be main? And does the Android libPath still matter when we use computeInfoflow()?

@StevenArzt
Member

For a normal Java program, the entry point is the main method. The test cases need to set the respective test methods as entry points. The library path is only relevant in case your program depends on external JAR files.
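As a small aid for picking that entry point, the sketch below reads the Main-Class attribute from a JAR manifest and formats it in the Soot signature syntax that is later passed to DefaultEntryPointCreator in this thread (for example <toy.test: void main(java.lang.String[])>). It uses only java.util.jar; MainEntryPoint and its methods are hypothetical names, not a FlowDroid API, and it assumes the application JAR actually declares a Main-Class, which not every JAR does.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

/**
 * Hypothetical helper (not a FlowDroid API) for deriving a main-method entry point.
 * Assumes the JAR manifest declares a Main-Class; otherwise mainClassOf returns null.
 */
final class MainEntryPoint {

    private MainEntryPoint() {
    }

    /** Returns the Main-Class declared in the JAR manifest, or null if there is none. */
    static String mainClassOf(String jarPath) {
        try (JarFile jar = new JarFile(jarPath)) {
            Manifest manifest = jar.getManifest();
            return manifest == null
                    ? null
                    : manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Formats the standard main method of a class as a Soot method signature. */
    static String mainSignature(String className) {
        return "<" + className + ": void main(java.lang.String[])>";
    }
}
```

The resulting string could then be added to the collection of entry points handed to DefaultEntryPointCreator.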

@tangfy97
Author

tangfy97 commented Oct 6, 2021

Hi, thank you all for your patient help, I really appreciate it! I have managed to get my small analysis working. However, when I try to find a flow that passes through different external libraries, the result comes back as null. The source and sink methods are identified correctly, and it works perfectly well if all classes are within the same package. Do you know how to configure the analysis so that it tracks flows across external classes in different packages? (I reckon the libPath is correct, because the sink method is located in an external class and it was identified.)

Thank you so much for your help.

@StevenArzt
Member

Is your setup as follows: One main program JAR and some external libraries as additional JARs? Did you put these additional JARs on the library path? Are the sources and sinks all in the main JAR or also in the library JARs?

@tangfy97
Author

tangfy97 commented Oct 6, 2021

Hi! Thanks for the swift reply.
Yes, one main JAR plus two additional external JARs.
Yes, all of them are on the library path.
The source is located in the main JAR, so the data flows from this source in the main JAR to the 1st external JAR and then to the 2nd external JAR. I tried to place the sink in the 2nd external JAR, but no flow was reported. If I just set a sink in the 1st external JAR, the flow is identified. The missing part is somehow the link between the two external JARs.

@StevenArzt
Member

That is indeed strange, because FlowDroid loads all classes into the same Soot scene. Is the sink in the second library JAR detected correctly? That would indicate that all classes are actually loaded.
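One way to check part of that question outside of FlowDroid is to look for the sink's class file in each JAR on the library path. The sketch below is a hypothetical helper (ClassLocator is not part of FlowDroid or Soot) that splits a class path on the platform path separator and reports which JAR contains a given class. It only inspects the JAR files themselves, so it says nothing about whether Soot actually loaded the class into its scene. It also skips directory entries, on the assumption that a directory which merely contains JARs does not expose the classes inside them, as is the case for a plain Java class path.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.jar.JarFile;
import java.util.regex.Pattern;

/** Hypothetical diagnostic helper; not part of FlowDroid or Soot. */
final class ClassLocator {

    private ClassLocator() {
    }

    /**
     * Returns the first JAR on a path-separator-delimited class path that contains
     * the given fully qualified class, or null if none does. Entries that are not
     * existing .jar files (for example, a directory that holds JARs) are skipped.
     */
    static String findJarContaining(String classPath, String className) {
        String entryName = className.replace('.', '/') + ".class";
        for (String element : classPath.split(Pattern.quote(File.pathSeparator))) {
            if (!element.endsWith(".jar") || !new File(element).isFile()) {
                continue;
            }
            try (JarFile jar = new JarFile(element)) {
                if (jar.getEntry(entryName) != null) {
                    return element;
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return null;
    }
}
```

Given a libPath that names only a directory, this helper returns null by construction, which is a hint to list the JARs individually.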

I'm not sure that the different JARs have anything to do with the issue. That might just be a coincidence. The next step would be to debug the data flow analysis. If you know the call chain between source and sink, you can try to see where the flow is lost. Have a look at method getCallFlowFunction in class InfoflowProblem. This flow function maps taints from a call site into a callee. If your taint flows between methods as A -> B -> C -> D -> E, you can check whether there is a call flow from C to D. If yes, the flow gets lost later. If no, you can check the flow from B to C. This would help you to bisect the position on the path where the flow is lost. The boundaries between the JARs are a logical starting point, but as I said, there might be other reasons, such as missing summaries for JDK methods.
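The bisection idea can be sketched independently of FlowDroid. Below, the hypothetical predicate taintReaches(i) stands for whatever you observe while debugging (for example, whether getCallFlowFunction maps the taint into the i-th method of the chain); TaintBisector is not a FlowDroid class. The sketch assumes the taint reaches the first method and that, once lost, it stays lost further down the chain. Under that assumption a binary search finds the first method the taint does not reach with O(log n) checks.

```java
import java.util.List;
import java.util.function.IntPredicate;

/** Hypothetical helper illustrating the bisection strategy; not part of FlowDroid. */
final class TaintBisector {

    private TaintBisector() {
    }

    /**
     * For a call chain m[0] -> m[1] -> ... -> m[n-1], returns the index of the first
     * method the taint does not reach, or -1 if it reaches the whole chain.
     * Assumes taintReaches(0) is true and that the predicate is monotone
     * (true for a prefix of the chain, false afterwards).
     */
    static int firstLostIndex(List<String> chain, IntPredicate taintReaches) {
        int lo = 0;            // invariant: the taint reaches chain[lo]
        int hi = chain.size(); // invariant: hi == size, or the taint does not reach chain[hi]
        while (hi - lo > 1) {
            int mid = (lo + hi) >>> 1;
            if (taintReaches.test(mid)) {
                lo = mid;
            } else {
                hi = mid;
            }
        }
        return hi == chain.size() ? -1 : hi;
    }
}
```

For the chain A -> B -> C -> D -> E, if the taint reaches A, B and C but not D, firstLostIndex returns 3 (that is, D), so the call edge from C to D is where to look.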

@tangfy97
Author

tangfy97 commented Oct 6, 2021

Hi, very sorry to bother you again... I did some more tests. The data flow should be A->B->C->D, where A and D are in the main program while B and C come from the external JARs. The calls are simply nested as D(C(B(A))).

I tried defining just one source in main (A) and making all other methods sinks, hoping to find all potential flows. Infoflow did identify the source and many sinks, but it only reports one direct flow, A->B; everything after B is somehow missed. If I declare B as a source instead, then B->C is reported.

I suspect I missed some settings. Since the sources and sinks are identified correctly, I assume my ISourceSinkManager is correct (although infoflow.getCollectedSinks() does not output anything). Are there settings I should change for infoflow.computeInfoflow()? I am using the published soot-infoflow-2.9.0.jar as the library. The simple snippet I used is below. It is very odd to get only pieces of the flow, so I would really appreciate it if you could point me to potential mistakes.

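```java
import java.util.ArrayList;
import java.util.Collection;

import soot.SootMethod;
import soot.jimple.DefinitionStmt;
import soot.jimple.Stmt;
import soot.jimple.infoflow.IInfoflow;
import soot.jimple.infoflow.Infoflow;
import soot.jimple.infoflow.InfoflowManager;
import soot.jimple.infoflow.data.AccessPath;
import soot.jimple.infoflow.data.SootMethodAndClass;
import soot.jimple.infoflow.entryPointCreators.DefaultEntryPointCreator;
import soot.jimple.infoflow.sourcesSinks.definitions.MethodSourceSinkDefinition;
import soot.jimple.infoflow.sourcesSinks.manager.ISourceSinkManager;
import soot.jimple.infoflow.sourcesSinks.manager.SinkInfo;
import soot.jimple.infoflow.sourcesSinks.manager.SourceInfo;

// Inside the analysis driver method:
String targetPath = System.getProperty("user.dir") + "/examples/testing.jar";
String libPath = System.getProperty("user.dir") + "/lib";

IInfoflow infoflow = new Infoflow();
Collection<String> epoints = new ArrayList<String>();
epoints.add("<toy.test: void main(java.lang.String[])>");

DefaultEntryPointCreator entryPoints = new DefaultEntryPointCreator(epoints);

ISourceSinkManager sourceSinkMgr = new ISourceSinkManager() {

    // Source: the assigned return value of any call whose method name contains "scan"
    @Override
    public SourceInfo getSourceInfo(Stmt sCallSite, InfoflowManager manager) {
        if (sCallSite.containsInvokeExpr()
                && sCallSite instanceof DefinitionStmt
                && sCallSite.getInvokeExpr().getMethod().getName().toLowerCase().contains("scan")) {
            AccessPath ap = manager.getAccessPathFactory().createAccessPath(
                    ((DefinitionStmt) sCallSite).getLeftOp(), true);
            return new SourceInfo(null, ap);
        }
        return null;
    }

    // Sink: every method call, to find all potential flows
    @Override
    public SinkInfo getSinkInfo(Stmt sCallSite, InfoflowManager manager, AccessPath ap) {
        if (!sCallSite.containsInvokeExpr())
            return null;

        SootMethod target = sCallSite.getInvokeExpr().getMethod();
        return new SinkInfo(new MethodSourceSinkDefinition(new SootMethodAndClass(target)));
    }

    @Override
    public void initialize() {
        // nothing to initialize
    }
};

infoflow.computeInfoflow(targetPath, libPath, entryPoints, sourceSinkMgr);
System.out.println(infoflow.getCollectedSinks());
System.out.println(infoflow.getResults());
```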

The output:
```
[main] INFO soot.jimple.infoflow.Infoflow - Resetting Soot...
[main] INFO soot.jimple.infoflow.Infoflow - Basic class loading done.
[main] INFO soot.jimple.infoflow.InfoflowConfiguration - Implicit flow tracking is NOT enabled
[main] INFO soot.jimple.infoflow.InfoflowConfiguration - Exceptional flow tracking is enabled
[main] INFO soot.jimple.infoflow.InfoflowConfiguration - Running with a maximum access path length of 5
[main] INFO soot.jimple.infoflow.InfoflowConfiguration - Using path-agnostic result collection
[main] INFO soot.jimple.infoflow.InfoflowConfiguration - Recursive access path shortening is enabled
[main] INFO soot.jimple.infoflow.InfoflowConfiguration - Taint analysis enabled: true
[main] INFO soot.jimple.infoflow.InfoflowConfiguration - Using alias algorithm FlowSensitive
[main] INFO soot.jimple.infoflow.memory.MemoryWarningSystem - Registered a memory warning system for 614,419 MiB
[main] INFO soot.jimple.infoflow.Infoflow - Callgraph construction took 0 seconds
[main] INFO soot.jimple.infoflow.codeOptimization.InterproceduralConstantValuePropagator - Removing side-effect free methods is disabled
[main] INFO soot.jimple.infoflow.Infoflow - Dead code elimination took 0.0095789 seconds
[main] INFO soot.jimple.infoflow.Infoflow - Callgraph has 10 edges
[main] INFO soot.jimple.infoflow.Infoflow - Starting Taint Analysis
[main] INFO soot.jimple.infoflow.data.FlowDroidMemoryManager - Initializing FlowDroid memory manager...
[main] INFO soot.jimple.infoflow.Infoflow - Using context- and flow-sensitive solver
[main] INFO soot.jimple.infoflow.Infoflow - Using context- and flow-sensitive solver
[main] WARN soot.jimple.infoflow.Infoflow - Running with limited join point abstractions can break context-sensitive path builders
[main] INFO soot.jimple.infoflow.Infoflow - Looking for sources and sinks...
[main] INFO soot.jimple.infoflow.Infoflow - Source lookup done, found 1 sources and 17 sinks.
[main] INFO soot.jimple.infoflow.Infoflow - IFDS problem with 11 forward and 0 backward edges solved in 0 seconds, processing 1 results...
[main] INFO soot.jimple.infoflow.Infoflow - Current memory consumption: 150 MB
[main] INFO soot.jimple.infoflow.Infoflow - Memory consumption after cleanup: 30 MB
[main] INFO soot.jimple.infoflow.data.pathBuilders.BatchPathBuilder - Running path reconstruction batch 1 with 1 elements
[main] INFO soot.jimple.infoflow.data.pathBuilders.ContextSensitivePathBuilder - Obtainted 1 connections between sources and sinks
[main] INFO soot.jimple.infoflow.data.pathBuilders.ContextSensitivePathBuilder - Building path 1...
[main] INFO soot.jimple.infoflow.memory.MemoryWarningSystem - Shutting down the memory warning system...
[main] INFO soot.jimple.infoflow.Infoflow - Memory consumption after path building: 27 MB
[main] INFO soot.jimple.infoflow.Infoflow - Path reconstruction took 0 seconds
[main] INFO soot.jimple.infoflow.Infoflow - The sink i1 = virtualinvoke r1.<calculation.add: int db(int)>(i0) in method <toy.test: void main(java.lang.String[])> was called with values from the following sources:
[main] INFO soot.jimple.infoflow.Infoflow - - i0 = staticinvoke <toy.test: int scan()>() in method <toy.test: void main(java.lang.String[])>
```

@StevenArzt
Member

The methods getCollectedSources() and getCollectedSinks() only return proper sets when InfoflowConfiguration.logSourcesAndSinks is set to true. Maybe enable this option and see what you get there.

The library path seems strange. You need to pass the individual JARs, not a directory. Can you try this out?
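A minimal sketch of building such a library path, assuming that computeInfoflow accepts several JARs in its libPath argument joined with the platform path separator (':' on Unix, ';' on Windows), as with a regular Java class path. LibPathBuilder is a hypothetical helper, not a FlowDroid API; it lists the .jar files directly inside a directory and joins their paths.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/**
 * Hypothetical helper (not a FlowDroid API).
 * Assumption: FlowDroid's libPath accepts entries joined like a Java class path.
 */
final class LibPathBuilder {

    private LibPathBuilder() {
    }

    /**
     * Lists the *.jar files directly inside libDir, sorted for a deterministic order,
     * and joins their paths with the platform path separator.
     */
    static String jarsIn(String libDir) {
        try (Stream<Path> entries = Files.list(Paths.get(libDir))) {
            return entries
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(".jar"))
                    .map(Path::toString)
                    .sorted()
                    .collect(Collectors.joining(File.pathSeparator));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With it, the snippet above would pass LibPathBuilder.jarsIn(System.getProperty("user.dir") + "/lib") instead of the bare directory. Sorting keeps the order stable, since Files.list makes no ordering guarantee.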

@tangfy97
Author

tangfy97 commented Oct 6, 2021

Thanks a lot! I combined the two library JARs into a single JAR, pointed the library path to that JAR, and it worked :) Thanks again for the patient replies!
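For reference, this merging workaround can be scripted with java.util.jar. This is a sketch under stated assumptions, not a FlowDroid feature: JarMerger is a hypothetical helper, duplicate entries keep their first occurrence, and manifests and signature files are dropped because they would no longer match the merged content. Passing the individual JARs on the library path, as suggested above, avoids the extra step.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Enumeration;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;

/** Hypothetical helper (not a FlowDroid API) that merges several JARs into one. */
final class JarMerger {

    private JarMerger() {
    }

    /**
     * Copies the file entries of all input JARs into outputJar and returns how many
     * entries were written. For duplicate names the first occurrence wins; manifests
     * and signature files are skipped.
     */
    static int merge(List<String> inputJars, String outputJar) {
        Set<String> written = new HashSet<>();
        byte[] buffer = new byte[8192];
        try (OutputStream file = Files.newOutputStream(Paths.get(outputJar));
             JarOutputStream out = new JarOutputStream(file)) {
            for (String input : inputJars) {
                try (JarFile jar = new JarFile(input)) {
                    Enumeration<JarEntry> entries = jar.entries();
                    while (entries.hasMoreElements()) {
                        JarEntry entry = entries.nextElement();
                        String name = entry.getName();
                        // written.add(...) returns false for a name that was already copied
                        if (entry.isDirectory() || isManifestOrSignature(name) || !written.add(name)) {
                            continue;
                        }
                        out.putNextEntry(new JarEntry(name));
                        try (InputStream in = jar.getInputStream(entry)) {
                            int n;
                            while ((n = in.read(buffer)) != -1) {
                                out.write(buffer, 0, n);
                            }
                        }
                        out.closeEntry();
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return written.size();
    }

    private static boolean isManifestOrSignature(String name) {
        String upper = name.toUpperCase(Locale.ROOT);
        return upper.equals("META-INF/MANIFEST.MF")
                || (upper.startsWith("META-INF/")
                        && (upper.endsWith(".SF") || upper.endsWith(".RSA") || upper.endsWith(".DSA")));
    }
}
```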
