Fast Bytecode Analysis
Scala Java
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
data
notes
project
results
src
.gitignore
build.sbt
notes.txt
readme.md

readme.md

FABA = Fast Bytecode Analysis

FAQ

  • What is FABA about?
    • It infers @Nullable, @NotNull, @Contract annotations for java libraries. And FABA is really fast: it processes JDK for ~30 sec.
  • Does FABA perform global analysis of libraries?
    • Yes! FABA takes into account nested calls at infinite depth!
  • Why is FABA fast?
    • See FABA architecture section below.
  • Do I need source code of a library?
    • No, FABA inspects bytecode only.
  • What is the output?
  • Is it for production?
    • No.
  • What is for then?
    • FABA is a research playground for testing and prototyping new ideas and algorithms of bytecode analysis implemented in IntelliJ IDEA.
    • In other words, FABA to some extent is a reference implementation of IntelliJ IDEA bytecode analysis.
  • What is IntelliJ IDEA bytecode analysis?
  • Is IntelliJ IDEA bytecode analysis open source?
  • What is the difference between FABA and IDEA 14 inference?
    • The top priority of IDEA analysis was the speed of analysis, so some features of inference were sacrificed to the speed of indexing in IDEA.
    • FABA version 1.0 corresponds to IDEA implementation.
    • More @NotNull annotations are inferred by current FABA (more aggressive propagation of @NotNull constraints).
    • Possibly, more powerful analysis will be ported to IDEA in the future.
    • Current FABA stores all equations in memory (see section FABA architecture below) and solves them at once. IDEA dumps equations to indices and solves these equations lazily. (IDEA's indices are too IDEA-specific to be reproduced here.)
    • FABA (in master branch) handles hierarchy (if all implementations of an interface method return @NotNullResult, then this interface method is also considered as @NotNull and so on)
  • What are next steps?
    • @Contract inference performs a lot of unnecessary operations. Many contract are not intuitive (meaningless, to be honest). Like this one: @Contract("!null,_,_,_->true;_,!null,_,_->;true;_,_,!null,_->true"). Tuning contracts is a lot of technical boilerplate.
    • Deeper inference of purity. It requires an effect system under the hood (@Pure, @Read, @Write, @LocalEffect).
    • Handling nullity of fields (at least static final and final).
  • What is the license?

How to run

faba.Main path_to_lib1.jar path_to_lib2.jar output_dir

It is easier to experiment with FABA directly from sbt:

runMain faba.Main /Library/Java/JavaVirtualMachines/jdk1.7.0_45.jdk/Contents/Home/jre/lib/rt.jar results/jdk
runMain faba.Main /Users/lambdamix/code/kanva-micro/data/commons-lang3-3.3.2.jar results/commons
runMain faba.Main data/groovy-2.3.3.jar results/groovy.jar

FABA architecture

There are a few principles that make FABA fast, robust and scalable.

  • Staged global analysis. Indexing, equations, solving.
    • Each method is processed (indexed) exactly once. The result of processing is an equation. Equations consume reasonable small amount of memory. Equations hold only necessary information needed for certain analysis. In some sense, a method's bytecode is supercompiled in an equation. Indexing of methods is fast.
    • Separate equations to handle inheritance (hierarchy of classes and methods).
    • When all methods are processed, all equations are solved together at once. Solving equations takes a few seconds even for a large set of libraries.
  • Specialized analyses and equations.
    • Many small specialized indexers for each inference.
      • Separate analysis and a separate equation for inference of each @NotNull parameter.
      • Separate analysis and a separate equation for inference of each @Nullable parameter.
      • Separate analysis and a separate equation for inference of a @NotNull method.
      • Separate analysis and a separate equation for inference of a @Nullable method.
      • Separate analysis and a separate equation for inference of each _,null,_->..., _,!null,_->... clause.
      • Separate analysis and a separate equation for inference of pure methods (@Contract(pure=true)).
    • Supercompilation instead of classical dataflow analysis.
  • Fast auxiliary analyses
    • Auxiliary analyses accelerates primary analyses.
      • Leaking parameters analysis.
      • Result origins analysis.
      • Parameter to result flow analysis.

Quick FABA code documentation for hackers

The main parts of FABA are written in Scala and minor parts are written in Java. FABA relies on ASM framework for bytecode reading and semantic based bytecode interpretation. Some extensions and specialization of ASM are implemented in Java (via copy-patch technology).

Java part

The minor parts of FABA (in alphabetical order).

  • /src/main/java/faba/asm - specialized and patched classes from asm lib
    • faba.asm.AnalyzerExt
      • Extended version of org.objectweb.asm.tree.analysis.Analyzer. It handles frames and Data. Used in NullableResultAnalysis, Data is a set of constraints (dereferenced values) in this case.
    • faba.asm.FramelessAnalyzer
      • Specialized lite version of org.objectweb.asm.tree.analysis.Analyzer. Used to build control flow graph. Calculation of fix-point of frames is removed, since frames are not needed to build control flow graph. So, the main point here is handling of subroutines (jsr) and try-catch-finally blocks.
    • faba.asm.InterpreterExt
      • Trait for an interpreter (to be mixed with org.objectweb.asm.tree.analysis.Interpreter) which calculates additional constraints. Used in NullableResultAnalysis to propagate information about not null values.
    • faba.asm.LiteAnalyzer
      • Specialized lite version of org.objectweb.asm.tree.analysis.Analyzer. No processing of subroutines. May be used for methods without JSR/RET instructions.
    • faba.asm.LiteAnalyzerExt:
      • Fusion of faba.asm.LiteAnalyzer and faba.asm.AnalyzerExt.
    • faba.asm.LiteFramelessAnalyzer
      • Fusion of faba.asm.LiteAnalyzer and faba.asm.FramelessAnalyzer
    • faba.asm.Subroutine
      • Copy of org.objectweb.asm.tree.analysis.Subroutine.

Scala part

The main parts of FABA, in logical (how to read) order, functionalities (classes, utilities) are grouped by file.

  • /src/main/scala - directory (without subdirectories) contains main classes (common infrastructure and orchestration).
    • data.scala
      • Data structures for representing result of analyses, utilities to serialize inferred annotations into xml.
    • engine.scala
      • Lattices, equations over lattices, fast solver of equations.
    • source.scala
      • IO infrastructure to traverse java bytecode (jar-files, classes in folders, classes reachable from classloader)
    • calls.scala
      • Infrastructure to resolve method calls (in hierarchy). Call resolution happens in two stages: at the (first) indexing stage hierarchy information (inheritance relationship + method index) and all encountered calls are collected; at the (second) stage of equation construction the full hierarchy is built from index, all calls are resolved to a set of concrete calls, equations for overridable methods are constructed.
    • faba.scala
      • The main logic of orchestration of different analysis. Runs different analyses and puts equations got from different analyzers into corresponding solvers. It corresponds to indexing of java libraries.
    • main.scala
      • Solving of equations gathered at indexing phase, dumping of solutions in the form of external annotations into xml files.
  • /src/main/scala/analysis - the heart of FABA, different analyses
    • core.scala
      • Core data structures used for analyses:
        • control flow graph
        • DFS tree
        • abstract values for semantic interpretation
        • configuration
        • state (configuration + history + constraints)
        • StagedScAnalysis - Skeleton for implementing staged analysis via exploration of graph of configurations.
    • utils.scala - AnalysisUtils
      • Equivalence relations for configurations and states, "instance of" (subset) relation.
    • combined.scala
      • "All in one" staged analysis - for linear methods (without branching). All equations are constructed in a single pass over method's instructions. It is a good point to look at for understanding of mechanics of all analyses.
    • parameters.scala
      • NotNullParameterAnalysis - construction of equations for inference of @NotNull parameters.
      • NullableParameterAnalysis - construction of equations for inference of @Nullable parameters.
    • result.scala
      • ResultAnalysis - construction of equations for inference of @NotNull methods, construction of equations for inference of @Contract("_,null,_->?") contracts.
    • nullableResult.scala
      • NullableResultAnalysis - inference of @Nullable methods. Analysis is not based on supercompilation (to be fast). Really, current @Nullable method analysis is quite ad hoc and may be better if rewritten to supercompilation approach.
    • controlFlow.scala
      • Utilities to build control flow graph, depth-first search tree and testing reducibility of control flow graph. Construction of control flow graph is also specialized for methods without JSR/RET instructions.
    • leakingParameters.scala
      • LeakingParameters analysis. Sound approximation of parameters usage. Preliminary analysis to explore whether next @NotNull parameter analysis, @Nullable parameter analysis and @Contract analysis will produce interesting result. In many cases leaking parameter analysis allows to understand the result of more complex analysis in a fast manner. Another result of this analysis is a fixed point of frames states (it is used later for result origins analysis).
    • resultOrigins.scala
      • Analysis to understand at which instructions values that may become the result of the method are created. Result origins analysis speeds up ResultAnalysis a lot.
    • parameterToResultFlow.scala
      • Analysis to understand whether a method may just return a given parameter.
    • purity.scala
      • Very simple (not deep) analysis to infer pure methods (@Contract(pure=true)).