FlowState is a flow-sensitive, intraprocedural static analysis engine designed to detect "taint-style" vulnerabilities, such as Cross-Site Scripting (XSS), in JavaScript.
By parsing source code into an Abstract Syntax Tree (AST) and constructing a Control Flow Graph (CFG), FlowState tracks the propagation of untrusted user input from sources to sensitive sinks.
(Expect a more comprehensive academic report to be published here at a later date)
Completed as a third-year Computer Science project.
- Flow-sensitive tracking: unlike naïve AST walkers, which generate false positives on safe code, FlowState follows data flow through the program and reports a sink only when tainted input from a source can actually reach it.
- Propagation tracing: generates step-by-step traces, as text, JSON, or interactive HTML reports, showing exactly how tainted data travelled from source to sink.
- Path elimination: eliminates unreachable conditional branches that can be decided statically.
- CFG plotting: built-in Graphviz integration to automatically generate and plot the underlying CFG using the --plot-cfg flag.
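The difference between a naive sink search and flow-sensitive tracking can be sketched on a toy straight-line program. This is a minimal illustration, not FlowState's implementation: the `Stmt` mini-IR and both functions are hypothetical names invented for the example.

```typescript
// Hypothetical mini-IR: every name here is illustrative, not FlowState's API.
type Stmt =
  | { kind: "source"; target: string }               // e.g. input = location.hash
  | { kind: "assign"; target: string; from: string } // e.g. msg = input
  | { kind: "const"; target: string }                // e.g. msg = "safe literal"
  | { kind: "sink"; arg: string };                   // e.g. el.innerHTML = msg

// Naive AST-walker style: flag every sink, whatever is passed to it.
function naiveFindings(program: Stmt[]): number[] {
  const hits: number[] = [];
  program.forEach((s, i) => {
    if (s.kind === "sink") hits.push(i);
  });
  return hits;
}

// Flow-sensitive: walk statements in order, tracking which variables are
// tainted at that point, together with the statement indices that tainted them.
function flowSensitiveFindings(
  program: Stmt[],
): { sink: number; trace: number[] }[] {
  const taint = new Map<string, number[]>(); // variable -> propagation steps
  const findings: { sink: number; trace: number[] }[] = [];
  program.forEach((s, i) => {
    switch (s.kind) {
      case "source":
        taint.set(s.target, [i]);
        break;
      case "assign": {
        const from = taint.get(s.from);
        if (from) taint.set(s.target, [...from, i]);
        else taint.delete(s.target);
        break;
      }
      case "const":
        taint.delete(s.target); // overwriting with a literal kills the taint
        break;
      case "sink": {
        const trace = taint.get(s.arg);
        if (trace) findings.push({ sink: i, trace: [...trace, i] });
        break;
      }
    }
  });
  return findings;
}

const program: Stmt[] = [
  { kind: "source", target: "input" },              // 0
  { kind: "assign", target: "msg", from: "input" }, // 1
  { kind: "sink", arg: "msg" },                     // 2: msg is tainted here
  { kind: "const", target: "msg" },                 // 3
  { kind: "sink", arg: "msg" },                     // 4: msg is safe here
];

const naive = naiveFindings(program);           // flags statements 2 and 4
const precise = flowSensitiveFindings(program); // flags statement 2 only
```

The naive walker reports both sinks, while the flow-sensitive pass reports only statement 2 and records the trace 0, 1, 2 that a propagation report could be built from.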
First, install the dependencies with pnpm install or npm install.
Next, build the project with npm run build.
You can then add the CLI to your PATH using npm link and invoke it with flowstate [options] <file>, where <file> is the JavaScript entry file to analyse.
(If you don't want to link the CLI globally, or lack the permissions to do so, you can instead run it directly from the build output with node ./dist/index.js [options] <file>.)
Usage: flowstate [options] <file>
Arguments:
file The JavaScript entry file to analyse
Options:
-v, --verbose Enable verbose logging
-a, --analyser <analyser> The analyser to use (naive, one_pass, iterative, worklist). The worklist analysis is strongly recommended. (default: "worklist")
-f, --format <formatters> The output formatters to use (text, json, html). Multiple can be specified, separated by commas. If multiple are selected, they will be separated by 4 newlines. (default: ["text"])
--console Output messages passed to console.log, .warn, .error etc to the log
--no-merge Disable merging of basic blocks in the CFG.
--no-path-elim Disable path elimination (dead code removal) in analysis.
--collect-code Collect the code corresponding to each CFG node, useful for plotting or verbose logging. Increases memory usage, especially for large programs.
--plot-cfg <path> Plots the CFG and outputs to the given path (automatically detecting type based on extension, e.g. png, dot). Requires graphviz to be installed and in the system PATH.
--plot-path-elim <none|soft|hard> How to plot eliminated paths in the CFG plot. 'none' will not visually distinguish eliminated paths, 'soft' will show them as dashed lines, and 'hard' will remove them entirely from the plot.
Default is 'soft'. This option has no effect if --plot-cfg is not used or if --no-path-elim is used. (default: "soft")
-h, --help display help for command
Analysers:
naive: an example of naive analysis of the AST, reporting any usage of a sink no matter the data passed. This will generate many false positives!
one_pass: a simple one pass analysis through the CFG. It does not know how to handle programs with loops and is therefore not recommended.
iterative: an iterative analysis that repeatedly processes the CFG until a fixed point is reached. This will handle loops, but may be slow on large programs.
worklist: a more efficient version of the iterative analysis that only reprocesses nodes when their predecessor states change. This is the recommended analyser to use.
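To make the difference between the one_pass and worklist analysers concrete, here is a minimal sketch of both on a small CFG containing a loop. It assumes a taint state modelled as a set of tainted variable names and a merge that takes the union of predecessor states. The types `State` and `CfgNode` and the function `analyse` are hypothetical and do not come from FlowState's source.

```typescript
// Illustrative types only; FlowState's real structures may differ.
type State = Set<string>; // names of variables currently considered tainted

interface CfgNode {
  id: number;
  succs: number[];
  transfer: (input: State) => State; // effect of the block on the state
}

const sameState = (a: State, b: State): boolean =>
  a.size === b.size && Array.from(a).every((v) => b.has(v));

// requeue = false: visit each node once, in listed order (one-pass style).
// requeue = true: re-enqueue successors whenever a node's output changes.
function analyse(nodes: CfgNode[], requeue: boolean): Map<number, State> {
  const byId = new Map<number, CfgNode>();
  const preds = new Map<number, number[]>();
  const out = new Map<number, State>();
  for (const n of nodes) {
    byId.set(n.id, n);
    preds.set(n.id, []);
    out.set(n.id, new Set<string>());
  }
  for (const n of nodes) for (const s of n.succs) preds.get(s)!.push(n.id);

  const pending: number[] = nodes.map((n) => n.id);
  while (pending.length > 0) {
    const id = pending.shift()!;
    // Input state: union of the output states of all predecessors.
    const input: State = new Set<string>();
    for (const p of preds.get(id)!) out.get(p)!.forEach((v) => input.add(v));
    const next = byId.get(id)!.transfer(input);
    if (!sameState(next, out.get(id)!)) {
      out.set(id, next);
      if (requeue) {
        for (const s of byId.get(id)!.succs) {
          if (pending.indexOf(s) === -1) pending.push(s);
        }
      }
    }
  }
  return out;
}

// CFG for:  a = source(); while (?) { c = b; b = a; }  sink(c);
const cfg: CfgNode[] = [
  { id: 0, succs: [1], transfer: (s) => new Set(s).add("a") }, // a = source()
  { id: 1, succs: [2, 3], transfer: (s) => new Set(s) },       // loop header
  {
    id: 2,
    succs: [1],
    transfer: (s) => {
      const next = new Set(s);
      if (s.has("b")) next.add("c"); else next.delete("c"); // c = b
      if (s.has("a")) next.add("b"); else next.delete("b"); // b = a
      return next;
    },
  },
  { id: 3, succs: [], transfer: (s) => new Set(s) },           // sink(c)
];

const onePassAtSink = analyse(cfg, false).get(3)!;
const worklistAtSink = analyse(cfg, true).get(3)!;
```

In this example the taint reaches `c` only after two trips around the loop, so the single pass never sees it at the sink, whereas the worklist run keeps reprocessing the loop nodes until no output state changes.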
To use plotting, install Graphviz and add it to your PATH.
To view HTML output in a web browser, use your shell's output redirection to write it to a file, such as flowstate ... --format html > out.html, and then open that file.
FlowState is written in TypeScript, using strict typing and interfaces to define robust data structures.
- CLI (via Commander): a terminal interface for configuring the analysis environment.
- AST parser (via Espree): converts JavaScript source code into an ESTree-compliant Abstract Syntax Tree.
- CFG construction: translates the AST into a Control Flow Graph. Nodes are split into basic blocks and conditionals, with generated transfer functions attached to approximate how code mutates the program state.
- Data flow analyser: simulates program execution by applying transfer functions to state objects, iterating until the states reach a fixed point so that loops are handled correctly.
- Output formatter: consolidates the findings and outputs the propagation reports to the requested format.
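As a rough picture of the CFG stage, the sketch below models a graph made of basic blocks and conditionals and shows how a branch whose test can be decided statically removes nodes from consideration. The `GraphNode` shape, the pre-evaluated `test` field, and the `reachable` function are assumptions made for illustration. How FlowState actually represents nodes and evaluates conditions is not described here.

```typescript
// Hypothetical node shapes: basic blocks fall through, conditionals branch.
type GraphNode =
  | { kind: "block"; id: number; next: number | null }
  | {
      kind: "cond";
      id: number;
      test: boolean | "unknown"; // assumed result of static evaluation
      onTrue: number;
      onFalse: number;
    };

// Collect the node ids reachable from `entry`. With path elimination enabled,
// a conditional with a statically known test follows only the taken branch.
function reachable(
  nodes: GraphNode[],
  entry: number,
  pathElim: boolean,
): Set<number> {
  const byId = new Map<number, GraphNode>();
  for (const n of nodes) byId.set(n.id, n);

  const seen = new Set<number>();
  const stack: number[] = [entry];
  while (stack.length > 0) {
    const id = stack.pop()!;
    const node = byId.get(id);
    if (!node || seen.has(id)) continue;
    seen.add(id);
    if (node.kind === "block") {
      if (node.next !== null) stack.push(node.next);
    } else if (pathElim && node.test !== "unknown") {
      stack.push(node.test ? node.onTrue : node.onFalse);
    } else {
      stack.push(node.onTrue, node.onFalse);
    }
  }
  return seen;
}

// if (false) { /* block 2: would write tainted data to a sink */ }
const graph: GraphNode[] = [
  { kind: "block", id: 0, next: 1 },
  { kind: "cond", id: 1, test: false, onTrue: 2, onFalse: 3 },
  { kind: "block", id: 2, next: 3 },
  { kind: "block", id: 3, next: null },
];

const withElim = reachable(graph, 0, true);     // block 2 is skipped
const withoutElim = reachable(graph, 0, false); // block 2 is still visited
```

With elimination on, block 2 is never visited, so a sink inside it would not be reported. Turning it off keeps every branch in play.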
The engine simulates execution by treating every block of code (node
The program states at each program node can be modelled with the following equations:
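As a sketch, the standard forward data-flow formulation is written below. The notation is assumed for illustration and may differ from the symbols FlowState itself uses.

```latex
\begin{aligned}
\mathrm{IN}[n]  &= \bigsqcup_{p \,\in\, \mathrm{pred}(n)} \mathrm{OUT}[p] \\
\mathrm{OUT}[n] &= f_n\bigl(\mathrm{IN}[n]\bigr)
\end{aligned}
```

Here $n$ is a CFG node, $\mathrm{pred}(n)$ is its set of predecessors, $\sqcup$ is the merge (join) of states, and $f_n$ is the transfer function of node $n$. For taint tracking the join is commonly a union, so a variable counts as tainted if it is tainted on any incoming path. Whether FlowState uses exactly this join is an assumption.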
The input state (
Once the input state is merged, the engine simulates the specific logic of that code block called a transfer function (
