Introduction to analysis tools

Analysis tools are programs that analyze the codebases affected by the CVEs in the dataset. For the purposes of the CVE Benchmark, an analysis tool is expected to identify the weaknesses in the vulnerable source code without raising an alert on the patched source code.

The analysis tools are integrated into the CVE Benchmark tooling through tool drivers: small scripts that act as the interface between the analysis tools and the CVE Benchmark tooling.

The purpose of an analysis tool driver is to execute its analysis tool appropriately on the relevant commits of a CVE and convert its results to a common form that can be used to generate benchmark reports.

Supported analysis tools

The contrib/tools directory contains analysis tool drivers that have been contributed by the community. The names of these directories are not significant, and drivers do not have to be installed into this directory in order to be used by bin/cli run.

Some versions of the following analysis tools are supported with drivers:

Analysis tool           Driver location
CodeQL                  contrib/tools/codeql
ESLint                  contrib/tools/eslint
NodeJSScan (njsscan)    contrib/tools/nodejsscan

Some vendors might decide to not publicly release a tool driver that integrates their security product into the CVE Benchmark. Please contact your security tool vendor for more information about their support.

Running analysis tools

Before using bin/cli run to evaluate the ability of a code analysis tool to generate alerts, you should install the analysis tool and configure its driver, as described in the following two sections.

Installing an analysis tool

For each supported analysis tool, there should be documentation that describes how to install it. For example, eslint/README.md describes how to install ESLint.

Configuring an analysis tool driver

A driver must be configured with an entry in the tools section of a configuration file (config.json). Drivers need to be configured with at least some knowledge about where their backing analysis tool is located. For more information about configuration files, see Configuration or the README for each driver. Note that it is up to the driver how paths in these configurations are interpreted, and that there are no general guarantees about how relative paths will be resolved.

For example, to benchmark eslint you must insert the following snippet in your configuration file:

{
    "tools": {
        "eslint-default": {
            "bin": "node",
            "args": [
                "/home/user-name/ossf-cve-benchmark/build/ts/contrib/tools/eslint/src/eslint.js"
            ],
            "options": {
                "eslintDir": "/home/user-name/analysis-tools/eslint-default"
            }
        }
    }
}

This snippet provides the identifier you must use with bin/cli commands that require a --tool option. For example, in the snippet above, eslint-default is the identifier for using ESLint with bin/cli run.

Writing analysis tool drivers

To add support for a new analysis tool, you should write an analysis tool driver. Typically, implementing a driver only takes a couple of hundred lines of code! To add a new driver, you should:

  • Clone or fork the ossf-cve-benchmark repository.
  • Add a new directory to contrib/tools.
  • Add a README.md file that describes how the analysis tool can be installed, and how the driver can be configured.
  • Optionally, add an executable installers/install.sh or installers/install.cmd script that can install a version of the analysis tool. For example, see contrib/tools/eslint/installers/install.sh.
  • Add your new driver to the table of supported analysis tools.
  • Open a pull request!

When adding support for a new driver, you might want to reuse the existing logic for running analysis tools on multiple CVEs. For more information about using driver.ts in a new driver, see eslint.ts.

Inputs to analysis tool driver runs

When a user runs bin/cli run, they are required to supply a tool identifier <TOOL_ID> with the --tool option. bin/cli looks up <TOOL_ID> in the configuration file of the run, and executes the bin property of that driver configuration with the following positional arguments:

  • 1 ... n. fixed arguments: the values in the args property of the driver configuration. This is usually just a single string that points to the driver implementation.
  • n + 1. options: a path to a dynamically generated JSON file that contains the inputs for the driver. The format of this JSON file is described by the DriverInputs type.

For example, consider the following contents of config.json:

{
  "tools": {
    "eslint-default": {
      "bin": "node",
      "args": [
        "/home/user-name/ossf-cve-benchmarking/build/ts/contrib/tools/eslint/src/eslint.js"
      ],
      "options": {
        "eslintDir": "/home/user-name/analysis-tools/eslint-2020-12-08"
      }
    }
  }
}

Here, the <TOOL_ID> is eslint-default and the single args value is the path to the driver implementation eslint.js. To run an analysis from the command line, you would use:

$ bin/cli run --tool eslint-default CVE-123-456 CVE-789-000

which executes:

$ node \
  /home/user-name/ossf-cve-benchmarking/build/ts/contrib/tools/eslint/src/eslint.js \
  /tmp/driver-inputs.json 
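
The way this command line is assembled can be sketched in a few lines of TypeScript. This is only an illustration of the ordering described above (bin, then the fixed args, then the path to the generated inputs file); the ToolConfig interface and the buildDriverCommand function are hypothetical names, not part of the CVE Benchmark tooling.

```typescript
// Hypothetical shape of one entry in the "tools" section of config.json.
interface ToolConfig {
  bin: string;
  args: string[];
  options?: unknown;
}

// Assemble the argument vector in the documented order:
// the bin, the fixed args 1..n, and finally the driver inputs path as n + 1.
function buildDriverCommand(tool: ToolConfig, inputsPath: string): string[] {
  return [tool.bin, ...tool.args, inputsPath];
}

const exampleCommand = buildDriverCommand(
  {
    bin: "node",
    args: [
      "/home/user-name/ossf-cve-benchmarking/build/ts/contrib/tools/eslint/src/eslint.js",
    ],
  },
  "/tmp/driver-inputs.json"
);
```

Here exampleCommand holds the three parts of the command shown above, in the same order.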

The JSON file with the driver inputs contains the --tool option value, the CVEs to analyze, and the effective configuration.

In this example, the effective configuration contains two things of particular interest:

  • the tools.<TOOL_ID>.options: a driver-specific JSON value with additional configuration options for the driver, for instance where the analysis tool is installed
  • results: the directory where the driver should emit result files for the analysis of each CVE

For the current example, the /tmp/driver-inputs.json file will contain the following fragments:

{
  "toolID": "eslint-default",
  "bcves": [ { "CVE": "CVE-123-456", ... }, { "CVE": "CVE-789-000", ... } ],
  "config": {
    "results": "/home/user-name/ossf-cve-benchmarking/work/results",
    "tools": {
      "eslint-default": {
        "bin": "node",
        "args": [
          "/home/user-name/ossf-cve-benchmarking/build/ts/contrib/tools/eslint/src/eslint.js"
        ],
        "options": {
          "eslintDir": "/home/user-name/analysis-tools/eslint-2020-12-08"
        }
      },
      ...
    }
  }
}

So after /home/user-name/ossf-cve-benchmarking/build/ts/contrib/tools/eslint/src/eslint.js has executed the eslint installed at /home/user-name/analysis-tools/eslint-2020-12-08, the /home/user-name/ossf-cve-benchmarking/work/results directory should contain one to four files with information about how the analysis tool performed on the vulnerable and fixed commits of CVE-123-456 and CVE-789-000.
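
As a rough sketch of how a driver might pick the relevant values out of the driver inputs, the following TypeScript parses the JSON text and looks up its own options and the results directory. The SketchDriverInputs and DriverSettings interfaces and the readDriverSettings function are hypothetical and cover only the fields shown in the fragment above; the authoritative format is the DriverInputs type. A real driver would read the text from the file named by its last command-line argument.

```typescript
// Hypothetical, simplified view of the driver inputs (only the fields shown above).
interface SketchDriverInputs {
  toolID: string;
  bcves: { CVE: string }[];
  config: {
    results: string;
    tools: {
      [toolID: string]: { bin: string; args: string[]; options?: unknown };
    };
  };
}

// Hypothetical summary of what a driver needs for a run.
interface DriverSettings {
  cves: string[];
  resultsDir: string;
  options: unknown;
}

function readDriverSettings(jsonText: string): DriverSettings {
  const inputs = JSON.parse(jsonText) as SketchDriverInputs;
  // The driver's own configuration is stored under its tool identifier.
  const tool = inputs.config.tools[inputs.toolID];
  if (tool === undefined) {
    throw new Error(`No configuration found for tool ${inputs.toolID}`);
  }
  return {
    cves: inputs.bcves.map((bcve) => bcve.CVE),
    resultsDir: inputs.config.results,
    options: tool.options,
  };
}

// Example input mirroring the fragment above, with the elided parts left out.
const sampleInputsText = JSON.stringify({
  toolID: "eslint-default",
  bcves: [{ CVE: "CVE-123-456" }, { CVE: "CVE-789-000" }],
  config: {
    results: "/home/user-name/ossf-cve-benchmarking/work/results",
    tools: {
      "eslint-default": {
        bin: "node",
        args: ["/home/user-name/ossf-cve-benchmarking/build/ts/contrib/tools/eslint/src/eslint.js"],
        options: { eslintDir: "/home/user-name/analysis-tools/eslint-2020-12-08" },
      },
    },
  },
});
const sampleSettings = readDriverSettings(sampleInputsText);
```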

Outputs from analysis tool driver runs

When bin/cli run ... executes a driver, the driver should emit files to the provided results directory. It's up to the driver how it names these files, but they should be valid according to Log.schema.json.

Below is an example with a run of the eslint driver on commit ba6a6f13691000ffaf22ef8e731513737659447f of CVE-2020-4066. We can see that an alert was raised on line 106 of file classifiers/svm/SvmLinear.js. It is then up to the subsequent report generator to decide the value of this alert with respect to CVE-2020-4066.

{
  "runs": [
    {
      "CVE": "CVE-2020-4066",
      "commit": "ba6a6f13691000ffaf22ef8e731513737659447f",
      "toolID": "eslint-default",
      "status": "SUCCESS", 
      "alerts": [ 
        ...
        {
          "ruleID": "security/detect-non-literal-fs-filename",
          "location": {
            "file": "classifiers/svm/SvmLinear.js",
            "line": 106
          }
        }
        ...
      ]
    }
  ]
}
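
A driver could produce a file of this shape along the following lines. This is a minimal sketch: the SketchAlert and SketchRun interfaces and the serializeLog function are hypothetical names that mirror only the fields visible in the example above, and Log.schema.json remains the authoritative description of the format. Writing the resulting text to a file in the results directory is left out.

```typescript
// Hypothetical types that mirror only the fields shown in the example above.
interface SketchAlert {
  ruleID: string;
  location: { file: string; line: number };
}

interface SketchRun {
  CVE: string;
  commit: string;
  toolID: string;
  status: string;
  alerts: SketchAlert[];
}

// Wrap the runs in the top-level "runs" property and pretty-print the result.
function serializeLog(runs: SketchRun[]): string {
  return JSON.stringify({ runs }, null, 2);
}

const sampleLogText = serializeLog([
  {
    CVE: "CVE-2020-4066",
    commit: "ba6a6f13691000ffaf22ef8e731513737659447f",
    toolID: "eslint-default",
    status: "SUCCESS",
    alerts: [
      {
        ruleID: "security/detect-non-literal-fs-filename",
        location: { file: "classifiers/svm/SvmLinear.js", line: 106 },
      },
    ],
  },
]);
```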

Expected analysis tool driver behavior

When analyzing the commits of a CVE, drivers are expected to behave in much the same way as when the corresponding analysis tool has been configured by an expert. For example:

  • The driver should use the analysis rules that are relevant for the selected CVE data. For example, if a CVE is about a buffer overflow in C++ code, the driver shouldn't run queries about cross-site scripting in JavaScript. If a driver establishes that an analysis tool has no relevant rules or queries for a particular CVE, the driver can abort early without starting the analysis run.
  • Drivers should use any extra information that is normally available to an analysis tool when configured by an expert. For example, some analysis tools need information about special build instructions that are required to produce meaningful results for a particular project. Such information could be downloaded as part of the install steps for a driver.
  • If external information is not available for an analysis tool, the driver can use information contained in the benchmark CVE data. For example, the CWEs value of a benchmark CVE entry can be used to indicate the queries that are relevant for a driver to run. Additionally, the extensions of the files that contain known weaknesses indicate the relevant programming language, which can also be used to determine which queries are relevant.
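
The rule selection described in the last point could look roughly like the sketch below. Everything in it is an assumption made for illustration: the SketchRule interface, the rule identifiers, the CWE string format, and the selectRules function are hypothetical and do not reflect the actual benchmark data format or any analysis tool's rule names. An empty selection corresponds to the case where the driver can abort early.

```typescript
// Hypothetical description of a rule or query that a driver knows about.
interface SketchRule {
  id: string;
  cwes: string[];       // CWEs the rule is designed to detect
  extensions: string[]; // file extensions the rule applies to
}

// Return the lower-cased extension of a path, or "" if there is none.
function extensionOf(file: string): string {
  const base = file.slice(file.lastIndexOf("/") + 1);
  const dot = base.lastIndexOf(".");
  return dot < 0 ? "" : base.slice(dot).toLowerCase();
}

// Keep the rules that match both a CWE of the CVE and the language
// (approximated by file extension) of a file with a known weakness.
function selectRules(
  rules: SketchRule[],
  cveCWEs: string[],
  weaknessFiles: string[]
): SketchRule[] {
  const extensions = weaknessFiles.map(extensionOf);
  return rules.filter(
    (rule) =>
      rule.cwes.some((cwe) => cveCWEs.indexOf(cwe) >= 0) &&
      rule.extensions.some((ext) => extensions.indexOf(ext) >= 0)
  );
}

// Hypothetical rules: one for cross-site scripting in JavaScript,
// one for buffer overflows in C++.
const sketchRules: SketchRule[] = [
  { id: "js/xss-sketch", cwes: ["CWE-79"], extensions: [".js"] },
  { id: "cpp/overflow-sketch", cwes: ["CWE-120"], extensions: [".cpp", ".h"] },
];

const selectedRules = selectRules(sketchRules, ["CWE-79"], ["src/render.js"]);
const noRelevantRules = selectRules(sketchRules, ["CWE-79"], ["src/render.py"]);
```

In this sketch, selectedRules contains only the JavaScript rule, while noRelevantRules is empty, so a driver in that situation could skip the analysis run.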