# Introduction

Our goal is to evaluate the security of Open Source Components (OSCs) and, ideally, to find any existing vulnerabilities, mainly in server-side JavaScript code. For now, we focus on three main types of injection vulnerabilities: code injection, command injection, and SQL injection.

Injection vulnerabilities share a basic structure: 
- An injection point, where the attacker may provide the injection code as input
- Attack surface: intermediate code via which the injection code is passed, and possibly modified, but not sanitized (removed)
- An activation point, where the attacker's input is applied as part of a system function, resulting in the (code/command/SQL) injection.
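
As a rough illustration of these three parts, the following toy sketch wires an injection point to an activation point through intermediate code that modifies the input without sanitizing it. All function names are hypothetical and are not taken from any OSC; the "attack" only sets a harmless flag.

```javascript
// Hypothetical toy example; every name here is an illustrative assumption.

// Injection point: attacker-controlled data enters the program here.
function readExpression(query) {
  return query.expr;
}

// Attack surface: intermediate code that modifies the input but does not sanitize it.
function normalize(expr) {
  return String(expr).trim();
}

// Activation point: the input is evaluated as JavaScript code.
function compute(expr) {
  return eval(expr);
}

// Benign input behaves as the developer intended.
const benign = compute(normalize(readExpression({ expr: ' 1 + 2 ' })));

// A crafted input runs attacker-chosen code instead (here it only sets a flag).
compute(normalize(readExpression({ expr: 'globalThis.injected = true' })));

console.log(benign, globalThis.injected);
```

Removing any one of the three parts (no attacker-controlled input, sanitizing intermediate code, or no `eval`) would break the chain, which is why the search can start from any of them.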

To evaluate OSCs and find vulnerabilities, we will develop one or more tools, and apply them as part of a manual analysis of OSCs. At this phase of the project, we focus on the activation-point search tool. 
In this document, we discuss (1) the process for developing the activation-point search tool, (2) the plan and tools we will use to evaluate and find vulnerabilities in a given OSC using the activation-point search tool, and (3) our plan for completing this phase of the project. 

The core idea is that by searching for activation points, we make it significantly easier to evaluate whether a given OSC is vulnerable to injection attacks, since every injection vulnerability must involve an activation point. Code that has no activation point does not require any further analysis; we expect this to be the case for some OSCs, and certainly for most of the code within any given OSC.

An activation point involves calling specific functions or methods, or using specific packages. Any such invocation is invariably expressed via a recognizable piece of syntax. Of course, not all such activation points are vulnerable: when used safely, an activation point achieves its intended effect without introducing any risk. The _safe use_ of an activation point is a consequence of the context within which it is used. This context is driven by the surrounding syntax as well as the invocation environment. Rooting out false positives is therefore a process that must examine this dynamic context, which typically involves one or more execution paths connecting program or script inputs to the potentially dangerous activation point. A subset of these paths may prove benign (e.g., when the inputs are shown to be constants). The surviving (and therefore still potentially dangerous) paths need to be subjected to one last phase of analysis that reasons about the relationships between input values and the payload at the activation point.

## Overview of the OSC evaluation process

Note that an activation point does not necessarily imply the existence of a vulnerability in a given OSC; once we find an activation point, we need to perform further analysis to determine whether there is also a corresponding injection point allowing the attacker to inject the code to be activated. This further analysis involves two steps: flow analysis, which finds the attack surface and the corresponding injection points, and semantic analysis, which determines whether the attacker can actually perform an injection attack using these injection points, attack surface (intermediate code) and activation points.

The core of the proposed method is therefore based on 3 phases:

- Finding activation points using a static (syntax) analysis tool. The goal of the tool is to find activation points in the source code. For a package under test, the output is a set of potentially dangerous activation points. 
- Finding the corresponding attack surface using flow analysis (currently semi-manually, later possibly using an automation tool). The goal is to trace back, through the control flow graph as well as the call graph, the set of paths leading to an activation point and sourced at some user input (injection point). Potentially dangerous sites whose path "dead-end" into specific constructions (like constant declaration) may be filtered out as a result. For the package under test, the output is the subset of still potentially dangerous activation points, each one supported by a (non-empty) set of paths connecting user inputs (injection points) to the activation point.
- Semantic analysis: Its purpose is to reason about each path to establish the relationships between the inputs and the payload delivered at the activation point. Each symbolic step in the execution of the program along this path can potentially transform or affect the input and reduce or altogether neutralize the potency of an attack that would be delivered through the input of that path. This analysis can be carried out with a variety of techniques ranging from test generation to model checking and constraint reasoning.
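
To make the filtering done in the second phase more concrete, here is a minimal sketch on a toy model in which each variable maps to the expression that defines it. The data model and the name `pathsFromInputs` are assumptions made for illustration; a real flow analysis would work on the control flow graph and the call graph rather than on such a table.

```javascript
// Toy model (hypothetical): each variable maps to what defines it.
// 'input' = injection point, 'const' = literal, 'var' = derived from other variables.
const definitions = {
  userName: { kind: 'input' },
  greeting: { kind: 'const' },
  trimmed:  { kind: 'var', from: ['userName'] },
  banner:   { kind: 'var', from: ['greeting'] },
  command:  { kind: 'var', from: ['banner', 'trimmed'] },
};

// Returns every chain of variables leading from an input to `name`.
function pathsFromInputs(name, defs, seen = new Set()) {
  const def = defs[name];
  if (!def || seen.has(name)) return [];      // unknown or cyclic: drop this branch
  if (def.kind === 'input') return [[name]];  // reached an injection point
  if (def.kind === 'const') return [];        // dead-ends in a constant: benign
  const nextSeen = new Set(seen).add(name);
  const paths = [];
  for (const source of def.from) {
    for (const p of pathsFromInputs(source, defs, nextSeen)) {
      paths.push([...p, name]);
    }
  }
  return paths;
}

// An activation point fed by `command` keeps one path; one fed by `banner` is filtered out.
const commandPaths = pathsFromInputs('command', definitions);
const bannerPaths = pathsFromInputs('banner', definitions);
console.log(commandPaths, bannerPaths);
```

The surviving paths (here, `userName` to `trimmed` to `command`) are what the semantic analysis of the third phase would then examine.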

As of \today{}, the implementation provides a working (albeit not as powerful as we wish) syntax analysis tool to find activation points. The tasks laid out between now and August 15 focus on improving this analysis to cover more patterns as identified in known, reported security vulnerabilities, and applying the tool and additional tools to evaluate several OSCs and find possible vulnerabilities in them. 

In future work (Fall'19) we plan to develop an automated or semi-automated tool for finding the injection points and the attack surface connecting injection points to activation points, via analysis of the CFG/CG, to develop a methodology for semantic analysis, and to apply the entire process to additional OSCs.

## Vulnerabilities under consideration

We reiterate that in this project we focus on three classes of injection vulnerabilities, namely:

1. Javascript code injection
2. OS command injection
3. SQL code injection

This is to focus the conversation and the effort and is in no way a limitation of the approach. These are classic and apparently the most common vulnerabilities. Addressing them effectively can establish the validity of the approach and provide useful results.

## Tools and data sources to be used

### Public Databases

- [Snyk](https://snyk.io/) 

<img src="https://res.cloudinary.com/snyk/image/upload/v1533761770/logo-1_wtob68.svg" height="100"/>

>Snyk generally works as described above. Along with collecting known security issues with packages via the NIS, NVD, and NSP, Snyk also compiles its own database of issues, and will create patches for bugs the package developers haven't fixed.

- [SourceClear](https://www.sourceclear.com/) 

<img src="https://www.sourceclear.com/images/SourceClear_Logo_Primary_Black.png" height="60"/>

>This tool is similar to Snyk, however SourceClear is able to do some scanning of user code to check for use of known vulnerable methods from dependencies. This feature is only available in paid versions.

> Premium users can view the actual vulnerable part of the library. Even if a vulnerable library is in use, SourceClear can identify if a vulnerable method is in use. If the specific vulnerable method in not in use, the project might not be subject to a potential exploit. **_-SourceClear FAQ_**

### Syntax

This is a brief list of the tools we tried. The ones marked in bold are those we currently use.

- graspJS [graspJS.com](https://www.graspjs.com/)

>Use `npm install -g grasp` to install.

>Can perform searches on the parse tree. Particularly useful are its two query syntaxes: one builds the search from sample code blocks, and the other builds queries from parse tree structures. Searches can be run from javascript and from the command line.

- Esprima [esprima.org](http://esprima.org/)

>Use `npm install esprima` to install.

>Can parse and tokenize code into data structures usable inside of javascript. Does not have the searching capabilities of graspJS, would need to add them manually. Provides a more easily accessed interface to the root of the source. [Documentation](http://esprima.org/doc/index.html).

- ESLint with Security Plugin [eslint.org](https://eslint.org/)

>Linter which has the ability to add new rules. The security plugin adds rule sets for various common security vulnerabilities. The rules require extending to be more useful, there are many false positives, and it misses some trivial exploits. It seems to be difficult to extend. The extension building is focused on linting and pushing easily spotted problems to the developer, rather than for continued programmatic consumption. 

- **Acorn [acorn](https://github.com/acornjs/acorn)**

>Install via `npm i acorn`

- **ESTree-walker [estree-walker](https://github.com/Rich-Harris/estree-walker)**

>Install via `npm i estree-walker`

### Control Flow Graph

- [ast-flow-graph](https://www.npmjs.com/package/ast-flow-graph)

>Offers a library API with `walker` capabilities to traverse the graph. Worth digging into.

- [cfg-graph](https://www.npmjs.com/package/cfg-graph)

>Documentation quite limited. Unclear whether it is worth pursuing.

- [styx](https://www.npmjs.com/package/styx)

>Limited doc. It can export to JSON. 

- **[esgraph](https://www.npmjs.com/package/esgraph)**

>Can generate dot files to visualize the CFG. Seems to be working ok on small example. Potential candidate (along with ast-flow-graph). 

### Call Graph

- **[persper/js-callgraph](https://www.npmjs.com/package/@persper/js-callgraph)**

>Supports both CLI and library API. Multi-file analysis. 

- [callgraph](https://www.npmjs.com/package/callgraph)

>Highly experimental. Not clear it is worth pursuing any further. Not clear that there is an actual library API either. 

### Semantic Analysis

- [Aratha](https://people.eng.unimelb.edu.au/pstuckey/papers/cpaior19d.pdf)

>Newly published constraint programming system which can be used for dynamic symbolic execution. Useful for finding which execution states will actually expose vulnerable interfaces. Includes built in models for its constraint programming system to read javascript. Will require a decent amount of work, but a very powerful way to understand source code. 

# The activation-point search tool

An injection vulnerability, by definition, involves one or more activation-points, i.e., locations where the injected code is activated (executed). For example, a code-injection vulnerability may be due to an eval statement in the code, where the input was controlled by the attacker; that statement is the activation point. In order to automatically find security vulnerabilities in Javascript code fragments, we develop a tool to find potential activation-points, i.e., lines of code which involve activation (execution) based on any variables. 

## Basic approach: finding activation points via syntax analysis

The approach considered here is fairly straightforward. First, it is necessary to parse `Javascript` source files and obtain an Abstract Syntax Tree. Subsequent traversals can find the sites of interest and filter out those that do not meet specific criteria. Consider, for instance, the task of locating potentially vulnerable uses of code injection capabilities. The following tiny example shows a web form and the associated server-side Javascript code.

```html
<!DOCTYPE html>
<html lang="en" dir="ltr">
  <head>
    <meta charset="utf-8">
    <title>Age Calculator</title>
  </head>
  <body>
    <h1>Current Age Calculator</h1>
    <p>Input your birth year:</p>
    <form action="" method="post">
      Birth Year: <input type="text" name="birthYear" value="1960">
      <input type="submit" value="Submit">
    </form>
  </body>
</html>
```
The form obtains a textual input (birthYear) that is posted to the application when submitted. The field is pure text (with a default value) that a user can change to whatever he or she wants. The Javascript code below is a simple application listening on port 8080.

```javascript
const express = require('express');

const app = express(); // Creates the app
app.use(express.urlencoded({extended : true}));

app.get('/', (req, res) => { // Sends the form to the client
  res.sendFile('index.html', {root: __dirname});
});

app.post('/', (req, res) => { // Handles form submissions
  let birthYear = eval(req.body.birthYear);

  res.end("You are " + (2019 - birthYear) + " years old.");
});

app.listen(process.env.port || 8080); // Starts the app
```

The handler for the post request is registered via a call to `app.post`, and it executes a lambda that uses `eval` to obtain the `birthYear` field. This is of course vulnerable, as nothing prevents the end user from populating the `req.body.birthYear` field with a malicious piece of code.

To determine whether the code is potentially vulnerable, the tool needs to:

1. Parse this code
2. Locate the activation-points: a call to `eval` (or `Function`)
3. Extract the argument(s) passed to `eval` (or `Function`)
4. Determine whether the argument(s) passed in are dangerous (variables). 
5. Return the set of sites deemed dangerous to trace their paths (phase 2).
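
Before turning to the actual implementation, steps 2 to 4 can be sketched on a hand-written AST. The fragment below is an ESTree-style tree for `let birthYear = eval(req.body.birthYear)`, written out by hand so that the sketch has no dependencies (in the tool itself, the tree comes from the parser in step 1). The helper names `visit` and `findDangerousCalls` are hypothetical, and the "not all literals" test is a deliberately crude stand-in for the real argument check.

```javascript
// Hand-written ESTree fragment for: let birthYear = eval(req.body.birthYear)
const ast = {
  type: 'Program',
  body: [{
    type: 'VariableDeclaration',
    kind: 'let',
    declarations: [{
      type: 'VariableDeclarator',
      id: { type: 'Identifier', name: 'birthYear' },
      init: {
        type: 'CallExpression',
        callee: { type: 'Identifier', name: 'eval' },
        arguments: [{
          type: 'MemberExpression',
          computed: false,
          object: {
            type: 'MemberExpression',
            computed: false,
            object: { type: 'Identifier', name: 'req' },
            property: { type: 'Identifier', name: 'body' },
          },
          property: { type: 'Identifier', name: 'birthYear' },
        }],
      },
    }],
  }],
};

// Minimal recursive walker: calls back on every object that carries a `type` field.
function visit(node, callback) {
  if (Array.isArray(node)) { node.forEach((n) => visit(n, callback)); return; }
  if (node === null || typeof node !== 'object') return;
  if (typeof node.type === 'string') callback(node);
  for (const key of Object.keys(node)) visit(node[key], callback);
}

// Steps 2-4: collect calls to the named functions whose arguments are not all literals.
function findDangerousCalls(tree, names) {
  const hits = [];
  visit(tree, (node) => {
    if (node.type !== 'CallExpression') return;
    if (node.callee.type !== 'Identifier' || !names.includes(node.callee.name)) return;
    const allLiteral = node.arguments.every((arg) => arg.type === 'Literal');
    if (!allLiteral) hits.push({ callee: node.callee.name, args: node.arguments });
  });
  return hits;
}

const hits = findDangerousCalls(ast, ['eval', 'Function']);
console.log(hits.map((h) => h.callee));
```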

The code to do this task is shown below, which locates and returns the potentially vulnerable site. 


```javascript
const acorn = require('acorn');               // AST parser / builder
const walk = require('estree-walker').walk;   // tree walker function
const fs = require('fs');
const path = require('path');

const buildLinkedASTFromFile = function(filePath) { // builds and returns an AST
   const source = fs.readFileSync(filePath).toString('utf-8');
   const ast = acorn.parse(source, {ecmaVersion: 6, locations: true});
   return ast;
};

const packageData = function(rec, argz, tree = null, prev = null, typeSearch = null, directory = null){
   return {
      receiver : rec,
      args     : argz,
      scope    : tree,
      previous : prev,
      type     : typeSearch,
      path     : directory
   }; // packs data to be returned
};

// helper functions omitted

const evalVulnerabilityFinder = function(tree, dir) {
   const result = [];
   // initial search context: the whole tree, searched for calls ('m')
   const blank = [packageData({name: ''}, null, tree, null, 'm', dir)];
   const eCalls = filterFunction('eval', blank).concat(filterFunction('Function', blank));
   const dangerousCalls = verifyArgs(eCalls);
   for (let i = 0; i < dangerousCalls.length; i++) {
      result.push(dangerousCalls[i].receiver.loc);
   }
   return result;
};

const findVulns = function(tree, directory){
    const foundVulns = {};
    foundVulns.sql = mySQLVulnerabilityFinder(tree, directory);
    foundVulns.os = execVulnerabilityFinder(tree, directory);
    foundVulns.js = evalVulnerabilityFinder(tree, directory);
    return foundVulns;
};

const dirScan = function(curpath, curfiles){
   const data = { dir   : curpath , files : [] };
   for (let i = 0; i < curfiles.length; i++) {
      const file = { name  : null, vulns : null };
      if (path.extname(curfiles[i]) === '.js'){
         file.name = curfiles[i];
         const ast = buildLinkedASTFromFile(path.resolve(data.dir, file.name));
         file.vulns = findVulns(ast, data.dir);
         data.files.push(file);
      }else if (path.extname(curfiles[i]) === ''){
         // no extension: assume a sub-directory and recurse into it
         try {
            const newpath = path.resolve(data.dir, curfiles[i]);
            const newfiles = fs.readdirSync(newpath);
            const result = dirScan(newpath, newfiles);
            data.files.push(result);
         } catch (e) {
            // not a readable directory: skip it
         }
      }
   }
   return data;
};
```

A call to `dirScan` is meant to process all the Javascript files in a specific folder. It loops over each file, parses it and invokes the `findVulns` function to look for each of the three classes of vulnerabilities. `findVulns` is indeed quite generic and simply applies one function per vulnerability class.

The `evalVulnerabilityFinder` function focuses on code injections. It simply walks the AST twice, once looking for a call to `eval` and once looking for a call to `Function`. Those searches are done with `filterFunction`. Once all these interesting call sites are collected, the `verifyArgs` function examines each one to eliminate those that are uninteresting because the arguments are constant values or literals. The non-filtered sites are returned in the end. 
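
Since the helper functions are omitted from the listing above, the following is only a hypothetical sketch of the kind of filtering attributed to `verifyArgs`; the names `isConstantExpression` and `keepDangerousSites` are assumptions, and the actual helper may apply different rules. It treats literals, template literals without dynamic parts, and concatenations of constants as benign, and everything else as potentially attacker-controlled.

```javascript
// Returns true when an ESTree expression can be shown to be a constant.
function isConstantExpression(node) {
  switch (node.type) {
    case 'Literal':
      return true;
    case 'TemplateLiteral':
      return node.expressions.every(isConstantExpression);
    case 'BinaryExpression':
      return isConstantExpression(node.left) && isConstantExpression(node.right);
    default:
      return false; // identifiers, member accesses, calls, ...: treated as possibly tainted
  }
}

// Keeps the call sites that have at least one non-constant argument.
// Each site mirrors the `args` field produced by `packageData`.
function keepDangerousSites(sites) {
  return sites.filter((site) => !site.args.every(isConstantExpression));
}

const lit = (value) => ({ type: 'Literal', value });
const ident = (name) => ({ type: 'Identifier', name });
const plus = (left, right) => ({ type: 'BinaryExpression', operator: '+', left, right });

const sites = [
  { id: 'A', args: [lit('1 + 1')] },                        // eval('1 + 1')
  { id: 'B', args: [plus(lit('2 * '), lit('3'))] },         // eval('2 * ' + '3')
  { id: 'C', args: [ident('userInput')] },                  // eval(userInput)
  { id: 'D', args: [plus(lit('x = '), ident('userInput'))] } // eval('x = ' + userInput)
];

const dangerousIds = keepDangerousSites(sites).map((s) => s.id);
console.log(dangerousIds);
```

Being conservative in the `default` branch trades false positives for safety: a site is only dropped when its arguments are provably constant.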

The implementation of `filterFunction` is shown below:

```javascript
const filterFunction = function(method, arr){ // filters the list to find where the method is called
   const res = [];
   for (let k = 0; k < arr.length; k++) {
      const found = methodFinder(arr[k].scope, arr[k].type,method,
                                 arr[k].receiver.name,null,arr[k].path);
      for (let i = 0; i < found.length; i++) {
         if(found[i].receiver.type === 'Identifier'){
            // plain function call
            res.push(packageData(found[i].receiver, found[i].args, 
                                 found[i].scope, arr[k], 'v', arr[k].path));
         }else if (found[i].receiver.type === 'CallExpression') {
            // call on the result of another call
            res.push(packageData(found[i].receiver.callee, found[i].args, 
                                 found[i].scope, arr[k], 'm', arr[k].path));
         }else if (found[i].receiver.type === 'MemberExpression') {
            // call of a member expression (object.property)
            res.push(packageData({name: [found[i].receiver.object.name, 
                                         found[i].receiver.property.name], 
                                  loc: found[i].receiver.loc}, 
                                 found[i].args, found[i].scope, arr[k], 
                                 'r', arr[k].path));
         }
      }
   }
   return res;
};
```

The function takes as input the name of the callee it should find. It also takes an array of _contexts_ that indicate where to search for this construction (e.g., in a specific scope). The implementation therefore simply repeats the same search in each context starting with a call to `methodFinder` to locate such a call. If _found_, it then analyzes the AST to package the context in which the construction was found. The case analysis is to reflect the fact that it could mean a function call, a method invocation on a specific receiver or the call of a member expression.  The `methodFinder` is the lengthier function. An abridged skeleton is shown below:

```javascript
const methodFinder = function(tree, type, method, name, ii = null, dir) {
   var result = [];
   walk(tree, { //from estree.walk
      enter: function (node, p, prop, index) { 
      //returns current node, parent and properties (if any), plus the index
         switch (type) {
            case 'm': //if the current node is a method
            if (node.type === 'CallExpression'){
               switch (node.callee.type) {
                  case 'Identifier':
                  if(node.callee.name === method) {
                     if (!node.callee.object) {
                        result.push(packageData(node.callee,node.arguments,
                                                node.callee));
                     }else if (node.callee.object.type === 'Identifier') {
                        result.push(packageData(node.callee.object,
                                                node.arguments,tree));
                     }else if (node.callee.object.type === 'CallExpression') {
                        result.push(packageData(node.callee.object,
                                                node.arguments,node.callee));
                     }
                  }
                  break;
                  // other cases...
               }
            }
            break;
            case 'v': // [omitted] if the node is a variable declaration
            case 'f': // [omitted] searches for parameter in a method/function
            case 'r': // [omitted] searches for a 'required' variable
            case 'e': // [omitted] expression
            case 'a': // [omitted] assignment           
         }
      }});
   return result;
};
```
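
The case analysis on the callee used by `filterFunction` and `methodFinder` corresponds to the different shapes a call can take in an ESTree-style AST. The short sketch below classifies three hand-written `CallExpression` nodes; `describeCallee` is a hypothetical helper introduced only for illustration and is not part of the tool.

```javascript
// Classifies the callee of an ESTree CallExpression node.
function describeCallee(call) {
  const callee = call.callee;
  switch (callee.type) {
    case 'Identifier': // e.g. eval(x)
      return 'function ' + callee.name;
    case 'MemberExpression': { // e.g. cp.exec(x)
      const owner = callee.object.type === 'Identifier' ? callee.object.name : '<expression>';
      const member = callee.property.type === 'Identifier' ? callee.property.name : '<computed>';
      return 'method ' + owner + '.' + member;
    }
    case 'CallExpression': // e.g. makeRunner()(x)
      return 'result of a call';
    default:
      return 'other: ' + callee.type;
  }
}

const plainCall = {
  type: 'CallExpression',
  callee: { type: 'Identifier', name: 'eval' },
  arguments: [],
};
const methodCall = {
  type: 'CallExpression',
  callee: {
    type: 'MemberExpression',
    computed: false,
    object: { type: 'Identifier', name: 'cp' },
    property: { type: 'Identifier', name: 'exec' },
  },
  arguments: [],
};
const curriedCall = {
  type: 'CallExpression',
  callee: {
    type: 'CallExpression',
    callee: { type: 'Identifier', name: 'makeRunner' },
    arguments: [],
  },
  arguments: [],
};

const labels = [plainCall, methodCall, curriedCall].map((c) => describeCallee(c));
console.log(labels);
```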

## Enhancing the invocation-point finder tool with additional patterns

We discovered that the core templates used initially do not necessarily cover all the constructions that allow injections in Javascript. In particular, the initial implementation proved unable to locate vulnerabilities in versions of libraries known to have a specific issue.

We developed the following methodology to find additional (ideally, all) patterns allowing injections in Javascript. We identify a collection of OSCs with known injection vulnerabilities and apply the invocation-point finding tool to each of them. Whenever the tool fails, we analyze the OSC to identify the Javascript construct that allows the injection but was not found, and extend the tool to detect this additional invocation-point pattern. We repeat the process until the tool finds at least one invocation point in each vulnerable OSC. This methodology is described in more detail below.
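
The loop described in this methodology can be sketched as a small harness that runs the finder against a list of challenges and reports the ones it misses. Everything in this sketch is hypothetical: the package names, the reported locations, and `stubFinder`, which merely stands in for the real invocation-point finder.

```javascript
// Hypothetical challenge list: a package and the invocation point reported for it.
const challenges = [
  { name: 'pkg-a', expected: { file: 'index.js', line: 16 } },
  { name: 'pkg-b', expected: { file: 'lib/run.js', line: 7 } },
];

// Stand-in for the real finder: returns the sites it reports for a package.
function stubFinder(packageName) {
  const reported = {
    'pkg-a': [{ file: 'index.js', line: 16 }],
    'pkg-b': [], // the finder does not yet know the pattern used here
  };
  return reported[packageName] || [];
}

// Returns the names of the challenges whose known spot was not reported.
function runChallenges(list, finder) {
  const missed = [];
  for (const challenge of list) {
    const sites = finder(challenge.name);
    const found = sites.some(
      (s) => s.file === challenge.expected.file && s.line === challenge.expected.line
    );
    if (!found) missed.push(challenge.name);
  }
  return missed;
}

const missed = runChallenges(challenges, stubFinder);
console.log(missed);
```

Each name in `missed` points at a package that needs manual analysis and, most likely, a new pattern in the finder.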

### Finding vulnerable Open Source Components

The first task is to wade through the collection of open-source components and use a vulnerability database such as [https://snyk.io](https://snyk.io) to find examples of libraries that contain a known injection vulnerability. This process is applied continuously to build a list of __challenges__ for the template finder. The finder is applied to each such challenge. If it finds the corresponding invocation points, the sample is uninteresting (it appears not to contain new invocation-point patterns). However, if no invocation point is identified, it is necessary to analyze the nature of the vulnerability, understand which Javascript construction adopted by the developer allowed the injection (and was not found), and add the necessary detection logic.

As we are unaware of an automatic tool for extracting real vulnerabilities from public databases, we have searched for such vulnerabilities manually. We use them to challenge our template finder and enhance it every time it fails a challenge. The remainder of this section lists such vulnerable packages. For each case we provide the package name, vulnerable version, category, a short description, and the vulnerable spot.

- __Package name:__ dns-sync
- __Vulnerable version:__ 0.1.1
- __Category:__ command injection
- __Description:__ lack of input validation allows an attacker to submit input into __resolve()__ method.
- __Vulnerable spot:__ dns-sync-0.1.1/package/lib/dns-sync.js:33
- __Example:__

![DNSSync](images/dnsSync.jpg)

---

- __Package name:__ apex-publish-static-files
- __Vulnerable version:__ 2.0.0
- __Category:__ command injection
- __Description:__ The __connectionString__ argument is not sanitized when passed to __execSync()__
- __Vulnerable spot:__ apex-publish-static-files-2.0.0/package/index.js:54
- __Example:__

![APex](images/apex.jpg)

---

- __Package name:__ kill-port
- __Vulnerable version:__ 1.3.1
- __Category:__ OS command injection
- __Description:__ An attacker is able to inject arbitrary OS commands due to the usage of __exec__ function, if the attacker is able to control the port.
- __Vulnerable spot:__ kill-port-1.3.1/package/index.js:16
- __Example:__

![KillPort](images/killPort.jpg)

---

- __Package name:__ syntax-error
- __Vulnerable version:__ 1.1.0
- __Category:__ code injection (eval)
- __Description:__ Allows remote attackers to execute arbitrary code via a crafted file.
- __Vulnerable spot:__ syntax-error-1.1.0/package/index.js:7
- __Example:__

![Syntax Error](images/syntax.jpg)

---

- __Package name:__ opencv
- __Vulnerable version:__ 6.1.0
- __Category:__ command injection 
- __Description:__ User input is not validated for the `flag` parameter, which would allow an attacker to inject and execute arbitrary commands.
- __Vulnerable spot:__ opencv-6.1.0/package/utils/find-opencv.js:15
- __Example:__ This is a __false positive__ case as the __flag__ variable is not controlled by the user, it is a constant.

![OpenCV](images/opencv.jpg)

---
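
Several of the cases above follow the same pattern: a value such as a host name, a connection string, or a port is concatenated into a shell command line. The sketch below is hypothetical and is not the code of any package listed above; the command line it builds is illustrative only and is never executed. It shows how a crafted "port" changes the command, and how validating the value first rejects it.

```javascript
// Unsafe (hypothetical): the port is concatenated into a shell command line.
function buildKillCommand(port) {
  return 'lsof -i tcp:' + port + ' | xargs kill';
}

// Safer (hypothetical): accept only a valid TCP port number.
function parsePort(value) {
  const port = Number(value);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new TypeError('invalid port: ' + value);
  }
  return port;
}

const payload = '8080; touch /tmp/injected #';

// With a benign value the command is what the developer expects.
const benignCommand = buildKillCommand('8080');

// With the payload, a second command follows the `;` and the rest is commented out.
const craftedCommand = buildKillCommand(payload);

// Validation rejects the payload before it can reach the shell.
let rejected = false;
try {
  parsePort(payload);
} catch (e) {
  rejected = true;
}

console.log(benignCommand);
console.log(craftedCommand);
console.log(rejected);
```

In Node.js, passing arguments as an array to `child_process.execFile` instead of building a string for `exec` also avoids shell interpretation of the value.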

### Manually finding the vulnerability

The purpose of this task is to create an exploit for the vulnerable package and trace its execution to uncover how the exploit takes advantage of Javascript. The objective is to identify the dangerous construction (invocation point) and how to reach it. It is also to understand the scope and severity of that construction beyond its occurrence in this example. 

The output of this task is a clear understanding of the Javascript construction, its flaws, its limitations and what would have been a better option. Javascript is a complex language with dynamic code execution and full introspection, providing multiple ways to achieve some subtle effects.

### Extracting a new pattern variant

Once understood, a new pattern that recognizes the dangerous construction is formulated in as general a form as possible. It scans the specific library call in a specific package or an introspective behavior. In all cases, it is expressed via a syntactic construction. A survey of the Javascript documentation can reveal variants that should be considered alongside the primary one that was detected.

### Enhancing **Invocation-point (Pattern) Finder**

The ultimate goal is to refine the invocation-point (pattern) finder and add the ability to recognize (for the vulnerability under consideration) the offending construction. The extension may be quite straightforward when it boils down to using a variant of a library already known to be problematic. It can be more involved when using a new linguistic mechanism in JS.

### Example `js-yaml`

To illustrate, consider the `js-yaml` package. Its versions prior to 3.13.1 were reported vulnerable to a [code injection attack](https://snyk.io/vuln/SNYK-JS-JSYAML-174129). The issue was reported on April 5, 2019 and published on April 7, 2019. Version 3.13.0 is the last version with the vulnerability and was therefore downloaded with

```bash
npm install js-yaml@3.13.0
```

Briefly, this library is meant to load a `YAML` file as a JS data structure. `YAML` is a key-value store and values can have multiple types (scalars, arrays, text, ...). One of the capabilities of the library is to associate a key with a JS closure in a `YAML` file. Once loaded in this way, any invocation of the key is a call to the closure. Consider the following payload

```yaml
toString: !!js/function 'function (){return Date.now()}'
```
stored in a file `payload.yml`. It defines one key-value pair (the key is `toString`) in which the value is a Javascript lambda that does something innocuous in this case (returns the current date/time). When loaded with the following fragment:

```javascript
const yaml = require('js-yaml');
const fs   = require('fs');

try {
  const doc = yaml.load(fs.readFileSync('payload.yml', 'utf8'));
  console.log(doc);
  console.log(doc.toString());
} catch (e) {
  console.log(e);
}
```

It prints out the key-value pair and invokes the lambda. Yet, neither the library function `yaml.load` nor any of its callees use the Javascript `eval` construction. It is therefore necessary to investigate and determine what construction is used to manufacture a callable entity. To this end, one can run the vulnerable code above in a debugger and trace the execution of the parser until it reaches an interesting location. An inspection of the source of `js-yaml` also reveals the following:

```
ldm@meriadoc > ~/node_modules/js-yaml > tree
.
├── CHANGELOG.md
├── LICENSE
├── README.md
├── bin
│   └── js-yaml.js
├── dist
│   ├── js-yaml.js
│   └── js-yaml.min.js
├── index.js
├── lib
│   ├── js-yaml
│   │   ├── common.js
│   │   ├── dumper.js
│   │   ├── exception.js
│   │   ├── loader.js
│   │   ├── mark.js
│   │   ├── schema
│   │   │   ├── core.js
│   │   │   ├── default_full.js
│   │   │   ├── default_safe.js
│   │   │   ├── failsafe.js
│   │   │   └── json.js
│   │   ├── schema.js
│   │   ├── type
│   │   │   ├── binary.js
│   │   │   ├── bool.js
│   │   │   ├── float.js
│   │   │   ├── int.js
│   │   │   ├── js
│   │   │   │   ├── function.js
│   │   │   │   ├── regexp.js
│   │   │   │   └── undefined.js
│   │   │   ├── map.js
│   │   │   ├── merge.js
│   │   │   ├── null.js
│   │   │   ├── omap.js
│   │   │   ├── pairs.js
│   │   │   ├── seq.js
│   │   │   ├── set.js
│   │   │   ├── str.js
│   │   │   └── timestamp.js
│   │   └── type.js
│   └── js-yaml.js
└── package.json
```
Namely, there is a sub-directory `type` containing multiple source files named after the types that values can take in a `yaml` file. Interestingly, there is even a `js/function.js` file that would point to an ability to reach a function. Upon close inspection, this file contains the following function (comments removed for brevity's sake):

```javascript
function constructJavascriptFunction(data) {
  var source = '(' + data + ')',
      ast    = esprima.parse(source, { range: true }),
      params = [],
      body;

  if (ast.type                    !== 'Program'             ||
      ast.body.length             !== 1                     ||
      ast.body[0].type            !== 'ExpressionStatement' ||
      (ast.body[0].expression.type !== 'ArrowFunctionExpression' &&
        ast.body[0].expression.type !== 'FunctionExpression')) {
    throw new Error('Failed to resolve function');
  }
  ast.body[0].expression.params.forEach(function (param) {
    params.push(param.name);
  });
  body = ast.body[0].expression.body.range;
  if (ast.body[0].expression.body.type === 'BlockStatement') {
    return new Function(params, source.slice(body[0] + 1, body[1] - 1));
  }
  return new Function(params, 'return ' + source.slice(body[0], body[1]));
}
```

A close look shows that this function parses a string input (`data`) to check that the payload is indeed a closure and then relies on the built-in type `Function` to create an executable object. It is therefore straightforward to extend the pattern finder by looking not only for `eval` but also for the `Function` constructor or its sibling `AsyncFunction`, which has the same capabilities. A quick use of a Javascript debugger with a breakpoint in this function does indeed confirm the hypothesis.
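
To see why these constructors are as powerful as `eval`, consider the following short, self-contained sketch (the variable names are illustrative). It compiles strings into callables with `Function`, including the array-of-parameters form used by `constructJavascriptFunction`, and reaches `AsyncFunction`, which is not a global name, through the prototype of an async function.

```javascript
// The Function constructor compiles a string into a callable, much like eval.
const add = new Function('a', 'b', 'return a + b');

// Parameters may also be given as an array, which is converted to the string "a,b".
const addFromArray = new Function(['a', 'b'], 'return a + b');

// AsyncFunction is not a global; it is reached through an async function's prototype.
const AsyncFunction = Object.getPrototypeOf(async function () {}).constructor;
const double = new AsyncFunction('x', 'return x * 2');

console.log(add(1, 2));
console.log(addFromArray(3, 4));
double(21).then((value) => console.log(value));
```

Because `AsyncFunction` is usually obtained indirectly like this, a purely name-based search may miss it when the code stores the constructor under another name.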

Extending the pattern finder is straightforward in this case, as it suffices to look for two more risky constructions within the AST. _The purpose of the next 3 weeks is to repeat this process to identify as many such constructs as possible to strengthen the pattern finder._
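
A minimal sketch of such an extension, assuming ESTree-style nodes, is shown below. The predicate `isRiskyConstruction` and the name list are hypothetical; the check is by identifier name only, so it would not catch a constructor that has been aliased to another variable.

```javascript
// Names of constructors that turn strings into executable code.
const RISKY_CONSTRUCTORS = ['Function', 'AsyncFunction'];

// Matches both `new Function(...)` and `Function(...)` (same for AsyncFunction).
function isRiskyConstruction(node) {
  const isCallLike = node.type === 'NewExpression' || node.type === 'CallExpression';
  return isCallLike &&
    node.callee.type === 'Identifier' &&
    RISKY_CONSTRUCTORS.includes(node.callee.name);
}

// Hand-written nodes for: new Function(params, body), Function(body), parseInt(text)
const newFunctionNode = {
  type: 'NewExpression',
  callee: { type: 'Identifier', name: 'Function' },
  arguments: [{ type: 'Identifier', name: 'params' }, { type: 'Identifier', name: 'body' }],
};
const plainFunctionCall = {
  type: 'CallExpression',
  callee: { type: 'Identifier', name: 'Function' },
  arguments: [{ type: 'Identifier', name: 'body' }],
};
const harmlessCall = {
  type: 'CallExpression',
  callee: { type: 'Identifier', name: 'parseInt' },
  arguments: [{ type: 'Identifier', name: 'text' }],
};

const matches = [newFunctionNode, plainFunctionCall, harmlessCall]
  .map((node) => isRiskyConstruction(node));
console.log(matches);
```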

# Current Status and Time line

## Completed

1. Search for tools. We looked into a variety of tools and resources to obtain information about open source components and their evolution (particularly when vulnerabilities are found in specific versions). We looked at multiple tools for syntax analysis as well as for control flow graph construction.
2. We also considered commercial tools, but the focus was primarily on open-source resources to make this work as open and reusable as possible.
3. For a specific task we sometimes had to consider multiple tools to find reasonable compromises between actual (vs. advertised) functionality and ease of use. Some time was lost on tools that were ultimately pushed aside. 
4. We wrote a "template finder" code base that uses parsing libraries to construct Abstract Syntax Trees and traverse them. This is at the core of the search for vulnerable patterns (activation points). 

## Ongoing 

From July 12 to approximately August 7, the objective is to improve the tool for finding activation points (the so-called template finder). Our objective is twofold. First, we wish to build the infrastructure of a tool capable of detecting (precisely) the 3 types of vulnerabilities discussed earlier. Second, we wish to apply an _intermediate_ version of the tool we are building to the 50 modules identified by Comcast. This has the dual purpose of guiding the selection of libraries to analyze, focusing on those most likely to contain an actual vulnerability, and of validating Phase 1 of the analyzer.

## Deliverables by August 16

A working **activation point (template) Finder** focused on the three classes of vulnerabilities described earlier. We intend to demo its capabilities as well as document the patterns that it covers. 

## Deliverables by August 31

A **manual analysis** of the selected libraries. This also has a dual purpose:
1. Provide the desired analysis to Comcast
2. Establish a ground truth for these packages by producing an attack payload and articulating how it takes advantage of JavaScript, which in turn helps us develop the tools to automate the vulnerability evaluation of OSCs. 

The end of the summer will also see the start of the Phase 2 effort. The objective there is to leverage the `npm` libraries we identified to date to build and navigate both the control flow graph and the call graph to generate the set of potential paths. 
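
As a rough illustration of what "generating the set of potential paths" could look like, the sketch below enumerates acyclic paths in a call graph from an entry point to an activation point. It is only a sketch under stated assumptions: the graph, the function names in it, and the helper `pathsTo` are all hypothetical, and the real Phase 2 analysis would build the graph from the code rather than from a literal.

```javascript
// Hypothetical call graph: keys are function names, values are their callees.
const callGraph = {
  handleRequest: ['parseBody', 'render'],
  parseBody: ['evalJSON'],
  render: ['escapeHtml'],
  evalJSON: ['eval'],
  escapeHtml: [],
  eval: [],
};

// Depth-first enumeration of acyclic call paths from `source` to `sink`.
function pathsTo(graph, source, sink, path = [source], out = []) {
  if (source === sink) {
    out.push(path);
    return out;
  }
  for (const callee of graph[source] || []) {
    // Skip callees already on the current path to avoid looping on recursion.
    if (!path.includes(callee)) {
      pathsTo(graph, callee, sink, [...path, callee], out);
    }
  }
  return out;
}

const paths = pathsTo(callGraph, 'handleRequest', 'eval');
console.log(paths.map((p) => p.join(' -> ')));
```

In this toy graph only the branch through `parseBody` reaches `eval`; the `render` branch is discarded, which is the kind of pruning the later semantic phase relies on.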

Interestingly, the tool, in its current early state, has identified _five_ activation points for code injection spread across 3 files. It will be interesting to start the investigation with those packages in the weeks to come. 

# Fall'19 Horizon

September will focus on the CFG and the CG and complete the implementation of those analyses. The rest of the Fall will focus on tools such as `Aratha` and their use on the paths produced by the combined Phases 1-2. Alternatives beyond `Aratha`, such as test generation and model checking, will be considered. With a 3-month horizon, it is likely that only one of them will be pursued to completion. 

# Timeline

To get a sense of the schedule, consider the following timeline:

![Timeline](images/schedule.png)

It outlines the effort between now and the end of September. The 3 "Phases" are focused on:

- improving the search tool (1.1) (i.e., the syntax analysis to find invocation points)
- developing the attack surface tool (1.2) (i.e., the flow analysis)
- developing the vulnerability finding tool (1.3) (i.e., the semantics analysis)

Task (2) corresponds to the automated analysis of the 50 modules selected by Comcast to identify 3-5 OSCs of interest. Task (3) represents the manual analysis of 3 selected OSCs. Note how the phase 2 and 3 efforts are scheduled to start no earlier than late August and extend well into September. 

# Following Up on Leads

## Prototype

### evalScripts()

This function is a true positive. Found in `lib/String.js`, it can execute JavaScript: the scripts returned by `extractScripts` are passed to `eval` without being sanitized.

```javascript
function evalScripts() {
    return this.extractScripts().map(function(script) { return eval(script) });
  }
```

#### Payload

### evalJSON()

This function is a true positive. Found in `lib/String.js`, it passes its string to `eval`. When `sanitize` is falsy the `isJSON()` check is skipped, so arbitrary JavaScript can reach `eval`. 

```javascript
function evalJSON(sanitize) {
    var json = this.unfilterJSON(),
        cx = /[\u0000\u00ad\u0600-\u0604\u070f\u17b4\u17b5\u200c-\u200f\u2028-\u202f\u2060-\u206f\ufeff\ufff0-\uffff]/g;
    if (cx.test(json)) {
      json = json.replace(cx, function (a) {
        return '\\u' + ('0000' + a.charCodeAt(0).toString(16)).slice(-4);
      });
    }
    try {
      if (!sanitize || json.isJSON()) return eval('(' + json + ')');
    } catch (e) { }
    throw new SyntaxError('Badly formed JSON string: ' + this.inspect());
  }
```
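
The role of the `sanitize` flag can be illustrated with a simplified model of the guard. This is a sketch, not Prototype's code: `evalJSONSketch` and `looksLikeJSON` are hypothetical names, and `looksLikeJSON` uses `JSON.parse` as a stand-in for the library's own `isJSON()` check, which works differently. The input is a harmless marker assignment standing in for attacker-controlled text.

```javascript
// Hypothetical stand-in for String#isJSON: accept only text that
// JSON.parse accepts. The library's real check differs.
function looksLikeJSON(text) {
  try {
    JSON.parse(text);
    return true;
  } catch (e) {
    return false;
  }
}

// Simplified model of the guard in evalJSON: eval is reached whenever
// `sanitize` is falsy, regardless of what `json` contains.
function evalJSONSketch(json, sanitize) {
  if (!sanitize || looksLikeJSON(json)) {
    return eval('(' + json + ')');
  }
  throw new SyntaxError('Badly formed JSON string: ' + json);
}

// Harmless marker expression standing in for attacker-controlled input.
const input = 'globalThis.evalJSONMarker = "executed"';

// Without sanitization the assignment runs inside eval.
evalJSONSketch(input, false);
const markerAfterUnsanitized = globalThis.evalJSONMarker;
delete globalThis.evalJSONMarker;

// With sanitization the same input is rejected before eval is reached.
let rejected = false;
try {
  evalJSONSketch(input, true);
} catch (e) {
  rejected = e instanceof SyntaxError;
}
const markerAfterSanitized = globalThis.evalJSONMarker;

console.log({ markerAfterUnsanitized, rejected, markerAfterSanitized });
```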

#### Payload

### Obj[].apply()

This is a lead that will need further manual review. If the `method` name is attacker-controlled, the computed call has the potential to lead to an eval call.

```javascript
function invoke(method) {
    var args = $A(arguments).slice(1);
    return this.map(function(value) {
      return value[method].apply(value, args);
    });
  }
```

#### Payload

### Obj[].apply()

This is a lead that will need further manual review. It has the potential to lead to an eval call.

```javascript
function addMethods(source) {
    var ancestor   = this.superclass && this.superclass.prototype,
        properties = Object.keys(source);

    ...
    for (var i = 0, length = properties.length; i < length; i++) {
      var property = properties[i], value = source[property];
      if (ancestor && Object.isFunction(value) &&
          value.argumentNames()[0] == "$super") {
        var method = value;
        value = (function(m) {
          return function() { return ancestor[m].apply(this, arguments); };
        })(property).wrap(method);

        value.valueOf = method.valueOf.bind(method);
        value.toString = method.toString.bind(method);
      }
      this.prototype[property] = value;
    }

    return this;
  }
```

#### Payload

## Depot

### assert.operator()

This function is a true positive. Embedded in the package is an outdated version of Chai that can run JavaScript from the unsanitized parameter `val` (and likewise `val2`), since both are concatenated into the string passed to `eval`. This is only possible if `assert.operator` is called.

```javascript
assert.operator = function (val, operator, val2, msg) {
        if (!~['==', '===', '>', '>=', '<', '<=', '!=', '!=='].indexOf(operator)) {
          throw new Error('Invalid operator "' + operator + '"');
        }
        var test = new Assertion(eval(val + operator + val2), msg);
        test.assert(
            true === flag(test, 'object')
          , 'expected ' + util.inspect(val) + ' to be ' + operator + ' ' + util.inspect(val2)
          , 'expected ' + util.inspect(val) + ' to not be ' + operator + ' ' + util.inspect(val2) );
      };
```
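
The effect of the string concatenation can be seen in a reduced model of the `eval` step alone. This is a sketch under stated assumptions: `operatorSketch` is a hypothetical name, the `Assertion` object and message handling are omitted, and the second call uses a harmless marker assignment in place of a real payload.

```javascript
const allowedOperators = ['==', '===', '>', '>=', '<', '<=', '!=', '!=='];

// Simplified model of the eval step only; the Assertion object is omitted.
function operatorSketch(val, operator, val2) {
  if (!allowedOperators.includes(operator)) {
    throw new Error('Invalid operator "' + operator + '"');
  }
  // The operator is checked against a whitelist, but the operands are not.
  return eval(val + operator + val2);
}

// Intended use: numeric operands.
const benign = operatorSketch(1, '<', 2);

// A string operand carrying an expression with a side effect.
const withSideEffect = operatorSketch(
  '(globalThis.operatorMarker = "executed", 1)', '==', 1);
const operatorMarker = globalThis.operatorMarker;

// The operator whitelist still rejects anything outside the fixed list.
let operatorRejected = false;
try {
  operatorSketch(1, '; 1 ==', 1);
} catch (e) {
  operatorRejected = true;
}

console.log({ benign, withSideEffect, operatorMarker, operatorRejected });
```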

#### Payload

### Obj[].apply()

This is a lead that will need further manual review. It has the potential to lead to an eval call.

```javascript
if ('function' == typeof handler) {
  handler.apply(this, args);
} else if (isArray(handler)) {
  var listeners = handler.slice();
  for (var i = 0, l = listeners.length; i < l; i++) {
    listeners[i].apply(this, args);
  }
} else {
  return false;
}
```

#### Payload

## Jade 

### toConstant()

This is a true positive. More manual review of a payload is required.

```javascript
function toConstant(src, constants) {
  if (!isConstant(src, constants)) throw new Error(JSON.stringify(src) + ' is not constant.');
  return Function(Object.keys(constants || {}).join(','), 'return (' + src + ')').apply(null, Object.keys(constants || {}).map(function (key) {
    return constants[key];
  }));
}
```

#### Payload

### parseObj()

This is a true positive that only works at the CLI. The option expects a file name as its argument, but if the file cannot be read the argument itself is passed to `eval` without being sanitized, so any JavaScript can be supplied.

```javascript
program.parse(process.argv);

if (program.obj) {
  options = parseObj(program.obj);
}

function parseObj (input) {
  var str, out;
  try {
    str = fs.readFileSync(program.obj);
  } catch (e) {
    return eval('(' + program.obj + ')');
  }
  // We don't want to catch exceptions thrown in JSON.parse() so have to
  // use this two-step approach.
  return JSON.parse(str);
}
```

#### Payload

`$ jade -O "console.log('Hello World')"`

### isExpression()

This is a false positive. It can never run any unexpected JavaScript: either the `eval` call fails with a syntax error, or the code parses and the `throw "STOP"` statement executes before anything derived from `src`.

```javascript
function isExpression(src) {
  try {
    eval('throw "STOP"; (function () { return (' + src + '); })()');
    return false;
  }
  catch (err) {
    return err === 'STOP';
  }
}
```
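
This verdict can be spot-checked by running the function as quoted above on two inputs: a well-formed expression with a side effect and a malformed fragment. The sketch is a check on these inputs only, not a proof; the marker property name `isExpressionMarker` is a hypothetical choice for the test.

```javascript
// The function exactly as quoted above.
function isExpression(src) {
  try {
    eval('throw "STOP"; (function () { return (' + src + '); })()');
    return false;
  }
  catch (err) {
    return err === 'STOP';
  }
}

// A well-formed expression whose evaluation would set a global marker.
const wellFormed = isExpression('globalThis.isExpressionMarker = true');

// A malformed fragment: eval raises a SyntaxError before anything runs.
const malformed = isExpression('if (');

// The marker is never set, because the throw precedes the expression.
const markerSet = 'isExpressionMarker' in globalThis;

console.log({ wellFormed, malformed, markerSet });
```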

#### Payload

## Backbone

### Obj[]()

This is a false positive. The property name will always evaluate to either 'replaceState' or 'pushState'.

```javascript
if (this._usePushState) {
        this.history[options.replace ? 'replaceState' : 'pushState']({}, document.title, url);
```
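
The reasoning behind this verdict (the computed property can only take a fixed set of string values) could be automated with a small check on the property node. The following is a minimal sketch under stated assumptions: it works on ESTree-style nodes written by hand, and `possibleNames` is a hypothetical helper, not part of the existing tool.

```javascript
// ESTree-style node for:  options.replace ? 'replaceState' : 'pushState'
const ternaryProperty = {
  type: 'ConditionalExpression',
  test: {
    type: 'MemberExpression',
    computed: false,
    object: { type: 'Identifier', name: 'options' },
    property: { type: 'Identifier', name: 'replace' },
  },
  consequent: { type: 'Literal', value: 'replaceState' },
  alternate: { type: 'Literal', value: 'pushState' },
};

// ESTree-style node for a plain variable used as the property, e.g. obj[method].
const identifierProperty = { type: 'Identifier', name: 'method' };

// Returns the list of string values the node can evaluate to,
// or null when the value cannot be bounded syntactically.
function possibleNames(node) {
  if (node.type === 'Literal' && typeof node.value === 'string') {
    return [node.value];
  }
  if (node.type === 'ConditionalExpression') {
    const whenTrue = possibleNames(node.consequent);
    const whenFalse = possibleNames(node.alternate);
    return whenTrue && whenFalse ? whenTrue.concat(whenFalse) : null;
  }
  return null;
}

const ternaryNames = possibleNames(ternaryProperty);
const identifierNames = possibleNames(identifierProperty);
console.log({ ternaryNames, identifierNames });
```

A purely syntactic check like this handles the inline ternary but returns `null` for a variable, even one assigned from a ternary on the previous line; resolving that case needs the flow analysis planned for the next phase.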

#### Payload

### Obj[]()

This is a false positive. The property name will always evaluate to either 'reset' or 'set'.

```javascript
fetch: function(options) {
      options = _.extend({parse: true}, options);
      var success = options.success;
      var collection = this;
      options.success = function(resp) {
        var method = options.reset ? 'reset' : 'set';
        collection[method](resp, options);
        if (success) success.call(options.context, collection, resp, options);
        collection.trigger('sync', collection, resp, options);
      };
      wrapError(this, options);
      return this.sync('read', this, options);
    },
```

#### Payload

### Obj[]()

This is a lead that will need further manual review. It has the potential to lead to an eval call.

```javascript
var addMethod = function(base, length, method, attribute) {
    switch (length) {
      case 1: return function() {
        return base[method](this[attribute]);
      };
      case 2: return function(value) {
        return base[method](this[attribute], value);
      };
      case 3: return function(iteratee, context) {
        return base[method](this[attribute], cb(iteratee, this), context);
      };
      case 4: return function(iteratee, defaultVal, context) {
        return base[method](this[attribute], cb(iteratee, this), defaultVal, context);
      };
      default: return function() {
        var args = slice.call(arguments);
        args.unshift(this[attribute]);
        return base[method].apply(base, args);
      };
    }
  };
```

#### Payload

## Grep vs Template finder

Grep found 384 spots of interest with the regex:

`.+\[.+\]\s*(\(.*\)|\.call|\.apply)`

The tool found 195 spots on the same packages.

Both searches were run on the files with comments removed to narrow down the results. The grep results may also include calls with no arguments, which are false positives. The tool rules these out, which may contribute to its lower count.
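
To see why a textual search over-reports, the regex above can be tried on a few sample lines. This is an illustrative sketch: the sample lines are made up for the demonstration (the first is taken from the `invoke` excerpt), and the pattern is written as a JavaScript regular expression rather than run through grep.

```javascript
// The grep pattern from above, as a JavaScript regular expression.
const grepPattern = /.+\[.+\]\s*(\(.*\)|\.call|\.apply)/;

const samples = [
  'return value[method].apply(value, args);', // computed call with arguments
  'handlers[name]();',                        // computed call with no arguments
  'items[i].callback = fn;',                  // property named "callback", not a call
  'var first = list[0];',                     // plain index access
];

// Lines the textual search would report.
const flagged = samples.filter((line) => grepPattern.test(line));
console.log(flagged);
```

The pattern reports the first three lines. The second is the no-argument case mentioned above, and the third is matched only because `.callback` begins with `.call`; an AST-based search does not confuse a property name with a method call.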