Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
61 changes: 23 additions & 38 deletions change-notes/1.19/analysis-python.md
Original file line number Diff line number Diff line change
@@ -1,14 +1,12 @@
# Improvements to Python analysis


## General improvements

> Changes that affect alerts in many files or from many queries
> For example, changes to file classification

### Representation of the control flow graph

The representation of the control flow graph (CFG) has been modified to better reflect the semantics of Python.
The representation of the control flow graph (CFG) has been modified to better reflect the semantics of Python. As part of these changes, a new predicate `Stmt.getAnEntryNode()` has been added to make it easier to write reachability queries involving statements.

#### CFG nodes removed

The following statement types no longer have a CFG node for the statement itself, as their sub-expressions already contain all the
semantically significant information:
Expand All @@ -20,42 +18,31 @@ semantically significant information:

For example, the CFG for `if cond: foo else bar` now starts with the CFG node for `cond`.

For the following statement types, the CFG node for the statement now follows the CFG nodes of its sub-expressions to better reflect the semantics:
#### CFG nodes reordered

For the following statement types, the CFG node for the statement now follows the CFG nodes of its sub-expressions to follow Python semantics:

* `Print`
* `TemplateWrite`
* `ImportStar`

For example the CFG for `print foo` (in Python 2) has changed from `print -> foo` to `foo -> print`, better reflecting the runtime behavior.

For example the CFG for `print foo` (in Python 2) has changed from `print -> foo` to `foo -> print`, to reflect the runtime behavior.

The CFG for the `with` statement has been re-ordered to more closely reflect the semantics.
For the `with` statement:
```python
with cm as var:
body
```
The order of the CFG changes from:

<with>
cm
var
body

to:

cm
<with>
var
body

A new predicate `Stmt.getAnEntryNode()` has been added to make it easier to write reachability queries involving statements.

* Previous CFG node order: `<with>` -> `cm` -> `var` -> `body`
* New CFG node order: `cm` -> `<with>` -> `var` -> `body`

## New queries

| **Query** | **Tags** | **Purpose** |
|-----------------------------|-----------|--------------------------------------------------------------------|
| Assert statement tests the truth value of a literal constant (`py/assert-literal-constant`) | reliability, correctness | Checks whether an assert statement is testing the truth of a literal constant value. Results are hidden on LGTM by default. |
| Flask app is run in debug mode (`py/flask-debug`) | security, external/cwe/cwe-215, external/cwe/cwe-489 | Finds instances where a Flask application is run in debug mode. Results are shown on LGTM by default. |
| Information exposure through an exception (`py/stack-trace-exposure`) | security, external/cwe/cwe-209, external/cwe/cwe-497 | Finds instances where information about an exception may be leaked to an external user. Results are shown on LGTM by default. |
| Jinja2 templating with autoescape=False (`py/jinja2/autoescape-false`) | security, external/cwe/cwe-079 | Finds instantiations of `jinja2.Environment` with `autoescape=False` which may allow XSS attacks. Results are hidden on LGTM by default. |
Expand All @@ -65,35 +52,33 @@ A new predicate `Stmt.getAnEntryNode()` has been added to make it easier to writ
## Changes to existing queries

All taint-tracking queries now support visualization of paths in QL for Eclipse.
Most security alerts are now visible on LGTM by default.
Most security alerts are now visible on LGTM by default. This means that you may see results that were previously hidden for the following queries:

* Code injection (`py/code-injection`)
* Reflected server-side cross-site scripting (`py/reflective-xss`)
* SQL query built from user-controlled sources (`py/sql-injection`)
* Uncontrolled data used in path expression (`py/path-injection`)
* Uncontrolled command line (`py/command-line-injection`)

| **Query** | **Expected impact** | **Change** |
|----------------------------|------------------------|------------------------------------------------------------------|
| Assert statement tests the truth value of a literal constant (`py/assert-literal-constant`) | reliability, correctness | Checks whether an assert statement is testing the truth of a literal constant value. Not shown by default. |
| Code injection (`py/code-injection`) | Supports path visualization and is now visible on LGTM by default | No change to expected results |
| Command injection (`py/command-line-injection`) | Additional sinks in the `os`, and `popen` modules | Possibility of new results |
| Deserializing untrusted input (`py/unsafe-deserialization`) | Supports path visualization | No change to expected results |
| Encoding error (`py/encoding-error`) | Better alert location | Alert is now shown at the position of the first offending character, rather than at the top of the file. |
| Command injection (`py/command-line-injection`) | More results | Additional sinks in the `os`, and `popen` modules may find more results in some projects. |
| Encoding error (`py/encoding-error`) | Better alert location | Alerts are now shown at the start of the encoding error, rather than at the top of the file. |
| Missing call to \_\_init\_\_ during object initialization (`py/missing-call-to-init`) | Fewer false positive results | Results where it is likely that the full call chain has not been analyzed are no longer reported. |
| Reflected server-side cross-site scripting (`py/reflective-xss`) | Supports path visualization and is now visible on LGTM by default | No change to expected results |
| SQL query built from user-controlled sources (`py/sql-injection`) | Supports path visualization and is now visible on LGTM by default | No change to expected results |
| Uncontrolled data used in path expression (`py/path-injection`) | Supports path visualization and is now visible on LGTM by default | No change to expected results |
| Uncontrolled command line (`py/command-line-injection`) | Supports path visualization and is now visible on LGTM by default | No change to expected results |
| URL redirection from remote source (`py/url-redirection`) | Fewer false positive results and now supports path visualization | Taint is no longer tracked from the right hand side of binary expressions. In other words `SAFE + TAINTED` is now treated as safe. |
| URL redirection from remote source (`py/url-redirection`) | Fewer false positive results | Taint is no longer tracked from the right-hand side of binary expressions. In other words `SAFE + TAINTED` is now treated as safe. |


## Changes to code extraction

* Improved scalability: Scaling is near linear to at least 20 CPU cores.
* Five levels of logging can be selected: `ERROR`, `WARN`, `INFO`, `DEBUG` and `TRACE`. `WARN` is the stand-alone default, but `INFO` will be used when run by LGTM.
* Five levels of logging can be selected: `ERROR`, `WARN`, `INFO`, `DEBUG` and `TRACE`. LGTM uses `INFO` level logging. QL tools use `WARN` level logging by default.
* The `-v` flag can be specified multiple times to increase logging level by one per `-v`.
* The `-q` flag has been added and can be specified multiple times to reduce the logging level by one per `-q`.
* Log lines are now in the `[SEVERITY] message` style and never overlap.
* Extractor now outputs the location of the first offending character when an EncodingError is encountered.
* The extractor now outputs the location of the first character that triggers an EncodingError.

## Changes to QL libraries

* Taint tracking analysis now understands HTTP requests in the `twisted` library.

* Taint-tracking analysis now understands HTTP requests in the `twisted` library.
* The analysis now handles `isinstance` and `issubclass` tests involving the basic abstract base classes better. For example, the test `issubclass(list, collections.Sequence)` is now understood to be `True`
* Taint tracking automatically tracks tainted mappings and collections, without you having to add additional taint kinds. This means that custom taints are tracked from `x` to `y` in the following flow: `l = [x]; y =l[0]`.