9 changes: 0 additions & 9 deletions docs/language/learn-ql/python/control-flow-graph.rst

This file was deleted.

49 changes: 32 additions & 17 deletions docs/language/learn-ql/python/control-flow.rst
@@ -1,7 +1,12 @@
Tutorial: Control flow analysis
===============================
Analyzing control flow in Python
================================

To analyze the `Control-flow graph <http://en.wikipedia.org/wiki/Control_flow_graph>`__ of a ``Scope`` we can use the two CodeQL classes ``ControlFlowNode`` and ``BasicBlock``. These classes allow you to ask such questions as "can you reach point A from point B?" or "Is it possible to reach point B *without* going through point A?". To report results we use the class ``AstNode``, which represents a syntactic element and corresponds to the source code - allowing the results of the query to be more easily understood.
You can write CodeQL queries to explore the control-flow graph of a Python program, for example, to discover unreachable code or mutually exclusive blocks of code.

About analyzing control flow
--------------------------------------

To analyze the control-flow graph of a ``Scope`` we can use the two CodeQL classes ``ControlFlowNode`` and ``BasicBlock``. These classes allow you to ask such questions as "can you reach point A from point B?" or "Is it possible to reach point B *without* going through point A?". To report results we use the class ``AstNode``, which represents a syntactic element and corresponds to the source code - allowing the results of the query to be more easily understood. For more information, see `Control-flow graph <http://en.wikipedia.org/wiki/Control_flow_graph>`__ on Wikipedia.

The ``ControlFlowNode`` class
-----------------------------
@@ -19,11 +24,18 @@ To show why this complex relation is required consider the following Python code
finally:
close_resource()

There are many paths through the above code. There are three different paths through the call to ``close_resource();`` one normal path, one path that breaks out of the loop, and one path where an exception is raised by ``might_raise()``. (An annotated flow graph can be seen :doc:`here <control-flow-graph>`.)
There are many paths through the above code. There are three different paths through the call to ``close_resource();`` one normal path, one path that breaks out of the loop, and one path where an exception is raised by ``might_raise()``.
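The three paths through ``close_resource()`` can be exercised directly in Python. The sketch below is only an illustration: the full tutorial snippet is not shown in this diff, so the loop shape, the ``run`` helper, and its ``stop_at`` and ``fail_at`` parameters are assumptions made for the example.

```python
def close_resource(log):
    # Record that the finally block ran.
    log.append("close_resource")


def might_raise(fail):
    # Hypothetical stand-in for the tutorial's might_raise().
    if fail:
        raise ValueError("simulated failure")


def run(items, stop_at=None, fail_at=None):
    """Return the sequence of events for one run of a try/finally inside a loop."""
    log = []
    try:
        for item in items:
            try:
                if item == stop_at:
                    log.append("break")
                    break
                might_raise(item == fail_at)
                log.append("normal")
            finally:
                # Reached on the normal path, the break path, and the exception path.
                close_resource(log)
    except ValueError:
        log.append("exception propagated")
    return log
```

Calling ``run([1])``, ``run([1], stop_at=1)``, and ``run([1], fail_at=1)`` takes the normal, break, and exception paths respectively, and each one passes through ``close_resource()``.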

An annotated flow graph:

|Python control flow graph|

.. |Python control flow graph| image:: ../../images/python-flow-graph.png

The simplest use of the ``ControlFlowNode`` and ``AstNode`` classes is to find unreachable code. There is one ``ControlFlowNode`` per path through any ``AstNode`` and any ``AstNode`` that is unreachable has no paths flowing through it. Therefore, any ``AstNode`` without a corresponding ``ControlFlowNode`` is unreachable.

**Unreachable AST nodes**
Example finding unreachable AST nodes
Review comment (Contributor):
Note for future work: perhaps these two examples could be combined into a single procedural section.

Adding Example to the heading text (here and in other places below) doesn't feel quite right to me, but it's probably fine for this first round of changes (and I can't think of a better suggestion).

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: ql

@@ -33,9 +45,10 @@ The simplest use of the ``ControlFlowNode`` and ``AstNode`` classes is to find u
where not exists(node.getAFlowNode())
select node

➤ `See this in the query console <https://lgtm.com/query/669220024/>`__. The demo projects on LGTM.com all have some code that has no control flow node, and is therefore unreachable. However, since the ``Module`` class is also a subclass of the ``AstNode`` class, the query also finds any modules implemented in C or with no source code. Therefore, it is better to find all unreachable statements:
➤ `See this in the query console <https://lgtm.com/query/669220024/>`__. The demo projects on LGTM.com all have some code that has no control flow node, and is therefore unreachable. However, since the ``Module`` class is also a subclass of the ``AstNode`` class, the query also finds any modules implemented in C or with no source code. Therefore, it is better to find all unreachable statements.

**Unreachable statements**
Example finding unreachable statements
felicitymay marked this conversation as resolved.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: ql

@@ -45,15 +58,15 @@ The simplest use of the ``ControlFlowNode`` and ``AstNode`` classes is to find u
where not exists(s.getAFlowNode())
select s

➤ `See this in the query console <https://lgtm.com/query/670720181/>`__. This query gives fewer results, but most of the projects have some unreachable nodes. These are also highlighted by the standard query: `Unreachable code <https://lgtm.com/rules/3980095>`__.
➤ `See this in the query console <https://lgtm.com/query/670720181/>`__. This query gives fewer results, but most of the projects have some unreachable nodes. These are also highlighted by the standard "Unreachable code" query. For more information, see `Unreachable code <https://lgtm.com/rules/3980095>`__ on LGTM.com.

The ``BasicBlock`` class
------------------------

The ``BasicBlock`` class represents a `basic block <http://en.wikipedia.org/wiki/Basic_block>`__ of control flow nodes. The ``BasicBlock`` class is not that useful for writing queries directly, but is very useful for building complex analyses, such as data flow. The reason it is useful is that it shares many of the interesting properties of control flow nodes, such as what can reach what and what `dominates <http://en.wikipedia.org/wiki/Dominator_%28graph_theory%29>`__ what, but there are fewer basic blocks than control flow nodes - resulting in queries that are faster and use less memory.
The ``BasicBlock`` class represents a basic block of control flow nodes. The ``BasicBlock`` class is not that useful for writing queries directly, but is very useful for building complex analyses, such as data flow. The reason it is useful is that it shares many of the interesting properties of control flow nodes, such as, what can reach what, and what dominates what, but there are fewer basic blocks than control flow nodes - resulting in queries that are faster and use less memory. For more information, see `Basic block <http://en.wikipedia.org/wiki/Basic_block>`__ and `Dominator <http://en.wikipedia.org/wiki/Dominator_%28graph_theory%29>`__ on Wikipedia.
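To make the idea of "fewer basic blocks than control flow nodes" concrete, here is a minimal Python sketch that groups the nodes of a small control-flow graph into basic blocks. It is a generic textbook-style construction, not the way the CodeQL library computes ``BasicBlock``; the function name ``basic_blocks`` and the integer node numbering (with node 0 as the entry) are assumptions made for the example.

```python
def basic_blocks(num_nodes, edges):
    """Group nodes 0..num_nodes-1 of a control-flow graph into basic blocks.

    A node starts a new block if it is the entry node (0), if it does not
    have exactly one predecessor, or if its only predecessor branches.
    """
    succs = {n: [] for n in range(num_nodes)}
    preds = {n: [] for n in range(num_nodes)}
    for src, dst in edges:
        succs[src].append(dst)
        preds[dst].append(src)

    def is_leader(n):
        return n == 0 or len(preds[n]) != 1 or len(succs[preds[n][0]]) != 1

    blocks = []
    for n in range(num_nodes):
        if not is_leader(n):
            continue
        block = [n]
        # Extend the block while control falls straight through to a non-leader.
        while len(succs[block[-1]]) == 1 and not is_leader(succs[block[-1]][0]):
            block.append(succs[block[-1]][0])
        blocks.append(block)
    return blocks


# A diamond: node 1 branches to 2 and 3, which join again at 4.
DIAMOND = [(0, 1), (1, 2), (1, 3), (2, 4), (3, 4), (4, 5)]
```

For the diamond-shaped graph above, ``basic_blocks(6, DIAMOND)`` groups six nodes into four blocks, so a reachability or dominance computation over blocks has fewer elements to consider.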

Example: Finding mutually exclusive basic blocks
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Example finding mutually exclusive basic blocks
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Suppose we have the following Python code:

@@ -84,7 +97,8 @@ However, by that definition, two basic blocks are mutually exclusive if they are

Combining these conditions we get:

**Mutually exclusive blocks within the same function**
Example finding mutually exclusive blocks within the same function
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: ql

@@ -98,10 +112,11 @@ Combining these conditions we get:
)
select b1, b2

➤ `See this in the query console <https://lgtm.com/query/671000028/>`__. This typically gives a very large number of results, because it is a common occurrence in normal control flow. It is, however, an example of the sort of control-flow analysis that is possible. Control-flow analyses such as this are an important aid to data flow analysis which is covered in the next tutorial.
➤ `See this in the query console <https://lgtm.com/query/671000028/>`__. This typically gives a very large number of results, because it is a common occurrence in normal control flow. It is, however, an example of the sort of control-flow analysis that is possible. Control-flow analyses such as this are an important aid to data flow analysis. For more information, see :doc:`Analyzing data flow and tracking tainted data in Python <taint-tracking>`.

Further reading
---------------

What next?
----------
- ":doc:`Analyzing data flow and tracking tainted data in Python <taint-tracking>`"

- Experiment with the worked examples in the tutorial topic :doc:`Taint tracking and data flow analysis in Python <taint-tracking>`.
- Find out more about QL in the `QL language handbook <https://help.semmle.com/QL/ql-handbook/index.html>`__ and `QL language specification <https://help.semmle.com/QL/ql-spec/language.html>`__.
.. include:: ../../reusables/python-other-resources.rst
20 changes: 13 additions & 7 deletions docs/language/learn-ql/python/functions.rst
@@ -1,7 +1,9 @@
Tutorial: Functions
Functions in Python
===================

This example uses the standard CodeQL class ``Function`` (see :doc:`Introducing the Python libraries <introduce-libraries-python>`).
You can use syntactic classes from the standard CodeQL library to find Python functions and identify calls to them.

These examples use the standard CodeQL class `Function <https://help.semmle.com/qldoc/python/semmle/python/Function.qll/type.Function$Function.html>`__. For more information, see ":doc:`Introducing the Python libraries <introduce-libraries-python>`."

Finding all functions called "get..."
-------------------------------------
@@ -55,7 +57,7 @@ We can modify the query further to include only methods whose body consists of a
and count(f.getAStmt()) = 1
select f, "This function is (probably) a getter."

➤ `See this in the query console <https://lgtm.com/query/667290044/>`__. This query returns fewer results, but if you examine the results you can see that there are still refinements to be made. This is refined further in :doc:`Tutorial: Statements and expressions <statements-expressions>`.
➤ `See this in the query console <https://lgtm.com/query/667290044/>`__. This query returns fewer results, but if you examine the results you can see that there are still refinements to be made. This is refined further in ":doc:`Expressions and statements in Python <statements-expressions>`."

Finding a call to a specific function
-------------------------------------
@@ -76,8 +78,12 @@ The ``Call`` class represents calls in Python. The ``Call.getFunc()`` predicate
Due to the dynamic nature of Python, this query will select any call of the form ``eval(...)`` regardless of whether it is a call to the built-in function ``eval`` or not.
In a later tutorial we will see how to use the type-inference library to find calls to the built-in function ``eval`` regardless of name of the variable called.
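The limitation of matching by name can be seen with an analogy in plain Python. The sketch below uses Python's standard ``ast`` module rather than CodeQL, and the ``calls_named`` helper and ``SOURCE`` snippet are hypothetical names introduced for the illustration. It finds calls whose callee is the bare name ``eval``, and it reports the call even though ``eval`` has been redefined in that module.

```python
import ast

# A module that shadows the built-in eval with its own function.
SOURCE = '''
def eval(text):
    return text.upper()

eval("1 + 1")
'''


def calls_named(source, name):
    """Return the line numbers of calls whose callee is the bare name `name`."""
    tree = ast.parse(source)
    return [
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == name
    ]
```

Here ``calls_named(SOURCE, "eval")`` reports the call on the last line of ``SOURCE``, although that call resolves to the user-defined function and not the built-in. A purely syntactic match cannot tell the two apart.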

What next?
----------
Further reading
---------------

- ":doc:`Expressions and statements in Python <statements-expressions>`"
- ":doc:`Pointer analysis and type inference in Python <pointsto-type-infer>`"
- ":doc:`Analyzing control flow in Python <control-flow>`"
- ":doc:`Analyzing data flow and tracking tainted data in Python <taint-tracking>`"

- Experiment with the worked examples in the following tutorial topics: :doc:`Statements and expressions <statements-expressions>`, :doc:`Control flow <control-flow>`, and :doc:`Points-to analysis and type inference <pointsto-type-infer>`.
- Find out more about QL in the `QL language handbook <https://help.semmle.com/QL/ql-handbook/index.html>`__ and `QL language specification <https://help.semmle.com/QL/ql-spec/language.html>`__.
.. include:: ../../reusables/python-other-resources.rst