From 8db3651dfd51d518dbe0b2cbed33a6120db555b0 Mon Sep 17 00:00:00 2001 From: Felicity Chapman Date: Wed, 12 Feb 2020 16:38:23 +0000 Subject: [PATCH 01/14] Add draft introductions and create map topic for Python --- .../learn-ql/python/ql-for-python.rst | 37 ++++++++++--------- 1 file changed, 20 insertions(+), 17 deletions(-) diff --git a/docs/language/learn-ql/python/ql-for-python.rst b/docs/language/learn-ql/python/ql-for-python.rst index 680c0c374b5c..6321f64eefb4 100644 --- a/docs/language/learn-ql/python/ql-for-python.rst +++ b/docs/language/learn-ql/python/ql-for-python.rst @@ -8,30 +8,33 @@ CodeQL for Python introduce-libraries-python functions statements-expressions + pointsto-type-infer control-flow - control-flow-graph taint-tracking - pointsto-type-infer - -The following tutorials and worked examples are designed to help you learn how to write effective and efficient queries for Python projects. You should work through these topics in the order displayed. - -- `Basic Python query `__ describes how to write and run queries using LGTM. -- :doc:`Introducing the CodeQL libraries for Python ` introduces the standard libraries used to write queries for Python code. +Experiment and learn how to write effective and efficient queries for Python projects. -- :doc:`Tutorial: Functions ` demonstrates how to write queries using the standard CodeQL library classes for Python functions. +:doc:`CodeQL libraries for Python ` +--------------------------------------------------------------- +Overview of the standard CodeQL libraries for writing CodeQL queries on Python code. -- :doc:`Tutorial: Statements and expressions ` demonstrates how to write queries using the standard CodeQL library classes for Python statements and expressions. +:doc:`Functions in Python ` +-------------------------------------- +Functions are key building blocks of Python code bases. You can find functions and identify calls to them using syntactic classes from the standard CodeQL library. 
-- :doc:`Tutorial: Control flow ` demonstrates how to write queries using the standard CodeQL library classes for Python control flow. +:doc:`Expressions and statements in Python ` +-------------------------------------------------------------------- +Expressions define a value. Statements represent a command or action. You can explore how they are used in a code base using syntactic classes from the standard CodeQL library. -- :doc:`Tutorial: Points-to analysis and type inference ` demonstrates how to write queries using the standard CodeQL library classes for Python type inference. +:doc:`Pointer analysis and type inference in Python ` +-------------------------------------------------------------------------- +At run time, each Python expression has a value with an associated type. You can learn how an expression behaves at run time using type-inference classes from the standard CodeQL library. -- :doc:`Taint tracking and data flow analysis in Python ` demonstrates how to write queries using the standard taint tracking and data flow libraries for Python. +:doc:`Analyzing control flow in Python ` +------------------------------------------------------ +You can write CodeQL queries to explore the control flow graph of a Python program, for example, to discover unreachable code or mutually exclusive blocks of code. -Other resources ---------------- +:doc:`Analyzing data flow and tracking tainted data in Python ` +------------------------------------------------------------------------------- +You can use CodeQL to track the flow of data through a Python program to its use. Tracking user-controlled, or tainted, data is a key technique for security researchers. -- For examples of how to query common Python elements, see the `Python cookbook `__. -- For the queries used in LGTM, display a `Python query `__ and click **Open in query console** to see the code used to find alerts. 
-- For more information about the library for Python see the `CodeQL library for Python `__. From 39ba3dedc168e0012388223b3d39e618b4cb41e3 Mon Sep 17 00:00:00 2001 From: Felicity Chapman Date: Wed, 12 Feb 2020 17:16:31 +0000 Subject: [PATCH 02/14] Fix build failure by moving control-flow image --- docs/language/learn-ql/python/control-flow-graph.rst | 9 --------- docs/language/learn-ql/python/control-flow.rst | 8 +++++++- 2 files changed, 7 insertions(+), 10 deletions(-) delete mode 100644 docs/language/learn-ql/python/control-flow-graph.rst diff --git a/docs/language/learn-ql/python/control-flow-graph.rst b/docs/language/learn-ql/python/control-flow-graph.rst deleted file mode 100644 index 099c252784b4..000000000000 --- a/docs/language/learn-ql/python/control-flow-graph.rst +++ /dev/null @@ -1,9 +0,0 @@ -Python control flow graph -========================= - -:doc:`Back to tutorial: control flow analysis ` - -|Python control flow graph| - -.. |Python control flow graph| image:: ../../images/python-flow-graph.png - diff --git a/docs/language/learn-ql/python/control-flow.rst b/docs/language/learn-ql/python/control-flow.rst index fc41f59c9332..bbabc8f32e26 100644 --- a/docs/language/learn-ql/python/control-flow.rst +++ b/docs/language/learn-ql/python/control-flow.rst @@ -19,7 +19,13 @@ To show why this complex relation is required consider the following Python code finally: close_resource() -There are many paths through the above code. There are three different paths through the call to ``close_resource();`` one normal path, one path that breaks out of the loop, and one path where an exception is raised by ``might_raise()``. (An annotated flow graph can be seen :doc:`here `.) +There are many paths through the above code. There are three different paths through the call to ``close_resource();`` one normal path, one path that breaks out of the loop, and one path where an exception is raised by ``might_raise()``. 
+ +An annotated flow graph: + +|Python control flow graph| + +.. |Python control flow graph| image:: ../../images/python-flow-graph.png The simplest use of the ``ControlFlowNode`` and ``AstNode`` classes is to find unreachable code. There is one ``ControlFlowNode`` per path through any ``AstNode`` and any ``AstNode`` that is unreachable has no paths flowing through it. Therefore, any ``AstNode`` without a corresponding ``ControlFlowNode`` is unreachable. From 38e40622f180615b62d1aee3aa1a26aba5b7c47d Mon Sep 17 00:00:00 2001 From: Felicity Chapman Date: Tue, 18 Feb 2020 12:03:51 +0000 Subject: [PATCH 03/14] Update topic titles and update map topic as discussed with JF and SP --- .../language/learn-ql/python/control-flow.rst | 6 ++-- docs/language/learn-ql/python/functions.rst | 4 ++- .../python/introduce-libraries-python.rst | 6 ++-- .../learn-ql/python/pointsto-type-infer.rst | 6 ++-- .../learn-ql/python/ql-for-python.rst | 30 ++----------------- .../python/statements-expressions.rst | 4 ++- .../learn-ql/python/taint-tracking.rst | 6 ++-- 7 files changed, 24 insertions(+), 38 deletions(-) diff --git a/docs/language/learn-ql/python/control-flow.rst b/docs/language/learn-ql/python/control-flow.rst index bbabc8f32e26..a8361e482220 100644 --- a/docs/language/learn-ql/python/control-flow.rst +++ b/docs/language/learn-ql/python/control-flow.rst @@ -1,5 +1,7 @@ -Tutorial: Control flow analysis -=============================== +Analyzing control flow in Python +================================ + +You can write CodeQL queries to explore the control flow graph of a Python program, for example, to discover unreachable code or mutually exclusive blocks of code. To analyze the `Control-flow graph `__ of a ``Scope`` we can use the two CodeQL classes ``ControlFlowNode`` and ``BasicBlock``. These classes allow you to ask such questions as "can you reach point A from point B?" or "Is it possible to reach point B *without* going through point A?". 
To report results we use the class ``AstNode``, which represents a syntactic element and corresponds to the source code - allowing the results of the query to be more easily understood. diff --git a/docs/language/learn-ql/python/functions.rst b/docs/language/learn-ql/python/functions.rst index c3c8a5e6eacf..a50c61f32419 100644 --- a/docs/language/learn-ql/python/functions.rst +++ b/docs/language/learn-ql/python/functions.rst @@ -1,6 +1,8 @@ -Tutorial: Functions +Functions in Python =================== +Functions are key building blocks of Python code bases. You can find functions and identify calls to them using syntactic classes from the standard CodeQL library. + This example uses the standard CodeQL class ``Function`` (see :doc:`Introducing the Python libraries `). Finding all functions called "get..." diff --git a/docs/language/learn-ql/python/introduce-libraries-python.rst b/docs/language/learn-ql/python/introduce-libraries-python.rst index 54276aedd8e8..bfe2429d4ddd 100644 --- a/docs/language/learn-ql/python/introduce-libraries-python.rst +++ b/docs/language/learn-ql/python/introduce-libraries-python.rst @@ -1,7 +1,7 @@ -Introducing the CodeQL libraries for Python -=========================================== +CodeQL library for Python +========================= -There is an extensive library for analyzing CodeQL databases extracted from Python projects. The classes in this library present the data from a database in an object-oriented form and provide abstractions and predicates to help you with common analysis tasks. The library is implemented as a set of QL modules, that is, files with the extension ``.qll``. The module ``python.qll`` imports all the core Python library modules, so you can include the complete library by beginning your query with: +Overview of the extensive library you use to analyze databases generated from Python code bases. This library uses classes with abstractions and predicates to present the data in an object-oriented form. 
This abstraction makes it easier for you to write queries. .. code-block:: ql diff --git a/docs/language/learn-ql/python/pointsto-type-infer.rst b/docs/language/learn-ql/python/pointsto-type-infer.rst index 7ae9368d02cb..b0e68cf4278f 100644 --- a/docs/language/learn-ql/python/pointsto-type-infer.rst +++ b/docs/language/learn-ql/python/pointsto-type-infer.rst @@ -1,5 +1,7 @@ -Tutorial: Points-to analysis and type inference -=============================================== +Pointer analysis and type inference in Python +============================================= + +At run time, each Python expression has a value with an associated type. You can learn how an expression behaves at run time using type-inference classes from the standard CodeQL library. This topic contains worked examples of how to write queries using the standard CodeQL library classes for Python type inference. diff --git a/docs/language/learn-ql/python/ql-for-python.rst b/docs/language/learn-ql/python/ql-for-python.rst index 6321f64eefb4..b4f47e8a70cf 100644 --- a/docs/language/learn-ql/python/ql-for-python.rst +++ b/docs/language/learn-ql/python/ql-for-python.rst @@ -1,9 +1,11 @@ CodeQL for Python ================= +Experiment and learn how to write effective and efficient queries for CodeQL databases generated from Python code bases. + .. toctree:: :glob: - :hidden: + :maxdepth: 2 introduce-libraries-python functions @@ -12,29 +14,3 @@ CodeQL for Python control-flow taint-tracking -Experiment and learn how to write effective and efficient queries for Python projects. - -:doc:`CodeQL libraries for Python ` ---------------------------------------------------------------- -Overview of the standard CodeQL libraries for writing CodeQL queries on Python code. - -:doc:`Functions in Python ` --------------------------------------- -Functions are key building blocks of Python code bases. You can find functions and identify calls to them using syntactic classes from the standard CodeQL library. 
- -:doc:`Expressions and statements in Python ` --------------------------------------------------------------------- -Expressions define a value. Statements represent a command or action. You can explore how they are used in a code base using syntactic classes from the standard CodeQL library. - -:doc:`Pointer analysis and type inference in Python ` --------------------------------------------------------------------------- -At run time, each Python expression has a value with an associated type. You can learn how an expression behaves at run time using type-inference classes from the standard CodeQL library. - -:doc:`Analyzing control flow in Python ` ------------------------------------------------------- -You can write CodeQL queries to explore the control flow graph of a Python program, for example, to discover unreachable code or mutually exclusive blocks of code. - -:doc:`Analyzing data flow and tracking tainted data in Python ` -------------------------------------------------------------------------------- -You can use CodeQL to track the flow of data through a Python program to its use. Tracking user-controlled, or tainted, data is a key technique for security researchers. - diff --git a/docs/language/learn-ql/python/statements-expressions.rst b/docs/language/learn-ql/python/statements-expressions.rst index d3b4e68af6c9..0d8667cf491e 100644 --- a/docs/language/learn-ql/python/statements-expressions.rst +++ b/docs/language/learn-ql/python/statements-expressions.rst @@ -1,6 +1,8 @@ -Tutorial: Statements and expressions +Expressions and statements in Python ==================================== +Expressions define a value. Statements represent a command or action. You can explore how they are used in a code base using syntactic classes from the standard CodeQL library. 
+ Statements ---------- diff --git a/docs/language/learn-ql/python/taint-tracking.rst b/docs/language/learn-ql/python/taint-tracking.rst index 2ea24369bf40..3982f2d6bb4f 100644 --- a/docs/language/learn-ql/python/taint-tracking.rst +++ b/docs/language/learn-ql/python/taint-tracking.rst @@ -1,5 +1,7 @@ -Taint tracking and data flow analysis in Python -=============================================== +Analyzing data flow and tracking tainted data in Python +======================================================= + +You can use CodeQL to track the flow of data through a Python program to its use. Tracking user-controlled, or tainted, data is a key technique for security researchers. Overview -------- From 8ab4cebc9b1c78b5c6eb8c289e1c53150a009b4f Mon Sep 17 00:00:00 2001 From: Felicity Chapman Date: Tue, 18 Feb 2020 12:16:33 +0000 Subject: [PATCH 04/14] Add reusable for other resources and make 'Further reading' section --- docs/language/learn-ql/python/control-flow.rst | 8 ++++---- docs/language/learn-ql/python/functions.rst | 13 ++++++++----- .../learn-ql/python/introduce-libraries-python.rst | 14 +++++++++----- .../learn-ql/python/pointsto-type-infer.rst | 9 +++++---- .../learn-ql/python/statements-expressions.rst | 13 ++++++++----- docs/language/learn-ql/python/taint-tracking.rst | 10 ++++++---- docs/language/reusables/python-other-resources.rst | 3 +++ 7 files changed, 43 insertions(+), 27 deletions(-) create mode 100644 docs/language/reusables/python-other-resources.rst diff --git a/docs/language/learn-ql/python/control-flow.rst b/docs/language/learn-ql/python/control-flow.rst index a8361e482220..9405e8cacc24 100644 --- a/docs/language/learn-ql/python/control-flow.rst +++ b/docs/language/learn-ql/python/control-flow.rst @@ -108,8 +108,8 @@ Combining these conditions we get: ➤ `See this in the query console `__. This typically gives a very large number of results, because it is a common occurrence in normal control flow. 
It is, however, an example of the sort of control-flow analysis that is possible. Control-flow analyses such as this are an important aid to data flow analysis which is covered in the next tutorial. -What next? ----------- +Further reading +--------------- -- Experiment with the worked examples in the tutorial topic :doc:`Taint tracking and data flow analysis in Python `. -- Find out more about QL in the `QL language handbook `__ and `QL language specification `__. +- ":doc:`Analyzing data flow and tracking tainted data in Python `" +.. include:: ../../reusables/python-other-resources.rst diff --git a/docs/language/learn-ql/python/functions.rst b/docs/language/learn-ql/python/functions.rst index a50c61f32419..6e7b5aa9a4d9 100644 --- a/docs/language/learn-ql/python/functions.rst +++ b/docs/language/learn-ql/python/functions.rst @@ -78,8 +78,11 @@ The ``Call`` class represents calls in Python. The ``Call.getFunc()`` predicate Due to the dynamic nature of Python, this query will select any call of the form ``eval(...)`` regardless of whether it is a call to the built-in function ``eval`` or not. In a later tutorial we will see how to use the type-inference library to find calls to the built-in function ``eval`` regardless of name of the variable called. -What next? ----------- - -- Experiment with the worked examples in the following tutorial topics: :doc:`Statements and expressions `, :doc:`Control flow `, and :doc:`Points-to analysis and type inference `. -- Find out more about QL in the `QL language handbook `__ and `QL language specification `__. +Further reading +--------------- + +- ":doc:`Expressions and statements in Python `" +- ":doc:`Pointer analysis and type inference in Python `" +- ":doc:`Analyzing control flow in Python `" +- ":doc:`Analyzing data flow and tracking tainted data in Python `" +.. 
include:: ../../reusables/python-other-resources.rst diff --git a/docs/language/learn-ql/python/introduce-libraries-python.rst b/docs/language/learn-ql/python/introduce-libraries-python.rst index bfe2429d4ddd..8526db5ce69a 100644 --- a/docs/language/learn-ql/python/introduce-libraries-python.rst +++ b/docs/language/learn-ql/python/introduce-libraries-python.rst @@ -329,8 +329,12 @@ Summary These classes are explained in more detail in :doc:`Tutorial: Taint tracking and data flow analysis in Python `. -What next? ----------- - -- Experiment with the worked examples in the following tutorial topics: :doc:`Functions `, :doc:`Statements and expressions `, :doc:`Control flow `, :doc:`Points-to analysis and type inference `, and :doc:`Taint tracking and data flow analysis in Python `. -- Find out more about QL in the `QL language handbook `__ and `QL language specification `__. +Further reading +--------------- + +- ":doc:`Functions in Python `" +- ":doc:`Expressions and statements in Python `" +- ":doc:`Pointer analysis and type inference in Python `" +- ":doc:`Analyzing control flow in Python `" +- ":doc:`Analyzing data flow and tracking tainted data in Python `" +.. include:: ../../reusables/python-other-resources.rst diff --git a/docs/language/learn-ql/python/pointsto-type-infer.rst b/docs/language/learn-ql/python/pointsto-type-infer.rst index b0e68cf4278f..4397ac7ddceb 100644 --- a/docs/language/learn-ql/python/pointsto-type-infer.rst +++ b/docs/language/learn-ql/python/pointsto-type-infer.rst @@ -227,8 +227,9 @@ Then we can use ``Value.getACall()`` to identify calls to the ``eval`` function, ➤ `See this in the query console `__. This accurately identifies calls to the builtin ``eval`` function even when they are referred to using an alternative name. Any false positive results with calls to other ``eval`` functions, reported by the original query, have been eliminated. -What next? 
----------- +Further reading +--------------- -- Find out more about QL in the `QL language handbook `__ and `QL language specification `__. -- Read a description of the CodeQL database in :doc:`What's in a CodeQL database? <../database>` +- ":doc:`Analyzing control flow in Python `" +- ":doc:`Analyzing data flow and tracking tainted data in Python `" +.. include:: ../../reusables/python-other-resources.rst diff --git a/docs/language/learn-ql/python/statements-expressions.rst b/docs/language/learn-ql/python/statements-expressions.rst index 0d8667cf491e..ff59aec4e7b7 100644 --- a/docs/language/learn-ql/python/statements-expressions.rst +++ b/docs/language/learn-ql/python/statements-expressions.rst @@ -273,8 +273,11 @@ Here is the relevant part of the class hierarchy: - ``Class`` - ``Function`` -What next? ----------- - -- Experiment with the worked examples in the following tutorial topics: :doc:`Control flow ` and :doc:`Points-to analysis and type inference `. -- Find out more about QL in the `QL language handbook `__ and `QL language specification `__. +Further reading +--------------- + +- ":doc:`Functions in Python `" +- ":doc:`Pointer analysis and type inference in Python `" +- ":doc:`Analyzing control flow in Python `" +- ":doc:`Analyzing data flow and tracking tainted data in Python `" +.. include:: ../../reusables/python-other-resources.rst diff --git a/docs/language/learn-ql/python/taint-tracking.rst b/docs/language/learn-ql/python/taint-tracking.rst index 3982f2d6bb4f..f759d16debe2 100644 --- a/docs/language/learn-ql/python/taint-tracking.rst +++ b/docs/language/learn-ql/python/taint-tracking.rst @@ -253,8 +253,10 @@ which defines the simplest possible taint kind class, ``HardcodedValue``, and cu } } -What next? ----------- +Further reading +--------------- -- Experiment with the worked examples in the following tutorial topics: :doc:`Control flow ` and :doc:`Points-to analysis and type inference `. 
-- Find out more about QL in the `QL language handbook `__ and `QL language specification `__. +- ":doc:`Pointer analysis and type inference in Python `" +- ":doc:`Analyzing control flow in Python `" +- ":doc:`Analyzing data flow and tracking tainted data in Python `" +.. include:: ../../reusables/python-other-resources.rst diff --git a/docs/language/reusables/python-other-resources.rst b/docs/language/reusables/python-other-resources.rst new file mode 100644 index 000000000000..9668db06d6d2 --- /dev/null +++ b/docs/language/reusables/python-other-resources.rst @@ -0,0 +1,3 @@ +- "`QL language handbook `__" +- `Python cookbook queries `__ in the Semmle wiki +- `Python queries in action `__ on LGTM.com From 8a44f51fc5ed8bf727319347ee833360f24602c0 Mon Sep 17 00:00:00 2001 From: Felicity Chapman Date: Tue, 18 Feb 2020 13:18:02 +0000 Subject: [PATCH 05/14] Bring headings more into line with content models --- .../language/learn-ql/python/control-flow.rst | 13 +++-- docs/language/learn-ql/python/functions.rst | 2 +- .../python/introduce-libraries-python.rst | 44 +++++++++------- .../learn-ql/python/pointsto-type-infer.rst | 3 +- .../python/statements-expressions.rst | 52 ++++++------------- .../learn-ql/python/taint-tracking.rst | 23 ++++---- 6 files changed, 66 insertions(+), 71 deletions(-) diff --git a/docs/language/learn-ql/python/control-flow.rst b/docs/language/learn-ql/python/control-flow.rst index 9405e8cacc24..ae0328fff44d 100644 --- a/docs/language/learn-ql/python/control-flow.rst +++ b/docs/language/learn-ql/python/control-flow.rst @@ -31,7 +31,8 @@ An annotated flow graph: The simplest use of the ``ControlFlowNode`` and ``AstNode`` classes is to find unreachable code. There is one ``ControlFlowNode`` per path through any ``AstNode`` and any ``AstNode`` that is unreachable has no paths flowing through it. Therefore, any ``AstNode`` without a corresponding ``ControlFlowNode`` is unreachable. 
-**Unreachable AST nodes** +Example finding unreachable AST nodes +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. code-block:: ql @@ -43,7 +44,8 @@ The simplest use of the ``ControlFlowNode`` and ``AstNode`` classes is to find u ➤ `See this in the query console `__. The demo projects on LGTM.com all have some code that has no control flow node, and is therefore unreachable. However, since the ``Module`` class is also a subclass of the ``AstNode`` class, the query also finds any modules implemented in C or with no source code. Therefore, it is better to find all unreachable statements: -**Unreachable statements** +Example finding unreachable statements +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. code-block:: ql @@ -60,8 +62,8 @@ The ``BasicBlock`` class The ``BasicBlock`` class represents a `basic block `__ of control flow nodes. The ``BasicBlock`` class is not that useful for writing queries directly, but is very useful for building complex analyses, such as data flow. The reason it is useful is that it shares many of the interesting properties of control flow nodes, such as what can reach what and what `dominates `__ what, but there are fewer basic blocks than control flow nodes - resulting in queries that are faster and use less memory. -Example: Finding mutually exclusive basic blocks -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Example finding mutually exclusive basic blocks +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Suppose we have the following Python code: @@ -92,7 +94,8 @@ However, by that definition, two basic blocks are mutually exclusive if they are Combining these conditions we get: -**Mutually exclusive blocks within the same function** +Example finding mutually exclusive blocks within the same function +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. 
code-block:: ql diff --git a/docs/language/learn-ql/python/functions.rst b/docs/language/learn-ql/python/functions.rst index 6e7b5aa9a4d9..f9c436232634 100644 --- a/docs/language/learn-ql/python/functions.rst +++ b/docs/language/learn-ql/python/functions.rst @@ -3,7 +3,7 @@ Functions in Python Functions are key building blocks of Python code bases. You can find functions and identify calls to them using syntactic classes from the standard CodeQL library. -This example uses the standard CodeQL class ``Function`` (see :doc:`Introducing the Python libraries `). +These examples use the standard CodeQL class `Function `__. For more information, see :doc:`Introducing the Python libraries `. Finding all functions called "get..." ------------------------------------- diff --git a/docs/language/learn-ql/python/introduce-libraries-python.rst b/docs/language/learn-ql/python/introduce-libraries-python.rst index 8526db5ce69a..624814705c72 100644 --- a/docs/language/learn-ql/python/introduce-libraries-python.rst +++ b/docs/language/learn-ql/python/introduce-libraries-python.rst @@ -3,14 +3,14 @@ CodeQL library for Python Overview of the extensive library you use to analyze databases generated from Python code bases. This library uses classes with abstractions and predicates to present the data in an object-oriented form. This abstraction makes it easier for you to write queries. -.. code-block:: ql +About the CodeQL library for Python +----------------------------------- - import python +The CodeQL library for each programming language is implemented as a set of QL modules, that is, files with the extension ``.qll``. The module ``python.qll`` imports all the core Python library modules, so you can include the complete library by beginning your query with: -The rest of this tutorial summarizes the contents of the standard libraries for Python. We recommend that you read this and then work through the practical examples in the tutorials shown at the end of the page. +.. 
code-block:: ql -Overview of the library ------------------------ + import python The CodeQL library for Python incorporates a large number of classes. Each class corresponds either to one kind of entity in Python source code or to an entity that can be derived from the source code using static analysis. These classes can be divided into four categories: @@ -20,16 +20,16 @@ The CodeQL library for Python incorporates a large number of classes. Each class - **Taint tracking** - classes that represent the source, sinks and kinds of taint used to implement taint-tracking queries. Syntactic classes -~~~~~~~~~~~~~~~~~ +----------------- -This part of the library represents the Python source code. The ``Module``, ``Class``, and ``Function`` classes correspond to Python modules, classes, and functions respectively, collectively these are known as ``Scope`` classes. Each ``Scope`` contains a list of statements each of which is represented by a subclass of the class ``Stmt``. Statements themselves can contain other statements or expressions which are represented by subclasses of ``Expr``. Finally, there are a few additional classes for the parts of more complex expressions such as list comprehensions. Collectively these classes are subclasses of ``AstNode`` and form an `Abstract syntax tree `__ (AST). The root of each AST is a ``Module``. +This part of the library represents the Python source code. The ``Module``, ``Class``, and ``Function`` classes correspond to Python modules, classes, and functions respectively, collectively these are known as ``Scope`` classes. Each ``Scope`` contains a list of statements each of which is represented by a subclass of the class ``Stmt``. Statements themselves can contain other statements or expressions which are represented by subclasses of ``Expr``. Finally, there are a few additional classes for the parts of more complex expressions such as list comprehensions. 
Collectively these classes are subclasses of ``AstNode`` and form an Abstract syntax tree (AST). The root of each AST is a ``Module``. For more information, see `Abstract syntax tree `__. -`Symbolic information `__ is attached to the AST in the form of variables (represented by the class ``Variable``). +Symbolic information is attached to the AST in the form of variables (represented by the class ``Variable``). For more information, see `Symbolic information `__. Scope ^^^^^ -A Python program is a group of modules. Technically a module is just a list of statements, but we often think of it as composed of classes and functions. These top-level entities, the module, class, and function are represented by the three CodeQL classes (`Module `__, `Class `__ and `Function `__ which are all subclasses of ``Scope``. +A Python program is a group of modules. Technically a module is just a list of statements, but we often think of it as composed of classes and functions. These top-level entities, the module, class, and function are represented by the three CodeQL classes (`Module `__, `Class `__ and `Function `__ which are all subclasses of ``Scope``). - ``Scope`` @@ -153,8 +153,8 @@ Both forms are equivalent. Using the positive expression, the whole query looks ➤ `See this in the query console `__. Many projects include pass-only ``except`` blocks. -Summary -^^^^^^^ +Summary of syntactic classes +^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The most commonly used standard classes in the syntactic part of the library are organized as follows: @@ -237,11 +237,14 @@ Other - ``Comment`` – A comment Control flow classes -~~~~~~~~~~~~~~~~~~~~ +-------------------- This part of the library represents the control flow graph of each ``Scope`` (classes, functions, and modules). Each ``Scope`` contains a graph of ``ControlFlowNode`` elements. Each scope has a single entry point and at least one (potentially many) exit points. 
To speed up control and data flow analysis, control flow nodes are grouped into `basic blocks `__. -As an example, we might want to find the longest sequence of code without any branches. A ``BasicBlock`` is, by definition, a sequence of code without any branches, so we just need to find the longest ``BasicBlock``. +Example +^^^^^^^ + +If we want to find the longest sequence of code without any branches, we need to consider control flow. A ``BasicBlock`` is, by definition, a sequence of code without any branches, so we just need to find the longest ``BasicBlock``. First of all we introduce a simple predicate ``bb_length()`` which relates ``BasicBlock``\ s to their length. @@ -289,7 +292,12 @@ The classes in the control-flow part of the library are: Type-inference classes ---------------------- -The CodeQL library for Python also supplies some classes for accessing the inferred types of values. The classes ``Value`` and ``ClassValue`` allow you to query the possible classes that an expression may have at runtime. For example, which ``ClassValue``\ s are iterable can be determined using the query: +The CodeQL library for Python also supplies some classes for accessing the inferred types of values. The classes ``Value`` and ``ClassValue`` allow you to query the possible classes that an expression may have at runtime. + +Example +^^^^^^^ + +For example, which ``ClassValue``\ s are iterable can be determined using the query: **Find iterable "ClassValue"s** @@ -304,7 +312,7 @@ The CodeQL library for Python also supplies some classes for accessing the infer ➤ `See this in the query console `__ This query returns a list of classes for the projects analyzed. If you want to include the results for `builtin classes `__, which do not have any Python source code, show the non-source results. 
Summary -~~~~~~~ +^^^^^^^ - `Value `__ @@ -312,7 +320,7 @@ Summary - ``CallableValue`` - ``ModuleValue`` -These classes are explained in more detail in :doc:`Tutorial: Points-to analysis and type inference `. +For more information about these classes, see :doc:`Pointer analysis and type inference in Python `. Taint-tracking classes ---------------------- @@ -321,12 +329,12 @@ The CodeQL library for Python also supplies classes to specify taint-tracking an Summary -~~~~~~~ +^^^^^^^ - `TaintKind `__ - `Configuration `__ -These classes are explained in more detail in :doc:`Tutorial: Taint tracking and data flow analysis in Python `. +For more information about these classes, see :doc:`Analyzing data flow and tracking tainted data in Python `. Further reading diff --git a/docs/language/learn-ql/python/pointsto-type-infer.rst b/docs/language/learn-ql/python/pointsto-type-infer.rst index 4397ac7ddceb..e0fbdac403cf 100644 --- a/docs/language/learn-ql/python/pointsto-type-infer.rst +++ b/docs/language/learn-ql/python/pointsto-type-infer.rst @@ -3,6 +3,7 @@ Pointer analysis and type inference in Python At run time, each Python expression has a value with an associated type. You can learn how an expression behaves at run time using type-inference classes from the standard CodeQL library. + This topic contains worked examples of how to write queries using the standard CodeQL library classes for Python type inference. The ``Value`` class @@ -11,7 +12,7 @@ The ``Value`` class The ``Value`` class and its subclasses ``FunctionValue``, ``ClassValue``, and ``ModuleValue`` represent the values an expression may hold at runtime. 
Summary -~~~~~~~ +^^^^^^^ Class hierarchy for ``Value``: diff --git a/docs/language/learn-ql/python/statements-expressions.rst b/docs/language/learn-ql/python/statements-expressions.rst index ff59aec4e7b7..2a7a57c33b5a 100644 --- a/docs/language/learn-ql/python/statements-expressions.rst +++ b/docs/language/learn-ql/python/statements-expressions.rst @@ -39,13 +39,11 @@ Here is the full class hierarchy: - ``While`` – A ``while`` statement - ``With`` – A ``with`` statement -Example: Finding redundant 'global' statements -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Example finding redundant 'global' statements +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The ``global`` statement in Python declares a variable with a global (module-level) scope, when it would otherwise be local. Using the ``global`` statement outside a class or function is redundant as the variable is already global. -**Finding redundant global statements** - .. code-block:: ql import python @@ -58,13 +56,11 @@ The ``global`` statement in Python declares a variable with a global (module-lev The line: ``g.getScope() instanceof Module`` ensures that the ``Scope`` of ``Global g`` is a ``Module``, rather than a class or function. -Example: Finding 'if' statements with redundant branches --------------------------------------------------------- +Example finding 'if' statements with redundant branches +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ An ``if`` statement where one branch is composed of just ``pass`` statements could be simplified by negating the condition and dropping the ``else`` clause. -**An 'if' statement that could be simplified** - .. code-block:: python if cond(): @@ -72,9 +68,7 @@ An ``if`` statement where one branch is composed of just ``pass`` statements cou else: do_something -To find statements like this we can run the following query: - -**Find 'if' statements with empty branches** +To find statements like this that could be simplified we can write a query. .. 
code-block:: ql @@ -133,8 +127,8 @@ Each kind of Python expression has its own class. Here is the full class hierarc - ``Yield`` – A ``yield`` expression - ``YieldFrom`` – A ``yield from`` expression (Python 3.3+) -Example: Finding comparisons to integer or string literals using 'is' -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Example finding comparisons to integer or string literals using 'is' +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Python implementations commonly cache small integers and single character strings, which means that comparisons such as the following often work correctly, but this is not guaranteed and we might want to check for them. @@ -143,9 +137,7 @@ Python implementations commonly cache small integers and single character string x is 10 x is "A" -We can check for these as follows: - -**Find comparisons to integer or string literals using** ``is`` +We can check for these using a query. .. code-block:: ql @@ -166,15 +158,11 @@ The clause ``cmp.getOp(0) instanceof Is and cmp.getComparator(0) = literal`` che We have to use ``cmp.getOp(0)`` and ``cmp.getComparator(0)``\ as there is no ``cmp.getOp()`` or ``cmp.getComparator()``. The reason for this is that a ``Compare`` expression can have multiple operators. For example, the expression ``3 < x < 7`` has two operators and two comparators. You use ``cmp.getComparator(0)`` to get the first comparator (in this example the ``3``) and ``cmp.getComparator(1)`` to get the second comparator (in this example the ``7``). -Example: Duplicates in dictionary literals -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Example finding duplicates in dictionary literals +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ If there are duplicate keys in a Python dictionary, then the second key will overwrite the first, which is almost certainly a mistake. 
We can find these duplicates with CodeQL, but the query is more complex than previous examples and will require us to write a ``predicate`` as a helper. -Here is the query: - -**Find duplicate dictionary keys** - .. code-block:: ql import python @@ -206,12 +194,10 @@ is equivalent to The short version is usually used as this is easier to read. -Example: Finding Java-style getters -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -Returning to the example from :doc:`Tutorial: Functions `, the query identified all methods with a single line of code and a name starting with ``get``: +Example finding Java-style getters +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -**Basic: Find Java-style getters** +Returning to the example from :doc:`Tutorial: Functions `, the query identified all methods with a single line of code and a name starting with ``get``. .. code-block:: ql @@ -222,9 +208,7 @@ Returning to the example from :doc:`Tutorial: Functions `, the query and count(f.getAStmt()) = 1 select f, "This function is (probably) a getter." -This basic query can be improved by checking that the one line of code is of the form ``return self.attr`` - -**Improved: Find Java-style getters** +This basic query can be improved by checking that the one line of code is a Java-style getter of the form ``return self.attr``. .. code-block:: ql @@ -238,21 +222,17 @@ This basic query can be improved by checking that the one line of code is of the ➤ `See this in the query console `__. Of the demo projects on LGTM.com, only the *openstack/nova* project has examples of functions that appear to be Java-style getters. -In this query, the condition: - .. code-block:: ql ret = f.getStmt(0) and ret.getValue() = attr -checks that the first line in the method is a return statement and that the expression returned (``ret.getValue()``) is an ``Attribute`` expression. Note that the equality ``ret.getValue() = attr`` means that ``ret.getValue()`` is restricted to ``Attribute``\ s, since ``attr`` is an ``Attribute``. 
- -The condition: +This condition checks that the first line in the method is a return statement and that the expression returned (``ret.getValue()``) is an ``Attribute`` expression. Note that the equality ``ret.getValue() = attr`` means that ``ret.getValue()`` is restricted to ``Attribute``\ s, since ``attr`` is an ``Attribute``. .. code-block:: ql attr.getObject() = self and self.getId() = "self" -checks that the value of the attribute (the expression to the left of the dot in ``value.attr``) is an access to a variable called ``"self"``. +This condition checks that the value of the attribute (the expression to the left of the dot in ``value.attr``) is an access to a variable called ``"self"``. Class and function definitions ------------------------------ diff --git a/docs/language/learn-ql/python/taint-tracking.rst b/docs/language/learn-ql/python/taint-tracking.rst index f759d16debe2..90cb20b55179 100644 --- a/docs/language/learn-ql/python/taint-tracking.rst +++ b/docs/language/learn-ql/python/taint-tracking.rst @@ -3,8 +3,8 @@ Analyzing data flow and tracking tainted data in Python You can use CodeQL to track the flow of data through a Python program to its use. Tracking user-controlled, or tainted, data is a key technique for security researchers. -Overview --------- +About data flow and taint tracking +---------------------------------- Taint tracking is used to analyze how potentially insecure, or 'tainted' data flows throughout a program at runtime. You can use taint tracking to find out whether user-controlled input can be used in a malicious way, @@ -16,12 +16,12 @@ For example, in the assignment ``dir = path + "/"``, if ``path`` is tainted then even though there is no data flow from ``path`` to ``path + "/"``. Separate CodeQL libraries have been written to handle 'normal' data flow and taint tracking in :doc:`C/C++ <../cpp/dataflow>`, :doc:`C# <../csharp/dataflow>`, :doc:`Java <../java/dataflow>`, and :doc:`JavaScript <../javascript/dataflow>`. 
You can access the appropriate classes and predicates that reason about these different modes of data flow by importing the appropriate library in your query. -In Python analysis, we can use the same taint tracking library to model both 'normal' data flow and taint flow, but we are still able make the distinction between steps that preserve value and those that don't by defining additional data flow properties. +In Python analysis, we can use the same taint tracking library to model both 'normal' data flow and taint flow, but we are still able to make the distinction between steps that preserve values and those that don't by defining additional data flow properties. For further information on data flow and taint tracking with CodeQL, see :doc:`Introduction to data flow <../intro-to-data-flow>`. -Fundamentals of taint tracking and data flow analysis -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Fundamentals of taint tracking using data flow analysis +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The taint tracking library is in the `TaintTracking `__ module. Any taint tracking or data flow analysis query has three explicit components, one of which is optional, and an implicit component. @@ -41,7 +41,7 @@ The kind of taint determines which non-value-preserving steps are possible, in a In the above example ``dir = path + "/"``, taint flows from ``path`` to ``dir`` if the taint represents a string, but not if the taint is ``None``. Limitations -~~~~~~~~~~~ +^^^^^^^^^^^ Although taint tracking is a powerful technique, it is worth noting that it depends on the underlying data flow graphs.
Creating a data flow graph that is both accurate and covers a large enough part of a program is a challenge, @@ -81,6 +81,9 @@ A simple taint tracking query has the basic form: where config.hasFlow(src, sink) select sink, "Alert message, including reference to $@.", src, "string describing the source" +Example +^^^^^^^ + As a contrived example, here is a query that looks for flow from a HTTP request to a function called ``"unsafe"``. The sources are predefined and accessed by importing library ``semmle.python.web.HttpRequest``. The sink is defined by using a custom ``TaintTracking::Sink`` class. @@ -128,8 +131,8 @@ The sink is defined by using a custom ``TaintTracking::Sink`` class. -Implementing path queries -~~~~~~~~~~~~~~~~~~~~~~~~~ +Converting a taint-tracking query to a path query +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Although the taint tracking query above tells which sources flow to which sinks, it doesn't tell us how. For that we need a path query. @@ -204,8 +207,8 @@ Thus, our example query becomes: -Custom taint kinds and flows ----------------------------- +Tracking custom taint kinds and flows +------------------------------------- In the above examples, we have assumed the existence of a suitable ``TaintKind``, but sometimes it is necessary to model the flow of other objects, such as database connections, or ``None``. 
From 74d93ba70436e27588ab0955d16232084b18bc72 Mon Sep 17 00:00:00 2001 From: Felicity Chapman Date: Tue, 18 Feb 2020 16:05:01 +0000 Subject: [PATCH 06/14] Tidy up some references --- docs/language/learn-ql/python/control-flow.rst | 6 +++--- docs/language/learn-ql/python/functions.rst | 2 +- .../learn-ql/python/introduce-libraries-python.rst | 12 +++++------- .../language/learn-ql/python/pointsto-type-infer.rst | 8 +++----- .../learn-ql/python/statements-expressions.rst | 4 ++-- docs/language/learn-ql/python/taint-tracking.rst | 6 +++--- 6 files changed, 17 insertions(+), 21 deletions(-) diff --git a/docs/language/learn-ql/python/control-flow.rst b/docs/language/learn-ql/python/control-flow.rst index ae0328fff44d..ec2d602d492e 100644 --- a/docs/language/learn-ql/python/control-flow.rst +++ b/docs/language/learn-ql/python/control-flow.rst @@ -3,7 +3,7 @@ Analyzing control flow in Python You can write CodeQL queries to explore the control flow graph of a Python program, for example, to discover unreachable code or mutually exclusive blocks of code. -To analyze the `Control-flow graph `__ of a ``Scope`` we can use the two CodeQL classes ``ControlFlowNode`` and ``BasicBlock``. These classes allow you to ask such questions as "can you reach point A from point B?" or "Is it possible to reach point B *without* going through point A?". To report results we use the class ``AstNode``, which represents a syntactic element and corresponds to the source code - allowing the results of the query to be more easily understood. +To analyze the control-flow graph of a ``Scope`` we can use the two CodeQL classes ``ControlFlowNode`` and ``BasicBlock``. These classes allow you to ask such questions as "can you reach point A from point B?" or "Is it possible to reach point B *without* going through point A?". 
To report results we use the class ``AstNode``, which represents a syntactic element and corresponds to the source code - allowing the results of the query to be more easily understood. For more information, see `Control-flow graph `__ in Wikipedia. The ``ControlFlowNode`` class ----------------------------- @@ -55,12 +55,12 @@ Example finding unreachable statements where not exists(s.getAFlowNode()) select s -➤ `See this in the query console `__. This query gives fewer results, but most of the projects have some unreachable nodes. These are also highlighted by the standard query: `Unreachable code `__. +➤ `See this in the query console `__. This query gives fewer results, but most of the projects have some unreachable nodes. These are also highlighted by the standard query: unreachable code. For more information, see `Unreachable code `__ on LGTM.com. The ``BasicBlock`` class ------------------------ -The ``BasicBlock`` class represents a `basic block `__ of control flow nodes. The ``BasicBlock`` class is not that useful for writing queries directly, but is very useful for building complex analyses, such as data flow. The reason it is useful is that it shares many of the interesting properties of control flow nodes, such as what can reach what and what `dominates `__ what, but there are fewer basic blocks than control flow nodes - resulting in queries that are faster and use less memory. +The ``BasicBlock`` class represents a basic block of control flow nodes. The ``BasicBlock`` class is not that useful for writing queries directly, but is very useful for building complex analyses, such as data flow. The reason it is useful is that it shares many of the interesting properties of control flow nodes, such as, what can reach what, and what dominates what, but there are fewer basic blocks than control flow nodes - resulting in queries that are faster and use less memory. For more information, see `basic block `__ and `dominates `__ on Wikipedia. 
Example finding mutually exclusive basic blocks ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ diff --git a/docs/language/learn-ql/python/functions.rst b/docs/language/learn-ql/python/functions.rst index f9c436232634..2924a77f910a 100644 --- a/docs/language/learn-ql/python/functions.rst +++ b/docs/language/learn-ql/python/functions.rst @@ -57,7 +57,7 @@ We can modify the query further to include only methods whose body consists of a and count(f.getAStmt()) = 1 select f, "This function is (probably) a getter." -➤ `See this in the query console `__. This query returns fewer results, but if you examine the results you can see that there are still refinements to be made. This is refined further in :doc:`Tutorial: Statements and expressions `. +➤ `See this in the query console `__. This query returns fewer results, but if you examine the results you can see that there are still refinements to be made. This is refined further in ":doc:`Tutorial: Statements and expressions `." Finding a call to a specific function ------------------------------------- diff --git a/docs/language/learn-ql/python/introduce-libraries-python.rst b/docs/language/learn-ql/python/introduce-libraries-python.rst index 624814705c72..e29e25edb4b0 100644 --- a/docs/language/learn-ql/python/introduce-libraries-python.rst +++ b/docs/language/learn-ql/python/introduce-libraries-python.rst @@ -22,9 +22,7 @@ The CodeQL library for Python incorporates a large number of classes. Each class Syntactic classes ----------------- -This part of the library represents the Python source code. The ``Module``, ``Class``, and ``Function`` classes correspond to Python modules, classes, and functions respectively, collectively these are known as ``Scope`` classes. Each ``Scope`` contains a list of statements each of which is represented by a subclass of the class ``Stmt``. Statements themselves can contain other statements or expressions which are represented by subclasses of ``Expr``. 
Finally, there are a few additional classes for the parts of more complex expressions such as list comprehensions. Collectively these classes are subclasses of ``AstNode`` and form an Abstract syntax tree (AST). The root of each AST is a ``Module``. For more information, see `Abstract syntax tree `__. - -Symbolic information is attached to the AST in the form of variables (represented by the class ``Variable``). For more information, see `Symbolic information `__. +This part of the library represents the Python source code. The ``Module``, ``Class``, and ``Function`` classes correspond to Python modules, classes, and functions respectively, collectively these are known as ``Scope`` classes. Each ``Scope`` contains a list of statements each of which is represented by a subclass of the class ``Stmt``. Statements themselves can contain other statements or expressions which are represented by subclasses of ``Expr``. Finally, there are a few additional classes for the parts of more complex expressions such as list comprehensions. Collectively these classes are subclasses of ``AstNode`` and form an Abstract syntax tree (AST). The root of each AST is a ``Module``. Symbolic information is attached to the AST in the form of variables (represented by the class ``Variable``). For more information, see `Abstract syntax tree `__ and `Symbolic information `__ in Wikipedia. Scope ^^^^^ @@ -239,7 +237,7 @@ Other Control flow classes -------------------- -This part of the library represents the control flow graph of each ``Scope`` (classes, functions, and modules). Each ``Scope`` contains a graph of ``ControlFlowNode`` elements. Each scope has a single entry point and at least one (potentially many) exit points. To speed up control and data flow analysis, control flow nodes are grouped into `basic blocks `__. +This part of the library represents the control flow graph of each ``Scope`` (classes, functions, and modules). 
Each ``Scope`` contains a graph of ``ControlFlowNode`` elements. Each scope has a single entry point and at least one (potentially many) exit points. To speed up control and data flow analysis, control flow nodes are grouped into basic blocks. For more information, see `basic blocks `__ in Wikipedia. Example ^^^^^^^ @@ -309,7 +307,7 @@ For example, which ``ClassValue``\ s are iterable can be determined using the qu where cls.hasAttribute("__iter__") select cls -➤ `See this in the query console `__ This query returns a list of classes for the projects analyzed. If you want to include the results for `builtin classes `__, which do not have any Python source code, show the non-source results. +➤ `See this in the query console `__ This query returns a list of classes for the projects analyzed. If you want to include the results for ``builtin`` classes, which do not have any Python source code, show the non-source results. For more information, see `builtin classes `__ in the Python documentation. Summary ^^^^^^^ @@ -320,7 +318,7 @@ Summary - ``CallableValue`` - ``ModuleValue`` -For more information about these classes, see :doc:`Pointer analysis and type inference in Python `. +For more information about these classes, see ":doc:`Pointer analysis and type inference in Python `." Taint-tracking classes ---------------------- @@ -334,7 +332,7 @@ Summary - `TaintKind `__ - `Configuration `__ -For more information about these classes, see :doc:`Analyzing data flow and tracking tainted data in Python `. +For more information about these classes, see ":doc:`Analyzing data flow and tracking tainted data in Python `". 
Further reading diff --git a/docs/language/learn-ql/python/pointsto-type-infer.rst b/docs/language/learn-ql/python/pointsto-type-infer.rst index e0fbdac403cf..ddee31da7c0f 100644 --- a/docs/language/learn-ql/python/pointsto-type-infer.rst +++ b/docs/language/learn-ql/python/pointsto-type-infer.rst @@ -25,9 +25,7 @@ Class hierarchy for ``Value``: Points-to analysis and type inference ------------------------------------- -Points-to analysis, sometimes known as `pointer analysis `__, allows us to determine which objects an expression may "point to" at runtime. - -`Type inference `__ allows us to infer what the types (classes) of an expression may be at runtime. +Points-to analysis, sometimes known as pointer analysis, allows us to determine which objects an expression may "point to" at runtime. Type inference allows us to infer what the types (classes) of an expression may be at runtime. For more information, see `pointer analysis `__ and `Type inference `__ on Wikipedia. The predicate ``ControlFlowNode.pointsTo(...)`` shows which object a control flow node may "point to" at runtime. @@ -126,7 +124,7 @@ Combining the parts of the query we get this: ) select t, ex1, ex2 -➤ `See this in the query console `__. This query finds only one result in the demo projects on LGTM.com (`youtube-dl `__). The result is also highlighted by the standard query: `Unreachable 'except' block `__. +➤ `See this in the query console `__. This query finds only one result in the demo projects on LGTM.com (`youtube-dl `__). The result is also highlighted by the standard query: Unreachable 'except' block. For more information, see `Unreachable 'except' block `__ on LGTM.com. .. 
pull-quote:: @@ -186,7 +184,7 @@ The ``Value`` class has a method ``getACall()`` which allows us to find calls to If we wish to restrict the callables to actual functions we can use the ``FunctionValue`` class, which is a subclass of ``Value`` and corresponds to function objects in Python, in much the same way as the ``ClassValue`` class corresponds to class objects in Python. -Returning to an example from :doc:`Tutorial: Functions `, we wish to find calls to the ``eval`` function. +Returning to an example from ":doc:`Tutorial: Functions `," we wish to find calls to the ``eval`` function. The original query looked like this: diff --git a/docs/language/learn-ql/python/statements-expressions.rst b/docs/language/learn-ql/python/statements-expressions.rst index 2a7a57c33b5a..016fdb11a3e4 100644 --- a/docs/language/learn-ql/python/statements-expressions.rst +++ b/docs/language/learn-ql/python/statements-expressions.rst @@ -178,7 +178,7 @@ If there are duplicate keys in a Python dictionary, then the second key will ove and k1 != k2 and same_key(k1, k2) select k1, "Duplicate key in dict literal" -➤ `See this in the query console `__. When we ran this query on LGTM.com, the source code of the *saltstack/salt* project contained an example of duplicate dictionary keys. The results were also highlighted as alerts by the standard `Duplicate key in dict literal `__ query. Two of the other demo projects on LGTM.com refer to duplicate dictionary keys in library files. +➤ `See this in the query console `__. When we ran this query on LGTM.com, the source code of the *saltstack/salt* project contained an example of duplicate dictionary keys. The results were also highlighted as alerts by the standard "Duplicate key in dict literal" query. Two of the other demo projects on LGTM.com refer to duplicate dictionary keys in library files. For more information, see `Duplicate key in dict literal `__ on LGTM.com.
The supporting predicate ``same_key`` checks that the keys have the same identifier. Separating this part of the logic into a supporting predicate, instead of directly including it in the query, makes it easier to understand the query as a whole. The casts defined in the predicate restrict the expression to the type specified and allow predicates to be called on the type that is cast-to. For example: @@ -197,7 +197,7 @@ The short version is usually used as this is easier to read. Example finding Java-style getters ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -Returning to the example from :doc:`Tutorial: Functions `, the query identified all methods with a single line of code and a name starting with ``get``. +Returning to the example from ":doc:`Tutorial: Functions `," the query identified all methods with a single line of code and a name starting with ``get``. .. code-block:: ql diff --git a/docs/language/learn-ql/python/taint-tracking.rst b/docs/language/learn-ql/python/taint-tracking.rst index 90cb20b55179..4f892b437a58 100644 --- a/docs/language/learn-ql/python/taint-tracking.rst +++ b/docs/language/learn-ql/python/taint-tracking.rst @@ -18,7 +18,7 @@ even though there is no data flow from ``path`` to ``path + "/"``. Separate CodeQL libraries have been written to handle 'normal' data flow and taint tracking in :doc:`C/C++ <../cpp/dataflow>`, :doc:`C# <../csharp/dataflow>`, :doc:`Java <../java/dataflow>`, and :doc:`JavaScript <../javascript/dataflow>`. You can access the appropriate classes and predicates that reason about these different modes of data flow by importing the appropriate library in your query. In Python analysis, we can use the same taint tracking library to model both 'normal' data flow and taint flow, but we are still able to make the distinction between steps that preserve values and those that don't by defining additional data flow properties.
-For further information on data flow and taint tracking with CodeQL, see :doc:`Introduction to data flow <../intro-to-data-flow>`. +For further information on data flow and taint tracking with CodeQL, see ":doc:`Introduction to data flow <../intro-to-data-flow>`." Fundamentals of taint tracking using data flow analysis ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ @@ -231,8 +231,8 @@ The ``TaintKind`` itself is just a string (a QL string, not a CodeQL entity repr which provides methods to extend flow and allow the kind of taint to change along the path. The ``TaintKind`` class has many predicates allowing flow to be modified. The simplest ``TaintKind`` does not override any predicates, meaning that it only flows as opaque data. -An example of this is the `Hard-coded credentials query `_, -which defines the simplest possible taint kind class, ``HardcodedValue``, and custom source and sink classes. +An example of this is the "Hard-coded credentials" query, +which defines the simplest possible taint kind class, ``HardcodedValue``, and custom source and sink classes. For more information, see `Hard-coded credentials `_ on LGTM.com. ..
code-block:: ql From 2a5ac2e8294280451ee6b847a93dfc4557873fa2 Mon Sep 17 00:00:00 2001 From: Felicity Chapman Date: Tue, 18 Feb 2020 16:50:48 +0000 Subject: [PATCH 07/14] Fix failing Sphinx tests --- docs/language/learn-ql/python/control-flow.rst | 1 + docs/language/learn-ql/python/functions.rst | 1 + docs/language/learn-ql/python/introduce-libraries-python.rst | 1 + docs/language/learn-ql/python/pointsto-type-infer.rst | 1 + docs/language/learn-ql/python/statements-expressions.rst | 1 + docs/language/learn-ql/python/taint-tracking.rst | 3 ++- 6 files changed, 7 insertions(+), 1 deletion(-) diff --git a/docs/language/learn-ql/python/control-flow.rst b/docs/language/learn-ql/python/control-flow.rst index ec2d602d492e..6fb7e8d919f1 100644 --- a/docs/language/learn-ql/python/control-flow.rst +++ b/docs/language/learn-ql/python/control-flow.rst @@ -115,4 +115,5 @@ Further reading --------------- - ":doc:`Analyzing data flow and tracking tainted data in Python `" + .. include:: ../../reusables/python-other-resources.rst diff --git a/docs/language/learn-ql/python/functions.rst b/docs/language/learn-ql/python/functions.rst index 2924a77f910a..a412b57e72ab 100644 --- a/docs/language/learn-ql/python/functions.rst +++ b/docs/language/learn-ql/python/functions.rst @@ -85,4 +85,5 @@ Further reading - ":doc:`Pointer analysis and type inference in Python `" - ":doc:`Analyzing control flow in Python `" - ":doc:`Analyzing data flow and tracking tainted data in Python `" + .. 
include:: ../../reusables/python-other-resources.rst diff --git a/docs/language/learn-ql/python/introduce-libraries-python.rst b/docs/language/learn-ql/python/introduce-libraries-python.rst index e29e25edb4b0..d5c7ab77aaba 100644 --- a/docs/language/learn-ql/python/introduce-libraries-python.rst +++ b/docs/language/learn-ql/python/introduce-libraries-python.rst @@ -343,4 +343,5 @@ Further reading - ":doc:`Pointer analysis and type inference in Python `" - ":doc:`Analyzing control flow in Python `" - ":doc:`Analyzing data flow and tracking tainted data in Python `" + .. include:: ../../reusables/python-other-resources.rst diff --git a/docs/language/learn-ql/python/pointsto-type-infer.rst b/docs/language/learn-ql/python/pointsto-type-infer.rst index ddee31da7c0f..b64762c1d0dc 100644 --- a/docs/language/learn-ql/python/pointsto-type-infer.rst +++ b/docs/language/learn-ql/python/pointsto-type-infer.rst @@ -231,4 +231,5 @@ Further reading - ":doc:`Analyzing control flow in Python `" - ":doc:`Analyzing data flow and tracking tainted data in Python `" + .. include:: ../../reusables/python-other-resources.rst diff --git a/docs/language/learn-ql/python/statements-expressions.rst b/docs/language/learn-ql/python/statements-expressions.rst index 016fdb11a3e4..2b857f4f9a24 100644 --- a/docs/language/learn-ql/python/statements-expressions.rst +++ b/docs/language/learn-ql/python/statements-expressions.rst @@ -260,4 +260,5 @@ Further reading - ":doc:`Pointer analysis and type inference in Python `" - ":doc:`Analyzing control flow in Python `" - ":doc:`Analyzing data flow and tracking tainted data in Python `" + .. 
include:: ../../reusables/python-other-resources.rst diff --git a/docs/language/learn-ql/python/taint-tracking.rst b/docs/language/learn-ql/python/taint-tracking.rst index 4f892b437a58..ff9eff4ed6f5 100644 --- a/docs/language/learn-ql/python/taint-tracking.rst +++ b/docs/language/learn-ql/python/taint-tracking.rst @@ -132,7 +132,7 @@ The sink is defined by using a custom ``TaintTracking::Sink`` class. Converting a taint-tracking query to a path query -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Although the taint tracking query above tells which sources flow to which sinks, it doesn't tell us how. For that we need a path query. @@ -262,4 +262,5 @@ Further reading - ":doc:`Pointer analysis and type inference in Python `" - ":doc:`Analyzing control flow in Python `" - ":doc:`Analyzing data flow and tracking tainted data in Python `" + .. include:: ../../reusables/python-other-resources.rst From f8c876176a046b7ea2657fe0585db1392b7f258c Mon Sep 17 00:00:00 2001 From: Felicity Chapman Date: Wed, 19 Feb 2020 16:31:15 +0000 Subject: [PATCH 08/14] Apply suggestions from code review Many thanks for the review suggestions. 
Co-Authored-By: Shati Patel <42641846+shati-patel@users.noreply.github.com> --- docs/language/learn-ql/python/control-flow.rst | 8 ++++---- docs/language/learn-ql/python/functions.rst | 4 ++-- .../learn-ql/python/introduce-libraries-python.rst | 10 +++++----- docs/language/learn-ql/python/pointsto-type-infer.rst | 8 ++++---- .../learn-ql/python/statements-expressions.rst | 2 +- docs/language/learn-ql/python/taint-tracking.rst | 2 +- 6 files changed, 17 insertions(+), 17 deletions(-) diff --git a/docs/language/learn-ql/python/control-flow.rst b/docs/language/learn-ql/python/control-flow.rst index 6fb7e8d919f1..425f900dcd1d 100644 --- a/docs/language/learn-ql/python/control-flow.rst +++ b/docs/language/learn-ql/python/control-flow.rst @@ -3,7 +3,7 @@ Analyzing control flow in Python You can write CodeQL queries to explore the control flow graph of a Python program, for example, to discover unreachable code or mutually exclusive blocks of code. -To analyze the control-flow graph of a ``Scope`` we can use the two CodeQL classes ``ControlFlowNode`` and ``BasicBlock``. These classes allow you to ask such questions as "can you reach point A from point B?" or "Is it possible to reach point B *without* going through point A?". To report results we use the class ``AstNode``, which represents a syntactic element and corresponds to the source code - allowing the results of the query to be more easily understood. For more information, see `Control-flow graph `__ in Wikipedia. +To analyze the control-flow graph of a ``Scope`` we can use the two CodeQL classes ``ControlFlowNode`` and ``BasicBlock``. These classes allow you to ask such questions as "can you reach point A from point B?" or "Is it possible to reach point B *without* going through point A?". To report results we use the class ``AstNode``, which represents a syntactic element and corresponds to the source code - allowing the results of the query to be more easily understood. 
For more information, see `Control-flow graph `__ on Wikipedia. The ``ControlFlowNode`` class ----------------------------- @@ -42,7 +42,7 @@ Example finding unreachable AST nodes where not exists(node.getAFlowNode()) select node -➤ `See this in the query console `__. The demo projects on LGTM.com all have some code that has no control flow node, and is therefore unreachable. However, since the ``Module`` class is also a subclass of the ``AstNode`` class, the query also finds any modules implemented in C or with no source code. Therefore, it is better to find all unreachable statements: +➤ `See this in the query console `__. The demo projects on LGTM.com all have some code that has no control flow node, and is therefore unreachable. However, since the ``Module`` class is also a subclass of the ``AstNode`` class, the query also finds any modules implemented in C or with no source code. Therefore, it is better to find all unreachable statements. Example finding unreachable statements ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ @@ -55,12 +55,12 @@ Example finding unreachable statements where not exists(s.getAFlowNode()) select s -➤ `See this in the query console `__. This query gives fewer results, but most of the projects have some unreachable nodes. These are also highlighted by the standard query: unreachable code. For more information, see `Unreachable code `__ on LGTM.com. +➤ `See this in the query console `__. This query gives fewer results, but most of the projects have some unreachable nodes. These are also highlighted by the standard "Unreachable code" query. For more information, see `Unreachable code `__ on LGTM.com. The ``BasicBlock`` class ------------------------ -The ``BasicBlock`` class represents a basic block of control flow nodes. The ``BasicBlock`` class is not that useful for writing queries directly, but is very useful for building complex analyses, such as data flow. 
The reason it is useful is that it shares many of the interesting properties of control flow nodes, such as, what can reach what, and what dominates what, but there are fewer basic blocks than control flow nodes - resulting in queries that are faster and use less memory. For more information, see `basic block `__ and `dominates `__ on Wikipedia. +The ``BasicBlock`` class represents a basic block of control flow nodes. The ``BasicBlock`` class is not that useful for writing queries directly, but is very useful for building complex analyses, such as data flow. The reason it is useful is that it shares many of the interesting properties of control flow nodes, such as, what can reach what, and what dominates what, but there are fewer basic blocks than control flow nodes - resulting in queries that are faster and use less memory. For more information, see `Basic block `__ and `Dominator `__ on Wikipedia. Example finding mutually exclusive basic blocks ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ diff --git a/docs/language/learn-ql/python/functions.rst b/docs/language/learn-ql/python/functions.rst index a412b57e72ab..2090372ac3c7 100644 --- a/docs/language/learn-ql/python/functions.rst +++ b/docs/language/learn-ql/python/functions.rst @@ -3,7 +3,7 @@ Functions in Python Functions are key building blocks of Python code bases. You can find functions and identify calls to them using syntactic classes from the standard CodeQL library. -These examples use the standard CodeQL class `Function `__. For more information, see :doc:`Introducing the Python libraries `. +These examples use the standard CodeQL class `Function `__. For more information, see ":doc:`Introducing the Python libraries `." Finding all functions called "get..." ------------------------------------- @@ -57,7 +57,7 @@ We can modify the query further to include only methods whose body consists of a and count(f.getAStmt()) = 1 select f, "This function is (probably) a getter." 
-➤ `See this in the query console `__. This query returns fewer results, but if you examine the results you can see that there are still refinements to be made. This is refined further in ":doc:`Tutorial: Statements and expressions `." +➤ `See this in the query console `__. This query returns fewer results, but if you examine the results you can see that there are still refinements to be made. This is refined further in ":doc:`Expressions and statements in Python `." Finding a call to a specific function ------------------------------------- diff --git a/docs/language/learn-ql/python/introduce-libraries-python.rst b/docs/language/learn-ql/python/introduce-libraries-python.rst index d5c7ab77aaba..3309487c1913 100644 --- a/docs/language/learn-ql/python/introduce-libraries-python.rst +++ b/docs/language/learn-ql/python/introduce-libraries-python.rst @@ -22,12 +22,12 @@ The CodeQL library for Python incorporates a large number of classes. Each class Syntactic classes ----------------- -This part of the library represents the Python source code. The ``Module``, ``Class``, and ``Function`` classes correspond to Python modules, classes, and functions respectively, collectively these are known as ``Scope`` classes. Each ``Scope`` contains a list of statements each of which is represented by a subclass of the class ``Stmt``. Statements themselves can contain other statements or expressions which are represented by subclasses of ``Expr``. Finally, there are a few additional classes for the parts of more complex expressions such as list comprehensions. Collectively these classes are subclasses of ``AstNode`` and form an Abstract syntax tree (AST). The root of each AST is a ``Module``. Symbolic information is attached to the AST in the form of variables (represented by the class ``Variable``). For more information, see `Abstract syntax tree `__ and `Symbolic information `__ in Wikipedia. +This part of the library represents the Python source code. 
The ``Module``, ``Class``, and ``Function`` classes correspond to Python modules, classes, and functions respectively, collectively these are known as ``Scope`` classes. Each ``Scope`` contains a list of statements each of which is represented by a subclass of the class ``Stmt``. Statements themselves can contain other statements or expressions which are represented by subclasses of ``Expr``. Finally, there are a few additional classes for the parts of more complex expressions such as list comprehensions. Collectively these classes are subclasses of ``AstNode`` and form an Abstract syntax tree (AST). The root of each AST is a ``Module``. Symbolic information is attached to the AST in the form of variables (represented by the class ``Variable``). For more information, see `Abstract syntax tree `__ and `Symbolic information `__ on Wikipedia. Scope ^^^^^ -A Python program is a group of modules. Technically a module is just a list of statements, but we often think of it as composed of classes and functions. These top-level entities, the module, class, and function are represented by the three CodeQL classes (`Module `__, `Class `__ and `Function `__ which are all subclasses of ``Scope``). +A Python program is a group of modules. Technically a module is just a list of statements, but we often think of it as composed of classes and functions. These top-level entities, the module, class, and function are represented by the three CodeQL classes `Module `__, `Class `__ and `Function `__ which are all subclasses of ``Scope``. - ``Scope`` @@ -151,7 +151,7 @@ Both forms are equivalent. Using the positive expression, the whole query looks ➤ `See this in the query console `__. Many projects include pass-only ``except`` blocks. 
-Summary of syntactic classes +Summary ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The most commonly used standard classes in the syntactic part of the library are organized as follows: @@ -237,7 +237,7 @@ Other Control flow classes -------------------- -This part of the library represents the control flow graph of each ``Scope`` (classes, functions, and modules). Each ``Scope`` contains a graph of ``ControlFlowNode`` elements. Each scope has a single entry point and at least one (potentially many) exit points. To speed up control and data flow analysis, control flow nodes are grouped into basic blocks. For more information, see `basic blocks `__ in Wikipedia. +This part of the library represents the control flow graph of each ``Scope`` (classes, functions, and modules). Each ``Scope`` contains a graph of ``ControlFlowNode`` elements. Each scope has a single entry point and at least one (potentially many) exit points. To speed up control and data flow analysis, control flow nodes are grouped into basic blocks. For more information, see `Basic block `__ on Wikipedia. Example ^^^^^^^ @@ -332,7 +332,7 @@ Summary - `TaintKind `__ - `Configuration `__ -For more information about these classes, see ":doc:`Analyzing data flow and tracking tainted data in Python `". +For more information about these classes, see ":doc:`Analyzing data flow and tracking tainted data in Python `." Further reading diff --git a/docs/language/learn-ql/python/pointsto-type-infer.rst b/docs/language/learn-ql/python/pointsto-type-infer.rst index b64762c1d0dc..6dfee9f4bbbb 100644 --- a/docs/language/learn-ql/python/pointsto-type-infer.rst +++ b/docs/language/learn-ql/python/pointsto-type-infer.rst @@ -1,7 +1,7 @@ Pointer analysis and type inference in Python ============================================= -At run time, each Python expression has a value with an associated type. You can learn how an expression behaves at run time using type-inference classes from the standard CodeQL library. 
+At runtime, each Python expression has a value with an associated type. You can learn how an expression behaves at runtime using type-inference classes from the standard CodeQL library. This topic contains worked examples of how to write queries using the standard CodeQL library classes for Python type inference. @@ -25,7 +25,7 @@ Class hierarchy for ``Value``: Points-to analysis and type inference ------------------------------------- -Points-to analysis, sometimes known as pointer analysis, allows us to determine which objects an expression may "point to" at runtime. Type inference allows us to infer what the types (classes) of an expression may be at runtime. For more information, see `pointer analysis `__ and `Type inference `__ on Wikipedia. +Points-to analysis, sometimes known as pointer analysis, allows us to determine which objects an expression may "point to" at runtime. Type inference allows us to infer what the types (classes) of an expression may be at runtime. For more information, see `Pointer analysis `__ and `Type inference `__ on Wikipedia. The predicate ``ControlFlowNode.pointsTo(...)`` shows which object a control flow node may "point to" at runtime. @@ -124,7 +124,7 @@ Combining the parts of the query we get this: ) select t, ex1, ex2 -➤ `See this in the query console `__. This query finds only one result in the demo projects on LGTM.com (`youtube-dl `__). The result is also highlighted by the standard query: Unreachable 'except' block. For more information, see `Unreachable 'except' block `__ on LGTM.com. +➤ `See this in the query console `__. This query finds only one result in the demo projects on LGTM.com (`youtube-dl `__). The result is also highlighted by the standard "Unreachable 'except' block" query. For more information, see `Unreachable 'except' block `__ on LGTM.com. .. 
pull-quote:: @@ -184,7 +184,7 @@ The ``Value`` class has a method ``getACall()`` which allows us to find calls to If we wish to restrict the callables to actual functions we can use the ``FunctionValue`` class, which is a subclass of ``Value`` and corresponds to function objects in Python, in much the same way as the ``ClassValue`` class corresponds to class objects in Python. -Returning to an example from ":doc:`Tutorial: Functions `," we wish to find calls to the ``eval`` function. +Returning to an example from ":doc:`Functions in Python `," we wish to find calls to the ``eval`` function. The original query looked like this: diff --git a/docs/language/learn-ql/python/statements-expressions.rst b/docs/language/learn-ql/python/statements-expressions.rst index 2b857f4f9a24..209d22ae3d1d 100644 --- a/docs/language/learn-ql/python/statements-expressions.rst +++ b/docs/language/learn-ql/python/statements-expressions.rst @@ -197,7 +197,7 @@ The short version is usually used as this is easier to read. Example finding Java-style getters ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -Returning to the example from ":doc:`Tutorial: Functions `," the query identified all methods with a single line of code and a name starting with ``get``. +Returning to the example from ":doc:`Functions in Python `," the query identified all methods with a single line of code and a name starting with ``get``. .. code-block:: ql diff --git a/docs/language/learn-ql/python/taint-tracking.rst b/docs/language/learn-ql/python/taint-tracking.rst index ff9eff4ed6f5..bfdae7aa4eb4 100644 --- a/docs/language/learn-ql/python/taint-tracking.rst +++ b/docs/language/learn-ql/python/taint-tracking.rst @@ -1,7 +1,7 @@ Analyzing data flow and tracking tainted data in Python ======================================================= -You can use CodeQL to track the flow of data through a Python program to its use. Tracking user-controlled, or tainted, data is a key technique for security researchers.
+You can use CodeQL to track the flow of data through a Python program. Tracking user-controlled, or tainted, data is a key technique for security researchers. About data flow and taint tracking ---------------------------------- From 552d2edb5b128a309aa24cf7faa318c5504c695e Mon Sep 17 00:00:00 2001 From: Felicity Chapman Date: Wed, 19 Feb 2020 16:35:59 +0000 Subject: [PATCH 09/14] Correction one more mention of tutorials --- docs/language/learn-ql/python/control-flow.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/language/learn-ql/python/control-flow.rst b/docs/language/learn-ql/python/control-flow.rst index 425f900dcd1d..736bb89f08bd 100644 --- a/docs/language/learn-ql/python/control-flow.rst +++ b/docs/language/learn-ql/python/control-flow.rst @@ -109,7 +109,7 @@ Example finding mutually exclusive blocks within the same function ) select b1, b2 -➤ `See this in the query console `__. This typically gives a very large number of results, because it is a common occurrence in normal control flow. It is, however, an example of the sort of control-flow analysis that is possible. Control-flow analyses such as this are an important aid to data flow analysis which is covered in the next tutorial. +➤ `See this in the query console `__. This typically gives a very large number of results, because it is a common occurrence in normal control flow. It is, however, an example of the sort of control-flow analysis that is possible. Control-flow analyses such as this are an important aid to data flow analysis. For more information, see :doc:`Analyzing data flow and tracking tainted data in Python `. 
Further reading --------------- From 1da1d921707694535d9847f812381b3d7e25af74 Mon Sep 17 00:00:00 2001 From: Felicity Chapman Date: Wed, 19 Feb 2020 16:39:29 +0000 Subject: [PATCH 10/14] Update intro for library overview topic Based on suggestions from James and Shati --- .../language/learn-ql/python/introduce-libraries-python.rst | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/docs/language/learn-ql/python/introduce-libraries-python.rst b/docs/language/learn-ql/python/introduce-libraries-python.rst index 3309487c1913..fa0a0ac7bcb0 100644 --- a/docs/language/learn-ql/python/introduce-libraries-python.rst +++ b/docs/language/learn-ql/python/introduce-libraries-python.rst @@ -1,12 +1,14 @@ CodeQL library for Python ========================= -Overview of the extensive library you use to analyze databases generated from Python code bases. This library uses classes with abstractions and predicates to present the data in an object-oriented form. This abstraction makes it easier for you to write queries. +This is an overview of the extensive library you use to analyze databases generated from Python code bases. Using this library makes it easier for you to write queries. About the CodeQL library for Python ----------------------------------- -The CodeQL library for each programming language is implemented as a set of QL modules, that is, files with the extension ``.qll``. The module ``python.qll`` imports all the core Python library modules, so you can include the complete library by beginning your query with: +The CodeQL library for each programming language uses classes with abstractions and predicates to present data in an object-oriented form. This abstraction makes it easier for you to write queries. + +Each CodeQL library is implemented as a set of QL modules, that is, files with the extension ``.qll``. 
The module ``python.qll`` imports all the core Python library modules, so you can include the complete library by beginning your query with: .. code-block:: ql From 96f37c910b9530b28c25fddc3c29c12b18f8623b Mon Sep 17 00:00:00 2001 From: Felicity Chapman Date: Thu, 20 Feb 2020 12:32:18 +0000 Subject: [PATCH 11/14] Apply suggestions from code review --- docs/language/learn-ql/python/functions.rst | 2 +- docs/language/learn-ql/python/introduce-libraries-python.rst | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/language/learn-ql/python/functions.rst b/docs/language/learn-ql/python/functions.rst index 2090372ac3c7..a58f7ef889a0 100644 --- a/docs/language/learn-ql/python/functions.rst +++ b/docs/language/learn-ql/python/functions.rst @@ -1,7 +1,7 @@ Functions in Python =================== -Functions are key building blocks of Python code bases. You can find functions and identify calls to them using syntactic classes from the standard CodeQL library. +Functions are key building blocks of Python code bases. You can use syntactic classes from the standard CodeQL library to find functions and identify calls to them. These examples use the standard CodeQL class `Function `__. For more information, see ":doc:`Introducing the Python libraries `." diff --git a/docs/language/learn-ql/python/introduce-libraries-python.rst b/docs/language/learn-ql/python/introduce-libraries-python.rst index fa0a0ac7bcb0..44818219bb64 100644 --- a/docs/language/learn-ql/python/introduce-libraries-python.rst +++ b/docs/language/learn-ql/python/introduce-libraries-python.rst @@ -6,7 +6,7 @@ This is an overview of the extensive library you use to analyze databases genera About the CodeQL library for Python ----------------------------------- -The CodeQL library for each programming language uses classes with abstractions and predicates to present data in an object-oriented form. This abstraction makes it easier for you to write queries. 
+The CodeQL library for each programming language uses classes with abstractions and predicates to present data in an object-oriented form. Each CodeQL library is implemented as a set of QL modules, that is, files with the extension ``.qll``. The module ``python.qll`` imports all the core Python library modules, so you can include the complete library by beginning your query with: From 7a2bb120ecd9567fbacea0a297debec8e3adc855 Mon Sep 17 00:00:00 2001 From: Felicity Chapman Date: Mon, 2 Mar 2020 14:50:52 +0000 Subject: [PATCH 12/14] Update introductions for feedback --- docs/language/learn-ql/python/control-flow.rst | 5 ++++- docs/language/learn-ql/python/functions.rst | 2 +- docs/language/learn-ql/python/introduce-libraries-python.rst | 4 ++-- docs/language/learn-ql/python/pointsto-type-infer.rst | 5 +---- docs/language/learn-ql/python/statements-expressions.rst | 2 +- 5 files changed, 9 insertions(+), 9 deletions(-) diff --git a/docs/language/learn-ql/python/control-flow.rst b/docs/language/learn-ql/python/control-flow.rst index 736bb89f08bd..d16453157845 100644 --- a/docs/language/learn-ql/python/control-flow.rst +++ b/docs/language/learn-ql/python/control-flow.rst @@ -1,7 +1,10 @@ Analyzing control flow in Python ================================ -You can write CodeQL queries to explore the control flow graph of a Python program, for example, to discover unreachable code or mutually exclusive blocks of code. +You can write CodeQL queries to explore the control-flow graph of a Python program, for example, to discover unreachable code or mutually exclusive blocks of code. + +About analyzing control flow +-------------------------------------- To analyze the control-flow graph of a ``Scope`` we can use the two CodeQL classes ``ControlFlowNode`` and ``BasicBlock``. These classes allow you to ask such questions as "can you reach point A from point B?" or "Is it possible to reach point B *without* going through point A?". 
To report results we use the class ``AstNode``, which represents a syntactic element and corresponds to the source code - allowing the results of the query to be more easily understood. For more information, see `Control-flow graph `__ on Wikipedia. diff --git a/docs/language/learn-ql/python/functions.rst b/docs/language/learn-ql/python/functions.rst index a58f7ef889a0..8fa89f5e188a 100644 --- a/docs/language/learn-ql/python/functions.rst +++ b/docs/language/learn-ql/python/functions.rst @@ -1,7 +1,7 @@ Functions in Python =================== -Functions are key building blocks of Python code bases. You can use syntactic classes from the standard CodeQL library to find functions and identify calls to them. +You can use syntactic classes from the standard CodeQL library to find Python functions and identify calls to them. These examples use the standard CodeQL class `Function `__. For more information, see ":doc:`Introducing the Python libraries `." diff --git a/docs/language/learn-ql/python/introduce-libraries-python.rst b/docs/language/learn-ql/python/introduce-libraries-python.rst index 44818219bb64..80944b852a20 100644 --- a/docs/language/learn-ql/python/introduce-libraries-python.rst +++ b/docs/language/learn-ql/python/introduce-libraries-python.rst @@ -1,12 +1,12 @@ CodeQL library for Python ========================= -This is an overview of the extensive library you use to analyze databases generated from Python code bases. Using this library makes it easier for you to write queries. +When you need to analyze a Python program, you can make use of the large collection of classes in the Python library for CodeQL. About the CodeQL library for Python ----------------------------------- -The CodeQL library for each programming language uses classes with abstractions and predicates to present data in an object-oriented form. +The CodeQL library for each programming language uses classes with abstractions and predicates to present data in an object-oriented form. 
Each CodeQL library is implemented as a set of QL modules, that is, files with the extension ``.qll``. The module ``python.qll`` imports all the core Python library modules, so you can include the complete library by beginning your query with: diff --git a/docs/language/learn-ql/python/pointsto-type-infer.rst b/docs/language/learn-ql/python/pointsto-type-infer.rst index 6dfee9f4bbbb..40f2ecb81fff 100644 --- a/docs/language/learn-ql/python/pointsto-type-infer.rst +++ b/docs/language/learn-ql/python/pointsto-type-infer.rst @@ -1,10 +1,7 @@ Pointer analysis and type inference in Python ============================================= -At runtime, each Python expression has a value with an associated type. You can learn how an expression behaves at runtime using type-inference classes from the standard CodeQL library. - - -This topic contains worked examples of how to write queries using the standard CodeQL library classes for Python type inference. +At runtime, each Python expression has a value with an associated type. You can learn how an expression behaves at runtime by using type-inference classes from the standard CodeQL library. The ``Value`` class -------------------- diff --git a/docs/language/learn-ql/python/statements-expressions.rst b/docs/language/learn-ql/python/statements-expressions.rst index 209d22ae3d1d..eda2d1e45781 100644 --- a/docs/language/learn-ql/python/statements-expressions.rst +++ b/docs/language/learn-ql/python/statements-expressions.rst @@ -1,7 +1,7 @@ Expressions and statements in Python ==================================== -Expressions define a value. Statements represent a command or action. You can explore how they are used in a code base using syntactic classes from the standard CodeQL library. +You can use syntactic classes from the CodeQL library to explore how Python expressions and statements are used in a code base. 
Statements ---------- From 90a9a6d2ac0fd0cc2f2860e1c04aee137f9facc0 Mon Sep 17 00:00:00 2001 From: Felicity Chapman Date: Fri, 6 Mar 2020 15:13:10 +0000 Subject: [PATCH 13/14] Update docs/language/learn-ql/python/introduce-libraries-python.rst --- docs/language/learn-ql/python/introduce-libraries-python.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/language/learn-ql/python/introduce-libraries-python.rst b/docs/language/learn-ql/python/introduce-libraries-python.rst index 80944b852a20..81889a0cb940 100644 --- a/docs/language/learn-ql/python/introduce-libraries-python.rst +++ b/docs/language/learn-ql/python/introduce-libraries-python.rst @@ -154,7 +154,7 @@ Both forms are equivalent. Using the positive expression, the whole query looks ➤ `See this in the query console `__. Many projects include pass-only ``except`` blocks. Summary -^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +^^^^^^^ The most commonly used standard classes in the syntactic part of the library are organized as follows: From f1238f1ec92e87a68ef3a6bf0fb121aadc4ae0c0 Mon Sep 17 00:00:00 2001 From: Felicity Chapman Date: Tue, 10 Mar 2020 17:11:59 +0000 Subject: [PATCH 14/14] Update docs/language/learn-ql/python/introduce-libraries-python.rst --- docs/language/learn-ql/python/introduce-libraries-python.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/language/learn-ql/python/introduce-libraries-python.rst b/docs/language/learn-ql/python/introduce-libraries-python.rst index 81889a0cb940..c7809eb710b3 100644 --- a/docs/language/learn-ql/python/introduce-libraries-python.rst +++ b/docs/language/learn-ql/python/introduce-libraries-python.rst @@ -1,7 +1,7 @@ CodeQL library for Python ========================= -When you need to analyze a Python program, you can make use of the large collection of classes in the Python library for CodeQL. +When you need to analyze a Python program, you can make use of the large collection of classes in the CodeQL library for Python. 
About the CodeQL library for Python -----------------------------------