docs: add page on query performance (close #2316) (#3693)

hasura · May 12, 2020 · 9a16e25 · 9a16e25
1 parent d17b223
commit 9a16e25

Showing 9 changed files with 245 additions and 1 deletion.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -45,6 +45,7 @@ Read more about the session argument for computed fields in the [docs](https://h
 - cli: list all available commands in root command help (fix #4623)
 - docs: add section on actions vs. remote schemas to actions documentation (#4284)
 - docs: fix wrong info about excluding scheme in CORS config (#4685)
+- docs: add docs page on query performance (close #2316) (#3693)
 
 ## `v1.2.0`
 

diff --git a/docs/graphql/manual/queries/index.rst b/docs/graphql/manual/queries/index.rst
@@ -67,4 +67,5 @@ based on a typical author/article schema for reference.
   pagination
   Using multiple arguments <multiple-arguments>
   multiple-queries
-  variables-aliases-fragments-directives
+  Using variables / aliases / fragments / directives <variables-aliases-fragments-directives>
+  performance
diff --git a/docs/graphql/manual/queries/performance.rst b/docs/graphql/manual/queries/performance.rst
@@ -0,0 +1,238 @@
+.. meta::
+   :description: Performance of Hasura GraphQL queries
+   :keywords: hasura, docs, schema, queries, performance
+
+.. _query_performance:
+
+Query performance
+=================
+
+.. contents:: Table of contents
+  :backlinks: none
+  :depth: 2
+  :local:
+
+Introduction
+------------
+
+Queries can become slow when they touch large volumes of data or are deeply nested.
+This page explains how to analyse query performance, how query plan caching works in Hasura, and how queries can be optimized.
+
+.. _analysing_query_performance:
+
+Analysing query performance
+---------------------------
+
+Let's say we want to analyse the following query:
+
+
+.. code-block:: graphql
+
+   query {
+      authors(where: {name: {_eq: "Mario"}}) {
+         rating
+      }
+   }
+
+In order to analyse the performance of a query, you can click on the ``Analyze`` button on the Hasura console:
+
+.. thumbnail:: ../../../img/graphql/manual/queries/analyze-query.png
+   :class: no-shadow
+   :width: 75%
+   :alt: Query analyze button on Hasura console
+
+The following query execution plan is generated:
+
+.. thumbnail:: ../../../img/graphql/manual/queries/query-analysis-before-index.png
+   :class: no-shadow
+   :width: 75%
+   :alt: Execution plan for Hasura GraphQL query
+
+We can see that a sequential scan is conducted on the ``authors`` table. This means that Postgres goes through every row of the ``authors`` table to check whether the author's name equals "Mario".
+The ``cost`` of a query is an estimate computed by the Postgres planner in arbitrary units. It should be read as a relative measure for comparing plans rather than as an absolute measure such as execution time.
+
+Read more about query performance analysis in the `Postgres explain statement docs <https://www.postgresql.org/docs/current/sql-explain.html>`__.
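+
+As a rough sketch of what the ``Analyze`` button surfaces, a comparable plan can be requested from Postgres directly with ``EXPLAIN``. The statement below is a hand-written simplification and an assumption for illustration; the SQL that Hasura actually generates for the query is more elaborate.
+
+.. code-block:: plpgsql
+
+   -- Simplified stand-in for the SQL generated for the GraphQL query above.
+   -- EXPLAIN prints the plan chosen by Postgres without running the query.
+   EXPLAIN SELECT rating FROM authors WHERE name = 'Mario';
+
+   -- EXPLAIN ANALYZE also executes the query and reports actual timings.
+   EXPLAIN ANALYZE SELECT rating FROM authors WHERE name = 'Mario';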
+
+.. _query_plan_caching:
+
+Query plan caching
+------------------
+
+How it works
+^^^^^^^^^^^^
+
+Hasura executes GraphQL queries as follows:
+
+1. The incoming GraphQL query is parsed into an `abstract syntax tree <https://en.wikipedia.org/wiki/Abstract_syntax_tree>`__ (AST), a tree representation of the query's structure.
+2. The GraphQL AST is validated against the schema to generate an internal representation.
+3. The internal representation is converted into an SQL statement (a `prepared statement <https://www.postgresql.org/docs/current/sql-prepare.html>`__ whenever possible).
+4. The (prepared) statement is executed on Postgres to retrieve the result of the query.
+
+For most use cases, Hasura constructs a "plan" for a query, so that a new instance of the same query can be executed without the overhead of steps 1 to 3.
+
+For example, let's consider the following query:
+
+.. code-block:: graphql
+
+   query getAuthor($id: Int!) {
+      authors(where: {id: {_eq: $id}}) {
+         name
+         rating
+      }
+   }
+
+With the following variable:
+
+.. code-block:: json
+
+   {
+      "id": 1
+   }
+
+Hasura tries to map such a GraphQL query to a prepared statement whose parameters have a one-to-one correspondence with the variables defined in the GraphQL query.
+The first time a query comes in, Hasura generates a plan for the query, which consists of two things:
+
+1. The prepared statement
+2. Information necessary to convert variables into the prepared statement's arguments
+
+For the above query, Hasura generates the following prepared statement (simplified):
+
+.. code-block:: plpgsql
+
+   select name, rating from authors where id = $1
+
+With the following prepared variables:
+
+.. code-block:: plpgsql
+
+   $1 = 1
+
+This plan is then saved in a data structure called the ``Query Plan Cache``. The next time the same query is executed,
+Hasura uses the plan to convert the provided variables into the prepared statement's arguments and then executes the statement.
+This significantly cuts down the execution time of a GraphQL query, resulting in lower latencies and higher throughput.
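+
+The reuse of a prepared statement can be sketched in plain SQL. This is only an illustration: the statement name ``get_author`` is hypothetical, and Hasura manages its prepared statements internally rather than through hand-written ``PREPARE`` calls.
+
+.. code-block:: plpgsql
+
+   -- Parse the statement once (get_author is a hypothetical name).
+   PREPARE get_author (int) AS
+     SELECT name, rating FROM authors WHERE id = $1;
+
+   -- Later executions only supply a new argument for $1.
+   EXECUTE get_author(1);
+   EXECUTE get_author(2);
+
+   -- Remove the prepared statement from the session.
+   DEALLOCATE get_author;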
+
+Caveats
+^^^^^^^
+
+The above optimization is not possible for all types of queries. For example, consider this query:
+
+.. code-block:: graphql
+
+   query getAuthorWithCondition($condition: author_bool_exp!) {
+      author(where: $condition) {
+         name
+         rating
+      }
+   }
+
+The statement generated for ``getAuthorWithCondition`` is now dependent on the variables.
+
+With the following variables:
+
+.. code-block:: json
+
+   {
+      "condition": {"id": {"_eq": 1}}
+   }
+
+the generated statement will be:
+
+.. code-block:: plpgsql
+
+   select name, rating from author where id = $1
+
+However, with the following variables:
+
+.. code-block:: json
+
+   {
+      "condition": {"name": {"_eq": "John"}}
+   }
+
+the generated statement will be:
+
+.. code-block:: plpgsql
+
+   select name, rating from author where name = 'John'
+
+A plan cannot be generated for such queries because the variables defined in the GraphQL query don't have a one-to-one correspondence to the parameters in the prepared statement.
+
+Query optimization
+------------------
+
+Using GraphQL variables
+^^^^^^^^^^^^^^^^^^^^^^^
+
+In order to leverage Hasura's query plan caching (as explained in the :ref:`previous section <query_plan_caching>`) to the full extent, GraphQL queries should be defined with
+variables whose types are **non-nullable scalars** whenever possible.
+
+To make a variable non-nullable, add a ``!`` at the end of its type, as in the following query:
+
+.. code-block:: graphql
+   :emphasize-lines: 1
+
+   query getAuthor($id: Int!) {
+      authors(where: {id: {_eq: $id}}) {
+         name
+         rating
+      }
+   }
+
+If the ``!`` is not added and the variable is nullable, the generated query will differ depending on whether an ``id`` is passed or the variable is ``null``
+(in the latter case, no ``where`` clause is present). Therefore, it's not possible for Hasura to create a reusable plan for a query in this case.
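+
+In the simplified form used earlier on this page, the two cases described above would look roughly as follows. These statements are illustrative sketches, not the exact SQL Hasura emits.
+
+.. code-block:: plpgsql
+
+   -- $id is provided: the condition becomes a parameterized where clause
+   select name, rating from authors where id = $1
+
+   -- $id is null: no where clause is generated
+   select name, rating from authors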
+
+.. note::
+
+   Hasura is fast even for queries which cannot have a reusable plan.
+   This should concern you only if you face a high volume of traffic (thousands of requests per second).
+
+Using PG indexes
+^^^^^^^^^^^^^^^^
+
+`Postgres indexes <https://www.tutorialspoint.com/postgresql/postgresql_indexes.htm>`__ are special lookup tables that Postgres can use to speed up data lookup.
+An index acts as a pointer to data in a table, and it works much like the index at the back of a book.
+If you look in the index first, you'll find the data much more quickly than by searching the whole book (or, in this case, the whole table).
+
+Let's say we know that the ``authors`` table is frequently queried by ``name``:
+
+.. code-block:: graphql
+
+   query {
+      authors(where: {name: {_eq: "Mario"}}) {
+         rating
+      }
+   }
+
+We've seen in the :ref:`above example <analysing_query_performance>` that by default Postgres conducts a sequential scan, i.e. it goes through all the rows.
+A sequential scan caused by a filter on a column can often be optimized by adding an index on that column.
+
+.. rst-class:: api_tabs
+.. tabs::
+
+  .. tab:: Console
+
+      An index can be added in the ``Data -> SQL`` tab in the Hasura console.
+
+  .. tab:: API
+
+      An index can be added via the :ref:`run_sql <run_sql>` metadata API.
+
+The following statement creates an index on the ``name`` column of the ``authors`` table.
+
+.. code-block:: plpgsql
+
+  CREATE INDEX ON authors (name);
+
+Let's compare the performance analysis to :ref:`the one before adding the index <analysing_query_performance>`.
+What was a ``sequential scan`` in the example earlier is now an ``index scan``. ``Index scans`` are usually more performant than ``sequential scans``.
+We can also see that the ``cost`` of the query is now lower than the one before we added the index.
+
+.. thumbnail:: ../../../img/graphql/manual/queries/query-analysis-after-index.png
+   :class: no-shadow
+   :width: 75%
+   :alt: Execution plan for Hasura GraphQL query after adding an index
+
+.. note::
+
+   In some cases sequential scans can still be faster than index scans, e.g. if the query returns a high percentage of the rows in the table.
+   Postgres considers multiple query plans and decides which kind of scan it expects to be faster.
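+
+To check the effect of an index outside the console, the plan can be inspected with ``EXPLAIN`` after creating it. A minimal sketch, assuming the simplified query used earlier on this page; the index name ``authors_name_idx`` is hypothetical (the statement above lets Postgres choose a name).
+
+.. code-block:: plpgsql
+
+   -- Create the index with an explicit, hypothetical name.
+   CREATE INDEX authors_name_idx ON authors (name);
+
+   -- Refresh planner statistics so cost estimates reflect the current data.
+   ANALYZE authors;
+
+   -- The plan may now show an index scan; on small tables Postgres can still prefer a sequential scan.
+   EXPLAIN SELECT rating FROM authors WHERE name = 'Mario';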
diff --git a/docs/graphql/manual/queries/variables-aliases-fragments-directives.rst b/docs/graphql/manual/queries/variables-aliases-fragments-directives.rst
@@ -50,6 +50,10 @@ In order to make a query re-usable, it can be made dynamic by using variables.
       "author_id": 1
     }
 
+.. admonition:: Variables and performance
+
+  Variables have an impact on query performance. Refer to :ref:`query performance <query_performance>` to learn more about Hasura's query plan caching and about optimizing when using variables.
+
 Using aliases
 -------------
 

diff --git a/docs/img/graphql/manual/queries/analyze-query.png b/docs/img/graphql/manual/queries/analyze-query.png
diff --git a/docs/img/graphql/manual/queries/query-analysis-after-index.png b/docs/img/graphql/manual/queries/query-analysis-after-index.png
diff --git a/docs/img/graphql/manual/queries/query-analysis-before-index.png b/docs/img/graphql/manual/queries/query-analysis-before-index.png
diff --git a/docs/img/graphql/manual/queries/query-execution-plan-after-index.png b/docs/img/graphql/manual/queries/query-execution-plan-after-index.png
diff --git a/docs/img/graphql/manual/queries/query-execution-plan.png b/docs/img/graphql/manual/queries/query-execution-plan.png