Development

Nikita Stupin edited this page Jul 10, 2023 · 19 revisions

Principles

  • Do One Thing and Do It Well to keep the program simple, maintainable, and robust.
  • Play nicely with other tools. In particular, produce output that other tools can consume, which in turn supports the Do One Thing and Do It Well principle.

Testing

To run unit tests:

cd clairvoyance
python3 -m unittest tests/*_test.py

Publishing

  1. First, bump the version in pyproject.toml.
  2. Then trigger the CD process by tagging the release and pushing the tag:
    git tag v2.0.1 main
    git push origin v2.0.1

How does clairvoyance work?

Since we're trying to obtain a valid schema, it's good to know which essential components every valid schema has. Pick a schema of your choice and look at its top-level keys.

cat schema.json | jq '.data.__schema | keys'
[
  "directives",
  "mutationType",
  "queryType",
  "subscriptionType",
  "types"
]

We can skip directives for now, as well as mutationType, queryType and subscriptionType, because they are simple {"name": "Root"} dictionaries, where Root can be obtained through the __typename field.

The interesting part is the types key. Let's take a closer look at it. It's an array in which each element has the following keys.

cat schema.json | jq '.data.__schema.types[0] | keys'
[
  "description",
  "enumValues",
  "fields",
  "inputFields",
  "interfaces",
  "kind",
  "name",
  "possibleTypes"
]

The most important keys are name, kind, fields and inputFields. enumValues and possibleTypes are important too, but clairvoyance doesn't support them yet. description and interfaces are less important and probably hard to obtain.
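
As a rough illustration of the structure described above, here is a minimal Python sketch that builds an empty schema skeleton with exactly the keys shown in the two jq listings. The helper names empty_type and empty_schema are hypothetical and are not part of clairvoyance; the default values chosen for each key are assumptions.

```python
import json


def empty_type(name: str, kind: str = "OBJECT") -> dict:
    """Return a type entry with the eight keys listed above (hypothetical helper)."""
    return {
        "description": None,
        "enumValues": None,
        "fields": [],
        "inputFields": None,
        "interfaces": [],
        "kind": kind,
        "name": name,
        "possibleTypes": None,
    }


def empty_schema(query_root: str) -> dict:
    """Return a schema skeleton with the five top-level keys (hypothetical helper)."""
    return {
        "data": {
            "__schema": {
                "directives": [],
                "mutationType": None,
                "queryType": {"name": query_root},
                "subscriptionType": None,
                "types": [empty_type(query_root)],
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(empty_schema("Root"), indent=2))
```

Filling in the fields and inputFields lists of each type is then the actual work of schema recovery.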

...

Let's assume that our target is Apollo Server and that introspection is disabled. What "features" of Apollo Server can we use to obtain the schema?

Please note that the examples are shown on fields, but the same techniques apply to arguments as well.

Suggestions

If we supply an invalid field that is similar to a valid one, the underlying graphql-js library will kindly give us a list of suggestions containing valid fields.

{
  "query": "{ star }"
}
{
  "errors": [
    {
      "message": "Cannot query field \"star\" on type \"Root\". Did you mean \"starship\"?",
      "locations": [
        {
          "line": 1,
          "column": 3
        }
      ]
    }
  ]
}

More information about this behaviour can be found in the "How apollo-server suggestions works?" issue.
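
A minimal sketch of how such a message could be parsed offline is shown below. The function name get_suggestions and the regular expression are illustrative assumptions, not clairvoyance's actual code; the pattern is based only on the message format shown above and assumes that multiple suggestions also appear as double-quoted names.

```python
import re
from typing import List

# Matches the "Cannot query field" message shown above. The trailing
# "Did you mean ...?" part is optional because not every typo gets hints.
FIELD_ERROR = re.compile(
    r'^Cannot query field "(?P<field>[^"]+)" on type "(?P<type>[^"]+)"\.'
    r"(?: Did you mean (?P<hints>.+)\?)?$"
)


def get_suggestions(message: str) -> List[str]:
    """Return the field names suggested in an error message, if any."""
    match = FIELD_ERROR.match(message)
    if match is None or match.group("hints") is None:
        return []
    # Every double-quoted token in the hint part is treated as a field name.
    return re.findall(r'"([^"]+)"', match.group("hints"))


if __name__ == "__main__":
    print(get_suggestions(
        'Cannot query field "star" on type "Root". Did you mean "starship"?'
    ))
```

Anchoring the pattern on the "Cannot query field" prefix keeps other messages that also contain "Did you mean" from being misread as field suggestions.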

Valid fields cause specific errors or do not cause errors at all

{
  "query": "{ vehicle }"
}
{
  "errors": [
    {
      "message": "Field \"vehicle\" of type \"Vehicle\" must have a selection of subfields. Did you mean \"vehicle { ... }\"?",
      "locations": [
        {
          "line": 1,
          "column": 3
        }
      ]
    }
  ]
}

So we can tell whether a field is valid even without suggestions.
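
The check above can be sketched in Python as follows. Both function names are hypothetical, and the pattern relies only on the message format shown in the example response. Treating the absence of an error as proof of validity assumes that the server actually validated the document that contained the field.

```python
import re
from typing import List, Optional, Tuple

# Matches the "must have a selection of subfields" message shown above.
SUBFIELDS_ERROR = re.compile(
    r'^Field "(?P<field>[^"]+)" of type "(?P<type>[^"]+)" '
    r"must have a selection of subfields\."
)


def get_field_type(message: str) -> Optional[Tuple[str, str]]:
    """Return (field name, type name) if the message confirms a valid field."""
    match = SUBFIELDS_ERROR.match(message)
    if match is None:
        return None
    return match.group("field"), match.group("type")


def field_is_valid(field: str, messages: List[str]) -> bool:
    """A probed field counts as valid unless the server explicitly rejected it."""
    rejected = f'Cannot query field "{field}" '
    return not any(message.startswith(rejected) for message in messages)
```

Note that this error reveals the type of the field as well as its existence, which gives the next type to probe.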

We don't need valid arguments

The film field requires id or filmID to be provided.

{
  "query": "{ film {id} }"
}
{
  "errors": [
    {
      "message": "must provide id or filmID",
      "locations": [
        {
          "line": 1,
          "column": 3
        }
      ],
      "path": [
        "film"
      ]
    }
  ],
  "data": {
    "film": null
  }
}

However, this doesn't matter for obtaining field names and types, because Apollo Server / graphql-js kindly generates error messages for film's fields even without valid arguments.

{
  "query": "{ film { titl } }"
}
{
  "errors": [
    {
      "message": "Cannot query field \"titl\" on type \"Film\". Did you mean \"title\"?",
      "locations": [
        {
          "line": 1,
          "column": 10
        }
      ]
    }
  ]
}
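
Because arguments can be left out, a probe for a nested type only needs the chain of parent field names. The sketch below builds such a document; build_probe is a hypothetical helper introduced here for illustration, not a clairvoyance function.

```python
from typing import Sequence


def build_probe(path: Sequence[str], words: Sequence[str]) -> str:
    """Wrap candidate words in the parent fields listed in path, outermost first."""
    selection = " ".join(words)
    # Wrap from the innermost parent outwards; no arguments are supplied.
    for name in reversed(path):
        selection = f"{name} {{ {selection} }}"
    return f"{{ {selection} }}"


if __name__ == "__main__":
    print(build_probe(["film"], ["titl"]))
```

With an empty path the same helper produces a top-level probe such as { star }.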

Supplying multiple fields

If we supply multiple fields, we'll get a separate error for each invalid field.

{
  "query": "{ star spice }"
}
{
  "errors": [
    {
      "message": "Cannot query field \"star\" on type \"Root\". Did you mean \"starship\"?",
      "locations": [
        {
          "line": 1,
          "column": 3
        }
      ]
    },
    {
      "message": "Cannot query field \"spice\" on type \"Root\". Did you mean \"species\"?",
      "locations": [
        {
          "line": 1,
          "column": 8
        }
      ]
    }
  ]
}

This allows us to speed up probing for fields and arguments (up to several thousand words per second with a single thread).
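
A minimal sketch of this batching idea follows: the wordlist is split into buckets, each bucket is sent as a single document, and the errors in the response are mapped back to the probed words. The names chunks and harvest are hypothetical. Counting every word that was not rejected as valid is an assumption that only holds when the server validated the whole document, so a real implementation would need to handle syntax errors and other failures separately.

```python
import re
from typing import Dict, Iterator, List, Sequence, Set

FIELD_ERROR = re.compile(
    r'^Cannot query field "(?P<field>[^"]+)" on type "[^"]+"\.'
    r"(?: Did you mean (?P<hints>.+)\?)?$"
)


def chunks(words: Sequence[str], size: int) -> Iterator[List[str]]:
    """Split the wordlist into buckets of at most size words."""
    for start in range(0, len(words), size):
        yield list(words[start:start + size])


def harvest(words: Sequence[str], response: Dict) -> Set[str]:
    """Collect valid field names from the response to one multi-field probe."""
    found: Set[str] = set()
    rejected: Set[str] = set()
    for error in response.get("errors", []):
        match = FIELD_ERROR.match(error.get("message", ""))
        if match is None:
            continue  # e.g. the "selection of subfields" error of a valid field
        rejected.add(match.group("field"))
        found.update(re.findall(r'"([^"]+)"', match.group("hints") or ""))
    # Words the server did not reject are assumed to be valid fields.
    found.update(word for word in words if word not in rejected)
    return found
```

For the response shown above, harvest(["star", "spice"], response) yields starship and species from a single request.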

Modules description

__main__.py

This module contains the code needed to run clairvoyance from the command line (e.g. python3 -m clairvoyance).

As of 24 Oct 2020 it also contains the code for running clairvoyance() on each of the types. This part should be moved to oracle.py or another module.

oracle.py

This module contains the code for building the schema. There are two main kinds of functions:

  • Functions starting with probe_ perform an HTTP request, do basic response analysis and call functions starting with get_.
  • Functions starting with get_, in turn, work offline and try to extract valid fields / args / ... from error messages.

The clairvoyance() function manages the step-by-step schema construction process (the probe_* and get_* functions).

graphql.py

This module contains classes for various GraphQL concepts such as Schema, Type, Field, InputValue, TypeRef. Each of them has methods for converting to / from JSON.

There is also a Config class, which holds the GraphQL endpoint configuration (e.g. URL, headers, bucket size).
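
To illustrate the to / from JSON conversion mentioned above, here is a minimal sketch of a type reference class. TypeRefSketch is a hypothetical stand-in and does not reproduce the attributes or method signatures of the real TypeRef class in graphql.py; it only mirrors the kind / name / ofType shape that GraphQL introspection uses for type references.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class TypeRefSketch:
    """Hypothetical type reference, e.g. NON_NULL wrapping OBJECT Film."""

    kind: str
    name: Optional[str] = None
    of_type: Optional["TypeRefSketch"] = None

    def to_json(self) -> Dict[str, Any]:
        """Serialize to the nested kind / name / ofType dictionary."""
        return {
            "kind": self.kind,
            "name": self.name,
            "ofType": self.of_type.to_json() if self.of_type else None,
        }

    @classmethod
    def from_json(cls, data: Dict[str, Any]) -> "TypeRefSketch":
        """Rebuild a reference from the nested dictionary form."""
        inner = data.get("ofType")
        return cls(
            kind=data["kind"],
            name=data.get("name"),
            of_type=cls.from_json(inner) if inner else None,
        )
```

A round trip such as TypeRefSketch.from_json(ref.to_json()) should give back an equal object, which makes the conversion easy to unit test.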

Performance

Concerning async, there is still room for improvement; I'll push some changes as I review the integrity of clairvoyance features.

So, a program is either I/O bound or CPU bound.

If it's CPU bound, you want to start multiple processes/workers to make use of each CPU (in Python the GIL makes this harder, etc.).

Clairvoyance is an I/O bound program.

The aiohttp module implements a few recycling mechanisms, but its main advantage over a sync program is that you can proceed with other tasks while waiting for the server. Basically, the requests module blocks the thread until the server has replied completely, and in an HTTP request lifecycle this is the longest part of the elapsed time.

The idea is to send a batch of requests and wait for the whole batch once. Complexity-wise, the waiting time for n requests goes from O(n) to O(1).
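
The batching idea can be sketched with asyncio alone. In the sketch below fake_probe is a hypothetical stand-in that simulates a network round trip with asyncio.sleep instead of a real aiohttp request, so the numbers only illustrate the waiting pattern and are not clairvoyance measurements.

```python
import asyncio
import time
from typing import Dict, List, Tuple


async def fake_probe(document: str, delay: float = 0.05) -> Dict[str, str]:
    """Simulate one HTTP round trip; a real probe would await an aiohttp call."""
    await asyncio.sleep(delay)
    return {"document": document}


async def probe_batch(documents: List[str]) -> List[Dict[str, str]]:
    """Start every probe at once and wait for the whole batch a single time."""
    return await asyncio.gather(*(fake_probe(doc) for doc in documents))


def timed_batch(documents: List[str]) -> Tuple[List[Dict[str, str]], float]:
    """Run one batch and report how long the caller waited for it."""
    started = time.perf_counter()
    results = asyncio.run(probe_batch(documents))
    return results, time.perf_counter() - started


if __name__ == "__main__":
    docs = [f"{{ word{i} }}" for i in range(10)]
    _, elapsed = timed_batch(docs)
    print(f"waited {elapsed:.2f}s for {len(docs)} simulated requests")
```

With ten simulated requests the wait stays close to one round trip rather than ten. In practice the gain is bounded by connection limits and by how many concurrent requests the server tolerates.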

by @c3b5aw