Notes for developers

System requirements

  • just
  • Docker
  • a recent version of Bash (see the macOS notes below)

Local development environment

The just command provides a list of available recipes:

just list

Running any just command that needs a local development environment will set one up and install dependencies automatically.

Testing

Test categories

Tests are divided into the following categories:

  • unit: fast tests of small code units
  • spec: tests generated from the ehrQL spec
  • generative: query engine tests generated by Hypothesis
  • acceptance: tests which demonstrate how ehrQL is used and check compatibility with real studies
  • integration: tests of detailed code logic that require a database
  • docker: tests of the ehrQL docker image
  • docs: tests of the documentation examples

Each category lives in its own directory (for example tests/unit) and has its own just command to run it (for example just test-unit).

Running tests

To run all the tests, as they're run in CI (with code coverage):

just test-all

To run just one category of tests:

just test-<category>

For example:

just test-unit

Additional arguments can be passed to any test commands, for example:

just test-unit --verbose --durations=10

For maximum flexibility, the test command can be used to run individual test files or tests, or to do other clever things with pytest; it simply delegates to pytest. For example:

just test tests/integration/backends/test_tpp.py

There are further notes on using pytest in the wiki here: https://github.com/opensafely-core/ehrql/wiki/Tips-for-using-pytest

Generative tests

The generative tests use Hypothesis to generate variable definitions (in the query model) and test data. They then execute the resulting dataset definitions using all the available query engines, and check that the results are the same.
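
In outline, this is differential testing. Here is a minimal, self-contained sketch of the pattern (with toy functions standing in for the real query engines; none of these names come from the codebase):

    from hypothesis import given, strategies as st

    # Toy stand-ins for the real query engines: both should agree on
    # every input, just as the real engines must agree on every query.
    def engine_a(values):
        return sorted(values)

    def engine_b(values):
        result = list(values)
        result.sort()
        return result

    @given(st.lists(st.integers()))
    def test_engines_agree(values):
        assert engine_a(values) == engine_b(values)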

To get the most out of our generative tests we want them to run for a long time and to explore different parts of the query space each time they're run. But neither of these qualities is desirable in CI or in local test runs. For this reason, the default configuration generates only a small number of examples, of limited query depth, and in a deterministic fashion. They thus function more as a test that the generative test machinery is working correctly than as a serious attempt to do generative testing.

By default, CI generates more examples than the local tests on the basis that the extra time is less noticeable there and it gives us a better chance of actually catching some novel bugs. But the examples are still deterministic for reasons of sanity preservation.

To do some "proper" generative testing you can run the command:

just test-generative

which increases the example count even further and enables randomisation by setting:

GENTEST_RANDOMIZE=t

It would be worth running these locally when adding new query model operations or making significant changes to a query engine. You may even want to crank the settings up further, e.g.

GENTEST_EXAMPLES=10000 GENTEST_MAX_DEPTH=20 just test-generative

You can control which query engines the tests exercise using the environment variable GENTEST_QUERY_ENGINES. For instance, if you have made a change to the basic SQL-building logic in BaseSQLQueryEngine and you want to rapidly test this with a large number of examples, you could compare just the in-memory and SQLite engines using:

GENTEST_QUERY_ENGINES='in_memory sqlite' GENTEST_EXAMPLES=10000 just test-generative

In addition to whatever you do locally, a scheduled GitHub Actions workflow runs the generative tests overnight with settings as high as we can get away with and alerts us in Slack if it finds a failure.

You can get Hypothesis to dump statistics at the end of the run with --hypothesis-show-statistics, or (more usefully) dump some of our own statistics about the generated data and queries by setting GENTEST_DEBUG=t.

When debugging a failure you'll probably want to reproduce it. Hypothesis often struggles to shrink the examples it finds, and even small examples can appear overwhelmingly verbose due to the repetitive nature of query model reprs. To help with this there is some tooling, and a process to follow:

  • Copy the population, variable and data arguments from the example that Hypothesis displays and paste them into a new file. (Don't worry about stripping indentation or trailing commas here.)

  • Run the command:

    just gentest-example-simplify PATH_TO_FILE.py
    

    This should transform the copied code into a valid test example and pull out some of the repeated elements into variables.

  • Run the command:

    just gentest-example-run PATH_TO_FILE.py
    

    This should execute the example and confirm that the test fails in the expected way.

  • To further simplify the example you can copy a repeated element, assign it to a variable, and then re-run gentest-example-simplify on the file. This will replace occurrences of that element with a reference to the variable, as sketched below.
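
For illustration, a hypothetical before/after (the node spellings are loosely based on query model reprs; assume patients is a table node copied from the failing example):

    # Before: the same subexpression appears twice.
    variable = Function.And(
        Function.GT(SelectColumn(patients, "i"), Value(1)),
        Function.LT(SelectColumn(patients, "i"), Value(9)),
    )

    # After assigning the repeated element to a variable and re-running
    # just gentest-example-simplify:
    column = SelectColumn(patients, "i")
    variable = Function.And(
        Function.GT(column, Value(1)),
        Function.LT(column, Value(9)),
    )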

Hypothesis can generate query graphs that are very deeply nested; after 100 draws in a single test example, Hypothesis will reject the example as invalid. To avoid this, the variable strategies check for a maximum depth and return a terminal node if it is exceeded (a SelectColumn node for a series strategy, and a SelectTable or SelectPatientTable for a table strategy). The maximum depth defaults to 15 and can be overridden with the GENTEST_MAX_DEPTH environment variable.
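
A toy sketch of that depth-limiting idea (illustrative only, not the real strategies; the node names are stand-ins):

    import os

    from hypothesis import strategies as st

    MAX_DEPTH = int(os.environ.get("GENTEST_MAX_DEPTH", "15"))

    def series(depth=0):
        # The terminal node a series strategy falls back to at max depth.
        terminal = st.just("SelectColumn")
        if depth >= MAX_DEPTH:
            return terminal
        # Otherwise, either stop here or draw a compound node whose
        # children are drawn one level deeper.
        compound = st.tuples(
            st.just("Add"),
            st.deferred(lambda: series(depth + 1)),
            st.deferred(lambda: series(depth + 1)),
        )
        return terminal | compound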

See the generative tests documentation for more details.

Writing tests

Please think carefully about how to test code that you are adding or changing. We think that test maintainability is a big risk for this system, so we're trying to be very deliberate about the kind of tests that we write. You should follow these guidelines, and raise any that you think are problematic with the rest of the team for discussion.

  • Major features in ehrQL should be motivated by a study used in an acceptance test. The number of such studies should be kept small so that they don't become a maintenance burden. The studies we use for the acceptance tests will need to be chosen carefully as representative of how we expect ehrQL to be used; they may be real studies or synthetic ones as appropriate.
  • All ehrQL features should be covered by spec tests.
  • Complex query language logic that is not fully covered by the spec tests should be covered by unit tests. To avoid duplication, you should not write unit tests for logic that is adequately covered by the spec tests.
  • The main functionality of query engines will naturally be covered by spec tests, which are run against all the query engines.
  • Complex query engine logic that is not fully covered by the spec tests should be covered by unit or integration tests as appropriate. To avoid duplication, you should not write such tests for logic that is adequately covered by the spec tests.
  • Basic operation of ehrQL as a CLI tool is exercised by one trivial embedded acceptance test.
  • Functionality of the Docker image and how it invokes ehrQL are covered by a small set of docker tests.
  • Where backend tables do not map directly to the schemas that they implement, it may be helpful to write integration tests.
  • All other supporting logic should be covered by unit tests. Please avoid the temptation to cover this using acceptance, integration or docker tests that run the ehrQL end-to-end.

Contrary to practice in some quarters, we allow disk access in unit tests because it doesn't seem to cause any significant slow-down in those tests at the moment. We'll keep this under review.

Logging in tests

Logging is very verbose and is turned off by default in tests. To turn it on, set the environment variable LOG_SQL=1 and pass the -s option to turn off log capture in pytest.

LOG_SQL=1 just test-all -s

Codebase structure

The files for test categories that target individual modules (for example unit and integration tests) are organized into roughly the same directory structure as the ehrql package itself.

Generally a module ehrql.foo will have a corresponding test file like tests/unit/test_foo.py. However we do not stick slavishly to this: where appropriate we may collapse tests for submodules like ehrql.foo.{bar,bam} into a single test file like tests/unit/test_foo.py, or break tests for a module like ehrql.foo into multiple test files like tests/unit/foo/test_{one,another}_aspect.py.

Test categories that run against the ehrQL as a whole or against multiple components (for example spec and acceptance tests) have their own internal structure.

Code coverage

Our approach to code coverage is to fail the build if coverage is below 100%, but to be reasonably liberal about allowing lines to be marked as not requiring coverage. If you make a change that results in a newly uncovered line, you should make a good attempt to test it and expect to have to justify any omissions to PR reviewers; but for genuinely hard- or impossible-to-hit cases it's okay to mark them as no cover.

Any no cover pragmas should include a note explaining why the code can't be hit. Common cases are configured in pyproject.toml with their own explanations.
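
For example (hypothetical code, showing the kind of note expected alongside a pragma):

    import sys

    if sys.platform == "win32":  # pragma: no cover (tests never run on Windows)
        LINE_SEP = "\r\n"
    else:
        LINE_SEP = "\n"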

Test databases

For tests that need to run against a database, we run the database in a Docker container. Each run of the tests starts the database if it's not already running and then leaves it running at the end to speed up future runs. (Each test cleans out the schema to avoid pollution.)

There is a just command to remove the database containers:

just remove-database-containers

Displaying SQL queries

Set the environment variable LOG_SQL=1 (or anything non-empty) to get all SQL queries logged to the console. To get SQL queries in test runs, also use -s to turn off log capture in pytest.

macOS

Starting with version 4.0, Bash is licensed under GPLv3. Because of this, macOS still ships with version 3.2, which is incompatible with some scripts in this repository. We recommend using Homebrew to install a more recent version; make sure the new version is on your $PATH, and restart your terminal/shell session if necessary.

brew install bash

Static Type Checking

We previously used mypy and type annotations to perform correctness checking of the code base. However, we made the decision to remove this stack after finding it was not a good fit for large parts of the code base.

This does not mean we've abandoned type annotations entirely. The query_model module still makes heavy use of them and implements its own runtime checking to enforce them. And developers should feel free to use them wherever this aids clarity vs a docstring or a comment.
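
As a rough sketch of what runtime enforcement of annotations can look like (illustrative only, not the actual query_model implementation):

    from dataclasses import dataclass, fields

    def check_field_types(instance):
        # Compare each field's value against its declared annotation.
        for field in fields(instance):
            value = getattr(instance, field.name)
            if not isinstance(value, field.type):
                raise TypeError(f"{field.name} must be {field.type.__name__}")

    @dataclass(frozen=True)
    class Node:
        name: str

        def __post_init__(self):
            check_field_types(self)

    Node(name="date_of_birth")  # fine
    Node(name=123)              # raises TypeError at construction time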

Dataclasses have also retained their annotations to avoid initialising all fields with None.

Documentation

The documentation in this repository forms part of the main OpenSAFELY documentation.

It can also be built as a standalone documentation site with MkDocs to preview content changes, by running:

just docs-serve

⚠️ The documentation will look slightly different from OpenSAFELY's. Relative links to sections of the main documentation outside of the /ehrql sections will not work (although a scheduled GitHub Action runs overnight to check them).

We may be able to improve this later, depending on the behaviour of the mkdocs plugin that we use: see #1126

Documentation redirects

These are handled in the main OpenSAFELY documentation repository. If you need to redirect URLs — and this should be fairly infrequent — make any changes to the _redirects file in the main documentation repository, and test them in a preview there.

Structure

ehrQL documentation is located in the docs directory. Local configuration is specified in the mkdocs.yml located at the repo root.

The docs/ directory contains some files which are generated from the ehrql code and from other documentation files. The process for generating these files is described below.

When the main OpenSAFELY documentation is built, it imports the ehrql docs/ directory and builds it within the main documentation site. This assumes that all generated documentation has been updated already (see below for a description of the pre-commit hooks and GitHub Actions that check this happens).

Process for updating ehrql documentation

  1. Developer makes changes to documentation files or code/tests that generate documentation
  2. Changes committed; pre-commit hook ensures generated docs are up to date
  3. PR opened; CI:
    • ensures generated docs are up to date
  4. PR merged; CI:
    • triggers a deploy of the main OpenSAFELY documentation site
  5. On a schedule (weekly), CI in the main opensafely/documentation repository:
    • checks all the documentation links are valid

Using includes from the parent documentation

The most likely use case is including the glossary.md from the parent documentation.

To do so, use a slightly different snippet syntax:

!!! parent_snippet:'includes/glossary.md'

Generating data for documentation

Some ehrQL documentation is generated from code in this repo.

See the spec tests docs for further information on writing tests that contribute to the ehrQL docs.

An intermediate step generates the markdown files that are included in the documentation.

To generate these files, run:

just generate-docs

This generates the markdown files in docs/includes/generated_docs.

This command runs as a pre-commit hook and will fail if there are any changes to the generated markdown files. It is a developer's responsibility to update the generated docs in their PR if required. There is also a CI step that will check that the documentation is up to date.

Testing ehrQL definitions included in the documentation

All of the example tests can be run with:

just test-docs-examples

Dataset and measures definitions may be included:

  • inline in Markdown files in docs/, labelled as code blocks with the ehrql syntax,
  • or as Python .py files in docs/.

Examples using codelist_from_csv()

When testing the examples, codelist_from_csv() is currently patched out so that it works without any CSV file, and codelist codes are not validated.

The arguments of codelist_from_csv() calls in examples are, however, checked against the function signature.

This may be improved in future to make the testing more rigorous; see #1694.

Inline code blocks (Markdown fences)

Examples in the documentation Markdown source will be tested as part of the test suite if you place complete examples in a code block with the ehrql syntax label: ```ehrql

This will still highlight the code as if it were Python.
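
For example, a complete, minimal definition in such a block might look like this (a sketch; the table and method names assume ehrQL's core schema):

```ehrql
from ehrql import create_dataset
from ehrql.tables.core import patients

dataset = create_dataset()
dataset.define_population(patients.date_of_birth.is_on_or_after("1940-01-01"))
```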

⚠️ The ehrql syntax label is for inline and complete ehrQL blocks only.

We use the SuperFences extension for extracting Markdown fences. Refer to the SuperFences documentation for more details of the fence format.

ehrQL definitions as included Python files

Python files in the docs/ directory are assumed to be working ehrQL dataset or measures definitions.

They are also tested in the test suite.

If included in the documentation using the snippet syntax, they must be used with a python syntax label. (If they were labelled as ehrql, the snippet line itself would be extracted from the Markdown, and treated as a dataset definition.)
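
For illustration, assuming the standard pymdown-extensions snippets notation is the include mechanism in use (and with a hypothetical path), an included file looks like:

```python
--8<-- 'docs/example-dataset-definition.py'
```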

Updating the main OpenSAFELY documentation repository

Merges to the main branch in this repo trigger a deployment of the main OpenSAFELY documentation via a GitHub Action.