Mixing SQL queries into your Python code may not be the ideal approach in every scenario. Sometimes, though, an ORM just doesn't meet your complex querying needs and a SQL query is much simpler.

Regardless of the merits of using ORMs or not, this post assumes you are already in the situation where you have a complex SQL query sitting in your Python application somewhere, and you want to test it just like you test your other functions.


# Setting the scene

Let's begin by motivating the examples in this post.

We're currently working on a Flask app and we've been asked to integrate a new data source into an existing view in the app. We take a look at the endpoint that renders the template for the view and this is what we're greeted with:

```python
# Dummy Flask app setup code
import sqlite3

from flask import Flask, g, request

DATABASE = 'database.db'

# for a real app we'd use the factory pattern
app = Flask(__name__)

# quickstart from https://flask.palletsprojects.com/en/2.2.x/patterns/sqlite3/
def get_db() -> sqlite3.Connection:
    # reuse the connection stored on the application context, if there is one
    db = getattr(g, '_database', None)
    if db is None:
        db = g._database = sqlite3.connect(DATABASE)
    return db

@app.teardown_appcontext
def close_connection(exception):
    # close the connection when the application context is torn down
    db = getattr(g, '_database', None)
    if db is not None:
        db.close()
```


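```python
@app.get('/')
def home():
    # named filter_value so we don't shadow the built-in filter()
    filter_value = request.args.get('filter')
    query = "SELECT some_column FROM some_table WHERE some_column = ?"
    # the value is bound to the ? placeholder by the sqlite3 driver
    cur = get_db().execute(query, [filter_value])
    res = cur.fetchall()
    return res
```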

A user sends a request to `/?filter=columnvalue` and on the server side we send the following query to our DB: 
```sql
SELECT some_column FROM some_table WHERE some_column = 'columnvalue'
```
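To make the placeholder behaviour concrete, here is a minimal sketch using only the standard library. The in-memory database and the two sample rows are assumptions made for illustration; they are not part of the app above.

```python
import sqlite3

# Hypothetical in-memory stand-in for the app's database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE some_table (some_column TEXT)")
conn.executemany(
    "INSERT INTO some_table (some_column) VALUES (?)",
    [("columnvalue",), ("othervalue",)],
)

query = "SELECT some_column FROM some_table WHERE some_column = ?"

# The driver binds the value to the ? placeholder for us.
bound_rows = conn.execute(query, ["columnvalue"]).fetchall()

# The same query with the value written inline as a literal.
literal_rows = conn.execute(
    "SELECT some_column FROM some_table WHERE some_column = 'columnvalue'"
).fetchall()

print(bound_rows)  # [('columnvalue',)]
```

Both calls should return the same rows, which is all the endpoint relies on.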

We've been asked to add `another_table` to the query and so I decide something like this works pretty well for what we need:

```sql
SELECT some_column FROM some_table WHERE some_column = 'columnvalue'
UNION
SELECT some_column FROM another_table WHERE some_column = 'columnvalue'
```
(let's assume `another_table` has the same schema as `some_table` here)

This is pretty simple, and going through the effort of adding a test for this just doesn't seem worth it. I'm feeling pretty confident that my new query will work fine after testing it manually in my local dev environment.

Fair enough.
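For reference, a manual check like that might look something like the sketch below. It assumes the union is written as two `SELECT` statements joined by `UNION`, and it uses a throwaway in-memory database with made-up rows rather than the real dev database.

```python
import sqlite3

# Throwaway in-memory database; tables and rows are made up for this check.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE some_table (some_column TEXT);
    CREATE TABLE another_table (some_column TEXT);
    INSERT INTO some_table VALUES ('columnvalue'), ('othervalue');
    INSERT INTO another_table VALUES ('columnvalue'), ('thirdvalue');
    """
)

query = """
SELECT some_column FROM some_table WHERE some_column = ?
UNION
SELECT some_column FROM another_table WHERE some_column = ?
"""

value = "columnvalue"
# One bound value per placeholder, so the filter is passed twice.
rows = conn.execute(query, [value, value]).fetchall()
print(rows)  # [('columnvalue',)]

# UNION ALL keeps duplicates, so the match from each table shows up.
rows_all = conn.execute(query.replace("UNION", "UNION ALL"), [value, value]).fetchall()
print(len(rows_all))  # 2
```

Even for a query this small, the check surfaces a decision: `UNION` drops duplicate rows while `UNION ALL` keeps them, and which one is right depends on what the view needs.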

But here's a query that I'd want to be more careful with:

```sql
-- taken from https://learnsql.com/blog/cte-with-examples/
WITH avg_position AS (
    SELECT position, AVG(bonus) AS average_bonus_for_position
    FROM bonus_jan
    GROUP BY position),
    avg_region AS (
    SELECT region, AVG (bonus) AS average_bonus_for_region
    FROM bonus_jan
    GROUP BY region)   
SELECT b.employee_id, b.first_name, b.last_name, b.position, b.region, b.bonus, ap.average_bonus_for_position, ar.average_bonus_for_region
FROM bonus_jan b
JOIN avg_position ap
ON b.position = ap.position
JOIN avg_region ar
ON b.region = ar.region;
```
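To get a feel for what this query returns, here is a small sketch that runs it against an in-memory SQLite database. The `bonus_jan` columns are inferred from the query itself, and the three employees are invented for illustration, so treat it as a rough check rather than a faithful copy of the original dataset.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Schema inferred from the columns the query selects; the rows are made up.
conn.execute(
    """
    CREATE TABLE bonus_jan (
        employee_id INTEGER, first_name TEXT, last_name TEXT,
        position TEXT, region TEXT, bonus REAL
    )
    """
)
conn.executemany(
    "INSERT INTO bonus_jan VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "Ann", "Lee", "Engineer", "North", 100),
        (2, "Bob", "Ray", "Engineer", "South", 200),
        (3, "Cid", "Fox", "Analyst", "North", 300),
    ],
)

query = """
WITH avg_position AS (
    SELECT position, AVG(bonus) AS average_bonus_for_position
    FROM bonus_jan
    GROUP BY position),
    avg_region AS (
    SELECT region, AVG(bonus) AS average_bonus_for_region
    FROM bonus_jan
    GROUP BY region)
SELECT b.employee_id, b.first_name, b.last_name, b.position, b.region, b.bonus,
       ap.average_bonus_for_position, ar.average_bonus_for_region
FROM bonus_jan b
JOIN avg_position ap ON b.position = ap.position
JOIN avg_region ar ON b.region = ar.region
"""

rows = conn.execute(query).fetchall()

# Map employee_id -> (average for position, average for region).
# The query has no ORDER BY, so we key by id instead of relying on row order.
averages = {row[0]: (row[6], row[7]) for row in rows}
print(averages[1])  # (150.0, 200.0)
```

With the rows above, the Engineer average is (100 + 200) / 2 = 150 and the North average is (100 + 300) / 2 = 200, so employee 1 can be checked by hand.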

When the query is an amalgamation of dozens of business rules and domain-specific quirks of the data, it can end up looking quite unwieldy. I've found that testing is a huge help in understanding the expected behaviour, reducing iteration time in development and increasing confidence in my changes.

The goal of this post is to work through an example of a more realistic query that you might encounter in production. We'll first go through the manual process of testing this query to ensure it returns the data we expect. Then we'll automate those steps with pytest and incorporate them into the rest of our test suite. Finally, we'll make some changes to the query to demonstrate how adding tests helps with development and maintenance.