# Multiple input sets

In the [previous chapter](./1.2%20Eval%20-%20multiple%20i-o.ipynb) we
constructed a SQL query that can evaluate a neural network with multiple input
and output neurons. This works nicely, but the query can only handle one set of
input values at a time. It would be nice if we could pass multiple input sets at
once, as we can with a PyTorch model. In theory, the database engine could then
parallelize the evaluation across input sets for free.

## The Neural Network

We'll start from the same neural network as before.

In [1]:
import torch
import numpy as np
import matplotlib.pyplot as plt
import utils.sqlite as db
import utils.nn as nn
import pandas as pd

torch.manual_seed(223)

def f(x, y):
    return [2*x, 4*y]

num_samples = 100
x_train = torch.randn(num_samples, 2) * 100
y_train = [f(x,y) for [x,y] in x_train]

model = nn.ReLUFNN(input_size=2, output_size=2, hidden_size=2, num_hidden_layers=1)
nn.train(model, x_train, y_train, save_path="models/eval_multiple_sets.pt")
db.load_pytorch_model_into_db(model)

## The query

We'll look at a query that handles two sets of input at the same time. The full
query looks like this:

```sql
WITH input_values AS (
    SELECT 0 AS input_set_id, 1 AS input_node_idx, ? AS input_value
    UNION 
    SELECT 0 AS input_set_id, 2 AS input_node_idx, ? AS input_value
    UNION
    SELECT 1 AS input_set_id, 1 AS input_node_idx, ? AS input_value
    UNION 
    SELECT 1 AS input_set_id, 2 AS input_node_idx, ? AS input_value
),
input_nodes AS (
    SELECT
        id,
        bias,
        ROW_NUMBER() OVER (ORDER BY id) AS input_node_idx
    FROM node
    WHERE id NOT IN
    (
        SELECT dst FROM edge
    )
),
t1 AS (
    SELECT
        v.input_set_id AS input_set_id,
        MAX(
            0,
            n.bias + SUM(e.weight * v.input_value)
        ) AS t1,
        e.dst AS id
    FROM edge e
    JOIN input_nodes i ON i.id = e.src
    JOIN node n ON e.dst = n.id
    JOIN input_values v ON i.input_node_idx = v.input_node_idx
    GROUP BY e.dst, n.bias, v.input_set_id
),
outputs AS (
    SELECT
        t1.input_set_id AS input_set_id,
        n.bias + SUM(e.weight * t1.t1) AS output_value,
        e.dst AS output_node_id
    FROM edge e
    JOIN t1 ON t1.id = e.src
    JOIN node n ON e.dst = n.id
    GROUP BY e.dst, n.bias, t1.input_set_id
)
SELECT * FROM outputs ORDER BY input_set_id, output_node_id;
```

The main difference from the previous iterations of the query is that each input
value now carries a numeric index, `input_set_id`. This index is propagated
through the $t_l$ queries and added to each `GROUP BY` clause, so every sum is
computed per input set. The result is a relation that holds the output values
for each input set.
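
To make the effect of grouping by `input_set_id` concrete, here is a minimal, self-contained sketch of the `t1` step on a hand-made network. It assumes an in-memory SQLite database with `node(id, bias)` and `edge(src, dst, weight)` tables that mirror the columns used in the query. The weights, biases, and inputs are made up for illustration and are not taken from the trained model.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE node (id INTEGER PRIMARY KEY, bias REAL);
    CREATE TABLE edge (src INTEGER, dst INTEGER, weight REAL);
    -- Illustrative values: nodes 1 and 2 are inputs, nodes 3 and 4 the next layer.
    INSERT INTO node VALUES (1, 0.0), (2, 0.0), (3, 0.5), (4, -1.0);
    INSERT INTO edge VALUES (1, 3, 1.0), (2, 3, 2.0), (1, 4, -1.0), (2, 4, 0.5);
""")

query = """
WITH input_values AS (
    SELECT 0 AS input_set_id, 1 AS input_node_idx, ? AS input_value
    UNION
    SELECT 0 AS input_set_id, 2 AS input_node_idx, ? AS input_value
    UNION
    SELECT 1 AS input_set_id, 1 AS input_node_idx, ? AS input_value
    UNION
    SELECT 1 AS input_set_id, 2 AS input_node_idx, ? AS input_value
),
input_nodes AS (
    SELECT id, ROW_NUMBER() OVER (ORDER BY id) AS input_node_idx
    FROM node
    WHERE id NOT IN (SELECT dst FROM edge)
)
SELECT
    v.input_set_id AS input_set_id,
    e.dst AS id,
    MAX(0, n.bias + SUM(e.weight * v.input_value)) AS t1
FROM edge e
JOIN input_nodes i ON i.id = e.src
JOIN node n ON e.dst = n.id
JOIN input_values v ON i.input_node_idx = v.input_node_idx
GROUP BY e.dst, n.bias, v.input_set_id
ORDER BY v.input_set_id, e.dst;
"""

# Two input sets: (1, 10) and (5, 2), flattened in placeholder order.
rows = con.execute(query, [1.0, 10.0, 5.0, 2.0]).fetchall()
for row in rows:
    print(row)
```

Each `(input_set_id, id)` pair gets its own row. For input set 1, node 4 evaluates to `-1.0 + (-1.0 * 5) + (0.5 * 2) = -5.0` before the activation, so the `MAX(0, ...)` clamps it to zero, while input set 0 is unaffected by that.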

The following code constructs this query by checking how many input sets are
passed and how many layers are present in the neural network.

In [2]:
# input_value is a list of input sets, each of which is a list of input values
def eval_nn(model, input_value):
    # Assumes the state dict holds one weight and one bias tensor per layer
    num_layers = len(model.state_dict()) // 2

    input_clauses = []
    for input_set, values in enumerate(input_value):
        for i, _ in enumerate(values):
            input_clauses.append(f"""
                SELECT
                    {input_set} AS input_set_id,
                    {i + 1} AS input_node_idx,
                    ? AS input_value
            """)

    query = f"""
        WITH input_values AS (
            {" UNION ".join(input_clauses)}
        ),
        input_nodes AS (
            SELECT
                id,
                bias,
                ROW_NUMBER() OVER (ORDER BY id) AS input_node_idx
            FROM node
            WHERE id NOT IN
            (
                SELECT dst FROM edge
            )
        ),
        t1 AS (
            SELECT
                v.input_set_id AS input_set_id,
                MAX(
                    0,
                    n.bias + SUM(e.weight * v.input_value)
                ) AS t1,
                e.dst AS id
            FROM edge e
            JOIN input_nodes i ON i.id = e.src
            JOIN node n ON e.dst = n.id
            JOIN input_values v ON i.input_node_idx = v.input_node_idx
            GROUP BY e.dst, n.bias, v.input_set_id
        ),
        """

    for hidden_layer in range(2, num_layers):
        curr = hidden_layer
        prev = hidden_layer - 1
        query += f"""
            t{curr} AS (
                SELECT
                    t{prev}.input_set_id AS input_set_id,
                    MAX(
                        0,
                        n.bias + SUM(e.weight * t{prev}.t{prev})
                    ) AS t{curr},
                    e.dst AS id
                FROM edge e
                JOIN t{prev} ON t{prev}.id = e.src
                JOIN node n ON e.dst = n.id
                GROUP BY e.dst, n.bias, t{prev}.input_set_id
            ),
        """

    prev = num_layers - 1
    query += f"""
        outputs AS (
            SELECT
                t{prev}.input_set_id AS input_set_id,
                n.bias + SUM(e.weight * t{prev}.t{prev}) AS output_value,
                e.dst AS output_node_id
            FROM edge e
            JOIN t{prev} ON t{prev}.id = e.src
            JOIN node n ON e.dst = n.id
            GROUP BY e.dst, n.bias, t{prev}.input_set_id
        )
        SELECT * FROM outputs ORDER BY input_set_id, output_node_id;
    """

    # Flatten the input sets in the same order as the `?` placeholders
    args = [value for input_set in input_value for value in input_set]

    # One list of output values per input set
    results = [[] for _ in range(len(input_value))]
    for row in db.con.execute(query, args).fetchall():
        (input_set_id, output, output_node_id) = row
        results[input_set_id].append(output)

    return np.array(results)
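
The link between the generated `?` placeholders and the flattened argument list is easy to miss, so here is a small standalone sketch of just the `input_values` part. The helper name `build_input_values` is hypothetical and not part of the chapter's `utils` modules. The sketch uses Python's built-in `sqlite3` module with an in-memory database.

```python
import sqlite3


def build_input_values(input_sets):
    """Return the SQL for the input_values relation and its flattened arguments."""
    clauses = []
    args = []
    for set_id, values in enumerate(input_sets):
        # Input node indices are 1-based, matching ROW_NUMBER() in the main query.
        for idx, value in enumerate(values, start=1):
            clauses.append(
                f"SELECT {set_id} AS input_set_id, "
                f"{idx} AS input_node_idx, ? AS input_value"
            )
            # Appending here keeps args in the same order as the placeholders.
            args.append(value)
    return " UNION ".join(clauses), args


sql, args = build_input_values([[1, 10], [5, 20]])

con = sqlite3.connect(":memory:")
rows = con.execute(f"{sql} ORDER BY input_set_id, input_node_idx", args).fetchall()
print(rows)
```

SQLite binds positional `?` parameters in order of appearance, so the arguments must be flattened in the same nested order in which the clauses were generated. SQLite also caps the number of terms in a compound `SELECT` (500 by default), so very large batches may need to be split into several calls.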


We can now use this query to pass multiple input sets at the same time. The
query's output contains the output values for every input set.

In [3]:
nn_output = model(torch.tensor([[1, 10], [5,20]], dtype=torch.float32)).detach().numpy()
sql_output = eval_nn(model, [[1, 10], [5, 20]])

print(f"The neural network predicted {nn_output}")
print(f"The SQL query calculated {sql_output}")

The neural network predicted [[-13.467458  34.828564]
 [-24.508732  54.89358 ]]
The SQL query calculated [[-13.46745742  34.82856474]
 [-24.50873458  54.89358546]]


## Adding more layers

As we did in the previous chapter, let's try it with a larger model as well,
this time with 100 hidden layers of 10 neurons each.

In [4]:
bigger_model = nn.ReLUFNN(input_size=2, output_size=2, hidden_size=10, num_hidden_layers=100)
nn.train(bigger_model, x_train, y_train, save_path="models/eval_multiple_sets_bigger.pt")
db.load_pytorch_model_into_db(bigger_model)

Comparing the results again:

In [5]:
nn_output = bigger_model(torch.tensor([[1, 10], [5,20]], dtype=torch.float32)).detach().numpy()
sql_output = eval_nn(bigger_model, [[1, 10], [5, 20]])

print(f"The neural network predicted {nn_output}")
print(f"The SQL query calculated {sql_output}")

The neural network predicted [[-33.277645 -32.772892]
 [-33.27765  -32.772896]]
The SQL query calculated [[-33.27764824 -32.77289471]
 [-33.27764824 -32.77289471]]


And finally, let's compare a lot of input values at once. We did something
similar in the previous chapter, but there we had to pass the input sets one by
one in a for-loop. With our new implementation, the call that evaluates the
model in SQL is almost identical to the PyTorch call.

In [6]:
df = pd.DataFrame(columns=['in_1', 'in_2', 'nn_out_1', 'nn_out_2', 'sql_out_1', 'sql_out_2'])

nn_output = bigger_model(x_train).detach().numpy()
sql_output = eval_nn(bigger_model, x_train.tolist())

for i in range(0, len(x_train)):
    df.loc[i] = [
        x_train[i][0], x_train[i][1],
        nn_output[i][0], nn_output[i][1],
        sql_output[i][0], sql_output[i][1]
    ]

delta_1_avg = (abs(df['nn_out_1'] - df['sql_out_1'])).mean()
delta_2_avg = (abs(df['nn_out_2'] - df['sql_out_2'])).mean()

print(f"The average difference for output value 1 is {delta_1_avg}")
print(f"The average difference for output value 2 is {delta_2_avg}")

The average difference for output value 1 is 1.9073486328125e-06
The average difference for output value 2 is 1.9073486328125e-06
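
Since both results are plain arrays of the same shape, the comparison can also be sketched without a DataFrame. The snippet below is self-contained and reuses the four output values printed for the small model earlier in this chapter as stand-in data; the tolerances are an assumption chosen for illustration. The small deltas are likely due to PyTorch computing in 32-bit floats by default, while SQLite stores `REAL` values as 64-bit floats.

```python
import numpy as np

# Stand-in data: the values printed for the small model earlier in the chapter.
nn_output = np.array(
    [[-13.467458, 34.828564], [-24.508732, 54.89358]], dtype=np.float32
)
sql_output = np.array(
    [[-13.46745742, 34.82856474], [-24.50873458, 54.89358546]], dtype=np.float64
)

# Mean absolute difference per output neuron (one entry per column).
mean_abs_diff = np.abs(nn_output - sql_output).mean(axis=0)
print(mean_abs_diff)

# Single pass/fail check with tolerances sized for float32 precision.
matches = np.allclose(nn_output, sql_output, rtol=1e-5, atol=1e-5)
print(matches)
```

With the notebook's own variables, the same two expressions would apply directly to `nn_output` and `sql_output` from the cell above.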


## Conclusion

Now that our `eval_nn` function in SQL is feature-complete, we'll finally look
at removing the repetition in the query in the next chapter. Specifically,
we'll use SQL's `WITH RECURSIVE` construct.