# SQL With Pandas
A notebook with some examples of matrix operations written in SQL and run against pandas DataFrames through pandasql. Keeping it nice and simple.

In [1]:
import pandas as pd
import numpy as np
import pandasql

In [2]:
np.random.seed(4)

Suppose you have two matrices, A and B. For the sake of argument, say they are both square matrices of size 5. We can use SQL syntax to represent them in a database table and perform matrix operations such as addition and multiplication.

In [3]:
## Set up two 5x5 matrices, A and B, in long format:
## one record per cell, tagged with the matrix it belongs to
n = 5
cells = [(r, c) for r in range(n) for c in range(n)]  # row-major cell order
data_dict = {"row": [r for r, _ in cells] * 2,
             "col": [c for _, c in cells] * 2,
             "val": np.random.randint(10, size=2 * n * n),
             "matrix": ["A"] * (n * n) + ["B"] * (n * n)}

In [4]:
data_df = pd.DataFrame(data_dict)
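
The table above stores one record per matrix cell. As a rough sketch of how that layout could be produced from an ordinary 2-D array, the hypothetical helper `to_long` below (not part of the original notebook) flattens an array into the same `row`, `col`, `val`, `matrix` columns. The small hand-picked matrices are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

def to_long(values, name):
    """Flatten a 2-D array into (row, col, val, matrix) records.

    `to_long` is a hypothetical helper, not part of the original notebook.
    """
    values = np.asarray(values)
    # np.indices returns one grid of row numbers and one of column numbers
    rows, cols = np.indices(values.shape)
    return pd.DataFrame({
        "row": rows.ravel(),
        "col": cols.ravel(),
        "val": values.ravel(),
        "matrix": name,
    })

# Stack two small matrices into one table, the way data_df holds A and B
small_df = pd.concat(
    [to_long([[1, 2], [3, 4]], "A"), to_long([[5, 6], [7, 8]], "B")],
    ignore_index=True,
)
```

Keeping both matrices in one table with a `matrix` tag is what lets the queries below self-join the table under two aliases.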

## Matrix Addition/Subtraction Operations
$$
\mathbf{A} \pm \mathbf{B} = \{ A_{ij} \pm B_{ij}\}_{i,j}^{M,N}
$$
for row index $i = 1, \dots, M$ and column index $j = 1, \dots, N$. Both matrices must have the same shape.

In [5]:
the_query = "SELECT A.row, A.col, A.val + B.val"
the_query += " FROM data_df as A, data_df as B"
the_query += " WHERE A.matrix = 'A' AND B.matrix = 'B'"
the_query += " AND A.row = B.row AND A.col = B.col"
the_query += " LIMIT 10;"
pandasql.sqldf(the_query, locals())

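|   | row | col | A.val + B.val |
|---|-----|-----|---------------|
| 0 | 0 | 0 | 15 |
| 1 | 0 | 1 | 7 |
| 2 | 0 | 2 | 6 |
| 3 | 0 | 3 | 16 |
| 4 | 0 | 4 | 8 |
| 5 | 1 | 0 | 10 |
| 6 | 1 | 1 | 9 |
| 7 | 1 | 2 | 9 |
| 8 | 1 | 3 | 15 |
| 9 | 1 | 4 | 10 |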


In [6]:
M_a = np.matrix(data_dict["val"][:25]).reshape((5,5))
M_b = np.matrix(data_dict["val"][25:]).reshape((5,5))

In [7]:
M_a + M_b

matrix([[15,  7,  6, 16,  8],
        [10,  9,  9, 15, 10],
        [ 8,  9, 11,  6,  5],
        [ 8, 13,  5,  1,  7],
        [ 6, 14, 18,  6, 13]])
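
The heading covers subtraction as well, which only needs the `+` swapped for a `-`. The sketch below assumes plain SQLite through the standard-library `sqlite3` module instead of pandasql, a hypothetical table named `data`, and hand-picked 2x2 values rather than the notebook's random data.

```python
import sqlite3

# Hand-picked 2x2 values for illustration; not the notebook's random data
records = [
    (0, 0, 1, "A"), (0, 1, 2, "A"), (1, 0, 3, "A"), (1, 1, 4, "A"),
    (0, 0, 5, "B"), (0, 1, 6, "B"), (1, 0, 7, "B"), (1, 1, 8, "B"),
]

conn = sqlite3.connect(":memory:")
# "row" and "col" are quoted as a precaution against keyword clashes
conn.execute(
    'CREATE TABLE data ("row" INTEGER, "col" INTEGER, val INTEGER, matrix TEXT)'
)
conn.executemany("INSERT INTO data VALUES (?, ?, ?, ?)", records)

query = """
    SELECT A."row", A."col", A.val - B.val AS diff
    FROM data AS A, data AS B
    WHERE A.matrix = 'A' AND B.matrix = 'B'
      AND A."row" = B."row" AND A."col" = B."col"
    ORDER BY A."row", A."col";
"""
difference = conn.execute(query).fetchall()
conn.close()
```

The `ORDER BY` is there because SQL does not promise any row order without it, and the `AS diff` alias gives the result column a readable name.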

## Matrix Dot Product
For matrices **A** and **B**, 
$$\mathbf{A} = \begin{bmatrix} 
a_{11} & a_{12}\\
a_{21} & a_{22}\\
\end{bmatrix}$$

$$\mathbf{B} = \begin{bmatrix} 
b_{11} & b_{12}\\
b_{21} & b_{22}\\
\end{bmatrix}$$

\begin{align}
\mathbf{dot}(\mathbf{A},\mathbf{B}) &= \mathbf{A}\cdot\mathbf{B} = \begin{bmatrix} 
a_{11}\cdot b_{11} + a_{12}\cdot b_{21} & a_{11}\cdot b_{12} + a_{12}\cdot b_{22} \\
a_{21}\cdot b_{11} + a_{22}\cdot b_{21} & a_{21}\cdot b_{12} + a_{22}\cdot b_{22} \\
\end{bmatrix}
\end{align}

In [8]:
the_query = "SELECT A.row, B.col, SUM(A.val * B.val)"
the_query += " FROM data_df as A, data_df as B"
the_query += " WHERE A.matrix = 'A' AND B.matrix = 'B'"
the_query += " AND A.col = B.row"
the_query += " GROUP BY A.row, B.col"
the_query += " LIMIT 10;"
pandasql.sqldf(the_query, locals())

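|   | row | col | SUM(A.val * B.val) |
|---|-----|-----|--------------------|
| 0 | 0 | 0 | 90 |
| 1 | 0 | 1 | 184 |
| 2 | 0 | 2 | 117 |
| 3 | 0 | 3 | 106 |
| 4 | 0 | 4 | 74 |
| 5 | 1 | 0 | 98 |
| 6 | 1 | 1 | 156 |
| 7 | 1 | 2 | 144 |
| 8 | 1 | 3 | 105 |
| 9 | 1 | 4 | 90 |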


In [9]:
np.dot(M_a, M_b)

matrix([[ 90, 184, 117, 106,  74],
        [ 98, 156, 144, 105,  90],
        [ 92, 131,  85, 148,  72],
        [ 66, 103, 102,  86,  76],
        [ 77, 153, 118, 104,  89]])
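
The join-then-aggregate pattern in the dot-product query can also be written with pandas alone. This is only a sketch of one possible equivalent: `demo_df` is a hand-picked 2x2 stand-in for `data_df`, and the `_a` / `_b` suffixes are names chosen here for illustration.

```python
import pandas as pd

# Small stand-in for data_df (values chosen by hand for illustration)
demo_df = pd.DataFrame({
    "row": [0, 0, 1, 1] * 2,
    "col": [0, 1, 0, 1] * 2,
    "val": [1, 2, 3, 4, 5, 6, 7, 8],
    "matrix": ["A"] * 4 + ["B"] * 4,
})

a = demo_df[demo_df["matrix"] == "A"]
b = demo_df[demo_df["matrix"] == "B"]

# Join A's column index to B's row index, as in "A.col = B.row"
pairs = a.merge(b, left_on="col", right_on="row", suffixes=("_a", "_b"))
pairs["prod"] = pairs["val_a"] * pairs["val_b"]

# GROUP BY A.row, B.col and SUM the products, then pivot back to wide form
product = pairs.groupby(["row_a", "col_b"])["prod"].sum().unstack()
```

Each output cell is the sum over the shared index, which is exactly what `GROUP BY A.row, B.col` with `SUM(A.val * B.val)` computes.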

## Determinant
For matrix **A**, 
$$\mathbf{A} = \begin{bmatrix} 
a_{11} & a_{12} & a_{13}\\
a_{21} & a_{22} & a_{23}\\
a_{31} & a_{32} & a_{33}\\
\end{bmatrix}$$

the determinant can be found as 
$$
\det(\mathbf{A}) = a_{11}(a_{22}a_{33} - a_{23}a_{32}) - a_{12}(a_{21}a_{33} - a_{23}a_{31}) + a_{13}(a_{21}a_{32} - a_{22}a_{31})
$$
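
The query in this section works on a 2x2 matrix, where the expansion collapses to a single difference of products:

$$
\det\begin{bmatrix}
a_{11} & a_{12}\\
a_{21} & a_{22}\\
\end{bmatrix} = a_{11}a_{22} - a_{12}a_{21}
$$

For the values generated in this notebook, that is $7 \cdot 8 - 5 \cdot 1 = 51$.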

In [10]:
# Set up a 2x2 matrix in the same long format
np.random.seed(4)
matrix_a = {"row":[0,0,1,1],
            "col":[0,1,0,1],
            "val":np.random.randint(10, size=4)}
matrix_df = pd.DataFrame(matrix_a)

In [11]:
matrix_df

|   | col | row | val |
|---|-----|-----|-----|
| 0 | 0 | 0 | 7 |
| 1 | 1 | 0 | 5 |
| 2 | 0 | 1 | 1 |
| 3 | 1 | 1 | 8 |


In [12]:
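# First subquery: product of the two diagonal entries
# (both cells have col = row, and they are different cells)
the_query = "SELECT DISTINCT(("
the_query += "SELECT DISTINCT(A1.val * A2.val)"
the_query += " FROM matrix_df AS A1, matrix_df AS A2"
the_query += " WHERE A1.col=A1.row AND A2.col=A2.row AND A1.col IS NOT A2.row"
# Second subquery: product of the two off-diagonal entries
# (the cells sit in mirror-image positions and neither is on the diagonal)
the_query += ") - ("
the_query += "SELECT DISTINCT(A1.val * A2.val)"
the_query += " FROM matrix_df AS A1, matrix_df AS A2"
the_query += " WHERE A1.col=A2.row AND A2.col=A1.row AND A1.row IS NOT A1.col AND A2.row IS NOT A2.col"
# The outer DISTINCT collapses the repeated result into a single row
the_query += ")) AS determinant FROM matrix_df"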

pandasql.sqldf(the_query, locals())

|   | determinant |
|---|-------------|
| 0 | 51 |


In [13]:
M_a = np.matrix(matrix_a["val"].reshape((2,2)))
np.linalg.det(M_a)

51.0

I have not found a way to compute the determinant of anything larger than a 2x2 matrix using SQL.
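
For the 3x3 case, one possible route is the Leibniz formula: pick one cell from each row, keep only picks that use three different columns, multiply them, and add the products with the sign of the column permutation. The sketch below is an assumption-laden illustration, not a general method: it uses the standard-library `sqlite3` module instead of pandasql, a hypothetical table named `m`, and hand-picked values. The sign comes from the product of column differences, which is +2 or -2 for a valid permutation of columns 0, 1, 2 and 0 when any column repeats.

```python
import sqlite3

# Hand-picked 3x3 matrix for illustration: [[2, 0, 1], [1, 3, 2], [1, 1, 4]]
records = [
    (0, 0, 2), (0, 1, 0), (0, 2, 1),
    (1, 0, 1), (1, 1, 3), (1, 2, 2),
    (2, 0, 1), (2, 1, 1), (2, 2, 4),
]

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE m ("row" INTEGER, "col" INTEGER, val INTEGER)')
conn.executemany("INSERT INTO m VALUES (?, ?, ?)", records)

# One alias per matrix row. The product of column differences is +2 or -2
# for a permutation of (0, 1, 2) and 0 if a column repeats, so dividing by 2
# yields the permutation sign and drops invalid picks from the sum.
query = """
    SELECT SUM(
        r0.val * r1.val * r2.val
        * (r1."col" - r0."col") * (r2."col" - r0."col") * (r2."col" - r1."col")
        / 2
    ) AS determinant
    FROM m AS r0, m AS r1, m AS r2
    WHERE r0."row" = 0 AND r1."row" = 1 AND r2."row" = 2;
"""
determinant = conn.execute(query).fetchone()[0]
conn.close()
```

This needs one self-join per matrix row, and the divisor changes with the matrix size, so the query has to be rewritten for each size and the work grows factorially. It is a curiosity for small matrices, not something to use at scale.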