# Sum of Squares
https://en.wikipedia.org/wiki/Residual_sum_of_squares

\begin{equation*}
RSS = \sum_{i=1}^{n}(y_{i}-f(x_{i}))^{2}
\end{equation*}

Let's create a vector X that holds the original values, and a vector Xrec that holds the recreated values of X.


In [2]:
import numpy
# numpy.random.random_integers is deprecated; randint excludes its upper bound, hence 101 and 11 below.
X = numpy.random.randint(0, 101, 100)   # an array of length 100, with integer values between 0 and 100
Xrec = X + numpy.random.randint(-10, 11, 100)   # adds a random integer between -10 and 10 to each element of X

To evaluate the equation above and determine the sum of squares, we need to take the difference for each data point, square those differences, and then sum them. So:

In [7]:
diff = X-Xrec
diffSquared = numpy.square(diff)
sumOfSquares = numpy.sum(diffSquared)
print('The sum of squares is: ' + str(sumOfSquares))

The sum of squares is: 3299
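
The three steps above can also be wrapped in a small helper. This is only a sketch: the function name `residual_sum_of_squares`, the `_demo` variable names, and the seeded `numpy.random.default_rng` generator (used so the numbers are repeatable) are assumptions and not part of the original notebook.

```python
import numpy

def residual_sum_of_squares(original, recreated):
    """Sum of the squared element-wise differences between two arrays."""
    diff = numpy.asarray(original) - numpy.asarray(recreated)
    return numpy.sum(numpy.square(diff))

# Hypothetical demo data, built the same way as X and Xrec above but with a fixed seed.
rng = numpy.random.default_rng(0)
X_demo = rng.integers(0, 101, size=100)               # integers from 0 to 100
Xrec_demo = X_demo + rng.integers(-10, 11, size=100)  # offsets from -10 to 10
rss_demo = residual_sum_of_squares(X_demo, Xrec_demo)
print('The sum of squares is: ' + str(rss_demo))
```

Because the data are random, the printed value will differ from the 3299 shown above.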


Alternatively, we can do some matrix math.

The equation from the paper says:
\begin{equation*} 
\left( \sum_{n=1}^{p}(X_{n*p}-f(x_{n*p}))(X_{n*p}-f(x_{n*p}))' \right)
\end{equation*}

Matrix multiplication says that each entry of the product is a row of matrix A multiplied element by element with a column of matrix B, and then summed. This way, we end up with a matrix that has as many rows as matrix A and as many columns as matrix B.
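
As a small illustration of that rule, the sketch below multiplies a 2x3 matrix by a 3x4 matrix. The matrices `M` and `N` are made-up values chosen only to show the shapes; they are not from the paper.

```python
import numpy

M = numpy.arange(6).reshape(2, 3)     # 2 rows, 3 columns
N = numpy.arange(12).reshape(3, 4)    # 3 rows, 4 columns
product = numpy.dot(M, N)
print(product.shape)                  # (2, 4): rows of M by columns of N

# One entry of the product is a row of M times a column of N, summed.
entry_01 = numpy.dot(M[0, :], N[:, 1])
print(product[0, 1] == entry_01)      # True
```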

In this equation, the first set of brackets holds an n*p matrix, where n is the number of observations (rows) and p is the number of data points in a waveform (columns). In the low back pain study, X was 318*51: the waveforms were normalized to 51 points, and 106 participants (50 healthy; 56 low back pain) * 3 conditions gives 318 observations (rows).

So, in their case the matrix X is 318*51, with 318 = number of observations (participants * conditions) and 51 = number of data points. 
The paper says that the result is a 1xn matrix. So is the full matrix product calculated first, and then the sum taken along each of its columns?
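
To make those shapes concrete, here is a sketch that uses random placeholder values in place of the study's residuals (the real data are not available here). The names `residuals`, `gram`, and `per_observation` are hypothetical, and treating the diagonal as the paper's 1xn result is an assumption that the rest of this notebook explores.

```python
import numpy

rng = numpy.random.default_rng(0)
n_observations, n_points = 318, 51

# Placeholder residual matrix (X - f(X)); random numbers, not study data.
residuals = rng.normal(size=(n_observations, n_points))

gram = numpy.dot(residuals, residuals.T)   # (318 x 51) times (51 x 318)
per_observation = numpy.diagonal(gram)     # one value per observation

print(gram.shape)             # (318, 318)
print(per_observation.shape)  # (318,)
```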

So, below I want to look at the idea of determining the sum of squares using matrix math.

## Example
Here we take a row vector (assume these are differences or residuals). We first square each element (A2), and then we sum A2 to get the sum of squares.

In [6]:
A = numpy.array([1,2,3,4,5]) 
A2 = numpy.square(A)
sumOfA = numpy.sum(A2)
print(A)
print(A2)
print(sumOfA)

[1 2 3 4 5]
[ 1  4  9 16 25]
55


To get the same result with just matrix math, we take the dot product of the row vector A with its transpose. This yields the same result: 55.

In [8]:
# For a 1-D array numpy.transpose is a no-op, and numpy.dot of two 1-D arrays is their inner product, i.e. the sum of squares.
sumMatrixMath = numpy.dot(A, numpy.transpose(A))
print(sumMatrixMath)

55
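
Written out for this vector, the product of the row vector with its transpose is just the sum of the squared elements:

\begin{equation*}
AA' = \sum_{i=1}^{5} A_{i}^{2} = 1^{2} + 2^{2} + 3^{2} + 4^{2} + 5^{2} = 55
\end{equation*}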


So, what happens when we make this a multidimensional array?

In [10]:
B = numpy.array([[1,2,3,4,5], [6,7,8,9,10]])
B2 = numpy.square(B)
sumOfB = numpy.sum(B2)
print(B)
print(B2)
print(sumOfB)

[[ 1  2  3  4  5]
 [ 6  7  8  9 10]]
[[  1   4   9  16  25]
 [ 36  49  64  81 100]]
385


In [11]:
sumMatrixMathB = numpy.dot(B,numpy.transpose(B))
print(sumMatrixMathB)

[[ 55 130]
 [130 330]]


So now, when we do this, we get a 2x2 matrix. Essentially, we end up with a square matrix whose size is the first dimension (rows) of the original array. The sum of squares here is actually the sum of the diagonal elements (the trace).

In [13]:
diagonalB = numpy.diagonal(sumMatrixMathB)
print(diagonalB)

[ 55 330]


In [14]:
sumDiagonalB = numpy.sum(diagonalB)
print(sumDiagonalB)

385


So, we see here that the result is the same.
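
The diagonal-then-sum step can also be done in one call with `numpy.trace`. The sketch below repeats the same numbers under the hypothetical name `B_demo` so that it runs on its own.

```python
import numpy

B_demo = numpy.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])

# Sum of the diagonal of B_demo times its transpose.
total_from_trace = numpy.trace(numpy.dot(B_demo, B_demo.T))
# Sum of the element-wise squares, for comparison.
total_direct = numpy.sum(numpy.square(B_demo))

print(total_from_trace, total_direct)   # 385 385
```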

If we were instead to sum each row of the resulting matrix (axis 1, which here equals the column sums because the matrix is symmetric), we would get:

In [16]:
sumRows = numpy.sum(sumMatrixMathB, axis=1)
print(sumRows)

[185 460]
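
One way to see where these two numbers come from: entry (i, j) of the product is the dot product of row i with row j, so each row sum mixes a row's own sum of squares with its cross-product against the other row. The sketch below rebuilds the values step by step; `B_demo`, `gram`, and `cross_term` are hypothetical names used only here.

```python
import numpy

B_demo = numpy.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
gram = numpy.dot(B_demo, B_demo.T)

# Off-diagonal entry: row 1 dotted with row 2 (a cross-product, not a sum of squares).
cross_term = numpy.dot(B_demo[0, :], B_demo[1, :])
print(cross_term)                # 130

# Each row sum is that row's sum of squares plus the cross-product.
print(gram[0, 0] + cross_term)   # 55 + 130 = 185
print(gram[1, 1] + cross_term)   # 330 + 130 = 460
```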


Quickly interpreting: these can't be the sums of squares for the individual trials (rows), because one of the values (460) is greater than the sum of squares for the entire matrix calculated above (385). We can, however, show that each diagonal element is the sum of squares for its respective trial:

In [20]:
sumSquaresRow1 = numpy.dot(B[0,:], numpy.transpose(B[0,:]))
sumSquaresRow2 = numpy.dot(B[1,:], numpy.transpose(B[1,:]))
print('The sum of squares for row 1 is: ' + str(sumSquaresRow1))
print('The sum of squares for row 2 is: ' + str(sumSquaresRow2))


The sum of squares for row 1 is: 55
The sum of squares for row 2 is: 330


So, if you look at these results and then go back to the matrix diagonalB, which is printed above (In [13]), you will see that the sums of squares calculated manually for each row are the same as the diagonal of the matrix.
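
If only the per-row sums of squares are needed, they can be computed without building the full square matrix first, which matters more for a 318-row matrix than for this 2-row example. This is a sketch of two equivalent options, not the method used in the paper; `B_demo` is a hypothetical copy of the array above.

```python
import numpy

B_demo = numpy.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])

# Option 1: square every element, then sum along each row.
per_row = numpy.sum(numpy.square(B_demo), axis=1)

# Option 2: einsum multiplies matching elements and sums over the column index j.
per_row_einsum = numpy.einsum('ij,ij->i', B_demo, B_demo)

print(per_row)          # [ 55 330]
print(per_row_einsum)   # [ 55 330]
```

Both give the same values as the diagonal of the product of the matrix with its transpose.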