## Least Squares using QR decomposition

### Least squares problem

The least squares problem is to find approximate solutions of **over-determined** systems of linear equations by minimizing the sum of the squares of the errors in the equations. Here the ${m} \times {n}$ matrix $A$ is tall, so the system of linear equations $Ax = b$, where $b$ is an $m$-vector, is over-determined: there are more equations ($m$) than variables to choose ($n$).

These equations have a solution only if $b$ is a linear combination of the columns of $A$.

For most choices of $b$, there is no $n$-vector $x$ for which $Ax = b$. As a compromise, we look for an $x$ for which the ***residual*** $r = Ax - b$ is as small as possible. That is, we choose $x$ to minimize the norm of the residual, $||Ax - b||$.

Minimizing the norm of the residual and minimizing its square give the same $x$, so we can just as well minimize $||Ax - b||^2 = ||r||^2 = r_{1}^2 + \cdots + r_{m}^2$, the sum of squares of the residuals.
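As a small numerical illustration, the sketch below evaluates $||Ax - b||^2$ for two candidate vectors. The helper name `residual_norm_sq` is hypothetical, and the matrix and vector are the sample test-case used later in this notebook.

```python
def residual_norm_sq(A, x, b):
    """Return ||Ax - b||^2 for a matrix A (list of rows) and vectors x, b."""
    # r_i = (i-th row of A) . x - b_i
    r = [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) - b_i
         for row, b_i in zip(A, b)]
    return sum(r_i ** 2 for r_i in r)


A = [[1.0, 1.0],
     [1.0, 1.0],
     [1.0, 0.0]]
b = [1.0, 2.0, 3.0]

print(residual_norm_sq(A, [1.0, 0.0], b))   # an arbitrary guess
print(residual_norm_sq(A, [3.0, -1.5], b))  # the sample's stated solution
```

The arbitrary guess gives 5.0 and the sample's stated solution gives 0.5, so the second candidate is the better fit in the least squares sense.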

The problem of finding an $n$-vector $\hat{x}$ that minimizes $||Ax - b||^2$ over all possible choices of $x$ is called the ***least squares problem***.
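The title refers to QR decomposition; the following derivation sketches why it yields the solution. It assumes that the columns of $A$ are linearly independent (an assumption not stated above), so that $A = QR$ with $Q^T Q = I$ and $R$ upper triangular with nonzero diagonal entries.

$$
\begin{aligned}
A^T (A\hat{x} - b) &= 0 && \text{(optimality condition for } ||Ax - b||^2\text{)} \\
R^T Q^T Q R \hat{x} &= R^T Q^T b && \text{(substitute } A = QR\text{)} \\
R^T R \hat{x} &= R^T Q^T b && \text{(use } Q^T Q = I\text{)} \\
R \hat{x} &= Q^T b && \text{(} R^T \text{ is invertible)}
\end{aligned}
$$

Since $R$ is upper triangular, the last system can be solved by back-substitution.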


In [2]:
import os
import random

In [3]:
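def generate_dataset(r, c, seed=42, sample=0):
    '''
    Generate a least squares test problem.

    Parameters
    ----------
    r : int, number of rows (equations)
    c : int, number of columns (variables)
    seed : int, seed for the random number generator
    sample : int, set to 1 to return the fixed sample test-case

    Returns
    -------
    A : list of lists, shape=[r, c]
    b : list, shape=[r]
    '''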
    if sample == 1:
        # sample test-case
        A = [[1. , 1.],
             [1. , 1.],
             [1. , 0.]]
        b =  [1, 2, 3]
        """
        Q = [[0.5773502691896258, 0.4082482904638628], 
             [0.5773502691896258, 0.4082482904638628], 
             [0.5773502691896258, -0.8164965809277263]]
        R = [[1.7320508075688776, 1.1547005383792517],
             [-6.661338147750939e-16, 0.8164965809277256]]
        soln = [3.00, -1.50]
        """
        return A, b
    random.seed(seed)
    A = [[random.random() for i in range(c)] for j in range(r)]
    b = [random.random() for i in range(r)]
    return A, b

In [4]:
def leastSquare(A, b):
    """
    solve the least squares problem using QR decomposition followed by back-substitution
    Here, A is the matrix and b is the column vector
    """
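    # Dimensions: A has m rows and n columns.
    m = len(A)
    n = len(A[0])
    # R is n x n and upper triangular.
    R = [[0.0] * n for i in range(n)]
    # Work column-wise: A and newQ are stored transposed (one inner list per column).
    A = list(map(list, zip(*A)))
    newQ = [[0.0] * m for i in range(n)]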
    
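    for j in range(n):
        v = A[j]  # j-th column of A
        # Remove the components of v along the previously computed q_i
        # (Gram-Schmidt, using the already-updated v at each step).
        for i in range(j):
            q = newQ[i]
            R[i][j] = sum(q[k] * v[k] for k in range(len(v)))
            mid = [R[i][j] * k for k in q]
            v = [x1 - x2 for (x1, x2) in zip(v, mid)]

        # Normalize what is left to get the j-th column of Q.
        norm = sum(x ** 2 for x in v) ** 0.5
        if norm == 0:
            raise ValueError("columns of A are linearly dependent")
        newQ[j] = [i / norm for i in v]
        R[j][j] = norm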
        
    # Transpose back so that Q has shape m x n.
    Q = list(map(list, zip(*newQ)))

    # First result: the "Q" and "R" matrices of the QR decomposition.
    yield Q, R

    # Solve R x = Q^T b.
    # Step 1: bn = Q^T b (newQ holds the rows of Q^T).
    bn = []
    for i in range(len(newQ)): #this loops through rows of the matrix
        total = 0
        for j in range(len(b)): #this loops through vector coordinates & rows of matrix
            total += b[j] * newQ[i][j]
        bn.append(total)
        
    # Step 2: back-substitution on the upper triangular system R x = bn,
    # solving for the last component first and working up to the first.
    n = len(bn)
    xcomp = [0] * n
    for i in range(n - 1, -1, -1):
        tmp = bn[i]
        for j in range(n - 1, i, -1):
            tmp -= xcomp[j] * R[i][j]
        xcomp[i] = tmp / R[i][i]
    # Second result: the least squares solution.
    soln = xcomp
    
    yield soln
    

In [5]:
def main():
    seed = 42
    ## use sample = 1 to use the sample test-case
    A, b = generate_dataset(3, 2, seed, sample=0)

    iterator =  leastSquare(A, b)
    
    Q, R = next(iterator)
    print("Q \n {}".format(Q))
    print("R \n {}".format(R))

    soln = next(iterator)
    print("Solution \n {}".format(soln))

In [6]:
main()

Q 
 [[0.6309969542971976, -0.7741874703998282], [0.2714034861408441, 0.16020067612520555], [0.7267619908733923, 0.6123475353465072]]
R 
 [[1.013359563946034, 0.5681613495503163], [0.0, 0.43077076114488627]]
Solution 
 [1.4260202529741812, -0.9713375053353163]
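
One way to sanity-check a least squares solution is to confirm that the residual is orthogonal to every column of $A$, i.e. $A^T(A\hat{x} - b) = 0$, which is a standard optimality condition. The sketch below does this for the sample test-case from `generate_dataset` (not for the random data printed above). The helper name `check_normal_equations` and the tolerance `tol` are assumptions made for illustration.

```python
def check_normal_equations(A, x, b, tol=1e-9):
    """Return True if A^T (A x - b) is zero up to tol in every component."""
    # Residual r = A x - b
    r = [sum(a * xj for a, xj in zip(row, x)) - bi for row, bi in zip(A, b)]
    # A^T r: dot product of each column of A with r
    At_r = [sum(c_k * r_k for c_k, r_k in zip(col, r)) for col in zip(*A)]
    return all(abs(g) <= tol for g in At_r)


A = [[1.0, 1.0],
     [1.0, 1.0],
     [1.0, 0.0]]
b = [1.0, 2.0, 3.0]

print(check_normal_equations(A, [3.0, -1.5], b))  # sample's stated solution
print(check_normal_equations(A, [1.0, 0.0], b))   # an arbitrary non-optimal guess
```

The first call returns `True` and the second `False`, since only the stated solution leaves a residual orthogonal to both columns of $A$.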
