# Choose Columns

Notebook on how to choose columns of a matrix to reduce its rank or its nuclear norm.

In [None]:
import numpy as np
import scipy.stats 
import matplotlib.pyplot as plt
#%matplotlib notebook
plt.rcParams['figure.dpi'] = 100

In [None]:
N = 10
A = np.random.rand(N,2*N)
#A = np.random.rand(N,2*N)
plt.matshow(A)

Start with the baseline: the singular values of the full matrix `A`.

In [None]:
U,s_base,Vt = np.linalg.svd(A)
plt.plot(s_base,'x')
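
Both target quantities named at the top of the notebook can be read off the singular values. The following is a minimal sketch on a separate random matrix `M` (a hypothetical stand-in for `A`); the relative tolerance used for the numerical rank is an assumption, not something fixed by this notebook.

```python
import numpy as np

rng = np.random.default_rng(4)   # seed chosen only for reproducibility
M = rng.random((10, 20))         # hypothetical stand-in for A
s = np.linalg.svd(M, compute_uv=False)

# Nuclear norm: sum of the singular values
nuclear_norm = s.sum()

# Numerical rank: number of singular values above a relative tolerance (assumed value)
rank = int(np.count_nonzero(s > 1e-12 * s[0]))

print(nuclear_norm, rank)
```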

# Predict singular values

Instead of computing the singular values for every combination of columns, we try to predict how the singular values change when a column is removed or added.

## Derivative of singular values


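This is done by calculating the derivative of the eigenvalues $\lambda_i$ of the Gram matrix with respect to the last column:

$$\mathrm{D}_a \lambda_i([A|a]^\top [A|a])$$

where $a$ is the last column of the full matrix and $A$ now denotes the remaining columns.

First we construct the derivative of the Gram matrix in the direction $a'$, evaluated at $a' = 0$:
$$\mathrm{D}_{a'} [A|a+a']^\top [A|a+a'] = \begin{bmatrix}0 &A^\top a' \\ {a'}^\top A& 2a^\top a'\end{bmatrix}$$

Then we can get the derivative of the eigenvalues
$$\mathrm{D}_{a'} \lambda_i( [A|a+a']^\top [A|a+a'] )
=
v_i^\top \begin{bmatrix}0 &A^\top a' \\ {a'}^\top A& 2a^\top a'\end{bmatrix} v_i$$
where $v_i$ is the corresponding eigenvector (in our case the $i$-th right singular vector of $[A|a]$). Since $\sigma_i = \sqrt{\lambda_i}$, the singular values change to first order by $\mathrm{D}_{a'} \lambda_i / (2\sigma_i)$. Removing the column corresponds to the direction $a' = -a$.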
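
The first-order formula can be sanity-checked against a central finite difference. This is a minimal sketch under assumptions: the names `B` (the columns that stay), `a_dir` (the direction $a'$) and the step size `eps` are chosen here for illustration, and the check presumes that the singular values are distinct, which holds generically for random matrices.

```python
import numpy as np

rng = np.random.default_rng(0)   # seed chosen only for reproducibility
n = 6
B = rng.random((n, 2 * n - 1))   # columns that stay (A in the formula)
a = rng.random((n, 1))           # last column
a_dir = rng.random((n, 1))       # perturbation direction a'

_, s, Vt = np.linalg.svd(np.hstack([B, a]))

# Directional derivative of the Gram matrix at a' = 0
G = np.block([
    [np.zeros((2 * n - 1, 2 * n - 1)), B.T @ a_dir],
    [a_dir.T @ B, 2 * a.T @ a_dir],
])

# First-order change of lambda_i = sigma_i^2, then chain rule for the square root
d_lambda = np.array([Vt[i] @ G @ Vt[i] for i in range(n)])
d_sigma = d_lambda / (2 * s)

# Central finite difference of the singular values along a_dir
eps = 1e-6
s_plus = np.linalg.svd(np.hstack([B, a + eps * a_dir]), compute_uv=False)
s_minus = np.linalg.svd(np.hstack([B, a - eps * a_dir]), compute_uv=False)
d_sigma_fd = (s_plus - s_minus) / (2 * eps)

max_abs_error = np.max(np.abs(d_sigma - d_sigma_fd))
print(max_abs_error)
```

The agreement only holds for small steps. Removing a whole column is the large step $a' = -a$, so the linear prediction is just an estimate there.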



In [None]:
A_ = A.copy()
#A_[:,-1]= 0.5*A_[:,-1] #some place for changes
a = A_[:,-1:]
a_prime = -A_[:,-1:]
U,s,Vt = np.linalg.svd(A_)
G = np.block([
    [np.zeros((2*N-1,2*N-1)),A[:,:-1].T@a_prime],
    [a_prime.T@A[:,:-1],2*a.T@a_prime]
    ])

#estimate D lambda
d = np.zeros(N)
for i in range(N):
    v = Vt[i,:]
    d[i]= v@G@v

#calculate D sigma
d = d/(2*s) #comes from derivative of sqrt
d_right=d
s_estimate = s+d

# alternative tests
#s_estimate = s_base-2*d


#go over the eigenvalues
#l_est = s**2-d
#s_estimate = np.sqrt(l_est)

In [None]:
U_new,s_new,Vt_new = np.linalg.svd(A[:,:-1])
plt.plot(s_base,'1',label='old')
plt.plot(s_new,'2',label='new')
plt.plot(s_estimate,'+',label='estimate')
plt.legend()

In [None]:
d

### Add column

Adding a column from scratch cannot be calculated with this approach: in that case $a$ is zero, which means that the calculated derivative is 0.
Instead we scale $a$ down to a small value and continue from there.

However, this approach does not work well: the rescaling shrinks the derivative along with $a$. One could scale the result back up, but it is hard to give a justified way of doing so.
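
The reason the derivative vanishes for $a = 0$ can be seen numerically: the right singular vectors belonging to nonzero singular values have a zero entry at the position of the zero column, and that entry multiplies every nonzero block of the derivative matrix. A minimal sketch, with `B` and `a_dir` as illustrative names for the existing columns and the column to be added:

```python
import numpy as np

rng = np.random.default_rng(1)   # seed chosen only for reproducibility
n = 5
B = rng.random((n, 2 * n - 1))   # existing columns
a_dir = rng.random((n, 1))       # column we would like to add (direction a')

# Current matrix: the last column a is zero
M = np.hstack([B, np.zeros((n, 1))])
_, s, Vt = np.linalg.svd(M)

# Derivative matrix of the Gram matrix; the entry 2 a^T a' is zero because a = 0
G = np.block([
    [np.zeros((2 * n - 1, 2 * n - 1)), B.T @ a_dir],
    [a_dir.T @ B, np.zeros((1, 1))],
])

# Last entries of the first n right singular vectors (nonzero singular values)
last_entries = Vt[:n, -1]
d_lambda = np.array([Vt[i] @ G @ Vt[i] for i in range(n)])

print(last_entries)
print(d_lambda)
```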

In [None]:
A_ = A.copy()
g = 0.25
a_orig = A_[:,-1]
A_[:,-1]= g*a_orig #some place for changes
a = A_[:,-1:]
a_prime = ((1-g)*a_orig).reshape(-1,1)
U,s,Vt = np.linalg.svd(A_)

# derivative of A_ @ A_.T when the last column a moves in direction a_prime
G = a_prime@a.T + a@a_prime.T

#estimate D lambda
d = np.zeros(N)
for i in range(N):
    u = U[:,i]
    d[i]= u@G@u

#calculate D sigma
d = d/(2*s) #comes from derivative of sqrt
d_left = d
s_estimate = s+d


In [None]:
#%matplotlib notebook
U_old,s_old,Vt_old = np.linalg.svd(A[:,:-1])
U_new,s_new,Vt_new = np.linalg.svd(A)
plt.plot(s_old,'1',label='old')
plt.plot(s_new,'2',label='new')
plt.plot(s_estimate,'+',label='estimate')
plt.plot(s,'x',label="intermediate")
plt.legend()

## Approximate singular values using cutting approximation

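The general idea is that we have the SVD of the full matrix

$$U \Sigma [V^\top | \rho] = [A | a]$$

where $\rho$ is the last column of the right factor. Then we cut the last column of the matrix.
This is equivalent to cutting $a$ on the right-hand side and $\rho$ on the left-hand side.

This destroys the orthonormality of the rows of $V^\top$.
We restore the normalization by rescaling each row of $V^\top$ to norm 1.
This gives

$$A = U \Sigma S \bar{V}^\top$$
where $S$ is a diagonal matrix with $s_i = \|v_i\|$, the norm of the $i$-th row of the truncated $V^\top$, and $\bar{V}^\top$ contains the rescaled rows. The rows of $\bar{V}^\top$ have unit norm but are in general no longer orthogonal to each other, so this is only an approximate SVD.

This gives new estimated singular values

$$\bar{\sigma}_i = \sigma_i s_i$$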
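
Two properties of this estimate follow directly from the construction and can be checked numerically: the scaling factors never exceed 1, and the sum of squared estimates equals the squared Frobenius norm of the truncated matrix. The sketch below uses a separate random matrix `M` as a hypothetical stand-in for $[A|a]$.

```python
import numpy as np

rng = np.random.default_rng(2)   # seed chosen only for reproducibility
n = 6
M = rng.random((n, 2 * n))       # hypothetical stand-in for [A|a]

U, sigma, Vt = np.linalg.svd(M, full_matrices=False)

# Norms of the rows of V^T after cutting the last column
row_norms = np.linalg.norm(Vt[:, :-1], axis=1)
sigma_est = sigma * row_norms

# Exact singular values of the truncated matrix, for comparison
sigma_true = np.linalg.svd(M[:, :-1], compute_uv=False)

# Sum of squared singular values equals the squared Frobenius norm
fro_est = np.sum(sigma_est ** 2)
fro_true = np.sum(sigma_true ** 2)

print(sigma_est)
print(sigma_true)
print(fro_est, fro_true)
```

The individual values generally differ from the exact ones, because the truncated rows are not mutually orthogonal; only the sum of squares is reproduced exactly.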

In [None]:
A_ = A.copy()
#A_[:,-1]= 0.5*A_[:,-1] #some place for changes

U,sigma,Vt = np.linalg.svd(A_,full_matrices=False)

s = np.linalg.norm(Vt[:,:-1],axis=1)

s_estimate = sigma*s


In [None]:
U_old,s_old,Vt_old = np.linalg.svd(A)
U_new,s_new,Vt_new = np.linalg.svd(A[:,:-1])
plt.plot(s_old,'1',label='old')
plt.plot(s_new,'2',label='new')
plt.plot(s_estimate,'+',label='estimate')
plt.legend()
#plt.gca().set_yscale('log',basey=10)


### Add column

Here we want to add a column using a similar strategy.
We start with the SVD

$$A = U\Sigma V^\top$$

The idea is to use the relation

$$[A|a] = [U\Sigma V^\top|a] = U \Sigma [V^\top|\Sigma^{-1} U^\top a] + \bar{a} [0,\dots,0, 1]$$

and then again rescale the rows of the new $V^\top$. Here $U$ and $\Sigma$ are restricted to the nonzero singular values, so that $\Sigma^{-1}$ exists.

The vector $\bar{a} = a - U U^\top a$ is the projection of $a$ onto the orthogonal complement of the range of the original matrix.
The vector $[0,\dots,0, 1]$ is a new row of $V^\top$. If part of $a$ lies in the range of $A$, this row is inherently not orthogonal to the extended rows $[V^\top|\Sigma^{-1} U^\top a]$, but we probably have to live with that.
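
The relation can be verified numerically. The sketch below is an illustration under assumptions: it uses a tall random matrix `A_small` (a hypothetical name) so that the range of the matrix is a proper subspace and $\bar{a}$ is not zero. For the wide $N \times 2N$ matrix used in this notebook the range is generically the whole space, so $\bar{a}$ vanishes.

```python
import numpy as np

rng = np.random.default_rng(3)   # seed chosen only for reproducibility
m, k = 8, 4                      # tall matrix: range(A) is a proper subspace
A_small = rng.random((m, k))
a_new = rng.random(m)            # column to be added

U, sigma, Vt = np.linalg.svd(A_small, full_matrices=False)

# Sigma^{-1} U^T a: the new column appended to V^T
coeff = (U.T @ a_new) / sigma
# Part of a orthogonal to the range of A
a_bar = a_new - U @ (U.T @ a_new)

# Row vector [0, ..., 0, 1]
e_last = np.zeros(k + 1)
e_last[-1] = 1.0

Vt_ext = np.hstack([Vt, coeff[:, None]])
reconstructed = U @ np.diag(sigma) @ Vt_ext + np.outer(a_bar, e_last)
target = np.hstack([A_small, a_new[:, None]])

print(np.max(np.abs(reconstructed - target)))
```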

In [None]:
A_ = A.copy()
a = A_[:,-1].copy() #copy, otherwise zeroing the column below also zeroes a
A_[:,-1]= 0 #some place for changes

U,sigma,Vt = np.linalg.svd(A_,full_matrices=False)

#attach the new vector in v
r = np.count_nonzero(sigma>1e-16)
v_prime = np.zeros_like(Vt[:,-1])
v_prime[:r] = (U[:,:r].T@a)/sigma[:r]

s = np.linalg.norm(np.hstack([Vt[:,:-1],v_prime.reshape(-1,1)]),axis=1)

s_estimate =sigma*s
if r <min(A.shape):
    # rank-deficient case: the part of a outside the range adds a new singular value
    sigma_new = np.linalg.norm(U[:,r:].T@a)
    if len(sigma) >r:
        s_estimate[r]=sigma_new
    else:
        s_estimate = np.append(sigma*s,sigma_new)


In [None]:
#%matplotlib notebook
U_old,s_old,Vt_old = np.linalg.svd(A[:,:-1])
U_new,s_new,Vt_new = np.linalg.svd(A)
plt.plot(s_old,'1',label='old')
plt.plot(s_new,'2',label='new')
plt.plot(s_estimate,'+',label='estimate')
plt.legend()
#plt.gca().set_yscale('log',basey=10)
