## Reduced Memory Multi-Pass (RMMP) Algorithm
*"Algorithms for Sparse Linear Classifiers in the Massive Data Setting"*, 2008 

![rmmp pseudocode](<images/rmmp-pseudocode.png>)

**modified shooting algorithm**

- goal: find the $\beta$ that maximizes $\beta^T \Psi \beta + \beta^T \theta - \gamma \lVert\beta\rVert_1$

- "The vector $\Omega$ in the algorithm is defined as $\Omega = 2 \Psi ' \beta + \theta$, where $\Psi '$ is the matrix $\Psi$ with its diagonal entries set to zero. This vector is related to the gradient of the differentiable part of the objective function and consequently can be used for optimality checking."

- "While one can think of numerous stopping criteria for the algorithm, in this paper we stop when successive iterates are sufficiently close to each other (relatively, and with respect to the L2 [norm]). More precisely, we declare convergence whenever $||\beta_i - \beta_{i-1}||_2 / ||\beta_{i-1}||_2$ is less than some user specified tolerance. Note that $\beta_i$ is the parameter vector at iteration $i$, which is obtained after cycling through and updating all $d$ components once."

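As a minimal sketch of the quantities in the list above, assuming dense NumPy arrays, the objective, the vector $\Omega$, and the relative stopping criterion could be computed as follows. The function names are hypothetical, and the handling of a zero denominator in `relative_change` is an assumption, since the quoted text does not cover that case.

```python
import numpy as np


def objective(beta, psi, theta, gamma):
    """Value of beta^T Psi beta + beta^T theta - gamma * ||beta||_1."""
    return beta @ psi @ beta + beta @ theta - gamma * np.abs(beta).sum()


def omega(beta, psi, theta):
    """Omega = 2 Psi' beta + theta, where Psi' is Psi with its diagonal zeroed."""
    psi_offdiag = psi - np.diag(np.diag(psi))
    return 2.0 * psi_offdiag @ beta + theta


def relative_change(beta_new, beta_old):
    """Relative L2 distance between successive iterates."""
    change = np.linalg.norm(beta_new - beta_old)
    scale = np.linalg.norm(beta_old)
    if scale == 0.0:
        # Assumption: any move away from the zero vector counts as "not converged".
        return 0.0 if change == 0.0 else np.inf
    return change / scale
```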

![modified shooting pseudocode](images/shooting-pseudocode.png)

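The pseudocode image is not reproduced here, so the following is only a sketch of a coordinate-wise maximizer derived from the stated objective, and it may differ in details from the paper's version. It assumes $\Psi$ has a strictly negative diagonal, so that each one-dimensional problem $\Psi_{jj}\beta_j^2 + \Omega_j \beta_j - \gamma|\beta_j|$ is concave. The name `shooting_sketch` and its parameters are hypothetical.

```python
import numpy as np


def shooting_sketch(psi, theta, gamma, beta0, tolerance=1e-8, max_sweeps=1000):
    """Cyclic coordinate ascent on beta^T Psi beta + beta^T theta - gamma*||beta||_1."""
    beta = np.asarray(beta0, dtype=float).copy()
    d = beta.shape[0]
    for _ in range(max_sweeps):
        beta_old = beta.copy()
        for j in range(d):
            # Omega_j uses only the off-diagonal entries of row j of Psi.
            omega_j = 2.0 * (psi[j] @ beta - psi[j, j] * beta[j]) + theta[j]
            if omega_j > gamma:
                # Stationary point on the positive side (psi[j, j] < 0 assumed).
                beta[j] = (gamma - omega_j) / (2.0 * psi[j, j])
            elif omega_j < -gamma:
                # Stationary point on the negative side.
                beta[j] = (-gamma - omega_j) / (2.0 * psi[j, j])
            else:
                # |Omega_j| <= gamma: the penalty dominates, so the component is zero.
                beta[j] = 0.0
        # Relative L2 change after one full sweep over all d components.
        change = np.linalg.norm(beta - beta_old)
        scale = max(np.linalg.norm(beta_old), np.finfo(float).eps)
        if change / scale < tolerance:
            break
    return beta
```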
For $y_i = 1$:

![ai for yi=1](images/ai-for-yi-1.png)

![bi for yi=1](images/bi-for-yi-1.png)

For $y_i = 0$:

![ai and bi for yi=0](images/ai-bi-for-yi-0.png)

$\hat{c} = \beta_{i-1}^T x_i$

$\Phi$ is the link function, either logistic or probit.

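The expressions for $a_i$ and $b_i$ appear only in the images above and are not reproduced here. As a hedged illustration, the sketch below defines the two link functions and, for the logistic case only, the coefficients of a plain second-order Taylor expansion of the per-example log-likelihood around $\hat{c}$, written as $a c^2 + b c + \text{const}$. That Taylor form is an assumption and may not match the paper's expressions exactly. All function names are hypothetical.

```python
import math


def logistic_link(c):
    """Logistic link: Phi(c) = 1 / (1 + exp(-c))."""
    return 1.0 / (1.0 + math.exp(-c))


def probit_link(c):
    """Probit link: standard normal CDF evaluated at c."""
    return 0.5 * (1.0 + math.erf(c / math.sqrt(2.0)))


def taylor_quadratic_logistic(y, c_hat):
    """Coefficients (a, b) of a second-order Taylor expansion at c_hat.

    The log-likelihood term is log Phi(c) for y = 1 and log(1 - Phi(c)) for
    y = 0, with Phi the logistic link. Assumption: this is a generic Taylor
    expansion, not necessarily the approximation used in the paper.
    """
    p = logistic_link(c_hat)
    slope = y - p                # first derivative at c_hat
    curvature = -p * (1.0 - p)   # second derivative at c_hat (same for y = 0 and 1)
    a = 0.5 * curvature
    b = slope - curvature * c_hat
    return a, b
```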
In [None]:
import numpy as np

# TODO
def quad_approximation(y, parameters, d):
    ai = 0.0  # scalar quadratic weight
    bi = 0.0  # scalar linear weight (multiplies xi when accumulated)
    return ai, bi


# TODO
def modified_shooting(hessian_approx, linear_term_sum, parameters, tolerance):
    new_parameters = np.zeros_like(parameters)
    return new_parameters


def rmmp(X, y, selection_threshold, max_iters=1000, tolerance=1e-6):
    # number of examples and dimension
    t, d = X.shape
    
    parameters = np.zeros(d) # parameters of the regression model
    active_set = set() # components that are either non-zero and optimal or not optimal
    counter = 1 
    
    while counter <= max_iters:
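        linear_term_sum = np.zeros(d)      # length-d linear term: sum of bi * xi
        hessian_approx = np.zeros((d, d))  # d x d quadratic term: sum of ai * xi xi^T
        importance_scores = np.zeros(d)    # TODO: placeholder per-component score

        for i in range(t):
            xi = X[i]  # 1-D feature vector of length d
            yi = y[i]

            ai, bi = quad_approximation(yi, parameters, d)

            # Quadratic weights build the matrix; linear weights build the vector
            hessian_approx += ai * np.outer(xi, xi)
            linear_term_sum += bi * xi
            # TODO: placeholder score, accumulates |x_ij * beta_j| per component j
            importance_scores += np.abs(xi * parameters)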
            
            
        new_parameters = modified_shooting(hessian_approx, linear_term_sum, parameters, tolerance)
        active_set = {j for j in range(d) if importance_scores[j] >= selection_threshold}
        
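        # Converged when the relative L2 change between successive iterates is small
        change = np.linalg.norm(new_parameters - parameters)
        scale = max(np.linalg.norm(parameters), np.finfo(float).eps)  # avoid division by zero
        parameters = new_parameters
        if change / scale < tolerance:
            break

        counter += 1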
    
    return parameters, active_set
            
   
