online_psp offers a collection of computationally efficient algorithms for online subspace learning and principal component analysis, described in more detail in the
accompanying paper. There is an analogous MATLAB package with a subset of the provided functionality.
The module can be installed by adding the parent directory online_psp to the Python path and compiling the Cython portion by running

    python setup.py build_ext --inplace
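As a minimal sketch of the path step, the directory can also be added at runtime instead of through the shell environment. The helper name `add_to_python_path` and the `"."` placeholder are assumptions made for illustration; substitute the directory that actually contains the online_psp package.

```python
import sys
from pathlib import Path


def add_to_python_path(parent_dir):
    """Prepend parent_dir to sys.path if it is absent; return the resolved path."""
    resolved = str(Path(parent_dir).expanduser().resolve())
    if resolved not in sys.path:
        sys.path.insert(0, resolved)
    return resolved


# Placeholder location: replace "." with the directory containing online_psp/
parent = add_to_python_path(".")
```

The Cython extension still has to be compiled with the build command above before the package can be imported.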
The package requires scikit-learn, matplotlib, numpy, pylab, and cython.
The online_psp module contains implementations of four different algorithms
for streaming principal subspace projection:
- Incremental PCA (IPCA) 
- Candid, Covariance-free IPCA (CCIPCA) 
- Similarity Matching (SM) 
- Fast Similarity Matching (FSM) 
The implementations of these may be found in the online_psp package, with accompanying demos in the
demo folder.
The examples from the accompanying paper are found in the
ex directory. These can take a long time to run.
Briefly, each algorithm is implemented as a class with a fit_next method that takes an additional data point and updates the estimate of the principal subspace.
For example, given a centered data matrix X as a numpy array of size (N,D) consisting of N data points of size D, we can get an estimate of the K-dimensional principal subspace via:

    from online_psp.fast_similarity_matching import FSM

    fsm = FSM(K, D)

    # Fit subspace
    for x in X:
        fsm.fit_next(x)

    # Get vectors spanning subspace
    U = fsm.get_components()

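To illustrate the calling pattern without requiring the compiled package, the following sketch defines a stand-in learner exposing the same two methods. It uses Oja's rule with QR re-orthonormalization, which is not one of the four algorithms listed above; the class name, learning rate, and synthetic data generator are all assumptions made for this example.

```python
import numpy as np


class OjaSubspace:
    """Stand-in streaming learner (Oja's rule + QR); NOT part of online_psp.

    It only mirrors the fit_next / get_components calling pattern.
    """

    def __init__(self, K, D, learning_rate=0.01, seed=0):
        rng = np.random.default_rng(seed)
        # Random orthonormal D x K starting basis
        self.U, _ = np.linalg.qr(rng.standard_normal((D, K)))
        self.learning_rate = learning_rate

    def fit_next(self, x):
        y = self.U.T @ x                               # K-dim projection of x
        self.U += self.learning_rate * np.outer(x, y)  # Hebbian-style update
        self.U, _ = np.linalg.qr(self.U)               # re-orthonormalize

    def get_components(self):
        return self.U


def make_centered_data(N, D, K, seed=0):
    """Synthetic data with K dominant directions, centered column-wise."""
    rng = np.random.default_rng(seed)
    basis, _ = np.linalg.qr(rng.standard_normal((D, K)))
    X = 3.0 * rng.standard_normal((N, K)) @ basis.T
    X += 0.1 * rng.standard_normal((N, D))
    return X - X.mean(axis=0), basis


X, true_basis = make_centered_data(N=2000, D=20, K=3)

model = OjaSubspace(K=3, D=20)
for x in X:
    model.fit_next(x)
U = model.get_components()

# Frobenius distance between projectors (0 means identical subspaces)
err = np.linalg.norm(U @ U.T - true_basis @ true_basis.T)
```

The same loop should carry over to FSM or the other classes once the package is built, provided the data are centered as described.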
While there are various heuristic initialization schemes implemented for each class, it is typically better to specify the initialization explicitly; see the demos.
Additionally, if the default learning rates are used, we find it useful to scale the data so that the average L2 norm of a data point is one.
To aid in this, we provide a helper method.
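The scaling itself can be sketched in plain numpy. The function name `scale_to_unit_average_norm` is hypothetical and is not the package's own helper; it only shows the arithmetic, assuming data points are stored as rows.

```python
import numpy as np


def scale_to_unit_average_norm(X):
    """Return (X_scaled, factor) such that rows of X_scaled have mean L2 norm 1."""
    mean_norm = np.linalg.norm(X, axis=1).mean()
    if mean_norm == 0:
        raise ValueError("all data points are zero; cannot scale")
    factor = 1.0 / mean_norm
    return X * factor, factor


rng = np.random.default_rng(0)
X = 7.0 * rng.standard_normal((500, 10))
X = X - X.mean(axis=0)  # center first, as the text assumes
X_scaled, factor = scale_to_unit_average_norm(X)
```

Because the factor is a single scalar, the principal subspace is unchanged; only the effective step size relative to the data magnitude differs.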
The simulation code (including online_psp/util.py) implements a simple framework for
evaluating the different algorithms by running them on both simulated data and real data in various configurations;
see demo/demo_online_psp_simulations.py for a demo. Essentially, the framework
provides a driver method

    run_simulation(simulation_options, generator_options, algorithm_options)

where each parameter is a dictionary of specified options used to guide the simulation, the data generation, and the algorithm.
The primary purpose of these methods is to ease reproduction of the results in the accompanying paper, whose examples are in the ex directory,
but they are also easily extensible to incorporate, e.g., other algorithms one might want to test.
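The three-dictionary pattern, and how another algorithm might be plugged in, can be sketched as follows. This is not the package's run_simulation: the function name `run_simulation_sketch`, every dictionary key, the error metric, and the `CovarianceEigLearner` class are assumptions chosen for this example.

```python
import numpy as np


def subspace_error(U, V):
    """Squared projector distance between span(U) and span(V), relative to V."""
    Qu, _ = np.linalg.qr(U)
    Qv, _ = np.linalg.qr(V)
    Pv = Qv @ Qv.T
    return np.linalg.norm(Qu @ Qu.T - Pv) ** 2 / np.linalg.norm(Pv) ** 2


class CovarianceEigLearner:
    """Hypothetical plug-in algorithm: running covariance + eigendecomposition."""

    def __init__(self, K, D):
        self.K = K
        self.C = np.zeros((D, D))

    def fit_next(self, x):
        self.C += np.outer(x, x)  # accumulate unnormalized covariance

    def get_components(self):
        _, vecs = np.linalg.eigh(self.C)  # eigenvalues in ascending order
        return vecs[:, -self.K:]          # top-K eigenvectors


def run_simulation_sketch(simulation_options, generator_options, algorithm_options):
    # All keys below are hypothetical, not the package's option names
    K, D, N = (simulation_options[k] for k in ("K", "D", "N"))
    rng = np.random.default_rng(generator_options["seed"])

    # Simulated data: K strong directions plus isotropic noise, then centered
    basis, _ = np.linalg.qr(rng.standard_normal((D, K)))
    X = generator_options["signal_scale"] * rng.standard_normal((N, K)) @ basis.T
    X += generator_options["noise_scale"] * rng.standard_normal((N, D))
    X -= X.mean(axis=0)

    # Offline reference subspace: top-K right singular vectors of X
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    U_ref = Vt[:K].T

    learner = algorithm_options["factory"](K, D)
    errors = []
    for t, x in enumerate(X, start=1):
        learner.fit_next(x)
        if t % simulation_options["eval_every"] == 0:
            errors.append(subspace_error(learner.get_components(), U_ref))
    return errors


errors = run_simulation_sketch(
    simulation_options={"K": 3, "D": 15, "N": 1000, "eval_every": 100},
    generator_options={"seed": 0, "signal_scale": 3.0, "noise_scale": 0.1},
    algorithm_options={"factory": CovarianceEigLearner},
)
```

Any object with fit_next and get_components methods could be passed as the factory, which is one way the extensibility mentioned above might look in practice.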
The following example datasets have been included in this package.
- Yale face dataset, originally from the Yale vision group but obtained in processed form elsewhere (http://www.cad.zju.edu.cn/home/dengcai/Data/FaceData.html)
- ORL database of faces from ATT Laboratories, Cambridge (http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html)
- MNIST digits 
- Pehlevan, C., Sengupta, A., and Chklovskii, D.B. "Why do similarity matching objectives lead to Hebbian/anti-Hebbian networks?" Neural Computation 30, no. 1 (2018): 84-124.
- Cardot, H., and Degras, D. "Online Principal Component Analysis in High Dimension: Which Algorithm to Choose?" arXiv preprint arXiv:1511.03688 (2015).
- Arora, R., Cotter, A., Livescu, K., and Srebro, N. "Stochastic optimization for PCA and PLS." In 2012 50th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 861-868. IEEE, 2012.
- Weng, J., Zhang, Y., and Hwang, W.S. "Candid covariance-free incremental principal component analysis." IEEE Transactions on Pattern Analysis and Machine Intelligence 25, no. 8 (2003): 1034-1040.
- LeCun, Y., Bottou, L., Bengio, Y., and Haffner, P. "Gradient-based learning applied to document recognition." Proceedings of the IEEE 86, no. 11 (1998): 2278-2324.
- A forthcoming document to accompany this software.
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.