Write to @jmcq89 or @mmp2 if you would like to contribute
This list is only very slightly prioritized (i.e., we think the first two tasks are currently the most important). There are almost no dependencies between tasks, so any task can be undertaken at any time.
lazy R-metric evaluation (small to moderate) (Xiao Wang, UW)
selecting the neighborhood radius (moderate to large) (@jmcq89)
Requires: notions of high dimensional geometry, local weighted PCA, writing some visualization tools
Resources: MATLAB code is already written and can be ported "directly" to Python (potential for a publication too)
directed graph embedding (moderate or possibly small, if no visualization is included)
This is a fun project, especially if you add the specific visualization tools that would make the results shine. A MATLAB implementation and a published paper exist. All the tools needed are already in megaman.
Principal curves and surfaces, after Ozertem and Erdogmus, JMLR (moderate if no scalability is required, large otherwise). MATLAB code exists.
applications of manifold learning to various data sets and problems (small, moderate or large). Below is a sample list.
spectra of galaxies
representations obtained by deep neural networks
musical recordings
brain activity recordings
hand movement data (possibly other robotic data)
outputs of MCMC runs
BYOD
applications related to the other tasks below, e.g., GP regression, directed embedding, spectral clustering of networks
Nystrom extension: embedding new points into the existing coordinate system (moderate) (Xiao Wang)
Requires: linear algebra, some reading
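To make the Nystrom task concrete, here is a minimal sketch of the extension for a spectral embedding built from a row-normalized Gaussian kernel. This is not megaman code: `gaussian_kernel`, `fit_embedding`, `nystrom_extend`, and the kernel form `exp(-d^2 / radius^2)` are all assumptions chosen for illustration, and the dense NumPy arrays only suit small data.

```python
import numpy as np

def gaussian_kernel(A, B, radius):
    # Pairwise Gaussian similarities between rows of A and rows of B.
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / radius**2)

def fit_embedding(X, radius, n_components):
    # Eigenvectors of the random-walk matrix P = D^{-1} K, computed through
    # the symmetric matrix S = D^{-1/2} K D^{-1/2} for numerical stability.
    K = gaussian_kernel(X, X, radius)
    d = K.sum(axis=1)
    S = K / np.sqrt(np.outer(d, d))
    evals, evecs = np.linalg.eigh(S)
    # Sort descending and skip the trivial (constant) top eigenvector of P.
    order = np.argsort(evals)[::-1][1:n_components + 1]
    lam = evals[order]
    phi = evecs[:, order] / np.sqrt(d)[:, None]  # right eigenvectors of P
    return phi, lam

def nystrom_extend(X_new, X, phi, lam, radius):
    # phi_j(x) = (1 / lam_j) * sum_i p(x, x_i) * phi_j(x_i)
    K_new = gaussian_kernel(X_new, X, radius)
    P_new = K_new / K_new.sum(axis=1, keepdims=True)
    return P_new @ phi / lam
```

A useful sanity check for any implementation along these lines: extending a training point should reproduce its existing coordinates, because `P @ phi == phi * lam` holds on the training set.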
dimension estimation (moderate to large)
This is more than an implementation task, although just implementing existing methods is a possibility. Best done in conjunction with reading research papers. High potential for resulting in a publication.
manifold represented by *patches* (large, probably a research project)
implement distance and area computations (moderate)
Shortest path distances on the graph, corrected by the R-metric. Some MATLAB code exists. Some independence and experimentation are required, as there are subtle aspects to this shortest path problem.
Area computation would be nice but is secondary; it could be a separate project.
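One possible starting point for the distance computation, sketched under loud assumptions: each point `i` carries a metric matrix `G[i]` (shape `(d, d)`, acting on displacement vectors in embedding coordinates), and an edge length is measured with the average of the two endpoint metrics. The function names and the midpoint rule are hypothetical, not taken from the existing MATLAB code, and the subtle aspects mentioned above are not addressed here.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import shortest_path

def corrected_edge_lengths(Y, G, edges):
    # Y: (n, d) embedding, G: (n, d, d) per-point metric, edges: (m, 2) indices.
    i, j = edges[:, 0], edges[:, 1]
    diff = Y[i] - Y[j]
    G_mid = 0.5 * (G[i] + G[j])  # assumed midpoint approximation of the metric
    sq = np.einsum("ea,eab,eb->e", diff, G_mid, diff)
    return np.sqrt(np.maximum(sq, 0.0))

def corrected_graph_distances(Y, G, edges):
    # All-pairs shortest paths (Dijkstra) over the metric-corrected edge lengths.
    n = Y.shape[0]
    w = corrected_edge_lengths(Y, G, edges)
    W = csr_matrix((w, (edges[:, 0], edges[:, 1])), shape=(n, n))
    return shortest_path(W, method="D", directed=False)
```

If megaman hands you the dual (inverse) metric rather than `G`, it would need to be inverted per point before calling these functions.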
implement gaussian process regression on a manifold (large)
MATLAB code exists. A good understanding of math and computational linear algebra is necessary, along with some basics of machine learning, e.g., semi-supervised learning; these could be acquired.
To investigate: whether one can use an existing GP package (george) or should implement from scratch (using computational linear algebra tools)
spectral clustering for millions of points (moderate) (Xiao Wang)
Requires: using k-means (from sklearn), some understanding of spectral clustering (there are tutorials), and of k-means. (Possible extension, not done yet: build a small library of similarity functions.)
k-means initializations: K-log K initialization, k-means++ (Hui Pang)
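For orientation, a rough sketch of the spectral clustering pipeline using sparse matrices throughout: a radius-neighbor graph, a Gaussian similarity, the top eigenvectors of the normalized similarity matrix, then sklearn's `KMeans` with `k-means++` initialization. The function name `spectral_clusters`, the similarity function, and the row normalization step are assumptions for illustration; whether this scales to millions of points depends on the graph sparsity and the eigensolver, which is the actual task.

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import eigsh
from sklearn.cluster import KMeans
from sklearn.neighbors import radius_neighbors_graph

def spectral_clusters(X, n_clusters, radius, seed=0):
    # Sparse graph of pairwise distances below `radius`.
    dist = radius_neighbors_graph(X, radius, mode="distance", include_self=False)
    sim = dist.copy()
    sim.data = np.exp(-(dist.data / radius) ** 2)  # assumed Gaussian similarity
    # Symmetric normalization S = D^{-1/2} W D^{-1/2}.
    deg = np.maximum(np.asarray(sim.sum(axis=1)).ravel(), 1e-12)
    d_inv_sqrt = diags(1.0 / np.sqrt(deg))
    S = d_inv_sqrt @ sim @ d_inv_sqrt
    # Leading eigenvectors via ARPACK; seeded start vector for reproducibility.
    v0 = np.random.default_rng(seed).standard_normal(X.shape[0])
    _, vecs = eigsh(S, k=n_clusters, which="LA", v0=v0)
    # Normalize rows before k-means so each point lies on the unit sphere.
    norms = np.maximum(np.linalg.norm(vecs, axis=1, keepdims=True), 1e-12)
    km = KMeans(n_clusters=n_clusters, init="k-means++", n_init=10,
                random_state=seed)
    return km.fit_predict(vecs / norms)
```

The similarity line is the natural place to plug in the small library of similarity functions mentioned as a possible extension.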
Visualization tools (some are related to various tasks above) - small unless otherwise noted
covar_plotter3: a 3D covar_plotter to display the R-metric with 3D embeddings
locally isometric visualization (rescale the data so that R-metric is identity at one fixed point, display it)
display a vector field on a manifold
display a point cloud without outliers
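For the locally isometric visualization item, the rescaling itself is a small linear algebra step; a minimal sketch follows. It assumes `G_p` is the symmetric positive definite metric at the chosen point `p`, expressed in embedding coordinates (if what you have is the dual metric, invert it first). The function name is hypothetical.

```python
import numpy as np

def locally_isometric_rescale(Y, G_p, p):
    # Symmetric square root of G_p via its eigendecomposition.
    evals, evecs = np.linalg.eigh(G_p)
    sqrt_G = (evecs * np.sqrt(evals)) @ evecs.T
    # Center at point p and apply y -> G_p^{1/2} (y - y_p); in the new
    # coordinates the metric at p becomes the identity.
    return (Y - Y[p]) @ sqrt_G
```

After this transform, Euclidean lengths of small displacements around point `p` match their lengths under `G_p` in the original coordinates, so the plot is undistorted near `p` and increasingly distorted away from it.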