Add LSE to Pipeline #817
Merged
12 commits
4f08ccc Adding LSE (daxpryce)
4d5ef1d Code formatter and import sorting (daxpryce)
85d2513 Made sure the docs referenced lse and fixed the package structure to … (daxpryce)
2e3d309 Form should have been passed through pipeline lse to core lse (daxpryce)
576ba5b Update graspologic/pipeline/embed/laplacian_spectral_embedding.py (daxpryce)
014790a Addressing PR feedback: updated typehints on graph to include ordered… (daxpryce)
1dbbc61 Merge branch 'lse-pipeline' of github.com:microsoft/graspologic into … (daxpryce)
3d47aa1 Merge branch 'dev' into lse-pipeline (daxpryce)
aab2007 Beartype was just added to dev so I pulled it in to LSE as well (daxpryce)
5784034 Adding ValueError entry for the Raises section in the docs (daxpryce)
e9fc529 Updated the errors returned in Embeddings if you use the wrong types … (daxpryce)
f332e1f Formatting (daxpryce)
230 changes: 230 additions & 0 deletions
graspologic/pipeline/embed/laplacian_spectral_embedding.py
@@ -0,0 +1,230 @@

```python
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.

import numbers
import warnings
from typing import Optional, Union

import networkx as nx
import numpy as np
from beartype import beartype

from graspologic.embed import LaplacianSpectralEmbed
from graspologic.preconditions import check_argument, is_real_weighted
from graspologic.utils import is_fully_connected, pass_to_ranks, remove_loops

from . import __SVD_SOLVER_TYPES  # from the module init
from ._elbow import _index_of_elbow
from .embeddings import Embeddings

__FORMS = ["DAD", "I-DAD", "R-DAD"]


@beartype
def laplacian_spectral_embedding(
    graph: Union[nx.Graph, nx.OrderedGraph, nx.DiGraph, nx.OrderedDiGraph],
    form: str = "R-DAD",
    dimensions: int = 100,
    elbow_cut: Optional[int] = None,
    svd_solver_algorithm: str = "randomized",
    svd_solver_iterations: int = 5,
    svd_seed: Optional[int] = None,
    weight_attribute: str = "weight",
    regularizer: Optional[numbers.Real] = None,
) -> Embeddings:
    """
    Given a directed or undirected networkx graph (*not* multigraph), generate an
    Embeddings object.

    The Laplacian spectral embedding process is similar to the adjacency spectral
    embedding process, with the key differentiator being that the LSE process looks
    further into the latent space when it captures changes, whereas the ASE process
    is egocentric and focused on immediate differentiators in a node's periphery.

    All weights will be rescaled based on their relative rank in the graph,
    which is beneficial in minimizing anomalous results if some edge weights are
    extremely atypical of the rest of the graph.

    Parameters
    ----------
    graph : Union[nx.Graph, nx.OrderedGraph, nx.DiGraph, nx.OrderedDiGraph]
        An undirected or directed graph. The graph **must**:

        - be fully numerically weighted (every edge must have a real, numeric weight
          or else it will be treated as an unweighted graph)
        - be a basic graph (meaning it should not be a multigraph; if you have a
          multigraph you must first decide how you want to handle the weights of the
          edges between two nodes, whether summed, averaged, last-wins,
          maximum-weight-only, etc)
    form : str (default="R-DAD")
        Specifies the type of Laplacian normalization to use. Allowed values are:
        { "DAD", "I-DAD", "R-DAD" }
    dimensions : int (default=100)
        Dimensions to use for the svd solver.
        For undirected graphs, if ``elbow_cut==None``, you will receive an embedding
        that has ``nodes`` rows and ``dimensions`` columns.
        For directed graphs, if ``elbow_cut==None``, you will receive an embedding that
        has ``nodes`` rows and ``2*dimensions`` columns.
        If ``elbow_cut`` is specified to be not ``None``, we will cut the embedding at
        ``elbow_cut`` elbow, but the provided ``dimensions`` will be used in the
        creation of the SVD.
    elbow_cut : Optional[int] (default=None)
        Using a process described by Zhu & Ghodsi in their paper "Automatic
        dimensionality selection from the scree plot via the use of profile likelihood",
        truncate the dimensionality of the return on the ``elbow_cut``-th elbow.
        By default this value is ``None`` but can be used to reduce the dimensionality
        of the returned tensors.
    svd_solver_algorithm : str (default="randomized")
        allowed values: {'randomized', 'full', 'truncated'}

        SVD solver to use:

        - 'randomized'
            Computes randomized svd using
            :func:`sklearn.utils.extmath.randomized_svd`
        - 'full'
            Computes full svd using :func:`scipy.linalg.svd`
            Does not support ``graph`` input of type scipy.sparse.csr_matrix
        - 'truncated'
            Computes truncated svd using :func:`scipy.sparse.linalg.svds`
    svd_solver_iterations : int (default=5)
        Number of iterations for randomized SVD solver. Not used by 'full' or
        'truncated'. The default is larger than the default in randomized_svd
        to handle sparse matrices that may have large slowly decaying spectrum.
    svd_seed : Optional[int] (default=None)
        Used to seed the PRNG used in the ``randomized`` svd solver algorithm.
    weight_attribute : str (default="weight")
        The edge dictionary key that contains the weight of the edge.
    regularizer : Optional[numbers.Real] (default=None)
        Only used when form="R-DAD". Must be None or nonnegative.
        Constant to be added to the diagonal of degree matrix. If None, average
        node degree is added. If int or float, must be >= 0.

    Returns
    -------
    Embeddings

    Raises
    ------
    beartype.roar.BeartypeCallHintPepParamException
        If parameters do not match type hints.
    ValueError
        If values are not within appropriate ranges or allowed values.

    See Also
    --------
    graspologic.pipeline.embed.Embeddings
    graspologic.embed.LaplacianSpectralEmbed
    graspologic.embed.select_svd
    graspologic.utils.to_laplacian

    Notes
    -----
    The singular value decomposition:

    .. math:: A = U \Sigma V^T

    is used to find an orthonormal basis for a matrix, which in our case is the
    Laplacian matrix of the graph. These basis vectors (in the matrices U or V) are
    ordered according to the amount of variance they explain in the original matrix.
    By selecting a subset of these basis vectors (through our choice of dimensionality
    reduction) we can find a lower dimensional space in which to represent the graph.

    References
    ----------
    .. [1] Sussman, D.L., Tang, M., Fishkind, D.E., Priebe, C.E. "A
       Consistent Adjacency Spectral Embedding for Stochastic Blockmodel Graphs,"
       Journal of the American Statistical Association, Vol. 107(499), 2012.

    .. [2] Von Luxburg, Ulrike. "A tutorial on spectral clustering," Statistics
       and computing, Vol. 17(4), pp. 395-416, 2007.

    .. [3] Rohe, Karl, Sourav Chatterjee, and Bin Yu. "Spectral clustering and
       the high-dimensional stochastic blockmodel," The Annals of Statistics,
       Vol. 39(4), pp. 1878-1915, 2011.

    .. [4] Zhu, M. and Ghodsi, A. (2006). Automatic dimensionality selection from the
       scree plot via the use of profile likelihood. Computational Statistics & Data
       Analysis, 51(2), pp.918-930.

    """
    check_argument(
        form in __FORMS, f"form must be one of the values in {','.join(__FORMS)}"
    )

    check_argument(dimensions >= 1, "dimensions must be positive")

    check_argument(elbow_cut is None or elbow_cut >= 1, "elbow_cut must be positive")

    check_argument(
        svd_solver_algorithm in __SVD_SOLVER_TYPES,
        f"svd_solver_algorithm must be one of the values in {','.join(__SVD_SOLVER_TYPES)}",
    )

    check_argument(svd_solver_iterations >= 1, "svd_solver_iterations must be positive")

    check_argument(
        svd_seed is None or 0 <= svd_seed <= 2 ** 32 - 1,
        "svd_seed must be a nonnegative, 32-bit integer",
    )

    check_argument(
        regularizer is None or regularizer >= 0, "regularizer must be nonnegative"
    )

    check_argument(
        not graph.is_multigraph(),
        "Multigraphs are not supported; you must determine how to represent at most "
        "one edge between any two nodes, and handle the corresponding weights "
        "accordingly",
    )

    if not is_real_weighted(graph, weight_attribute=weight_attribute):
        warnings.warn(
            f"Graphs with edges that do not have a real numeric weight set for every "
            f"{weight_attribute} attribute on every edge are treated as an unweighted "
            f"graph - which presumes all weights are `1.0`. If this is incorrect, "
            f"please add a '{weight_attribute}' attribute to every edge with a real, "
            f"numeric value (e.g. an integer or a float) and call this function again."
        )
        weight_attribute = None  # this supersedes what the user said, because
        # not all of the weights are real numbers, if they exist at all
        # this weight=1.0 treatment actually happens in nx.to_scipy_sparse_matrix()

    node_labels = np.array(list(graph.nodes()))

    graph_as_csr = nx.to_scipy_sparse_matrix(
        graph, weight=weight_attribute, nodelist=node_labels
    )

    if not is_fully_connected(graph):
        warnings.warn("More than one connected component detected")

    graph_sans_loops = remove_loops(graph_as_csr)

    ranked_graph = pass_to_ranks(graph_sans_loops)

    embedder = LaplacianSpectralEmbed(
        form=form,
        n_components=dimensions,
        n_elbows=None,  # in the short term, we do our own elbow finding
        algorithm=svd_solver_algorithm,
        n_iter=svd_solver_iterations,
        svd_seed=svd_seed,
        concat=False,
    )
    results = embedder.fit_transform(ranked_graph)

    if elbow_cut is None:
        if graph.is_directed():
            results = np.concatenate(results, axis=1)
    else:
        column_index = _index_of_elbow(embedder.singular_values_, elbow_cut)
        if graph.is_directed():
            left, right = results
            left = left[:, :column_index]
            right = right[:, :column_index]
            results = np.concatenate((left, right), axis=1)
        else:
            results = results[:, :column_index]

    embeddings = Embeddings(node_labels, results)
    return embeddings
```
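The last step of the function, which shapes the embedder output into one matrix, can be shown without graspologic or numpy. The following is a minimal sketch using plain Python lists, not the code from this PR. The names `truncate_columns`, `concatenate_columns`, and `shape_results` are hypothetical, and `column_index` is simply passed in as a stand-in for whatever `_index_of_elbow` returns, since that helper is not shown in this diff.

```python
from typing import List, Optional, Tuple, Union

Matrix = List[List[float]]


def truncate_columns(matrix: Matrix, column_count: int) -> Matrix:
    """Keep only the first ``column_count`` columns of every row."""
    return [row[:column_count] for row in matrix]


def concatenate_columns(left: Matrix, right: Matrix) -> Matrix:
    """Join two matrices side by side, row by row (like axis=1 concatenation)."""
    return [left_row + right_row for left_row, right_row in zip(left, right)]


def shape_results(
    results: Union[Matrix, Tuple[Matrix, Matrix]],
    directed: bool,
    column_index: Optional[int] = None,  # hypothetical stand-in for the elbow index
) -> Matrix:
    """Mirror the branch structure at the end of the pipeline function.

    Directed graphs yield a (left, right) pair that is cut (if requested) and
    then concatenated; undirected graphs yield one matrix that is only cut.
    """
    if directed:
        left, right = results
        if column_index is not None:
            left = truncate_columns(left, column_index)
            right = truncate_columns(right, column_index)
        return concatenate_columns(left, right)
    if column_index is not None:
        return truncate_columns(results, column_index)
    return results


# Two nodes, three dimensions per side.
left = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
right = [[7.0, 8.0, 9.0], [10.0, 11.0, 12.0]]

directed_full = shape_results((left, right), directed=True)
directed_cut = shape_results((left, right), directed=True, column_index=2)
undirected_cut = shape_results(left, directed=False, column_index=2)

print(len(directed_full[0]))   # 6 columns: 2 * dimensions
print(len(directed_cut[0]))    # 4 columns: 2 * column_index
print(len(undirected_cut[0]))  # 2 columns: column_index
```

This matches the docstring's description of the output shape: a directed graph produces twice as many columns as an undirected one, and the cut is applied to each side before the two sides are joined.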
Thank you. I forgot about this.