Merge branch 'dev' into ldt-parallel

graspologic-org · May 19, 2021 · bcc4466
2 parents ad04089 + 4d97391
commit bcc4466

Showing 6 changed files with 375 additions and 143 deletions.
diff --git a/docs/tutorials/embedding/OutOfSampleEmbed.ipynb b/docs/tutorials/embedding/OutOfSampleEmbed.ipynb
@@ -11,7 +11,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Suppose we've embedded the nodes of a graph into Euclidean space using Adjacency Spectral Embedding (ASE).  \n",
+    "Suppose we've embedded the nodes of a graph into Euclidean space using Adjacency Spectral Embedding (ASE) or Laplacian Spectral Embedding (LSE).  \n",
     "Then, suppose we gain access to new nodes not seen in the original graph. We sometimes wish to determine their latent positions without the computationally-expensive task of re-embedding an entirely new adjacency matrix.\n",
     "\n",
     "More formally, suppose we have computed the embedding $\\hat{X} \\in \\textbf{R}^{n \\times d}$ from some adjacency matrix $A \\in \\textbf{R}^{n \\times n}$.  \n",
@@ -21,7 +21,7 @@
     "\n",
     "$W \\in \\textbf{R}^{m \\times n}$ is a matrix with each row being an adjacency vector, for $m$ out-of-sample vertices.\n",
     "\n",
-    "We can obtain this estimation with ASE's `transform` method.  \n",
+    "We can obtain this estimation with ASE's or LSE's `transform` method.  \n",
     "Running through the Adjacency Spectral Embedding tutorial is recommended prior to this tutorial."
    ]
   },
@@ -40,10 +40,11 @@
     "\n",
     "from graspologic.simulations import sbm\n",
     "from graspologic.embed import AdjacencySpectralEmbed as ASE\n",
+    "from graspologic.embed import LaplacianSpectralEmbed as LSE\n",
     "from graspologic.plot import heatmap, pairplot\n",
     "from graspologic.utils import remove_vertices\n",
     "\n",
-    "np.random.seed(9002)\n",
+    "np.random.seed(1234)\n",
     "import warnings\n",
     "warnings.filterwarnings('ignore')"
    ]
@@ -92,7 +93,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "### Embedding"
+    "### Embedding (ASE)"
    ]
   },
   {
@@ -110,11 +111,11 @@
    "source": [
     "# Generate an embedding with ASE\n",
     "ase = ASE(n_components=2)\n",
-    "X_hat = ase.fit_transform(A)\n",
+    "X_hat_ase = ase.fit_transform(A)\n",
     "\n",
     "# predicted latent positions\n",
-    "w = ase.transform(a)\n",
-    "w"
+    "w_ase = ase.transform(a)\n",
+    "w_ase"
    ]
   },
   {
@@ -159,7 +160,39 @@
     "    return plot\n",
     "\n",
     "# Plot all latent positions\n",
-    "plot_oos(X_hat, w, labels=labels, oos_labels=[0], title=\"Out-of-Sample Embeddings (2-block SBM)\");"
+    "plot_oos(X_hat_ase, w_ase, labels=labels, oos_labels=[0], title=\"ASE Out-of-Sample Embeddings (2-block SBM)\");"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Embedding (LSE)\n",
+    "Similarly, we can also use Laplacian Spectral Embedding (LSE). We fit the embedding on the in-sample graph, then call its `transform` method to estimate the latent position of the out-of-sample vertex."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Generate an embedding with LSE\n",
+    "lse = LSE(n_components=2)\n",
+    "X_hat_lse = lse.fit_transform(A)\n",
+    "\n",
+    "# predicted latent positions\n",
+    "w_lse = lse.transform(a)\n",
+    "w_lse"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "plot_oos(X_hat_lse, w_lse, labels=labels, oos_labels=[0], title=\"LSE Out-of-Sample Embeddings (2-block SBM)\");"
    ]
   },
   {
@@ -200,15 +233,34 @@
    "source": [
     "# Generate an embedding with ASE\n",
     "ase = ASE(n_components=2)\n",
-    "X_hat = ase.fit_transform(A)\n",
+    "X_hat_ase = ase.fit_transform(A)\n",
+    "\n",
+    "# predicted latent positions\n",
+    "w_ase = ase.transform(a)\n",
+    "print(f\"The out-of-sample prediction output has dimensions {w_ase.shape}\\n\")\n",
+    "\n",
+    "# Plot all latent positions\n",
+    "plot_oos(X_hat_ase, w_ase, labels, oos_labels=oos_labels,\n",
+    "         title=\"ASE Out-of-Sample Embeddings (2-block SBM)\");"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Generate an embedding with LSE\n",
+    "lse = LSE(n_components=2)\n",
+    "X_hat_lse = lse.fit_transform(A)\n",
     "\n",
     "# predicted latent positions\n",
-    "w = ase.transform(a)\n",
-    "print(f\"The out-of-sample prediction output has dimensions {w.shape}\\n\")\n",
+    "w_lse = lse.transform(a)\n",
+    "print(f\"The out-of-sample prediction output has dimensions {w_lse.shape}\\n\")\n",
     "\n",
     "# Plot all latent positions\n",
-    "plot_oos(X_hat, w, labels, oos_labels=oos_labels,\n",
-    "         title=\"Out-of-Sample Embeddings (2-block SBM)\");"
+    "plot_oos(X_hat_lse, w_lse, labels, oos_labels=oos_labels,\n",
+    "         title=\"LSE Out-of-Sample Embeddings (2-block SBM)\");"
    ]
   },
   {
@@ -252,20 +304,20 @@
    "outputs": [],
    "source": [
     "# Fit our directed graph\n",
-    "X_hat, Y_hat = ase.fit_transform(A)\n",
+    "X_hat_ase, Y_hat_ase = ase.fit_transform(A)\n",
     "\n",
     "# predicted latent positions\n",
-    "w = ase.transform(a)\n",
-    "print(f\"output of `ase.transform(a)` is {type(w)}\", \"\\n\")\n",
-    "print(f\"out latent positions: \\n{w[0]}\\n\")\n",
-    "print(f\"in latent positions: \\n{w[1]}\")"
+    "w_ase = ase.transform(a)\n",
+    "print(f\"output of `ase.transform(a)` is {type(w_ase)}\", \"\\n\")\n",
+    "print(f\"out latent positions: \\n{w_ase[0]}\\n\")\n",
+    "print(f\"in latent positions: \\n{w_ase[1]}\")"
    ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "### Plotting directed latent predictions"
+    "### Plotting directed ASE latent predictions"
    ]
   },
   {
@@ -274,8 +326,41 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "plot_oos(X_hat, w[0], labels, oos_labels=oos_labels, title=\"Out Latent Predictions\")\n",
-    "plot_oos(Y_hat, w[1], labels, oos_labels=oos_labels, title=\"In Latent Predictions\")"
+    "plot_oos(X_hat_ase, w_ase[0], labels, oos_labels=oos_labels, title=\"ASE Out Latent Predictions\")\n",
+    "plot_oos(Y_hat_ase, w_ase[1], labels, oos_labels=oos_labels, title=\"ASE In Latent Predictions\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Fit our directed graph\n",
+    "X_hat_lse, Y_hat_lse = lse.fit_transform(A)\n",
+    "\n",
+    "# predicted latent positions\n",
+    "w_lse = lse.transform(a)\n",
+    "print(f\"output of `lse.transform(a)` is {type(w_lse)}\", \"\\n\")\n",
+    "print(f\"out latent positions: \\n{w_lse[0]}\\n\")\n",
+    "print(f\"in latent positions: \\n{w_lse[1]}\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Plotting directed LSE latent predictions"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "plot_oos(X_hat_lse, w_lse[0], labels, oos_labels=oos_labels, title=\"LSE Out Latent Predictions\")\n",
+    "plot_oos(Y_hat_lse, w_lse[1], labels, oos_labels=oos_labels, title=\"LSE In Latent Predictions\")"
    ]
   },
   {
@@ -320,12 +405,27 @@
    "outputs": [],
    "source": [
     "# Embed and transform\n",
-    "X_hat, Y_hat = ase.fit_transform(A)\n",
-    "w = ase.transform(a)\n",
+    "X_hat_ase, Y_hat_ase = ase.fit_transform(A)\n",
+    "w_ase = ase.transform(a)\n",
+    "\n",
+    "# Plot\n",
+    "plot_oos(X_hat_ase, w_ase[0], labels, oos_labels=oos_labels, title=\"ASE Out Latent Predictions\")\n",
+    "plot_oos(Y_hat_ase, w_ase[1], labels, oos_labels=oos_labels, title=\"ASE In Latent Predictions\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Embed and transform\n",
+    "X_hat_lse, Y_hat_lse = lse.fit_transform(A)\n",
+    "w_lse = lse.transform(a)\n",
     "\n",
     "# Plot\n",
-    "plot_oos(X_hat, w[0], labels, oos_labels=oos_labels, title=\"Out Latent Predictions\")\n",
-    "plot_oos(Y_hat, w[1],labels, oos_labels=oos_labels, title=\"In Latent Predictions\")"
+    "plot_oos(X_hat_lse, w_lse[0], labels, oos_labels=oos_labels, title=\"LSE Out Latent Predictions\")\n",
+    "plot_oos(Y_hat_lse, w_lse[1], labels, oos_labels=oos_labels, title=\"LSE In Latent Predictions\")"
    ]
   }
  ],
@@ -345,9 +445,9 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.8.5"
+   "version": "3.6.13"
   }
  },
  "nbformat": 4,
  "nbformat_minor": 4
-}
+}
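
Note on the out-of-sample step: the notebook above calls `transform` on new adjacency vectors, and the `ase.py` diff below shows the arithmetic behind it (`inv_eigs = np.diag(1 / self.singular_values_)`, `self._pinv_left = self.latent_left_ @ inv_eigs`, then `X @ self._pinv_left`). The following is a minimal NumPy sketch of that undirected computation only, not the library's implementation. The block matrix `B`, the sizes, the seed, and all variable names are assumptions chosen for illustration, and the embedding is taken as the top-`d` singular vectors scaled by the square roots of the singular values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-block SBM (illustrative values, not taken from the notebook)
n_per_block = 50
B = np.array([[0.5, 0.1], [0.1, 0.5]])
labels = np.repeat([0, 1], n_per_block)
P = B[labels][:, labels]  # edge probability for every pair of nodes

# Sample a symmetric, hollow adjacency matrix
upper = np.triu(rng.random(P.shape) < P, k=1)
A = (upper + upper.T).astype(float)

# In-sample embedding: top-d singular vectors scaled by sqrt(singular values)
d = 2
U, S, Vt = np.linalg.svd(A)
X_hat = U[:, :d] * np.sqrt(S[:d])

# Mirrors `latent_left_ @ np.diag(1 / singular_values_)` from the diff
pinv_left = X_hat / S[:d]

# Adjacency vector of one new vertex, drawn as if it belonged to block 0
a = (rng.random(A.shape[0]) < B[0][labels]).astype(float)

# Estimated latent position of the out-of-sample vertex
w = a @ pinv_left
```

Stacking several adjacency vectors into an `(m, n)` array in place of `a` would give an `(m, d)` array of estimates, which matches the shape convention described for $W$ in the notebook text.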
diff --git a/graspologic/embed/ase.py b/graspologic/embed/ase.py
@@ -1,19 +1,8 @@
 # Copyright (c) Microsoft Corporation and contributors.
 # Licensed under the MIT License.
 
-import warnings
-import numpy as np
-from sklearn.utils.validation import check_is_fitted
-import networkx as nx
-
 from .base import BaseSpectralEmbed
-from ..utils import (
-    import_graph,
-    is_fully_connected,
-    augment_diagonal,
-    pass_to_ranks,
-    is_unweighted,
-)
+from ..utils import augment_diagonal
 
 
 class AdjacencySpectralEmbed(BaseSpectralEmbed):
@@ -70,14 +59,16 @@ class AdjacencySpectralEmbed(BaseSpectralEmbed):
         to the ground truth.
 
     concat : bool, optional (default False)
-        If graph is directed, whether to concatenate left and right (out and in) latent positions along axis 1.
+        If graph is directed, whether to concatenate left and right (out and in) latent
+        positions along axis 1.
 
 
 
     Attributes
     ----------
     n_features_in_: int
-        Number of features passed to the :func:`~graspologic.embed.AdjacencySpectralEmbed.fit` method.
+        Number of features passed to the
+        :func:`~graspologic.embed.AdjacencySpectralEmbed.fit` method.
     latent_left_ : array, shape (n_samples, n_components)
         Estimated left latent positions of the graph.
     latent_right_ : array, shape (n_samples, n_components), or None
@@ -161,93 +152,23 @@ def fit(self, graph, y=None):
 
         self._reduce_dim(A)
 
-        # for out-of-sample
-        inv_eigs = np.diag(1 / self.singular_values_)
-        self._pinv_left = self.latent_left_ @ inv_eigs
-        if self.latent_right_ is not None:
-            self._pinv_right = self.latent_right_ @ inv_eigs
-
         self.is_fitted_ = True
+
         return self
 
-    def transform(self, X):
+    def _compute_oos_prediction(self, X, directed):
         """
-        Obtain latent positions from an adjacency matrix or matrix of out-of-sample
-        vertices. For more details on transforming out-of-sample vertices, see the
-        :ref:`tutorials <embed_tutorials>`. For mathematical background, see [2].
-
+        Computes the out-of-sample latent position estimation.
         Parameters
         ----------
-        X : array-like or tuple, original shape or (n_oos_vertices, n_vertices).
-
-            The original fitted matrix ("graph" in fit) or new out-of-sample data.
-            If ``X`` is the original fitted matrix, returns a matrix close to
-            ``self.fit_transform(X)``.
-
-            If ``X`` is an out-of-sample matrix, n_oos_vertices is the number
-            of new vertices, and n_vertices is the number of vertices in the
-            original graph. If tuple, graph is directed and ``X[0]`` contains
-            edges from out-of-sample vertices to in-sample vertices.
-
+        X: np.ndarray
+            Input to do oos embedding on.
+        directed: bool
+            Indication if graph is directed or undirected
         Returns
         -------
-        array_like or tuple, shape (n_oos_vertices, n_components)
-        or (n_vertices, n_components).
-
-            Array of latent positions. Transforms the fitted matrix if it was passed
-            in.
-
-            If ``X`` is an array or tuple containing adjacency vectors corresponding to
-            new nodes, returns the estimated latent positions for the new out-of-sample
-            adjacency vectors.
-            If undirected, returns array.
-            If directed, returns ``(X_out, X_in)``, where ``X_out`` contains
-            latent positions corresponding to nodes with edges from out-of-sample
-            vertices to in-sample vertices.
-
-        Notes
-        -----
-        If the matrix was diagonally augmented (e.g., ``self.diag_aug`` was True), ``fit``
-        followed by ``transform`` will produce a slightly different matrix than
-        ``fit_transform``.
-
-        To get the original embedding, using ``fit_transform`` is recommended. In the
-        directed case, if A is the original in-sample adjacency matrix, the tuple
-        (A.T, A) will need to be passed to ``transform`` if you do not wish to use
-        ``fit_transform``.
+        out : array_like or tuple, shape
         """
-
-        # checks
-        check_is_fitted(self, "is_fitted_")
-        if isinstance(X, nx.classes.graph.Graph):
-            X = import_graph(X)
-        directed = self.latent_right_ is not None
-
-        # correct types?
-        if directed and not isinstance(X, tuple):
-            if X.shape[0] == X.shape[1]:  # in case original matrix was passed
-                msg = """A square matrix A was passed to ``transform`` in the directed case. 
-                If this was the original in-sample matrix, either use ``fit_transform`` 
-                or pass a tuple (A.T, A). If this was an out-of-sample matrix, directed
-                graphs require a tuple (X_out, X_in)."""
-                raise TypeError(msg)
-            else:
-                msg = "Directed graphs require a tuple (X_out, X_in) for out-of-sample transforms."
-                raise TypeError(msg)
-        if not directed and not isinstance(X, np.ndarray):
-            raise TypeError("Undirected graphs require array input")
-
-        # correct shape in y?
-        latent_rows = self.latent_left_.shape[0]
-        _X = X[0] if directed else X
-        X_cols = _X.shape[-1]
-        if _X.ndim > 2:
-            raise ValueError("out-of-sample vertex must be 1d or 2d")
-        if latent_rows != X_cols:
-            msg = "out-of-sample vertex must be shape (n_oos_vertices, n_vertices)"
-            raise ValueError(msg)
-
-        # workhorse code
         if not directed:
             return X @ self._pinv_left
         elif directed: