Shift casing on ParametricUMAP; import ParametricUMAP and AlignedUMAP to the top level.
lmcinnes · Sep 22, 2020 · 5e79a32
1 parent 9c9a094

Showing 10 changed files with 58 additions and 56 deletions.
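
Migration note: this commit renames `parametricUMAP` to `ParametricUMAP` and `load_parametricUMAP` to `load_ParametricUMAP`, so downstream code that uses the old names will need the same substitution. The sketch below shows one way to apply it to a source string. `migrate_source` and `RENAMES` are hypothetical helper names invented for this example and are not part of umap-learn; only the old and new identifiers come from the diff.

```python
import re

# Old name -> new name, taken from the renames in this commit.
# (RENAMES and migrate_source are illustrative names, not umap-learn API.)
RENAMES = {
    "load_parametricUMAP": "load_ParametricUMAP",
    "parametricUMAP": "ParametricUMAP",
}

# Match whole identifiers only; try longer names first so the loader
# name is never split into "load_" plus the class name.
_PATTERN = re.compile(
    r"\b("
    + "|".join(sorted(map(re.escape, RENAMES), key=len, reverse=True))
    + r")\b"
)


def migrate_source(text: str) -> str:
    """Return `text` with the old identifiers replaced by the new ones."""
    return _PATTERN.sub(lambda match: RENAMES[match.group(1)], text)


if __name__ == "__main__":
    old_code = (
        "from umap.parametric_umap import parametricUMAP\n"
        "embedder = parametricUMAP(verbose=True)\n"
    )
    print(migrate_source(old_code))
```

The match is case-sensitive and bounded by word boundaries, so already-migrated code and the module name `parametric_umap` are left untouched, and running the helper twice yields the same result. Because the commit also adds `ParametricUMAP` and `AlignedUMAP` to `umap/__init__.py`, the shorter `from umap import ParametricUMAP` should work afterwards as well.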
diff --git a/doc/parametric_umap.rst b/doc/parametric_umap.rst
@@ -12,12 +12,12 @@ Parametric UMAP replaces the second step, minimizing the same objective function
 
 .. image:: images/pumap-only.png
 
-Parametric UMAP is simply a subclass of UMAP, so it can be used just like nonparametric UMAP, replacing :python:`umap.UMAP` with :python:`parametric_umap.parametricUMAP`. The most basic usage of parametric UMAP would be to simply replace UMAP with parametricUMAP in your code:
+Parametric UMAP is simply a subclass of UMAP, so it can be used just like nonparametric UMAP, replacing :python:`umap.UMAP` with :python:`parametric_umap.ParametricUMAP`. The most basic usage of parametric UMAP would be to simply replace UMAP with ParametricUMAP in your code:
 
 .. code:: python3
 
-    from umap.parametric_umap import parametricUMAP
-    embedder = parametricUMAP()
+    from umap.parametric_umap import ParametricUMAP
+    embedder = ParametricUMAP()
     embedding = embedder.fit_transform(my_data)
 
 In this implementation, we use Keras and Tensorflow as a backend to train that neural network. The added complexity of a learned embedding presents a number of configurable settings available in addition to those in non-parametric UMAP. A set of Jupyter notebooks walking you through these parameters are available on the  `GitHub repository <http://github.com/lmcinnes/umap/notebooks/parametric_umap/>`_
@@ -26,7 +26,7 @@ In this implementation, we use Keras and Tensorflow as a backend to train that n
 Defining your own network
 ---------------------------
 
-By default, parametric UMAP uses 3-layer 100-neuron fully-connected neural network. To extend Parametric UMAP to use a more complex architecture, like a convolutional neural network, we simply need to define the network and pass it in as an argument to parametricUMAP. This can be done easliy, using tf.keras.Sequential. Here's an example for MNIST:
+By default, parametric UMAP uses 3-layer 100-neuron fully-connected neural network. To extend Parametric UMAP to use a more complex architecture, like a convolutional neural network, we simply need to define the network and pass it in as an argument to ParametricUMAP. This can be done easliy, using tf.keras.Sequential. Here's an example for MNIST:
 
 .. code:: python3
     
@@ -49,7 +49,7 @@ By default, parametric UMAP uses 3-layer 100-neuron fully-connected neural netwo
     ])
     encoder.summary()
    
-To load pass the data into parametricUMAP, we first need to flatten it from 28x28x1 images to a 784-dimensional vector. 
+To load pass the data into ParametricUMAP, we first need to flatten it from 28x28x1 images to a 784-dimensional vector.
 
 .. code:: python3    
 
@@ -59,12 +59,12 @@ To load pass the data into parametricUMAP, we first need to flatten it from 28x2
     test_images = test_images.reshape((test_images.shape[0], -1))/255.
 
 
-We can then the network into parametricUMAP and train:
+We can then the network into ParametricUMAP and train:
 
 .. code:: python3 
 
-    # pass encoder network to parametricUMAP
-    embedder = parametricUMAP(encoder=encoder, dims=dims)
+    # pass encoder network to ParametricUMAP
+    embedder = ParametricUMAP(encoder=encoder, dims=dims)
     embedding = embedder.fit_transform(train_images)
 
 If you are unfamilar with Tensorflow/Keras and want to train your own model, we reccomend that you take a look at the `Tensorflow documentation <https://www.tensorflow.org/>`_. 
@@ -83,8 +83,8 @@ You can then load parametric UMAP elsewhere:
 
 .. code:: python3
 
-    from umap.parametric_umap import load_parametricUMAP
-    embedder = load_parametricUMAP('/your/path/here')
+    from umap.parametric_umap import load_ParametricUMAP
+    embedder = load_ParametricUMAP('/your/path/here')
 
 This loads both the UMAP object and the parametric networks it contains.
 
@@ -105,10 +105,10 @@ Parametric UMAP monitors loss during training using Keras. That loss will be pri
 
 Parametric inverse_transform (reconstruction)
 ---------------------------------------------
-To use a second neural network to learn an inverse mapping between data and embeddings, we simply need to pass `parametric_reconstruction= True` to the parametricUMAP. 
+To use a second neural network to learn an inverse mapping between data and embeddings, we simply need to pass `parametric_reconstruction= True` to the ParametricUMAP.
 
 
-Like the encoder, a custom decoder can also be passed to parametricUMAP, e.g. 
+Like the encoder, a custom decoder can also be passed to ParametricUMAP, e.g.
 
 .. code:: python3
 
@@ -134,12 +134,12 @@ In addition, validation data can be used to test reconstruction loss on out-of-d
 
     validation_images = test_images.reshape((test_images.shape[0], -1))/255.
 
-Finally, we can pass the validation data and the networks to parametricUMAP and train:
+Finally, we can pass the validation data and the networks to ParametricUMAP and train:
 
 
 .. code:: python3
 
-            embedder = parametricUMAP(
+            embedder = ParametricUMAP(
                 encoder=encoder,
                 decoder=decoder,
                 dims=dims,
@@ -154,12 +154,12 @@ Autoencoding UMAP
 -----------------
 
 
-In the example above, the encoder is trained to minimize UMAP loss, and the decoder is trained to minimize reconstruction loss. To train the encoder jointly on both UMAP loss and reconstruction loss, pass :python:`autoencoder_loss = True` into the parametricUMAP.  
+In the example above, the encoder is trained to minimize UMAP loss, and the decoder is trained to minimize reconstruction loss. To train the encoder jointly on both UMAP loss and reconstruction loss, pass :python:`autoencoder_loss = True` into the ParametricUMAP.
 
 
 .. code:: python3
 
-            embedder = parametricUMAP(
+            embedder = ParametricUMAP(
                 encoder=encoder,
                 decoder=decoder,
                 dims=dims,
@@ -173,7 +173,7 @@ In the example above, the encoder is trained to minimize UMAP loss, and the deco
 Early stopping and Keras callbacks
 ----------------------------------
 
-It can sometimes be useful to train the embedder until some plateau in training loss is met. In deep learning, early stopping is one way to do this. Keras provides custom `callbacks <https://keras.io/api/callbacks/>`_ that allow you to implement checks during training, such as early stopping. We can use callbacks, such as early stopping, with parametricUMAP to stop training early based on a predefined training threshold, using the :python:`keras_fit_kwargs` argument:
+It can sometimes be useful to train the embedder until some plateau in training loss is met. In deep learning, early stopping is one way to do this. Keras provides custom `callbacks <https://keras.io/api/callbacks/>`_ that allow you to implement checks during training, such as early stopping. We can use callbacks, such as early stopping, with ParametricUMAP to stop training early based on a predefined training threshold, using the :python:`keras_fit_kwargs` argument:
 
 .. code:: python3
 
@@ -186,7 +186,7 @@ It can sometimes be useful to train the embedder until some plateau in training
         )
     ]}
 
-    embedder = parametricUMAP(
+    embedder = ParametricUMAP(
         verbose=True,
         keras_fit_kwargs = keras_fit_kwargs,
         n_training_epochs=20
@@ -199,9 +199,9 @@ We also passed in :python:`n_training_epochs = 20`, allowing early stopping to e
 Additional important parameters
 -------------------------------
 
-* **batch_size:** parametricUMAP in trained over batches of edges randomly sampled from the UMAP graph, and then trained via gradient descent.  parametricUMAP defaults to a batch size of 1000 edges, but can be adjusted to a value that fits better on your GPU or CPU. 
+* **batch_size:** ParametricUMAP in trained over batches of edges randomly sampled from the UMAP graph, and then trained via gradient descent.  ParametricUMAP defaults to a batch size of 1000 edges, but can be adjusted to a value that fits better on your GPU or CPU.
 * **loss_report_frequency:** If set to 1, an epoch in in the Keras embedding refers to a single iteration over the graph computed in UMAP. Setting :python:`loss_report_frequency` to 10, would split up that epoch into 10 seperate epochs, for more frequent reporting. 
-* **n_training_epochs:** The number of epochs over the UMAP graph to train for (irrespective of :python:`loss_report_frequency`). Training the network for multiple epochs will result in better embeddings, but take longer. This parameter is different than :python:`n_epochs` in the base UMAP class, which corresponds to the maximum number of times an edge is trained in a single parametricUMAP epoch. 
+* **n_training_epochs:** The number of epochs over the UMAP graph to train for (irrespective of :python:`loss_report_frequency`). Training the network for multiple epochs will result in better embeddings, but take longer. This parameter is different than :python:`n_epochs` in the base UMAP class, which corresponds to the maximum number of times an edge is trained in a single ParametricUMAP epoch.
 * **optimizer:** The optimizer used to train the neural network. by default Adam (:python:`tf.keras.optimizers.Adam(1e-3)`) is used. You might be able to speed up or improve training by using a different optimizer.
 * **parametric_embedding:** If set to false, a non-parametric embedding is learned, using the same code as the parametric embedding, which can serve as a direct comparison between parametric and non-parametric embedding using the same optimizer. The parametric embeddings are performed over the entire dataset simultaneously. 
 

diff --git a/notebooks/Parametric_UMAP/01.0-parametric-umap-mnist-embedding-basic.ipynb b/notebooks/Parametric_UMAP/01.0-parametric-umap-mnist-embedding-basic.ipynb
@@ -55,7 +55,7 @@
    },
    "outputs": [],
    "source": [
-    "from umap.parametric_umap import parametricUMAP"
+    "from umap.parametric_umap import ParametricUMAP"
    ]
   },
   {
@@ -69,7 +69,7 @@
    },
    "outputs": [],
    "source": [
-    "embedder = parametricUMAP(verbose=True)"
+    "embedder = ParametricUMAP(verbose=True)"
    ]
   },
   {
@@ -87,7 +87,7 @@
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "parametricUMAP(optimizer=<tensorflow.python.keras.optimizer_v2.adam.Adam object at 0x7f8ffc2894a8>)\n",
+      "ParametricUMAP(optimizer=<tensorflow.python.keras.optimizer_v2.adam.Adam object at 0x7f8ffc2894a8>)\n",
       "Construct fuzzy simplicial set\n",
       "Tue Sep  1 15:49:42 2020 Finding Nearest Neighbors\n",
       "Tue Sep  1 15:49:42 2020 Building RP forest with 17 trees\n",
@@ -278,7 +278,7 @@
    },
    "outputs": [],
    "source": [
-    "from umap.parametric_umap import load_parametricUMAP"
+    "from umap.parametric_umap import load_ParametricUMAP"
    ]
   },
   {
@@ -297,7 +297,7 @@
      "text": [
       "Keras encoder model saved to /tmp/model/encoder\n",
       "Keras full model saved to /tmp/model/parametric_model\n",
-      "Pickle of parametricUMAP model saved to /tmp/model/model.pkl\n"
+      "Pickle of ParametricUMAP model saved to /tmp/model/model.pkl\n"
      ]
     }
    ],
@@ -340,7 +340,7 @@
     }
    ],
    "source": [
-    "embedder = load_parametricUMAP('/tmp/model')"
+    "embedder = load_ParametricUMAP('/tmp/model')"
    ]
   },
   {

diff --git a/notebooks/Parametric_UMAP/02.0-parametric-umap-mnist-embedding-convnet.ipynb b/notebooks/Parametric_UMAP/02.0-parametric-umap-mnist-embedding-convnet.ipynb
@@ -118,7 +118,7 @@
    },
    "outputs": [],
    "source": [
-    "from umap.parametric_umap import parametricUMAP"
+    "from umap.parametric_umap import ParametricUMAP"
    ]
   },
   {
@@ -132,7 +132,7 @@
    },
    "outputs": [],
    "source": [
-    "embedder = parametricUMAP(encoder=encoder, dims=dims, n_training_epochs=5, verbose=True)"
+    "embedder = ParametricUMAP(encoder=encoder, dims=dims, n_training_epochs=5, verbose=True)"
    ]
   },
   {
@@ -149,7 +149,7 @@
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "parametricUMAP(dims=(28, 28, 1),\n",
+      "ParametricUMAP(dims=(28, 28, 1),\n",
       "               encoder=<tensorflow.python.keras.engine.sequential.Sequential object at 0x7f9b902396a0>,\n",
       "               n_training_epochs=5,\n",
       "               optimizer=<tensorflow.python.keras.optimizer_v2.adam.Adam object at 0x7f9968e0ef60>)\n",

diff --git a/...ks/Parametric_UMAP/03.0-parametric-umap-mnist-embedding-convnet-with-reconstruction.ipynb b/...ks/Parametric_UMAP/03.0-parametric-umap-mnist-embedding-convnet-with-reconstruction.ipynb
@@ -179,7 +179,7 @@
    },
    "outputs": [],
    "source": [
-    "from umap.parametric_umap import parametricUMAP"
+    "from umap.parametric_umap import ParametricUMAP"
    ]
   },
   {
@@ -193,7 +193,7 @@
    },
    "outputs": [],
    "source": [
-    "embedder = parametricUMAP(\n",
+    "embedder = ParametricUMAP(\n",
     "    encoder=encoder,\n",
     "    decoder=decoder,\n",
     "    dims=dims,\n",
@@ -219,7 +219,7 @@
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "parametricUMAP(n_training_epochs=5,\n",
+      "ParametricUMAP(n_training_epochs=5,\n",
       "               optimizer=<tensorflow.python.keras.optimizer_v2.adam.Adam object at 0x7fdb3052e080>,\n",
       "               parametric_reconstruction=True,\n",
       "               reconstruction_validation=array([[0., 0., 0., ..., 0., 0., 0.],\n",

diff --git a/.../Parametric_UMAP/04.0-parametric-umap-mnist-embedding-convnet-with-autoencoder-loss.ipynb b/.../Parametric_UMAP/04.0-parametric-umap-mnist-embedding-convnet-with-autoencoder-loss.ipynb
@@ -179,7 +179,7 @@
    },
    "outputs": [],
    "source": [
-    "from umap.parametric_umap import parametricUMAP"
+    "from umap.parametric_umap import ParametricUMAP"
    ]
   },
   {
@@ -193,7 +193,7 @@
    },
    "outputs": [],
    "source": [
-    "embedder = parametricUMAP(\n",
+    "embedder = ParametricUMAP(\n",
     "    encoder=encoder,\n",
     "    decoder=decoder,\n",
     "    dims=dims,\n",
@@ -219,7 +219,7 @@
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "parametricUMAP(autoencoder_loss=True, n_training_epochs=5,\n",
+      "ParametricUMAP(autoencoder_loss=True, n_training_epochs=5,\n",
       "               optimizer=<tensorflow.python.keras.optimizer_v2.adam.Adam object at 0x7f92a05514a8>,\n",
       "               parametric_reconstruction=True,\n",
       "               reconstruction_validation=array([[0., 0., 0., ..., 0., 0., 0.],\n",

diff --git a/notebooks/Parametric_UMAP/05.0-parametric-umap-with-callback.ipynb b/notebooks/Parametric_UMAP/05.0-parametric-umap-with-callback.ipynb
@@ -55,7 +55,7 @@
    },
    "outputs": [],
    "source": [
-    "from umap.parametric_umap import parametricUMAP"
+    "from umap.parametric_umap import ParametricUMAP"
    ]
   },
   {
@@ -90,7 +90,7 @@
    },
    "outputs": [],
    "source": [
-    "embedder = parametricUMAP(\n",
+    "embedder = ParametricUMAP(\n",
     "    verbose=True,\n",
     "    keras_fit_kwargs = keras_fit_kwargs,\n",
     "    n_training_epochs=20\n",
@@ -111,7 +111,7 @@
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "parametricUMAP(keras_fit_kwargs={'callbacks': [<tensorflow.python.keras.callbacks.EarlyStopping object at 0x7fd30942f7b8>]},\n",
+      "ParametricUMAP(keras_fit_kwargs={'callbacks': [<tensorflow.python.keras.callbacks.EarlyStopping object at 0x7fd30942f7b8>]},\n",
       "               n_training_epochs=20,\n",
       "               optimizer=<tensorflow.python.keras.optimizer_v2.adam.Adam object at 0x7fd30942f668>)\n",
       "Construct fuzzy simplicial set\n",

diff --git a/notebooks/Parametric_UMAP/06.0-nonparametric-umap.ipynb b/notebooks/Parametric_UMAP/06.0-nonparametric-umap.ipynb
@@ -55,7 +55,7 @@
    },
    "outputs": [],
    "source": [
-    "from umap.parametric_umap import parametricUMAP"
+    "from umap.parametric_umap import ParametricUMAP"
    ]
   },
   {
@@ -69,7 +69,7 @@
    },
    "outputs": [],
    "source": [
-    "embedder = parametricUMAP(parametric_embedding=False, verbose=True)"
+    "embedder = ParametricUMAP(parametric_embedding=False, verbose=True)"
    ]
   },
   {
@@ -87,7 +87,7 @@
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "parametricUMAP(optimizer=<tensorflow.python.keras.optimizer_v2.adam.Adam object at 0x7fa94d08b978>,\n",
+      "ParametricUMAP(optimizer=<tensorflow.python.keras.optimizer_v2.adam.Adam object at 0x7fa94d08b978>,\n",
       "               parametric_embedding=False)\n",
       "Construct fuzzy simplicial set\n",
       "Sun Aug 16 18:34:39 2020 Finding Nearest Neighbors\n",

diff --git a/umap/__init__.py b/umap/__init__.py
@@ -1,4 +1,6 @@
 from .umap_ import UMAP
+from .parametric_umap import ParametricUMAP
+from .aligned_umap import AlignedUMAP
 
 # Workaround: https://github.com/numba/numba/issues/3341
 import numba
@@ -8,4 +10,4 @@
 try:
     __version__ = pkg_resources.get_distribution("umap-learn").version
 except pkg_resources.DistributionNotFound:
-    __version__ = "0.4-dev"
+    __version__ = "0.5-dev"
diff --git a/umap/parametric_umap.py b/umap/parametric_umap.py
@@ -39,7 +39,7 @@
     raise ImportError("umap.parametric_umap requires Tensorflow >= 2.0") from None
 
 
-class parametricUMAP(UMAP):
+class ParametricUMAP(UMAP):
     def __init__(
         self,
         optimizer=None,
@@ -147,7 +147,7 @@ def transform(self, X):
             )
         else:
             warn(
-                "Embedding new data is not supported by parametricUMAP. \
+                "Embedding new data is not supported by ParametricUMAP. \
                 Using original embedder."
             )
             return super().transform(X)
@@ -407,7 +407,7 @@ def save(self, save_location, verbose=True):
             with open(model_output, "wb") as output:
                 pickle.dump(self, output, pickle.HIGHEST_PROTOCOL)
             if verbose:
-                print("Pickle of parametricUMAP model saved to {}".format(model_output))
+                print("Pickle of ParametricUMAP model saved to {}".format(model_output))
 
 
 def get_graph_elements(graph_, n_epochs):
@@ -873,7 +873,7 @@ def should_pickle(key, val):
     return True
 
 
-def load_parametricUMAP(save_location, verbose=True):
+def load_ParametricUMAP(save_location, verbose=True):
     """
     Load a parametric UMAP model consisting of a umap-learn UMAP object 
     and corresponding keras models. 
@@ -887,11 +887,11 @@ def load_parametricUMAP(save_location, verbose=True):
 
     Returns
     -------
-    parametric_umap.parametricUMAP
+    parametric_umap.ParametricUMAP
         Parametric UMAP objects
     """
 
-    ## Loads a parametricUMAP model and its related keras models
+    ## Loads a ParametricUMAP model and its related keras models
 
     model_output = os.path.join(save_location, "model.pkl")
     model = pickle.load((open(model_output, "rb")))