Shift casing on ParametricUMAP; import ParametricUMAP and AlignedUMAP to the top level.
lmcinnes committed Sep 22, 2020
1 parent 9c9a094 commit 5e79a32
Showing 10 changed files with 58 additions and 56 deletions.
40 changes: 20 additions & 20 deletions doc/parametric_umap.rst
@@ -12,12 +12,12 @@ Parametric UMAP replaces the second step, minimizing the same objective function

.. image:: images/pumap-only.png

Parametric UMAP is simply a subclass of UMAP, so it can be used just like nonparametric UMAP, replacing :python:`umap.UMAP` with :python:`parametric_umap.parametricUMAP`. The most basic usage of parametric UMAP would be to simply replace UMAP with parametricUMAP in your code:
Parametric UMAP is simply a subclass of UMAP, so it can be used just like nonparametric UMAP, replacing :python:`umap.UMAP` with :python:`parametric_umap.ParametricUMAP`. The most basic usage of parametric UMAP would be to simply replace UMAP with ParametricUMAP in your code:

.. code:: python3
from umap.parametric_umap import parametricUMAP
embedder = parametricUMAP()
from umap.parametric_umap import ParametricUMAP
embedder = ParametricUMAP()
embedding = embedder.fit_transform(my_data)
In this implementation, we use Keras and Tensorflow as a backend to train that neural network. The added complexity of a learned embedding presents a number of configurable settings available in addition to those in non-parametric UMAP. A set of Jupyter notebooks walking you through these parameters are available on the `GitHub repository <http://github.com/lmcinnes/umap/notebooks/parametric_umap/>`_
@@ -26,7 +26,7 @@ In this implementation, we use Keras and Tensorflow as a backend to train that n
Defining your own network
---------------------------

By default, parametric UMAP uses 3-layer 100-neuron fully-connected neural network. To extend Parametric UMAP to use a more complex architecture, like a convolutional neural network, we simply need to define the network and pass it in as an argument to parametricUMAP. This can be done easliy, using tf.keras.Sequential. Here's an example for MNIST:
By default, parametric UMAP uses 3-layer 100-neuron fully-connected neural network. To extend Parametric UMAP to use a more complex architecture, like a convolutional neural network, we simply need to define the network and pass it in as an argument to ParametricUMAP. This can be done easliy, using tf.keras.Sequential. Here's an example for MNIST:

.. code:: python3
@@ -49,7 +49,7 @@ By default, parametric UMAP uses 3-layer 100-neuron fully-connected neural netwo
])
encoder.summary()
To load pass the data into parametricUMAP, we first need to flatten it from 28x28x1 images to a 784-dimensional vector.
To load pass the data into ParametricUMAP, we first need to flatten it from 28x28x1 images to a 784-dimensional vector.

.. code:: python3
@@ -59,12 +59,12 @@ To load pass the data into parametricUMAP, we first need to flatten it from 28x2
test_images = test_images.reshape((test_images.shape[0], -1))/255.
We can then the network into parametricUMAP and train:
We can then the network into ParametricUMAP and train:

.. code:: python3
# pass encoder network to parametricUMAP
embedder = parametricUMAP(encoder=encoder, dims=dims)
# pass encoder network to ParametricUMAP
embedder = ParametricUMAP(encoder=encoder, dims=dims)
embedding = embedder.fit_transform(train_images)
If you are unfamilar with Tensorflow/Keras and want to train your own model, we reccomend that you take a look at the `Tensorflow documentation <https://www.tensorflow.org/>`_.
@@ -83,8 +83,8 @@ You can then load parametric UMAP elsewhere:

.. code:: python3
from umap.parametric_umap import load_parametricUMAP
embedder = load_parametricUMAP('/your/path/here')
from umap.parametric_umap import load_ParametricUMAP
embedder = load_ParametricUMAP('/your/path/here')
This loads both the UMAP object and the parametric networks it contains.
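
For orientation, the directory that ``save`` writes and ``load_ParametricUMAP`` reads can be sketched as below. The three names (``encoder``, ``parametric_model``, ``model.pkl``) are taken from the verbose output and source shown elsewhere in this commit; the helper ``expected_model_paths`` is hypothetical and not part of umap, and other files may be written that are not listed here.

```python
import os


def expected_model_paths(save_location):
    """Hypothetical helper: paths that appear in the save() log of this commit."""
    return {
        "encoder": os.path.join(save_location, "encoder"),
        "parametric_model": os.path.join(save_location, "parametric_model"),
        "pickle": os.path.join(save_location, "model.pkl"),
    }


# Report which of the expected artifacts are present before trying to load.
for name, path in expected_model_paths("/tmp/model").items():
    print(f"{name}: {path} (exists: {os.path.exists(path)})")
```

A check like this can make a failed load easier to diagnose, since a missing ``model.pkl`` and a missing Keras model directory are different problems.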

@@ -105,10 +105,10 @@ Parametric UMAP monitors loss during training using Keras. That loss will be pri

Parametric inverse_transform (reconstruction)
---------------------------------------------
To use a second neural network to learn an inverse mapping between data and embeddings, we simply need to pass `parametric_reconstruction= True` to the parametricUMAP.
To use a second neural network to learn an inverse mapping between data and embeddings, we simply need to pass `parametric_reconstruction= True` to the ParametricUMAP.


Like the encoder, a custom decoder can also be passed to parametricUMAP, e.g.
Like the encoder, a custom decoder can also be passed to ParametricUMAP, e.g.

.. code:: python3
@@ -134,12 +134,12 @@ In addition, validation data can be used to test reconstruction loss on out-of-d
validation_images = test_images.reshape((test_images.shape[0], -1))/255.
Finally, we can pass the validation data and the networks to parametricUMAP and train:
Finally, we can pass the validation data and the networks to ParametricUMAP and train:


.. code:: python3
embedder = parametricUMAP(
embedder = ParametricUMAP(
encoder=encoder,
decoder=decoder,
dims=dims,
@@ -154,12 +154,12 @@ Autoencoding UMAP
-----------------


In the example above, the encoder is trained to minimize UMAP loss, and the decoder is trained to minimize reconstruction loss. To train the encoder jointly on both UMAP loss and reconstruction loss, pass :python:`autoencoder_loss = True` into the parametricUMAP.
In the example above, the encoder is trained to minimize UMAP loss, and the decoder is trained to minimize reconstruction loss. To train the encoder jointly on both UMAP loss and reconstruction loss, pass :python:`autoencoder_loss = True` into the ParametricUMAP.


.. code:: python3
embedder = parametricUMAP(
embedder = ParametricUMAP(
encoder=encoder,
decoder=decoder,
dims=dims,
@@ -173,7 +173,7 @@ In the example above, the encoder is trained to minimize UMAP loss, and the deco
Early stopping and Keras callbacks
----------------------------------

It can sometimes be useful to train the embedder until some plateau in training loss is met. In deep learning, early stopping is one way to do this. Keras provides custom `callbacks <https://keras.io/api/callbacks/>`_ that allow you to implement checks during training, such as early stopping. We can use callbacks, such as early stopping, with parametricUMAP to stop training early based on a predefined training threshold, using the :python:`keras_fit_kwargs` argument:
It can sometimes be useful to train the embedder until some plateau in training loss is met. In deep learning, early stopping is one way to do this. Keras provides custom `callbacks <https://keras.io/api/callbacks/>`_ that allow you to implement checks during training, such as early stopping. We can use callbacks, such as early stopping, with ParametricUMAP to stop training early based on a predefined training threshold, using the :python:`keras_fit_kwargs` argument:

.. code:: python3
@@ -186,7 +186,7 @@ It can sometimes be useful to train the embedder until some plateau in training
)
]}
embedder = parametricUMAP(
embedder = ParametricUMAP(
verbose=True,
keras_fit_kwargs = keras_fit_kwargs,
n_training_epochs=20
@@ -199,9 +199,9 @@ We also passed in :python:`n_training_epochs = 20`, allowing early stopping to e
Additional important parameters
-------------------------------

* **batch_size:** parametricUMAP in trained over batches of edges randomly sampled from the UMAP graph, and then trained via gradient descent. parametricUMAP defaults to a batch size of 1000 edges, but can be adjusted to a value that fits better on your GPU or CPU.
* **batch_size:** ParametricUMAP in trained over batches of edges randomly sampled from the UMAP graph, and then trained via gradient descent. ParametricUMAP defaults to a batch size of 1000 edges, but can be adjusted to a value that fits better on your GPU or CPU.
* **loss_report_frequency:** If set to 1, an epoch in in the Keras embedding refers to a single iteration over the graph computed in UMAP. Setting :python:`loss_report_frequency` to 10, would split up that epoch into 10 seperate epochs, for more frequent reporting.
* **n_training_epochs:** The number of epochs over the UMAP graph to train for (irrespective of :python:`loss_report_frequency`). Training the network for multiple epochs will result in better embeddings, but take longer. This parameter is different than :python:`n_epochs` in the base UMAP class, which corresponds to the maximum number of times an edge is trained in a single parametricUMAP epoch.
* **n_training_epochs:** The number of epochs over the UMAP graph to train for (irrespective of :python:`loss_report_frequency`). Training the network for multiple epochs will result in better embeddings, but take longer. This parameter is different than :python:`n_epochs` in the base UMAP class, which corresponds to the maximum number of times an edge is trained in a single ParametricUMAP epoch.
* **optimizer:** The optimizer used to train the neural network. by default Adam (:python:`tf.keras.optimizers.Adam(1e-3)`) is used. You might be able to speed up or improve training by using a different optimizer.
* **parametric_embedding:** If set to false, a non-parametric embedding is learned, using the same code as the parametric embedding, which can serve as a direct comparison between parametric and non-parametric embedding using the same optimizer. The parametric embeddings are performed over the entire dataset simultaneously.
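
The interplay of ``batch_size``, ``loss_report_frequency`` and ``n_training_epochs`` described in the list above can be sketched with plain arithmetic. This is an illustration under stated assumptions, not the library's code: the function ``keras_schedule`` and the argument ``n_sampled_edges`` are hypothetical names, and the sketch assumes one pass over the graph is split evenly into ``loss_report_frequency`` Keras epochs.

```python
def keras_schedule(n_sampled_edges, batch_size=1000,
                   loss_report_frequency=1, n_training_epochs=1):
    """Return (keras_epochs, steps_per_keras_epoch) for the assumed schedule."""
    # Batches needed for one full pass over the sampled edges.
    steps_per_pass = n_sampled_edges // batch_size
    # Each pass is split into `loss_report_frequency` shorter Keras epochs.
    steps_per_keras_epoch = max(1, steps_per_pass // loss_report_frequency)
    # The number of passes is unchanged, so Keras sees proportionally more epochs.
    keras_epochs = n_training_epochs * loss_report_frequency
    return keras_epochs, steps_per_keras_epoch


print(keras_schedule(2_000_000))  # (1, 2000)
print(keras_schedule(2_000_000, loss_report_frequency=10, n_training_epochs=20))  # (200, 200)
```

Under these assumptions, raising ``loss_report_frequency`` changes how often loss is reported (and how often per-epoch callbacks fire) without changing the total number of batches trained.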

Original file line number Diff line number Diff line change
@@ -55,7 +55,7 @@
},
"outputs": [],
"source": [
"from umap.parametric_umap import parametricUMAP"
"from umap.parametric_umap import ParametricUMAP"
]
},
{
@@ -69,7 +69,7 @@
},
"outputs": [],
"source": [
"embedder = parametricUMAP(verbose=True)"
"embedder = ParametricUMAP(verbose=True)"
]
},
{
@@ -87,7 +87,7 @@
"name": "stdout",
"output_type": "stream",
"text": [
"parametricUMAP(optimizer=<tensorflow.python.keras.optimizer_v2.adam.Adam object at 0x7f8ffc2894a8>)\n",
"ParametricUMAP(optimizer=<tensorflow.python.keras.optimizer_v2.adam.Adam object at 0x7f8ffc2894a8>)\n",
"Construct fuzzy simplicial set\n",
"Tue Sep 1 15:49:42 2020 Finding Nearest Neighbors\n",
"Tue Sep 1 15:49:42 2020 Building RP forest with 17 trees\n",
@@ -278,7 +278,7 @@
},
"outputs": [],
"source": [
"from umap.parametric_umap import load_parametricUMAP"
"from umap.parametric_umap import load_ParametricUMAP"
]
},
{
@@ -297,7 +297,7 @@
"text": [
"Keras encoder model saved to /tmp/model/encoder\n",
"Keras full model saved to /tmp/model/parametric_model\n",
"Pickle of parametricUMAP model saved to /tmp/model/model.pkl\n"
"Pickle of ParametricUMAP model saved to /tmp/model/model.pkl\n"
]
}
],
@@ -340,7 +340,7 @@
}
],
"source": [
"embedder = load_parametricUMAP('/tmp/model')"
"embedder = load_ParametricUMAP('/tmp/model')"
]
},
{
Original file line number Diff line number Diff line change
@@ -118,7 +118,7 @@
},
"outputs": [],
"source": [
"from umap.parametric_umap import parametricUMAP"
"from umap.parametric_umap import ParametricUMAP"
]
},
{
@@ -132,7 +132,7 @@
},
"outputs": [],
"source": [
"embedder = parametricUMAP(encoder=encoder, dims=dims, n_training_epochs=5, verbose=True)"
"embedder = ParametricUMAP(encoder=encoder, dims=dims, n_training_epochs=5, verbose=True)"
]
},
{
@@ -149,7 +149,7 @@
"name": "stdout",
"output_type": "stream",
"text": [
"parametricUMAP(dims=(28, 28, 1),\n",
"ParametricUMAP(dims=(28, 28, 1),\n",
" encoder=<tensorflow.python.keras.engine.sequential.Sequential object at 0x7f9b902396a0>,\n",
" n_training_epochs=5,\n",
" optimizer=<tensorflow.python.keras.optimizer_v2.adam.Adam object at 0x7f9968e0ef60>)\n",
Original file line number Diff line number Diff line change
@@ -179,7 +179,7 @@
},
"outputs": [],
"source": [
"from umap.parametric_umap import parametricUMAP"
"from umap.parametric_umap import ParametricUMAP"
]
},
{
@@ -193,7 +193,7 @@
},
"outputs": [],
"source": [
"embedder = parametricUMAP(\n",
"embedder = ParametricUMAP(\n",
" encoder=encoder,\n",
" decoder=decoder,\n",
" dims=dims,\n",
@@ -219,7 +219,7 @@
"name": "stdout",
"output_type": "stream",
"text": [
"parametricUMAP(n_training_epochs=5,\n",
"ParametricUMAP(n_training_epochs=5,\n",
" optimizer=<tensorflow.python.keras.optimizer_v2.adam.Adam object at 0x7fdb3052e080>,\n",
" parametric_reconstruction=True,\n",
" reconstruction_validation=array([[0., 0., 0., ..., 0., 0., 0.],\n",
Original file line number Diff line number Diff line change
@@ -179,7 +179,7 @@
},
"outputs": [],
"source": [
"from umap.parametric_umap import parametricUMAP"
"from umap.parametric_umap import ParametricUMAP"
]
},
{
@@ -193,7 +193,7 @@
},
"outputs": [],
"source": [
"embedder = parametricUMAP(\n",
"embedder = ParametricUMAP(\n",
" encoder=encoder,\n",
" decoder=decoder,\n",
" dims=dims,\n",
@@ -219,7 +219,7 @@
"name": "stdout",
"output_type": "stream",
"text": [
"parametricUMAP(autoencoder_loss=True, n_training_epochs=5,\n",
"ParametricUMAP(autoencoder_loss=True, n_training_epochs=5,\n",
" optimizer=<tensorflow.python.keras.optimizer_v2.adam.Adam object at 0x7f92a05514a8>,\n",
" parametric_reconstruction=True,\n",
" reconstruction_validation=array([[0., 0., 0., ..., 0., 0., 0.],\n",
Original file line number Diff line number Diff line change
@@ -55,7 +55,7 @@
},
"outputs": [],
"source": [
"from umap.parametric_umap import parametricUMAP"
"from umap.parametric_umap import ParametricUMAP"
]
},
{
@@ -90,7 +90,7 @@
},
"outputs": [],
"source": [
"embedder = parametricUMAP(\n",
"embedder = ParametricUMAP(\n",
" verbose=True,\n",
" keras_fit_kwargs = keras_fit_kwargs,\n",
" n_training_epochs=20\n",
@@ -111,7 +111,7 @@
"name": "stdout",
"output_type": "stream",
"text": [
"parametricUMAP(keras_fit_kwargs={'callbacks': [<tensorflow.python.keras.callbacks.EarlyStopping object at 0x7fd30942f7b8>]},\n",
"ParametricUMAP(keras_fit_kwargs={'callbacks': [<tensorflow.python.keras.callbacks.EarlyStopping object at 0x7fd30942f7b8>]},\n",
" n_training_epochs=20,\n",
" optimizer=<tensorflow.python.keras.optimizer_v2.adam.Adam object at 0x7fd30942f668>)\n",
"Construct fuzzy simplicial set\n",
6 changes: 3 additions & 3 deletions notebooks/Parametric_UMAP/06.0-nonparametric-umap.ipynb
@@ -55,7 +55,7 @@
},
"outputs": [],
"source": [
"from umap.parametric_umap import parametricUMAP"
"from umap.parametric_umap import ParametricUMAP"
]
},
{
@@ -69,7 +69,7 @@
},
"outputs": [],
"source": [
"embedder = parametricUMAP(parametric_embedding=False, verbose=True)"
"embedder = ParametricUMAP(parametric_embedding=False, verbose=True)"
]
},
{
@@ -87,7 +87,7 @@
"name": "stdout",
"output_type": "stream",
"text": [
"parametricUMAP(optimizer=<tensorflow.python.keras.optimizer_v2.adam.Adam object at 0x7fa94d08b978>,\n",
"ParametricUMAP(optimizer=<tensorflow.python.keras.optimizer_v2.adam.Adam object at 0x7fa94d08b978>,\n",
" parametric_embedding=False)\n",
"Construct fuzzy simplicial set\n",
"Sun Aug 16 18:34:39 2020 Finding Nearest Neighbors\n",
4 changes: 3 additions & 1 deletion umap/__init__.py
@@ -1,4 +1,6 @@
from .umap_ import UMAP
from .parametric_umap import ParametricUMAP
from .aligned_umap import AlignedUMAP

# Workaround: https://github.com/numba/numba/issues/3341
import numba
@@ -8,4 +10,4 @@
try:
__version__ = pkg_resources.get_distribution("umap-learn").version
except pkg_resources.DistributionNotFound:
__version__ = "0.4-dev"
__version__ = "0.5-dev"
12 changes: 6 additions & 6 deletions umap/parametric_umap.py
@@ -39,7 +39,7 @@
raise ImportError("umap.parametric_umap requires Tensorflow >= 2.0") from None


class parametricUMAP(UMAP):
class ParametricUMAP(UMAP):
def __init__(
self,
optimizer=None,
@@ -147,7 +147,7 @@ def transform(self, X):
)
else:
warn(
"Embedding new data is not supported by parametricUMAP. \
"Embedding new data is not supported by ParametricUMAP. \
Using original embedder."
)
return super().transform(X)
@@ -407,7 +407,7 @@ def save(self, save_location, verbose=True):
with open(model_output, "wb") as output:
pickle.dump(self, output, pickle.HIGHEST_PROTOCOL)
if verbose:
print("Pickle of parametricUMAP model saved to {}".format(model_output))
print("Pickle of ParametricUMAP model saved to {}".format(model_output))


def get_graph_elements(graph_, n_epochs):
@@ -873,7 +873,7 @@ def should_pickle(key, val):
return True


def load_parametricUMAP(save_location, verbose=True):
def load_ParametricUMAP(save_location, verbose=True):
"""
Load a parametric UMAP model consisting of a umap-learn UMAP object
and corresponding keras models.
@@ -887,11 +887,11 @@ def load_parametricUMAP(save_location, verbose=True):
Returns
-------
parametric_umap.parametricUMAP
parametric_umap.ParametricUMAP
Parametric UMAP objects
"""

## Loads a parametricUMAP model and its related keras models
## Loads a ParametricUMAP model and its related keras models

model_output = os.path.join(save_location, "model.pkl")
model = pickle.load((open(model_output, "rb")))
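
Because the rename changes public names, code written against ``parametricUMAP`` or ``load_parametricUMAP`` stops importing after this commit. The diff adds no aliases for the old names. A downstream project that needs a transition period could bridge the gap along these lines; the sketch uses a stand-in class so it runs without umap or TensorFlow, and the shim ``parametricUMAP`` shown here is hypothetical.

```python
import warnings


class ParametricUMAP:
    """Stand-in for the renamed class; the real one lives in umap.parametric_umap."""

    def __init__(self, **kwargs):
        self.kwargs = kwargs


def parametricUMAP(**kwargs):
    """Hypothetical shim: accept the old lowercase name, warn, and delegate."""
    warnings.warn(
        "parametricUMAP was renamed to ParametricUMAP",
        DeprecationWarning,
        stacklevel=2,
    )
    return ParametricUMAP(**kwargs)


# Calling the old name still works but emits a DeprecationWarning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    embedder = parametricUMAP(verbose=True)

print(type(embedder).__name__, [str(w.message) for w in caught])
```

The same pattern would apply to ``load_parametricUMAP`` delegating to ``load_ParametricUMAP``.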