Fix bokeh plot; fix code block highlighting
lmcinnes committed Jun 24, 2018
1 parent 5198919 commit aaefee0
Showing 6 changed files with 119 additions and 71 deletions.
48 changes: 24 additions & 24 deletions doc/basic_usage.rst
@@ -17,7 +17,7 @@ well as the ``train_test_split`` function to divide up data. Finally
we'll need some plotting tools (matplotlib and seaborn) to help us
visualise the results of UMAP, and pandas to make that a little easier.

-.. code:: ipython3
+.. code:: python3
import numpy as np
from sklearn.datasets import load_iris, load_digits
@@ -27,7 +27,7 @@ visualise the results of UMAP, and pandas to make that a little easier.
import pandas as pd
%matplotlib inline
-.. code:: ipython3
+.. code:: python3
sns.set(style='white', context='notebook', rc={'figure.figsize':(14,10)})
@@ -42,7 +42,7 @@ small both in number of points and number of features, and will let us
get an idea of what the dimension reduction is doing. We can load the
iris dataset from sklearn.

-.. code:: ipython3
+.. code:: python3
iris = load_iris()
print(iris.DESCR)
@@ -124,7 +124,7 @@ can just do a pairwise feature scatterplot matrix to get an idea of
what is going on. Seaborn makes this easy (once we get the data into a
pandas dataframe).

-.. code:: ipython3
+.. code:: python3
iris_df = pd.DataFrame(iris.data, columns=iris.feature_names)
iris_df['species'] = pd.Series(iris.target).map(dict(zip(range(3),iris.target_names)))
@@ -148,11 +148,11 @@ To use UMAP for this task we need to first construct a UMAP object that
will do the job for us. That is as simple as instantiating the class. So
let's import the umap library and do that.

-.. code:: ipython3
+.. code:: python3
import umap
-.. code:: ipython3
+.. code:: python3
reducer = umap.UMAP()
@@ -163,7 +163,7 @@ day, we are going to want the reduced representation of the data. We will
use, instead, the ``fit_transform`` method which first calls ``fit`` and
then returns the transformed data as a numpy array.

-.. code:: ipython3
+.. code:: python3
embedding = reducer.fit_transform(iris.data)
embedding.shape
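The ``fit``/``fit_transform`` relationship described above is the standard sklearn estimator pattern that UMAP follows. A minimal sketch of that pattern with a toy PCA-style reducer (pure numpy; ``TinyReducer`` is a hypothetical stand-in for illustration, not part of the umap library):

```python
import numpy as np

class TinyReducer:
    """Hypothetical PCA-style reducer sketching the sklearn fit/fit_transform pattern."""

    def __init__(self, n_components=2):
        self.n_components = n_components

    def fit(self, X):
        # Learn the projection and store the embedding of the training data,
        # mirroring UMAP's embedding_ attribute.
        X = np.asarray(X, dtype=float)
        self.mean_ = X.mean(axis=0)
        _, _, vt = np.linalg.svd(X - self.mean_, full_matrices=False)
        self.components_ = vt[:self.n_components]
        self.embedding_ = (X - self.mean_) @ self.components_.T
        return self

    def transform(self, X):
        return (np.asarray(X, dtype=float) - self.mean_) @ self.components_.T

    def fit_transform(self, X):
        # fit_transform is simply fit followed by returning the transformed data.
        return self.fit(X).embedding_
```

Calling ``fit_transform`` on an ``(n, 4)`` array yields an ``(n, 2)`` array, matching the shape reported for the iris embedding.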
@@ -183,7 +183,7 @@ representation of the corresponding flower. Thus we can plot the
(since it applies to the transformed data which is in the same order as
the original).

-.. code:: ipython3
+.. code:: python3
plt.scatter(embedding[:, 0], embedding[:, 1], c=[sns.color_palette()[x] for x in iris.target])
plt.gca().set_aspect('equal', 'datalim')
@@ -209,7 +209,7 @@ Digits data

First we will load the dataset from sklearn.

-.. code:: ipython3
+.. code:: python3
digits = load_digits()
print(digits.DESCR)
@@ -269,7 +269,7 @@ We can plot a number of the images to get an idea of what we are looking
at. This just involves matplotlib building a grid of axes and then
looping through them plotting an image into each one in turn.

-.. code:: ipython3
+.. code:: python3
fig, ax_array = plt.subplots(20, 20)
axes = ax_array.flatten()
@@ -302,7 +302,7 @@ scatterplot matrix -- in this case just of the first 10 dimensions so
that it is at least plottable, but as you can quickly see that approach
is not going to be sufficient for this data.

-.. code:: ipython3
+.. code:: python3
digits_df = pd.DataFrame(digits.data[:,:10])
digits_df['digit'] = pd.Series(digits.target).map(lambda x: 'Digit {}'.format(x))
@@ -319,7 +319,7 @@ data. To demonstrate more of UMAP we'll go about it differently this
time and simply use the ``fit`` method rather than the ``fit_transform``
approach we used for Iris.

-.. code:: ipython3
+.. code:: python3
reducer = umap.UMAP(random_state=42)
reducer.fit(digits.data)
@@ -342,7 +342,7 @@ object, now having trained on the dataset we passed it. To access the
resulting transform we can either look at the ``embedding_`` attribute
of the reducer object, or call transform on the original data.

-.. code:: ipython3
+.. code:: python3
embedding = reducer.transform(digits.data)
# Verify that the result of calling transform is
@@ -364,7 +364,7 @@ sample), but only 2 columns. As with the Iris example we can now plot
the resulting embedding, coloring the data points by the class that
they belong to (i.e. the digit they represent).

-.. code:: ipython3
+.. code:: python3
plt.scatter(embedding[:, 0], embedding[:, 1], c=digits.target, cmap='Spectral', s=5)
plt.gca().set_aspect('equal', 'datalim')
@@ -389,13 +389,15 @@ tooltips of the images.

First we'll need to encode all the images for inclusion in a dataframe.

-.. code:: ipython3
+.. code:: python3
from io import BytesIO
from PIL import Image
import base64
-.. code:: ipython3
+.. code:: python3
def embeddable_image(data):
img_data = 255 - 15 * data.astype(np.uint8)
@@ -405,10 +407,12 @@ First we'll need to encode all the images for inclusion in a dataframe.
for_encoding = buffer.getvalue()
return 'data:image/png;base64,' + base64.b64encode(for_encoding).decode()
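Stripped of the PIL image handling, the data-URI step itself is plain stdlib: base64-encode the PNG bytes and prefix the ``data:`` header so any ``<img>`` tag (or a bokeh tooltip) can display the image inline. A minimal sketch, using the eight-byte PNG signature as a stand-in payload:

```python
import base64

def to_data_uri(png_bytes):
    # Base64-encode raw PNG bytes and prefix the data-URI header, as in the
    # embeddable_image helper sketched in the diff above.
    return 'data:image/png;base64,' + base64.b64encode(png_bytes).decode()

uri = to_data_uri(b'\x89PNG\r\n\x1a\n')  # PNG signature bytes as a stand-in image
```

Decoding the part after the comma recovers the original bytes, which is why the browser can render the tooltip image without any extra file on disk.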
Next we need to load up bokeh and the various tools from it that will be
needed to generate a suitable interactive plot.

-.. code:: ipython3
+.. code:: python3
from bokeh.plotting import figure, show, output_notebook
from bokeh.models import HoverTool, ColumnDataSource, CategoricalColorMapper
@@ -434,7 +438,7 @@ embeds the image of the digit in question in it, along with the digit
class that the digit is actually from (this can be useful for digits
that are hard even for humans to classify correctly).

-.. code:: ipython3
+.. code:: python3
digits_df = pd.DataFrame(embedding, columns=('x', 'y'))
digits_df['digit'] = [str(x) for x in digits.target]
@@ -475,12 +479,8 @@ that are hard even for humans to classify correctly).
show(plot_figure)
-.. bokeh-plot:: bokeh_digits_plot.py
-   :source-position: 'none'



+.. raw:: html
+   :file: basic_usage_bokeh_example.html

As can be seen, the nines that blend between the ones and the sevens are
odd looking nines (that aren't very rounded) and do, indeed, interpolate
48 changes: 48 additions & 0 deletions doc/basic_usage_bokeh_example.html

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion doc/conf.py
@@ -35,7 +35,7 @@
'sphinx.ext.intersphinx',
'sphinx.ext.mathjax',
'sphinx.ext.viewcode',
-    'bokeh.sphinxext.bokeh_plot',
+    # 'bokeh.sphinxext.bokeh_plot',
'sphinx_gallery.gen_gallery', ]

# Add any paths that contain templates here, relative to this directory.
30 changes: 15 additions & 15 deletions doc/parameters.rst
@@ -16,7 +16,7 @@ for basic array manipulation. Since we will be visualising the results
we will need ``matplotlib`` and ``seaborn``. Finally we will need
``umap`` for doing the dimension reduction itself.

-.. code:: ipython3
+.. code:: python3
import numpy as np
import matplotlib.pyplot as plt
@@ -25,7 +25,7 @@ we will need ``matplotlib`` and ``seaborn``. Finally we will need
import umap
%matplotlib inline
-.. code:: ipython3
+.. code:: python3
sns.set(style='white', context='poster', rc={'figure.figsize':(14,10)})
@@ -38,7 +38,7 @@ each point can be colored according to its 4-dimensional value. For this we
can use ``numpy``. We will fix a random seed for the sake of
consistency.

-.. code:: ipython3
+.. code:: python3
np.random.seed(42)
data = np.random.rand(800, 4)
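Each row of ``data`` doubles as an RGBA colour, which is what lets matplotlib paint every point with its own 4-dimensional value later on. A quick numpy sanity check (repeats the two lines above and confirms the rows are valid colours):

```python
import numpy as np

np.random.seed(42)
data = np.random.rand(800, 4)

# Rows are (R, G, B, A) tuples: matplotlib's scatter accepts an (n, 4)
# float array with values in [0, 1] as per-point RGBA colours.
n_rows, n_channels = data.shape
in_unit_interval = bool((data >= 0.0).all() and (data <= 1.0).all())
```

This is why ``plt.scatter(u[:, 0], u[:, 1], c=data)`` below needs no colormap at all.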
@@ -47,7 +47,7 @@ Now we need to find a low dimensional representation of the data. As in
the Basic Usage documentation, we can do this by using the
:meth:`~umap.umap_.UMAP.fit_transform` method on a :class:`~umap.umap_.UMAP` object.

-.. code:: ipython3
+.. code:: python3
fit = umap.UMAP()
%time u = fit.fit_transform(data)
@@ -64,7 +64,7 @@ We can visualise the result by using ``matplotlib`` to draw a scatter
plot of ``u``. We can color each point of the scatter plot by the
associated 4-dimensional color from the source data.

-.. code:: ipython3
+.. code:: python3
plt.scatter(u[:,0], u[:,1], c=data)
plt.title('UMAP embedding of random colours');
@@ -93,7 +93,7 @@ in turn. To make exploration simpler we will first write a short utility
function that can fit the data with UMAP given a set of parameter
choices, and plot the result.

-.. code:: ipython3
+.. code:: python3
def draw_umap(n_neighbors=15, min_dist=0.1, n_components=2, metric='euclidean', title=''):
fit = umap.UMAP(
@@ -133,7 +133,7 @@ range of ``n_neighbors`` values. The default value of ``n_neighbors``
for UMAP (as used above) is 15, but we will look at values ranging from
2 (a very local view of the manifold) up to 200 (a quarter of the data).

-.. code:: ipython3
+.. code:: python3
for n in (2, 5, 10, 20, 50, 100, 200):
draw_umap(n_neighbors=n, title='n_neighbors = {}'.format(n))
@@ -207,7 +207,7 @@ instead.
The default value for ``min_dist`` (as used above) is 0.1. We will look
at a range of values from 0.0 through to 0.99.

-.. code:: ipython3
+.. code:: python3
for d in (0.0, 0.1, 0.25, 0.5, 0.8, 0.99):
draw_umap(min_dist=d, title='min_dist = {}'.format(d))
@@ -262,7 +262,7 @@ the data in a line. For visualisation purposes we will randomly
distribute the data on the y-axis to provide some separation between
points.

-.. code:: ipython3
+.. code:: python3
draw_umap(n_components=1, title='n_components = 1')
@@ -273,7 +273,7 @@ Now we will try ``n_components=3``. For visualisation we will make use
Now we will try ``n_components=3``. For visualisation we will make use
of ``matplotlib``'s basic 3-dimensional plotting.

-.. code:: ipython3
+.. code:: python3
draw_umap(n_components=3, title='n_components = 3')
@@ -328,14 +328,14 @@ metrics as long as those metrics can be compiled in ``nopython`` mode by
numba. For this notebook we will be looking at such custom metrics. To
define such metrics we'll need numba ...

-.. code:: ipython3
+.. code:: python3
import numba
For our first custom metric we'll define the distance to be the absolute
value of difference in the red channel.
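A plain-Python version of that metric (a sketch: the doc's version is wrapped in ``@numba.njit()`` so UMAP can compile it, but the decorator is dropped here so the snippet runs without numba):

```python
def red_channel_dist(a, b):
    # Distance between two RGB(A) tuples: the absolute difference
    # of the red channel only; all other channels are ignored.
    return abs(a[0] - b[0])
```

Under this metric two colours with identical red components are at distance zero no matter how different their green and blue channels are, which is exactly the behaviour visible in the embedding later.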

-.. code:: ipython3
+.. code:: python3
@numba.njit()
def red_channel_dist(a,b):
@@ -345,7 +345,7 @@ To get more adventurous it will be useful to have some colorspace
conversion -- to keep things simple we'll just use HSL formulas to
extract the hue, saturation, and lightness from an (R,G,B) tuple.
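Pure-Python stand-ins for those helpers, built from the standard RGB-to-HSL formulas (a hedged sketch: the doc's versions are numba-jitted and their bodies are truncated in this diff, so these are reconstructions from the textbook conversion, not the file's exact code):

```python
def hue(r, g, b):
    # Hue in degrees for channels in [0, 1] (standard RGB -> HSL formula).
    cmax, cmin = max(r, g, b), min(r, g, b)
    d = cmax - cmin
    if d == 0.0:
        return 0.0  # achromatic: hue is undefined, use 0 by convention
    if cmax == r:
        h = ((g - b) / d) % 6.0
    elif cmax == g:
        h = (b - r) / d + 2.0
    else:
        h = (r - g) / d + 4.0
    return 60.0 * h

def lightness(r, g, b):
    return (max(r, g, b) + min(r, g, b)) / 2.0

def saturation(r, g, b):
    l = lightness(r, g, b)
    if l == 0.0 or l == 1.0:
        return 0.0
    return (max(r, g, b) - min(r, g, b)) / (1.0 - abs(2.0 * l - 1.0))
```

Pure red maps to hue 0, green to 120, blue to 240, which gives a quick correctness check.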

-.. code:: ipython3
+.. code:: python3
@numba.njit()
def hue(r, g, b):
@@ -381,7 +381,7 @@ measures the difference in hue, the second measures the euclidean
distance in a combined saturation and lightness space, while the third
measures distance in the full HSL space.

-.. code:: ipython3
+.. code:: python3
@numba.njit()
def hue_dist(a, b):
@@ -415,7 +415,7 @@ that ``numba`` provides significant flexibility in what we can do in
defining distance functions. Despite this we retain the high performance
we expect from UMAP even using such custom functions.

-.. code:: ipython3
+.. code:: python3
for m in ("euclidean", red_channel_dist, sl_dist, hue_dist, hsl_dist):
name = m if type(m) is str else m.__name__
