Commit

Fix code blocks in recent docs
lmcinnes committed Sep 17, 2019
1 parent e5cbac0 commit ef4b718
Showing 3 changed files with 32 additions and 31 deletions.
doc/inverse_transform.rst (26 changes: 13 additions & 13 deletions)
@@ -6,7 +6,7 @@
UMAP has some support for inverse transforms -- generating a high
dimensional data sample given a location in the low dimensional
embedding space. To start let's load all the relevant libraries.

-.. code:: ipython3
+.. code:: python3
import numpy as np
import matplotlib.pyplot as plt
@@ -23,7 +23,7 @@
dimension of this dataset to something small, and then see if we can
generate new digits by sampling points from the embedding space. To load
the MNIST dataset we'll make use of sklearn's ``fetch_openml`` function.

-.. code:: ipython3
+.. code:: python3
data, labels = sklearn.datasets.fetch_openml('mnist_784', version=1, return_X_y=True)
@@ -33,15 +33,15 @@
This is straightforward with umap, but in this case rather than using
``fit_transform`` we'll use the ``fit`` method so that we can retain the
trained model for later generating new digits based on samples from the
embedding space.

-.. code:: ipython3
+.. code:: python3
mapper = umap.UMAP(random_state=42).fit(data)
To ensure that things worked correctly we can plot the data (since we
reduced it to two dimensions). We'll use the ``umap.plot`` functionality
to do this.

-.. code:: ipython3
+.. code:: python3
umap.plot.points(mapper, labels=labels)
@@ -58,7 +58,7 @@
four corner points. To make our selection interesting we'll carefully
choose the corners to span over the dataset, and sample different digits
so that we can better see the transitions.

-.. code:: ipython3
+.. code:: python3
corners = np.array([
[-5, -10], # 1
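The remainder of this cell is collapsed in the diff. A minimal sketch of how a
grid of test points can be bilinearly interpolated between four such corners --
the last three corner values below are illustrative placeholders, not the
notebook's actual coordinates:

.. code:: python3

    # Sketch only: corners 2-4 are placeholder values, not the notebook's.
    corners = np.array([
        [-5, -10],  # 1 (shown in the diff above)
        [-7, 6],    # placeholder
        [2, -8],    # placeholder
        [12, 4],    # placeholder
    ])

    # Bilinearly interpolate a 10x10 grid of test points between the corners.
    test_pts = np.array([
        (corners[0] * (1 - x) + corners[1] * x) * (1 - y)
        + (corners[2] * (1 - x) + corners[3] * x) * y
        for y in np.linspace(0, 1, 10)
        for x in np.linspace(0, 1, 10)
    ])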
@@ -84,7 +84,7 @@
trained model and passing it the set of test points that we want to
convert into high dimensional representations. Be warned that this can
be quite expensive computationally.

-.. code:: ipython3
+.. code:: python3
inv_transformed_points = mapper.inverse_transform(test_pts)
@@ -99,7 +99,7 @@
and finally a grid of the images we generated (converting the inverse
transformed vectors into images is just a matter of reshaping them back
to 28 by 28 pixel grids and using ``imshow``).

-.. code:: ipython3
+.. code:: python3
# Set up the grid
fig = plt.figure(figsize=(12,6))
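The rest of this plotting cell is collapsed. A simplified sketch of the
reshaping step described above -- it draws only a 10x10 grid of the generated
digits, not the embedding panel, and assumes there are 100 test points:

.. code:: python3

    # Simplified sketch: show only the generated images, each vector
    # reshaped back to a 28x28 pixel grid. Assumes 100 test points.
    fig, axes = plt.subplots(10, 10, figsize=(6, 6))
    for i, ax in enumerate(axes.flat):
        ax.imshow(inv_transformed_points[i].reshape(28, 28), cmap='gray_r')
        ax.set(xticks=[], yticks=[])
    plt.show()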
@@ -151,19 +151,19 @@
the bounds about the embedding you will likely get strange results
Let's continue the demonstration by looking at the Fashion MNIST
dataset. As before we can load this through sklearn.

-.. code:: ipython3
+.. code:: python3
data, labels = sklearn.datasets.fetch_openml('Fashion-MNIST', version=1, return_X_y=True)
Again we can fit this data with UMAP and get a mapper object.

-.. code:: ipython3
+.. code:: python3
mapper = umap.UMAP(random_state=42).fit(data)
Let's plot the embedding to see what we got as a result:

-.. code:: ipython3
+.. code:: python3
umap.plot.points(mapper, labels=labels)
@@ -178,7 +178,7 @@
between four corners. As before we'll select the corners so that we can
stay within the convex hull of the embedding points and ensure nothing
too strange happens with the inverse transforms.

-.. code:: ipython3
+.. code:: python3
corners = np.array([
[-2, -6], # bags
@@ -198,14 +198,14 @@
Now we simply apply the inverse transform just as before. Again, be
warned, this is quite expensive computationally and may take some time
to complete.

-.. code:: ipython3
+.. code:: python3
inv_transformed_points = mapper.inverse_transform(test_pts)
And now we can use similar code as above to set up our plot of the
embedding with test points overlaid, and the generated images.

-.. code:: ipython3
+.. code:: python3
# Set up the grid
fig = plt.figure(figsize=(12,6))
doc/sparse.rst (36 changes: 18 additions & 18 deletions)
@@ -22,7 +22,7 @@
and we'll use sklearn for that (specifically
``sklearn.feature_extraction.text``). Beyond that we'll need umap, and
plotting tools.

-.. code:: ipython3
+.. code:: python3
import numpy as np
import scipy.sparse
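The rest of the import cell is collapsed in the diff. A plausible sketch of the
full set of libraries this document goes on to use -- the exact cell contents
are an assumption based on later usage of sympy, sklearn, matplotlib, and
``umap.plot``:

.. code:: python3

    # Sketch of the full import cell; the collapsed contents are assumed
    # from what the rest of the document uses.
    import numpy as np
    import scipy.sparse
    import sympy
    import sklearn.datasets
    import sklearn.feature_extraction.text
    import matplotlib.pyplot as plt
    import umap
    import umap.plot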
@@ -56,7 +56,7 @@
single call to ``primerange``. We'll also need a dictionary mapping the
different primes to the column number they correspond to in our data
structure; effectively we'll just be enumerating the primes.

-.. code:: ipython3
+.. code:: python3
primes = list(sympy.primerange(2, 110000))
prime_to_column = {p:i for i, p in enumerate(primes)}
@@ -87,7 +87,7 @@
to insert into a matrix. Since we are only concerned with divisibility
this will simply be a one in every non-zero entry, so we can just add a
list of ones of the appropriate length for each row.

-.. code:: ipython3
+.. code:: python3
%%time
lil_matrix_rows = []
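The body of this cell is collapsed in the diff. A sketch of how the row and
data lists could be built from prime factorizations using sympy's
``primefactors`` -- the notebook's exact loop may differ in detail:

.. code:: python3

    # Sketch: build LIL-style row index and data lists from the prime
    # factorizations of the integers 1..100,000.
    lil_matrix_rows = []
    lil_matrix_data = []
    for n in range(1, 100001):
        prime_factors = sympy.primefactors(n)
        lil_matrix_rows.append([prime_to_column[p] for p in prime_factors])
        lil_matrix_data.append([1] * len(prime_factors))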
@@ -115,7 +115,7 @@
to be the corresponding structure of values (all ones). The result is a
sparse matrix data structure which can then be easily manipulated and
converted into other sparse matrix formats.

-.. code:: ipython3
+.. code:: python3
factor_matrix = scipy.sparse.lil_matrix((len(lil_matrix_rows), len(primes)), dtype=np.float32)
factor_matrix.rows = np.array(lil_matrix_rows)
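The cell is truncated here; presumably the ``data`` attribute is set in the
same way as ``rows``. A sketch of the remaining step, with a purely
illustrative conversion to CSR format:

.. code:: python3

    # Sketch of the remaining assignment: the data attribute mirrors the
    # rows attribute, holding a list of ones for each row.
    # (On newer NumPy versions these ragged arrays may need dtype=object.)
    factor_matrix.data = np.array(lil_matrix_data)

    # Illustration only: a LIL matrix converts cheaply to other formats.
    factor_matrix_csr = factor_matrix.tocsr()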
@@ -144,7 +144,7 @@
straightforward -- we just hand it directly to the fit method. Just like
other sklearn estimators that can handle sparse input UMAP will detect
the sparse matrix and just do the right thing.

-.. code:: ipython3
+.. code:: python3
%%time
mapper = umap.UMAP(metric='cosine', random_state=42, low_memory=True).fit(factor_matrix)
@@ -158,7 +158,7 @@
That was easy! But is it really working? We can easily plot the results:

-.. code:: ipython3
+.. code:: python3
umap.plot.points(mapper, values=np.arange(100000), theme='viridis')
@@ -177,7 +177,7 @@
that we'll need some more data. Fortunately there are more integers.
We'll grab the next 10,000 and put them in a sparse matrix, much as we
did for the first 100,000.

-.. code:: ipython3
+.. code:: python3
%%time
lil_matrix_rows = []
@@ -194,7 +194,7 @@
Wall time: 222 ms
-.. code:: ipython3
+.. code:: python3
new_data = scipy.sparse.lil_matrix((len(lil_matrix_rows), len(primes)), dtype=np.float32)
new_data.rows = np.array(lil_matrix_rows)
@@ -215,15 +215,15 @@
To map the new data we generated we can simply hand it to the
``transform`` method of our trained model. This is a little slow, but it
does work.

-.. code:: ipython3
+.. code:: python3
new_data_embedding = mapper.transform(new_data)
And we can plot the results. Since we just got the locations of the
points this time (rather than a model) we'll have to resort to
matplotlib for plotting.

-.. code:: ipython3
+.. code:: python3
fig = plt.figure(figsize=(12,12))
ax = fig.add_subplot(111)
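The rest of this plotting cell is collapsed. A minimal sketch of overlaying the
newly transformed points on the original embedding -- the colours and point
sizes here are arbitrary choices, not the notebook's:

.. code:: python3

    # Sketch: original embedding in grey, newly transformed points in red.
    fig = plt.figure(figsize=(12, 12))
    ax = fig.add_subplot(111)
    ax.scatter(mapper.embedding_[:, 0], mapper.embedding_[:, 1],
               s=0.1, c='lightgrey')
    ax.scatter(new_data_embedding[:, 0], new_data_embedding[:, 1],
               s=1.0, c='red')
    plt.show()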
@@ -274,7 +274,7 @@
easily fetch the data, and, in fact, we can fetch a pre-vectorized
version to save us the trouble of running ``CountVectorizer`` ourselves.
We'll grab both the training set, and the test set for later use.

-.. code:: ipython3
+.. code:: python3
news_train = sklearn.datasets.fetch_20newsgroups_vectorized(subset='train')
news_test = sklearn.datasets.fetch_20newsgroups_vectorized(subset='test')
@@ -283,7 +283,7 @@
If we look at the actual data we have pulled back, we'll see that
sklearn has run a ``CountVectorizer`` and produced the data in sparse
matrix format.

-.. code:: ipython3
+.. code:: python3
news_train.data
@@ -325,7 +325,7 @@
their associated columns up-weighted. We can apply this transformation
to both the train and test sets (using the same transformer trained on
the training set).

-.. code:: ipython3
+.. code:: python3
tfidf = sklearn.feature_extraction.text.TfidfTransformer(norm='l1').fit(news_train.data)
train_data = tfidf.transform(news_train.data)
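The cell is cut off here; since the prose says the same transformer is applied
to the test set, the collapsed line is presumably just the matching transform
call:

.. code:: python3

    # Presumably the collapsed line: apply the fitted transformer to the
    # test set so train and test share the same TF-IDF weighting.
    test_data = tfidf.transform(news_test.data)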
@@ -334,7 +334,7 @@
The result is still a sparse matrix, since TF-IDF doesn't change the
The result is still a sparse matrix, since TF-IDF doesn't change the
zero elements at all, nor the number of features.

-.. code:: ipython3
+.. code:: python3
train_data
@@ -355,7 +355,7 @@
need to use other techniques to reduce the data to be able to be
represented as a dense ``numpy`` array; we can work directly on the
130,000 dimensional sparse matrix.

-.. code:: ipython3
+.. code:: python3
%%time
mapper = umap.UMAP(metric='hellinger', random_state=42).fit(train_data)
@@ -370,7 +370,7 @@
Now we can plot the results, with labels according to the target
Now we can plot the results, with labels according to the target
variable of the data -- which newsgroup the posting was drawn from.

-.. code:: ipython3
+.. code:: python3
umap.plot.points(mapper, labels=news_train.target)
@@ -387,14 +387,14 @@
many of the different newsgroups.
We can now attempt to add the test data to the same space using the
``transform`` method.

-.. code:: ipython3
+.. code:: python3
test_embedding = mapper.transform(test_data)
While this is somewhat expensive computationally, it does work, and we
can plot the end result:

-.. code:: ipython3
+.. code:: python3
fig = plt.figure(figsize=(12,12))
ax = fig.add_subplot(111)
umap/umap_.py (1 change: 1 addition & 0 deletions)
@@ -273,6 +273,7 @@ def nearest_neighbors(
random_state=random_state,
n_trees=n_trees,
n_iters=n_iters,
+n_jobs=-1,
max_candidates=60,
low_memory=low_memory,
verbose=verbose,
