Commit

Fix code blocks in recent docs
lmcinnes committed Sep 17, 2019
1 parent e5cbac0 commit ef4b718
Showing 3 changed files with 32 additions and 31 deletions.
doc/inverse_transform.rst (26 changes: 13 additions & 13 deletions)
@@ -6,7 +6,7 @@
UMAP has some support for inverse transforms -- generating a high
dimensional data sample given a location in the low dimensional
embedding space. To start let's load all the relevant libraries.

-.. code:: ipython3
+.. code:: python3
import numpy as np
import matplotlib.pyplot as plt
@@ -23,7 +23,7 @@
dimension of this dataset to something small, and then see if we can
generate new digits by sampling points from the embedding space. To load
the MNIST dataset we'll make use of sklearn's ``fetch_openml`` function.

-.. code:: ipython3
+.. code:: python3
data, labels = sklearn.datasets.fetch_openml('mnist_784', version=1, return_X_y=True)
@@ -33,15 +33,15 @@
This is straightforward with umap, but in this case rather than using
``fit_transform`` we'll use the ``fit`` method so that we can retain the
trained model for later generating new digits based on samples from the
embedding space.

-.. code:: ipython3
+.. code:: python3
mapper = umap.UMAP(random_state=42).fit(data)
To ensure that things worked correctly we can plot the data (since we
reduced it to two dimensions). We'll use the ``umap.plot`` functionality
to do this.

-.. code:: ipython3
+.. code:: python3
umap.plot.points(mapper, labels=labels)
@@ -58,7 +58,7 @@
four corner points. To make our selection interesting we'll carefully
choose the corners to span over the dataset, and sample different digits
so that we can better see the transitions.

-.. code:: ipython3
+.. code:: python3
corners = np.array([
[-5, -10], # 1
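The remainder of this cell is collapsed in the diff. A minimal sketch of how a
grid of test points can be bilinearly interpolated between four such corners --
the last three corner values below are illustrative placeholders, not the
notebook's actual coordinates:

.. code:: python3

    # Sketch only: corners 2-4 are placeholder values, not the notebook's.
    corners = np.array([
        [-5, -10],  # 1 (shown in the diff above)
        [-7, 6],    # placeholder
        [2, -8],    # placeholder
        [12, 4],    # placeholder
    ])

    # Bilinearly interpolate a 10x10 grid of test points between the corners.
    test_pts = np.array([
        (corners[0] * (1 - x) + corners[1] * x) * (1 - y)
        + (corners[2] * (1 - x) + corners[3] * x) * y
        for y in np.linspace(0, 1, 10)
        for x in np.linspace(0, 1, 10)
    ])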
@@ -84,7 +84,7 @@
trained model and passing it the set of test points that we want to
convert into high dimensional representations. Be warned that this can
be quite expensive computationally.

-.. code:: ipython3
+.. code:: python3
inv_transformed_points = mapper.inverse_transform(test_pts)
@@ -99,7 +99,7 @@
and finally a grid of the images we generated (converting the inverse
transformed vectors into images is just a matter of reshaping them back
to 28 by 28 pixel grids and using ``imshow``).

-.. code:: ipython3
+.. code:: python3
# Set up the grid
fig = plt.figure(figsize=(12,6))
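The rest of this plotting cell is collapsed. A simplified sketch of the
reshaping step described above -- it draws only a 10x10 grid of the generated
digits, not the embedding panel, and assumes there are 100 test points:

.. code:: python3

    # Simplified sketch: show only the generated images, each vector
    # reshaped back to a 28x28 pixel grid. Assumes 100 test points.
    fig, axes = plt.subplots(10, 10, figsize=(6, 6))
    for i, ax in enumerate(axes.flat):
        ax.imshow(inv_transformed_points[i].reshape(28, 28), cmap='gray_r')
        ax.set(xticks=[], yticks=[])
    plt.show()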
@@ -151,19 +151,19 @@
the bounds about the embedding you will likely get strange results
Let's continue the demonstration by looking at the Fashion MNIST
dataset. As before we can load this through sklearn.

-.. code:: ipython3
+.. code:: python3
data, labels = sklearn.datasets.fetch_openml('Fashion-MNIST', version=1, return_X_y=True)
Again we can fit this data with UMAP and get a mapper object.

-.. code:: ipython3
+.. code:: python3
mapper = umap.UMAP(random_state=42).fit(data)
Let's plot the embedding to see what we got as a result:

-.. code:: ipython3
+.. code:: python3
umap.plot.points(mapper, labels=labels)
@@ -178,7 +178,7 @@
between four corners. As before we'll select the corners so that we can
stay within the convex hull of the embedding points and ensure nothing
too strange happens with the inverse transforms.

-.. code:: ipython3
+.. code:: python3
corners = np.array([
[-2, -6], # bags
@@ -198,14 +198,14 @@
Now we simply apply the inverse transform just as before. Again, be
warned, this is quite expensive computationally and may take some time
to complete.

-.. code:: ipython3
+.. code:: python3
inv_transformed_points = mapper.inverse_transform(test_pts)
And now we can use similar code as above to set up our plot of the
embedding with test points overlaid, and the generated images.

-.. code:: ipython3
+.. code:: python3
# Set up the grid
fig = plt.figure(figsize=(12,6))
doc/sparse.rst (36 changes: 18 additions & 18 deletions)
@@ -22,7 +22,7 @@
and we'll use sklearn for that (specifically
``sklearn.feature_extraction.text``). Beyond that we'll need umap, and
plotting tools.

-.. code:: ipython3
+.. code:: python3
import numpy as np
import scipy.sparse
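The rest of the import cell is collapsed in the diff. A plausible sketch of the
full set of libraries this document goes on to use -- the exact cell contents
are an assumption based on later usage of sympy, sklearn, matplotlib, and
``umap.plot``:

.. code:: python3

    # Sketch of the full import cell; the collapsed contents are assumed
    # from what the rest of the document uses.
    import numpy as np
    import scipy.sparse
    import sympy
    import sklearn.datasets
    import sklearn.feature_extraction.text
    import matplotlib.pyplot as plt
    import umap
    import umap.plot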
@@ -56,7 +56,7 @@
single call to ``primerange``. We'll also need a dictionary mapping the
different primes to the column number they correspond to in our data
structure; effectively we'll just be enumerating the primes.

-.. code:: ipython3
+.. code:: python3
primes = list(sympy.primerange(2, 110000))
prime_to_column = {p:i for i, p in enumerate(primes)}
@@ -87,7 +87,7 @@
to insert into a matrix. Since we are only concerned with divisibility
this will simply be a one in every non-zero entry, so we can just add a
list of ones of the appropriate length for each row.

-.. code:: ipython3
+.. code:: python3
%%time
lil_matrix_rows = []
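The body of this cell is collapsed in the diff. A sketch of how the row and
data lists could be built from prime factorizations using sympy's
``primefactors`` -- the notebook's exact loop may differ in detail:

.. code:: python3

    # Sketch: build LIL-style row index and data lists from the prime
    # factorizations of the integers 1..100,000.
    lil_matrix_rows = []
    lil_matrix_data = []
    for n in range(1, 100001):
        prime_factors = sympy.primefactors(n)
        lil_matrix_rows.append([prime_to_column[p] for p in prime_factors])
        lil_matrix_data.append([1] * len(prime_factors))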
@@ -115,7 +115,7 @@
to be the corresponding structure of values (all ones). The result is a
sparse matrix data structure which can then be easily manipulated and
converted into other sparse matrix formats.

-.. code:: ipython3
+.. code:: python3
factor_matrix = scipy.sparse.lil_matrix((len(lil_matrix_rows), len(primes)), dtype=np.float32)
factor_matrix.rows = np.array(lil_matrix_rows)
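The cell is truncated here; presumably the ``data`` attribute is set in the
same way as ``rows``. A sketch of the remaining step, with a purely
illustrative conversion to CSR format:

.. code:: python3

    # Sketch of the remaining assignment: the data attribute mirrors the
    # rows attribute, holding a list of ones for each row.
    # (On newer NumPy versions these ragged arrays may need dtype=object.)
    factor_matrix.data = np.array(lil_matrix_data)

    # Illustration only: a LIL matrix converts cheaply to other formats.
    factor_matrix_csr = factor_matrix.tocsr()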
@@ -144,7 +144,7 @@
straightforward -- we just hand it directly to the fit method. Just like
other sklearn estimators that can handle sparse input UMAP will detect
the sparse matrix and just do the right thing.

-.. code:: ipython3
+.. code:: python3
%%time
mapper = umap.UMAP(metric='cosine', random_state=42, low_memory=True).fit(factor_matrix)
@@ -158,7 +158,7 @@
That was easy! But is it really working? We can easily plot the results:

-.. code:: ipython3
+.. code:: python3
umap.plot.points(mapper, values=np.arange(100000), theme='viridis')
@@ -177,7 +177,7 @@
that we'll need some more data. Fortunately there are more integers.
We'll grab the next 10,000 and put them in a sparse matrix, much as we
did for the first 100,000.

-.. code:: ipython3
+.. code:: python3
%%time
lil_matrix_rows = []
@@ -194,7 +194,7 @@
Wall time: 222 ms
-.. code:: ipython3
+.. code:: python3
new_data = scipy.sparse.lil_matrix((len(lil_matrix_rows), len(primes)), dtype=np.float32)
new_data.rows = np.array(lil_matrix_rows)
@@ -215,15 +215,15 @@
To map the new data we generated we can simply hand it to the
``transform`` method of our trained model. This is a little slow, but it
does work.

-.. code:: ipython3
+.. code:: python3
new_data_embedding = mapper.transform(new_data)
And we can plot the results. Since we just got the locations of the
points this time (rather than a model) we'll have to resort to
matplotlib for plotting.

-.. code:: ipython3
+.. code:: python3
fig = plt.figure(figsize=(12,12))
ax = fig.add_subplot(111)
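The rest of this plotting cell is collapsed. A minimal sketch of overlaying the
newly transformed points on the original embedding -- the colours and point
sizes here are arbitrary choices, not the notebook's:

.. code:: python3

    # Sketch: original embedding in grey, newly transformed points in red.
    fig = plt.figure(figsize=(12, 12))
    ax = fig.add_subplot(111)
    ax.scatter(mapper.embedding_[:, 0], mapper.embedding_[:, 1],
               s=0.1, c='lightgrey')
    ax.scatter(new_data_embedding[:, 0], new_data_embedding[:, 1],
               s=1.0, c='red')
    plt.show()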
@@ -274,7 +274,7 @@
easily fetch the data, and, in fact, we can fetch a pre-vectorized
version to save us the trouble of running ``CountVectorizer`` ourselves.
We'll grab both the training set, and the test set for later use.

-.. code:: ipython3
+.. code:: python3
news_train = sklearn.datasets.fetch_20newsgroups_vectorized(subset='train')
news_test = sklearn.datasets.fetch_20newsgroups_vectorized(subset='test')
@@ -283,7 +283,7 @@
If we look at the actual data we have pulled back, we'll see that
sklearn has run a ``CountVectorizer`` and produced the data in sparse
matrix format.

-.. code:: ipython3
+.. code:: python3
news_train.data
@@ -325,7 +325,7 @@
their associated columns up-weighted. We can apply this transformation
to both the train and test sets (using the same transformer trained on
the training set).

-.. code:: ipython3
+.. code:: python3
tfidf = sklearn.feature_extraction.text.TfidfTransformer(norm='l1').fit(news_train.data)
train_data = tfidf.transform(news_train.data)
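The cell is cut off here; since the prose says the same transformer is applied
to the test set, the collapsed line is presumably just the matching transform
call:

.. code:: python3

    # Presumably the collapsed line: apply the fitted transformer to the
    # test set so train and test share the same TF-IDF weighting.
    test_data = tfidf.transform(news_test.data)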
@@ -334,7 +334,7 @@
The result is still a sparse matrix, since TF-IDF doesn't change the
The result is still a sparse matrix, since TF-IDF doesn't change the
zero elements at all, nor the number of features.

-.. code:: ipython3
+.. code:: python3
train_data
@@ -355,7 +355,7 @@
need to use other techniques to reduce the data to be able to be
represented as a dense ``numpy`` array; we can work directly on the
130,000 dimensional sparse matrix.

-.. code:: ipython3
+.. code:: python3
%%time
mapper = umap.UMAP(metric='hellinger', random_state=42).fit(train_data)
@@ -370,7 +370,7 @@
Now we can plot the results, with labels according to the target
Now we can plot the results, with labels according to the target
variable of the data -- which newsgroup the posting was drawn from.

-.. code:: ipython3
+.. code:: python3
umap.plot.points(mapper, labels=news_train.target)
@@ -387,14 +387,14 @@
many of the different newsgroups.
We can now attempt to add the test data to the same space using the
``transform`` method.

-.. code:: ipython3
+.. code:: python3
test_embedding = mapper.transform(test_data)
While this is somewhat expensive computationally, it does work, and we
can plot the end result:

-.. code:: ipython3
+.. code:: python3
fig = plt.figure(figsize=(12,12))
ax = fig.add_subplot(111)
umap/umap_.py (1 change: 1 addition & 0 deletions)
@@ -273,6 +273,7 @@ def nearest_neighbors(
random_state=random_state,
n_trees=n_trees,
n_iters=n_iters,
+n_jobs=-1,
max_candidates=60,
low_memory=low_memory,
verbose=verbose,
