Using NumPy’s `svd()` function to obtain all the principal components of the training set, then extracting the two unit vectors that define the first two PCs.

In [None]:
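import numpy as np

# X is the training set defined earlier; PCA assumes the data is centered
X_centered = X - X.mean(axis=0)
# full_matrices=False avoids building a huge m x m U matrix; the rows of Vt are the PCs
U, s, Vt = np.linalg.svd(X_centered, full_matrices=False)
c1 = Vt.T[:, 0]
c2 = Vt.T[:, 1]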
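The SVD step above can be sanity-checked on a small example. The sketch below uses a hypothetical, randomly generated dataset rather than the training set from the text. It verifies that the extracted PCs are orthonormal and that `U`, `s`, and `Vt` reproduce the centered data. Note that `Vt[0]` is the same vector as `Vt.T[:, 0]`.

```python
import numpy as np

rng = np.random.default_rng(42)
# Hypothetical toy dataset: 100 instances, 3 correlated features
X = rng.standard_normal((100, 3)) @ np.array([[2.0, 0.5, 0.0],
                                              [0.0, 1.0, 0.3],
                                              [0.0, 0.0, 0.1]])

X_centered = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(X_centered, full_matrices=False)
c1, c2 = Vt[0], Vt[1]  # same vectors as Vt.T[:, 0] and Vt.T[:, 1]

print(np.linalg.norm(c1), np.linalg.norm(c2))  # both ~1.0: unit vectors
print(c1 @ c2)                                  # ~0.0: orthogonal
print(np.allclose(U * s @ Vt, X_centered))      # True: the SVD reconstructs the data
```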

Projecting the training set onto the plane defined by the first
two principal components.

In [None]:
W2 = Vt.T[:, :2]
X2D = X_centered.dot(W2)
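This projection works because `X_centered` equals `U @ np.diag(s) @ Vt`. Multiplying by `W2`, the first two columns of `Vt.T`, keeps the first two columns of `U`, each scaled by its singular value. Here is a small check on hypothetical random data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3)) * np.array([3.0, 1.0, 0.2])  # hypothetical toy data

X_centered = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(X_centered, full_matrices=False)
W2 = Vt.T[:, :2]
X2D = X_centered @ W2  # same as X_centered.dot(W2)

# The projection equals the first two columns of U scaled by their singular values
print(np.allclose(X2D, U[:, :2] * s[:2]))  # True
```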

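Using Scikit-Learn’s `PCA` class to reduce the dimensionality of the dataset down to two dimensions. Unlike the manual SVD approach, the class takes care of centering the data automatically.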

In [None]:
from sklearn.decomposition import PCA

pca = PCA(n_components=2)
X2D = pca.fit_transform(X)
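Scikit-Learn's `PCA` centers the data itself, so its output should match the manual SVD projection. The match holds only up to the sign of each axis, because principal axes are defined only up to sign. The comparison below is a minimal sketch on a hypothetical 3D dataset that is deliberately placed off-center:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)
# Hypothetical toy 3D dataset, shifted by +10 so that centering matters
A = np.array([[3.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.5, 0.2]])
X = rng.standard_normal((200, 3)) @ A + 10.0

pca = PCA(n_components=2)
X2D_sklearn = pca.fit_transform(X)  # PCA centers X internally

X_centered = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(X_centered, full_matrices=False)
X2D_manual = X_centered @ Vt[:2].T

# Principal axes are only defined up to sign, so align signs column by column
signs = np.sign(np.sum(X2D_sklearn * X2D_manual, axis=0))
print(np.allclose(X2D_sklearn, X2D_manual * signs))  # True
```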

Displaying the explained variance ratios of the first two
components of the 3D dataset.

In [None]:
pca.explained_variance_ratio_
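The explained variance ratios can also be derived directly from the singular values. The variance along the i-th axis is proportional to s_i², so each ratio is s_i² divided by the sum of all s_i². The sketch below uses hypothetical 3D data. For the same data, its first two values should match `pca.explained_variance_ratio_` up to floating-point error.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 3)) * np.array([3.0, 1.0, 0.3])  # hypothetical 3D data
X_centered = X - X.mean(axis=0)
s = np.linalg.svd(X_centered, compute_uv=False)

# Variance along each axis is s_i**2 / (m - 1); the (m - 1) cancels in the ratio
explained_variance_ratio = s**2 / np.sum(s**2)
print(explained_variance_ratio[:2])
```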

Performing PCA without reducing dimensionality, then computing the minimum number of dimensions required to preserve 95% of the training set’s variance.

In [None]:
pca = PCA()
pca.fit(X_train)
cumsum = np.cumsum(pca.explained_variance_ratio_)  # cumulative explained variance
d = np.argmax(cumsum >= 0.95) + 1  # first index reaching 95%, plus 1 to get a count
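The `argmax` trick works as follows. `cumsum >= 0.95` is a boolean array, and `np.argmax()` returns the index of its first `True` value. Here is a worked example with a hypothetical array of explained variance ratios:

```python
import numpy as np

# Hypothetical explained variance ratios for a 5-dimensional dataset
explained_variance_ratio = np.array([0.50, 0.30, 0.10, 0.06, 0.04])
cumsum = np.cumsum(explained_variance_ratio)  # [0.5, 0.8, 0.9, 0.96, 1.0]
# argmax returns the index of the first True, i.e. the first position reaching 95%
d = np.argmax(cumsum >= 0.95) + 1
print(d)  # 4
```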

Setting `n_components` to a float between 0.0 and 1.0 to indicate the ratio of variance you wish to preserve, instead of specifying the number of principal components to keep.

In [None]:
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X_train)
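After fitting, the `n_components_` attribute shows how many components Scikit-Learn actually kept to reach the requested ratio. The sketch below runs on hypothetical data whose feature variances decay geometrically, standing in for `X_train`:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Hypothetical stand-in for X_train: 50 features with geometrically decaying scales
X_train = rng.standard_normal((500, 50)) * (0.8 ** np.arange(50))

pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X_train)

print(pca.n_components_)                    # number of components actually kept
print(pca.explained_variance_ratio_.sum())  # at least 0.95
```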

Compressing the MNIST dataset down to 154 dimensions, then using the `inverse_transform()` method to decompress it back to 784 dimensions.

In [None]:
pca = PCA(n_components=154)
X_reduced = pca.fit_transform(X_train)
X_recovered = pca.inverse_transform(X_reduced)
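Decompression is the inverse linear map: multiply by the transpose of the projection matrix, then add back the mean. This is roughly what `inverse_transform()` does when `whiten=False`. The NumPy sketch below uses hypothetical data with d = 5 rather than MNIST's 154. It also checks a useful identity: the total squared reconstruction error equals the sum of the squared singular values that were discarded.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in for X_train: 300 instances, 20 features with decaying scales
X_train = rng.standard_normal((300, 20)) * np.linspace(3.0, 0.1, 20)
d = 5

mean = X_train.mean(axis=0)
X_centered = X_train - mean
U, s, Vt = np.linalg.svd(X_centered, full_matrices=False)
W_d = Vt[:d].T  # n x d projection matrix

X_reduced = X_centered @ W_d             # compress: m x d
X_recovered = X_reduced @ W_d.T + mean   # decompress: back to m x n

# Squared reconstruction error equals the sum of the discarded squared singular values
err = np.sum((X_train - X_recovered) ** 2)
print(np.isclose(err, np.sum(s[d:] ** 2)))  # True
```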

Setting the `svd_solver` hyperparameter to `"randomized"` makes Scikit-Learn use a stochastic algorithm called Randomized PCA, which quickly finds an approximation of the first d principal components.

In [None]:
rnd_pca = PCA(n_components=154, svd_solver="randomized")
X_reduced = rnd_pca.fit_transform(X_train)
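The idea behind randomized PCA can be sketched in NumPy in three steps:

- Multiply the data by a random matrix to capture its dominant range.
- Sharpen that range with a few power iterations.
- Run an exact SVD on the much smaller projected matrix.

This is a minimal sketch of the general randomized range-finder technique, not Scikit-Learn's implementation. The function name, the oversampling and iteration counts, and the low-rank toy data are all hypothetical choices.

```python
import numpy as np

def randomized_svd_sketch(X, k, n_oversamples=10, n_iter=4, seed=0):
    rng = np.random.default_rng(seed)
    # Sample the range of X with k + n_oversamples random directions
    omega = rng.standard_normal((X.shape[1], k + n_oversamples))
    Q, _ = np.linalg.qr(X @ omega)
    # Power iterations amplify the dominant singular directions
    for _ in range(n_iter):
        Q, _ = np.linalg.qr(X @ (X.T @ Q))
    # Exact SVD of the small (k + n_oversamples) x n matrix B = Q^T X
    B = Q.T @ X
    U_b, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ U_b)[:, :k], s[:k], Vt[:k]

# Hypothetical data: a 500 x 50 matrix of rank ~5 plus a little noise
rng = np.random.default_rng(42)
X = rng.standard_normal((500, 5)) @ rng.standard_normal((5, 50))
X += 0.01 * rng.standard_normal(X.shape)
X_centered = X - X.mean(axis=0)

U_k, s_approx, Vt_approx = randomized_svd_sketch(X_centered, k=5)
_, s_full, Vt_exact = np.linalg.svd(X_centered, full_matrices=False)
s_exact = s_full[:5]
print(np.allclose(s_approx, s_exact, rtol=1e-3))  # True
```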

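Splitting the MNIST dataset into 100 mini-batches (using NumPy’s `array_split()` function) and feeding them to Scikit-Learn’s `IncrementalPCA` class to reduce the dimensionality of the MNIST dataset down to 154 dimensions. Note that `partial_fit()` is called once per mini-batch instead of calling `fit()` on the whole training set.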

In [None]:
from sklearn.decomposition import IncrementalPCA

n_batches = 100
inc_pca = IncrementalPCA(n_components=154)
for X_batch in np.array_split(X_train, n_batches):
    inc_pca.partial_fit(X_batch)

X_reduced = inc_pca.transform(X_train)
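Here is another way to see why PCA can be trained without loading all the data into memory: the covariance matrix can be accumulated one mini-batch at a time from running sums. The sketch below does this with NumPy on hypothetical data and checks the result against a full SVD. It is a different technique from the incremental SVD that `IncrementalPCA` uses. It is only practical when the n x n covariance matrix fits in memory.

```python
import numpy as np

rng = np.random.default_rng(0)
X_train = rng.standard_normal((1000, 10)) * np.linspace(2.0, 0.2, 10)  # hypothetical data
n_batches = 100

# One pass over the data, keeping only running sums (O(n^2) memory)
n = X_train.shape[1]
count, total, scatter = 0, np.zeros(n), np.zeros((n, n))
for X_batch in np.array_split(X_train, n_batches):
    count += len(X_batch)
    total += X_batch.sum(axis=0)
    scatter += X_batch.T @ X_batch

mean = total / count
cov = (scatter - count * np.outer(mean, mean)) / (count - 1)
eigvals, eigvecs = np.linalg.eigh(cov)  # ascending order
order = np.argsort(eigvals)[::-1]
eigvals, components = eigvals[order], eigvecs[:, order].T  # rows are principal axes

# Compare with a full in-memory SVD: variances are s**2 / (m - 1)
s = np.linalg.svd(X_train - X_train.mean(axis=0), compute_uv=False)
print(np.allclose(eigvals, s**2 / (count - 1)))  # True
```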

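Using NumPy’s `memmap` class, which lets you manipulate a large array stored in a binary file on disk as if it were entirely in memory. The class loads only the data it needs in memory, when it needs it. `IncrementalPCA` processes only one batch of `batch_size` rows at a time, so memory usage stays under control and the regular `fit()` method can be called on the memory-mapped array.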

In [None]:
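# filename, m (number of instances) and n (number of features) are assumed to be defined
# earlier; n_batches comes from the previous cell
X_mm = np.memmap(filename, dtype="float32", mode="readonly", shape=(m, n))

batch_size = m // n_batches
inc_pca = IncrementalPCA(n_components=154, batch_size=batch_size)
inc_pca.fit(X_mm)  # reads the file one batch at a time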
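Here is a minimal, self-contained round trip with `np.memmap`. It writes a hypothetical array to a temporary binary file, reopens it read-only, and consumes it in mini-batches, similar to what `IncrementalPCA.fit()` does. The file name and array sizes are hypothetical, and a column sum stands in for the real per-batch work.

```python
import os
import tempfile
import numpy as np

rng = np.random.default_rng(0)
m, n = 1000, 20
X_train = rng.standard_normal((m, n)).astype("float32")  # hypothetical data

with tempfile.TemporaryDirectory() as tmp_dir:
    filename = os.path.join(tmp_dir, "my_data.bin")  # hypothetical file name
    # Write the array to disk once
    X_write = np.memmap(filename, dtype="float32", mode="write", shape=(m, n))
    X_write[:] = X_train
    X_write.flush()
    del X_write

    # Reopen read-only; data is read from disk only when a slice is accessed
    X_mm = np.memmap(filename, dtype="float32", mode="readonly", shape=(m, n))
    column_sums = np.zeros(n)
    for X_batch in np.array_split(X_mm, 10):
        column_sums += X_batch.sum(axis=0)
    del X_mm, X_batch  # release the mapping before the directory is removed

print(np.allclose(column_sums, X_train.sum(axis=0), rtol=1e-4, atol=1e-3))  # True
```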

Using Scikit-Learn’s `KernelPCA` class to perform kPCA with an RBF
kernel.

In [None]:
from sklearn.decomposition import KernelPCA

rbf_pca = KernelPCA(n_components=2, kernel="rbf", gamma=0.04)
X_reduced = rbf_pca.fit_transform(X)
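The following is a minimal NumPy sketch of what `KernelPCA` with an RBF kernel does, using the textbook kernel PCA recipe:

- Build the RBF kernel matrix.
- Center it in feature space.
- Project onto its top eigenvectors.

It is not Scikit-Learn's implementation, which may use different eigensolvers. Its output should agree with `KernelPCA` only up to the sign of each component. The helper name `rbf_kernel_pca` and the hand-made Swiss-roll data are hypothetical.

```python
import numpy as np

def rbf_kernel_pca(X, n_components, gamma):
    # RBF kernel matrix: K[i, j] = exp(-gamma * ||x_i - x_j||^2)
    sq_norms = np.sum(X**2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2 * X @ X.T
    K = np.exp(-gamma * np.maximum(sq_dists, 0.0))
    # Center the kernel matrix, i.e. center the data in the implicit feature space
    m = K.shape[0]
    one_m = np.full((m, m), 1.0 / m)
    K_c = K - one_m @ K - K @ one_m + one_m @ K @ one_m
    # eigh returns eigenvalues in ascending order; keep the largest ones
    eigvals, eigvecs = np.linalg.eigh(K_c)
    top = np.argsort(eigvals)[::-1][:n_components]
    eigvals, eigvecs = eigvals[top], eigvecs[:, top]
    # Coordinate of training instance i on component j: sqrt(lambda_j) * v_j[i]
    return eigvecs * np.sqrt(eigvals), eigvals

# Hypothetical data: a hand-made Swiss roll with 200 instances
rng = np.random.default_rng(0)
t = 1.5 * np.pi * (1 + 2 * rng.random(200))
X = np.column_stack([t * np.cos(t), 21 * rng.random(200), t * np.sin(t)])

X_reduced, eigvals = rbf_kernel_pca(X, n_components=2, gamma=0.04)
print(X_reduced.shape)  # (200, 2)
```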

Creating a two-step pipeline that first reduces dimensionality to two dimensions using kPCA, then applies Logistic Regression for classification. `GridSearchCV` is then used to find the best kernel and gamma value for kPCA, in order to get the best classification accuracy at the end of the pipeline.

In [None]:
from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

clf = Pipeline([
        ("kpca", KernelPCA(n_components=2)),
        ("log_reg", LogisticRegression())
    ])

param_grid = [{
        "kpca__gamma": np.linspace(0.03, 0.05, 10),
        "kpca__kernel": ["rbf", "sigmoid"]
    }]

grid_search = GridSearchCV(clf, param_grid, cv=3)
grid_search.fit(X, y)

In [None]:
print(grid_search.best_params_)
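It helps to count how much work this grid search does. Each dict in `param_grid` is expanded into the Cartesian product of its value lists, and every combination is evaluated once per cross-validation fold. The plain-Python sketch below reproduces that expansion with `itertools.product`. Scikit-Learn's own `ParameterGrid` may order the candidates differently, but the count is the same.

```python
import itertools
import numpy as np

param_grid = [{
    "kpca__gamma": np.linspace(0.03, 0.05, 10),
    "kpca__kernel": ["rbf", "sigmoid"],
}]

# Expand each dict into the Cartesian product of its value lists
candidates = [
    dict(zip(grid.keys(), values))
    for grid in param_grid
    for values in itertools.product(*grid.values())
]
cv = 3
print(len(candidates))       # 20 parameter combinations
print(len(candidates) * cv)  # 60 pipeline fits (plus one final refit on all the data)
```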

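With kPCA, the reconstruction of a reduced instance lies in feature space, not in the original space. To find a point in the original space that maps close to it (the reconstruction pre-image), you can train a supervised regression model. The projected instances serve as the training set and the original instances as the targets. Scikit-Learn does this automatically if you set `fit_inverse_transform=True`.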

In [None]:
rbf_pca = KernelPCA(n_components=2, kernel="rbf", gamma=0.0433,
                    fit_inverse_transform=True)
X_reduced = rbf_pca.fit_transform(X)
X_preimage = rbf_pca.inverse_transform(X_reduced)
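In Scikit-Learn, the regression that learns this pre-image map is a kernel ridge regression, regularized by the `alpha` hyperparameter of `KernelPCA`. The NumPy sketch below shows the general idea. It fits a kernel ridge regression from reduced coordinates back to the original features, then uses it to reconstruct. This is not Scikit-Learn's code, and several choices are assumptions:

- the helper names and the Swiss-roll-like data are hypothetical;
- `Z` is a plain linear PCA projection, used only to keep the example short.

A larger `alpha` means stronger regularization and, on the training set, a larger reconstruction error.

```python
import numpy as np

def rbf(A, B, gamma):
    # RBF kernel between the rows of A and the rows of B
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def fit_preimage_map(Z, X, gamma, alpha):
    # Kernel ridge regression from the reduced space (Z) back to the original space (X)
    K = rbf(Z, Z, gamma)
    dual_coef = np.linalg.solve(K + alpha * np.eye(len(Z)), X)
    return lambda Z_new: rbf(Z_new, Z, gamma) @ dual_coef

# Hypothetical data: a small Swiss-roll-like 3D dataset
rng = np.random.default_rng(0)
t = 1.5 * np.pi * (1 + 2 * rng.random(200))
X = np.column_stack([t * np.cos(t), 21 * rng.random(200), t * np.sin(t)])

# For brevity, Z is a linear PCA projection to 2D; the regression step is the same
# whichever reduction produced Z
X_centered = X - X.mean(axis=0)
Z = X_centered @ np.linalg.svd(X_centered, full_matrices=False)[2][:2].T

mse = {}
for alpha in (0.01, 10.0):
    preimage_map = fit_preimage_map(Z, X, gamma=0.0433, alpha=alpha)
    X_preimage = preimage_map(Z)
    mse[alpha] = np.mean((X - X_preimage) ** 2)
    print(alpha, mse[alpha])
```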

Computing the reconstruction pre-image error.

In [None]:
from sklearn.metrics import mean_squared_error
mean_squared_error(X, X_preimage)
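With its default `multioutput="uniform_average"`, `mean_squared_error()` computes the MSE of each column (feature) and then averages those values. Every column has the same number of rows, so this equals the mean of all squared entries. Here is a tiny worked example with hypothetical numbers:

```python
import numpy as np

X = np.array([[1.0, 2.0], [3.0, 4.0]])
X_preimage = np.array([[1.0, 1.0], [3.0, 5.0]])  # hypothetical reconstructions

# Squared errors: [[0, 1], [0, 1]]; their mean over all 4 entries is 0.5
mse = np.mean((X - X_preimage) ** 2)
print(mse)  # 0.5
```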

Using Scikit-Learn’s `LocallyLinearEmbedding` class to unroll the Swiss roll.

In [None]:
from sklearn.manifold import LocallyLinearEmbedding

lle = LocallyLinearEmbedding(n_components=2, n_neighbors=10)
X_reduced = lle.fit_transform(X)
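LLE works in two steps:

1. It expresses each instance as a weighted combination of its nearest neighbors, with weights summing to 1.
2. It looks for low-dimensional coordinates that preserve those same weights, which amounts to finding the bottom eigenvectors of (I - W)^T (I - W).

The NumPy sketch below implements this textbook recipe on a hypothetical hand-made Swiss roll. It uses dense matrices and a simple regularization, similar in spirit to the `reg` hyperparameter of `LocallyLinearEmbedding`. It is for illustration only and is not Scikit-Learn's implementation.

```python
import numpy as np

def lle_sketch(X, n_neighbors, n_components, reg=1e-3):
    m = X.shape[0]
    # Step 1a: find each instance's nearest neighbors (excluding itself)
    sq_norms = np.sum(X**2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2 * X @ X.T
    np.fill_diagonal(sq_dists, np.inf)
    neighbors = np.argsort(sq_dists, axis=1)[:, :n_neighbors]

    # Step 1b: weights that best reconstruct each instance from its neighbors
    W = np.zeros((m, m))
    for i in range(m):
        Z = X[neighbors[i]] - X[i]  # neighbors relative to x_i
        G = Z @ Z.T                 # local Gram matrix (k x k)
        G += reg * np.trace(G) * np.eye(n_neighbors)  # regularize: G may be singular
        w = np.linalg.solve(G, np.ones(n_neighbors))
        W[i, neighbors[i]] = w / w.sum()  # constrain the weights to sum to 1

    # Step 2: coordinates preserving the same weights: bottom eigenvectors of
    # M = (I - W)^T (I - W), skipping the constant eigenvector (eigenvalue 0)
    I = np.eye(m)
    M = (I - W).T @ (I - W)
    eigvals, eigvecs = np.linalg.eigh(M)  # ascending eigenvalues
    return eigvecs[:, 1:n_components + 1], W

# Hypothetical data: a hand-made Swiss roll with 300 instances
rng = np.random.default_rng(0)
t = 1.5 * np.pi * (1 + 2 * rng.random(300))
X = np.column_stack([t * np.cos(t), 21 * rng.random(300), t * np.sin(t)])

X_reduced, W = lle_sketch(X, n_neighbors=10, n_components=2)
print(X_reduced.shape)  # (300, 2)
```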