1. The benefits of reducing a dataset's dimensionality are that training is faster and that high-dimensional data can be plotted. The higher the number of dimensions (# of features), the greater the risk of overfitting. The main drawbacks are that information is lost (akin to lossy file compression), which means the system may perform slightly worse, and that the code gets more complex because of larger pipelines.

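2. The **curse of dimensionality** refers to problems that appear in high-dimensional spaces but not in low-dimensional ones. In machine learning, the usual symptom is that training instances become very sparse as the number of dimensions grows, so the risk of overfitting increases.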
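
One way to see why more dimensions raise the risk of overfitting is that randomly placed points drift further apart as dimensions are added, so a fixed-size training set covers the space more and more thinly. The following is a minimal sketch, assuming points drawn uniformly from a unit hypercube; the helper name `mean_pairwise_distance` and the sample sizes are illustrative choices, not part of the exercise.

```python
import numpy as np

def mean_pairwise_distance(n_dims, n_pairs=2000, seed=42):
    """Average Euclidean distance between random point pairs in a unit hypercube."""
    rng = np.random.default_rng(seed)
    a = rng.random((n_pairs, n_dims))  # first point of each pair
    b = rng.random((n_pairs, n_dims))  # second point of each pair
    return float(np.linalg.norm(a - b, axis=1).mean())

# The average distance keeps growing with the number of dimensions.
for n_dims in (2, 10, 100, 1000):
    print(n_dims, round(mean_pairwise_distance(n_dims), 2))
```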

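3. It is almost impossible to perfectly recover the original dataset after dimensionality reduction, because information is lost in the process. Some algorithms, such as PCA, provide a procedure that approximately reverses the reduction (in Scikit-Learn, `inverse_transform()`); the result is close to the original but not identical.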
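
As a rough illustration of the point above, the sketch below reduces the digits dataset with PCA and maps it back with `inverse_transform()`. The choice of dataset and of a 95% variance target are assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X = load_digits().data  # 64 features per instance

pca = PCA(n_components=0.95)  # keep enough components for 95% of the variance
X_reduced = pca.fit_transform(X)
X_recovered = pca.inverse_transform(X_reduced)  # back to the original 64 dimensions

print(X_reduced.shape, X_recovered.shape)
# The shapes match again, but the values are only an approximation of the original.
print("Exact recovery:", np.allclose(X, X_recovered))
```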

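4. Yes, it can, because the goal of PCA is to get rid of useless dimensions without losing too much information. If there are no useless dimensions, however, reducing dimensionality with PCA will lose too much information.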

5. ~~Around 150 dimensions according to the graph in figure 8-8 (explained variance vs. # of dims)~~ The number of dimensions required to preserve a given percentage of the variance depends on the dataset. For example, one dimension could be enough to preserve 95% of the variance on one dataset, while on a dataset of perfectly random points scattered across all dimensions it would take roughly 95% of the original dimensions to preserve 95% of the variance.
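
Because the answer depends on the data, in practice the number is usually computed from the cumulative explained variance ratio rather than guessed. A minimal sketch, using the digits dataset purely as an example:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X = load_digits().data

pca = PCA().fit(X)  # no n_components: keep every principal component
cumsum = np.cumsum(pca.explained_variance_ratio_)

# Index of the first component at which the running total reaches 95%, plus one.
d = int(np.argmax(cumsum >= 0.95)) + 1
print(f"{d} of {X.shape[1]} dimensions preserve 95% of the variance")
```

Passing a float such as `PCA(n_components=0.95)` lets Scikit-Learn pick this number directly, which is what the exercise 9 code does.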

6. 
    - Regular PCA: the default option. Use if the dataset fits in memory.
    - Incremental PCA: use if the dataset **does not** fit in memory. Basically online PCA. Slower than regular PCA.
    - Randomized PCA: use when the target dimension is significantly lower than the original and there is enough memory for the dataset.
    - Kernel PCA: for nonlinear datasets.
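
The four variants in the list above map onto Scikit-Learn roughly as sketched here. The target of 20 components, the 10 mini-batches, and the RBF kernel with `gamma=0.001` are arbitrary values picked for illustration, not tuned settings.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA, IncrementalPCA, KernelPCA

X = load_digits().data
n_components = 20  # illustrative target dimension

# Regular PCA: exact SVD on the whole dataset held in memory.
X_full = PCA(n_components=n_components, svd_solver="full").fit_transform(X)

# Randomized PCA: approximate, faster when n_components is much smaller than n_features.
X_rnd = PCA(n_components=n_components, svd_solver="randomized",
            random_state=42).fit_transform(X)

# Incremental PCA: feed the data one mini-batch at a time.
inc_pca = IncrementalPCA(n_components=n_components)
for batch in np.array_split(X, 10):
    inc_pca.partial_fit(batch)
X_inc = inc_pca.transform(X)

# Kernel PCA: nonlinear projection through a kernel.
X_kernel = KernelPCA(n_components=n_components, kernel="rbf",
                     gamma=0.001).fit_transform(X)

for name, Z in [("full", X_full), ("randomized", X_rnd),
                ("incremental", X_inc), ("kernel", X_kernel)]:
    print(name, Z.shape)
```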
    
7. To evaluate a dimensionality reduction technique, compute the reconstruction error: apply the reverse transformation (the reconstruction pre-image) to the reduced dataset and measure its MSE against the original dataset. This only works if the technique has a reconstruction method. If it does not, train a model on the reduced dataset and measure its performance.
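
The reconstruction-error approach can be sketched as follows for PCA. The helper name `reconstruction_mse` and the component counts tried are assumptions for the example; the exact solver is forced so the numbers do not depend on a randomized approximation.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X = load_digits().data

def reconstruction_mse(X, n_components):
    """MSE between X and its PCA reconstruction from n_components dimensions."""
    pca = PCA(n_components=n_components, svd_solver="full")
    X_recovered = pca.inverse_transform(pca.fit_transform(X))
    return float(np.mean((X - X_recovered) ** 2))

# Fewer components means more information lost, so a larger error.
errors = {d: reconstruction_mse(X, d) for d in (5, 15, 30, 64)}
for d, mse in errors.items():
    print(f"{d:>2} components: MSE = {mse:.4f}")
```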

8. Chaining different dim-reduction algorithms can make sense. For example, PCA can be used first to quickly prune the useless dimensions, followed by a much slower dim-reduction algorithm such as LLE. This reduces the total time needed and should give roughly the same result as using LLE alone.
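
A chain like the one described can be expressed as a Scikit-Learn `Pipeline`. This is only a sketch: the digits dataset, the 95% variance target, the 2-D output, and `n_neighbors=10` are assumed values, and `eigen_solver="dense"` is chosen here simply because it is the more robust solver on a small dataset.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import LocallyLinearEmbedding
from sklearn.pipeline import Pipeline

X = load_digits().data

pca_lle = Pipeline([
    # Step 1: cheap linear reduction that drops low-variance dimensions.
    ("pca", PCA(n_components=0.95)),
    # Step 2: slower nonlinear reduction, now working on fewer features.
    ("lle", LocallyLinearEmbedding(n_components=2, n_neighbors=10,
                                   eigen_solver="dense")),
])

X_2d = pca_lle.fit_transform(X)
print(pca_lle.named_steps["pca"].n_components_, "dims after PCA ->", X_2d.shape)
```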

### 9. 

In [113]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

digits = load_digits()
X, y = digits.data, digits.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.1)
X_train.shape

(1617, 64)

In [114]:
import time

rf_clf1 = RandomForestClassifier(max_depth=10, random_state=42)

start = time.time()
rf_clf1.fit(X_train, y_train)
stop = time.time()
print(f"Training time: {stop - start}s")

Training time: 0.2374420166015625s


In [115]:
rf_clf1.score(X_test, y_test)

0.9833333333333333

In [116]:
from sklearn.decomposition import PCA

pca_reducer = PCA(n_components=0.95)

In [117]:
X_train_new = pca_reducer.fit_transform(X_train)
X_train_new.shape

(1617, 29)

In [118]:
rf_clf2 = RandomForestClassifier(max_depth=10, random_state=42)

start = time.time()
rf_clf2.fit(X_train_new, y_train)
stop = time.time()
print(f"Training time: {stop - start}s")

Training time: 0.4412369728088379s


Training is actually almost twice as slow! This is an example that **dimensionality reduction does not always lead to faster training time.** *It really depends on the dataset, the model, and the training algorithm.*

In [119]:
X_test_new = pca_reducer.transform(X_test)
print(X_test_new.shape)
rf_clf2.score(X_test_new, y_test)

(180, 29)


0.9722222222222222

Accuracy is lower because of the information lost during dimensionality reduction.

#### Dimensionality reduction can make models train faster depending on the dataset. Here's an example of a slight speed improvement from dim-reduction using PCA

In [120]:
from sklearn.linear_model import LogisticRegression

soft_clf1 = LogisticRegression(multi_class="multinomial", random_state=42)

start = time.time()
soft_clf1.fit(X_train, y_train)
stop = time.time()
print(f"Training time: {stop - start}s")

Training time: 0.08647704124450684s


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(


In [121]:
soft_clf1.score(X_test, y_test)

0.9722222222222222

In [122]:
soft_clf2 = LogisticRegression(multi_class="multinomial", random_state=42)

start = time.time()
soft_clf2.fit(X_train_new, y_train)
stop = time.time()
print(f"Training time: {stop - start}s")

Training time: 0.08101677894592285s


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(


In [123]:
soft_clf2.score(X_test_new, y_test)

0.9333333333333333
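
The convergence warnings above mean that both logistic regression fits stopped at the iteration limit, so their timings and scores are hard to compare fairly. The sketch below follows the two suggestions in the warning itself (scale the data, raise `max_iter`). The value `max_iter=1000`, the fixed `random_state` on the split, and the pipeline layout are assumptions; `multi_class` is left out because recent Scikit-Learn releases deprecate it and the default solver already handles the multiclass case with a multinomial loss.

```python
import time
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=42)

def timed_fit(model):
    """Fit the model on the training set and return the elapsed time in seconds."""
    start = time.perf_counter()
    model.fit(X_train, y_train)
    return time.perf_counter() - start

# Same classifier with and without the PCA step; scaling happens inside the pipeline.
full_clf = make_pipeline(
    StandardScaler(), LogisticRegression(max_iter=1000, random_state=42))
pca_clf = make_pipeline(
    StandardScaler(), PCA(n_components=0.95),
    LogisticRegression(max_iter=1000, random_state=42))

for name, clf in [("all features", full_clf), ("PCA 95%", pca_clf)]:
    elapsed = timed_fit(clf)
    print(f"{name}: {elapsed:.3f}s, accuracy {clf.score(X_test, y_test):.3f}")
```

Timings this short are noisy, so averaging several runs gives a more trustworthy comparison than a single measurement.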