## Principal Component Analysis
### Dimensionality reduction using feature extraction

It is common to have access to thousands of features. For example, a 256 x 256 pixel color image can be flattened into a vector of 196,608 features (256 x 256 pixels x 3 color channels).
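The feature count above can be checked with a small sketch. The image here is a hypothetical all-zero array standing in for a real RGB picture; only its shape matters.

```python
import numpy as np

# Hypothetical stand-in for a 256 x 256 color image with 3 channels (RGB)
image = np.zeros((256, 256, 3), dtype=np.uint8)

# Flatten height x width x channels into one feature vector
feature_vector = image.reshape(-1)

print("Number of features:", feature_vector.shape[0])
```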

Fortunately, not all of those features are needed. The goal of feature extraction for dimensionality reduction is to transform the original feature set into a new, smaller one that still retains most of the information in the original. In other words, we reduce the number of features at the cost of only a small loss of information.

#### Importing libraries

In [1]:
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn import datasets

In [2]:
# Load the data
digits = datasets.load_digits()

In [3]:
# Standardize the feature matrix
features = StandardScaler().fit_transform(digits.data)

In [4]:
# Create a PCA that will retain 99% of variance
pca = PCA(n_components = 0.99)

In [5]:
# Conduct PCA
features_pca = pca.fit_transform(features)

In [6]:
# Show results
print("Original number of features: ", features.shape[1])
print("Reduced number of features: ", features_pca.shape[1])

Original number of features:  64
Reduced number of features:  54
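To see where the reduced feature count comes from, one option is to fit a PCA that keeps every component and look at the cumulative explained variance. The sketch below assumes the same standardized digits data; the names `pca_full`, `cumulative`, and `n_needed` are introduced here for illustration only.

```python
import numpy as np
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Same preprocessing as in the cells above
features = StandardScaler().fit_transform(datasets.load_digits().data)

# Keep all components so the full variance spectrum is available
pca_full = PCA(svd_solver="full").fit(features)
cumulative = np.cumsum(pca_full.explained_variance_ratio_)

# Smallest number of components whose cumulative share exceeds 99%
n_needed = int(np.searchsorted(cumulative, 0.99, side="right")) + 1

print("Components needed for 99% of variance:", n_needed)
```

The value printed here should match the reduced number of features reported by `PCA(n_components = 0.99)`.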


## Linear Discriminant Analysis

In [10]:
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

In [11]:
# Load Iris flower dataset
iris = datasets.load_iris()
features = iris.data
target = iris.target

In [12]:
# Create and run an LDA, then use it to transform the features
lda = LinearDiscriminantAnalysis(n_components = 1)
features_lda = lda.fit(features, target).transform(features)

In [13]:
# Print the number of features
print("Original number of features: ", features.shape[1])
print("Reduced number of features: ", features_lda.shape[1])

Original number of features:  4
Reduced number of features:  1


#### Variance explained by each component

In [14]:
lda.explained_variance_ratio_

array([0.9912126])
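LDA can produce at most `min(n_classes - 1, n_features)` components, so the Iris data (three classes) allows two at most. As a sketch, fitting with `n_components=None` keeps every available component and shows how the variance is split between them. The names `lda_all` and `ratios` are illustrative.

```python
from sklearn import datasets
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

iris = datasets.load_iris()

# n_components=None keeps every available discriminant component
lda_all = LinearDiscriminantAnalysis(n_components=None)
lda_all.fit(iris.data, iris.target)

ratios = lda_all.explained_variance_ratio_
print("Available components:", len(ratios))
print("Explained variance ratios:", ratios)
```

Because the first component already accounts for almost all of the variance, keeping a single component, as done above, loses very little.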

In [15]:
pca.explained_variance_ratio_

array([0.12033916, 0.09561054, 0.08444415, 0.06498408, 0.04860155,
       0.0421412 , 0.03942083, 0.03389381, 0.02998221, 0.02932003,
       0.02781805, 0.02577055, 0.02275303, 0.0222718 , 0.02165229,
       0.01914167, 0.01775547, 0.01638069, 0.0159646 , 0.01489191,
       0.0134797 , 0.01271931, 0.01165837, 0.01057647, 0.00975316,
       0.00944559, 0.00863014, 0.00836643, 0.00797693, 0.00746471,
       0.00725582, 0.00691911, 0.00653909, 0.00640793, 0.00591384,
       0.00571162, 0.00523637, 0.00481808, 0.00453719, 0.00423163,
       0.00406053, 0.00397085, 0.00356493, 0.00340787, 0.00327835,
       0.00311032, 0.00288575, 0.00276489, 0.00259175, 0.00234483,
       0.00218257, 0.00203598, 0.00195512, 0.00183318])
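The ratios above sum to the share of variance the PCA retained. A possible way to make the "small loss of information" concrete is to map the reduced features back to the original space with `inverse_transform` and measure what was lost. This is a sketch under the same preprocessing as the earlier cells; `retained`, `reconstructed`, and `relative_error` are names chosen here.

```python
import numpy as np
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

features = StandardScaler().fit_transform(datasets.load_digits().data)

pca = PCA(n_components=0.99)
features_pca = pca.fit_transform(features)

# Share of the total variance kept by the retained components
retained = pca.explained_variance_ratio_.sum()

# Project back to the original 64 dimensions and measure the squared error
reconstructed = pca.inverse_transform(features_pca)
centered = features - features.mean(axis=0)
relative_error = np.sum((features - reconstructed) ** 2) / np.sum(centered ** 2)

print("Retained variance share:", retained)
print("Relative reconstruction error:", relative_error)
```

The two printed values should add up to approximately 1: the variance that is not retained is the squared reconstruction error, relative to the total.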