### PCA and SVM

Previous example: [/examples/shallow/random_forests.ipynb](https://github.com/serhatsoyer/py4ML/blob/main/examples/shallow/random_forests.ipynb)  
Next example: [/examples/shallow/grid_search_and_knn.ipynb](https://github.com/serhatsoyer/py4ML/blob/main/examples/shallow/grid_search_and_knn.ipynb)

In [1]:
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.metrics import classification_report, confusion_matrix

In [2]:
num_of_samples = 2 ** 11
num_of_real_features = 3
num_of_thrash_features = 5
num_of_classes = 4
common_std = 3

In [3]:
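# No random_state / seed is set, so the data and every number printed below change from run to run
data_real, gt = make_blobs(n_samples=num_of_samples, n_features=num_of_real_features, centers=num_of_classes, cluster_std=common_std)  # informative features and labels
data_thrash = np.random.normal(0, common_std, (num_of_samples, num_of_thrash_features))  # label-independent noise features
data_aug = np.hstack((data_real, data_thrash))  # real features first, noise features last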

def print_general_info(arr, msg): print(f'{msg:15}: {type(arr)}, {arr.shape}, {arr.dtype}')  # type, shape, dtype
def print_stats(data, msg): print(f'{msg:15}: Mean: {np.mean(data, axis=0)}, STD: {np.std(data, axis=0)}')  # per-feature (column-wise) statistics

print_general_info(data_real, 'data_real')
print_general_info(data_thrash, 'data_thrash')
print_general_info(data_aug, 'data_aug')
print_general_info(gt, 'gt')

data_real      : <class 'numpy.ndarray'>, (2048, 3), float64
data_thrash    : <class 'numpy.ndarray'>, (2048, 5), float64
data_aug       : <class 'numpy.ndarray'>, (2048, 8), float64
gt             : <class 'numpy.ndarray'>, (2048,), int64
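`make_blobs` and `np.random.normal` are called without a seed in In [3], so the blobs, the split statistics and every score below differ between runs. A minimal sketch of a reproducible generator, assuming a hypothetical helper `make_data` and an arbitrary seed (neither is part of the original notebook), passes `random_state` to `make_blobs` and draws the noise from `np.random.default_rng`:

```python
import numpy as np
from sklearn.datasets import make_blobs

def make_data(seed, n_samples=2 ** 11, n_real=3, n_trash=5, n_classes=4, std=3):
    # Hypothetical helper: informative features are Gaussian blobs around n_classes centres
    data_real, gt = make_blobs(n_samples=n_samples, n_features=n_real,
                               centers=n_classes, cluster_std=std, random_state=seed)
    # Uninformative features: zero-mean noise with the same spread, independent of gt
    rng = np.random.default_rng(seed)
    data_trash = rng.normal(0, std, (n_samples, n_trash))
    return np.hstack((data_real, data_trash)), gt

a, gt_a = make_data(seed=0)
b, gt_b = make_data(seed=0)   # same seed, so identical arrays
print(a.shape, np.array_equal(a, b), np.array_equal(gt_a, gt_b))
```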


In [4]:
print('Real:')
X_train_real, X_test_real, y_train_real, y_test_real = train_test_split(data_real, gt, test_size=.85, random_state=1)
print_general_info(X_train_real, 'X_train_real')
print_general_info(X_test_real, 'X_test_real')
print_general_info(y_train_real, 'y_train_real')
print_general_info(y_test_real, 'y_test_real')
print_stats(X_train_real, 'X_train_real')

print('\nThrash:')
X_train_thrash, X_test_thrash, y_train_thrash, y_test_thrash = train_test_split(data_thrash, gt, test_size=.85, random_state=1)
print_general_info(X_train_thrash, 'X_train_thrash')
print_general_info(X_test_thrash, 'X_test_thrash')
print_general_info(y_train_thrash, 'y_train_thrash')
print_general_info(y_test_thrash, 'y_test_thrash')
print_stats(X_train_thrash, 'X_train_thrash')

print('\nAugmented:')
X_train_aug, X_test_aug, y_train_aug, y_test_aug = train_test_split(data_aug, gt, test_size=.85, random_state=1)
print_general_info(X_train_aug, 'X_train_aug')
print_general_info(X_test_aug, 'X_test_aug')
print_general_info(y_train_aug, 'y_train_aug')
print_general_info(y_test_aug, 'y_test_aug')
print_stats(X_train_aug, 'X_train_aug')

Real:
X_train_real   : <class 'numpy.ndarray'>, (307, 3), float64
X_test_real    : <class 'numpy.ndarray'>, (1741, 3), float64
y_train_real   : <class 'numpy.ndarray'>, (307,), int64
y_test_real    : <class 'numpy.ndarray'>, (1741,), int64
X_train_real   : Mean: [ 3.40590754 -4.17690549 -0.1857369 ], STD: [7.61220695 5.13348313 6.29651449]

Thrash:
X_train_thrash : <class 'numpy.ndarray'>, (307, 5), float64
X_test_thrash  : <class 'numpy.ndarray'>, (1741, 5), float64
y_train_thrash : <class 'numpy.ndarray'>, (307,), int64
y_test_thrash  : <class 'numpy.ndarray'>, (1741,), int64
X_train_thrash : Mean: [0.18822775 0.2655373  0.14434949 0.21399522 0.06651125], STD: [2.92809975 3.1019975  2.97671555 3.05662955 3.0982358 ]

Augmented:
X_train_aug    : <class 'numpy.ndarray'>, (307, 8), float64
X_test_aug     : <class 'numpy.ndarray'>, (1741, 8), float64
y_train_aug    : <class 'numpy.ndarray'>, (307,), int64
y_test_aug     : <class 'numpy.ndarray'>, (1741,), int64
X_train_aug    : Mean: [ 3
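All three `train_test_split` calls use the same `random_state=1` on arrays with the same 2048 rows, so they select the same rows: `X_train_aug` is exactly `X_train_real` and `X_train_thrash` side by side, and `y_train_real`, `y_train_thrash` and `y_train_aug` are identical. Consequently the Augmented mean and STD consist of the Real values followed by the Thrash values. The sketch below checks this alignment on random stand-in arrays (the data and the short variable names are illustrative, not the notebook's):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Illustrative stand-ins with the notebook's shapes
rng = np.random.default_rng(0)
data_real = rng.normal(size=(2048, 3))
data_trash = rng.normal(size=(2048, 5))
gt = rng.integers(0, 4, size=2048)
data_aug = np.hstack((data_real, data_trash))

# Same number of rows + same random_state => same shuffled row indices in every call
Xr_tr, Xr_te, yr_tr, yr_te = train_test_split(data_real, gt, test_size=.85, random_state=1)
Xt_tr, Xt_te, yt_tr, yt_te = train_test_split(data_trash, gt, test_size=.85, random_state=1)
Xa_tr, Xa_te, ya_tr, ya_te = train_test_split(data_aug, gt, test_size=.85, random_state=1)

aligned = (np.array_equal(Xa_tr, np.hstack((Xr_tr, Xt_tr)))
           and np.array_equal(ya_tr, yr_tr) and np.array_equal(ya_tr, yt_tr))
print(Xa_tr.shape, Xa_te.shape, aligned)
```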

In [5]:
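# Each PCA is fitted on the training split only; the same projection is then applied to the test split
pca_to4 = PCA(n_components=4)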
pca_to3 = PCA(n_components=3)
pca_to2 = PCA(n_components=2)
pca_to1 = PCA(n_components=1)

pca_to4.fit(X_train_aug)
pca_to3.fit(X_train_aug)
pca_to2.fit(X_train_aug)
pca_to1.fit(X_train_aug)

X_train_reduced_to4 = pca_to4.transform(X_train_aug)
X_test_reduced_to4 = pca_to4.transform(X_test_aug)
X_train_reduced_to3 = pca_to3.transform(X_train_aug)
X_test_reduced_to3 = pca_to3.transform(X_test_aug)
X_train_reduced_to2 = pca_to2.transform(X_train_aug)
X_test_reduced_to2 = pca_to2.transform(X_test_aug)
X_train_reduced_to1 = pca_to1.transform(X_train_aug)
X_test_reduced_to1 = pca_to1.transform(X_test_aug)
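The four PCA objects are fitted on `X_train_aug` alone, and the test split is only transformed, so no test information leaks into the projection. Because principal components are ordered by explained variance and nested, a 2-component projection is the first two columns of a 4-component one (up to sign), and `explained_variance_ratio_` shows how much variance each kept component carries. A sketch on synthetic data shaped like the training split; `svd_solver='full'` and the seeds are assumptions added so the comparison is exact and repeatable:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA

# Assumed stand-in: 307 samples, 3 blob features + 5 noise features
X_real, _ = make_blobs(n_samples=307, n_features=3, centers=4, cluster_std=3, random_state=0)
rng = np.random.default_rng(0)
X = np.hstack((X_real, rng.normal(0, 3, (307, 5))))

# svd_solver='full' computes an exact SVD, so a smaller fit keeps the leading components of a larger one
pca_to4 = PCA(n_components=4, svd_solver='full').fit(X)
pca_to2 = PCA(n_components=2, svd_solver='full').fit(X)

Z4 = pca_to4.transform(X)
Z2 = pca_to2.transform(X)
# The 2-component projection matches the first two columns of the 4-component one (up to sign)
print(np.allclose(np.abs(Z2), np.abs(Z4[:, :2])))
# Fraction of total variance captured by each kept component, in decreasing order
print(pca_to4.explained_variance_ratio_)
```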

In [6]:
def disp_results(func, gt, pred, msg): print(f'{func.__name__} of {msg}:\n{func(gt, pred)}')

def analyse(X_train, X_test, y_train, y_test):
    # Fit an SVC with default hyperparameters (RBF kernel) and report metrics on both splits
    model = SVC()
    model.fit(X_train, y_train)
    y_train_pred = model.predict(X_train)
    y_test_pred = model.predict(X_test)
    disp_results(classification_report, y_train, y_train_pred, 'training data')
    disp_results(confusion_matrix, y_train, y_train_pred, 'training data')
    print()
    disp_results(classification_report, y_test, y_test_pred, 'test data')
    disp_results(confusion_matrix, y_test, y_test_pred, 'test data')
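`analyse` expects PCA to have been applied beforehand. As an alternative sketch (not how this notebook is organised), scikit-learn's `make_pipeline` can chain `PCA` and `SVC`, so that fitting the pipeline fits PCA on the training split only and prediction reuses that projection. The seeds and the choice of 2 components are illustrative assumptions:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Assumed stand-in for the notebook's augmented data, seeded for repeatability
X_real, y = make_blobs(n_samples=2048, n_features=3, centers=4, cluster_std=3, random_state=0)
rng = np.random.default_rng(0)
X = np.hstack((X_real, rng.normal(0, 3, (2048, 5))))
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.85, random_state=1)

# fit(): PCA is fitted on X_train, then SVC is fitted on the projected X_train
# predict(): the already-fitted PCA projects X_test before SVC.predict
model = make_pipeline(PCA(n_components=2), SVC())
model.fit(X_train, y_train)
test_acc = accuracy_score(y_test, model.predict(X_test))
print(f'test accuracy with 2 components: {test_acc:.2f}')
```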

In [7]:
print('Real:'); analyse(X_train_real, X_test_real, y_train_real, y_test_real)

Real:
classification_report of training data:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        73
           1       0.66      0.73      0.69        84
           2       0.72      0.68      0.70        78
           3       0.91      0.86      0.89        72

    accuracy                           0.81       307
   macro avg       0.82      0.82      0.82       307
weighted avg       0.82      0.81      0.81       307

confusion_matrix of training data:
[[73  0  0  0]
 [ 0 61 19  4]
 [ 0 23 53  2]
 [ 0  8  2 62]]

classification_report of test data:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00       439
           1       0.75      0.72      0.73       428
           2       0.76      0.83      0.79       434
           3       0.90      0.87      0.89       440

    accuracy                           0.85      1741
   macro avg       0.85      0.85      0.85      1741
weight

In [8]:
print('Thrash:'); analyse(X_train_thrash, X_test_thrash, y_train_thrash, y_test_thrash)

Thrash:
classification_report of training data:
              precision    recall  f1-score   support

           0       0.49      0.27      0.35        73
           1       0.45      0.68      0.54        84
           2       0.55      0.59      0.57        78
           3       0.53      0.42      0.47        72

    accuracy                           0.50       307
   macro avg       0.51      0.49      0.48       307
weighted avg       0.50      0.50      0.49       307

confusion_matrix of training data:
[[20 27 14 12]
 [ 6 57 13  8]
 [ 5 20 46  7]
 [10 22 10 30]]

classification_report of test data:
              precision    recall  f1-score   support

           0       0.20      0.11      0.14       439
           1       0.23      0.38      0.29       428
           2       0.25      0.32      0.28       434
           3       0.28      0.16      0.20       440

    accuracy                           0.24      1741
   macro avg       0.24      0.24      0.23      1741
weig
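On the trash features alone the SVC reaches 0.50 training accuracy but only 0.24 test accuracy, which is about what random guessing gives for four roughly balanced classes (1/4): the model fits noise in the 307 training samples and none of it generalizes. A chance-level reference can be made explicit with scikit-learn's `DummyClassifier`; this sketch uses random labels and noise with the notebook's split sizes as assumed stand-ins:

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score

# Assumed stand-ins: four roughly balanced classes and pure-noise features
rng = np.random.default_rng(0)
y_train = rng.integers(0, 4, size=307)
y_test = rng.integers(0, 4, size=1741)
X_train = rng.normal(0, 3, (307, 5))
X_test = rng.normal(0, 3, (1741, 5))

# 'stratified' predicts random labels following the training class frequencies
baseline = DummyClassifier(strategy='stratified', random_state=0).fit(X_train, y_train)
chance_acc = accuracy_score(y_test, baseline.predict(X_test))
print(f'chance-level test accuracy: {chance_acc:.2f}')
```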

In [9]:
print('Augmented:'); analyse(X_train_aug, X_test_aug, y_train_aug, y_test_aug)

Augmented:
classification_report of training data:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        73
           1       0.75      0.79      0.77        84
           2       0.77      0.74      0.76        78
           3       0.92      0.90      0.91        72

    accuracy                           0.85       307
   macro avg       0.86      0.86      0.86       307
weighted avg       0.85      0.85      0.85       307

confusion_matrix of training data:
[[73  0  0  0]
 [ 0 66 14  4]
 [ 0 18 58  2]
 [ 0  4  3 65]]

classification_report of test data:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00       439
           1       0.75      0.73      0.74       428
           2       0.77      0.82      0.79       434
           3       0.91      0.85      0.88       440

    accuracy                           0.85      1741
   macro avg       0.85      0.85      0.85      1741
w

In [10]:
print('Reduced to 4:'); analyse(X_train_reduced_to4, X_test_reduced_to4, y_train_aug, y_test_aug)

Reduced to 4:
classification_report of training data:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        73
           1       0.68      0.75      0.71        84
           2       0.74      0.69      0.72        78
           3       0.93      0.88      0.90        72

    accuracy                           0.82       307
   macro avg       0.84      0.83      0.83       307
weighted avg       0.83      0.82      0.83       307

confusion_matrix of training data:
[[73  0  0  0]
 [ 0 63 17  4]
 [ 0 23 54  1]
 [ 0  7  2 63]]

classification_report of test data:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00       439
           1       0.72      0.71      0.71       428
           2       0.75      0.80      0.78       434
           3       0.91      0.87      0.89       440

    accuracy                           0.85      1741
   macro avg       0.85      0.85      0.85      1741

In [11]:
print('Reduced to 3:'); analyse(X_train_reduced_to3, X_test_reduced_to3, y_train_aug, y_test_aug)

Reduced to 3:
classification_report of training data:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        73
           1       0.65      0.75      0.70        84
           2       0.72      0.64      0.68        78
           3       0.91      0.86      0.89        72

    accuracy                           0.81       307
   macro avg       0.82      0.81      0.82       307
weighted avg       0.81      0.81      0.81       307

confusion_matrix of training data:
[[73  0  0  0]
 [ 0 63 17  4]
 [ 0 26 50  2]
 [ 0  8  2 62]]

classification_report of test data:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00       439
           1       0.72      0.70      0.71       428
           2       0.75      0.80      0.77       434
           3       0.90      0.86      0.88       440

    accuracy                           0.84      1741
   macro avg       0.84      0.84      0.84      1741

In [12]:
print('Reduced to 2:'); analyse(X_train_reduced_to2, X_test_reduced_to2, y_train_aug, y_test_aug)

Reduced to 2:
classification_report of training data:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        73
           1       0.66      0.68      0.67        84
           2       0.70      0.69      0.70        78
           3       0.90      0.88      0.89        72

    accuracy                           0.80       307
   macro avg       0.81      0.81      0.81       307
weighted avg       0.81      0.80      0.81       307

confusion_matrix of training data:
[[73  0  0  0]
 [ 0 57 21  6]
 [ 0 23 54  1]
 [ 0  7  2 63]]

classification_report of test data:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00       439
           1       0.70      0.72      0.71       428
           2       0.74      0.79      0.77       434
           3       0.91      0.83      0.87       440

    accuracy                           0.84      1741
   macro avg       0.84      0.83      0.84      1741

In [13]:
print('Reduced to 1:'); analyse(X_train_reduced_to1, X_test_reduced_to1, y_train_aug, y_test_aug)

Reduced to 1:
classification_report of training data:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        73
           1       0.66      0.75      0.70        84
           2       0.50      0.36      0.42        78
           3       0.58      0.67      0.62        72

    accuracy                           0.69       307
   macro avg       0.69      0.69      0.69       307
weighted avg       0.68      0.69      0.68       307

confusion_matrix of training data:
[[73  0  0  0]
 [ 0 63  9 12]
 [ 0 27 28 23]
 [ 0  5 19 48]]

classification_report of test data:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00       439
           1       0.63      0.65      0.64       428
           2       0.37      0.33      0.35       434
           3       0.56      0.59      0.57       440

    accuracy                           0.65      1741
   macro avg       0.64      0.64      0.64      1741
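Across the runs above, test accuracy is 0.85 for the real features, for all 8 augmented features and for 4 components, 0.84 for 3 and for 2 components, and 0.65 for a single component, where classes 2 and 3 become hard to separate. A compact sketch of the same sweep loops over `n_components` instead of writing one cell per value; the data generation and seeds are assumptions, so the exact numbers will differ from the tables above:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.decomposition import PCA
from sklearn.svm import SVC

# Assumed stand-in for the notebook's data: 3 blob features + 5 noise features, seeded
X_real, y = make_blobs(n_samples=2048, n_features=3, centers=4, cluster_std=3, random_state=0)
rng = np.random.default_rng(0)
X = np.hstack((X_real, rng.normal(0, 3, (2048, 5))))
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.85, random_state=1)

test_acc = {}
for k in range(1, X.shape[1] + 1):
    pca = PCA(n_components=k).fit(X_train)                     # fitted on the training split only
    model = SVC().fit(pca.transform(X_train), y_train)
    test_acc[k] = model.score(pca.transform(X_test), y_test)   # mean accuracy on the test split
    print(f'{k} components: test accuracy {test_acc[k]:.2f}')
```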
