# Support Vector Machine
Support vector machines are supervised machine learning methods that can be used for classification and regression. Currently cuML supports binary classification.

The SVC classifier accepts array-like input: NumPy arrays in host memory, device arrays (Numba arrays or any `__cuda_array_interface__`-compliant object), and cuDF DataFrames/Series.

For information on converting your dataset to cuDF format, see the cuDF documentation: https://docs.rapids.ai/api/cudf/stable/

For more information about cuML's Support Vector Classifier: https://docs.rapids.ai/api/cuml/stable/

In [None]:
import numpy as np
import cuml.svm 
import sklearn.svm

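# make_gaussian_quantiles is exposed by sklearn.datasets; the samples_generator submodule is not public in recent scikit-learn releases
from sklearn.datasets import make_gaussian_quantiles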
from sklearn.model_selection import train_test_split

## Generate data

In [None]:
n_samples = 20000
n_features = 200

X, y = make_gaussian_quantiles(n_samples=n_samples, n_features=n_features, n_classes=2)

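# Alternative dataset (requires `from sklearn.datasets import make_classification`):
# X, y = make_classification(
#    n_samples=n_samples, n_features=n_features, n_informative=2, n_redundant=0,
#    n_classes=2, n_clusters_per_class=2, shuffle=False)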

X_train, X_test, y_train, y_test = train_test_split(X, y)

## Define parameters

In [None]:
C = 1
tol = 1e-3
kernel = 'rbf'
gamma = 'scale'
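
To make the `gamma = 'scale'` setting concrete: scikit-learn documents this option as `1 / (n_features * X.var())`, and this notebook assumes cuML resolves it the same way. The sketch below computes that value by hand on a tiny array. `scale_gamma` is a hypothetical helper name used only for illustration, not part of either library.

```python
import numpy as np

def scale_gamma(X):
    """Value that gamma='scale' resolves to, per the scikit-learn docs.

    Hypothetical helper for illustration; assumes X is a 2-D array.
    """
    X = np.asarray(X, dtype=np.float64)
    n_features = X.shape[1]
    # X.var() is the variance over all entries of X, not a per-feature value
    return 1.0 / (n_features * X.var())

# Tiny example: two samples, two features
X_demo = np.array([[0.0, 2.0],
                   [2.0, 0.0]])
gamma_demo = scale_gamma(X_demo)
print(gamma_demo)  # 0.5
```

Because the value depends on the variance of the training data, rescaling the features changes the effective kernel width.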

## cuML Model

In [None]:
%%time
cumlSVC = cuml.svm.SVC(kernel=kernel, C=C, tol=tol, gamma=gamma)
cumlSVC.fit(X_train, y_train)

## Scikit-learn Model

In [None]:
%%time
sklSVC = sklearn.svm.SVC(kernel=kernel, C=C, tol=tol, gamma=gamma)
sklSVC.fit(X_train, y_train)

## Prediction

In [None]:
%%time
cuml_pred = cumlSVC.predict(X_test)

In [None]:
%%time
skl_pred = sklSVC.predict(X_test)

## Compare Accuracy

In [None]:
cuml_accuracy = np.sum(cuml_pred.to_array()==y_test) / y_test.shape[0] * 100
skl_accuracy = np.sum(skl_pred==y_test) / y_test.shape[0] * 100
print("Accuracy: cumlSVC {}%, sklSVC {}%".format(cuml_accuracy, skl_accuracy))
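
The container returned by `predict` can differ between cuML releases and input types, so the accuracy computation above may need adapting. As a minimal sketch, assuming the prediction is a NumPy array, a cuDF Series, a CuPy array, or a Numba device array, the helper below converts it to a host NumPy array before comparing. `to_host_array` and `accuracy_percent` are hypothetical names, not cuML functions.

```python
import numpy as np

def to_host_array(pred):
    """Return `pred` as a NumPy array, trying common conversion methods.

    Hypothetical helper: tries cuDF (`to_numpy`, or `to_array` on older
    releases), Numba (`copy_to_host`) and CuPy (`get`) in that order.
    """
    if isinstance(pred, np.ndarray):
        return pred
    for name in ("to_numpy", "to_array", "copy_to_host", "get"):
        method = getattr(pred, name, None)
        if callable(method):
            return np.asarray(method())
    # Plain lists and other host-side sequences
    return np.asarray(pred)

def accuracy_percent(pred, y_true):
    """Percentage of predictions that match the true labels."""
    matches = to_host_array(pred) == np.asarray(y_true)
    return float(np.mean(matches) * 100)

# Stand-in data so the sketch runs without a GPU
demo_pred = np.array([0, 1, 1, 0])
demo_true = np.array([0, 1, 0, 0])
print(accuracy_percent(demo_pred, demo_true))  # 75.0
```

In the notebook this would be called as `accuracy_percent(cuml_pred, y_test)` and `accuracy_percent(skl_pred, y_test)`.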

## Notes
- The time measurements will be inaccurate for the first run. You can re-run the cells to get a better estimate of the execution time.

- Currently the output of the prediction is a cuDF Series object. You can use its `to_array()` method to create a NumPy array (in newer cuDF releases the equivalent method is `to_numpy()`).

- The training algorithm uses a cache in GPU memory to accelerate training. You can specify its size (in MiB) with the `cache_size` argument. This matters most when training on larger inputs.

- Like other cuML algorithms, cuML SVC is optimized for both single and double precision input data. If your problem allows it, using single precision input can improve the execution time.
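
One effect of switching to single precision is easy to verify: the input takes half the memory. The sketch below uses a zero-filled stand-in array with the same shape as the dataset generated earlier (an assumption for illustration, not the notebook's actual data). Any speed-up depends on the GPU and is not measured here.

```python
import numpy as np

# Stand-in with the same shape as the notebook's dataset (20000 x 200)
X_double = np.zeros((20000, 200), dtype=np.float64)
X_single = X_double.astype(np.float32)  # makes a converted copy

mib = 1024 ** 2
double_mib = X_double.nbytes / mib
single_mib = X_single.nbytes / mib
print(f"float64: {double_mib:.1f} MiB, float32: {single_mib:.1f} MiB")
# float64: 30.5 MiB, float32: 15.3 MiB
```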

In [None]:
%%time
# Larger cache (2000 MiB) and single precision input, as described in the notes
cumlSVC = cuml.svm.SVC(kernel=kernel, C=C, tol=tol, gamma=gamma, cache_size=2000)
cumlSVC.fit(X_train.astype(np.float32), y_train)
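
The first note says the first run gives inaccurate timings. Rather than re-running cells by hand, one option is to discard a warm-up call and time the calls that follow. This is a minimal sketch using only the standard library. `time_after_warmup` is a hypothetical helper, and the workload is a stand-in so the block runs without a GPU.

```python
import time

def time_after_warmup(fn, repeats=3):
    """Call fn once untimed, then return the best of `repeats` timed calls.

    Hypothetical helper, not part of cuML or scikit-learn. Returns seconds.
    """
    fn()  # warm-up call; its duration is discarded
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)

# Stand-in workload; in the notebook, pass for example
# lambda: cumlSVC.fit(X_train, y_train)
elapsed = time_after_warmup(lambda: sum(range(100_000)))
print(f"best of 3 after warm-up: {elapsed:.6f} s")
```

Taking the minimum of several runs reduces the influence of background activity. Use the mean instead if typical rather than best-case time is what you want to report.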