## svm.SVR from SAS® Viya® on Bonus
### Data Preparation
#### About the data set
This data set contains the bonuses for 10 job positions at a hypothetical company.

In [None]:
import os
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import warnings

# Filter out UserWarning messages
warnings.filterwarnings("ignore", category=UserWarning)
# Filter out DeprecationWarning messages
warnings.filterwarnings("ignore", category=DeprecationWarning)

#### Importing the data set

In [None]:
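# Build the path to the data directory relative to the notebook location
workspace = os.path.join(os.path.abspath(""), "..", "data")
dataset = pd.read_csv(os.path.join(workspace, "bonus.csv"))

# Slice with ranges (1:2, 2:3) so that X and y stay two-dimensional
X = dataset.iloc[:, 1:2].values.astype(float)
y = dataset.iloc[:, 2:3].values.astype(float)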

In [None]:
dataset

### Building the Support Vector Regression Model

For details about using the `SVR` class of the `sasviya` package, see the [SVR documentation](https://documentation.sas.com/?cdcId=workbenchcdc&cdcVersion=default&docsetId=explore&docsetTarget=p14qlscxhb7i70n196xmpynf7lay.htm).

The `kernel` parameter accepts several options. We select `poly` because the relationship between the feature and the bonus is non-linear.

In [None]:
from sasviya.ml.svm import SVR
regressor = SVR(kernel='poly')
regressor.fit(X, y.ravel())

In [None]:
regressor.get_params()
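
To illustrate why a polynomial kernel can capture a non-linear relationship, here is a minimal sketch of the textbook polynomial kernel, `(x . z + coef0) ** degree`. The function names, the degree of 2, and the `coef0` value of 1 are assumptions chosen for illustration; the exact parameterization and defaults used by the `sasviya` `SVR` class may differ, so check `regressor.get_params()` and the documentation.

```python
import numpy as np

def poly_kernel(x, z, degree=2, coef0=1.0):
    """Textbook polynomial kernel (illustrative, not the sasviya implementation)."""
    return (np.dot(x, z) + coef0) ** degree

def explicit_features(x):
    """Explicit degree-2 feature map for a scalar x, valid for coef0 = 1."""
    return np.array([1.0, np.sqrt(2.0) * x, x ** 2])

x, z = 2.0, 3.0

# Kernel value computed directly from the inputs
k_implicit = poly_kernel(np.array([x]), np.array([z]))

# Same value computed as a dot product in the expanded feature space
k_explicit = np.dot(explicit_features(x), explicit_features(z))

# Both values equal 49 up to floating-point rounding
print(k_implicit, k_explicit)
```

The kernel gives the model access to the squared term without ever building the expanded features, which is what lets a linear method fit a curved relationship.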

### Evaluating the Model
To get a sense of how the model behaves, we predict the expected bonus for a hypothetical job that falls between Job 2 and Job 3 by using an input value of 2.5.

In [None]:
y_pred = regressor.predict(np.array([2.5]).reshape(1, -1))

In [None]:
y_pred

The predicted value is about 3059. The model seems to overpredict bonuses for the lower job classifications.
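
The `reshape(1, -1)` call in the prediction above turns a single value into a two-dimensional array with one row per sample and one column per feature. The sketch below shows the resulting shapes with plain NumPy. It assumes that `SVR.predict` follows the scikit-learn convention of `(n_samples, n_features)` input, as the notebook's own calls suggest; the variable names are illustrative.

```python
import numpy as np

value = 2.5

# One sample with one feature: shape (1, 1)
single_sample = np.array([value]).reshape(1, -1)

# Equivalent literal form of the same 2D array
also_2d = np.array([[value]])

# Several samples of one feature each: shape (3, 1)
several = np.array([2.5, 4.5, 7.0]).reshape(-1, 1)

print(single_sample.shape, also_2d.shape, several.shape)
```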

In [None]:
plt.scatter(X, y, color = 'magenta')
plt.plot(X, regressor.predict(X), color = 'green')
plt.title('Predicted Bonus vs Observed (SVR)')
plt.xlabel('Job Code')
plt.ylabel('Bonus')
plt.show()

### Updating the Model

#### Processing the data
By default, `SVR` uses min-max scaling to scale the features to the [0, 1] range. Let's instead use the scikit-learn `StandardScaler` class to standardize the features and the target (zero mean, unit variance) before modeling.

In [None]:
from sklearn.preprocessing import StandardScaler

sc_X = StandardScaler()
sc_y = StandardScaler()
X = sc_X.fit_transform(X)
y = sc_y.fit_transform(y)
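
To make the difference between the two scalings concrete, here is a minimal NumPy sketch of both formulas. The helper names and the job levels 1 through 10 are assumptions for illustration, not the contents of `bonus.csv`. The `standardize` helper uses the population standard deviation, which is what `StandardScaler` computes.

```python
import numpy as np

def min_max_scale(a):
    """Rescale each column to the [0, 1] range."""
    return (a - a.min(axis=0)) / (a.max(axis=0) - a.min(axis=0))

def standardize(a):
    """Rescale each column to zero mean and unit variance (population std)."""
    return (a - a.mean(axis=0)) / a.std(axis=0)

# Hypothetical job levels 1..10 as a single-feature column
levels = np.arange(1, 11, dtype=float).reshape(-1, 1)

mm = min_max_scale(levels)
st = standardize(levels)

print(mm.min(), mm.max())    # bounds of the min-max scaled column
print(st.mean(), st.std())   # mean and std of the standardized column
```

Min-max scaling fixes the endpoints of the range, while standardization centers the data and expresses each value in units of standard deviations.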

#### Creating the model with transformed data

In [None]:
regressor = SVR(kernel='poly', scale = False)
regressor.fit(X, y.ravel())

In [None]:
plt.scatter(X, y, color = 'magenta')
plt.plot(X, regressor.predict(X), color = 'green')
plt.title('Predicted Bonus vs Observed (SVR, standardized data)')
plt.xlabel('Job Code (standardized)')
plt.ylabel('Bonus (standardized)')
plt.show()

In [None]:
# Predict a new result for the input value 2.5
input_value = np.array([[2.5]])  # 2D array with shape (1, 1)
input_value_scaled = sc_X.transform(input_value)
y_pred_scaled = regressor.predict(input_value_scaled)

# Convert the prediction to a NumPy array and reshape it to a single column
y_pred_scaled_array = np.asarray(y_pred_scaled).reshape(-1, 1)

# Inverse transform the scaled prediction to the original scale
y_pred = sc_y.inverse_transform(y_pred_scaled_array)

In [None]:
y_pred

After the input is scaled with `.transform()` and the scaled prediction is converted back to the original range with `.inverse_transform()`, the new prediction is about 2744. This value falls between the bonuses of Job 2 and Job 3, so the updated model is likely a better fit for the original data.
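
The round trip between the scaled and original ranges can be sketched with plain NumPy. The target values below are hypothetical, not the notebook's bonus data, and the arithmetic mirrors what `StandardScaler.transform` and `StandardScaler.inverse_transform` do for a single column.

```python
import numpy as np

# Hypothetical target values (not taken from bonus.csv)
y_demo = np.array([1000.0, 2000.0, 4000.0, 8000.0]).reshape(-1, 1)

# Statistics learned during "fit"
mean = y_demo.mean(axis=0)
std = y_demo.std(axis=0)

# transform: original scale -> standardized scale
y_scaled = (y_demo - mean) / std

# inverse_transform: standardized scale -> original scale
y_restored = y_scaled * std + mean

# A prediction of 0.0 in standardized units maps back to the mean
pred_scaled = np.array([[0.0]])
pred_original = pred_scaled * std + mean

print(pred_original)
```

Because the model was trained on standardized targets, its raw predictions are in standardized units, and only the inverse transform makes them comparable to the observed bonuses.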