### Regression Metrics

We cannot calculate **the accuracy for a regression model.**

The skill or performance of a regression model must instead be reported as an error in its predictions.

This makes sense if you think about it. If you are predicting a numeric value like a height or a dollar amount, you don’t need to know whether the model predicted the value exactly (that may be intractably difficult in practice); instead, you want to know how close the predictions were to the expected values.

1) **Mean Absolute Error (MAE)**

MAE is the mean of the **absolute** differences between the actual and predicted values.

To see how it works, suppose you have input data and output data and fit a Linear Regression model, which draws a best-fit line through the data.

For each observation, the error is the difference between the actual value and the value the model predicted; taking its absolute value gives the absolute error. MAE summarizes these absolute errors over the complete dataset.

So, sum all the absolute errors and divide by the total number of observations; the result is the MAE. Because MAE is a loss, we aim to make it as small as possible.


![Mean Absolute Error](./img/mae.png)

```
from sklearn.metrics import mean_absolute_error
print("MAE",mean_absolute_error(y_test,y_pred))
```
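As a small sanity check of the recipe above, the sketch below computes MAE by hand with NumPy and compares it to scikit-learn. The arrays `y_true` and `y_hat` are made-up values used only for illustration; they are not taken from any dataset in these notes.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

# Hypothetical actual and predicted values, chosen only for illustration
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_hat = np.array([2.5, 5.0, 4.0, 8.0])

# Absolute error of each observation, then the mean over all observations
abs_errors = np.abs(y_true - y_hat)
mae_manual = abs_errors.mean()

# The hand-computed value should agree with scikit-learn's implementation
mae_sklearn = mean_absolute_error(y_true, y_hat)
print("MAE (manual): ", mae_manual)
print("MAE (sklearn):", mae_sklearn)
```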

2) **Mean Squared Error (MSE)**

MSE is one of the most widely used metrics, and it is only a small variation on mean absolute error: it is the mean of the squared differences between the actual and predicted values.

So, where MAE uses the absolute difference, MSE uses the squared difference.

What does MSE actually represent? It represents the average squared distance between the actual and predicted values. Squaring prevents positive and negative errors from cancelling each other out, and it also penalizes large errors more heavily than small ones.

![Mean Squared Error](./img/mse.png)

```
from sklearn.metrics import mean_squared_error
print("MSE",mean_squared_error(y_test,y_pred))
```
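The same idea can be sketched by hand for MSE. The values below are hypothetical and exist only to show that the plain (signed) errors partly cancel while the squared errors do not.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Hypothetical actual and predicted values, chosen only for illustration
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_hat = np.array([2.5, 5.0, 4.0, 8.0])

errors = y_true - y_hat

# Signed errors can cancel each other, so their mean understates the mistakes
mean_signed_error = errors.mean()

# Squaring makes every term non-negative before averaging
mse_manual = np.mean(errors ** 2)
mse_sklearn = mean_squared_error(y_true, y_hat)

print("Mean signed error:", mean_signed_error)
print("MSE (manual):     ", mse_manual)
print("MSE (sklearn):    ", mse_sklearn)
```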

3) **Root Mean Squared Error (RMSE)**

As the name suggests, RMSE is simply the square root of the mean squared error, which brings the metric back to the same units as the target.

![Root Mean Squared Error](./img/rmse.png)

To implement RMSE, we use the NumPy square root function:

```
import numpy as np  # mean_squared_error is imported in the MSE example above
print("RMSE", np.sqrt(mean_squared_error(y_test, y_pred)))
```
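To see how RMSE differs from MAE in practice, here is a minimal sketch with two hypothetical sets of errors that share the same MAE. The helper functions `mae` and `rmse` are written here for illustration and are not part of any library.

```python
import numpy as np

def mae(errors):
    """Mean of the absolute errors."""
    return np.mean(np.abs(errors))

def rmse(errors):
    """Square root of the mean of the squared errors."""
    return np.sqrt(np.mean(np.square(errors)))

# Two hypothetical error patterns with the same total absolute error
even_errors = np.array([1.0, 1.0, 1.0, 1.0])    # every prediction is off by 1
one_big_error = np.array([0.0, 0.0, 0.0, 4.0])  # one prediction is off by 4

for name, errors in [("even", even_errors), ("one big", one_big_error)]:
    print(f"{name}: MAE = {mae(errors):.2f}, RMSE = {rmse(errors):.2f}")
```

Both patterns have the same MAE, but the RMSE is larger when the error is concentrated in a single bad prediction, because squaring weights large errors more.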

4) **R Squared (R2)**

The R2 score is a metric that tells you how well your model performed, rather than the loss in an absolute sense.

In contrast, MAE and MSE depend on the context (the units and scale of the target), as we have seen, whereas the R2 score is independent of context.

So, R squared gives us a baseline model to compare against, which none of the other metrics provides. This is similar to classification problems, where we compare against a threshold that is typically fixed at 0.5. In essence, R squared measures how much better the regression line is than a mean line.

Hence, R squared is also known as the Coefficient of Determination, or sometimes as Goodness of Fit.

![R Squared](./img/r2.png)

Now, how do you interpret the R2 score? Suppose the R2 score is zero. Then the squared error of the regression line divided by the squared error of the mean line equals 1, so 1 - 1 is zero. In this case the two lines effectively overlap, which means the model is no better than always predicting the mean; it is not able to take advantage of the input columns. (The score can even be negative when the model does worse than the mean line.)

The second case is when the R2 score is 1. This happens when the division term is zero, which requires the regression line to make no mistakes at all; it is perfect. In the real world, this is rarely achievable.

So we can conclude that as our regression line moves towards perfection, the R2 score moves towards one and the model performance improves.

The normal case is when the R2 score is between zero and one, like 0.8, which means your model is able to explain 80 per cent of the variance in the data.

```
from sklearn.metrics import r2_score
r2 = r2_score(y_test,y_pred)
print(r2)
```
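The comparison against the mean line can be sketched by hand. This assumes the standard definition R2 = 1 - SS_res / SS_tot, which matches the description above; the arrays are hypothetical values used only for illustration.

```python
import numpy as np
from sklearn.metrics import r2_score

# Hypothetical actual and predicted values, chosen only for illustration
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_hat = np.array([2.5, 5.0, 4.0, 8.0])

# Squared error of the model's predictions (the regression line)
ss_res = np.sum((y_true - y_hat) ** 2)
# Squared error of the baseline that always predicts the mean (the mean line)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)

r2_manual = 1 - ss_res / ss_tot
print("R2 (manual): ", r2_manual)
print("R2 (sklearn):", r2_score(y_true, y_hat))

# Predicting the mean for every observation scores zero by construction
mean_line = np.full_like(y_true, y_true.mean())
r2_baseline = r2_score(y_true, mean_line)
print("R2 of the mean line:", r2_baseline)
```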

5) **Adjusted R Squared**

The disadvantage of the R2 score is that when new features are added to the data, the score increases or stays constant but never decreases, because the calculation assumes that every added feature explains more of the variance in the data.

The problem is that when we add an irrelevant feature to the dataset, R2 can still increase, which is misleading.

Hence, to control this situation, Adjusted R Squared came into existence.

![Adjusted R Squared](./img/adjusted-r-squared.png)

Here n is the number of observations and K is the number of features. As K increases when features are added, the denominator (n - K - 1) decreases while n - 1 remains constant.

If the added feature is irrelevant, the R2 score remains constant or increases only slightly, so the complete fraction increases, and when we subtract it from one the resulting score decreases.

If we add a relevant feature, the R2 score increases and 1 - R2 decreases heavily. The denominator also decreases, but the complete term still decreases, so on subtracting from one the score increases.
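The two cases can be checked numerically. The sketch below assumes the standard formula, Adjusted R2 = 1 - (1 - R2)(n - 1) / (n - K - 1); the function name `adjusted_r2` and all the R2 values are hypothetical and chosen only to illustrate the behaviour.

```python
def adjusted_r2(r2, n_samples, n_features):
    """Adjusted R squared for a model with n_features fitted on n_samples rows."""
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_features - 1)

n = 50  # hypothetical number of observations

# Starting point: 5 features, R2 = 0.80
base = adjusted_r2(0.80, n, 5)

# Adding an irrelevant feature: R2 barely moves, so the adjusted score drops
irrelevant = adjusted_r2(0.801, n, 6)

# Adding a relevant feature: R2 rises enough that the adjusted score rises too
relevant = adjusted_r2(0.85, n, 6)

print(f"base:               {base:.4f}")
print(f"irrelevant feature: {irrelevant:.4f}")
print(f"relevant feature:   {relevant:.4f}")
```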

**There are also the RMSPE (Root Mean Square Percentage Error) and the MAPE (Mean Absolute Percentage Error).**
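Both percentage metrics work on relative errors rather than raw errors. The following is a minimal sketch with hypothetical values; note that scikit-learn's `mean_absolute_percentage_error` returns a fraction (0.075 means 7.5%), and that RMSPE is computed by hand here under the usual definition, since it is assumed not to be available as a ready-made scikit-learn function.

```python
import numpy as np
from sklearn.metrics import mean_absolute_percentage_error

# Hypothetical actual and predicted values, chosen only for illustration
y_true = np.array([100.0, 200.0, 50.0, 400.0])
y_hat = np.array([110.0, 180.0, 55.0, 400.0])

# Error of each prediction relative to the actual value
relative_errors = (y_true - y_hat) / y_true

# MAPE: mean of the absolute relative errors
mape_manual = np.mean(np.abs(relative_errors))
# RMSPE: square root of the mean of the squared relative errors
rmspe_manual = np.sqrt(np.mean(relative_errors ** 2))

print("MAPE (manual): ", mape_manual)
print("MAPE (sklearn):", mean_absolute_percentage_error(y_true, y_hat))
print("RMSPE (manual):", rmspe_manual)
```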


Source: Analytics Vidhya

### Support Vector Machines

“Support Vector Machine” (SVM) is a supervised machine learning algorithm that can be used for both classification and regression challenges.

However, it is mostly used in classification problems.

In the SVM algorithm, we plot each data item as a point in n-dimensional space (where n is the number of features you have), with the value of each feature being the value of a particular coordinate.

Then, we perform classification by finding the hyper-plane that best separates the two classes.

![Support Vector Machines](./img/SVM_1.png)

**Note: The objective of the support vector machine algorithm is to find a hyperplane in an N-dimensional space (where N is the number of features) that distinctly classifies the data points.**

Hyperplanes are decision boundaries that help classify the data points. Data points falling on either side of the hyperplane can be attributed to different classes. 

Also, the dimension of the hyperplane depends upon the number of features. If the number of input features is 2, then the hyperplane is just a line. If the number of input features is 3, then the hyperplane becomes a two-dimensional plane. It becomes difficult to imagine when the number of features exceeds 3.
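With two features the hyperplane is a line of the form w·x + b = 0, and the side a point falls on decides its class. The sketch below fits a linear `SVC` on a tiny made-up dataset and reads the line's coefficients back from the fitted model; the data points are hypothetical.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical 2-feature dataset: two well-separated groups of points
X = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 2.0],
              [4.0, 4.0], [5.0, 4.0], [4.0, 5.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear").fit(X, y)

# For a linear kernel the hyperplane is w . x + b = 0
w = clf.coef_[0]
b = clf.intercept_[0]
print("w =", w, "b =", b)

# The sign of w . x + b tells which side of the hyperplane a point is on
scores = X @ w + b
side = (scores > 0).astype(int)
print("side of hyperplane:", side)
print("model predictions: ", clf.predict(X))
```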

#### How does SVM Work?

**Identify the right hyper-plane (Scenario-1):** Here, we have three hyper-planes (A, B, and C). Now, identify the right hyper-plane to classify stars and circles.

![Scenario 1](./img/SVM_1_1.png)

You need to remember a rule of thumb to identify the right hyper-plane: “Select the hyper-plane which segregates the two classes better”. In this scenario, hyper-plane “B” does this job best.

**Identify the right hyper-plane (Scenario-2):** Here, we have three hyper-planes (A, B, and C) and all of them segregate the classes well. Now, how can we identify the right hyper-plane?

![Scenario 2](./img/SVM_3.png)

The margin is the distance between the hyper-plane and the nearest data point of either class. Above, you can see that the margin for hyper-plane C is larger than for both A and B.

Hence, we name the right hyper-plane as C. Another compelling reason for selecting the hyper-plane with the larger margin is robustness. If we select a hyper-plane with a small margin, there is a high chance of misclassification.
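For a linear SVM the margin can be read off the fitted model: with the usual scaling, the distance from the hyper-plane to the nearest point is 1 / ||w||. The sketch below checks this on a tiny made-up dataset; the points are hypothetical, and a large `C` is used as an assumption to approximate a hard margin.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical, linearly separable 2-feature dataset
X = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 2.0],
              [4.0, 4.0], [5.0, 4.0], [4.0, 5.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# A large C approximates a hard-margin SVM on separable data
clf = SVC(kernel="linear", C=1000.0).fit(X, y)
w = clf.coef_[0]
b = clf.intercept_[0]

# Margin: distance from the hyper-plane to the nearest data point
margin = 1.0 / np.linalg.norm(w)

# Cross-check by measuring every point's distance to the hyper-plane
distances = np.abs(X @ w + b) / np.linalg.norm(w)

print("margin from w:           ", margin)
print("smallest point distance: ", distances.min())
print("support vectors:\n", clf.support_vectors_)
```

The points that sit exactly at the margin are the support vectors; moving any other point slightly would not change the hyper-plane.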

**Identify the right hyper-plane (Scenario-3):** Hint: Use the rules discussed in the previous scenarios to identify the right hyper-plane.

![Scenario 3](./img/SVM_5.png)

Some of you may have selected hyper-plane B, as it has a larger margin than A. But here is the catch: SVM selects the hyper-plane that classifies the classes accurately before it maximizes the margin. Here, hyper-plane B has a classification error and A has classified everything correctly. Therefore, the right hyper-plane is A.

**Can we classify two classes (Scenario-4)?:** Below, I am unable to segregate the two classes using a straight line, as one of the stars lies in the territory of the other (circle) class as an outlier.

![Scenario 4](./img/SVM_61.png)

As I have already mentioned, the one star at the other end is an outlier for the star class. The SVM algorithm has a feature to ignore outliers and find the hyper-plane that has the maximum margin. Hence, we can say that SVM classification is robust to outliers.

![Scenario 4](./img/SVM_71.png)
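In scikit-learn this tolerance for outliers comes from the soft margin, controlled by the `C` parameter of `SVC`. The sketch below uses made-up points that mimic the picture above: one star sits in the middle of the circles, and the fitted hyper-plane simply gives up on it while still separating everything else. The data and the variable names are hypothetical.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical data: circles (class 0), stars (class 1), and one stray star
circles = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 2.0], [2.0, 2.0]])
stars = np.array([[5.0, 5.0], [6.0, 5.0], [5.0, 6.0], [6.0, 6.0]])
outlier = np.array([[1.5, 1.5]])  # a star deep inside circle territory

X = np.vstack([circles, stars, outlier])
y = np.array([0] * 4 + [1] * 4 + [1])

# The soft margin (C) lets the model pay a penalty for the outlier
# instead of failing to find a separating hyper-plane
clf = SVC(kernel="linear", C=1.0).fit(X, y)

regular_acc = clf.score(X[:-1], y[:-1])  # accuracy on all non-outlier points
outlier_pred = clf.predict(outlier)[0]   # side the stray star ends up on

print("accuracy on regular points:", regular_acc)
print("outlier predicted as class:", outlier_pred)
```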

**Find the hyper-plane to segregate two classes (Scenario-5):** In the scenario below, we can’t have a linear hyper-plane between the two classes, so how does SVM classify these two classes? Until now, we have only looked at linear hyper-planes.

![Scenario 5](./img/SVM_8.png)

SVM can solve this problem easily! It does so by introducing an additional feature. Here, we will add a new feature z=x^2+y^2. Now, let’s plot the data points on the x and z axes:

![Scenario 6](./img/SVM_9.png)
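The effect of the extra feature can be sketched in code. The example below uses scikit-learn's `make_circles` as a stand-in for the data in the figures (an assumption, since the original points are not available) and compares a linear SVM on the raw features with one trained on x and the new feature z.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Stand-in data: one class forms a ring around the other
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# A straight line in the original (x, y) space cannot separate the rings
linear_acc = SVC(kernel="linear").fit(X, y).score(X, y)

# Add the new feature z = x^2 + y^2 and keep x, as in the plot above
z = (X ** 2).sum(axis=1)
X_lifted = np.column_stack([X[:, 0], z])

# In the (x, z) space a linear hyper-plane is enough
lifted_acc = SVC(kernel="linear").fit(X_lifted, y).score(X_lifted, y)

print(f"linear SVM on (x, y): {linear_acc:.2f}")
print(f"linear SVM on (x, z): {lifted_acc:.2f}")
```

In practice you rarely add such features by hand; passing a non-linear kernel such as `kernel="rbf"` to `SVC` achieves a similar effect implicitly.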

#### Pros and Cons associated with SVM

Pros:

- It works really well when there is a clear margin of separation.
- It is effective in high dimensional spaces.
- It is effective in cases where the number of dimensions is greater than the number of samples.
- It uses a subset of training points in the decision function (called support vectors), so it is also memory efficient.


Cons:

- It doesn’t perform well on large data sets, because the required training time is high.
- It also doesn’t perform very well when the data set is noisy, i.e. when the target classes overlap.
- SVM doesn’t directly provide probability estimates; these are calculated using an expensive five-fold cross-validation. This option is included in the related SVC class of the Python scikit-learn library.
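As an illustration of the last point, the sketch below asks `SVC` for probability estimates on the iris dataset. It assumes the `probability=True` option behaves as it does in the scikit-learn versions this was written against; fitting is slower with it enabled because of the internal cross-validation.

```python
from sklearn import datasets
from sklearn.svm import SVC

X, y = datasets.load_iris(return_X_y=True)

# probability=True makes the model fit an extra calibration step
# (internal cross-validation), which is why it is more expensive
clf = SVC(kernel="linear", probability=True, random_state=0)
clf.fit(X, y)

# One row per sample, one column per class; each row sums to 1
proba = clf.predict_proba(X[:3])
print(proba)
```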

##### Implementing SVMs in Python

```
# Classification

# Import the needed libraries
from sklearn.svm import SVC  # SVM classifier
from sklearn.metrics import accuracy_score
from sklearn import datasets  # built-in datasets from scikit-learn, including iris
from sklearn.model_selection import train_test_split

# Load and return the iris dataset
X, y = datasets.load_iris(return_X_y=True)

# Split the dataset into training (80%) and test (20%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

clf = SVC(kernel='linear')  # instantiate the Support Vector Classifier
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)
print(accuracy_score(y_test, y_pred))
```

```
1.0
```


```
# Regression

from sklearn.svm import SVR  # SVM regression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn import datasets  # built-in datasets from scikit-learn, including the boston dataset
from sklearn.model_selection import train_test_split

# Load and return the boston house-prices dataset (regression).
# Note: load_boston was removed in scikit-learn 1.2, so this cell needs an older version.
X, y = datasets.load_boston(return_X_y=True)

# Split the dataset into training (80%) and test (20%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

clr = SVR(kernel='linear')  # instantiate the Support Vector Regressor
clr.fit(X_train, y_train)

y_pred = clr.predict(X_test)
print(mean_absolute_error(y_test, y_pred))
```

```
3.1404341259560358
```


```
print(mean_squared_error(y_test, y_pred))
```

```
29.435908618391455
```

```
print(r2_score(y_test, y_pred))
```

```
0.5986037082794649
```

```
from sklearn.metrics import mean_absolute_percentage_error
print(mean_absolute_percentage_error(y_test, y_pred))
```

```
0.167713306373673
```


#### Multiple Model Completion and Comparison

Here we compare Logistic Regression to Support Vector Machines.
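One way such a comparison could look is sketched below. The promotions dataset used in the exercise is not included in these notes, so this example substitutes scikit-learn's built-in breast cancer dataset purely as a stand-in; the scaling step and the `max_iter` setting are assumptions added so that both models train cleanly.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in binary classification dataset (not the promotions data)
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Both models get the same scaled features so the comparison is fair
models = {
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "SVM (linear kernel)": make_pipeline(
        StandardScaler(), SVC(kernel="linear")
    ),
}

# Fit each model on the training split and score it on the test split
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = f1_score(y_test, model.predict(X_test))
    print(f"{name}: F1 = {scores[name]:.3f}")
```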

#### Exercise:

- Load the promotions dataset.
- Clean the data!
- Split the data into train and test sets in an 80-20 ratio.
- Set the recurrence-events column as the dependent column.
- Build a Logistic Regression model and a Support Vector Machine classifier to predict the dependent column ('promoted or not').
- Calculate and compare the F1 scores of your models.