# **Regression Models**

Regression models are among the most widely used learning algorithms. Originally developed in statistics, they have spread quickly through ML and AI in general. The best-known regression model is linear regression, thanks to its simple implementation and the good predictive capacity it achieves in many practical cases (such as estimating the level of house prices in relation to changes in interest rates).

Alongside the linear model, there is also the logistic regression model, which is especially useful in more complex cases, where the linear model proves too rigid for the data at hand. Both models are therefore tools of choice for analysts and algorithm developers.

In this section, we will analyze the characteristics and advantages of regression models, and their possible uses in the field of spam detection. Let's start our analysis with the simplest model, the linear regression model, which will help us make comparisons with the logistic regression model.

# **Introducing linear regression models**
The linear regression model represents the predicted value as a weighted sum of the features, which corresponds to a straight line in the Cartesian plane. In formal terms, linear regression can be described by the following formula:
![alt text](http://vbehzadan.com/AISec/linreg1.png)

Here, y represents the predicted values, which result from the linear combination of the individual features (represented by the X matrix) weighted by a weight vector (represented by the w vector), plus a constant (β), which represents the default predicted value when all features are zero (or are simply missing). The β constant can also be interpreted as the bias (systematic distortion) of the model, and corresponds graphically to the intercept on the vertical axis of the Cartesian plane (that is to say, the point where the regression line meets the vertical axis). The linear model can be extended to cases in which there is more than one feature. In this case, the mathematical formalization takes the following form:
![alt text](http://vbehzadan.com/AISec/linreg1.png)

The geometric representation of the previous formula corresponds to a hyperplane in the n-dimensional space, rather than a straight line in the Cartesian plane. We have already mentioned the importance of the β constant as the default predicted value of the model when all features are equal to zero.
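As a minimal sketch of the formula above, the prediction can be computed with NumPy as a matrix-vector product plus the constant. The values of `X`, `w`, and `beta` below are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical values, chosen only for illustration.
X = np.array([[1.0, 2.0],
              [3.0, 0.0],
              [0.0, 0.0]])   # 3 samples, 2 features
w = np.array([0.5, -1.0])    # one weight per feature
beta = 2.0                   # constant term (intercept)

# Linear combination of the features plus the constant.
y = X @ w + beta
print(y)
```

The last sample has all features equal to zero, so its prediction is exactly `beta`, which is the default predicted value described in the text.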

The individual w_i values within the weight vector, w, can be interpreted as a measure of the intensity of the corresponding features, x_i.

In practice, if the value of the w_i weight is close to zero, the corresponding x_i feature has little or no importance in determining the predicted values. If, instead, the w_i weight is positive, an increase in x_i will increase the final value returned by the regression model.

If, on the other hand, w_i is negative, it reverses the direction of the model's predictions: as the value of the x_i feature increases, the value estimated by the regression decreases. Hence, it is important to consider the impact of the weights on the x_i features, as they determine the correctness of the predictions that we can derive from the regression model.
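The role of the weights can be illustrated with a small sketch on synthetic data. The assumptions here are ours: the data is generated without noise from known weights (one positive, one negative, one zero), and the fit uses ordinary least squares through `np.linalg.lstsq`, which is one common way to estimate a linear regression rather than the only one.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic, noise-free data generated from known weights (illustrative values).
true_w = np.array([3.0, -2.0, 0.0])   # positive, negative, and irrelevant feature
true_beta = 1.0
X = rng.normal(size=(200, 3))
y = X @ true_w + true_beta

# Append a column of ones so the intercept is estimated together with the weights.
A = np.column_stack([X, np.ones(len(X))])
solution, *_ = np.linalg.lstsq(A, y, rcond=None)
w_hat, beta_hat = solution[:3], solution[3]

print(np.round(w_hat, 3), round(float(beta_hat), 3))
```

The estimated weight of the third feature is numerically zero, which reflects that the feature plays no part in the prediction, while the signs of the first two weights show whether the prediction rises or falls as each feature increases.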

# **Logistic Regression**
One of the limits of linear regression is that it is not well suited to solving classification problems.

In fact, if we wanted to use linear regression to classify samples into two classes (as is the case in spam detection) whose labels are represented by numerical values (for example, -1 for spam and +1 for ham), the linear regression model would try to produce the output that is closest to the target value (that is, linear regression has the purpose of minimizing forecasting errors). The negative side effect of this behavior is that it leads to greater classification errors. Compared with the Perceptron, linear regression does not give good results in terms of classification accuracy, precisely because linear regression works better with continuous ranges of values than with classes of discrete values (as is the case in classification).
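A toy sketch can make this side effect concrete. The one-dimensional data below is invented for illustration: the two classes are perfectly separable by any threshold between 3 and 4, but one correctly labeled sample lies far from the others. We fit a least-squares line with `np.polyfit` and take the sign of its output as the class label, which is our assumed way of turning the regression into a classifier.

```python
import numpy as np

# Toy 1-D data, invented for illustration: -1 = spam, +1 = ham.
# The classes are perfectly separable by any threshold between 3 and 4.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 50.0])
y = np.array([-1, -1, -1, 1, 1, 1, 1])

# Ordinary least-squares fit of y = slope * x + intercept.
slope, intercept = np.polyfit(x, y, deg=1)

# Turn the continuous output into a class label by taking its sign.
predicted = np.where(slope * x + intercept >= 0, 1, -1)

misclassified = x[predicted != y]
print("Misclassified samples:", misclassified)
```

Because the fit minimizes the squared distance to the targets -1 and +1, the distant sample at x = 50 tilts the line, and the sample at x = 4 ends up on the wrong side even though a perfect separation exists.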

An alternative strategy, more useful for the purposes of classification, consists of estimating the probability that a sample belongs to each class. This is the strategy adopted by logistic regression (which, in spite of its name, is a classification algorithm rather than a regression model). The mathematical formulation of logistic regression is as follows:
![alt text](http://vbehzadan.com/AISec/logistic1.png)

Where:
![alt text](http://vbehzadan.com/AISec/logistic2.png)

P(y=c|x) therefore measures the conditional probability that a given sample falls into the c class, given the x_i features.
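As a rough sketch, and assuming the standard two-class formulation in which the linear score is passed through the sigmoid (logistic) function, the probability can be computed as follows. The weights, constant, and sample are hypothetical values chosen for illustration.

```python
import numpy as np

def sigmoid(z):
    """Map any real number to the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical weights, constant, and sample, for illustration only.
w = np.array([1.5, -0.5])
beta = 0.25
x = np.array([1.0, 2.0])

z = np.dot(w, x) + beta          # linear score: 1.5 - 1.0 + 0.25 = 0.75
p_positive = sigmoid(z)          # estimated P(y = 1 | x)
p_negative = 1.0 - p_positive    # estimated P(y = 0 | x)

print(p_positive, p_negative)
```

A score of zero maps to a probability of exactly 0.5, so the sign of the linear score decides which class is the more probable one.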

In [13]:
import pandas as pd
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
import warnings

# Silence library warnings (for example, solver convergence notices) to keep the output short
warnings.simplefilter('ignore')


In [14]:
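# Load the dataset; every value is read as a 32-bit integer
phishing_dataset = np.genfromtxt('phishing_dataset.csv', delimiter=',', dtype=np.int32)
# All columns except the last are features; the last column is the class label
samples = phishing_dataset[:, :-1]
targets = phishing_dataset[:, -1]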

In [15]:
from sklearn.model_selection import train_test_split

training_samples, testing_samples, training_targets, testing_targets = train_test_split(
         samples, targets, test_size=0.2, random_state=0)

In [16]:
log_classifier = LogisticRegression()

In [17]:
log_classifier.fit(training_samples, training_targets)

LogisticRegression()

In [18]:
predictions = log_classifier.predict(testing_samples)

In [19]:
accuracy = 100.0 * accuracy_score(testing_targets, predictions)
print("Logistic Regression accuracy: " + str(accuracy))

Logistic Regression accuracy: 91.67797376752601
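Since logistic regression works by estimating class probabilities, it can be useful to look at those estimates directly rather than only at the final labels. The sketch below uses scikit-learn's `predict_proba` on synthetic data generated with `make_classification`; this data is a stand-in of our own and is not the phishing dataset, and `max_iter=1000` is an arbitrary choice.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; it is NOT the phishing dataset used above.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

# One row per sample, one column per class (ordered as in clf.classes_).
probabilities = clf.predict_proba(X_test)

# Pick the class with the highest estimated probability for each sample.
labels = clf.classes_[probabilities.argmax(axis=1)]

print(probabilities[:3])
print(labels[:3], clf.predict(X_test)[:3])
```

Each row of `probabilities` sums to one, and choosing the most probable class reproduces the labels returned by `predict`.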
