# Generalized Linear Models Review

## What's the difference between (OLS) Linear Regression and (Binomial) Logistic Regression?

* In the linear regression model the dependent variable y is considered continuous, whereas in logistic regression it is categorical, i.e., discrete. 


* In application, the former is used in regression settings while the latter is used for binary classification or multi-class classification (where it is called multinomial logistic regression). 

#### Example:

* Regression Problem: Predict sale price of houses based on size, where X is the area in square feet of houses, and Y is the corresponding price.


* Classification Problem: Predict, based on size, whether a house would sell for more than 200K, where X is the area in square feet of houses and Y can take one of two possible values: 1 for Yes, the house will sell for more than 200K, or 0 for No, the house will not.

## Other Differences


#### Equation

Linear regression gives an equation of the form Y = mX + C, i.e., an equation of degree 1. Logistic regression instead gives an equation of the form Y = e^X / (1 + e^X), which is equivalent to Y = 1 / (1 + e^-X).


#### Coefficient interpretation

In linear regression, the interpretation of the coefficients of the independent variables is quite straightforward (i.e., holding all other variables constant, a one-unit increase in this variable is expected to increase/decrease the dependent variable by xxx). In a generalized linear model, however, the interpretation depends on the family (binomial, Poisson, etc.) and link (log, logit, inverse, etc.) you use. For logistic regression (binomial family, logit link), a coefficient is the change in the log odds of the outcome for a one-unit increase in the variable.
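To make the log-odds interpretation concrete, here is a minimal sketch using only the standard library. The coefficients `beta0` and `beta1` are hypothetical values picked for illustration, not estimates from any data in this notebook. It checks that a one-unit increase in x multiplies the odds by exp(beta1), whatever the starting value of x.

```python
import math

def sigmoid(z):
    """Logistic function: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def odds(p):
    """Odds corresponding to probability p."""
    return p / (1.0 - p)

# Hypothetical coefficients, chosen only for illustration
beta0, beta1 = -1.0, 0.8

ratios = []
for x in (0.0, 1.0, 2.0):
    p_before = sigmoid(beta0 + beta1 * x)
    p_after = sigmoid(beta0 + beta1 * (x + 1.0))
    ratios.append(odds(p_after) / odds(p_before))
    print(f"x={x}: p goes {p_before:.3f} -> {p_after:.3f}, "
          f"odds ratio = {ratios[-1]:.4f}")

# The odds ratio is the same at every x and equals exp(beta1)
print(f"exp(beta1) = {math.exp(beta1):.4f}")
```

Note that the change in probability differs from one x to the next, while the odds ratio stays constant. This is why logistic coefficients are usually reported on the odds scale.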

#### Error Minimization Technique

Linear regression uses the ordinary least squares method to minimise the errors and arrive at the best possible fit, while logistic regression uses the maximum likelihood method to arrive at the solution.

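Linear regression is usually solved by minimizing the squared error of the model on the data, so large errors are penalized quadratically. Logistic regression behaves differently. The logistic (log) loss is computed after the linear predictor has been squashed into (0, 1): a prediction far on the correct side of the decision boundary costs almost nothing, and a prediction far on the wrong side is penalized roughly linearly in the linear predictor rather than quadratically.

Consider linear regression on a categorical {0,1} outcome to see why this is a problem. If the linear predictor is 38 when the truth is 1, logistic regression has lost essentially nothing, because the predicted probability is almost exactly 1. Linear regression would treat this as an error of 37 and try to reduce that 38; logistic regression wouldn't (as much).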
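The "38 versus 1" case can be checked numerically. The sketch below is illustrative only: it assumes the true label is 1 and compares the squared error with the log loss for a linear predictor of 38 (confidently right) and of -38 (confidently wrong). The log loss is written as `log1p(exp(-z))`, which is algebraically equal to `-log(sigmoid(z))` but avoids taking the log of a rounded 0 or 1.

```python
import math

def log_loss_for_positive(z):
    """Log loss when the true label is 1 and the linear predictor is z.

    Equal to -log(sigmoid(z)), written in a numerically stable form.
    """
    return math.log1p(math.exp(-z))

y_true = 1

# Confidently right: the linear predictor is far on the correct side
z_right = 38.0
squared_error_right = (y_true - z_right) ** 2       # (1 - 38)^2 = 1369.0
log_loss_right = log_loss_for_positive(z_right)     # practically zero

# Confidently wrong: the linear predictor is far on the wrong side
z_wrong = -38.0
squared_error_wrong = (y_true - z_wrong) ** 2       # (1 + 38)^2 = 1521.0
log_loss_wrong = log_loss_for_positive(z_wrong)     # close to 38

print("right side:", squared_error_right, log_loss_right)
print("wrong side:", squared_error_wrong, log_loss_wrong)
```

Squared error punishes both cases heavily, even the one where the classification is correct. Log loss punishes only the wrong one.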

## Is Logistic Regression a type of Linear Regression?

Logistic regression is called a generalized linear model not because the estimated probability of the response event is linear, but because the logit of the estimated probability is a linear function of the parameters.



## Compare and Contrast: 

|  | Linear Regression | Logistic Regression |
| --- | --- | --- |
|  | Supervised <br> Parametric <br> **Regression** | Supervised <br> Parametric <br> **Classification** |
| Loss Function (How wrong is the model, point by point?) | Mean squared error <br> Also absolute loss (not often used) <br> Regularization (AKA shrinkage): Ridge regression (closed form), LASSO (no closed form) | Log loss |
| Parameters | **Parameters**: Coefficients (𝛽i) <br> (change in response for every one-unit change in predictor) <br> **Hyperparameters**: Penalty factor λ (if using regularization) | **Parameters**: Coefficients (𝛽i) <br> Represent the change in log odds for every one-unit change in predictor (the log of the odds ratio) |
| Solver/Optimization strategies | Closed form, or gradient descent | Gradient descent |
| Pros | Computationally inexpensive <br> Easy to implement <br> Easy to interpret results | Computationally inexpensive <br> Easy to implement <br> Easy to interpret (if you understand log odds) |
| Cons | Poorly models nonlinear data <br> If doing inference, need to check for violations of **assumptions**: <br> Normal distribution of residuals <br> Constant variance of error terms <br> Independence of error terms from predictors <br> I.I.D. observations (random sample) <br> No endogenous predictors | Prone to underfitting <br> May have low accuracy (?) <br> If doing inference, need to check for violations of **assumptions** (not exactly the same as for linear models) |

### Assumptions that do not apply to logistic regression:
* linear relationship between the dependent and independent variables.
* independent variables do not need to be multivariate normal – although multivariate normality yields a more stable solution. 
* homoscedasticity is not needed.
* independent variables do not need to be metric (interval or ratio scaled)

### Assumptions that do apply to logistic regression:
* dependent variable must be binary (or ordinal)
* error terms need to be independent
* little or no multicollinearity
* independent variables are linearly related to the log odds.

$$\operatorname{logit}(p_i) = \ln\left(\frac{p_i}{1 - p_i}\right) = \beta_0 + \beta_1 x_{1,i} + \beta_2 x_{2,i} + \cdots + \beta_p x_{p,i}$$
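Solving the log-odds relationship above for the probability shows where the logistic (sigmoid) function comes from. The symbol $\eta_i$ for the linear predictor is introduced here only for brevity:

```latex
\eta_i = \beta_0 + \beta_1 x_{1,i} + \beta_2 x_{2,i} + \cdots + \beta_p x_{p,i}

\ln\left(\frac{p_i}{1 - p_i}\right) = \eta_i
\;\Longrightarrow\;
\frac{p_i}{1 - p_i} = e^{\eta_i}
\;\Longrightarrow\;
p_i = \frac{e^{\eta_i}}{1 + e^{\eta_i}} = \frac{1}{1 + e^{-\eta_i}}
```

So the model is linear on the log-odds scale and S-shaped on the probability scale.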

## Regularization

We can think of regularization as adding (or increasing the) bias when our model suffers from (high) variance (i.e., it overfits the training data). On the other hand, too much bias will result in underfitting (a characteristic indicator of high bias is that the model performs poorly on both the training and test datasets). Our goal in an unregularized model is to minimize the cost function, i.e., we want to find the feature weights that correspond to the global cost minimum (remember that the logistic cost function is convex).

Now, if we regularize the cost function (e.g., via L2 regularization), we add an additional term to our cost function (J) that increases as the values of the parameter weights (w) increase; keep in mind that with regularization we add a new hyperparameter, lambda, to control the regularization strength.
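A minimal sketch of what that extra term does, using only the standard library. It assumes one common convention, J(w) = mean log loss + (lambda / 2) * ||w||^2 with an unpenalized intercept. Other texts scale the penalty differently, and scikit-learn expresses the strength through `C`, which acts as an inverse of lambda. The four-point dataset is made up for illustration.

```python
import math

def l2_regularized_cost(w, b, X, y, lam):
    """Mean log loss plus (lam / 2) * ||w||^2; the intercept b is not penalized."""
    total = 0.0
    for features, label in zip(X, y):
        z = sum(wj * xj for wj, xj in zip(w, features)) + b
        sign = 2 * label - 1                    # map {0, 1} to {-1, +1}
        total += math.log1p(math.exp(-sign * z))  # stable form of the log loss
    penalty = 0.5 * lam * sum(wj ** 2 for wj in w)
    return total / len(y) + penalty

# Made-up, perfectly separable toy data (one feature)
X = [[-2.0], [-1.0], [1.0], [2.0]]
y = [0, 0, 1, 1]

for lam in (0.0, 0.1):
    small = l2_regularized_cost([1.0], 0.0, X, y, lam)
    large = l2_regularized_cost([5.0], 0.0, X, y, lam)
    print(f"lambda={lam}: J(w=1) = {small:.4f}, J(w=5) = {large:.4f}")
```

Without the penalty, the larger weight always gives the lower cost on separable data, so the weights would keep growing. With lambda = 0.1, the smaller weight wins.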

In [15]:
import numpy as np
import plotly.plotly
import plotly.graph_objs as go
import scipy.stats as stats
from sklearn import linear_model

%matplotlib inline

In [16]:
# Learn about API authentication here: https://plot.ly/python/getting-started
# Find your api_key here: https://plot.ly/settings/api
xi = np.arange(0,9)
A = np.array([ xi, np.ones(9)])

# (Almost) linear sequence
y = [19, 20, 20.5, 21.5, 22, 23, 23, 25.5, 24]

# Generated linear fit
slope, intercept, r_value, p_value, std_err = stats.linregress(xi,y)
line = slope*xi+intercept

# Creating the dataset, and generating the plot
trace1 = go.Scatter(
                  x=xi,
                  y=y,
                  mode='markers',
                  marker=dict(color='rgb(255, 127, 14)'),
                  name='Data'
                  )

trace2 = go.Scatter(
                  x=xi,
                  y=line,
                  mode='lines',
                  marker=dict(color='rgb(31, 119, 180)'),
                  name='Fit'
                  )

# Plain dicts replace the deprecated go.Marker / go.Font / go.Annotation /
# go.XAxis / go.YAxis helpers. The label is built from the fitted values
# rather than hard-coded numbers.
annotation = dict(
                  x=3.5,
                  y=24.5,
                  text='$R^2 = {:.4f},\\ Y = {:.3f}X + {:.2f}$'.format(r_value**2, slope, intercept),
                  showarrow=False,
                  font=dict(size=16)
                  )
layout = go.Layout(
                title='Linear Fit in Python',
                plot_bgcolor='rgb(229, 229, 229)',
                  xaxis=dict(zerolinecolor='rgb(255,255,255)', gridcolor='rgb(255,255,255)'),
                  yaxis=dict(zerolinecolor='rgb(255,255,255)', gridcolor='rgb(255,255,255)'),
                  annotations=[annotation]
                )

data = [trace1, trace2]
fig = go.Figure(data=data, layout=layout)

plotly.plotly.iplot(fig, filename='Linear-Fit-in-python')



In [17]:
# Toy dataset: binary labels obtained by thresholding a Gaussian feature;
# the positive class is then stretched and Gaussian noise is added to the feature
xmin, xmax = -5, 5
n_samples = 100
np.random.seed(0)
X = np.random.normal(size=n_samples)
y = (X > 0).astype(float)  # np.float was removed in NumPy 1.24; use the builtin float
X[X > 0] *= 4
X += .3 * np.random.normal(size=n_samples)

X = X[:, np.newaxis]
# run the classifier
clf = linear_model.LogisticRegression(C=1e5)
clf.fit(X, y)





LogisticRegression(C=100000.0, class_weight=None, dual=False,
                   fit_intercept=True, intercept_scaling=1, l1_ratio=None,
                   max_iter=100, multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False)

In [18]:
p1 = go.Scatter(x=X.ravel(), y=y, 
                mode='markers',
                marker=dict(color='black'),
                showlegend=False
               )
X_test = np.linspace(-5, 10, 300)

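def model(x):
    # Logistic (sigmoid) function: maps the linear predictor to a probability
    return 1 / (1 + np.exp(-x))

# Predicted probability of class 1 along the test grid (not a loss value)
prob = model(X_test * clf.coef_ + clf.intercept_).ravel()

p2 = go.Scatter(x=X_test, y=prob, 
                mode='lines',
                line=dict(color='red', width=3),
                name='Logistic Regression Model')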

ols = linear_model.LinearRegression()
ols.fit(X, y)

p3 = go.Scatter(x=X_test, y=ols.coef_ * X_test + ols.intercept_, 
                mode='lines',
                line=dict(color='blue', width=1),
                name='Linear Regression Model'
                )
p4 = go.Scatter(x=[-4, 10], y=2*[.5],
                mode='lines',
                line=dict(color='gray', width=1),
                showlegend=False
               )

layout = go.Layout(xaxis=dict(title='x', range=[-4, 10],
                              zeroline=False),
                   yaxis=dict(title='y', range=[-0.25, 1.25],
                              zeroline=False))

fig = go.Figure(data=[p1, p2, p3, p4], layout=layout)

In [19]:
plotly.plotly.iplot(fig)





It's tempting to use the linear regression output as a probability, but that is a mistake: the output can be negative or greater than 1, whereas a probability cannot. Logistic regression avoids this by passing the linear predictor through the logistic function, which keeps the output strictly between 0 and 1. Also, linear regression is more sensitive to outliers than logistic regression.
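The out-of-range problem is easy to reproduce by hand. The sketch below uses a made-up six-point dataset and only the standard library. It fits a least-squares line to 0/1 labels with the closed-form slope and intercept, then applies the logistic function to a few arbitrary scores for comparison. It is an illustration under these assumptions, not a re-run of the cells above.

```python
import math

# Made-up data: one feature, binary label
x = [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]
y = [0, 0, 0, 1, 1, 1]

# Closed-form simple least squares: slope = Sxy / Sxx
n = len(x)
x_mean = sum(x) / n
y_mean = sum(y) / n
s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
s_xx = sum((xi - x_mean) ** 2 for xi in x)
slope = s_xy / s_xx
intercept = y_mean - slope * x_mean

# Linear "probabilities" escape [0, 1] at the extremes of the data
linear_pred = [intercept + slope * xi for xi in x]
print("linear:", [round(p, 3) for p in linear_pred])

# The logistic function maps any real score into (0, 1)
scores = [-10.0, -1.0, 0.0, 1.0, 10.0]
logistic_pred = [1.0 / (1.0 + math.exp(-s)) for s in scores]
print("logistic:", [round(p, 5) for p in logistic_pred])
```

On this data the fitted line gives a value below 0 at x = -3 and above 1 at x = 3, even though both points are classified correctly.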
