# Generalized Linear Model


## Introduction

Ordinary linear regression predicts the expected value of a given unknown quantity (the response variable, a random variable) as a linear combination of a set of observed values (predictors). This implies that a constant change in a predictor leads to a constant change in the response variable (i.e. a linear-response model). 

However, these assumptions are inappropriate for some types of response variables. For example, in cases where the response variable is expected to be always positive and varying over a wide range, constant input changes lead to geometrically varying, rather than constantly varying, output changes.

Generalized linear models cover these situations by allowing response variables whose distributions are not necessarily normal, and by letting a function of the expected response (the link function) vary linearly with the predictors, rather than requiring the response itself to vary linearly.

## Overview

In a Generalized Linear Model (GLM), each outcome Y of the dependent variable is assumed to be generated from a particular distribution in an exponential family.


**Common GLM model types:**
- Gaussian regression
- Poisson regression
- Binomial regression (classification)
- Quasibinomial regression
- Multinomial classification
- Gamma regression
- Ordinal regression
- Negative Binomial regression
- Tweedie regression

The mean μ of the distribution depends on the predictors X through

\begin{align}
E(Y) = \mu = g^{-1}(X\beta)
\end{align}

where E(Y) is the expected value of Y, Xβ is the linear predictor (a linear combination of the unknown parameters β), and g is the link function.
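As a minimal sketch of the formula above, the following Python snippet computes E(Y) = g^{-1}(Xβ) for a single observation under three common link functions. The coefficient values, the helper names `linear_predictor` and `expected_value`, and the `INVERSE_LINKS` table are illustrative assumptions, not part of any library.

```python
import math


def linear_predictor(x, beta):
    """Return eta = x . beta for one observation."""
    return sum(xi * bi for xi, bi in zip(x, beta))


# Inverse link functions g^{-1}, keyed by the name of the link g.
INVERSE_LINKS = {
    "identity": lambda eta: eta,                        # g(mu) = mu
    "log": math.exp,                                    # g(mu) = log(mu)
    "logit": lambda eta: 1.0 / (1.0 + math.exp(-eta)),  # g(mu) = log(mu / (1 - mu))
}


def expected_value(x, beta, link="identity"):
    """Return E(Y) = g^{-1}(x . beta) for the chosen link."""
    return INVERSE_LINKS[link](linear_predictor(x, beta))


# Hypothetical observation and coefficients; the leading 1.0 is the intercept term.
x = [1.0, 2.0]
beta = [0.5, -0.25]

# Here eta = 0.5 - 0.5 = 0.0, so identity gives 0.0, log gives 1.0, logit gives 0.5.
for name in INVERSE_LINKS:
    print(name, expected_value(x, beta, name))
```

The same linear predictor yields a different mean under each link: the log link keeps the mean positive, and the logit link keeps it between 0 and 1.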

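A generalized linear model has three components:

1. Linear predictor: the linear combination Xβ of the predictors.
2. Link function: the function g that relates the mean of the response to the linear predictor, g(μ) = Xβ.
3. Probability distribution: the exponential-family distribution from which the response Y is assumed to be generated.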
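The three components can be seen together in a small fitting routine. The sketch below fits a Poisson GLM with a log link and a single predictor using iteratively reweighted least squares (IRLS), a standard way to fit GLMs that these notes do not themselves cover. The data values and the function name `fit_poisson_glm` are made up for illustration.

```python
import math


def fit_poisson_glm(x, y, max_iter=50, tol=1e-10):
    """Fit log(mu) = b0 + b1 * x for Poisson-distributed y using IRLS."""
    n = len(y)
    b0 = math.log(sum(y) / n)  # start from the intercept-only fit
    b1 = 0.0
    for _ in range(max_iter):
        # Component 1: linear predictor.
        eta = [b0 + b1 * xi for xi in x]
        # Component 2: inverse of the log link gives the mean.
        mu = [math.exp(e) for e in eta]
        # Component 3: the Poisson distribution has variance mu, which gives
        # IRLS weights w = mu and working response z = eta + (y - mu) / mu.
        w = mu
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]

        # Solve the 2x2 weighted least-squares normal equations for (b0, b1).
        sw = sum(w)
        swx = sum(wi * xi for wi, xi in zip(w, x))
        swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
        swz = sum(wi * zi for wi, zi in zip(w, z))
        swxz = sum(wi * xi * zi for wi, xi, zi in zip(w, x, z))
        det = sw * swxx - swx * swx
        new_b0 = (swxx * swz - swx * swxz) / det
        new_b1 = (sw * swxz - swx * swz) / det

        converged = abs(new_b0 - b0) + abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if converged:
            break
    return b0, b1


# Made-up count data for illustration only.
x = [0, 1, 2, 3, 4, 5]
y = [1, 2, 2, 5, 7, 12]

b0, b1 = fit_poisson_glm(x, y)
fitted = [math.exp(b0 + b1 * xi) for xi in x]
print("intercept:", b0, "slope:", b1)
print("fitted means:", fitted)
```

With a log link, each unit increase in x multiplies the fitted mean by exp(b1), so the fitted means grow geometrically rather than by a constant amount.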


> Why use a Generalized Linear Model (GLM)?
>
> Consider count data where Y is the number of products. Ordinary linear regression fits such data poorly for three reasons:
>
> - The relationship between X and Y does not look linear; it is more likely to be exponential.
> - The variance of Y does not look constant with respect to X; it seems to increase as X increases.
> - Y is a count, so it must be a non-negative integer, i.e. a discrete variable. The normal distribution used for linear regression assumes a continuous variable, and a linear regression prediction can be negative, which is not appropriate for count data.
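The point about negative predictions can be checked numerically. The sketch below fits an ordinary least-squares line to a small, made-up set of counts and shows that the line predicts a negative count at x = 0. A mean of the form exp(η), as produced by a log link, is positive for any real η. The data, the helper `fit_ols_line`, and the sample η values are illustrative assumptions.

```python
import math


def fit_ols_line(x, y):
    """Return (intercept, slope) of the least-squares line through (x, y)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Made-up count data that grows roughly exponentially with x.
x = [0, 1, 2, 3, 4, 5]
y = [1, 2, 2, 5, 7, 12]

intercept, slope = fit_ols_line(x, y)
ols_at_zero = intercept + slope * 0
print("OLS prediction at x = 0:", ols_at_zero)  # a negative "count"

# Arbitrary linear-predictor values (not fitted): exp(eta) is always positive.
etas = [-5.0, -0.5, 0.0, 5.0]
log_link_means = [math.exp(eta) for eta in etas]
print("log-link means:", log_link_means)
```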



### Resources:

http://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/glm.html

https://en.wikipedia.org/wiki/Generalized_linear_model

https://towardsdatascience.com/generalized-linear-models-9cbf848bb8ab
