# Logistic Regression for a Binomially-distributed R.V.


## Goal

Our goal is to learn how to apply logistic regression to count data in which the number of "1"s is bounded above by a known number of trials.
Just as we did for a Bernoulli-distributed variable, we will model the probability of 'success' of a Binomially-distributed random variable conditional on covariates.

The logistic function will map an inner product of coefficients and covariates to the interval between 0 and 1.

We will also find the likelihood for this model and show how it generalizes our model of logistic regression for a Bernoulli distributed random variable.


## Binomial distributed random variable

A random variable $Y$ is Binomially distributed if, given $N$ independent **trials** that each succeed with probability $p$, the probability of observing $s$ successes (sometimes denoted 1s) is

$$
p(Y=s; N,p) = \binom{N}{s} p^{s}(1-p)^{N-s}
$$

The expression

$$
p^{s}
$$

is the probability of observing a string of $s$ 1s, each occurring with probability $p$.

The expression

$$
(1-p)^{N-s}
$$

is the probability of observing a string of $N-s$ 0s, each occurring with probability $1-p$.

Then, the probability of observing a string of $s$ 1s and $N-s$ 0s equals

$$
p^{s} (1-p)^{N-s}
$$

But the binomial distribution does not consider the order of 1s and 0s.
Instead, the binomial distribution computes the probability of observing $s$ 1s among $N$ trials in any order.
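The step from a single ordered string to "any order" can be checked by brute force. The sketch below is an illustration only, with hypothetical function names and arbitrarily chosen values of $N$, $s$, and $p$: it sums $p^{s}(1-p)^{N-s}$ over every 0/1 string of length $N$ that contains exactly $s$ 1s and compares the total with the Binomial formula given at the start of this section.

```python
from itertools import product
from math import comb, isclose


def prob_by_enumeration(N, s, p):
    """Add up the probability of every 0/1 string of length N with exactly s ones."""
    total = 0.0
    for string in product([0, 1], repeat=N):
        if sum(string) == s:
            # Each such string has the same probability, whatever its order.
            total += p**s * (1 - p) ** (N - s)
    return total


def binomial_pmf(N, s, p):
    """Binomial probability of s ones among N trials."""
    return comb(N, s) * p**s * (1 - p) ** (N - s)


# Arbitrary illustrative values (not taken from the notes).
N, s, p = 6, 2, 0.3
enumerated = prob_by_enumeration(N, s, p)
formula = binomial_pmf(N, s, p)
print(enumerated, formula, isclose(enumerated, formula))
```

The enumeration visits all $2^{N}$ strings, so it is only practical for small $N$; the closed-form expression is what one would use otherwise.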

The binomial coefficient $\binom{N}{s}$ counts the number of ways $s$ 1s can occur among $N$ trials.
The binomial coefficient equals

$$
\binom{N}{s} = \frac{N!}{(N-s)!} \times \frac{1}{s!}
$$

The first term counts the number of ways $s$ distinguishable 1s can occupy $N$ different positions.
For example, given $N=10$ different positions and $s=4$ 1s, the first 1 can occupy the 4th position, the second 1 the 1st position, the third 1 the 8th position, and the fourth 1 the 7th position.
In general, the first 1 has $N$ different options.
After the first 1 occupies one of the $N$ spaces, the second 1 has only $N-1$ spaces to choose from.
Finally, the $s^{th}$ 1 has $N-(s-1)$ spaces to choose from.
We can write this as

$$
    N \times (N-1) \times \cdots \times (N-(s-1))
$$

or 

$$
    \frac{N!}{(N-s)!}
$$

where $!$ is the factorial.
The factorial of an integer $N$ is $N! = N \times (N-1) \times \cdots \times 1$.

But the above ratio of factorials counts the number of positions each distinct 1 can occupy.
We are not interested in the distinction between the first, second, or fourth 1.
For every set of occupied spaces above, the $s$ 1s could have been assigned to those spaces in any order.

If we denote by $C(N,s)$ the number of ways to occupy spaces with indistinct 1s, then

$$
    C(N,s) \times s! = \frac{N!}{(N-s)!}
$$

We could have chosen any one of the $s$ 1s to occupy the first of these positions.
Then we could have chosen any of the $s-1$ remaining 1s to occupy the next position, and so on.

There are $s!$ ways to place the ordered 1s into the same set of spaces.
Then the number of ways to place 1s into these spaces without caring about the ordering of 1s equals

$$
    C(N,s) = \binom{N}{s} = \frac{N!}{(N-s)!s!}
$$
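The counting argument can be verified by direct enumeration. The following sketch uses small, arbitrarily chosen values of $N$ and $s$ (an assumption for illustration): `itertools.permutations` lists the placements of distinguishable 1s, and `itertools.combinations` lists the placements when the 1s are indistinct.

```python
from itertools import combinations, permutations
from math import comb, factorial

# Arbitrary small values so that every arrangement can be listed.
N, s = 5, 3
positions = range(N)

# Distinguishable 1s: ordered choices of s positions out of N.
ordered = sum(1 for _ in permutations(positions, s))

# Indistinct 1s: only the set of occupied positions matters.
unordered = sum(1 for _ in combinations(positions, s))

print(ordered, factorial(N) // factorial(N - s))  # 60 60
print(unordered, comb(N, s))                      # 10 10
print(unordered * factorial(s) == ordered)        # True
```

Each unordered set of positions corresponds to $s! = 6$ ordered placements, which is the relation $C(N,s) \times s! = \frac{N!}{(N-s)!}$.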

The expectation of a Binomial r.v. is derived below; the $i=0$ term contributes nothing to the sum, so the sum starts at $i=1$.

\begin{align}
    E(Y) &= \sum_{i=1}^{N} i \times p(Y=i;N,p) \\
         &= \sum_{i=1}^{N} i \times \binom{N}{i} p^{i}(1-p)^{N-i} \\
         &= \sum_{i=1}^{N} i \times \frac{N!}{(N-i)!i!} p^{i}(1-p)^{N-i} \\
         &= \sum_{i=1}^{N} \frac{N!}{(N-i)!(i-1)!} p^{i}(1-p)^{N-i} \\
         &= \sum_{i=1}^{N} N \frac{(N-1)!}{(N-i)!(i-1)!} p^{i}(1-p)^{N-i} \\
         &= \sum_{i=1}^{N} Np \frac{(N-1)!}{(N-i)!(i-1)!} p^{i-1}(1-p)^{N-i} \\
         &= Np \sum_{i=1}^{N} \frac{(N-1)!}{(N-i)!(i-1)!} p^{i-1}(1-p)^{N-i} \\
         &= Np \times 1 \text{       Why?}\\
         &= Np
\end{align}

Similar manipulations show that the variance of a Binomial random variable equals

$$
Var(Y) = Np(1-p)
$$
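Both moments can be checked numerically by summing over the probability mass function. This is a small sanity check rather than a proof; the values of $N$ and $p$ and the helper name `binomial_pmf` are chosen only for illustration.

```python
from math import comb, isclose


def binomial_pmf(N, s, p):
    """Binomial probability of s ones among N trials."""
    return comb(N, s) * p**s * (1 - p) ** (N - s)


# Arbitrary illustrative values.
N, p = 10, 0.3
pmf = [binomial_pmf(N, i, p) for i in range(N + 1)]

# E(Y) = sum_i i * p(Y=i) and Var(Y) = sum_i (i - E(Y))^2 * p(Y=i)
mean = sum(i * q for i, q in enumerate(pmf))
variance = sum((i - mean) ** 2 * q for i, q in enumerate(pmf))

print(isclose(mean, N * p), isclose(variance, N * p * (1 - p)))
```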

We can already see a similarity between Bernoulli and Binomial random variables.
A binomial distribution with a single trial is the same as a Bernoulli random variable.

If $Y$ is a Binomial r.v. with a single trial, $N=1$, then
\begin{align}
    p(Y=s|N=1,p) &= \binom{N}{s} p^{s}(1-p)^{N-s}\\
               &= \binom{1}{s} p^{s}(1-p)^{1-s}\\
               &= \frac{1!}{(1-s)!s!} p^{s}(1-p)^{1-s}
\end{align}

If $s=0$ then $\frac{1!}{(1-0)!0!}=1$ and if $s=1$ then $\frac{1!}{(1-1)!1!}=1$, so
\begin{align}
    p(Y=s|N=1,p) = p(Y=s|p) &= p^{s}(1-p)^{1-s}
\end{align}

$Y$ equals $1$ with probability $p^{1}(1-p)^{1-1}=p$ and zero with probability $p^{0}(1-p)^{1-0}=(1-p)$.
These probabilities correspond exactly to a Bernoulli distributed random variable.

A Bernoulli distributed random variable is equivalent to a Binomial distributed random variable with a single trial.


## Binomial data and covariates

Throughout, our goal in regression is to estimate the conditional probability of one random variable given a set of fixed covariates, $p(Y|X)$.

Consider as our data a set of Binomial random variables.
Every data point we receive has the number of trials, the number of 1s, and a corresponding vector of covariates $x$.

|Number of trials ($N$)|Number of 1s|$x_{1}$|$x_{2}$|$x_{3}$|
|---|---|---|---|---|
|10|0|2.2|1|0.3|
|4|4|3|0|0.001|
|6|1|0.25|1|0.5|
|21|4|4|1|0.6|
|14|10|5.6|0|0.9|

Covariates influence only the success probability $p$ of each observation.
We treat the number of trials $N$ as fixed and known for each observation, so it does not need to be estimated.
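One way to hold data of this shape in Python is a list of records, one per observation. The sketch below copies the rows of the table above; the field names (`N`, `s`, `x`) are hypothetical choices. It also prints the raw proportion of 1s, $s/N$, for each row, which is only a descriptive summary and not yet a model of $p$ given $x$.

```python
# Each record: number of trials N, number of 1s s, covariate vector x.
rows = [
    {"N": 10, "s": 0, "x": (2.2, 1, 0.3)},
    {"N": 4, "s": 4, "x": (3, 0, 0.001)},
    {"N": 6, "s": 1, "x": (0.25, 1, 0.5)},
    {"N": 21, "s": 4, "x": (4, 1, 0.6)},
    {"N": 14, "s": 10, "x": (5.6, 0, 0.9)},
]

# Observed proportion of 1s per observation.
proportions = [row["s"] / row["N"] for row in rows]

for row, prop in zip(rows, proportions):
    print(row["N"], row["s"], row["x"], round(prop, 3))
```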

## Likelihood

The probability of a set of $n$ independent Binomial random variables is

$$
P(y_{1},y_{2},y_{3},\cdots,y_{n} | p) = \prod_{i=1}^{n} \binom{N_{i}}{s_{i}} p^{s_{i}}(1-p)^{N_{i}-s_{i}}
$$

where $N_{i}$ is the number of trials for the $i^{th}$ observation and $s_{i}$ is the observed number of 1s, $y_{i}=s_{i}$, for the $i^{th}$ observation.

We take the same approach to estimating $p$ as we did for a sequence of Bernoulli random variables.
Model the success probability of the $i^{th}$ observation with the logistic function of $\beta'x_{i}$

$$
    p_{i} = f(\beta,x_{i}) = \dfrac{e^{\beta'x_{i}}}{1+e^{\beta'x_{i}}}
$$

and substitute this function into the above probability model.

$$
    P(y_{1},y_{2},y_{3},\cdots,y_{n} | \beta,x) = \prod_{i=1}^{n} \binom{N_{i}}{s_{i}} \left(\dfrac{e^{\beta'x_{i}}}{1+e^{\beta'x_{i}}}\right)^{s_{i}} \left(\dfrac{1}{1+e^{\beta'x_{i}}}\right)^{N_{i}-s_{i}}
$$
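The logistic mapping from the inner product of coefficients and covariates to a probability can be sketched in Python as follows. This is a minimal illustration: the function names and the coefficient values in `beta` are hypothetical, and the covariates are those of the first row of the data table. The two branches are a common way to avoid overflow in `exp` for large positive or negative inputs.

```python
from math import exp


def logistic(z):
    """Map a real number z to e^z / (1 + e^z), a value between 0 and 1."""
    if z >= 0:
        # Equivalent form 1 / (1 + e^{-z}); avoids overflow for large z.
        return 1.0 / (1.0 + exp(-z))
    ez = exp(z)
    return ez / (1.0 + ez)


def success_probability(beta, x):
    """Probability of a 1 given coefficients beta and covariates x."""
    z = sum(b * xi for b, xi in zip(beta, x))  # inner product beta'x
    return logistic(z)


beta = (0.1, -0.5, 1.0)  # hypothetical coefficients, for illustration only
x_first = (2.2, 1, 0.3)  # covariates from the first row of the table
p_first = success_probability(beta, x_first)
print(p_first)
```

Whatever values `beta` takes, the result stays between 0 and 1, which is what allows it to stand in for $p$ in the Binomial model.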

The likelihood considers the $y$ and $x$ data fixed and treats the above model as a function of $\beta$.

$$
   \ell(\beta) = P(y | \beta,x) = \prod_{i=1}^{n} \binom{N_{i}}{s_{i}} \left(\dfrac{e^{\beta'x_{i}}}{1+e^{\beta'x_{i}}}\right)^{s_{i}} \left(\dfrac{1}{1+e^{\beta'x_{i}}}\right)^{N_{i}-s_{i}}
$$
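In practice the likelihood is usually evaluated on the log scale, because a product of many small probabilities underflows quickly. The sketch below computes the logarithm of the likelihood above for the rows of the data table. It rests on the identity $\log\left[p^{s}(1-p)^{N-s}\right] = s\,\beta'x - N\log(1+e^{\beta'x})$ when $p$ is the logistic function of $\beta'x$; the function names and the choice $\beta = 0$ are assumptions made only for illustration.

```python
from math import comb, exp, log, log1p


def log1p_exp(z):
    """Numerically stable log(1 + e^z)."""
    return z + log1p(exp(-z)) if z > 0 else log1p(exp(z))


def log_likelihood(beta, trials, successes, covariates):
    """Log of the Binomial logistic-regression likelihood as a function of beta."""
    total = 0.0
    for N_i, s_i, x_i in zip(trials, successes, covariates):
        z = sum(b * x for b, x in zip(beta, x_i))  # inner product beta'x_i
        # log C(N_i, s_i) + s_i * z - N_i * log(1 + e^z)
        total += log(comb(N_i, s_i)) + s_i * z - N_i * log1p_exp(z)
    return total


# Rows of the data table.
trials = [10, 4, 6, 21, 14]
successes = [0, 4, 1, 4, 10]
covariates = [(2.2, 1, 0.3), (3, 0, 0.001), (0.25, 1, 0.5), (4, 1, 0.6), (5.6, 0, 0.9)]

beta_zero = (0.0, 0.0, 0.0)  # illustrative choice: every p_i equals 1/2
ll_zero = log_likelihood(beta_zero, trials, successes, covariates)
print(ll_zero)
```

With $\beta = 0$ every success probability is $1/2$, so the value can be checked by hand as $\sum_{i}\left[\log\binom{N_{i}}{s_{i}} + N_{i}\log\tfrac{1}{2}\right]$.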

The above likelihood generalizes our previous likelihood for Bernoulli-distributed random variables.
Assume every Binomial variable in the above likelihood has only one trial, $N_{i}=1$.
Then every binomial coefficient $\binom{1}{s_{i}}$ equals $1$ and the likelihood reduces to
$$
   \ell(\beta) = \prod_{i=1}^{n} \left(\dfrac{e^{\beta'x_{i}}}{1+e^{\beta'x_{i}}}\right)^{s_{i}} \left(\dfrac{1}{1+e^{\beta'x_{i}}}\right)^{1-s_{i}}
$$

where $s_{i}$ equals either $0$ or $1$. 
This is the exact likelihood we derived for a sequence of Bernoulli-distributed random variables.
The Binomial model generalizes logistic regression for a Bernoulli-distributed random variable.
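The reduction can also be confirmed numerically. The sketch below evaluates a Bernoulli log-likelihood and a Binomial log-likelihood with every $N_{i}=1$ on the same small data set and checks that they agree. The data, the coefficients, and the function names are all hypothetical and serve only to illustrate the identity.

```python
from math import comb, exp, isclose, log


def logistic(z):
    """e^z / (1 + e^z); adequate for the moderate values of z used here."""
    return 1.0 / (1.0 + exp(-z))


def bernoulli_log_likelihood(beta, ys, xs):
    """Sum of y*log(p) + (1-y)*log(1-p) over 0/1 observations."""
    total = 0.0
    for y, x in zip(ys, xs):
        p = logistic(sum(b * xi for b, xi in zip(beta, x)))
        total += y * log(p) + (1 - y) * log(1 - p)
    return total


def binomial_log_likelihood(beta, trials, successes, xs):
    """Sum of log[C(N,s) p^s (1-p)^(N-s)] over Binomial observations."""
    total = 0.0
    for N_i, s_i, x in zip(trials, successes, xs):
        p = logistic(sum(b * xi for b, xi in zip(beta, x)))
        total += log(comb(N_i, s_i)) + s_i * log(p) + (N_i - s_i) * log(1 - p)
    return total


# Hypothetical data: five 0/1 outcomes with two covariates each.
ys = [0, 1, 1, 0, 1]
xs = [(1.0, 0.5), (1.0, 2.0), (1.0, -0.3), (1.0, 1.1), (1.0, 0.0)]
beta = (-0.4, 0.8)  # hypothetical coefficients

bern = bernoulli_log_likelihood(beta, ys, xs)
binom = binomial_log_likelihood(beta, [1] * len(ys), ys, xs)
print(bern, binom, isclose(bern, binom))
```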

## Example Data


