In [2]:
import arviz as az
import numpy as np
import pandas as pd
import pymc3 as pm

import matplotlib.pyplot as plt

In [3]:
RANDOM_SEED = 8927
np.random.seed(RANDOM_SEED)

%config InlineBackend.figure_format = 'retina'
%load_ext watermark
az.style.use("arviz-darkgrid")
az.rcParams["stats.hdi_prob"] = 0.89

# Chapter 11 - GOD SPIKED THE INTEGERS

For 11E1-E3, see page 337 of 2nd edition, Overthinking box.

## 11E1. 

**If an event has probability 0.35, what are the log-odds of this event?**

### Answer 


```
odds = p / (1 - p)
log-odds = log(odds)
         = log(0.35 / 0.65)
```

In [5]:
np.log(0.35 / 0.65)

-0.6190392084062235

## 11E2.

**If an event has log-odds 3.2, what is the probability of this event?**

### Answer
Take the same approach as above and solve for p.

```
odds = p / (1 - p)

Multiply both sides by (1 - p) and expand:
odds * (1 - p) = p
odds - odds * p = p

Rearrange to isolate p:
0 = p + odds * p - odds
0 = p * (1 + odds) - odds
odds = p * (1 + odds)

p = odds / (1 + odds)

Substitute odds = exp(log-odds):
p = exp(log-odds) / (1 + exp(log-odds))
```

In [12]:
np.exp(3.2) / (1 + np.exp(3.2))

0.9608342772032357
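
As a quick sanity check on the algebra above, the two transformations can be wrapped in small helper functions and run as a round trip. This is only a sketch; `logit` and `inv_logit` are names chosen here for illustration and are not defined elsewhere in this notebook.

```python
import numpy as np


def logit(p):
    """Map a probability in (0, 1) to log-odds."""
    return np.log(p / (1 - p))


def inv_logit(log_odds):
    """Map log-odds back to a probability (algebraically equal to exp(x) / (1 + exp(x)))."""
    return 1 / (1 + np.exp(-log_odds))


p = inv_logit(3.2)   # answer to 11E2
print(p)
print(logit(p))      # round trip: should recover the log-odds of 3.2
print(logit(0.35))   # answer to 11E1
```

The `1 / (1 + exp(-x))` form is used only because it avoids computing a very large `exp(x)` for large positive inputs; for the values in these exercises either form works.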

## 11E3.

**Suppose that a coefficient in a logistic regression has value 1.7. What does this imply about the proportional change in odds of the outcome?**

### Answer

Exponentiating the coefficient gives the factor by which the odds are multiplied for each 1 unit increase in the predictor variable. In this problem, a coefficient of 1.7 means that a 1 unit increase multiplies the odds by $\exp(1.7) \approx 5.474$. It is important to emphasize that this is a relative (proportional) change: the resulting change in probability depends on the baseline odds.

Standardizing the feature (if continuous) facilitates interpretation: a 1 unit change then means a change of 1 standard deviation. If the variable is categorical and dummy-encoded, a 1 unit change represents the shift from the class assigned 0 to the class assigned 1.

In [6]:
np.exp(1.7)

5.4739473917272
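
To illustrate why the multiplier is a relative number, the sketch below applies the same odds factor to a few baseline probabilities. The baselines (0.01, 0.5, 0.9) and the helper name `prob_after_unit_increase` are assumptions made up for this illustration; they do not come from the book.

```python
import numpy as np


def prob_after_unit_increase(p0, beta):
    """Probability after a 1 unit predictor increase, given baseline probability p0."""
    odds0 = p0 / (1 - p0)          # baseline odds
    odds1 = odds0 * np.exp(beta)   # odds are multiplied by exp(beta)
    return odds1 / (1 + odds1)     # convert back to a probability


beta = 1.7
for p0 in (0.01, 0.5, 0.9):  # hypothetical baseline probabilities
    p1 = prob_after_unit_increase(p0, beta)
    print(f"baseline p = {p0:.2f} -> p after +1 unit = {p1:.3f}")
```

With these assumed baselines, the same odds multiplier moves a baseline of 0.5 to roughly 0.85, while a baseline of 0.01 rises only to about 0.05.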

## 11E4.

**Why do Poisson regressions sometimes require the use of an offset? Provide an example.**

### Answer
See 2nd edition, page 357, section 11.2.3.

In Poisson regression, $\lambda$ can be interpreted both as the expected count and as a rate: the expected number of events per unit of time or distance. When it is treated as a rate, the interval over which events are counted (the exposure) can vary across observations.

The book supposes two different monasteries, one recording completed manuscripts daily and the other recording them weekly. If $\lambda_i$ is the rate and $\tau_i$ the exposure, the expected count is $\mu_i = \lambda_i \tau_i$, so the log link gives $\log \mu_i = \log \tau_i + \alpha + \beta x_i$. The term $\log \tau_i$ is the offset, and including it lets us account for the differences in exposure.

The monastery example uses time as the exposure. Distance might be another exposure. Imagine two cars traveling down a road, each counting out-of-state license plates. One car might tabulate the number of plates every mile, while the other might keep track every kilometer. The use of an offset would be applicable in this case as well.
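
The effect of the exposure can be sketched with a small simulation. The daily rates (1.5 and 0.5), the sample sizes, and all variable names below are assumptions chosen for illustration. For a model with one intercept per monastery and a $\log(\text{exposure})$ offset, the maximum-likelihood rate is the total count divided by the total exposure, so no sampler is needed for this sketch.

```python
import numpy as np

rng = np.random.default_rng(8927)

# Assumed, illustrative daily rates (manuscripts per day)
rate_daily_a = 1.5
rate_daily_b = 0.5

# Monastery A reports one count per day; monastery B reports one count per week
y_a = rng.poisson(rate_daily_a * 1, size=30)
y_b = rng.poisson(rate_daily_b * 7, size=4)

y = np.concatenate([y_a, y_b])
exposure = np.concatenate([np.ones(30), np.full(4, 7.0)])  # days per observation
is_b = np.concatenate([np.zeros(30, dtype=bool), np.ones(4, dtype=bool)])

# Ignoring exposure: mean count per record (per day for A, per week for B)
naive_a = y[~is_b].mean()
naive_b = y[is_b].mean()

# With the offset: log(mu_i) = log(exposure_i) + alpha_group,
# whose MLE is exp(alpha_group) = sum(y) / sum(exposure)
rate_a = y[~is_b].sum() / exposure[~is_b].sum()
rate_b = y[is_b].sum() / exposure[is_b].sum()

print("mean count per record:", naive_a, naive_b)
print("estimated daily rate: ", rate_a, rate_b)
```

Without the exposure, monastery B's weekly totals are not comparable to monastery A's daily counts. Dividing by exposure (equivalently, adding the log exposure as an offset) puts both on a per-day scale.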

# NOTE

Other solutions will be added later; contributions are welcome.

In [14]:
%watermark -n -u -v -iv -w

Last updated: Tue Jan 04 2022

Python implementation: CPython
Python version       : 3.8.6
IPython version      : 7.20.0

numpy     : 1.20.1
pymc3     : 3.11.0
sys       : 3.8.6 | packaged by conda-forge | (default, Jan 25 2021, 23:22:12) 
[Clang 11.0.1 ]
matplotlib: 3.3.4
pandas    : 1.2.1
arviz     : 0.11.1

Watermark: 2.1.0

