$$
\newcommand{\odds}{\textbf{odds}}
\newcommand{\E}{\mathrm{E}}
\newcommand{\P}{\mathbb{P}}
\newcommand{\L}{\mathcal{L}}
$$

## Odds (Ratio)

### Definition

**Odds**[^wiki_odds] are an expression of **relative probabilities**. We define the **odds (in favor)** of an **event** as the **ratio of the probability that the event will happen (success) to the probability that the event will not happen (failure)**. This definition corresponds to a **Bernoulli trial**.

The formula is given as:

$$
\odds(\E) = \dfrac{\P(\E)}{1 - \P(\E)}
$$

[^wiki_odds]: [https://en.wikipedia.org/wiki/Odds](https://en.wikipedia.org/wiki/Odds)

### Example

Consider rolling a fair six-sided die. Then:

- Define the **Event** $\E$ to be $\{6\}$.
- The odds of rolling a $6$ (in favor of $6$) are $\odds(\E) = \dfrac{\P(\E)}{1 - \P(\E)} = \dfrac{1/6}{5/6} = \dfrac{1}{5} = 1 : 5$.

---

- Define the **Event** $\E$ to be $\{2, 6\}$ which means either a $2$ or a $6$.
- The odds of rolling a $2$ or a $6$ are $\odds(\E) = \dfrac{\P(\E)}{1 - \P(\E)} = \dfrac{2/6}{4/6} = \dfrac{2}{4} = 2 : 4$, which simplifies to $1 : 2$.
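The two die examples above can be checked with a minimal sketch. The helper name `odds_in_favor` is an assumption made for illustration, and exact fractions are used so the ratios print in reduced form.

```python
from fractions import Fraction


def odds_in_favor(p):
    """Return the odds in favor, P(E) / (1 - P(E)), for a probability 0 <= p < 1."""
    p = Fraction(p)
    if not 0 <= p < 1:
        raise ValueError("p must satisfy 0 <= p < 1")
    return p / (1 - p)


# Fair six-sided die: each face has probability 1/6.
p_six = Fraction(1, 6)         # event {6}
p_two_or_six = Fraction(2, 6)  # event {2, 6}

print(odds_in_favor(p_six))         # 1/5, i.e. odds of 1 : 5
print(odds_in_favor(p_two_or_six))  # 1/2, i.e. odds of 2 : 4 reduced to 1 : 2
```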

### Conversion to Probability

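Given odds of $a : b$, one can recover $\P(\E)$ as the numerator divided by the sum of the numerator and the denominator, $\P(\E) = \dfrac{a}{a + b}$ (i.e. frequency divided by total). For example, odds of $1 : 5$ give $\P(\E) = \dfrac{1}{1 + 5} = \dfrac{1}{6}$.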
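As a rough illustration of this conversion, the sketch below maps odds written as "in favor : against" back to a probability. The function name `odds_to_probability` and its two-argument form are assumptions chosen for this example.

```python
from fractions import Fraction


def odds_to_probability(in_favor, against):
    """Convert odds 'in_favor : against' to a probability in_favor / (in_favor + against)."""
    if in_favor < 0 or against < 0 or in_favor + against == 0:
        raise ValueError("odds parts must be non-negative and not both zero")
    return Fraction(in_favor, in_favor + against)


print(odds_to_probability(1, 5))  # 1/6, the probability of rolling a 6
print(odds_to_probability(2, 4))  # 1/3, the probability of rolling a 2 or a 6
```

Note that `2 : 4` and `1 : 2` give the same probability, since only the ratio matters.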

## Probability vs Likelihood

### Intuition

The explanation below (including the example) comes from a Cross Validated thread[^what-is-the-difference-between-likelihood-and-probability].

$\P(x|\theta)$ can be seen from two points of view:

- $\P(x|\theta)$: **As a function of $x$, treating $\theta$ as known/observed.** If $\theta$ is not a random variable, then $P(x|\theta)$ is called the (*parameterized*) probability of $x$ given the model parameters $\theta$, which is sometimes also written as $P(x;\theta)$ or $P_{\theta}(x)$.
If $\theta$ is a random variable, as in Bayesian statistics, then $P(x|\theta)$ is a *conditional* probability, defined as $\dfrac{{P(x\cap\theta)}}{{P(\theta)}}$.

- $\L(\theta|x)$: **As a function of $\theta$, treating $x$ as observed.** For example, when you try to find a certain assignment $\hat\theta$ for $\theta$ that maximizes $P(x|\theta)$, then $P(x|\hat\theta)$ is called the *maximum likelihood* of $\theta$ given the data $x$, sometimes written as $\L(\hat\theta|x)$. So, the term likelihood is just shorthand for the probability $P(x|\theta)$ of some fixed data $x$, viewed as it changes when different values are assigned to $\theta$ (e.g. as one traverses the search space of $\theta$ for a good solution). It is therefore often used as an objective function, but also as a performance measure to compare two models, as in [Bayesian model comparison](https://en.wikipedia.org/wiki/Bayes_factor).

[^what-is-the-difference-between-likelihood-and-probability]: [https://stats.stackexchange.com/questions/2641/what-is-the-difference-between-likelihood-and-probability](https://stats.stackexchange.com/questions/2641/what-is-the-difference-between-likelihood-and-probability)

### Example

Suppose you have a coin with probability $p$ of landing heads and $(1-p)$ of landing tails. Let $x=1$ indicate heads and $x=0$ indicate tails. Define $f$ as follows:

$$f(x,p)=p^x (1-p)^{1-x}$$

- $f(x,2/3)$ is the probability of $x$ given $p=2/3$.
- $f(1,p)$ is the likelihood of $p$ given $x=1$.
- Basically, likelihood vs. probability tells you which argument of the density is treated as the variable.

[![][1]][1]

 [1]: https://i.stack.imgur.com/1Dmpu.png
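The coin example can be made concrete with a small sketch. It evaluates the same $f$ in both roles: as a function of $x$ with $p = 2/3$ fixed, and as a function of $p$ with $x = 1$ fixed. The grid of $p$ values is an arbitrary choice for illustration.

```python
from fractions import Fraction


def f(x, p):
    """Bernoulli mass function p^x (1 - p)^(1 - x) for x in {0, 1}."""
    return p**x * (1 - p) ** (1 - x)


# Probability view: p is fixed, x varies. The values sum to 1 over all outcomes.
p_fixed = Fraction(2, 3)
probabilities = {x: f(x, p_fixed) for x in (0, 1)}
print(probabilities)
print(sum(probabilities.values()))  # 1

# Likelihood view: x = 1 is fixed, p varies. Here f(1, p) = p, and the values
# over a grid of p need not sum to 1, so this is not a distribution over p.
grid = [Fraction(k, 4) for k in range(5)]
likelihoods = [f(1, p) for p in grid]
print(likelihoods)
print(sum(likelihoods))  # 5/2
```

The first sum is 1 because $f(\cdot, 2/3)$ is a probability mass function; the second is not, which is one way to see that a likelihood is not a probability distribution over the parameter.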

## MLE

- The likelihood is the joint distribution of the observed data, so when the samples are independent it factorizes into a product of probabilities: $P(x_1, x_2, x_3) = P(x_1) \times P(x_2) \times P(x_3)$.
- Writing $P(\mathbf{x} | \theta)$ for data $\mathbf{x} = (x_1, \ldots, x_n)$ means $P(x_1 \cap x_2 \cap \ldots \cap x_n | \theta)$.
- In dafried, $p(y_n)$ is suddenly treated as the PMF of $y_n$. This is not a coincidence; as d2l puts it: "The short summary is that nothing at all changes, except we replace all the instances of the probability with the probability density."
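To connect these notes to the coin example, here is a minimal sketch of maximum likelihood estimation for a Bernoulli parameter. The sample `data`, the grid resolution, and the helper names are all assumptions made for illustration; independence of the samples is assumed so that the joint likelihood is a product, and a simple grid search stands in for a proper optimizer.

```python
import math

# Hypothetical coin flips (1 = heads, 0 = tails), assumed independent.
data = [1, 0, 1, 1, 0, 1]


def likelihood(p, xs):
    """Joint probability of xs under independence: product of p^x (1 - p)^(1 - x)."""
    return math.prod(p**x * (1 - p) ** (1 - x) for x in xs)


def log_likelihood(p, xs):
    """Log of the joint probability: the product becomes a sum of logs."""
    return sum(x * math.log(p) + (1 - x) * math.log(1 - p) for x in xs)


# Grid search over p in (0, 1), avoiding the endpoints where log(0) is undefined.
grid = [k / 100 for k in range(1, 100)]
p_hat = max(grid, key=lambda p: log_likelihood(p, data))

sample_mean = sum(data) / len(data)
print(p_hat, sample_mean)  # the grid maximizer lies close to the sample mean
```

For the Bernoulli model the maximizer has a closed form, the sample mean, so the grid result should agree with it up to the grid spacing. Working with the log-likelihood rather than the raw product is a common choice because long products of small probabilities underflow.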