# Dirichlet Distributions

To understand Dirichlet distributions, it is essential to first understand categorical distributions.

## Categorical Distributions

Let's work through an example: people's favorite sports teams.

**Problem statement:**

Suppose you're a sports enthusiast and you want to analyze the preferences of a group of people regarding their favorite sports team. You have a dataset of 100 people, and each person has answered the question "What is your favorite sports team?" with one of the following options:

* New York Yankees (NY)
* Boston Red Sox (BOS)
* Los Angeles Dodgers (LAD)
* Chicago Cubs (CHC)
* Other (OTH)

You want to model the probability distribution of favorite sports teams among this group of people.

**Categorical distribution:**

In this case, we can model the favorite sports team as a categorical random variable. The categorical distribution is a discrete probability distribution over a fixed, finite set of categories: it assigns a probability to each category, and these probabilities sum to 1.

In this example, the categorical variable is the favorite sports team, and the possible values are:

* NY (New York Yankees)
* BOS (Boston Red Sox)
* LAD (Los Angeles Dodgers)
* CHC (Chicago Cubs)
* OTH (Other)

The probability distribution of the categorical variable is defined by the probabilities of each possible value. For example, if 30 people in the dataset prefer the New York Yankees, the probability of NY is 0.3 (30/100).

**Estimating the probabilities:**

To estimate the probabilities of each favorite sports team, you can use the dataset of 100 people. For example, you can count the number of people who prefer each team and divide by the total number of people:

* NY: 30/100 = 0.3
* BOS: 20/100 = 0.2
* LAD: 15/100 = 0.15
* CHC: 10/100 = 0.1
* OTH: 25/100 = 0.25

These probabilities can be used to model the categorical distribution of favorite sports teams.
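The counting step above can be sketched in a few lines of Python. This is a minimal illustration that uses the survey counts from the example; the variable names `counts` and `estimated` are chosen here for illustration only.

```python
# Survey counts from the example above (100 respondents in total)
counts = {"NY": 30, "BOS": 20, "LAD": 15, "CHC": 10, "OTH": 25}

# Total number of respondents
total = sum(counts.values())

# Relative frequency of each team, used as the probability estimate
estimated = {team: n / total for team, n in counts.items()}

for team, p in estimated.items():
    print(f"{team}: {p:.2f}")
```

Because every respondent picks exactly one option, the estimated probabilities sum to 1.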

**Using the categorical distribution:**

Once you have estimated the probabilities, you can use the categorical distribution to make predictions or inferences about the dataset. For example:

* You can use the categorical distribution to predict the favorite sports team of a new person who is not in the dataset.
* You can use the categorical distribution to estimate the probability that a person prefers a particular team.
* You can use the categorical distribution to compare the popularity of different teams.
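As a rough sketch of these three uses, the snippet below works directly with the estimated probabilities. The names `most_likely`, `p_ny_or_bos`, and `ranking` are illustrative, and taking the most probable team as the "prediction" for a new person is one simple choice among several.

```python
import numpy as np

teams = ['NY', 'BOS', 'LAD', 'CHC', 'OTH']
probabilities = np.array([0.3, 0.2, 0.15, 0.1, 0.25])

# Prediction for a new person: the team with the highest probability
most_likely = teams[int(np.argmax(probabilities))]

# Probability that a person prefers NY or BOS (mutually exclusive outcomes add)
p_ny_or_bos = probabilities[teams.index('NY')] + probabilities[teams.index('BOS')]

# Teams ordered from most to least popular
ranking = [teams[i] for i in np.argsort(-probabilities)]

print(most_likely)
print(p_ny_or_bos)
print(ranking)
```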

We now define the possible values of the categorical variable (the favorite sports teams) and the probability of each value. Together, these specify a categorical distribution that NumPy's `np.random.choice` function can sample from.

We then generate a random sample of 100 people from the distribution by passing `size=100` and `p=probabilities` to `np.random.choice`. The resulting sample is a NumPy array with 100 elements, where each element is one of the five possible teams.

In [2]:
import numpy as np

# Define the possible values of the categorical variable
teams = ['NY', 'BOS', 'LAD', 'CHC', 'OTH']

# Define the probabilities of each value
probabilities = [0.3, 0.2, 0.15, 0.1, 0.25]



# Generate a Random Sample from the Distribution


In [5]:
dist = np.random.choice(teams, size=100, p=probabilities)
print(dist)

['BOS' 'BOS' 'CHC' 'NY' 'BOS' 'OTH' 'OTH' 'OTH' 'OTH' 'OTH' 'BOS' 'CHC'
 'NY' 'BOS' 'OTH' 'BOS' 'OTH' 'LAD' 'BOS' 'OTH' 'LAD' 'LAD' 'BOS' 'OTH'
 'LAD' 'NY' 'BOS' 'OTH' 'NY' 'NY' 'NY' 'NY' 'LAD' 'NY' 'LAD' 'OTH' 'NY'
 'LAD' 'OTH' 'LAD' 'LAD' 'NY' 'BOS' 'NY' 'NY' 'BOS' 'LAD' 'LAD' 'OTH' 'NY'
 'LAD' 'LAD' 'BOS' 'OTH' 'OTH' 'OTH' 'NY' 'OTH' 'CHC' 'OTH' 'NY' 'LAD'
 'NY' 'BOS' 'NY' 'BOS' 'BOS' 'NY' 'BOS' 'OTH' 'NY' 'OTH' 'OTH' 'OTH' 'CHC'
 'OTH' 'OTH' 'NY' 'NY' 'NY' 'CHC' 'LAD' 'CHC' 'LAD' 'NY' 'BOS' 'BOS' 'NY'
 'OTH' 'BOS' 'BOS' 'NY' 'LAD' 'OTH' 'NY' 'OTH' 'OTH' 'OTH' 'LAD' 'CHC']


The generated data is a random sample from the distribution. As the sample size grows, the proportion of each team in the sample approaches that team's probability.
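One way to see this effect is to draw samples of increasing size and compare the empirical frequencies with the true probabilities. The sketch below uses NumPy's `default_rng` generator rather than the legacy `np.random.choice`; the seed and the sample sizes are arbitrary choices made for illustration.

```python
import numpy as np

# Seed chosen arbitrarily so the run is reproducible
rng = np.random.default_rng(seed=0)

teams = ['NY', 'BOS', 'LAD', 'CHC', 'OTH']
probabilities = [0.3, 0.2, 0.15, 0.1, 0.25]

max_errors = {}
for n in (100, 1_000, 100_000):
    # Draw n people from the categorical distribution
    sample = rng.choice(teams, size=n, p=probabilities)

    # Fraction of the sample that chose each team
    freqs = [np.mean(sample == team) for team in teams]

    # Largest gap between an empirical frequency and its true probability
    max_errors[n] = max(abs(f - p) for f, p in zip(freqs, probabilities))
    print(n, np.round(freqs, 3), f"max error = {max_errors[n]:.4f}")
```

For the largest sample, the frequencies should sit very close to the specified probabilities, while the sample of 100 can deviate noticeably.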

## Conjugate Distributions

In Bayesian statistics, a prior distribution is called conjugate to a likelihood when the posterior distribution obtained after observing data belongs to the same family as the prior. Updating on new data then changes only the parameters of the distribution, not its functional form.

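The Dirichlet distribution is the conjugate prior for the parameters of the categorical distribution. Given a Dirichlet prior over the category probabilities and new categorical observations, the posterior is again a Dirichlet distribution, with each parameter increased by the number of observations in its category.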
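The update can be sketched with the survey counts from the sports team example. The uniform prior (all parameters equal to 1) is an assumption made here for illustration, as are the variable names; `Generator.dirichlet` is NumPy's sampler for the Dirichlet distribution.

```python
import numpy as np

teams = ['NY', 'BOS', 'LAD', 'CHC', 'OTH']

# Assumed prior: Dirichlet(1, 1, 1, 1, 1), i.e. no team is favored in advance
prior_alpha = np.ones(len(teams))

# Observed counts from the survey of 100 people
counts = np.array([30, 20, 15, 10, 25])

# Conjugate update: add the observed counts to the prior parameters
posterior_alpha = prior_alpha + counts

# Posterior mean of each team's probability
posterior_mean = posterior_alpha / posterior_alpha.sum()

# Each draw from the posterior is itself a vector of team probabilities
rng = np.random.default_rng(seed=0)  # arbitrary seed for reproducibility
samples = rng.dirichlet(posterior_alpha, size=3)

print(dict(zip(teams, np.round(posterior_mean, 3))))
print(samples)
```

Each row of `samples` is a plausible set of team probabilities given the prior and the data, so the spread across rows gives a sense of how uncertain the estimates still are.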