# Copula - Multivariate joint distribution

In [None]:
import numpy as np
from scipy import stats
import seaborn as sns
import matplotlib.pyplot as plt

from statsmodels.distributions.copula.api import GumbelCopula, CopulaDistribution

When modeling a system, several parameters are often involved. Each of these parameters can be described by a probability density function (PDF). To generate a new set of parameter values, we need to be able to sample from these distributions, also called marginals. There are two main cases: *(i)* the PDFs are independent; *(ii)* there is a dependency between them. One way to model the dependency is to use a **copula**.
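
For the bivariate case, the role of the copula can be written compactly using Sklar's theorem. The notation here is introduced for illustration and is not part of the original text: $F$ is the joint cumulative distribution function (CDF), $F_1$ and $F_2$ are the marginal CDFs, and $C$ is the copula.

$$
F(x_1, x_2) = C\bigl(F_1(x_1),\, F_2(x_2)\bigr)
$$

In the independent case the copula reduces to a product, $C(u, v) = u\,v$, so the joint CDF is simply $F_1(x_1)\,F_2(x_2)$.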

## Sampling from a copula

Let's use a bivariate example and first assume that we have prior knowledge of how to model the dependence between our two variables.

In this case, we use the Gumbel copula. We can visualize its two-dimensional PDF.

In [None]:
copula = GumbelCopula()
_ = copula.plot_pdf()  # returns a matplotlib figure
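
To make the object behind this plot more concrete, here is a minimal sketch of the bivariate Gumbel copula CDF written directly in NumPy. The function name `gumbel_cdf` and the value `theta=2.0` are assumptions chosen for illustration; this is not the statsmodels implementation.

```python
import numpy as np


def gumbel_cdf(u, v, theta=2.0):
    """Bivariate Gumbel copula CDF, valid for theta >= 1 (hypothetical helper)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    # C(u, v) = exp(-[(-ln u)^theta + (-ln v)^theta]^(1/theta))
    s = (-np.log(u)) ** theta + (-np.log(v)) ** theta
    return np.exp(-s ** (1.0 / theta))


u = np.array([0.2, 0.5, 0.9])
v = np.array([0.3, 0.5, 0.8])

independent = gumbel_cdf(u, v, theta=1.0)  # theta = 1 gives the product u * v
dependent = gumbel_cdf(u, v, theta=2.0)    # theta > 1 adds positive dependence
boundary = gumbel_cdf(u, 1.0, theta=2.0)   # C(u, 1) = u for any copula
```

With `theta=1` the function reduces to the independent case, and larger values of `theta` strengthen the dependence, most visibly in the upper tail.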

We can also draw samples from the copula.

In [None]:
sample = copula.random(10000)
h = sns.jointplot(x=sample[:, 0], y=sample[:, 1], kind='hex')
h.set_axis_labels('X1', 'X2', fontsize=16)

Let's come back to our two variables for a moment. Here we assume that one is gamma distributed and the other normally distributed. If they were independent of each other, we could sample from each PDF individually. Here we use a convenience class to do the same operation.

In [None]:
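# Marginals: a gamma distribution (shape 2) and a standard normal
marginals = [stats.gamma(2), stats.norm]
# No copula here: the two variables are treated as independent
joint_dist = CopulaDistribution(marginals=marginals, copula=None)
sample = joint_dist.random(512)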
h = sns.jointplot(x=sample[:, 0], y=sample[:, 1], kind='scatter')
h.set_axis_labels('X1', 'X2', fontsize=16)
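
For comparison, the independent case can be sketched by hand with SciPy alone: each marginal is sampled separately and the columns are stacked. The seed and variable names below are arbitrary choices for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
n = 512

# Draw each variable from its own marginal, ignoring the other one
x1 = stats.gamma(2).rvs(size=n, random_state=rng)
x2 = stats.norm.rvs(size=n, random_state=rng)
independent_sample = np.column_stack([x1, x2])

# The rank correlation should be close to zero for independent draws
rho, _ = stats.spearmanr(x1, x2)
```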

Above, we expressed the dependency between our variables using a copula. We can now use this copula to sample a new set of observations with the same convenience class.

In [None]:
joint_dist = CopulaDistribution(marginals=marginals, copula=copula)
sample = joint_dist.random(512)
h = sns.jointplot(x=sample[:, 0], y=sample[:, 1], kind='scatter')
h.set_axis_labels('X1', 'X2', fontsize=16)

There are two things to note here: *(i)* as in the independent case, the marginals correctly show a gamma and a normal distribution; *(ii)* the dependence between the two variables is visible.
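
The general recipe behind this kind of sampling can be sketched by hand: draw dependent uniforms from a copula, then push each column through the inverse CDF (`ppf`) of its marginal. The sketch below uses a Gaussian copula rather than the Gumbel copula, because it needs only NumPy and SciPy. The correlation `rho = 0.8` and the seed are assumed values, and this is not how statsmodels is implemented internally.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # arbitrary seed
n = 2000
rho = 0.8  # assumed correlation of the underlying Gaussian

# Step 1: correlated standard normals
cov = np.array([[1.0, rho], [rho, 1.0]])
z = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=n)

# Step 2: map to uniforms; u now carries the Gaussian copula dependence
u = stats.norm.cdf(z)

# Step 3: apply the inverse CDF of each marginal
x1 = stats.gamma(2).ppf(u[:, 0])
x2 = stats.norm.ppf(u[:, 1])
manual_sample = np.column_stack([x1, x2])

# Rank correlation is preserved by the monotonic marginal transforms
rank_corr, _ = stats.spearmanr(x1, x2)
```

Because the marginal transforms are monotonic, they change the shape of each variable but leave the rank dependence set by the copula untouched, which matches the two observations above.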