# Copula Short Course - Introduction

This notebook provides a motivating introduction to dependence modeling. We will explore:

1. Real-world financial data relationships (stock returns)
2. Scientific data relationships (wine characteristics)
3. Visualizing dependence with scatter plots and fitted distributions

By the end of this notebook, you will understand why we need sophisticated tools like copulas to model dependence between variables.

---

## Setup and Imports

In [None]:
# Environment configuration for Jupyter
%load_ext autoreload
%autoreload 2
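# Interactive figures for the classic Jupyter Notebook; in JupyterLab use
# `%matplotlib widget` (requires ipympl) or `%matplotlib inline` instead
%matplotlib notebook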

# Visualization libraries
import matplotlib.pyplot as plt
import seaborn as sns

# Numerical computing
import numpy as np
from scipy.stats import multivariate_normal as mvn

# Machine learning utilities
from sklearn import linear_model
import sklearn.datasets as toy_datasets

# Financial data
import yfinance as yf

## Example 1: Financial Data - Stock Returns

One of the most important applications of dependence modeling is in finance. Here we examine the relationship between daily returns of two stocks:
- **AAPL** (Apple Inc.) - a technology company
- **XOM** (Exxon Mobil) - an energy company

Understanding how these returns move together (or don't) is crucial for portfolio diversification and risk management.

In [None]:
# Download historical stock data for 2021 (the end date is exclusive)
data = yf.download(['AAPL', 'XOM'], start='2021-01-01', end='2021-12-31')

# Remove any missing values
data = data.dropna()

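# Calculate daily price changes in dollars (first differences of the closing price).
# These are absolute changes, used below as a simple stand-in for returns;
# the first value of each series is NaN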
ticker1 = data['Close']['AAPL'].diff()
ticker2 = data['Close']['XOM'].diff()

# Assign to variables for plotting
xx = ticker1
yy = ticker2
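
The cell above works with dollar price changes, while the surrounding text speaks of returns. The two are related but not identical. The sketch below contrasts them on a short, hypothetical price series (the values are illustrative, not market data). For a pandas Series such as `data['Close']['AAPL']`, `Series.pct_change()` gives the simple return directly.

```python
import numpy as np

# Hypothetical closing prices (illustrative values, not real market data)
prices = np.array([100.0, 102.0, 101.0, 105.0])

# Absolute price change in dollars, as computed in the notebook with .diff()
price_change = np.diff(prices)

# Simple return: change relative to the previous day's price (unitless)
simple_return = np.diff(prices) / prices[:-1]

# Log return: difference of log prices (unitless, additive over time)
log_return = np.diff(np.log(prices))

print(price_change.tolist())  # [2.0, -1.0, 4.0]
print(simple_return)
print(log_return)
```

Relative returns put stocks with very different price levels on a comparable scale, which matters when the goal is to compare how two series move together.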

### Scatter Plot of Daily Returns

A scatter plot reveals the joint behavior of the two stock returns. Each point represents one trading day.

In [None]:
# Create scatter plot of daily returns
plt.figure()
sns.scatterplot(x=xx, y=yy)
plt.xlabel('AAPL Daily Price Change ($)')
plt.ylabel('XOM Daily Price Change ($)')
plt.title('AAPL vs XOM Daily Price Changes (2021)')

### Joint Distribution with Linear Regression

The joint plot below shows the scatter plot with marginal distributions (histograms on the axes) and a linear regression fit. The regression line gives us a first approximation of the relationship.

In [None]:
# Create joint plot with regression line and marginal distributions
g = sns.jointplot(x=xx, y=yy, kind='reg', truncate=False)

# Fit a linear regression model
regr = linear_model.LinearRegression()
regr.fit(xx.values[1:].reshape(-1, 1), yy.values[1:].reshape(-1, 1))

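# Add regression equation as text annotation.
# The model was fit with a 2-D target, so intercept_ has shape (1,) and
# coef_ has shape (1, 1); index them to get plain scalars for formatting.
props = dict(boxstyle='round', alpha=0.5, color=sns.color_palette()[0])
textstr = 'y = %.2f + %.2fx' % (regr.intercept_[0], regr.coef_[0, 0])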
g.ax_joint.text(0.65, 0.97, textstr, transform=g.ax_joint.transAxes, 
                fontsize=14, bbox=props)
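
To see what the regression line summarizes, it helps to recall that the least-squares slope is the sample covariance divided by the variance of the predictor, or equivalently the Pearson correlation scaled by the ratio of standard deviations. The following is a minimal sketch on synthetic data; the sample, the seed, and the variable names are assumptions made for illustration and are not part of the stock example.

```python
import numpy as np

# Synthetic sample (an assumption for illustration, not the stock data)
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = 0.5 * x + rng.normal(scale=0.8, size=500)

# Least-squares slope and intercept from sample moments
cov = np.cov(x, y)                      # 2x2 sample covariance matrix
slope = cov[0, 1] / cov[0, 0]           # cov(x, y) / var(x)
intercept = y.mean() - slope * x.mean()

# The same slope expressed through the Pearson correlation
r = np.corrcoef(x, y)[0, 1]
slope_from_r = r * y.std(ddof=1) / x.std(ddof=1)

# Cross-check against NumPy's degree-1 polynomial fit
fit_slope, fit_intercept = np.polyfit(x, y, 1)
print(slope, slope_from_r, fit_slope)
```

The line therefore carries only first- and second-moment information: two datasets with the same means, variances, and correlation get the same fit, however different their scatter plots look.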

### Fitting a Bivariate Gaussian Distribution

A common approach is to fit a bivariate Gaussian (normal) distribution to the data. The contours show lines of equal probability density. 

**Key Question:** Does this Gaussian assumption adequately capture the true dependence structure? This is one of the central questions that copulas help us address.

In [None]:
# Create scatter plot with bivariate Gaussian contours
plt.figure()
sns.scatterplot(x=xx, y=yy)

# Compute covariance matrix from the data (skip first NaN value)
R = np.cov(np.vstack((xx.values[1:], yy.values[1:])))

# Create bivariate normal distribution with sample mean and covariance
# (nanmean skips the leading NaN produced by .diff())
rv = mvn([np.nanmean(xx), np.nanmean(yy)], R)

# Create grid for contour plot
N = 200
X = np.linspace(np.min(xx), np.max(xx), N)
Y = np.linspace(np.min(yy), np.max(yy), N)
X, Y = np.meshgrid(X, Y)
pos = np.dstack((X, Y))

# Evaluate PDF on the grid
Z = rv.pdf(pos)

# Plot contours of the fitted Gaussian
plt.contour(X, Y, Z)
plt.xlabel('AAPL Daily Price Change ($)')
plt.ylabel('XOM Daily Price Change ($)')
plt.title('Stock Returns with Fitted Bivariate Gaussian')
plt.tight_layout()
plt.show()
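
One rough way to probe the key question above is to count how many points fall outside an outer contour of the fitted Gaussian. If the data were truly bivariate Gaussian, the squared Mahalanobis distance would follow a chi-square distribution with 2 degrees of freedom, so about 0.1% of points should lie outside the 99.9% ellipse. The sketch below is an illustration under stated assumptions: the helper `gaussian_tail_fraction` is hypothetical, and both samples are synthetic rather than the stock data.

```python
import numpy as np

def gaussian_tail_fraction(x, y, level=0.999):
    """Fraction of points outside the fitted Gaussian's `level` ellipse."""
    data = np.column_stack((x, y))
    centered = data - data.mean(axis=0)
    cov = np.cov(data, rowvar=False)
    # Squared Mahalanobis distance of every point from the sample mean
    d2 = np.einsum('ij,jk,ik->i', centered, np.linalg.inv(cov), centered)
    # Chi-square with 2 dof has CDF 1 - exp(-q / 2), so its quantile is closed form
    threshold = -2.0 * np.log(1.0 - level)
    return np.mean(d2 > threshold)

rng = np.random.default_rng(0)
n = 20000

# Synthetic Gaussian sample: the tail fraction should be near 1 - level
gauss = rng.multivariate_normal([0, 0], [[1.0, 0.5], [0.5, 1.0]], size=n)
frac_gauss = gaussian_tail_fraction(gauss[:, 0], gauss[:, 1])

# Synthetic heavy-tailed sample (Student t, 3 degrees of freedom)
heavy = rng.standard_t(df=3, size=(n, 2))
frac_heavy = gaussian_tail_fraction(heavy[:, 0], heavy[:, 1])

print(frac_gauss, frac_heavy)
```

A tail fraction well above `1 - level` suggests that extreme observations occur more often than the fitted Gaussian allows. Applying the helper to the stock series would require dropping the leading NaN first, for example with `xx.values[1:]`.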

---

## Example 2: Scientific Data - Wine Characteristics

Dependence modeling is not limited to finance. Here we examine the relationship between two chemical properties of wine from the UCI Wine dataset:
- **Flavanoids** - a type of phenolic compound
- **Color Intensity** - a measure of the wine's color depth

This example shows that data from a very different domain can exhibit equally interesting dependence structures.

In [None]:
# Load the wine dataset from scikit-learn
X, y = toy_datasets.load_wine(return_X_y=True)

# Extract two features of interest
xx = X[:, 6]  # Flavanoids measurement
yy = X[:, 9]  # Color intensity measurement

# Create scatter plot
plt.figure()
sns.scatterplot(x=xx, y=yy)

# Fit bivariate Gaussian
R = np.cov(np.vstack((xx, yy)))
rv = mvn([np.mean(xx), np.mean(yy)], R)

# Create grid for contour plot
N = 200
X_grid = np.linspace(np.min(xx), np.max(xx), N)
Y_grid = np.linspace(np.min(yy), np.max(yy), N)
X_grid, Y_grid = np.meshgrid(X_grid, Y_grid)
pos = np.dstack((X_grid, Y_grid))
Z = rv.pdf(pos)

# Plot contours and labels
plt.contour(X_grid, Y_grid, Z)
plt.xlabel('Flavanoids')
plt.ylabel('Color Intensity')
plt.title('Wine Characteristics with Fitted Bivariate Gaussian')
plt.tight_layout()
plt.show()

---

## Key Takeaways

1. **Dependence is everywhere** - From financial markets to scientific measurements, understanding how variables relate is fundamental.

2. **Visualizing dependence** - Scatter plots and joint distributions help us see the relationship between variables.

3. **The Gaussian assumption** - While convenient, assuming bivariate Gaussian distributions may not capture the true dependence structure, especially:
   - In the tails (extreme events)
   - For non-linear relationships
   - When dependence structure varies across the range of values

4. **Why copulas?** - Copulas provide a flexible framework to model dependence separately from marginal distributions, allowing us to capture complex dependence patterns.
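
The idea of separating dependence from the marginals can be previewed with rank-based pseudo-observations, which map each variable onto (0, 1) and discard its marginal shape. The following is a minimal sketch, assuming continuous data with no ties; the helper `pseudo_observations` and the synthetic sample are hypothetical and serve only to illustrate the point.

```python
import numpy as np

def pseudo_observations(values):
    """Map a sample to (0, 1) through its ranks: rank / (n + 1)."""
    values = np.asarray(values)
    ranks = np.argsort(np.argsort(values)) + 1  # ranks 1..n, assumes no ties
    return ranks / (len(values) + 1)

# Synthetic dependent sample (an assumption for illustration)
rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = 0.7 * x + rng.normal(scale=0.5, size=1000)

u = pseudo_observations(x)
v = pseudo_observations(y)

# Changing the marginal with a strictly increasing map leaves the ranks untouched
u_exp = pseudo_observations(np.exp(x))

# Pearson correlation reacts to the change of marginal ...
pearson_raw = np.corrcoef(x, y)[0, 1]
pearson_exp = np.corrcoef(np.exp(x), y)[0, 1]

# ... while the correlation of the pseudo-observations (Spearman's rho) does not
rank_corr = np.corrcoef(u, v)[0, 1]
rank_corr_exp = np.corrcoef(u_exp, v)[0, 1]

print(pearson_raw, pearson_exp)
print(rank_corr, rank_corr_exp)
```

A scatter plot of `u` against `v` shows the dependence pattern alone, which is the object a copula describes.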

---

**Next:** In the following notebooks, we will examine the Pearson correlation coefficient in detail, understand its limitations, and introduce more sophisticated dependence measures.