## Tutorial 5. Correlation and Covariance

This tutorial is available at `tutorial/05_correlation_and_covariance.ipynb` in the HEaaN.Stat Docker image.

The HEaaN.Stat SDK offers two ways to compute correlation and covariance:

1. Directly calculate the correlation or covariance between two columns.
2. Build a covariance or correlation matrix for multiple columns using the `cov()` or `corr()` methods of `HEFrame`.
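For reference, the standard sample definitions are shown below. Entry $(i, j)$ of a covariance or correlation matrix is the same pairwise statistic applied to columns $i$ and $j$. This tutorial does not state whether HEaaN.Stat divides by $n-1$ or by $n$, so the $n-1$ denominator (the pandas default) is an assumption here.

$$
\operatorname{cov}(x, y) = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y}), \qquad
\operatorname{corr}(x, y) = \frac{\operatorname{cov}(x, y)}{\sqrt{\operatorname{cov}(x, x)\,\operatorname{cov}(y, y)}}
$$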

### Step 1: Import HEaaN.Stat SDK and Create a Context
Let’s start by importing the necessary libraries and setting up the HEaaN.Stat context, which will manage encryption keys and parameters.

In [None]:
import pandas as pd
import heaan_stat

# Initialize the context with default parameters
context = heaan_stat.Context.from_args()

### Step 2: Create the DataFrame and Convert to HEFrame
We will create a pandas `DataFrame` with three columns. Column 1 and Column 2 contain numeric data, while Column 3 contains categorical data encoded as integers.

In [None]:
from heaan_stat import HEFrame

# Set the data values for each column
a = [10, 15, 9, 12, 11, 14, 10]
b = [11, 16, 10, 13, 12, 15, 11]
c = [1, 5, 3, 2, 4, 3, 2]

# Create a Pandas DataFrame
df = pd.DataFrame({
    'Column 1': a,
    'Column 2': b,
    'Column 3': c
})

# Convert the DataFrame to an encrypted HEFrame
hf = HEFrame(context, df, encrypt_columns=True)

### Step 3: Calculate Covariance and Correlation Directly
We can calculate the covariance and correlation between two columns (e.g., Column 1 and Column 2) directly using the `cov()` and `corr()` methods on `HESeries`.

In [None]:
col1 = hf["Column 1"]
col2 = hf["Column 2"]

# Calculate covariance and correlation
cov = col1.cov(col2)
corr = col1.corr(col2)

# Decrypt and display the results
print(f"Covariance: {cov.decrypt_decode()}")
print(f"Correlation: {corr.decrypt_decode()}")
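Encrypted results can only be checked after decryption, so it helps to have plaintext values to compare against. Every value in Column 2 is exactly one more than the matching value in Column 1, so the correlation should be 1 and the covariance should equal the variance of Column 1. The sketch below computes these reference values in pure Python. It assumes the sample ($n-1$) denominator, and the helper names are hypothetical, not part of HEaaN.Stat. Because HEaaN uses approximate arithmetic, expect the decrypted values to differ slightly from these.

```python
# Plaintext reference for the pairwise statistics in Step 3.
# Assumption: HEaaN.Stat uses the sample (n - 1) denominator, like pandas.
import math

a = [10, 15, 9, 12, 11, 14, 10]
b = [11, 16, 10, 13, 12, 15, 11]

def sample_cov(x, y):
    """Sample covariance with an n - 1 denominator."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)

def pearson_corr(x, y):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    return sample_cov(x, y) / math.sqrt(sample_cov(x, x) * sample_cov(y, y))

print(f"Reference covariance: {sample_cov(a, b):.4f}")    # 4.9524 (= 104/21)
print(f"Reference correlation: {pearson_corr(a, b):.4f}")  # 1.0000, since b = a + 1
```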

### Step 4: Create Covariance and Correlation Matrix
We can also compute a covariance or correlation matrix over multiple columns by calling the `cov()` and `corr()` methods directly on the `HEFrame`. Each matrix entry is the statistic for one pair of columns, so all pairs are computed in a single call instead of one pair at a time.

In [None]:
# Use cov() and corr() methods of HEFrame
cov_frame = hf.cov()
corr_frame = hf.corr()
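As with the pairwise case, a plaintext reference makes the decrypted matrices easier to check. The sketch below builds the 3 x 3 covariance and correlation matrices for the three columns in pure Python. It again assumes the sample ($n-1$) denominator, and the variable names are illustrative only.

```python
# Plaintext reference for the 3 x 3 matrices produced by hf.cov() and hf.corr().
# Assumption: sample (n - 1) denominator, as in pandas DataFrame.cov().
import math

columns = {
    "Column 1": [10, 15, 9, 12, 11, 14, 10],
    "Column 2": [11, 16, 10, 13, 12, 15, 11],
    "Column 3": [1, 5, 3, 2, 4, 3, 2],
}

def sample_cov(x, y):
    """Sample covariance with an n - 1 denominator."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)

names = list(columns)

# Entry (i, j) is the covariance between column i and column j.
cov_matrix = [[sample_cov(columns[r], columns[c]) for c in names] for r in names]

# Normalize each covariance by the two standard deviations to get correlations.
corr_matrix = [
    [cov_matrix[i][j] / math.sqrt(cov_matrix[i][i] * cov_matrix[j][j])
     for j in range(len(names))]
    for i in range(len(names))
]

for name, row in zip(names, corr_matrix):
    print(f"{name:>9}: " + "  ".join(f"{v:7.4f}" for v in row))
```

Both matrices are symmetric with a correlation of 1 on the diagonal, which is a quick sanity check on the decrypted output as well.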

### Step 5: Visualize Covariance and Correlation Matrix
The HEaaN.Stat SDK can visualize the covariance and correlation matrices with the `plot()` method. After decryption, `to_frame()` also displays the results as a table.

Covariance Matrix:

In [None]:
# Decrypt, plot, and display the covariance matrix
cov_frame.decrypt()
cov_frame.plot()
cov_frame.to_frame()

Correlation Matrix:

In [None]:
# Decrypt, plot, and display the correlation matrix
corr_frame.decrypt()
corr_frame.plot()
corr_frame.to_frame()
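Because decrypted values are approximate, comparing them to a plaintext reference works best with a tolerance rather than exact equality. The helper below is a hypothetical sketch, not a HEaaN.Stat function. It assumes you have already converted the decrypted table to a nested list of floats (for example with `.values.tolist()`, if `to_frame()` returns a pandas `DataFrame`, which is itself an assumption). The tolerance `1e-3` is an illustrative choice.

```python
# Hypothetical helper for comparing a decrypted matrix with a plaintext reference.
# Assumption: decrypted values are already a nested list of floats.

def max_abs_diff(decrypted, reference):
    """Largest element-wise absolute difference between two equally shaped matrices."""
    if len(decrypted) != len(reference):
        raise ValueError("matrices have different numbers of rows")
    worst = 0.0
    for row_d, row_r in zip(decrypted, reference):
        if len(row_d) != len(row_r):
            raise ValueError("matrices have different numbers of columns")
        for d, r in zip(row_d, row_r):
            worst = max(worst, abs(d - r))
    return worst

def matches(decrypted, reference, tol=1e-3):
    """True if every entry agrees within tol (an illustrative tolerance)."""
    return max_abs_diff(decrypted, reference) <= tol

# Usage with made-up numbers standing in for decrypted output:
reference = [[1.0, 0.5], [0.5, 1.0]]
decrypted = [[0.99998, 0.50003], [0.50003, 1.00001]]
print(matches(decrypted, reference))  # True
```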