## Calculating Statistical Values for Encrypted Multi-Column Data

This tutorial demonstrates how to calculate statistical values for encrypted multi-column data using HEaaN.Stat. You can run it from the `tutorial/04_heframe_stats.ipynb` notebook in the HEaaN.Stat Docker image.

### Step 1: Import HEaaN.Stat SDK and Create a Context
We start by importing the necessary libraries and creating a context for HEaaN.Stat. This context manages the encryption keys and internal parameters for homomorphic encryption.

```python
import numpy as np
import pandas as pd
import heaan_stat

# Initialize context with default parameters
context = heaan_stat.Context.from_args()
```

### Step 2: Statistical Functions on Multi-Column Data
HEaaN.Stat supports statistical functions such as sum, mean, variance, standard deviation, standard error, coefficient of variation, skewness, and kurtosis for multi-column data stored in an `HEFrame`. When applying these functions, only numerical columns are processed.
Let’s start by creating a sample pandas `DataFrame`, which we will then convert into an `HEFrame`:

```python
from heaan_stat import HEFrame

# Create a sample DataFrame
df = pd.DataFrame({
    "Column 1": pd.Series([-5, -4, -3, -2, -1]),
    "Column 2": pd.Series([9, 7, 8, 0, 6]),
    "Column 3": pd.Series(["a", "a", "b", "c", "b"], dtype="category")
})

df  # Display the DataFrame
```
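Because only numerical columns are processed, the categorical `Column 3` is left out of the results. Before encrypting, it can help to compute plaintext reference values to compare against the decrypted output. The sketch below does this with pandas on the same sample data. It assumes pandas' default conventions (sample statistics with `ddof=1`, excess kurtosis); this tutorial does not state which conventions HEaaN.Stat uses, so a mismatch may come from that choice rather than from encryption.

```python
import pandas as pd

# Same sample data as in the cell above
df = pd.DataFrame({
    "Column 1": pd.Series([-5, -4, -3, -2, -1]),
    "Column 2": pd.Series([9, 7, 8, 0, 6]),
    "Column 3": pd.Series(["a", "a", "b", "c", "b"], dtype="category"),
})

# Keep only the numerical columns; the categorical Column 3 is dropped
numeric = df.select_dtypes(include="number")

# Plaintext reference values, one row per numeric column.
# pandas defaults: sample statistics (ddof=1), excess kurtosis.
reference = pd.DataFrame({
    "sum": numeric.sum(),
    "mean": numeric.mean(),
    "var": numeric.var(),
    "std": numeric.std(),
    "sem": numeric.sem(),
    "cv": numeric.std() / numeric.mean(),  # coefficient of variation: std / mean
    "skew": numeric.skew(),
    "kurt": numeric.kurt(),
})
print(reference)
```

Since `Column 1` has a negative mean, its coefficient of variation comes out negative; the coefficient of variation is usually interpreted for positive-valued data.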

Next, convert this `DataFrame` into an `HEFrame` and apply some basic statistical functions like sum and variance.

```python
hf = HEFrame(context, df, encrypt_columns=True)
hf_sum = hf.sum()  # Sum of each numeric column (named hf_sum to avoid shadowing Python's built-in sum)
```

### Step 3: Visualizing Results
HEaaN.Stat allows you to visualize the results of statistical functions using the `plot()` method. Here is how to visualize the sum:

```python
hf_sum.decrypt()  # Decrypt the results
hf_sum.plot()  # Plot the sum
```

Similarly, let’s calculate and visualize the variance:

```python
var = hf.var()  # Calculate variance
var.decrypt()  # Decrypt the results
var.plot()  # Plot the variance
```
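Variance has two common conventions: the sum of squared deviations from the mean is divided either by n (population variance) or by n - 1 (sample variance). The tutorial does not say which one `hf.var()` uses, so when comparing a decrypted result with a plaintext value it helps to know both. The NumPy sketch below computes both for the two numeric columns of the sample data.

```python
import numpy as np

columns = {
    "Column 1": np.array([-5, -4, -3, -2, -1]),
    "Column 2": np.array([9, 7, 8, 0, 6]),
}

results = {}
for name, values in columns.items():
    # Sum of squared deviations from the column mean
    ss = np.sum((values - values.mean()) ** 2)
    population_var = ss / len(values)    # same as np.var(values, ddof=0)
    sample_var = ss / (len(values) - 1)  # same as np.var(values, ddof=1)
    results[name] = (population_var, sample_var)
    print(f"{name}: population={population_var}, sample={sample_var}")
```

For `Column 1` the squared deviations sum to 10, giving 2.0 (population) or 2.5 (sample); for `Column 2` they sum to 50, giving 10.0 or 12.5.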

### Step 4: Grouping Data with groupby()
HEaaN.Stat supports the `groupby()` operation, which splits an `HEFrame` into groups based on the values of one or more columns. After grouping, you can apply aggregate functions such as sum, mean, variance, standard deviation, standard error, skewness, and kurtosis to each group.
Let’s group the data by the values in `Column 3`:

```python
hg = hf.groupby("Column 3")  # Group rows by the values in Column 3
```
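To check decrypted group-wise statistics, the same grouping can be reproduced in plaintext with pandas. This sketch is only a reference and assumes pandas' default sample conventions (`ddof=1`), which may differ from HEaaN.Stat's. Group c contains a single row, so its sample variance, standard deviation, and standard error are undefined (`NaN` in pandas); how HEaaN.Stat reports such a group is not covered in this tutorial.

```python
import pandas as pd

df = pd.DataFrame({
    "Column 1": pd.Series([-5, -4, -3, -2, -1]),
    "Column 2": pd.Series([9, 7, 8, 0, 6]),
    "Column 3": pd.Series(["a", "a", "b", "c", "b"], dtype="category"),
})

# Group rows by Column 3; observed=True keeps only categories present in the data
grouped = df.groupby("Column 3", observed=True)[["Column 1", "Column 2"]]

# Per-group statistics on the numeric columns (sample conventions, ddof=1)
group_stats = grouped.agg(["mean", "var", "std", "sem"])
print(group_stats)
```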

Now, let’s review the original DataFrame before calculating the group-wise statistics:

```python
df  # Display the original DataFrame
```

`hg.sum()` computes the sum of each numeric column within each group, where the groups are defined by the values of Column 3:

- Group a consists of the rows with indices 0 and 1.
- Group b consists of the rows with indices 2 and 4.
- Group c consists of the row with index 3.
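This partition can be confirmed in plaintext: pandas' `GroupBy.groups` attribute maps each group label to the row indices it contains. The sketch below is a pandas illustration of the same grouping, not a description of how HEaaN.Stat forms groups internally.

```python
import pandas as pd

df = pd.DataFrame({
    "Column 1": pd.Series([-5, -4, -3, -2, -1]),
    "Column 2": pd.Series([9, 7, 8, 0, 6]),
    "Column 3": pd.Series(["a", "a", "b", "c", "b"], dtype="category"),
})

# Map each category of Column 3 to the list of row indices in that group
groups = {
    key: idx.tolist()
    for key, idx in df.groupby("Column 3", observed=True).groups.items()
}
print(groups)
```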

Let’s calculate and visualize the sum for each group:

```python
group_sum = hg.sum()  # Calculate the sum for each group
group_sum.decrypt()  # Decrypt the results
group_sum.plot()  # Plot the group-wise sum
```
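After decryption, the group-wise sums can be compared against a plaintext reference. The pandas sketch below sums the numeric columns within each group of Column 3. Matching values indicate that the encrypted computation agrees with plaintext arithmetic on this sample, allowing for small approximation errors in the decrypted values.

```python
import pandas as pd

df = pd.DataFrame({
    "Column 1": pd.Series([-5, -4, -3, -2, -1]),
    "Column 2": pd.Series([9, 7, 8, 0, 6]),
    "Column 3": pd.Series(["a", "a", "b", "c", "b"], dtype="category"),
})

# Sum each numeric column within each group of Column 3
group_sum_plain = df.groupby("Column 3", observed=True)[["Column 1", "Column 2"]].sum()
print(group_sum_plain)
```

For example, group a sums rows 0 and 1, giving -5 + (-4) = -9 for Column 1 and 9 + 7 = 16 for Column 2.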