# Custom Loss
The SEMNAN Solver minimizes the distance between the sample covariance
and the covariance induced by the model. There is more than one way to compute
this distance.

In this notebook we go through different functions for computing it, which we call loss functions.
Let's start with the example from the [introduction](introduction.ipynb).


In [1]:
import torch
import semnan_cuda as sc

device = torch.device("cuda")

struct = torch.tensor([
        [1, 1, 1, 0],
        [0, 1, 0, 1],
        [1, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 1, 0],
        [0, 0, 0, 1],
        [0, 1, 1, 1],  # V_X
        [0, 0, 1, 0],  # V_BP
        [0, 0, 0, 1],  # V_BMI
        [0, 0, 0, 0],  # V_Y
    ], dtype=torch.bool)

sample_covariance = torch.tensor([
        [2,  3,  6,  8],
        [3,  7, 12, 16],
        [6, 12, 23, 30],
        [8, 16, 30, 41],
    ], device=device)

Before anything else, we wrap the training code in a function so that we can reuse it.

In [2]:
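def parametrize(semnan, max_iterations, min_error):
    optim = torch.optim.Adamax([semnan.weights], lr=0.001)
    # Report progress roughly ten times over the run (at least every iteration).
    report_every = max(1, max_iterations // 10)

    for i in range(max_iterations):
        # Run the forward pass and read off the current loss value.
        semnan.forward()
        error = semnan.loss().item()

        # Stop as soon as the loss is small enough.
        if error < min_error:
            break

        # Run the backward pass and take one optimizer step.
        semnan.backward()
        optim.step()

        if i % report_every == 0:
            print(f"iteration={i:<10} loss={error:<15.5}")
    else:
        # Reached only when the loop finishes without hitting `break`.
        print("Did not converge in the maximum number of iterations!")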

We also configure the parametrization options.

In [3]:
max_iterations = 10000
min_error = 1.0e-7

Two loss functions are built in: `KullbackLeibler` and `Bhattacharyya`.
`KullbackLeibler` is the default loss function, and an AMASEM parametrized with it is the
maximum likelihood estimate of the AMASEM.

In [4]:
semnan = sc.SEMNANSolver(
    struct,
    sample_covariance=sample_covariance,
    loss=sc.loss.KullbackLeibler()
)

parametrize(semnan, max_iterations=max_iterations, min_error=min_error)

iteration=0          loss=19.082         
iteration=1000       loss=2.37           
iteration=2000       loss=1.4224         
iteration=3000       loss=0.91027        
iteration=4000       loss=0.27608        
iteration=5000       loss=0.00018692     

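For intuition, both built-in losses are named after well-known divergences between Gaussian distributions. The sketch below writes the textbook definitions for zero-mean Gaussians in plain PyTorch on the CPU. It is only an illustration: the helper names `kl_divergence` and `bhattacharyya_distance` are hypothetical, and the library's own implementations (including the direction of the Kullback-Leibler divergence) may differ.

```python
import torch


def kl_divergence(induced, sample):
    """Textbook KL(N(0, induced) || N(0, sample)) for zero-mean Gaussians."""
    n = sample.shape[0]
    # trace(sample^{-1} @ induced), computed without forming the inverse.
    trace_term = torch.trace(torch.linalg.solve(sample, induced))
    return (trace_term - n + torch.logdet(sample) - torch.logdet(induced)) / 2


def bhattacharyya_distance(induced, sample):
    """Textbook Bhattacharyya distance between N(0, induced) and N(0, sample)."""
    mean_covariance = (induced + sample) / 2
    log_geometric_mean = (torch.logdet(induced) + torch.logdet(sample)) / 2
    return (torch.logdet(mean_covariance) - log_geometric_mean) / 2


# The sample covariance from this notebook, in double precision.
sample = torch.tensor([
    [2, 3, 6, 8],
    [3, 7, 12, 16],
    [6, 12, 23, 30],
    [8, 16, 30, 41],
], dtype=torch.float64)
identity = torch.eye(4, dtype=torch.float64)

# Both distances vanish when the two covariances coincide ...
kl_same = kl_divergence(sample, sample).item()
bh_same = bhattacharyya_distance(sample, sample).item()

# ... and are positive otherwise.
kl_diff = kl_divergence(identity, sample).item()
bh_diff = bhattacharyya_distance(identity, sample).item()

print(f"KL: same={kl_same:.2e} different={kl_diff:.4f}")
print(f"Bhattacharyya: same={bh_same:.2e} different={bh_diff:.4f}")
```

Both quantities are zero exactly when the two covariance matrices agree. Their raw values are on different scales, so loss values from the two functions cannot be compared directly.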

We could also use the `Bhattacharyya` loss function, which gives us the same parametrized AMASEM.


In [5]:
semnan = sc.SEMNANSolver(
    struct,
    sample_covariance=sample_covariance,
    loss=sc.loss.Bhattacharyya()
)

parametrize(semnan, max_iterations=max_iterations, min_error=min_error)

iteration=0          loss=0.94143        
iteration=1000       loss=0.11201        
iteration=2000       loss=0.016577       
iteration=3000       loss=0.00033135     


Printing the induced visible covariance gives the same matrix as before.


In [6]:
print(semnan.visible_covariance_)

tensor([[ 1.9952,  2.9896,  5.9794,  7.9739],
        [ 2.9896,  6.9746, 11.9532, 15.9395],
        [ 5.9794, 11.9532, 22.9124, 29.8867],
        [ 7.9739, 15.9395, 29.8867, 40.8538]], device='cuda:0')

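To put a number on how close this matrix is to the sample covariance, one can compare the two directly. The snippet below is a small sketch in plain PyTorch on the CPU. The values of `induced_covariance` are copied by hand from the printed output above, so they are rounded to four decimals.

```python
import torch

sample_covariance = torch.tensor([
    [2, 3, 6, 8],
    [3, 7, 12, 16],
    [6, 12, 23, 30],
    [8, 16, 30, 41],
], dtype=torch.float64)

# Copied from the printed `visible_covariance_` above (rounded values).
induced_covariance = torch.tensor([
    [1.9952, 2.9896, 5.9794, 7.9739],
    [2.9896, 6.9746, 11.9532, 15.9395],
    [5.9794, 11.9532, 22.9124, 29.8867],
    [7.9739, 15.9395, 29.8867, 40.8538],
], dtype=torch.float64)

difference = induced_covariance - sample_covariance

# Largest entrywise deviation.
max_abs_error = difference.abs().max().item()

# Frobenius norm of the difference relative to that of the sample covariance.
relative_error = (torch.linalg.norm(difference) / torch.linalg.norm(sample_covariance)).item()

print(f"max abs error = {max_abs_error:.4f}")
print(f"relative Frobenius error = {relative_error:.4f}")
```

The largest entrywise deviation is about 0.15, in the bottom-right entry (40.8538 against 41).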

## Custom Loss Function

When neither of the two built-in loss functions is suitable, one can define an arbitrary custom loss function.
This is done by subclassing the `LossBase` class.

The class below shows the signature of a custom loss function. Its methods compute the
Kullback-Leibler divergence from the sample covariance, but their bodies are arbitrary and can be replaced.

In [7]:
class MyLoss(sc.loss.LossBase):
    def loss_proxy(self, visible_covariance):
        return torch.trace(self.sample_covariance_inv @ visible_covariance) - torch.logdet(visible_covariance)

    def loss(self, visible_covariance):
        return (self.loss_proxy(visible_covariance) - self.size + self.sample_covariance_logdet) / 2

    def loss_backward(self, visible_covariance, visible_covariance_grad):
        visible_covariance_grad.copy_(self.sample_covariance_inv)
        visible_covariance_grad.subtract_(torch.inverse(visible_covariance))

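In symbols, the three methods above can be read as follows. This assumes, from the attribute names only, that `self.sample_covariance_inv` holds $S^{-1}$, `self.sample_covariance_logdet` holds $\log\det S$, and `self.size` is the dimension $n$ of the sample covariance $S$. Here $\Sigma$ is the induced visible covariance, $\tilde{\ell}$ stands for `loss_proxy`, $\ell$ for `loss`, and the last line is the matrix that `loss_backward` writes into `visible_covariance_grad`.

$$
\begin{aligned}
\tilde{\ell}(\Sigma) &= \operatorname{tr}\left(S^{-1}\Sigma\right) - \log\det\Sigma, \\
\ell(\Sigma) &= \tfrac{1}{2}\left(\tilde{\ell}(\Sigma) - n + \log\det S\right), \\
\frac{\partial \tilde{\ell}}{\partial \Sigma} &= S^{-1} - \Sigma^{-1}.
\end{aligned}
$$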
Notice that there are three methods to override: `loss`, `loss_proxy`, and `loss_backward`.
- `loss` computes the actual distance between the sample covariance and the induced covariance.
- `loss_proxy` is a computationally cheaper function that is a monotonic counterpart of the `loss` function.
- `loss_backward` computes the derivative of the distance with respect to the induced covariance.

Among these functions, only `loss_backward` is required for the parametrization to work. The other two can be used
to get the loss or loss proxy values when one desires.

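When writing your own `loss_backward`, it helps to check the hand-written derivative against automatic differentiation before handing it to the solver. The sketch below does this in plain PyTorch on the CPU for the Kullback-Leibler formulas used in `MyLoss`. It does not use `semnan_cuda`: the free functions and the `random_spd` helper are hypothetical stand-ins for the methods of the class. For these formulas the hand-written matrix equals the gradient of `loss_proxy`, which is twice the gradient of `loss`, so both give the same descent direction.

```python
import torch

torch.manual_seed(0)


def random_spd(n):
    """Random symmetric positive-definite matrix (hypothetical test helper)."""
    a = torch.randn(n, n, dtype=torch.float64)
    return a @ a.T + n * torch.eye(n, dtype=torch.float64)


n = 4
sample_covariance = random_spd(n)
sample_covariance_inv = torch.inverse(sample_covariance)
sample_covariance_logdet = torch.logdet(sample_covariance)


def loss_proxy(visible_covariance):
    return torch.trace(sample_covariance_inv @ visible_covariance) - torch.logdet(visible_covariance)


def loss(visible_covariance):
    return (loss_proxy(visible_covariance) - n + sample_covariance_logdet) / 2


visible_covariance = random_spd(n).requires_grad_()

# Gradients obtained by automatic differentiation.
(proxy_grad,) = torch.autograd.grad(loss_proxy(visible_covariance), visible_covariance)
(loss_grad,) = torch.autograd.grad(loss(visible_covariance), visible_covariance)

# Hand-written gradient, matching the body of `loss_backward` in `MyLoss`.
manual_grad = sample_covariance_inv - torch.inverse(visible_covariance.detach())

print(torch.allclose(proxy_grad, manual_grad))     # gradient of the proxy
print(torch.allclose(loss_grad, manual_grad / 2))  # loss is the proxy halved plus a constant
```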
Now, let's re-parametrize the AMASEM using this new custom function.

In [8]:
semnan = sc.SEMNANSolver(
    struct,
    sample_covariance=sample_covariance,
    loss=MyLoss()
)

parametrize(semnan, max_iterations=max_iterations, min_error=min_error)

iteration=0          loss=11.452         
iteration=1000       loss=2.701          
iteration=2000       loss=2.0007         
iteration=3000       loss=0.597          
iteration=4000       loss=0.0043001      


Notice that this converges similarly to our initial `KullbackLeibler` loss function.
Now, let's print the resulting visible covariance matrix.

In [9]:
print(semnan.visible_covariance_)

tensor([[ 1.9972,  2.9954,  5.9911,  7.9881],
        [ 2.9954,  6.9923, 11.9852, 15.9802],
        [ 5.9911, 11.9852, 22.9715, 29.9619],
        [ 7.9881, 15.9802, 29.9619, 40.9491]], device='cuda:0')
