# Replacing weights of a traced/loaded model on-the-fly

In some scenarios you need to change the weights of an already loaded model on-the-fly, so you avoid spending time again on I/O, hardware initialization, etc. PyTorch has a well-defined way of manipulating the weights of a model, and we're going to explore it here. The only things we need to pay attention to are how we load the new weights and which device we move the tensors to before trying to replace them in a model loaded into Inferentia2/Trainium HBM.
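As a point of reference, here is a minimal CPU-only sketch of the usual PyTorch way of swapping weights, using `state_dict`/`load_state_dict` on a plain `nn.Linear`. It is not the Neuron path used in the rest of this notebook. The layer size and random values are illustrative assumptions chosen to match the dummy model used later.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# A small bias-free linear layer, same shape as the dummy model used later.
model = nn.Linear(4, 4, bias=False)
x = torch.rand(2, 4)
y_before = model(x)

# The replacement dict must use the same parameter names and shapes
# as model.state_dict().
new_state = {"weight": torch.rand(4, 4)}
model.load_state_dict(new_state)

y_after = model(x)

# load_state_dict copies the values into the existing parameter,
# so the module object itself is never re-created.
weights_replaced = torch.equal(model.weight.detach(), new_state["weight"])
print("weights replaced:", weights_replaced)
```

The model object stays the same and only the parameter values change, which is the same idea applied to the traced model below.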

In [None]:
import os
os.environ['NEURON_RT_NUM_CORES']='1'
import torch
import torch.nn as nn
import torch_neuronx

### 1) First, let's create a dummy model with a simple linear layer

In [3]:
x = torch.rand(2, 4)
if os.path.isfile("linear.pt"):
    print("Loading model from disk")
    traced_model = torch.jit.load("linear.pt")
else:
    print("Tracing model...")
    model = nn.Linear(4, 4, bias=False)
    _= torch.nn.init.xavier_uniform_(model.weight)
    y = model(x)
    traced_model = torch_neuronx.trace(model, x, inline_weights_to_neff=False) # inline_weights_to_neff=False is required for replacing weights on-the-fly
    traced_model.save("linear.pt")

Tracing model...
.
Compiler status PASS


#### Special device
Now we need to use a special device called **privateuseone**, which is where we load our tensors. This device is backed by the Inferentia2/Trainium HBM, so a tensor moved to it ends up in accelerator memory, ready to be used.

In [4]:
x = x.to("privateuseone:0")

### 2) Then we execute it to see the results

In [5]:
y = traced_model(x)
y

tensor([[ 0.4002, -0.0553,  0.7019,  0.6522],
        [ 0.0463,  0.1206,  1.0582, -0.6579]])

### 3) Now, let's create a new set of weights and replace the original/loaded ones from our model
In this step, we'll replace all the weights of our model, so the results will show completely different values. Notice that we don't reload the model: only the weights are replaced.

In [9]:
new_weights = torch.rand(4, 4).to("privateuseone:0")
_= torch.nn.init.xavier_uniform_(new_weights)

In [10]:
torch_neuronx.replace_weights(traced_model, {"weight": new_weights} )

In [11]:
y = traced_model(x)
y.cpu()

tensor([[ 0.1614,  0.1684, -0.1753, -0.0212],
        [ 0.3497,  0.2425, -0.2417, -0.1437]])
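Before handing a dictionary of new tensors to a weight replacement call, it can help to check that every key and shape matches what the model expects. The sketch below is a CPU-only illustration. `check_replacement` is a hypothetical helper, not part of `torch_neuronx`, and the reference shapes are taken from an untraced `nn.Linear` with the same layout as the dummy model.

```python
import torch
import torch.nn as nn

def check_replacement(reference, new_weights):
    """Hypothetical helper: list the problems found in new_weights.

    reference   -- dict of name -> tensor with the expected shapes
    new_weights -- dict of name -> tensor to be validated
    """
    problems = []
    for name, tensor in new_weights.items():
        if name not in reference:
            problems.append(f"unknown parameter: {name}")
        elif tuple(tensor.shape) != tuple(reference[name].shape):
            problems.append(
                f"shape mismatch for {name}: "
                f"expected {tuple(reference[name].shape)}, got {tuple(tensor.shape)}"
            )
    return problems

# Expected layout: a single 4x4 'weight' tensor and no bias.
reference = nn.Linear(4, 4, bias=False).state_dict()

ok = check_replacement(reference, {"weight": torch.rand(4, 4)})
bad = check_replacement(
    reference, {"weight": torch.rand(1, 2), "bias": torch.rand(4)}
)

print(ok)
print(bad)
```

An empty list means the dictionary has the expected names and shapes. Anything else is worth fixing before touching the model loaded on the device.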

### 4) Finally, let's create a new set of weights, but this time we'll replace only a fraction of the model weights

In [12]:
new_weights = torch.rand(1, 2).to("privateuseone:0")
_= torch.nn.init.xavier_uniform_(new_weights)

In [13]:
model_weights = traced_model.weights._parameters['weight']
model_weights.cpu()

tensor([[-0.0263,  0.2754,  0.4246, -0.0604],
        [ 0.5144, -0.3213, -0.2463,  0.3047],
        [-0.3655,  0.2371, -0.0809, -0.1804],
        [-0.4069,  0.3934, -0.4091,  0.0616]])

In [14]:
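# Overwrite only the first two elements of row 0 (in place, on the device)
model_weights[0,:2] = new_weights
# Put the updated tensor back into the traced model (note: _parameters is a private attribute)
traced_model.weights._parameters['weight'] = model_weights
traced_model.weights._parameters['weight'].cpu()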

tensor([[-0.0338, -0.5757,  0.4246, -0.0604],
        [ 0.5144, -0.3213, -0.2463,  0.3047],
        [-0.3655,  0.2371, -0.0809, -0.1804],
        [-0.4069,  0.3934, -0.4091,  0.0616]])

As you can see in the weights printed above, only the first two elements of row 0 were replaced. And you get different predictions, of course.

In [16]:
y = traced_model(x)
y.cpu()

tensor([[-0.0507,  0.1684, -0.1753, -0.0212],
        [-0.4138,  0.2425, -0.2417, -0.1437]])
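Comparing this output with the one from step 3, only the first column changed. That is expected for a bias-free linear layer, which computes `y = x @ W.T`: row 0 of `W` only feeds output column 0. The CPU-only sketch below illustrates this with random stand-in values (not the tensors from this notebook), so the exact numbers are assumptions.

```python
import torch

torch.manual_seed(0)

x = torch.rand(2, 4)   # stand-in input batch
w = torch.rand(4, 4)   # stand-in weight matrix (out_features x in_features)
y_old = x @ w.T        # what a bias-free nn.Linear computes

# Replace only the first two elements of row 0, as in step 4.
w_new = w.clone()
w_new[0, :2] = torch.rand(2)
y_new = x @ w_new.T

# Find which output columns differ between the two predictions.
changed_columns = [
    j for j in range(y_old.shape[1])
    if not torch.allclose(y_old[:, j], y_new[:, j])
]
print("changed output columns:", changed_columns)
```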