
Is there a way to swap a set of parameters inside of an .onnx / .ort graph with an identically shaped set of parameters? #6090

Closed
jakemdaly opened this issue Apr 18, 2024 · 8 comments
Labels
question Questions about ONNX

Comments

@jakemdaly

Ask a Question

Question

I want to be able to swap params at inference time to facilitate a LoRA deployment.

E.g., in torch, I could do:

import torch
import torch.nn as nn

class myModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 8)
        self.act = nn.Sigmoid()

    def forward(self, x):
        return self.act(self.linear(x))

m = myModel()
x = torch.randn(1, 4)

# prediction with the original weights
pred_1 = m(x)

# swap in an identically shaped parameter set
new_weights = nn.Parameter(torch.randn(8, 4))
m.linear.weight = new_weights

# call the model with the new weights
pred_2 = m(x)

Notes

I am using an ORT file for inference, if that matters.

@jakemdaly jakemdaly added the question Questions about ONNX label Apr 18, 2024
@gramalingam
Contributor

If you use the external-data format, you can replace the data file representing the external tensors with new values, as you wish.

Alternatively, you can make the weights input parameters of the model and then vary them for each invocation. However, this incurs a performance penalty (potentially a huge one) if ORT has to do things like move the weights to GPU or transpose them on every call, work that would otherwise be done once at session creation when the weights are not inputs.
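
A minimal sketch of that second approach (Python onnx/onnxruntime; the model path, initializer name "linear.weight", and input name "x" are assumptions for illustration):

import numpy as np
import onnx
import onnxruntime as ort

model = onnx.load("model.onnx")  # placeholder path
graph = model.graph

# Promote the initializer we want to vary into a graph input (name is an assumption)
init = next(t for t in graph.initializer if t.name == "linear.weight")
graph.input.append(
    onnx.helper.make_tensor_value_info(init.name, init.data_type, list(init.dims))
)
graph.initializer.remove(init)
onnx.save(model, "model_weight_as_input.onnx")

# Each run can now feed a different weight (at the cost of per-run transfers)
sess = ort.InferenceSession("model_weight_as_input.onnx", providers=["CPUExecutionProvider"])
x = np.random.randn(1, 4).astype(np.float32)
w = np.random.randn(8, 4).astype(np.float32)
out = sess.run(None, {"x": x, "linear.weight": w})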

@jakemdaly
Author

@gramalingam After reading the docs and tinkering with some of those functions, I am still not sure I quite understand the purpose of the external-data format, or whether it is compatible with the onnxruntime API (as opposed to onnx). What is the purpose of the format, and could you provide pseudo code showing how to load a subset of params with onnxruntime?

@gramalingam
Contributor

Yes, onnxruntime also supports the external-data format, which is part of the onnx standard. The external-data format serves a couple of purposes.

First, the protobuf format has a limit of 2GB on the size of a protobuf object (in terms of the size of the serialized representation). Models which exceed this size can exploit the external-data format to get around this limitation.

Second, even when the model size is under 2GB, weights end up dominating the size of the model representation. Hence, it is convenient and efficient to load these weights only when required; this helps analysis/optimization tools that care about the graph but not so much about the weights.
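
For reference, a minimal sketch of that workflow (paths and the size threshold are placeholders):

import onnx
from onnx.external_data_helper import load_external_data_for_model

# Save tensors above the size threshold into a side file next to the model
model = onnx.load("model.onnx")
onnx.save_model(
    model,
    "model_ext.onnx",
    save_as_external_data=True,
    all_tensors_to_one_file=True,
    location="weights.bin",
    size_threshold=1024,
)

# Tools that only need the graph can skip loading the weights entirely...
graph_only = onnx.load("model_ext.onnx", load_external_data=False)

# ...and the weights can be loaded (or swapped out on disk) later
load_external_data_for_model(graph_only, base_dir=".")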

@jakemdaly
Author

jakemdaly commented Apr 22, 2024

Is there a way to specify which parameters in the graph to load weights into? Or does this capability not exist yet?

@jakemdaly
Author

jakemdaly commented May 1, 2024

In my application I am adding an initializer with the AddInitializer method and then creating the session via the CreateSessionFromArray API. I get the following initialization error when I call CreateSessionFromArray:

[E:onnxruntime:, inference_session.cc:1935 onnxruntime::InferenceSession::Initialize::<lambda_5a23845ba810e30de3b9e7b450415bf5>::operator ()] Exception during initialization: C:\a\_work\1\s\onnxruntime\core\optimizer\initializer.cc:35 onnxruntime::Initializer::Initializer !model_path.IsEmpty() was false. model_path must not be empty. Ensure that a path is provided when the model is created or loaded.

Because I am not supplying a model path (I'm initializing from an array), does this imply the two methods are not compatible?

@ambroser53

This issue is not resolved! I still think this is very necessary functionality that ONNX is completely missing. There have been multiple papers showing the effectiveness of LoRA-switching ensembles, with LoRA Land being just one recent example off the top of my head.

For myself and a couple of others this missing functionality is a deal breaker for ONNX, so I would not brush it off so lightly! If it really is very difficult to implement, please let me know, as I will move away from my work in ONNX.

An issue on a different repo appears to have managed it, but their code looks very janky and may require TensorRT.

@TalkUHulk

For loading an ONNX model with LoRA weights, perhaps you can refer to my code: AIDB

@jakemdaly
Author

Thanks for the link. Since this seems to be of interest to others, here's what I ended up doing; I hope it helps someone else.

Let's call the base model $Model_{base}$, and the LoRA variant $Model_{lora}$. Also call some parameter set in the base model $W$ which $Model_{lora}$ will use as $W_{lora} = W + BA$.

To avoid storing $W$ twice, I exported it as external data:

import onnx

def export_model_as_external_data(model_onnx_path, model_save_path, gt_size=1024):
    '''Exports the model at `model_onnx_path` to one called `model_save_path`, with all
    parameter sets over `gt_size` bytes becoming external data.'''
    print("[LoRA Export] Converting model to external data format...", end=' ')
    model = onnx.load(model_onnx_path)
    onnx.save_model(
        model,
        f=model_save_path,
        save_as_external_data=True,
        all_tensors_to_one_file=False,
        size_threshold=gt_size,
    )
    print("Success")

Then, to create the $Model_{base}$ session, I stored $W$ in a struct, converted it to an ORT tensor using m_ort->CreateTensorWithDataAsOrtValue, and added it to a SessionOptions object using m_ort->AddInitializer. I then exported the rest of the model to a binary array and used m_ort->CreateSessionFromArray, passing it the SessionOptions with the initializer.

Similar story for $Model_{lora}$, except that I performed the LoRA merge prior to creating the ORT tensor.
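
For anyone on the Python API, a rough equivalent of that flow might look like the sketch below (the tensor name and file paths are placeholders, and the exact add_initializer binding may vary across onnxruntime versions):

import numpy as np
import onnxruntime as ort

# The backing array must stay alive for the session's lifetime; ORT uses the buffer in place
W = np.load("base_weight.npy")  # for the LoRA session, merge first: W = W + (s * B) @ A

so = ort.SessionOptions()
# Python counterpart of m_ort->AddInitializer (tensor name is an assumption)
so.add_initializer("linear.weight", ort.OrtValue.ortvalue_from_numpy(W))

# Python counterpart of m_ort->CreateSessionFromArray: pass the model bytes directly
with open("model_without_W.onnx", "rb") as f:
    model_bytes = f.read()
sess = ort.InferenceSession(model_bytes, sess_options=so, providers=["CPUExecutionProvider"])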

Two gotchas I had:

  • If you are using the loralib Python module, which is based on the paper, keep in mind there is a scaling factor that actually makes the previous equation $W_{lora} = W + s \cdot BA$. I folded $s$ into $B$ for efficiency (see the sketch below).
  • Also keep in mind that when exporting to ONNX, some graph optimizations such as constant folding will be applied. This is relevant if, for example, you are overwriting additional parameter sets in $Model_{base}$: you may not be able to use the torch params directly, because you would be overwriting a graph node that had optimizations applied to it with a parameter set that doesn't have those applied.
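
A tiny numpy sketch of that merge (loralib's scaling is $s = \alpha / r$; the shapes follow the example at the top of the thread, and the rank is an arbitrary illustration):

import numpy as np

W = np.random.randn(8, 4).astype(np.float32)   # base weight
B = np.random.randn(8, 2).astype(np.float32)   # LoRA "up" projection
A = np.random.randn(2, 4).astype(np.float32)   # LoRA "down" projection
s = 2.0                                        # scaling factor, e.g. lora_alpha / r

B_scaled = s * B                               # fold s into B once, ahead of time
W_lora = W + B_scaled @ A                      # merged weight handed to AddInitializer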
