# Export

A notebook for exporting a PyTorch-based CLIPVision model to ONNX with attention-weighted averaging. This notebook was used to create the model variants that power the main `imgbeddings` package.
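The pooling inside `export_clip_vision_to_onnx` isn't shown in this notebook, so the following is only a rough sketch of what "attention-weighted averaging" could mean in general: each token's hidden state is weighted by a normalized attention score and the results are summed. The function name `attention_weighted_average` and the input shapes are assumptions for illustration, not the library's actual implementation.

```python
import numpy as np


def attention_weighted_average(hidden_states, attention_weights):
    """Average token vectors, weighting each token by its attention score.

    hidden_states: array-like of shape (num_tokens, hidden_dim)
    attention_weights: array-like of shape (num_tokens,), non-negative
    (Both shapes are assumptions made for this sketch.)
    """
    hidden_states = np.asarray(hidden_states, dtype=np.float64)
    weights = np.asarray(attention_weights, dtype=np.float64)
    # Normalize so the weights sum to 1, making the result a true average.
    weights = weights / weights.sum()
    # (num_tokens,) @ (num_tokens, hidden_dim) -> (hidden_dim,)
    return weights @ hidden_states
```

With uniform weights this reduces to a plain mean over tokens; tokens with larger attention scores pull the pooled vector toward themselves.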

(Note: this requires installing `transformers[onnx]` from the master branch until version 4.18 is released.)
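To check whether an installed `transformers` meets that version requirement, something like the sketch below may help. The helper names `version_at_least` and `transformers_is_recent_enough` are hypothetical, and the parsing is deliberately simple: it reads only the leading numeric components, so a development build such as `4.18.0.dev0` is treated as `4.18.0`.

```python
import re
from importlib import metadata


def version_at_least(version: str, minimum: tuple) -> bool:
    """Return True if the leading numeric parts of `version` are >= `minimum`."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            # Stop at the first non-numeric component, e.g. "dev0".
            break
        parts.append(int(match.group()))
    return tuple(parts) >= minimum


def transformers_is_recent_enough() -> bool:
    """Check the installed transformers version against 4.18 (assumed cutoff)."""
    try:
        installed = metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return False
    return version_at_least(installed, (4, 18))
```

If `transformers_is_recent_enough()` returns `False`, installing from the master branch as described in the note is presumably still needed.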

In [1]:
from imgbeddings.models import export_clip_vision_to_onnx

In [2]:
output_folder = "/Users/maxwoolf/Desktop/imgbeddings-test/"

## patch-32

In testing, this model performed best when using half of the layers (6).

In [3]:
export_clip_vision_to_onnx(
    output_folder,
    output_name="patch32_v1.onnx",
    patch_size=32,
    opset=15,
    num_layers=6,
)


Some weights of the model checkpoint at openai/clip-vit-base-patch32 were not used when initializing CLIPVisionModel: ['text_model.encoder.layers.9.self_attn.out_proj.bias', 'text_model.encoder.layers.6.mlp.fc2.bias', 'text_model.encoder.layers.4.self_attn.k_proj.weight', 'text_model.encoder.layers.2.layer_norm1.bias', 'text_model.encoder.layers.10.self_attn.k_proj.bias', 'text_model.encoder.layers.2.mlp.fc1.bias', 'text_model.encoder.layers.9.layer_norm1.weight', 'text_model.encoder.layers.7.layer_norm1.weight', 'text_model.encoder.layers.6.layer_norm1.weight', 'text_model.encoder.layers.1.mlp.fc2.bias', 'text_model.encoder.layers.11.mlp.fc1.weight', 'text_model.encoder.layers.8.mlp.fc1.weight', 'text_model.encoder.layers.8.self_attn.q_proj.weight', 'text_model.encoder.layers.1.self_attn.q_proj.weight', 'text_model.encoder.layers.8.mlp.fc2.bias', 'text_model.encoder.layers.1.self_attn.v_proj.weight', 'text_model.encoder.layers.6.self_attn.v_proj.weight', 'text_model.encoder.layers.6.l

## patch-16

In testing, this model performed best when interacting with only the top-most layer.

In [4]:
export_clip_vision_to_onnx(
    output_folder,
    output_name="patch16_v1.onnx",
    patch_size=16,
    opset=15,
    num_layers=1,
)


Some weights of the model checkpoint at openai/clip-vit-base-patch16 were not used when initializing CLIPVisionModel: ['text_model.encoder.layers.9.self_attn.out_proj.bias', 'text_model.encoder.layers.6.mlp.fc2.bias', 'text_model.encoder.layers.4.self_attn.k_proj.weight', 'text_model.encoder.layers.2.layer_norm1.bias', 'text_model.encoder.layers.10.self_attn.k_proj.bias', 'text_model.encoder.layers.2.mlp.fc1.bias', 'text_model.encoder.layers.9.layer_norm1.weight', 'text_model.encoder.layers.7.layer_norm1.weight', 'text_model.encoder.layers.6.layer_norm1.weight', 'text_model.encoder.layers.1.mlp.fc2.bias', 'text_model.encoder.layers.11.mlp.fc1.weight', 'text_model.encoder.layers.8.mlp.fc1.weight', 'text_model.encoder.layers.8.self_attn.q_proj.weight', 'text_model.encoder.layers.1.self_attn.q_proj.weight', 'text_model.encoder.layers.8.mlp.fc2.bias', 'text_model.encoder.layers.1.self_attn.v_proj.weight', 'text_model.encoder.layers.6.self_attn.v_proj.weight', 'text_model.encoder.layers.6.l

## patch-14

This uses the same logic as patch-16, which matters more here because the model is 3x the size. (Fun fact: the exported model is under the 2GB threshold; exceeding it would make this a lot more complicated!)
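The size claim can be checked on disk after the export. This is a minimal sketch, assuming the threshold refers to the 2 GiB single-file limit that ONNX inherits from protobuf; the constant `TWO_GIB` and the helper names are hypothetical and not part of `imgbeddings`.

```python
import os

# Assumed threshold: 2 GiB, the protobuf single-message size limit.
TWO_GIB = 2 * 1024**3


def size_is_under_threshold(num_bytes: int, threshold: int = TWO_GIB) -> bool:
    """Return True if a size in bytes is strictly below the threshold."""
    return num_bytes < threshold


def file_is_under_threshold(path: str, threshold: int = TWO_GIB) -> bool:
    """Return True if the file at `path` is strictly below the threshold."""
    return size_is_under_threshold(os.path.getsize(path), threshold)


# Example usage after the export cell has run (path built from the
# notebook's own variables):
# file_is_under_threshold(os.path.join(output_folder, "patch14_v1.onnx"))
```

A model above this limit would typically need its weights stored as external data alongside the `.onnx` file, which is the extra complication alluded to above.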

In [5]:
export_clip_vision_to_onnx(
    output_folder,
    output_name="patch14_v1.onnx",
    patch_size=14,
    opset=15,
    num_layers=1,
)


Downloading: 100%|██████████| 4.31k/4.31k [00:00<00:00, 2.02MB/s]
Downloading: 100%|██████████| 316/316 [00:00<00:00, 392kB/s]
Downloading: 100%|██████████| 1.59G/1.59G [00:56<00:00, 30.1MB/s]   
Some weights of the model checkpoint at openai/clip-vit-large-patch14 were not used when initializing CLIPVisionModel: ['text_model.encoder.layers.9.self_attn.out_proj.bias', 'text_model.encoder.layers.6.mlp.fc2.bias', 'text_model.encoder.layers.4.self_attn.k_proj.weight', 'text_model.encoder.layers.2.layer_norm1.bias', 'text_model.encoder.layers.10.self_attn.k_proj.bias', 'text_model.encoder.layers.2.mlp.fc1.bias', 'text_model.encoder.layers.9.layer_norm1.weight', 'text_model.encoder.layers.7.layer_norm1.weight', 'text_model.encoder.layers.6.layer_norm1.weight', 'text_model.encoder.layers.1.mlp.fc2.bias', 'text_model.encoder.layers.11.mlp.fc1.weight', 'text_model.encoder.layers.8.mlp.fc1.weight', 'text_model.encoder.layers.8.self_attn.q_proj.weight', 'text_model.encoder.layers.1.self_attn.q_p