model-conversion : add support for SentenceTransformers #16387
Conversation
This commit adds support for models that use SentenceTransformer layers. The motivation is that if a converted model includes any of the numbered layers specified in the original model's repository, these changes enable such models to be used and verified. Currently, the model-conversion example only supports the base model output, without any of the additional transformation layers.

Usage:

Convert a model that also includes the SentenceTransformer layers:
```console
(venv) $ export EMBEDDING_MODEL_PATH="~/google/embeddinggemma-300M"
(venv) make embedding-convert-model
```

Verify the produced embeddings from the converted model against the original model's embeddings:
```console
(venv) make embedding-verify-logits-st
```

The original model can be run using SentenceTransformer:
```console
(venv) make embedding-run-original-model-st
```

Run the converted model using the "SentenceTransformer" layers, which enables pooling and normalization:
```console
(venv) make embedding-run-converted-model-st
```
printf("%9.6f ", embeddings[j * n_embd + i]); | ||
} | ||
} | ||
|
Issue: The embd_norm parameter is not validated to ensure it is within an acceptable range before it is used.
Suggestion: Add validation to ensure embd_norm is a valid value (>= -1) before using it in normalization
Code to review:
printf("Embeddings size: %d\n", n_embeddings);
Suggested fix:
if (embd_norm < -1) {
    fprintf(stderr, "Invalid embedding normalization value: %d\n", embd_norm);
    return 1;
}
Automated review by PR Code Review Service
} else if (strcmp(argv[i], "-pooling") == 0) {
    pooling_enabled = true;
} else if (strcmp(argv[i], "-embd-norm") == 0) {
    if (i + 1 < argc) {
Issue: The code parses the -embd-norm flag without validating the range of embd_norm values, which may cause unexpected behavior for invalid normalization types.
Suggestion: Add validation to ensure embd_norm is within an acceptable range (-1 to 2) or is a positive integer for p-norm.
Code to review:
embd_norm = std::stoi(argv[++i]);
Suggested fix:
if (embd_norm < -1) {
    fprintf(stderr, "Invalid embd_norm value: %d\n", embd_norm);
    return 1;
}
Automated review by PR Code Review Service
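A minimal sketch of the combined fix, assuming the value is parsed with `std::stoi` as in the snippet above; the accepted range (`>= -1`, with `-1` meaning no normalization) and the helper name `parse_embd_norm` are assumptions for illustration, not something this PR defines:

```cpp
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical helper: parse the value passed to -embd-norm and validate it.
// The accepted range (>= -1, with -1 meaning "no normalization") is assumed
// for illustration; the PR itself does not spell out the valid range.
static bool parse_embd_norm(const char * arg, int & embd_norm) {
    try {
        embd_norm = std::stoi(arg);
    } catch (const std::exception &) {
        fprintf(stderr, "error: -embd-norm expects an integer, got '%s'\n", arg);
        return false;
    }
    if (embd_norm < -1) {
        fprintf(stderr, "error: invalid -embd-norm value: %d (must be >= -1)\n", embd_norm);
        return false;
    }
    return true;
}
```

The `-embd-norm` branch of the argument loop would then reduce to something like `if (!parse_embd_norm(argv[++i], embd_norm)) { return 1; }`.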
static void print_usage(int, char ** argv) {
    printf("\nexample usage:\n");
    printf("\n %s -m model.gguf [-ngl n_gpu_layers] -embd-mode [prompt]\n", argv[0]);
    printf("\n %s -m model.gguf [-ngl n_gpu_layers] -embd-mode [-pooling] [-embd-norm <norm>] [prompt]\n", argv[0]);
Issue: Unnecessary code duplication in parameter parsing
Suggestion: Refactor duplicated code into a helper function to reduce redundancy
Code to review:
printf("%s: error: unable to load model\n" , __func__);
Automated review by PR Code Review Service
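One way the suggested refactor could look, sketched as a hypothetical helper for flags that take a value (the name `get_flag_value` and the error handling are illustrative, not part of the PR):

```cpp
#include <cstdio>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: match a flag that takes a value, advance the argument
// index, and report a missing value with a uniform message, so each branch of
// the parser no longer repeats the same bounds check and error handling.
static bool get_flag_value(int argc, char ** argv, int & i, const char * flag, const char *& value) {
    if (strcmp(argv[i], flag) != 0) {
        return false;  // not this flag; let the caller check the next one
    }
    if (i + 1 >= argc) {
        fprintf(stderr, "error: %s requires a value\n", flag);
        exit(1);
    }
    value = argv[++i];
    return true;
}
```

A branch such as `-embd-norm` could then be written as `if (get_flag_value(argc, argv, i, "-embd-norm", value)) { embd_norm = std::stoi(value); }`.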
printf("Saving logits to %s\n", bin_filename); | ||
printf("Saving data to %s\n", bin_filename); | ||
|
||
FILE * f = fopen(bin_filename, "wb"); |
Issue: Missing error checking for file operations when writing binary data
Suggestion: Add checks for fwrite return value to ensure data was written successfully
Code to review:
fwrite(data_ptr, sizeof(float), data_size, f);
Automated review by PR Code Review Service
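A small sketch of the kind of check the suggestion has in mind, wrapped in a hypothetical helper (the function name and signature are illustrative):

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical helper: write a float buffer and report a short write instead
// of ignoring the fwrite return value (e.g. when the disk fills up).
static bool write_floats_bin(FILE * f, const float * data, size_t count, const char * path) {
    const size_t written = fwrite(data, sizeof(float), count, f);
    if (written != count) {
        fprintf(stderr, "error: wrote %zu of %zu values to %s\n", written, count, path);
        return false;
    }
    return true;
}
```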
@@ -211,27 +239,27 @@
Issue: Missing error checking for file operations when writing text data
Suggestion: Add checks for fprintf return values to ensure data was written successfully
Code to review:
fprintf(f, "%d: %.6f\n", i, data_ptr[i]);
Automated review by PR Code Review Service
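And the analogous check for the text output, again as a hypothetical helper rather than the PR's actual code:

```cpp
#include <cstdio>

// Hypothetical helper: write one value per line and stop at the first
// fprintf failure instead of silently continuing after a write error.
static bool write_floats_txt(FILE * f, const float * data, int count) {
    for (int i = 0; i < count; i++) {
        if (fprintf(f, "%d: %.6f\n", i, data[i]) < 0) {
            fprintf(stderr, "error: failed to write value %d\n", i);
            return false;
        }
    }
    return true;
}
```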
printf("\nexample usage:\n"); | ||
printf("\n %s -m model.gguf [-ngl n_gpu_layers] -embd-mode [prompt]\n", argv[0]); | ||
printf("\n %s -m model.gguf [-ngl n_gpu_layers] -embd-mode [-pooling] [-embd-norm <norm>] [prompt]\n", argv[0]); | ||
printf("\n"); |
Issue: The usage message for command-line arguments is updated but doesn't fully reflect the new options.
Suggestion: Ensure usage message includes all new command-line flags and their descriptions.
Code to review:
printf(" -embd-norm: normalization type for pooled embeddings (default: 2)\n");
Suggested fix:
printf(" -embd-norm: normalization type for pooled embeddings (default: 2)\n");
Automated review by PR Code Review Service
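For reference, a hedged sketch of what a fuller usage message could look like; the flag names and the "default: 2" note come from the diff and the suggestion above, while the per-flag descriptions are illustrative wording:

```cpp
#include <cstdio>

// Hypothetical sketch: print_usage() extended so each new flag gets a short
// description. The descriptive text is illustrative, not taken from this PR.
static void print_usage(int /*argc*/, char ** argv) {
    printf("\nexample usage:\n");
    printf("\n    %s -m model.gguf [-ngl n_gpu_layers] -embd-mode [-pooling] [-embd-norm <norm>] [prompt]\n", argv[0]);
    printf("\n");
    printf("    -pooling           enable pooling of the token embeddings\n");
    printf("    -embd-norm <norm>  normalization type for pooled embeddings (default: 2)\n");
    printf("\n");
}
```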
cmake --build ../../build --target llama-logits -j8
# TODO: update logits.cpp to accept a --file/-f option for the prompt
../../build/bin/llama-logits -m "$CONVERTED_MODEL" -embd-mode "$PROMPT"
Issue: The script does not validate that the model file exists before attempting to use it, which could result in a cryptic error message.
Suggestion: Add a check to ensure the model file exists before proceeding.
Code to review:
if [ -n "$USE_POOLING" ]; then
Suggested fix:
if [ ! -f "$CONVERTED_MODEL" ]; then echo "Error: Model file '$CONVERTED_MODEL' not found" >&2; exit 1; fi
Automated review by PR Code Review Service
# TODO: update logits.cpp to accept a --file/-f option for the prompt
../../build/bin/llama-logits -m "$CONVERTED_MODEL" -embd-mode "$PROMPT"
if [ -n "$USE_POOLING" ]; then
    ../../build/bin/llama-logits -m "$CONVERTED_MODEL" -embd-mode -pooling "$PROMPT"
Issue: The script does not handle the case where the cmake build fails, potentially leading to an attempt to run a non-existent binary.
Suggestion: Add error handling to ensure cmake build succeeds before proceeding.
Code to review:
cmake --build ../../build --target llama-logits -j8
Suggested fix:
cmake --build ../../build --target llama-logits -j8 || { echo "Error: Build failed" >&2; exit 1; }
Automated review by PR Code Review Service
# Parse command line arguments
CONVERTED_MODEL=""
PROMPTS_FILE=""
USE_POOLING=""
Issue: Variable initialization for USE_POOLING is a good practice but should be done in a way that's consistent with other variable declarations.
Suggestion: Initialize USE_POOLING with an empty string at the same time as the other variables for consistency.
Code to review:
+USE_POOLING=""
Suggested fix:
CONVERTED_MODEL=""
PROMPTS_FILE=""
USE_POOLING=""
Automated review by PR Code Review Service