model-conversion : add support for SentenceTransformers #16387
Conversation
This commit adds support for models that use SentenceTransformer layers. The motivation is that if a converted model includes any of the numbered layers specified in the original model's repository, these changes enable such models to be used and verified. Currently, the model-conversion example only supports the base model output, without any of the additional transformation layers.

Usage:

Convert a model that also includes the SentenceTransformer layers:
```console
(venv) $ export EMBEDDING_MODEL_PATH="~/google/embeddinggemma-300M"
(venv) make embedding-convert-model
```

Verify the produced embeddings from the converted model against the original model's embeddings:
```console
(venv) make embedding-verify-logits-st
```

The original model can be run using SentenceTransformer:
```console
(venv) make embedding-run-original-model-st
```

Run the converted model using the "SentenceTransformer" layers, which enables pooling and normalization:
```console
(venv) make embedding-run-converted-model-st
```
printf("%9.6f ", embeddings[j * n_embd + i]); | ||
} | ||
} | ||
|
Issue: The embd_norm parameter is not validated to ensure it is within an acceptable range before it is used.
Suggestion: Add validation to ensure embd_norm is a valid value (>= -1) before using it in normalization
Code to review:
printf("Embeddings size: %d\n", n_embeddings);
Suggested fix:
if (embd_norm < -1) {
    fprintf(stderr, "Invalid embedding normalization value: %d\n", embd_norm);
    return 1;
}
Automated review by PR Code Review Service
} else if (strcmp(argv[i], "-pooling") == 0) {
    pooling_enabled = true;
} else if (strcmp(argv[i], "-embd-norm") == 0) {
    if (i + 1 < argc) {
Issue: The code parses the -embd-norm flag without validating the range of embd_norm values, which may cause unexpected behavior for invalid normalization types.
Suggestion: Add validation to ensure embd_norm is within an acceptable range (-1 to 2) or is a positive integer for p-norm.
Code to review:
embd_norm = std::stoi(argv[++i]);
Suggested fix:
if (embd_norm < -1) {
    fprintf(stderr, "Invalid embd_norm value: %d\n", embd_norm);
    return 1;
}
Automated review by PR Code Review Service
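A minimal sketch of the combined fix, assuming the value is parsed with `std::stoi` as in the snippet above; the accepted range (`>= -1`, with `-1` meaning no normalization) and the helper name `parse_embd_norm` are assumptions for illustration, not something this PR defines:

```cpp
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical helper: parse the value passed to -embd-norm and validate it.
// The accepted range (>= -1, with -1 meaning "no normalization") is assumed
// for illustration; the PR itself does not spell out the valid range.
static bool parse_embd_norm(const char * arg, int & embd_norm) {
    try {
        embd_norm = std::stoi(arg);
    } catch (const std::exception &) {
        fprintf(stderr, "error: -embd-norm expects an integer, got '%s'\n", arg);
        return false;
    }
    if (embd_norm < -1) {
        fprintf(stderr, "error: invalid -embd-norm value: %d (must be >= -1)\n", embd_norm);
        return false;
    }
    return true;
}
```

The `-embd-norm` branch of the argument loop would then reduce to something like `if (!parse_embd_norm(argv[++i], embd_norm)) { return 1; }`.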
static void print_usage(int, char ** argv) {
    printf("\nexample usage:\n");
    printf("\n %s -m model.gguf [-ngl n_gpu_layers] -embd-mode [prompt]\n", argv[0]);
    printf("\n %s -m model.gguf [-ngl n_gpu_layers] -embd-mode [-pooling] [-embd-norm <norm>] [prompt]\n", argv[0]);
Issue: Unnecessary code duplication in parameter parsing
Suggestion: Refactor duplicated code into a helper function to reduce redundancy
Code to review:
printf("%s: error: unable to load model\n" , __func__);
Automated review by PR Code Review Service
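One way the suggested refactor could look, sketched as a hypothetical helper for flags that take a value (the name `get_flag_value` and the error handling are illustrative, not part of the PR):

```cpp
#include <cstdio>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: match a flag that takes a value, advance the argument
// index, and report a missing value with a uniform message, so each branch of
// the parser no longer repeats the same bounds check and error handling.
static bool get_flag_value(int argc, char ** argv, int & i, const char * flag, const char *& value) {
    if (strcmp(argv[i], flag) != 0) {
        return false;  // not this flag; let the caller check the next one
    }
    if (i + 1 >= argc) {
        fprintf(stderr, "error: %s requires a value\n", flag);
        exit(1);
    }
    value = argv[++i];
    return true;
}
```

A branch such as `-embd-norm` could then be written as `if (get_flag_value(argc, argv, i, "-embd-norm", value)) { embd_norm = std::stoi(value); }`.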
printf("Saving logits to %s\n", bin_filename); | ||
printf("Saving data to %s\n", bin_filename); | ||
|
||
FILE * f = fopen(bin_filename, "wb"); |
Issue: Missing error checking for file operations when writing binary data
Suggestion: Add checks for fwrite return value to ensure data was written successfully
Code to review:
fwrite(data_ptr, sizeof(float), data_size, f);
Automated review by PR Code Review Service
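A small sketch of the kind of check the suggestion has in mind, wrapped in a hypothetical helper (the function name and signature are illustrative):

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical helper: write a float buffer and report a short write instead
// of ignoring the fwrite return value (e.g. when the disk fills up).
static bool write_floats_bin(FILE * f, const float * data, size_t count, const char * path) {
    const size_t written = fwrite(data, sizeof(float), count, f);
    if (written != count) {
        fprintf(stderr, "error: wrote %zu of %zu values to %s\n", written, count, path);
        return false;
    }
    return true;
}
```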
@@ -211,27 +239,27 @@
Issue: Missing error checking for file operations when writing text data
Suggestion: Add checks for fprintf return values to ensure data was written successfully
Code to review:
fprintf(f, "%d: %.6f\n", i, data_ptr[i]);
Automated review by PR Code Review Service
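And the analogous check for the text output, again as a hypothetical helper rather than the PR's actual code:

```cpp
#include <cstdio>

// Hypothetical helper: write one value per line and stop at the first
// fprintf failure instead of silently continuing after a write error.
static bool write_floats_txt(FILE * f, const float * data, int count) {
    for (int i = 0; i < count; i++) {
        if (fprintf(f, "%d: %.6f\n", i, data[i]) < 0) {
            fprintf(stderr, "error: failed to write value %d\n", i);
            return false;
        }
    }
    return true;
}
```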
printf("\nexample usage:\n"); | ||
printf("\n %s -m model.gguf [-ngl n_gpu_layers] -embd-mode [prompt]\n", argv[0]); | ||
printf("\n %s -m model.gguf [-ngl n_gpu_layers] -embd-mode [-pooling] [-embd-norm <norm>] [prompt]\n", argv[0]); | ||
printf("\n"); |
Issue: The usage message for command-line arguments is updated but doesn't fully reflect the new options.
Suggestion: Ensure usage message includes all new command-line flags and their descriptions.
Code to review:
printf(" -embd-norm: normalization type for pooled embeddings (default: 2)\n");
Suggested fix:
printf(" -embd-norm: normalization type for pooled embeddings (default: 2)\n");
Automated review by PR Code Review Service
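For reference, a hedged sketch of what a fuller usage message could look like; the flag names and the "default: 2" note come from the diff and the suggestion above, while the per-flag descriptions are illustrative wording:

```cpp
#include <cstdio>

// Hypothetical sketch: print_usage() extended so each new flag gets a short
// description. The descriptive text is illustrative, not taken from this PR.
static void print_usage(int /*argc*/, char ** argv) {
    printf("\nexample usage:\n");
    printf("\n    %s -m model.gguf [-ngl n_gpu_layers] -embd-mode [-pooling] [-embd-norm <norm>] [prompt]\n", argv[0]);
    printf("\n");
    printf("    -pooling           enable pooling of the token embeddings\n");
    printf("    -embd-norm <norm>  normalization type for pooled embeddings (default: 2)\n");
    printf("\n");
}
```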
cmake --build ../../build --target llama-logits -j8
# TODO: update logits.cpp to accept a --file/-f option for the prompt
../../build/bin/llama-logits -m "$CONVERTED_MODEL" -embd-mode "$PROMPT"
Issue: The script does not validate that the model file exists before attempting to use it, which could result in a cryptic error message.
Suggestion: Add a check to ensure the model file exists before proceeding.
Code to review:
if [ -n "$USE_POOLING" ]; then
Suggested fix:
if [ ! -f "$CONVERTED_MODEL" ]; then echo "Error: Model file '$CONVERTED_MODEL' not found" >&2; exit 1; fi
Automated review by PR Code Review Service
# TODO: update logits.cpp to accept a --file/-f option for the prompt
../../build/bin/llama-logits -m "$CONVERTED_MODEL" -embd-mode "$PROMPT"
if [ -n "$USE_POOLING" ]; then
    ../../build/bin/llama-logits -m "$CONVERTED_MODEL" -embd-mode -pooling "$PROMPT"
Issue: The script does not handle the case where the cmake build fails, potentially leading to an attempt to run a non-existent binary.
Suggestion: Add error handling to ensure cmake build succeeds before proceeding.
Code to review:
cmake --build ../../build --target llama-logits -j8
Suggested fix:
cmake --build ../../build --target llama-logits -j8 || { echo "Error: Build failed" >&2; exit 1; }
Automated review by PR Code Review Service
# Parse command line arguments
CONVERTED_MODEL=""
PROMPTS_FILE=""
USE_POOLING=""
Issue: Variable initialization for USE_POOLING is a good practice but should be done in a way that's consistent with other variable declarations.
Suggestion: Initialize USE_POOLING with an empty string at the same time as the other variables for consistency.
Code to review:
+USE_POOLING=""
Suggested fix:
CONVERTED_MODEL=""
PROMPTS_FILE=""
USE_POOLING=""
Automated review by PR Code Review Service