Skip to content

Comments

Fix: Ensure proper embedding dimensions#54

Merged
djannot merged 6 commits intokagent-dev:mainfrom
Daniel-Vaz:embedding-dimensions-fix
Feb 16, 2026
Merged

Fix: Ensure proper embedding dimensions#54
djannot merged 6 commits intokagent-dev:mainfrom
Daniel-Vaz:embedding-dimensions-fix

Conversation

@Daniel-Vaz
Copy link
Contributor

Notes

  • Fixes silent INSERT failures by replacing NULL with empty strings for branch and repo in vec0 TEXT columns.
  • Embedding dimension is now set dynamically based on model name (e.g., 1536 for text-embedding-3-small).

@djannot
Copy link
Collaborator

djannot commented Feb 13, 2026

@Daniel-Vaz I'm already fixing it in #53

@Daniel-Vaz
Copy link
Contributor Author

@djannot
Thank you !
I was just today noticing this behaviour. Could you please also address then the static dimension sizes in that same PR ? If not I can at least change this PR to just focus on that instead.

Locally I'm testing using Azure Open AI text-embedding-3-small model, and the dimension size difference from the one statically defined makes all my queries fail even when using that same model to build the db.

@djannot
Copy link
Collaborator

djannot commented Feb 13, 2026

I'll ping you as soon as the other PR is merged, so you can rebase yours from main at that time, ok ?

@djannot
Copy link
Collaborator

djannot commented Feb 13, 2026

Done, you can rebase

@Daniel-Vaz Daniel-Vaz force-pushed the embedding-dimensions-fix branch from 0ae049d to 921188a Compare February 13, 2026 15:56
@Daniel-Vaz
Copy link
Contributor Author

@djannot Done. Feel free to have a look whenever you can.

@Daniel-Vaz Daniel-Vaz changed the title Fix: Prevent NULL values in vec0 TEXT columns, ensure proper embedding dimension Fix: Ensure proper embedding dimensions Feb 13, 2026
@djannot
Copy link
Collaborator

djannot commented Feb 13, 2026

npm test is returning errors

@Daniel-Vaz
Copy link
Contributor Author

My bad. Updated the test accordingly:

 Test Files  7 passed (7)
      Tests  456 passed (456)
   Start at  17:00:21
   Duration  6.72s (transform 1.48s, setup 0ms, import 7.33s, tests 7.30s, environment 1ms)

@djannot
Copy link
Collaborator

djannot commented Feb 13, 2026

Thanks. I used AI to analyze it and got some good feedback:

  1. Default value discrepancy - getEmbeddingDimension defaults to 1536 for unknown models, but all the function signatures default embeddingDimension parameter to 3072. This inconsistency could cause mismatches if someone passes a default through one path vs. the other. I'd recommend making the default consistent everywhere, or removing defaults from the function signatures to force callers to always pass the value explicitly.
  2. console.warn in a library function - utils.ts:115 uses console.warn directly instead of the project's Logger class. This is inconsistent with the rest of the codebase and won't be captured by the logging infrastructure.
  3. Hardcoded model knowledge is fragile - The model-to-dimension mapping will need updates whenever new models appear. Consider:
    • Making this configurable in the YAML/config (e.g., an embedding_dimensions field)
    • Falling back to a test API call to determine dimensions at startup
    • At minimum, documenting which models are supported
  4. getMetadataValue in Qdrant makes an extra API call (database.ts:281-290) - Every metadata read now calls getCollection to discover the dimension. This adds latency. It would be better to pass the dimension as a parameter here too, like the other methods.

@Daniel-Vaz
Copy link
Contributor Author

Thanks will look into this next week !

@Daniel-Vaz
Copy link
Contributor Author

@djannot

Just finished applying some changes.
I fixed the logging mismatch on utils.ts, and changed the whole dimension discovery approach to solely rely on a new config\env var parameter to mention the size. To keep things working as they were, I'm assuming the 3072 default size from the large models.
Whenever you get a chance please have a look.

Thank you !

@djannot
Copy link
Collaborator

djannot commented Feb 16, 2026

The changes look good, but I merge another PR I was working on so you need to fix a small conflict and to increase the version. Thanks !

Signed-off-by: dvaz-external <dvaz.external@epo.org>
Signed-off-by: dvaz-external <dvaz.external@epo.org>
Signed-off-by: dvaz-external <dvaz.external@epo.org>
Signed-off-by: dvaz-external <dvaz.external@epo.org>
Signed-off-by: dvaz-external <dvaz.external@epo.org>
Signed-off-by: dvaz-external <dvaz.external@epo.org>
@Daniel-Vaz Daniel-Vaz force-pushed the embedding-dimensions-fix branch from 2238184 to 27200c8 Compare February 16, 2026 16:18
@Daniel-Vaz
Copy link
Contributor Author

@djannot
Done 😄 !
Thank you !

@djannot djannot merged commit d9c4cf6 into kagent-dev:main Feb 16, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants