feat: tests and example of custom embedding function by Icemap · Pull Request #167 · pingcap/pytidb

Icemap · 2025-08-07T02:59:23Z

No description provided.

examples/custom_embedding_function/reqs.txt

tests/test_custom_embedding_function.py

examples/custom_embedding_function/custom_embedding.py

examples/custom_embedding_function/README.md

Mini256 · 2025-08-07T09:03:53Z

examples/custom_embedding_function/README.md

+- **First Run**: Model download and loading may take a few minutes
+- **GPU Acceleration**: BGE-M3 will automatically use GPU if available
+- **Memory Usage**: BGE-M3 requires ~2GB GPU memory or ~4GB RAM
+- **Batch Size**: Larger batches improve throughput but require more memory


I couldn't find a parameter that controls the batch size in the current code.

It means you can pass several items, say 5, to get_source_embeddings() in one call. That gives better throughput at the cost of higher memory usage.
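To illustrate the reply above: since the batch size is simply the number of items passed per call, a caller can control it by slicing the input. This is a minimal sketch, not code from the PR. The names `iter_batches`, `embed_in_batches`, and `fake_embed_batch` are hypothetical, and it assumes the embedding call accepts a list of texts and returns one vector per text, in order.

```python
from typing import Callable, Iterator, List, Sequence


def iter_batches(items: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def embed_in_batches(
    texts: Sequence[str],
    embed_batch: Callable[[Sequence[str]], List[List[float]]],
    batch_size: int = 5,
) -> List[List[float]]:
    """Call `embed_batch` once per batch and concatenate the results in order."""
    vectors: List[List[float]] = []
    for batch in iter_batches(texts, batch_size):
        vectors.extend(embed_batch(batch))
    return vectors


def fake_embed_batch(batch: Sequence[str]) -> List[List[float]]:
    # Hypothetical stand-in for a real model call: one 2-dimensional vector per text.
    return [[float(len(text)), 1.0] for text in batch]
```

In real use, `embed_batch` would be something like the embedding function's `get_source_embeddings` method. A larger `batch_size` means fewer calls but more memory held per call.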

examples/custom_embedding_function/custom_embedding.py

examples/custom_embedding_function/main.py

Mini256 · 2025-08-07T09:20:51Z

examples/custom_embedding_function/README.md

+for automatic embedding generation and vector search capabilities.
+```
+
+## Understanding the Code


This section can be removed, since pytidb's documentation can explain step by step how to define a custom embedding function.

Should we? If a user lands here directly, they at least get an overview of the functions they need to override to build an embedding function class.
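As a rough picture of the overview being discussed, here is a sketch of an embedding function class with its overridable methods. It deliberately does not import pytidb: `EmbeddingFunctionSketch` is a hypothetical stand-in base class, and only `get_source_embeddings` is named in this thread. The other two method names and all signatures are assumptions, so check the library's documentation for the real interface.

```python
import hashlib
from abc import ABC, abstractmethod
from typing import List


class EmbeddingFunctionSketch(ABC):
    """Hypothetical stand-in for the library's base class, not its real API."""

    dimensions: int

    @abstractmethod
    def get_query_embedding(self, query: str) -> List[float]: ...

    @abstractmethod
    def get_source_embedding(self, source: str) -> List[float]: ...

    @abstractmethod
    def get_source_embeddings(self, sources: List[str]) -> List[List[float]]: ...


class HashEmbeddingFunction(EmbeddingFunctionSketch):
    """Toy embedding: deterministic bytes from SHA-256, scaled to [0, 1]."""

    def __init__(self, dimensions: int = 8) -> None:
        if not 1 <= dimensions <= 32:
            raise ValueError("dimensions must be between 1 and 32")
        self.dimensions = dimensions

    def _embed(self, text: str) -> List[float]:
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        return [byte / 255.0 for byte in digest[: self.dimensions]]

    def get_query_embedding(self, query: str) -> List[float]:
        return self._embed(query)

    def get_source_embedding(self, source: str) -> List[float]:
        return self._embed(source)

    def get_source_embeddings(self, sources: List[str]) -> List[List[float]]:
        return [self._embed(source) for source in sources]
```

The toy hash embedding carries no semantic meaning. It only shows the shape of the class: one method for queries, one for a single source, and one for a batch of sources.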

examples/custom_embedding_function/custom_embedding.py

Mini256

Encountered an error:

Traceback (most recent call last):
  File "/Users/xxxx/Projects/pytidb/examples/custom_embedding_function/main.py", line 25, in <module>
    embed_func = BGEM3EmbeddingFunction()
                 ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/xxxx/Projects/pytidb/examples/custom_embedding_function/custom_embedding.py", line 46, in __init__
    self._init_model()
  File "/Users/xxxx/Projects/pytidb/examples/custom_embedding_function/custom_embedding.py", line 68, in _init_model
    actual_dims = test_output["dense_vecs"].shape[1]
                  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^
IndexError: tuple index out of range

Co-authored-by: Mini256 <minianter@foxmail.com>

Icemap · 2025-08-07T13:59:14Z

actual_dims = test_output["dense_vecs"].shape[1]

Removed the dimension check, because BGE-M3 has a single fixed dimension, which is 1024.
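For context on the `IndexError`: indexing `.shape[1]` only works on an array with at least two axes. The thread does not state the cause, but one plausible explanation, assumed here, is that the test call encoded a single string and got back a 1-D vector with shape `(1024,)`. The sketch below uses plain tuples in place of array shapes, and the helper `embedding_dim` is hypothetical. Reading the last axis works for both a single vector and a batch.

```python
from typing import Tuple


def embedding_dim(shape: Tuple[int, ...]) -> int:
    """Return the size of the last axis, which holds the vector components."""
    if not shape:
        raise ValueError("a scalar has no embedding dimension")
    return shape[-1]


single_vector_shape = (1024,)   # assumed shape when one string is encoded
batch_shape = (3, 1024)         # assumed shape when a list of three strings is encoded

try:
    single_vector_shape[1]      # mirrors `.shape[1]` from the traceback
    raised = False
except IndexError:
    raised = True
```

Dropping the check, as done in the PR, avoids the problem entirely when the model's dimension is fixed. Reading the last axis is an alternative when the dimension has to be discovered at runtime.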

Icemap · 2025-08-07T16:44:50Z

Another thing: it looks like we have run out of Jina AI tokens:

ERROR tests/test_auto_embedding_image.py::test_image_search_with_query_text - litellm.exceptions.APIConnectionError: litellm.APIConnectionError: Jina_aiException - litellm.Timeout: Connection timed out after None seconds.
ERROR tests/test_auto_embedding_image.py::test_image_search_with_image_path - litellm.exceptions.APIConnectionError: litellm.APIConnectionError: Jina_aiException - litellm.Timeout: Connection timed out after None seconds.
ERROR tests/test_auto_embedding_image.py::test_image_search_with_pil_image - litellm.exceptions.APIConnectionError: litellm.APIConnectionError: Jina_aiException - litellm.Timeout: Connection timed out after None seconds.

examples/custom_embedding_function/README.md

examples/custom_embedding_function/main.py

examples/custom_embedding_function/custom_embedding.py

examples/custom_embedding_function/README.md

Co-authored-by: Mini256 <minianter@foxmail.com>

Mini256 · 2025-08-08T06:41:30Z

examples/custom_embedding_function/main.py

+import dotenv
+from custom_embedding import BGEM3EmbeddingFunction
+from pytidb.schema import TableModel, Field
+from pytidb.datatype import Text


Please ensure you have upgraded to pytidb==0.0.11, because in the new version, Text has been replaced with TEXT.

Let me delete the venv and reinstall.

Changed:

Text to TEXT

create_table(..., mode='overwrite') to create_table(..., mode='overwrite')

And ran main.py successfully.

Mini256

LGTM

feat: tests and example of custom embedding function

e4f0b20

Mini256 reviewed Aug 7, 2025

examples/custom_embedding_function/custom_embedding.py

Mini256 reviewed Aug 7, 2025

Icemap and others added 5 commits August 7, 2025 20:48

Update tests/test_custom_embedding_function.py

6fe69a9

Co-authored-by: Mini256 <minianter@foxmail.com>

Update examples/custom_embedding_function/custom_embedding.py

f17a3f0

Co-authored-by: Mini256 <minianter@foxmail.com>

feat: apply all comments

315bd28

Merge branch 'main' into feat-custom-embed-func

7d9dd62

feat: bge-m3 has a fixed dimension 1024

5de9662

Mini256 reviewed Aug 8, 2025

examples/custom_embedding_function/README.md (outdated)

Merge branch 'main' into feat-custom-embed-func

0893ce7

Mini256 requested changes Aug 8, 2025

Icemap and others added 5 commits August 8, 2025 12:38

Apply suggestions from code review

4f9c04d

Co-authored-by: Mini256 <minianter@foxmail.com>

Update examples/custom_embedding_function/README.md

55ea5b9

Co-authored-by: Mini256 <minianter@foxmail.com>

Update examples/custom_embedding_function/README.md

d948e39

Co-authored-by: Mini256 <minianter@foxmail.com>

feat: apply suggestions

af12fa7

lint: make lints god happy

18dc876

Mini256 reviewed Aug 8, 2025

fix: fit for v0.0.11

9b66ed2

Mini256 approved these changes Aug 8, 2025

Mini256 merged commit 828a34a into main Aug 8, 2025
3 checks passed

Mini256 deleted the feat-custom-embed-func branch August 8, 2025 15:33
