feat: tests and example of custom embedding function #167
| - **First Run**: Model download and loading may take a few minutes | ||
| - **GPU Acceleration**: BGE-M3 will automatically use GPU if available | ||
| - **Memory Usage**: BGE-M3 requires ~2GB GPU memory or ~4GB RAM | ||
| - **Batch Size**: Larger batches improve throughput but require more memory |
I couldn't find a parameter that controls the batch size in the current code.
It means you can pass several items (say, 5) to the get_source_embeddings() function in a single call. That gives better throughput at the cost of higher memory usage.
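To make the batching point concrete, here is a minimal sketch of feeding items to an embedding function in fixed-size chunks. The names `chunked`, `embed_in_batches`, `embed_batch`, and `batch_size` are hypothetical and do not come from the PR; `fake_embed_batch` is a stub standing in for a call like get_source_embeddings() and returns dummy vectors rather than running BGE-M3.

```python
from typing import Callable, Iterator, List, Sequence


def chunked(items: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `items` holding at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def embed_in_batches(
    items: Sequence[str],
    embed_batch: Callable[[Sequence[str]], List[List[float]]],
    batch_size: int = 5,
) -> List[List[float]]:
    """Embed `items` one batch at a time and concatenate the results in order."""
    vectors: List[List[float]] = []
    for batch in chunked(items, batch_size):
        vectors.extend(embed_batch(batch))
    return vectors


def fake_embed_batch(batch: Sequence[str]) -> List[List[float]]:
    """Hypothetical stand-in for a model call: one dummy 2-d vector per text."""
    return [[float(len(text)), 0.0] for text in batch]


texts = [f"doc {i}" for i in range(12)]
vectors = embed_in_batches(texts, fake_embed_batch, batch_size=5)
batch_sizes = [len(batch) for batch in chunked(texts, 5)]
```

A larger `batch_size` means fewer calls into the model but more inputs held in memory per call, which is the trade-off described in the README line under review.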
| for automatic embedding generation and vector search capabilities. | ||
| ``` | ||
| ## Understanding the Code |
This section can be removed; pytidb's documentation can explain, step by step, how to define a custom embedding function.
Should we? If a user lands here directly, they will at least get an overview of which methods they need to override to build an embedding function class.
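As a rough illustration of the kind of overview being discussed, the sketch below uses a hypothetical stand-in base class, not pytidb's actual one. Only get_source_embeddings() is named in this thread; `get_query_embedding`, `get_source_embedding`, `dimensions`, and both class names are assumptions made for the example.

```python
from abc import ABC, abstractmethod
from typing import List


class EmbeddingFunctionSketch(ABC):
    """Hypothetical stand-in for an embedding-function base class (not pytidb's)."""

    dimensions: int

    @abstractmethod
    def get_query_embedding(self, query: str) -> List[float]:
        """Embed a single search query."""

    @abstractmethod
    def get_source_embedding(self, source: str) -> List[float]:
        """Embed a single stored item."""

    def get_source_embeddings(self, sources: List[str]) -> List[List[float]]:
        """Embed many items; a subclass may override this with a true batch call."""
        return [self.get_source_embedding(source) for source in sources]


class ToyEmbeddingFunction(EmbeddingFunctionSketch):
    """Toy subclass that maps text to three simple character statistics."""

    dimensions = 3

    def _embed(self, text: str) -> List[float]:
        checksum = sum(map(ord, text)) % 97
        return [float(len(text)), float(text.count(" ")), float(checksum)]

    def get_query_embedding(self, query: str) -> List[float]:
        return self._embed(query)

    def get_source_embedding(self, source: str) -> List[float]:
        return self._embed(source)


func = ToyEmbeddingFunction()
batch = func.get_source_embeddings(["hello world", "tidb"])
```

A real implementation would replace `_embed` with a model call and check the exact method signatures against the installed pytidb version.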
Mini256 left a comment
I encountered an error:
Traceback (most recent call last):
  File "/Users/xxxx/Projects/pytidb/examples/custom_embedding_function/main.py", line 25, in <module>
    embed_func = BGEM3EmbeddingFunction()
                 ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/xxxx/Projects/pytidb/examples/custom_embedding_function/custom_embedding.py", line 46, in __init__
    self._init_model()
  File "/Users/xxxx/Projects/pytidb/examples/custom_embedding_function/custom_embedding.py", line 68, in _init_model
    actual_dims = test_output["dense_vecs"].shape[1]
                  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^
IndexError: tuple index out of range
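One plausible reading of this traceback, which the thread does not confirm, is that the test encode call returned a one-dimensional vector, so its shape tuple had a single entry and index 1 did not exist. The sketch below reproduces that failure with plain tuples; `last_dim` and both example shapes are hypothetical and only illustrate the indexing behaviour.

```python
def last_dim(shape: tuple) -> int:
    """Return the size of the final axis for a 1-D or 2-D array shape."""
    if not shape:
        raise ValueError("a scalar has no embedding dimension")
    return shape[-1]


# Assumed shapes: one text encoded alone versus three texts encoded together.
single_input_shape = (1024,)
batched_input_shape = (3, 1024)

try:
    single_input_shape[1]  # same indexing pattern as the failing line
    raised = False
except IndexError:
    raised = True
```

Indexing from the end works for both shapes, although dropping the check entirely is also reasonable when the model's output size is fixed.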
Co-authored-by: Mini256 <minianter@foxmail.com>
Removed the dimension-check logic, because BGE-M3 has a single fixed output dimension of 1024.

Also, it seems we have run out of Jina AI tokens.
| import dotenv | ||
| from custom_embedding import BGEM3EmbeddingFunction | ||
| from pytidb.schema import TableModel, Field | ||
| from pytidb.datatype import Text |
Please ensure you have upgraded to pytidb==0.0.11, because in the new version, Text has been replaced with TEXT.
Let me delete the virtual environment and reinstall.
Changed:
- Text to TEXT
- create_table(..., mode='overwrite') to create_table(..., mode='overwrite')

After that, main.py ran successfully.
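A quick way to catch this kind of mismatch before running the example is to compare the installed version against the one the reviewer asked for. This is a minimal sketch: `parse_version`, `needs_upgrade`, and `installed_version` are hypothetical helpers, the 0.0.11 threshold is taken from the comment above, and the parser assumes plain dotted numeric versions (a pre-release suffix would raise `ValueError`).

```python
from importlib import metadata
from typing import Optional, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a plain dotted version such as '0.0.11' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def needs_upgrade(installed: str, required: str = "0.0.11") -> bool:
    """Return True when `installed` is older than `required`."""
    return parse_version(installed) < parse_version(required)


def installed_version(package: str = "pytidb") -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None
```

Comparing integer tuples rather than raw strings matters here, since "0.0.9" sorts after "0.0.11" as text.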