[Feature][VectorStore] Support StarRocks as vector db #6119
an example notebook would be very helpful!
        config: Optional[StarRocksSettings] = None,
        **kwargs: Any,
    ) -> None:
        """StarRocks Wrapper to LangChain
better docstring would be nice
I've updated the docstring and added an example notebook that uses StarRocks as a vector DB.
OK. I'll fix that.
dc1bb62 to a3ea08f
@dirtysalt is attempting to deploy a commit to the LangChain Team on Vercel. A member of the Team first needs to authorize it.
Thanks! I'll have a look at the notebook tomorrow, and help get this merged.
Thanks. I've added an example notebook that uses StarRocks as a vector DB.
Add [`langchain`](langchain-ai/langchain#6119) extension. You can test the function with the following SQL:

```
create table t1 (id int, data array<float>)
engine = olap
distributed by hash(id)
properties ("replication_num" = "1");

insert into t1 values
    (1, array<float>[0.1, 0.2, 0.3]),
    (2, array<float>[0.2, 0.1, 0.3]),
    (3, array<float>[0.3, 0.2, 0.1]);

select cosine_similarity(array<float>[0.1, 0.2, 0.3], data) as dist, id from t1;
```

Signed-off-by: yanz <dirtysalt1987@gmail.com>
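As a rough cross-check of what the SQL above should return, here is a minimal pure-Python sketch of cosine similarity applied to the same three rows. The helper name and the `rows` dictionary are illustrative only and are not part of StarRocks or LangChain; the engine's own `cosine_similarity` works on `array<float>` values, so its results may differ slightly in the last digits.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return dot(a, b) / (|a| * |b|) for two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Same data as the `insert into t1` statement in the commit message.
rows: Dict[int, List[float]] = {
    1: [0.1, 0.2, 0.3],
    2: [0.2, 0.1, 0.3],
    3: [0.3, 0.2, 0.1],
}
query = [0.1, 0.2, 0.3]

# Mirrors `select cosine_similarity(query, data) as dist, id from t1`.
scores = {row_id: cosine_similarity(query, data) for row_id, data in rows.items()}

for row_id, score in scores.items():
    print(row_id, round(score, 4))
# 1 1.0
# 2 0.9286
# 3 0.7143
```

Row 1 is identical to the query vector, so it scores 1; rows 2 and 3 score 0.13/0.14 and 0.10/0.14 respectively, since every vector here has squared norm 0.14.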
@dirtysalt thanks! (1) In the notebook, please add a header (2) Please run
@rlancemartin I'm quite new to this community, and thanks for pointing out the problems. I've fixed the lint and format problems, updated the notebook, and added some background on StarRocks. I really appreciate your review and time.
Thanks! No problem. Lint errors are always a bit annoying :) I kicked off tests and will look again tomorrow. I can resolve any remaining ones quickly and get this in.
OK, thanks. I've checked the lint error. It looks like a Python package has to be installed in the lint Python env, because I've used
    def __init__(
        self,
        embedding: Embeddings,
        config: Optional[StarRocksSettings] = None,
Why make `config` optional and give it a default of None if we're going to assert that it's non-null later?
The assert statement comes right after the null case is handled:

    if config is not None:
        self.config = config
    else:
        self.config = StarRocksSettings()
    assert self.config

If `config` is None, we give it a default setting. So in theory we can remove the `assert self.config` statement, because it will always be true.
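The pattern discussed in this thread can be sketched without the assert. This is only an illustration: `SettingsSketch` and `StoreSketch` are hypothetical stand-ins, not the PR's `StarRocksSettings` or its vector store class, and the field names and default values are placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SettingsSketch:
    """Hypothetical stand-in for a settings object; fields are placeholders."""

    host: str = "localhost"
    port: int = 9030


class StoreSketch:
    """Hypothetical stand-in for a vector store that accepts optional settings."""

    def __init__(self, config: Optional[SettingsSketch] = None) -> None:
        # Fall back to default settings when the caller passes nothing.
        # After this line self.config is never None, so no assert is needed.
        self.config: SettingsSketch = (
            config if config is not None else SettingsSketch()
        )


default_store = StoreSketch()
custom_store = StoreSketch(SettingsSketch(host="example.invalid", port=1234))
print(default_store.config)
print(custom_store.config.host)
```

Keeping the parameter `Optional` lets callers omit it entirely, while the attribute itself is typed as non-optional, which is what the removed assert was guarding.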
Looks like checks are now passing. Good work.
* master: (28 commits)
  [Feature][VectorStore] Support StarRocks as vector db (langchain-ai#6119)
  Relax string input mapper check (langchain-ai#6544)
  bump to ver 208 (langchain-ai#6540)
  Harrison/multi tool (langchain-ai#6518)
  Infino integration for simplified logs, metrics & search across LLM data & token usage (langchain-ai#6218)
  Update model token mappings/cost to include 0613 models (langchain-ai#6122)
  Fix issue with non-list `To` header in GmailSendMessage Tool (langchain-ai#6242)
  Integrate Rockset as Vectorstore (langchain-ai#6216)
  Feat: Add a prompt template parameter to qa with structure chains (langchain-ai#6495)
  Add async support for HuggingFaceTextGenInference (langchain-ai#6507)
  Be able to use Codey models on Vertex AI (langchain-ai#6354)
  Add KuzuQAChain (langchain-ai#6454)
  Update index.mdx (langchain-ai#6326)
  Export trajectory eval fn (langchain-ai#6509)
  typo(llamacpp.ipynb): 'condiser' -> 'consider' (langchain-ai#6474)
  Fix typo in docstring of format_tool_to_openai_function (langchain-ai#6479)
  Make streamlit import optional (langchain-ai#6510)
  Fixed: 'readible' -> readable (langchain-ai#6492)
  Documentation Fix: Correct the example code output in the prompt templates doc (langchain-ai#6496)
  Fix link (langchain-ai#6501)
  ...
Add [`langchain`](langchain-ai/langchain#6119) extension. You can test the function with the following SQL:

```
create table t1 (id int, data array<float>)
engine = olap
distributed by hash(id)
properties ("replication_num" = "1");

insert into t1 values
    (1, array<float>[0.1, 0.2, 0.3]),
    (2, array<float>[0.2, 0.1, 0.3]),
    (3, array<float>[0.3, 0.2, 0.1]);

select cosine_similarity(array<float>[0.1, 0.2, 0.3], data) as dist, id from t1;
```

Signed-off-by: yanz <dirtysalt1987@gmail.com>
(cherry picked from commit 4253167)
Add [`langchain`](langchain-ai/langchain#6119) extension. You can test the function with the following SQL:

```
create table t1 (id int, data array<float>)
engine = olap
distributed by hash(id)
properties ("replication_num" = "1");

insert into t1 values
    (1, array<float>[0.1, 0.2, 0.3]),
    (2, array<float>[0.2, 0.1, 0.3]),
    (3, array<float>[0.3, 0.2, 0.1]);

select cosine_similarity(array<float>[0.1, 0.2, 0.3], data) as dist, id from t1;
```

Signed-off-by: yanz <dirtysalt1987@gmail.com>
(cherry picked from commit 4253167)

# Conflicts:
#	be/src/exprs/vectorized/math_functions.cpp
#	be/src/exprs/vectorized/math_functions.h
#	gensrc/script/vectorized/vectorized_functions.py
Add [`langchain`](langchain-ai/langchain#6119) extension. You can test the function with the following SQL:

```
create table t1 (id int, data array<float>)
engine = olap
distributed by hash(id)
properties ("replication_num" = "1");

insert into t1 values
    (1, array<float>[0.1, 0.2, 0.3]),
    (2, array<float>[0.2, 0.1, 0.3]),
    (3, array<float>[0.3, 0.2, 0.1]);

select cosine_similarity(array<float>[0.1, 0.2, 0.3], data) as dist, id from t1;
```

Signed-off-by: yanz <dirtysalt1987@gmail.com>
(cherry picked from commit 4253167)
Signed-off-by: dirtysalt <dirtysalt1987@gmail.com>
Fixes # (issue)
Before submitting
Here are some examples of using StarRocks as a vector DB
Who can review?
Tag maintainers/contributors who might be interested:
@dev2049