Closed as not planned
Labels
bug (Related to a bug, vulnerability, unexpected error with an existing feature)
Description
System Info
Python==3.11.5
langchain==0.0.300
llama_cpp_python==0.2.6
chromadb==0.4.12
Running on Windows and on CPU
Who can help?
Information
- The official example notebooks/scripts
- My own modified scripts
Related Components
- LLMs/Chat Models
- Embedding Models
- Prompts / Prompt Templates / Prompt Selectors
- Output Parsers
- Document Loaders
- Vector Stores / Retrievers
- Memory
- Agents / Agent Executors
- Tools / Toolkits
- Chains
- Callbacks/Tracing
- Async
Reproduction
Embeddings can't be stored in a Chroma/FAISS vector store when they are created with llama-cpp-python.
I found that embedding the contents of very simple websites, such as "http://example.org", works fine.
Here is the code I am executing in my notebook:
from langchain.document_loaders import WebBaseLoader
from langchain.embeddings import LlamaCppEmbeddings
from langchain.vectorstores import Chroma
llama = LlamaCppEmbeddings(model_path="../models/openorca_stx.gguf")
loader = WebBaseLoader("https://www.bbc.com/")
pages = loader.load()
vectordb = Chroma.from_documents(
    documents=pages,
    embedding=llama,
    persist_directory='../data/vectorstores/'
)
Here is the Traceback:
ValueError Traceback (most recent call last)
p:\git_repos\langchain-test\src_langchain\04_vectorstore.ipynb Cell 8 line 1
      7 loader = WebBaseLoader("https://www.bbc.com/")
      8 pages = loader.load()
---> 10 vectordb = Chroma.from_documents(
     11     documents=pages,
     12     embedding=llama,
     13     persist_directory='../data/vectorstores/'
     14 )
File d:\miniconda3\envs\langchain\Lib\site-packages\langchain\vectorstores\chroma.py:646, in Chroma.from_documents(cls, documents, embedding, ids, collection_name, persist_directory, client_settings, client, collection_metadata, **kwargs)
644 texts = [doc.page_content for doc in documents]
645 metadatas = [doc.metadata for doc in documents]
--> 646 return cls.from_texts(
647 texts=texts,
648 embedding=embedding,
649 metadatas=metadatas,
650 ids=ids,
651 collection_name=collection_name,
652 persist_directory=persist_directory,
653 client_settings=client_settings,
654 client=client,
655 collection_metadata=collection_metadata,
656 **kwargs,
...
--> 510 self.input_ids[self.n_tokens : self.n_tokens + n_tokens] = batch
511 # Save logits
512 rows = n_tokens if self.params.logits_all else 1
ValueError: could not broadcast input array from shape (8,) into shape (0,)
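To narrow down whether the failure comes from Chroma or from the embedding step itself, the embeddings can also be called directly on the loaded page, with no vector store involved. This is only a quick check I would try (variable names follow the code above):

# Call the embedding model directly, bypassing Chroma entirely
short_vec = llama.embed_query("hello world")   # short input, well within the context window
print(len(short_vec))                          # prints the embedding dimension of the model

long_text = pages[0].page_content              # the full BBC front page
long_vec = llama.embed_query(long_text)        # presumably raises the same ValueError if the page is too long for the context

If the long input already fails here, the bug is in the embedding path rather than in the vector store.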
Expected behavior
I expect Chroma.from_documents(documents=pages, embedding=llama, persist_directory='../data/vectorstores/') to create embeddings for the documents and store them in the persistent Chroma database.
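A possible workaround (untested beyond the observation above that short pages embed fine; the n_ctx and chunk_size values below are assumptions) is to split the page into chunks that comfortably fit the model's context window, and to raise n_ctx on LlamaCppEmbeddings, whose default is only 512:

from langchain.text_splitter import RecursiveCharacterTextSplitter

# Assumed workaround: larger embedding context plus chunks small enough to fit into it
llama = LlamaCppEmbeddings(
    model_path="../models/openorca_stx.gguf",
    n_ctx=2048,  # assumed value; the default is 512
)

# Split the long page into small documents before embedding
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_documents(pages)

vectordb = Chroma.from_documents(
    documents=chunks,
    embedding=llama,
    persist_directory='../data/vectorstores/'
)

If the chunked version goes through, that would at least confirm the error is triggered by inputs longer than the embedding context window rather than by Chroma itself.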