ChromaDB 0.4+ is no longer compatible with client config #7887
Comments
Answer generated by a 🤖
Answer
Thank you for bringing this issue to our attention. It seems that the latest version of ChromaDB is no longer compatible with the client config, causing an error when creating a locally persisted Chroma store. Based on your description, it appears that you have already identified a potential solution by removing the line chroma_db_impl="duckdb+parquet", from langchain/vectorstores/chroma.py. We appreciate your effort in finding a solution to this problem. If you would like to contribute your proposed fix to the LangChain project, we encourage you to create a pull request (PR) with your changes. The maintainers will review your contribution and decide if it should be merged into the project. Thank you for your support and for helping to improve LangChain! This response is meant to be useful, save you time, and share context. It is not meant to be a precise solution, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant. |
Answer generated by a 🤖
Answer
I understand that you're encountering an issue with ChromaDB 0.4+ compatibility in the LangChain Python framework. The error message you're receiving suggests that the ChromaDB configuration you're using is deprecated and needs to be upgraded. You've found a workaround by removing the line chroma_db_impl="duckdb+parquet", from the Chroma vector store.
To address this issue, one potential solution could be to check the version of ChromaDB being used and then apply the appropriate configuration. Here's how you could implement this:
import chromadb

# persist_directory is assumed to be defined by the surrounding code.
# Compare the major and minor version parts; float("0.4.0") would raise ValueError.
major, minor = (int(part) for part in chromadb.__version__.split(".")[:2])
if (major, minor) >= (0, 4):
    # 0.4+ rejects chroma_db_impl; is_persistent=True turns on on-disk (sqlite) storage
    _client_settings = chromadb.config.Settings(
        is_persistent=True,
        persist_directory=persist_directory,
    )
else:
    # Use the old configuration
    _client_settings = chromadb.config.Settings(
        chroma_db_impl="duckdb+parquet",
        persist_directory=persist_directory,
    )
This solution should allow you to use ChromaDB 0.4+ in a backwards compatible way. However, please note that you may still need to handle the migration process as suggested by the error message. This might involve using the chroma-migrate tool. This is just one potential solution and it might need to be adjusted based on your specific code and requirements. If you find this solution helpful and believe it's not covered by existing documentation, I encourage you to make a pull request to update the documentation for the benefit of other users. |
Thanks for helping me solve this problem |
- Description: version check to make sure chromadb >=0.4.0 does not throw an error, and uses the default sqlite persistence engine when the directory is set, - Issue: the issue #7887 For attention of - DataLoaders / VectorStores / Retrievers: @rlancemartin, @eyurtsev --------- Co-authored-by: Bagatur <baskaryan@gmail.com>
Thanks for taking quick action on this issue. I was trying to figure out how to use the Chroma class with this new update, and found out that this code works:
import chromadb
from langchain.embeddings.sentence_transformer import SentenceTransformerEmbeddings
from langchain.vectorstores import Chroma

client = chromadb.PersistentClient(path="./db")
embeddings = SentenceTransformerEmbeddings(
    model_name="all-mpnet-base-v2",
)
vector_db = Chroma(
    client=client,
    collection_name="your-collection-name",
    embedding_function=embeddings,
)
print(f"Documents Loaded: {vector_db._collection.count()}")
Sharing in case someone else needs it. |
Hi @kevinknights29, but how can I instead save embeddings and docs into it? Intuitively, I would use the following code:
import chromadb
from langchain.vectorstores import Chroma
from langchain.embeddings.openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(openai_api_key = key)
client = chromadb.PersistentClient(path="db_metadata_v5")
vector_db = Chroma(
    client=client,
    embedding_function=embeddings,
)
vector_db = Chroma.from_documents(documents=chunks, embedding=embeddings, persist_directory=output_dir)
vector_db.persist()
But now, according to this, the function is no longer supported. So the final two lines throw an error, and indeed, I get "TypeError: 'NoneType' object is not iterable". |
@aevedis |
@maheeppartap Eh, but it still does not work. I get TypeError: 'NoneType' object is not iterable. @jeffchuber, maybe we need something on the langchain side? Do you have any proposals? :) |
I ran into this problem too, so the only fix I had was to install the older version (chromadb==0.3.21).
Quoted reply from Andrey E. (Wed, Jul 19, 2023):
db = vector_db.from_documents(documents=chunks, embedding=embeddings, persist_directory=output_dir)
Eh, but it still does not work. I get TypeError: 'NoneType' object is not iterable.
|
Based on: https://github.com/hwchase17/langchain/blob/master/langchain/vectorstores/chroma.py You could: import chromadb
from langchain.vectorstores import Chroma
from langchain.embeddings.openai import OpenAIEmbeddings
embeddings = OpenAIEmbeddings(openai_api_key = key)
client = chromadb.PersistentClient(path="db_metadata_v5")
vector_db = Chroma.from_documents(
client=client,
documents=chunks,
embedding=embeddings,
persist_directory=output_dir
)
vector_db.persist() This is possible because the from_documents method has a client argument. Could you give it a try? I don't see why it wouldn't work. |
Hi everyone, https://github.com/hwchase17/langchain/releases/tag/v0.0.236 was just released this morning and is backwards compatible. @kevinknights29 @Maipengfei please upgrade and if you have any issues, please ask in our discord - https://discord.gg/MMeYNTmh3x - we are responding very quickly right now! |
@jeffchuber, Thanks for the heads-up! |
@jeffchuber, thank you a lot! |
Hey @kevinknights29, this is just to notify you that in the end the issue has been fixed! The below code suffices, as it is backwards compatible as stated by @jeffchuber. from langchain.vectorstores import Chroma
output_dir = "./db_metadata_v5"
db = Chroma.from_documents(chunks, embeddings, persist_directory=output_dir) The difference is that now the embeddings, the text and various metadata are being stored in Note: I am on chromadb v0.4.2 and langchain v0.0.237. |
@gpapp thanks again for making the backwards compatibility PR! |
@vincentwang79 this worked for me... https://gist.github.com/jeffchuber/a9ebc0ad5c7b053b8d1c50449c07f893 chroma==0.4.2 |
Yes, your code also works in my environment. I must have got something wrong. Thanks! |
persist_dir='/Users/mac/Documents/Python_Scripts/python_scripts_researchpak/' |
Hey everyone, Edit: Adding more context here: That fixed the error for me. |
@abdullah-028, just try: https://gist.github.com/jeffchuber/a9ebc0ad5c7b053b8d1c50449c07f893. It should work fine. Only make sure to be on chromadb 0.4.2 and langchain 0.0.237. |
It is working for me with: chromadb 0.4.2 and langchain 0.0.237 |
How to save and load chromadb using langchain ? |
Same issue here today.
Still buggy. What is the final solution then? |
@vdedourek2 |
This is what works for me
Create new db
Read existing db
Retriever from existing db
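The snippets for these three steps did not survive on this page, so what follows is only a rough sketch of what they might look like, not the commenter's actual code. It assumes the `langchain.vectorstores.Chroma` import path used elsewhere in this thread (langchain 0.0.237 era) and chromadb 0.4+; the function names and the "./db" default are hypothetical.

```python
from typing import Any, Sequence


def create_db(documents: Sequence[Any], embeddings: Any, persist_directory: str = "./db") -> Any:
    """Create a new persisted store from already-split documents."""
    # Imported lazily so this module loads even where langchain is not installed.
    from langchain.vectorstores import Chroma

    return Chroma.from_documents(documents, embeddings, persist_directory=persist_directory)


def load_db(embeddings: Any, persist_directory: str = "./db") -> Any:
    """Open an existing store; use the same embeddings that built it."""
    from langchain.vectorstores import Chroma

    return Chroma(persist_directory=persist_directory, embedding_function=embeddings)


def retriever_from_db(embeddings: Any, persist_directory: str = "./db") -> Any:
    """Wrap an existing store as a retriever for use in chains."""
    return load_db(embeddings, persist_directory).as_retriever()
```

Loading needs the same embedding function that was used when the store was created, otherwise query vectors will not match the stored ones.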
|
Thank you so much.... It's working now... ^_^ I believe you should be able to provide a pull request over there. At least one more file needs to be modified accordingly: privateGPT.py... Thanks... |
thank you @victorlee0505 ! |
Hello. I am trying to use chromadb with PersistentClient. The problem is that I am getting this error:
ValueError: You are using a deprecated configuration of Chroma. If you do not have data you wish to migrate, you only need to change how you construct If you do have data you wish to migrate, we have a migration tool you can use in order to See https://docs.trychroma.com/migration for more information or join our discord at https://discord.gg/8g5FESbj for help!
The thing is, I am already using the new version, and I do not have data to migrate. Can someone help me with this issue, please? I can't solve this problem. |
This seems to have been an error with migration. Such an error occurs in the case where you have some data that you were using with an older version of chromaDB and then you have updated the version. So basically, now it's asking you to migrate the data to the newer version. For an immediate fix, you can try changing the path of the persistent storage for the vector database and give it a try. Maybe try this:
Here note that I have mentioned the new_path which is essentially different from the old path (the one you are currently using). |
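Before choosing between migrating and switching to a new path, it may help to check whether the current directory holds old data at all. The sketch below is a heuristic only: it assumes the pre-0.4 duckdb+parquet backend left `.parquet` files in the persistence directory, and the function name is hypothetical.

```python
from pathlib import Path


def looks_like_legacy_chroma_dir(path: str) -> bool:
    """Heuristic: True if the directory contains any .parquet files.

    Assumption: the pre-0.4 duckdb+parquet backend wrote .parquet files
    directly into the persist directory.
    """
    directory = Path(path)
    if not directory.is_dir():
        return False
    return any(directory.glob("*.parquet"))
```

If this returns False for the path in use, the deprecated-configuration error is more likely coming from the settings being passed than from data on disk.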
Thanks for your answer! It is not working though... I get the same error. Like I mentioned, I do not have any data to migrate. I am using the version: 0.4.14. Anyone, how should I proceed? Thanks! |
this is my setting
so my guess of |
Also, for those coming from deeplearning.ai and trying to run the code in their Jupyter notebook: as a workaround until this is fixed, you can downgrade chroma until langchain supports 0.4+. |
@lszyba1 langchain does not support 0.4 chroma? can you provide more info/repro? this is concerning |
Hello Jeff steps to reproduce: |
@lszyba1 perhaps you can help me diagnose more? i ran the notebook here - https://learn.deeplearning.ai/langchain-chat-with-your-data/lesson/4/vectorstores-and-embedding- and it looks like it ran ok? thanks! |
Hello Now copy/download the notebook onto your computer. open the notebook and try to run the "75" from your screenshot (you will need openai key, + few of prior lines) . |
@lszyba1 very strange. I pulled down the notebook locally and Can you provide more info on the failure? sorry about the trouble |
@jeffchuber That is weird...here is a test.py file that we can be on a same page. test.py
|
@lszyba1 let's try on a minimal example... this could be a version issue with python on your computer (very common!) import langchain
import chromadb
print(langchain.__version__)
print(chromadb.__version__)
#0.0.324
#0.4.15
import os
import getpass
os.environ['OPENAI_API_KEY'] = "<key>"
from langchain.document_loaders import TextLoader
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma
# Load the document, split it into chunks, embed each chunk and load it into the vector store.
raw_documents = TextLoader('sotu.txt').load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
documents = text_splitter.split_documents(raw_documents)
db = Chroma.from_documents(documents, OpenAIEmbeddings())
query = "What did the president say about Ketanji Brown Jackson"
docs = db.similarity_search(query)
print(docs[0].page_content)
|
Hello Breaking it down: org: yours: The difference is that I'm passing persistent_directory...you are not. This does now work: #vectordb.persist() but what is the new syntax for making sure the chromadb is saved in a folder. And if I look at chroma docs, they use client. Note we are importing: Thanks |
using the same script i posted above, and adding db = Chroma.from_documents(
documents,
OpenAIEmbeddings(),
persist_directory='sotu_db' # new
) still executes perfectly. here is the integration to prove that it should support this sorry im not sure what else to say |
Thank you!!!!!!!!!!!!!!!!!!!! Well, this was a crazy ride, but the final answer is : langchain 0.0.228 does not have any issues. So the issue was langchain all along, even though the error is coming from chroma. I guess the implementation in 228 had the problem. It's unclear to me why pip install langchain (maybe earlier in the week, when I re-created the virtualenv) would pull the 0.228 version from July instead of any of the Oct versions. Nonetheless, thank you for the time. The lines printing the imported versions were the most critical; I guess next time I'll start there first. solves the issue and
now works without a problem, in the same way as the deeplearning.ai tutorial. |
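Since the installed versions turned out to be the deciding clue here, a quick check like the following can be run first in the affected virtualenv. It is a minimal sketch using only the standard library (Python 3.8+); the helper name is hypothetical.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Print what this interpreter actually has installed.
for name in ("langchain", "chromadb"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

Running it with the same interpreter that runs the failing script matters, because a notebook kernel and a shell can point at different environments.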
@lszyba1 glad to hear :) |
Should this ticket be closed? Thanks |
Still getting:
langchain==0.0.329, chromadb==0.4.15. File that fails is super simple
Nothing else. If I do something similar with Pinecone, it doesn't fail. Any clue or where to look for? |
@rtomaf, im sorry im not able to reproduce this. |
Sorry, had an .env being loaded with:
That made it complain |
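The comment does not say which variable the .env file contained. The sketch below assumes it was a `chroma_db_impl` setting and that chromadb reads its settings from the environment and from a local .env file; both are assumptions, and the function name is hypothetical. It reports where such a value is defined.

```python
import os
from pathlib import Path
from typing import Dict, Mapping, Optional

# Assumption: the legacy setting named in the issue text is what triggers the error.
LEGACY_KEY = "chroma_db_impl"


def find_legacy_setting(
    environ: Optional[Mapping[str, str]] = None, env_file: str = ".env"
) -> Dict[str, str]:
    """Return every place (process environment or .env file) that sets LEGACY_KEY."""
    environ = os.environ if environ is None else environ
    found: Dict[str, str] = {}
    # Check the process environment, ignoring case.
    for key, value in environ.items():
        if key.lower() == LEGACY_KEY:
            found[f"environment:{key}"] = value
    # Check simple KEY=VALUE lines in the .env file, if one exists.
    path = Path(env_file)
    if path.is_file():
        for line in path.read_text().splitlines():
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            if key.strip().lower() == LEGACY_KEY:
                found[f"{env_file}:{key.strip()}"] = value.strip()
    return found
```

An empty result means no such setting was found in either place, so the cause would have to be looked for elsewhere.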
Hi, @gpapp, I'm helping the LangChain team manage their backlog and am marking this issue as stale. It seems like there has been a lot of discussion and troubleshooting around the compatibility issue with ChromaDB. Users have reported encountering errors related to deprecated configurations and attribute issues when using the latest versions of ChromaDB and LangChain. Some users have found success by downgrading to an older version of ChromaDB, while others have shared their settings and code snippets that have worked for them. The LangChain team has also provided guidance and suggestions for resolving the issue, including checking the versions of the libraries being used and ensuring proper configuration settings. The issue appears to be ongoing, with users continuing to seek assistance and share their experiences with different approaches. Could you please confirm if this issue is still relevant to the latest version of the LangChain repository? If it is, please let the LangChain team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days. Thank you! |
System Info
Python 3.9.13
langchain-0.0.235-py3-none-any.whl
chromadb-0.4.0-py3-none-any.whl
Who can help?
No response
Information
Related Components
Reproduction
Steps to reproduce:
You are using a deprecated configuration of Chroma. Please pip install chroma-migrate and run chroma-migrate to upgrade your configuration. See https://docs.trychroma.com/migration for more information or join our discord at https://discord.gg/8g5FESbj for help!
Expected behavior
The issue:
Starting with chromadb 0.4.0, chroma_db_impl is no longer a supported parameter; it uses sqlite instead.
Removing the line
chroma_db_impl="duckdb+parquet",
from langchain/vectorstores/chroma.py solves the issue, but the earlier DB cannot be used or migrated.