Example Code

from vllm import LLM
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

# Model initialized directly through vllm's own LLM class
llm_vllm = LLM(
    model="/home/ubuntu/Rag_23ai/Llama-2-7b-chat-hf",
    device="cuda",  # or "cpu" depending on your setup
)

template = """Answer the question based only on the following context:
{context}

Question: {question}
"""
prompt = PromptTemplate.from_template(template)

retriever = knowledge_base.as_retriever()

chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm_vllm
    | StrOutputParser()
)

response = chain.invoke(user_question)

Description

Hello, I am using vLLM to run Llama models for RAG purposes. However, I keep hitting a Runnable error. I initialize the vLLM model with llm_vllm = LLM(...) as shown above. When I try to create the chain with chain = (...), I get the following error:

TypeError: Expected a Runnable, callable or dict. Instead got an unsupported type: <class 'vllm.entrypoints.llm.LLM'>

Similarly, if I use from langchain.chains.question_answering import load_qa_chain, I get an error: llm instance of Runnable expected.

System Info

I am using an Ubuntu machine.
Replies: 1 comment · 1 reply
Hey there, @sadrafh! I'm here to help you out with any issues or questions you have. Let's squash those bugs together! 😊
To resolve the errors and properly integrate vLLM with LangChain for Retrieval-Augmented Generation (RAG), follow these steps:
1. Install the vllm package: ensure that the vllm package is installed in your environment:
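For example, in a standard pip-based environment:

pip install vllm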
2. Use the VLLM class from LangChain: instead of directly using the LLM class from vllm, use the VLLM class provided by LangChain. Here is an example:
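A minimal sketch along those lines, using the VLLM wrapper from langchain_community (assumes the langchain-community package is installed; the model path, knowledge_base, and user_question come from the question above, and the sampling parameters are illustrative, not required values):

from langchain_community.llms import VLLM
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

# LangChain's VLLM wrapper is a Runnable, so it can be piped into a chain
llm = VLLM(
    model="/home/ubuntu/Rag_23ai/Llama-2-7b-chat-hf",
    trust_remote_code=True,  # needed for some Hugging Face models
    max_new_tokens=256,      # illustrative sampling settings
    temperature=0.8,
)

prompt = PromptTemplate.from_template(
    """Answer the question based only on the following context:
{context}

Question: {question}
"""
)

retriever = knowledge_base.as_retriever()

chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

response = chain.invoke(user_question)

Because the VLLM wrapper subclasses LangChain's BaseLLM, it satisfies the Runnable interface that both the | pipe syntax and load_qa_chain expect, which resolves the TypeError you are seeing.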