# RePhraseQueryRetriever

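A simple retriever that applies an LLM between the user input and the query passed to the underlying retriever.

Because the rewriting step is driven by a prompt, it can be used to pre-process the user input in any way the prompt describes, for example by stripping out information that is irrelevant to retrieval.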
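Conceptually this is a two-step pipeline: rewrite the input, then search with the rewritten text. The sketch below illustrates that flow without LangChain. All three functions (`strip_small_talk`, `keyword_search`, `rephrase_then_retrieve`) and the `DOCS` list are hypothetical stand-ins: a rule-based function plays the role of the LLM and a keyword match plays the role of the vectorstore. It is not how the library is implemented.

```python
import re
from typing import Callable, List

# Toy corpus standing in for the vectorstore contents (made up for this sketch).
DOCS = [
    "Task decomposition can be done by an LLM with simple prompting.",
    "Chain of thought prompts the model to think step by step.",
    "San Francisco is a city in California.",
]


def strip_small_talk(question: str) -> str:
    """Stand-in for the LLM step: keep only the sentences that end in '?'."""
    sentences = re.split(r"(?<=[.?!])\s+", question.strip())
    questions = [s for s in sentences if s.endswith("?")]
    return " ".join(questions) if questions else question


def keyword_search(query: str, docs: List[str]) -> List[str]:
    """Stand-in for the retriever: match docs sharing a word longer than 4 chars."""
    terms = {w.lower().strip("?.,") for w in query.split() if len(w) > 4}
    return [d for d in docs if any(t in d.lower() for t in terms)]


def rephrase_then_retrieve(
    question: str,
    rephrase: Callable[[str], str],
    retrieve: Callable[[str], List[str]],
) -> List[str]:
    """Rewrite the user input first, then pass the rewritten query to the retriever."""
    return retrieve(rephrase(question))


raw = "I live in San Francisco. What are the approaches to Task Decomposition?"
direct_hits = keyword_search(raw, DOCS)
rephrased_hits = rephrase_then_retrieve(
    raw, strip_small_talk, lambda q: keyword_search(q, DOCS)
)
print(direct_hits)
print(rephrased_hits)
```

In this toy setup the raw input also matches the unrelated San Francisco document, while the rephrased query matches only the task decomposition one. That is the kind of noise the rewriting step is meant to remove.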

The default prompt:

```
DEFAULT_QUERY_PROMPT = PromptTemplate(
    input_variables=["question"],
    template="""You are an assistant tasked with taking a natural language query from a user
    and converting it into a query for a vectorstore. In this process, you strip out
    information that is not relevant for the retrieval task. Here is the user query: {question} """
)
```
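To see roughly what the LLM receives, the template can be filled in with plain string formatting. This sketch reproduces the template wording in an ordinary Python string (the names `template`, `user_input`, and `filled_prompt` are ours, and whitespace is normalized) and uses `str.format` in place of `PromptTemplate`, so it runs without LangChain installed.

```python
# Template wording copied from the default prompt; {question} is the only placeholder.
template = (
    "You are an assistant tasked with taking a natural language query from a user "
    "and converting it into a query for a vectorstore. In this process, you strip out "
    "information that is not relevant for the retrieval task. "
    "Here is the user query: {question} "
)

user_input = "Hi I'm Lance. What are the approaches to Task Decomposition?"

# Substitute the raw user input into the placeholder.
filled_prompt = template.format(question=user_input)
print(filled_prompt)
```

The whole user input, greeting included, is placed into the prompt. Deciding what to drop is left entirely to the LLM.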

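Create a vectorstore to retrieve from. The cell below loads a blog post, splits it into 500-character chunks, and indexes the chunks in Chroma using GPT4All embeddings.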

In [1]:
from langchain.document_loaders import WebBaseLoader

loader = WebBaseLoader("https://lilianweng.github.io/posts/2023-06-23-agent/")
data = loader.load()

from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
all_splits = text_splitter.split_documents(data)

from langchain.vectorstores import Chroma
from langchain.embeddings import GPT4AllEmbeddings

vectorstore = Chroma.from_documents(documents=all_splits, embedding=GPT4AllEmbeddings())

Found model file at  /Users/rlm/.cache/gpt4all/ggml-all-MiniLM-L6-v2-f16.bin




In [2]:
import logging

logging.basicConfig()
logging.getLogger("langchain.retrievers.re_phraser").setLevel(logging.INFO)
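The `INFO:langchain.retrievers.re_phraser:...` lines shown later on this page come from that logger. The stdlib-only sketch below shows why the `setLevel` call matters: with the root logger at its default `WARNING` level, `INFO` records are dropped until the named logger is lowered to `INFO`. Here we emit the records ourselves to imitate the retriever's message. The in-memory stream, `force=True`, and the explicit `level` argument are additions that make the sketch self-contained and are not needed in the notebook.

```python
import io
import logging

# Capture log output in memory so it can be inspected afterwards.
stream = io.StringIO()
logging.basicConfig(stream=stream, level=logging.WARNING, force=True)

logger = logging.getLogger("langchain.retrievers.re_phraser")

# Dropped: the logger inherits the WARNING threshold from the root logger.
logger.info("this record is filtered out")

# After lowering the threshold on this one logger, INFO records get through.
logger.setLevel(logging.INFO)
logger.info("Re-phrased question: approaches to Task Decomposition")

captured = stream.getvalue()
print(captured)
```

Setting the level on the named logger rather than on the root keeps other libraries' `INFO` output quiet.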

In [3]:
from langchain.chat_models import ChatOpenAI
from langchain.retrievers.re_phraser import RePhraseQueryRetriever

Using the default prompt:

In [4]:
llm = ChatOpenAI(temperature=0)
retriever_from_llm = RePhraseQueryRetriever.from_llm(
    retriever=vectorstore.as_retriever(), llm=llm
)

In [6]:
docs = retriever_from_llm.get_relevant_documents(
    "Hi I'm Lance. What are the approaches to Task Decomposition?"
)

INFO:langchain.retrievers.re_phraser:Re-phrased question: approaches to Task Decomposition


In [7]:
docs = retriever_from_llm.get_relevant_documents(
    "I live in San Francisco. What are the approaches to Task Decomposition?"
)

INFO:langchain.retrievers.re_phraser:Re-phrased question: approaches to Task Decomposition


Supplying a custom prompt:

In [10]:
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate

QUERY_PROMPT = PromptTemplate(
    input_variables=["question"],
    template="""You are an assistant tasked with taking a natural language query from a user
    and converting it into a query for a vectorstore. Strip out information that is not 
    relevant for the retrieval task. Here is the user query: {question} """,
)
llm = ChatOpenAI(temperature=0)
llm_chain = LLMChain(llm=llm, prompt=QUERY_PROMPT)

In [12]:
retriever_from_llm_chain = RePhraseQueryRetriever(
    retriever=vectorstore.as_retriever(), llm_chain=llm_chain
)

In [14]:
docs = retriever_from_llm_chain.get_relevant_documents(
    "Hi I'm Lance. What are the approaches to Task Decomposition?"
)

INFO:langchain.retrievers.re_phraser:Re-phrased question: approaches to Task Decomposition
