Intel Extension for Transformers provides a suite of LangChain-based extension APIs, including advanced retrievers, embedding models, and vector stores. These extensions build on the original LangChain API and are specifically tailored to improve the functionality and performance of retrieval-augmented generation (RAG).

# Prepare Environment

Install Intel Extension for Transformers:

In [5]:
!pip install intel-extension-for-transformers

Install the requirements:

In [None]:
!git clone https://github.com/intel/intel-extension-for-transformers.git
%cd ./intel-extension-for-transformers/intel_extension_for_transformers/neural_chat/
!pip install -r requirements.txt
%cd ../../../

# Run LLM with LangChain-extension API

[Chroma](https://docs.trychroma.com/getting-started) is an AI-native, open-source vector database that emphasizes developer productivity. It is available under the Apache 2.0 license. The original Chroma API within LangChain was designed to accept settings only once, at the chatbot's startup, so users could not modify settings after initialization. To address this limitation, we've revamped the Chroma API. Our version adds vector store operations that let users adjust and fine-tune their settings even after the chatbot has been initialized.

In [None]:
from langchain_community.llms.huggingface_pipeline import HuggingFacePipeline
from langchain.chains import RetrievalQA
from langchain_core.vectorstores import VectorStoreRetriever
from langchain_core.documents import Document
from langchain_community.embeddings import HuggingFaceBgeEmbeddings
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from intel_extension_for_transformers.langchain.vectorstores import Chroma
from intel_extension_for_transformers.neural_chat.pipeline.plugins.retrieval.parser.parser import DocumentParser

# Parse the web page into (text, source) pairs.
document_parser = DocumentParser()
input_path = "https://lilianweng.github.io/posts/2023-06-23-agent/"
data_collection = document_parser.load(input=input_path)
# Wrap each parsed piece of text in a LangChain Document.
documents = []
for data, meta in data_collection:
    doc = Document(page_content=data, metadata={"source": meta})
    documents.append(doc)
# Embed the documents and persist them in the extended Chroma vector store.
embeddings = HuggingFaceBgeEmbeddings(model_name="BAAI/bge-base-en-v1.5")
knowledge_base = Chroma.from_documents(documents=documents, embedding=embeddings, persist_directory='./output')
# Load the LLM and wrap it as a LangChain text-generation pipeline.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
pipe = HuggingFacePipeline(pipeline=pipeline("text-generation", model=model, tokenizer=tokenizer, max_new_tokens=10))
# Build a retrieval QA chain on top of the vector store.
retriever = VectorStoreRetriever(vectorstore=knowledge_base)
retrievalQA = RetrievalQA.from_llm(llm=pipe, retriever=retriever)

Retrievers play a crucial role in RAG. They implement the basic retrieval configuration and access the vector store using the specified retrieval method and settings. Currently, we offer two types of retrievers: `VectorStoreRetriever` and `ChildParentRetriever`. The rest of this section takes `ChildParentRetriever` as an example.
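
Before turning to `ChildParentRetriever`, it may help to see what a basic vector store retriever does conceptually: score every stored vector against the query vector and return the top `k` texts. The sketch below is a toy in pure Python, not the library's implementation; `ToyRetriever`, `get_relevant`, and the three-dimensional vectors are hypothetical names and data chosen for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ToyRetriever:
    """Hypothetical stand-in for a vector store retriever (illustration only)."""

    def __init__(self, store, k=2):
        self.store = store  # list of (text, vector) pairs
        self.k = k          # retrieval setting: number of results to return

    def get_relevant(self, query_vector):
        # Rank all stored items by similarity to the query, best first.
        ranked = sorted(
            self.store,
            key=lambda item: cosine(query_vector, item[1]),
            reverse=True,
        )
        return [text for text, _ in ranked[: self.k]]


# Made-up vectors; a real system would get them from an embedding model.
store = [
    ("planning", [1.0, 0.0, 0.0]),
    ("memory", [0.0, 1.0, 0.0]),
    ("tool use", [0.0, 0.0, 1.0]),
]
toy_retriever = ToyRetriever(store, k=2)
results = toy_retriever.get_relevant([0.9, 0.1, 0.0])
print(results)
```

Here `k` plays the role of a retrieval setting: raising it returns more context, at the cost of including less similar entries.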

We've specifically designed `ChildParentRetriever` to address challenges in long-context retrieval scenarios. In many applications, the documents being retrieved are much longer than the user's query. Because the documents carry far richer semantic content than the brief query, the context available on the two sides is unbalanced, which often reduces retrieval accuracy.

An ideal solution would be to segment the user-uploaded documents for the RAG knowledge base into suitably sized chunks. However, this is not always feasible, because there are no consistent guidelines for dividing the context automatically and accurately. Chunks that are too short can produce partial, contextually incomplete answers to user queries. Conversely, excessively long chunks can significantly lower retrieval accuracy.
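
The trade-off can be seen with a toy splitter. This is a minimal sketch that splits on word counts; `split_by_words` and the sample text are made up for illustration and do not reflect how any particular LangChain splitter works.

```python
def split_by_words(text, chunk_size):
    """Split text into chunks of at most `chunk_size` words."""
    words = text.split()
    return [
        " ".join(words[i:i + chunk_size])
        for i in range(0, len(words), chunk_size)
    ]


text = (
    "The agent keeps a scratchpad of past steps. "
    "It consults the scratchpad before choosing the next action."
)

small = split_by_words(text, 4)   # many short chunks
large = split_by_words(text, 32)  # one long chunk

for chunk in small:
    print(repr(chunk))
print(len(small), len(large))
```

With a chunk size of 4, the third chunk reads "It consults the scratchpad" and no longer says what "It" refers to. With a chunk size of 32, everything stays together, but a short query has to match against a much longer passage.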

To navigate this challenge, we use the `ChildParentRetriever` to optimize the RAG process. We first split the user-uploaded files into larger chunks, termed 'parent chunks', to preserve the integrity of each concept. These parent chunks are then further divided into smaller 'child chunks', and each child chunk is linked to its parent through a unique ID. Matching the user query against concise child chunks improves the likelihood and precision of finding a relevant match. When a highly relevant child chunk is identified, we use the ID to trace back to its parent chunk, and the context from this parent chunk is then used in the RAG process, improving the overall effectiveness and accuracy of retrieval.
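
The child-to-parent lookup can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the `ChildParentRetriever` implementation: word overlap stands in for embedding similarity, and `split_words`, `build_stores`, `retrieve_parent`, and the `parent_id` key are hypothetical names.

```python
import uuid


def split_words(text, size):
    """Split text into chunks of at most `size` words (toy splitter)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def build_stores(document, parent_size, child_size):
    """Return (parents, children); each child records its parent's ID."""
    parents, children = {}, []
    for parent_text in split_words(document, parent_size):
        parent_id = str(uuid.uuid4())
        parents[parent_id] = parent_text
        for child_text in split_words(parent_text, child_size):
            children.append({"text": child_text, "parent_id": parent_id})
    return parents, children


def tokens(text):
    """Lowercase words with trailing punctuation stripped."""
    return {word.strip(".,").lower() for word in text.split()}


def retrieve_parent(query, parents, children):
    """Match the query against child chunks, then return the parent chunk."""
    best_child = max(children, key=lambda c: len(tokens(query) & tokens(c["text"])))
    return parents[best_child["parent_id"]]


document = (
    "Self-reflection lets an agent improve by refining past actions. "
    "It learns from mistakes over repeated trials. "
    "Memory stores facts for later recall. "
    "Long-term memory often relies on an external vector store."
)
parents, children = build_stores(document, parent_size=16, child_size=4)
context = retrieve_parent("mistakes over trials", parents, children)
print(context)
```

The query matches only a four-word child chunk, but the returned context is the full sixteen-word parent chunk, which keeps the sentence about self-reflection together with the one about learning from mistakes.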

In [None]:
from langchain.vectorstores import Chroma
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceBgeEmbeddings
from intel_extension_for_transformers.neural_chat.pipeline.plugins.retrieval.retrieval_agent import document_transfer, document_append_id
from intel_extension_for_transformers.neural_chat.pipeline.plugins.retrieval.parser.parser import DocumentParser
from intel_extension_for_transformers.langchain.retrievers import ChildParentRetriever
 
# Splitter used to cut the parent documents into smaller child chunks.
text_splitter = RecursiveCharacterTextSplitter(chunk_size=512)
# Parse the web page and convert the result into LangChain documents (the parent chunks).
document_parser = DocumentParser()
input_path = "https://lilianweng.github.io/posts/2023-06-23-agent/"
data_collection = document_parser.load(input=input_path)
langchain_documents = document_transfer(data_collection)
# Split the parent documents into child chunks, then append the IDs to the parents.
child_documents = text_splitter.split_documents(langchain_documents)
langchain_documents = document_append_id(langchain_documents)
# Build one vector store for the parent chunks and one for the child chunks.
embeddings = HuggingFaceBgeEmbeddings(model_name="BAAI/bge-base-en-v1.5")
knowledge_base = Chroma.from_documents(documents=langchain_documents, embedding=embeddings, persist_directory='./parent')
child_knowledge_base = Chroma.from_documents(documents=child_documents, embedding=embeddings, persist_directory='./child')
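# Search over the child chunks (vectorstore) and return the matching parent chunks (parentstore).
retriever = ChildParentRetriever(vectorstore=child_knowledge_base, parentstore=knowledge_base)
docs = retriever.get_relevant_documents("Self-Reflection")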