## Combining BM25 and Semantic Search

#### In this notebook we compare the context relevance of a standard retriever (semantic search) with that of hybrid search (a combination of keyword-based search and semantic search). We will use UpTrain to compute the relevance score.

In [61]:
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.retrievers import BM25Retriever, EnsembleRetriever
from langchain.chat_models import ChatOpenAI
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import WebBaseLoader
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain import hub

In [2]:
import os
os.environ["OPENAI_API_KEY"] = ""  # fill in your OpenAI API key before running

In [44]:
# Load data
url = "https://medium.com/@singhjoban495/linear-regression-in-details-c0e27c4f24aa"
loader = WebBaseLoader(url)
docs = loader.load()

In [45]:
# Split data into chunks of up to 1000 characters, with 100 characters of overlap
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
texts = text_splitter.split_documents(docs)
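
To make `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of fixed-size chunking with overlap. It is a simplification: `RecursiveCharacterTextSplitter` also tries to break on separators such as paragraphs and sentences, which this sketch does not do. The helper `chunk_text` and the `sample` string are hypothetical names used only for illustration.

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 100) -> list[str]:
    """Cut text into fixed-size windows; consecutive windows share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


# Hypothetical 2500-character input, just to show the window arithmetic.
sample = "".join(str(i % 10) for i in range(2500))
chunks = chunk_text(sample)
print([len(c) for c in chunks])
print(chunks[0][-100:] == chunks[1][:100])
```

The last 100 characters of one chunk reappear at the start of the next, so a sentence that straddles a boundary is still fully present in at least one chunk.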

In [46]:
len(texts)

6

In [47]:
# Create a vector store with FAISS from the embedded chunks
embeddings = OpenAIEmbeddings()
faiss_vectorstore = FAISS.from_documents(texts, embeddings)
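
Conceptually, the semantic retriever embeds the query and returns the chunks whose vectors lie closest to it. The sketch below shows that idea as a brute-force search, assuming an exhaustive index with Euclidean (L2) distance. We believe this is what the LangChain FAISS wrapper builds by default, but treat that as an assumption. The two-dimensional `toy_vectors` are made up; real OpenAI embeddings have far more dimensions.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_chunks(query_vec, chunk_vecs, k=3):
    """Return (index, distance) pairs for the k closest vectors, closest first."""
    scored = [(i, l2_distance(query_vec, v)) for i, v in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


# Made-up embeddings for four chunks and one query (illustration only).
toy_vectors = [[0.0, 1.0], [1.0, 0.0], [0.9, 0.1], [0.5, 0.5]]
toy_query = [1.0, 0.0]
top = nearest_chunks(toy_query, toy_vectors, k=2)
print(top)
```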

In [55]:
# BM25 ("Best Match 25") is a sparse, keyword-based retriever. Combine it with the
# dense (semantic) FAISS retriever using EnsembleRetriever.
bm25 = BM25Retriever.from_documents(texts)
bm25.k = 2  # number of chunks BM25 returns
faiss_retriever2 = faiss_vectorstore.as_retriever(search_kwargs={"k": 3})
ensemble_retriever = EnsembleRetriever(retrievers=[bm25, faiss_retriever2], weights=[0.5, 0.5])
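
For intuition about the keyword side, here is a minimal sketch of BM25 scoring. It uses a textbook Okapi BM25 formula with a smoothed, non-negative IDF and whitespace tokenization. The library behind `BM25Retriever` may differ in its IDF variant, default parameters, and preprocessing, so this is an illustration rather than a reimplementation. The function `bm25_scores` and the `toy_docs` are hypothetical.

```python
import math
from collections import Counter


def bm25_scores(query, documents, k1=1.5, b=0.75):
    """Score every document against the query with a textbook BM25 formula."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for toks in tokenized:
        doc_freq.update(set(toks))

    scores = []
    for toks in tokenized:
        term_freq = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            tf = term_freq[term]
            if tf == 0:
                continue  # a term absent from the document contributes nothing
            # Rare terms get a higher weight than common ones.
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            # Term frequency saturates (k1) and is normalized by document length (b).
            norm = tf + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


# Made-up documents (illustration only).
toy_docs = [
    "linear regression fits a straight line",
    "gradient descent minimizes a loss function",
    "regression predicts continuous values",
]
toy_scores = bm25_scores("linear regression", toy_docs)
best = max(range(len(toy_docs)), key=lambda i: toy_scores[i])
print(best, toy_scores)
```

Unlike the embedding-based retriever, BM25 only rewards exact token matches, which is why it complements semantic search on queries that contain rare or domain-specific terms.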

##### Retrieving with the hybrid retriever (EnsembleRetriever)

In [56]:
query = "What is Linear regression ?"

In [57]:
%%time
docs = ensemble_retriever.invoke(query)

CPU times: user 27.6 ms, sys: 4.58 ms, total: 32.1 ms
Wall time: 523 ms


In [58]:
len(docs)

3
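
The ensemble returns 3 documents even though BM25 contributes 2 and FAISS contributes 3. This is because the two result lists are merged and duplicates are collapsed, so here both BM25 hits must also appear among the FAISS hits. To our understanding, `EnsembleRetriever` ranks the merged list with weighted Reciprocal Rank Fusion (RRF), using a constant `c` of 60 by default. The sketch below illustrates that technique under those assumptions; the function `weighted_rrf` and the chunk ids are hypothetical.

```python
from collections import defaultdict


def weighted_rrf(ranked_lists, weights, c=60):
    """Fuse ranked lists of document ids; return ids sorted by fused score, best first."""
    fused = defaultdict(float)
    for ranking, weight in zip(ranked_lists, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            # Each retriever contributes weight / (rank + c) for every document it returned.
            fused[doc_id] += weight / (rank + c)
    return sorted(fused, key=fused.get, reverse=True)


# Hypothetical rankings: BM25 returns its top 2, FAISS its top 3.
bm25_ranking = ["chunk_0", "chunk_2"]
faiss_ranking = ["chunk_0", "chunk_2", "chunk_4"]
fused_ranking = weighted_rrf([bm25_ranking, faiss_ranking], weights=[0.5, 0.5])
print(fused_ranking)
```

A document found by both retrievers accumulates score from each list, so agreement between keyword and semantic search pushes it toward the top.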

In [59]:
docs

[Document(page_content='between the independent variables (x-axis) and dependent variable (y-axis).Types of Linear RegressionSimple Linear Regression: If there is only a single independent variable. Ex: In our dataset, cgpa (independent variable) and we have to predict package(dependent variable).Multiple Linear Regression: If there is more one independent variable is present in the dataset, then we can apply multiple linear regression. eg: cgpa,studytime are two independent variables or even more can exists.Regression is defined as the line or curve that passes through all the data points on the target-predictor graph with the shortest vertical distance between the data points and regression line.Completely linear dataThis is completely linear data, but in real world data,it si difficult to find completely linear data,we have a sort of linear data . You can see the image below.Y=mx+b is the equation of the line.Y-Dependent variablem=slopex=independent variable(data point)b=interceptMa

In [62]:
prompt = hub.pull("rlm/rag-prompt")
llm = ChatOpenAI()

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)


rag_chain = (
    {"context": ensemble_retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

In [63]:
rag_chain.invoke(query)

'Linear regression is a supervised machine learning algorithm used for predicting continuous values by showing the linear relationship between independent and dependent variables. There are two types: simple linear regression with one independent variable and multiple linear regression with more than one independent variable. The main goal of linear regression is to find the best-fit line that minimizes the distance between predicted and actual values.'